Word DOCX - import schema <split> - deprecated

<split>

This element is deprecated as of PageSeeder v5.98. Use PSML split config instead.

A container for the elements that determine how each imported document is stored. Options include:

  • Keep each Word file as a single PageSeeder document.
  • Split the source document into a linked collection of references and component documents.
  • Give the imported PSML documents a @type of default.
  • Give the imported PSML documents a custom type.
<xs:element  name="split">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="main"      minOccurs="0" maxOccurs="1" />
      <xs:element ref="document"  minOccurs="0" maxOccurs="1" />
      <xs:element ref="section"   minOccurs="0" maxOccurs="1" />
      <xs:element ref="mathml"    minOccurs="0" maxOccurs="1" />
      <xs:element ref="footnotes" minOccurs="0" maxOccurs="1" />
      <xs:element ref="endnotes"  minOccurs="0" maxOccurs="1" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
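
Because the children are declared in an xs:sequence, each one is optional but they must appear in the order listed. A minimal sketch of a complete <split> element follows; the values are illustrative only, and policy is a hypothetical custom document type rather than one defined by PageSeeder.

<split>
  <!-- "policy" is a hypothetical custom document type -->
  <main>
    <type>policy</type>
  </main>
  <!-- split the Word file into component documents -->
  <document select="true">
    <outlinelevel select="0" />
  </document>
  <!-- split each component document into fragments -->
  <section select="true">
    <outlinelevel select="1" />
  </section>
  <mathml select="true" output="generate-files" convert-to-mml="true" />
  <footnotes select="true" output="generate-files" />
  <endnotes select="true" output="generate-files" />
</split>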

<main>

Determines the nature of the main PSML file generated by the import process.  

type the document type for the primary document, that is, the references document (default is references).

label the labels attached to the primary document, that is, the references document, given as a comma-separated list.

<xs:element  name="main">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="type"
                  minOccurs="0"
                  maxOccurs="1"
                  type="xs:string" />
      <xs:element name="label"
                  minOccurs="0"
                  maxOccurs="1"
                  type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

Example

<main>
  <type>legislation</type>
  <label>production,test</label>
</main>
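
Both <type> and <label> are optional in the schema (minOccurs="0"). As a sketch, a configuration that keeps the default document type and only attaches a label might look like this; draft is a hypothetical label name.

<main>
  <!-- "draft" is a hypothetical label; <type> is omitted so the default applies -->
  <label>draft</label>
</main>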

<mathml>

Controls the processing of any MathML objects in the Word file.

select use a value of true or false to determine if MathML content is processed or ignored.

convert-to-mml use a value of true or false to determine whether MathML objects are converted to the original MathML (MML) syntax or left as OfficeOpenXML syntax (always true for the generate-fragments option).

output use a value of generate-files or generate-fragments to determine whether each MathML object is placed in a separate file under a mathml folder, or in a fragment inside its own document with the path mathml/mathml-[n].psml (requires pso-docx version 0.7.8 or higher).

<xs:element name="mathml">
  <xs:complexType>
    <xs:attribute  name="select"         type="xs:boolean" />
    <xs:attribute  name="convert-to-mml" type="xs:boolean" />
    <xs:attribute  name="output">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="generate-files" />
          <xs:enumeration value="generate-fragments" />
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>

Example

<mathml select="true" output="generate-files" convert-to-mml="true"/>

See use of:

mathml-generate-files 
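
As a sketch of the other output option, the following would place each MathML object in a fragment rather than in a separate file. The convert-to-mml attribute is left out on the assumption that it need not be stated, since it is described above as always true for generate-fragments.

<!-- convert-to-mml omitted: described as always true for generate-fragments -->
<mathml select="true" output="generate-fragments" />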

<footnotes>

Controls the processing of Word footnote markers.

select use a value of true or false to determine if Word footnotes are processed or ignored.

output use a value of generate-files or generate-fragments to determine whether each footnote object is placed in a separate file under a footnotes folder, or in a fragment in a footnotes/footnotes.psml file.

<xs:element name="footnotes">
  <xs:complexType>
    <xs:attribute name="select" type="xs:boolean" />
    <xs:attribute name="output">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="generate-files" />
          <xs:enumeration value="generate-fragments" />
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>

Example

<footnotes select="true" output="generate-files"/>
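
For comparison, a sketch of the generate-fragments option. Based on the description above, each footnote would then be stored as a fragment in footnotes/footnotes.psml instead of in its own file.

<footnotes select="true" output="generate-fragments" />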

<endnotes>

Controls the processing of Word endnote markers.

select use a value of true or false to determine if Word endnotes are processed or ignored.

output use a value of generate-files or generate-fragments to determine whether each endnote object is placed in a separate file under an endnotes folder, or in a fragment in an endnotes/endnotes.psml file.

<xs:element name="endnotes">
  <xs:complexType>
    <xs:attribute name="select" type="xs:boolean" />
    <xs:attribute name="output">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="generate-files" />
          <xs:enumeration value="generate-fragments" />
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>

Example

<endnotes select="true" output="generate-files"/>
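
To ignore endnotes altogether, a sketch based on the select description above. The output attribute is omitted on the assumption that it has no effect when select is false.

<!-- assumption: output is irrelevant when endnotes are not processed -->
<endnotes select="false" />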

<document>

Determines which markup splits the Word document into component documents. The components are bound in order by a references document, which is considered the <main> document of the conversion.

select use a value of true or false to determine whether or not the Word file is split into component documents.

use-real-titles use a value of true or false to determine if the PSML filename of each split document is extracted from the Word content. If false, the split document filename is the Word filename plus an incrementing number. If true, the PSML filenames for the component/secondary documents are generated from the text in the first paragraph of each split document.

<xs:element name="document">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="sectionbreak"
                  minOccurs="0"
                  maxOccurs="unbounded" />
      <xs:element ref="outlinelevel"
                  minOccurs="0"
                  maxOccurs="unbounded" />
      <xs:element ref="wordstyle"
                  minOccurs="0"
                  maxOccurs="unbounded" />
      <xs:element ref="splitstyle"
                  minOccurs="0"
                  maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attribute name="select"
                  type="xs:boolean" />
    <xs:attribute name="use-real-titles"
                  type="xs:boolean" />
  </xs:complexType>
</xs:element>

Examples

<split>
  <document select="false">
    <outlinelevel select="0" />
  </document>
</split>

See use of:

document-split-document-false 

In the GitHub example, using the default <document select="false">, the Word document doesn’t split into component documents even though there are child elements specified under the <document> element.

<document select="true" use-real-titles="true" />

Using select="true", the Word document splits into component documents at the <wordstyle> or <outlinelevel> elements specified.

If importing using the default import config, or, if no <wordstyle> or <outlinelevel> is specified, the Word document splits at standard Word styles Heading 1 and Heading 2.

The default value for the use-real-titles attribute is false, in which case the PSML filename of each split document is the Word filename plus an incrementing number.
If true, the PSML filenames for the component/secondary documents are generated from the text in the first paragraph of each split document.
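
For comparison, a sketch with no child elements and no use-real-titles attribute. Going by the description above, this would split at the standard Word styles Heading 1 and Heading 2 and name each component document after the Word filename plus an incrementing number.

<split>
  <!-- no <wordstyle> or <outlinelevel>: the default split points apply -->
  <document select="true" />
</split>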

See use of:

document-split-outline-level 

The GitHub example uses select="true" and splits the Word document at all styles that have an outline level of zero. Standard Heading 1 style has an outline level of zero, Heading 2 has an outline level of 1, and so on. Because use-real-titles="true", the PSML filenames of the component documents are generated from the text in the first paragraph of each split document.

See use of: 

document-split-paragraph-style 

The GitHub example uses select="true" and splits the Word document at all paragraphs that have Word style Heading 1.

document-split-multiple-paragraph-style 

The GitHub example uses select="true" and splits the Word document at all paragraphs that have Word style Heading 1 or Heading 2.
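
A sketch of what splitting at both heading styles might look like, written from the description above rather than copied from the GitHub example. The style ID Heading1 is an assumption, by analogy with the Heading2 ID used elsewhere on this page; check the style IDs in the actual Word file.

<split>
  <document select="true">
    <!-- "Heading1" is an assumed style ID -->
    <wordstyle select="Heading1" />
    <wordstyle select="Heading2" />
  </document>
</split>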

<split>
  <document select="true">
    <outlinelevel select="0" />
    <wordstyle select="Heading2" />
  </document>
</split>

See use of:

document-split-multiple-split-values-1 

The GitHub example uses select="true" and specifies to split the Word document at all styles that have an outline level of zero and all styles that are Heading 1.

document-split-multiple-split-values-2 

The GitHub example uses select="true" and splits the Word document at all styles that have an outline level of zero or 1, and at all paragraphs that have Word style Heading 1 or Heading 2.

<section>

Determines how the PSML component document is split into fragments.

select use a value of true or false to determine if the component document is split into fragments.

use-real-titles use a value of true or false.

<xs:element name="section">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="sectionbreak" minOccurs="0" maxOccurs="unbounded" />
      <xs:element ref="outlinelevel" minOccurs="0" maxOccurs="unbounded" />
      <xs:element ref="wordstyle"    minOccurs="0" maxOccurs="unbounded" />
      <xs:element ref="splitstyle"   minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attribute  name="select"          type="xs:boolean" />
    <xs:attribute  name="use-real-titles" type="xs:boolean" />
  </xs:complexType>
</xs:element>

Example

<section select="false" />

See use of:

section-split-document-false 

The GitHub example uses section select="false". This means that each component document is not split into fragments, even though child elements are specified under the <section> element. All content is placed in the second fragment of the component document. The first fragment contains the content of the element that the <document> was split at.

<section select="true" />

See use of:

section-split-outline-level 

The GitHub example uses section select="true". The component documents are split into fragments only at all styles that have an outline level of zero. Standard Word style Heading 1 has an outline level of zero.

section-split-multiple-outline-level 

Using section select="true", the component documents are split into fragments only at all styles that have an outline level of zero and all styles that have an outline level of 1. Standard Word style Heading 1 has an outline level of zero and Heading 2 has an outline level of 1.

section-split-paragraph-style 

Using section select="true", the component documents are split into fragments only at paragraphs that have Word style Heading 1.

section-split-multiple-paragraph-style 

Using section select="true", the component documents are split into fragments only at paragraphs that have Word style Heading 1 or Heading 2.

section-split-multiple-split-values-1 

Using section select="true", the component documents are split into fragments only at styles that have an outline level of zero and at paragraphs that have Word style Heading 1.

section-split-multiple-split-values-2 

Using section select="true", the component documents are split into fragments only at styles that have an outline level of zero or 1, and at paragraphs that have Word style Heading 1 or Heading 2.

section-split-splitstyle 

Using section select="true", the component documents are split into fragments only at paragraphs that have Word style Heading 1.
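
A sketch of a <section> configuration along the lines of the multiple outline level case. It is written from the descriptions on this page, not copied from the GitHub example.

<split>
  <section select="true">
    <!-- create a new fragment at outline levels 0 and 1 -->
    <outlinelevel select="0" />
    <outlinelevel select="1" />
  </section>
</split>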

<sectionbreak>

Controls whether Word section breaks create new sections in PSML.

select use a value of evenPage, oddPage, continuous, nextColumn or nextPage to specify the type of Word section break that is used to create sections in PageSeeder.

<xs:element name="sectionbreak">
  <xs:complexType>
    <xs:attribute  name="select">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="evenPage" />
          <xs:enumeration value="oddPage" />
          <xs:enumeration value="continuous" />
          <xs:enumeration value="nextColumn" />
          <xs:enumeration value="nextPage" />
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>

Example

<split>
  <document select="true">
    <sectionbreak select="evenPage" />
    <sectionbreak select="oddPage" />
  </document>
</split>
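
The schema also allows <sectionbreak> inside <section>. A sketch of that placement follows, on the assumption that it creates a new fragment at each matching section break; this page does not describe that behavior explicitly.

<split>
  <section select="true">
    <!-- assumption: each matching Word section break starts a new fragment -->
    <sectionbreak select="continuous" />
    <sectionbreak select="nextPage" />
  </section>
</split>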

<outlinelevel>

Processes the outline levels that are attached to styles or directly to paragraphs.

select use a single outline level from 0 to 8, or two levels joined by a hyphen (for example 0-2), to specify the Word outline levels that are used in the conversion.

<xs:element name="outlinelevel">
  <xs:complexType>
    <xs:attribute name="select">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="[0-8]" />
          <xs:pattern value="[0-8]-[0-8]" />
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>

Example

<split>
  <document select="true">
    <outlinelevel select="0" />
    <outlinelevel select="1" />
  </document>
</split>

See use of:

document-split-outline-level 

The GitHub example splits the Word document at all styles that have an outline level of zero. Standard Word style Heading 1 has an outline level of zero.

document-split-multiple-outline-level 

The Word document is split at all styles that have an outline level of zero and all styles that have an outline level of 1. Standard Word style Heading 1 has an outline level of zero and Heading 2 has an outline level of 1.

section-split-outline-level 

The GitHub example splits the component document at all styles that have an outline level of zero. Standard Word style Heading 1 has an outline level of zero.

section-split-multiple-outline-level 

The component document is split at all styles that have an outline level of zero and all styles that have an outline level of 1. Standard Word style Heading 1 has an outline level of zero and Heading 2 has an outline level of 1.
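
The schema pattern also accepts a value of the form [0-8]-[0-8]. A sketch follows, assuming this form denotes a range of outline levels (here, levels 0 through 2); the page does not spell out its exact meaning.

<split>
  <document select="true">
    <!-- assumption: "0-2" is read as outline levels 0, 1 and 2 -->
    <outlinelevel select="0-2" />
  </document>
</split>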

<splitstyle>

Processes a style name with an explicit purpose of splitting fragments.

select use a Word paragraph style ID to specify the paragraph style at which the content is split (note: the style ID is different from the Word paragraph style name).

<xs:element  name="splitstyle">
  <xs:complexType>
    <xs:attribute name="select" type="xs:string" />
  </xs:complexType>
</xs:element>

Example

<split>
  <document select="true">
    <splitstyle select="splittingStyle1" />
  </document>
</split>

Given a custom Word style splittingStyle1, which has been created in a Word document, the Word document is split into component documents at all instances of the custom Word style.

See use of:

document-split-splitstyle 

The GitHub example is splitting the Word document into component documents at all instances of Word style Heading 1.
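
The schema allows <splitstyle> under <section> as well. A sketch, assuming the same custom style splittingStyle1 is then used to split each component document into fragments:

<split>
  <section select="true">
    <!-- splittingStyle1 is the custom style ID from the example above -->
    <splitstyle select="splittingStyle1" />
  </section>
</split>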
