Word DOCX - import schema <split> (obsolete)
<split>
This element is obsolete and no longer supported in PageSeeder v6. It was deprecated as of PageSeeder v5.98. Use PSML split config instead.
A container for the elements that determine how each imported document is stored. Options include:
- Keep each Word file as a single PageSeeder document.
- Split the source document into a linked collection of references and component documents.
- Give the imported PSML documents a @type of default.
- Give the imported PSML documents a custom type.
<xs:element name="split">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="main" minOccurs="0" maxOccurs="1" />
      <xs:element ref="document" minOccurs="0" maxOccurs="1" />
      <xs:element ref="section" minOccurs="0" maxOccurs="1" />
      <xs:element ref="mathml" minOccurs="0" maxOccurs="1" />
      <xs:element ref="footnotes" minOccurs="0" maxOccurs="1" />
      <xs:element ref="endnotes" minOccurs="0" maxOccurs="1" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
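As a rough illustration of how the child elements fit together, the following sketch shows one possible <split> configuration with the children in the order the schema sequence requires. The label value imported and the particular outline levels are assumptions chosen for the example, not values taken from a real import config.

```xml
<split>
  <!-- Main (references) document: type and label values are illustrative -->
  <main>
    <type>references</type>
    <label>imported</label>
  </main>
  <!-- Split the Word file into component documents at outline level 0 -->
  <document select="true" use-real-titles="false">
    <outlinelevel select="0" />
  </document>
  <!-- Split each component document into fragments at outline level 1 -->
  <section select="true">
    <outlinelevel select="1" />
  </section>
  <mathml select="true" convert-to-mml="true" output="generate-files" />
  <footnotes select="true" output="generate-files" />
  <endnotes select="true" output="generate-files" />
</split>
```

Every child is optional (minOccurs="0"), so any of these elements can be left out.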
<main>
Determines the nature of the main PSML file generated by the import process.
type a document type for the primary document: the reference document (default is references).
label whether labels are attached to the primary document: the reference document.
<xs:element name="main">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="type" minOccurs="0" maxOccurs="1" type="xs:string" />
      <xs:element name="label" minOccurs="0" maxOccurs="1" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
Example
<main>
  <type>legislation</type>
  <label>production,test</label>
</main>
<mathml>
Controls the processing of any MathML objects in the Word file.
select use a value of true or false to determine if MathML content is processed or ignored.
convert-to-mml use a value of true or false to determine whether MathML objects are converted to the original MathML (mml) syntax or left as OfficeOpenXML syntax (always true for the generate-fragments option).
output use a value of generate-files or generate-fragments to determine whether each MathML object is placed in a separate file, under a mathml folder or in a fragment inside its own document with the path mathml/mathml-[n].psml (requires pso-docx version 0.7.8 or higher).
<xs:element name="mathml">
  <xs:complexType>
    <xs:attribute name="select" type="xs:boolean" />
    <xs:attribute name="convert-to-mml" type="xs:boolean" />
    <xs:attribute name="output">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="generate-files" />
          <xs:enumeration value="generate-fragments" />
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>
Example
<mathml select="true" output="generate-files" convert-to-mml="true"/>
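For comparison, a minimal sketch of the generate-fragments variant. It only restates the attribute descriptions above: conversion to mml is always on for this option, so convert-to-mml is set to true here for clarity, and the option requires pso-docx version 0.7.8 or higher.

```xml
<!-- MathML objects are written to fragments rather than separate files;
     convert-to-mml is always treated as true with generate-fragments -->
<mathml select="true" output="generate-fragments" convert-to-mml="true" />
```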
<footnotes>
Controls the processing of Word footnote markers.
select use a value of true or false to determine if Word footnotes are processed or ignored.
output use a value of generate-files or generate-fragments to determine whether each footnote object is placed in a separate file, under a footnotes folder or in a fragment in a footnotes/footnotes.psml file.
<xs:element name="footnotes">
  <xs:complexType>
    <xs:attribute name="select" type="xs:boolean" />
    <xs:attribute name="output">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="generate-files" />
          <xs:enumeration value="generate-fragments" />
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>
Example
<footnotes select="true" output="generate-files"/>
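The other output value can be sketched the same way. Based on the description above, this variant would collect the footnotes as fragments in a single footnotes/footnotes.psml file instead of one file per footnote.

```xml
<!-- Footnotes are placed in fragments of footnotes/footnotes.psml -->
<footnotes select="true" output="generate-fragments" />
```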
<endnotes>
Controls the processing of Word endnote markers.
select use a value of true or false to determine if Word endnotes are processed or ignored.
output use a value of generate-files or generate-fragments to determine whether each endnote object is placed in a separate file, under an endnotes folder or in a fragment in an endnotes/endnotes.psml file.
<xs:element name="endnotes">
  <xs:complexType>
    <xs:attribute name="select" type="xs:boolean" />
    <xs:attribute name="output">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="generate-files" />
          <xs:enumeration value="generate-fragments" />
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>
Example
<endnotes select="true" output="generate-files"/>
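To ignore endnotes instead, select can be set to false. This is a sketch based only on the attribute description above; the output attribute is optional in the schema and is omitted here on the assumption that it has no effect when endnotes are not processed.

```xml
<!-- Word endnotes are ignored during import -->
<endnotes select="false" />
```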
<document>
This determines what markup splits the Word document into component documents. The components are bound in order, by a references document, which is considered the <main> document of the conversion.
select use a value of true or false to determine whether or not the Word file is split into component documents.
use-real-titles use a value of true or false to determine if the PSML filename of each split document is extracted from the Word content. If false, the split document filename is the Word filename plus an incrementing number. If true, the PSML filenames for the component/secondary documents are generated from the text in the first paragraph of each split document.
<xs:element name="document">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="sectionbreak" minOccurs="0" maxOccurs="unbounded" />
      <xs:element ref="outlinelevel" minOccurs="0" maxOccurs="unbounded" />
      <xs:element ref="wordstyle" minOccurs="0" maxOccurs="unbounded" />
      <xs:element ref="splitstyle" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attribute name="select" type="xs:boolean" />
    <xs:attribute name="use-real-titles" type="xs:boolean" />
  </xs:complexType>
</xs:element>
Examples
<split>
  <document select="false">
    <outlinelevel select="0" />
  </document>
</split>
See use of:
In the GitHub example, using the default <document select="false">, the Word document doesn’t split into component documents even though there are child elements specified under the <document> element.
<document select="true" use-real-titles="true" />
Using select="true", the Word document splits into component documents at the <wordstyle> or <outlinelevel> values specified.
If importing using the default import config, or if no <wordstyle> or <outlinelevel> is specified, the Word document splits at the standard Word styles Heading 1 and Heading 2.
The default value for the use-real-titles attribute is false. In that case, the PSML filename of the split document is the Word filename plus an incrementing number.
If true, the PSML filenames for the component/secondary documents are generated from the text in the first paragraph of each split document.
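For contrast, a sketch of the same element with use-real-titles left at its default of false. The style ID Heading1 is an assumption made for the example, and the exact pattern of the generated filenames is not specified here beyond "Word filename plus an incrementing number".

```xml
<split>
  <!-- use-real-titles="false": component filenames are derived from the
       Word filename plus an incrementing number, not from the content -->
  <document select="true" use-real-titles="false">
    <wordstyle select="Heading1" />
  </document>
</split>
```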
See use of:
The GitHub example uses select="true" and specifies that the Word document is split at all styles that have an outline level of zero. Standard Heading 1 style has an outline level of zero, Heading 2 has an outline level of 1, and so on. The example also sets use-real-titles="true", so the PSML filenames of the component documents are generated from the text in the first paragraph of each split document.
See use of:
document-split-paragraph-style
The GitHub example uses select="true" and specifies that the Word document is split at all paragraphs that have Word style Heading 1.
document-split-multiple-paragraph-style
The GitHub example uses select="true" and specifies that the Word document is split at all paragraphs that have Word style Heading 1 or Heading 2.
<split>
  <document select="true">
    <outlinelevel select="0" />
    <wordstyle select="Heading2" />
  </document>
</split>
See use of:
document-split-multiple-split-values-1
The GitHub example uses select="true" and specifies that the Word document is split at all styles that have an outline level of zero and all styles that are Heading 1.
document-split-multiple-split-values-2
The GitHub example uses select="true" and specifies that the Word document is split at all styles that have an outline level of zero, all styles that have an outline level of 1, all styles that are Heading 1 and all styles that are Heading 2.
<section>
Determines how the PSML component document is split into fragments.
select use a value of true or false to determine if the component document is split into fragments.
use-real-titles use a value of true or false.
<xs:element name="section">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="sectionbreak" minOccurs="0" maxOccurs="unbounded" />
      <xs:element ref="outlinelevel" minOccurs="0" maxOccurs="unbounded" />
      <xs:element ref="wordstyle" minOccurs="0" maxOccurs="unbounded" />
      <xs:element ref="splitstyle" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attribute name="select" type="xs:boolean" />
    <xs:attribute name="use-real-titles" type="xs:boolean" />
  </xs:complexType>
</xs:element>
Example
<section select="false" />
See use of:
The GitHub example uses section select="false". This means that each component document is not split into fragments, even though there are child elements contained in it. It has all content in the second fragment of the component document. The first fragment contains the content of the element that the <document> was split at.
<section select="true" />
See use of:
The GitHub example uses section select="true". The component documents are split into fragments only at styles that have an outline level of zero. Standard Word style Heading 1 has an outline level of zero.
section-split-multiple-outline-level
Using section select="true", the component documents are split into fragments only at styles that have an outline level of zero. Standard Word style Heading 1 has an outline level of zero.
Using section select="true", the component documents are split into fragments only at Word styles that have an outline level of zero and styles that have an outline level of 1. Standard Word style Heading 1 has an outline level of zero, Heading 2 has an outline level of 1, and so on.
section-split-multiple-paragraph-style
Using section select="true", the component documents are split into fragments only at Word styles that are Heading 1.
section-split-multiple-split-values-1
Using section select="true", the component documents are split into fragments only at Word styles that are Heading 1 or Heading 2.
section-split-multiple-split-values-2
Using section select="true", the component documents are split into fragments only at Word styles that have an outline level of zero or an outline level of 1, or that are Word style Heading 1 or Heading 2.
Using section select="true", the component documents are split into fragments only at Word styles that are Heading 1.
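The combinations described above can be sketched as a <section> element with child elements. The style ID Heading2 follows the <wordstyle> usage shown earlier for <document>; treating it as valid under <section> relies on the schema, which allows the same children in both elements.

```xml
<split>
  <document select="true">
    <outlinelevel select="0" />
  </document>
  <!-- Split each component document into fragments at outline level 1
       and at paragraphs using the Heading2 style -->
  <section select="true">
    <outlinelevel select="1" />
    <wordstyle select="Heading2" />
  </section>
</split>
```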
<sectionbreak>
Controls new sections in PSML.
select use a value of evenPage, oddPage, continuous, nextColumn or nextPage (the values allowed by the schema) to determine which type of Word section break is used to create sections in PageSeeder.
<xs:element name="sectionbreak">
  <xs:complexType>
    <xs:attribute name="select">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="evenPage" />
          <xs:enumeration value="oddPage" />
          <xs:enumeration value="continuous" />
          <xs:enumeration value="nextColumn" />
          <xs:enumeration value="nextPage" />
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>
Example
<split>
  <document select="true">
    <sectionbreak select="evenPage" />
    <sectionbreak select="oddPage" />
  </document>
</split>
<outlinelevel>
Processes the outline levels that are attached to styles or directly to paragraphs.
select use an outline level from 0 to 8, or a value of the form n-n such as 0-2 (the patterns allowed by the schema), to determine which Word outline levels are used in the conversion.
<xs:element name="outlinelevel">
  <xs:complexType>
    <xs:attribute name="select">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="[0-8]" />
          <xs:pattern value="[0-8]-[0-8]" />
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>
Example
<split>
  <document select="true">
    <outlinelevel select="0" />
    <outlinelevel select="1" />
  </document>
</split>
See use of:
The GitHub example splits the Word document at all styles that have an outline level of zero (<outlinelevel select="0" />). Standard Word style Heading 1 has an outline level of zero.
document-split-multiple-outline-level
The Word document is split at all styles that have an outline level of zero and all Word styles that have an outline level of 1. Standard Word style Heading 1 has an outline level of zero and Heading 2 has an outline level of 1.
The GitHub example splits the component document at all styles that have an outline level of zero. Standard Word style Heading 1 has an outline level of zero.
section-split-multiple-outline-level
The component document is split at all styles that have an outline level of zero and all Word styles that have an outline level of 1. Standard Word style Heading 1 has an outline level of zero and Heading 2 has an outline level of 1.
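The schema also accepts a select value of the form [0-8]-[0-8]. The sketch below assumes this form denotes an inclusive range of outline levels, which the schema pattern suggests but this page does not state; if that assumption holds, it would be equivalent to listing levels 0 and 1 separately.

```xml
<split>
  <document select="true">
    <!-- Assumed to mean "outline levels 0 through 1" -->
    <outlinelevel select="0-1" />
  </document>
</split>
```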
<splitstyle>
Processes a style name with an explicit purpose of splitting fragments.
select use the value of a Word paragraph style ID to determine if the Word document is split into fragments at the Word paragraph style (note: the ID is different from the Word paragraph style name).
<xs:element name="splitstyle">
  <xs:complexType>
    <xs:attribute name="select" type="xs:string" />
  </xs:complexType>
</xs:element>
Example
<split>
  <document select="true">
    <splitstyle select="splittingStyle1" />
  </document>
</split>
Given a custom Word style splittingStyle1, which has been created in a Word document, the Word document is split into component documents at all instances of the custom Word style.
See use of:
The GitHub example is splitting the Word document into component documents at all instances of Word style Heading 1.
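Because the schema also allows <splitstyle> under <section>, the same custom style could in principle be used to split component documents into fragments. This is a sketch inferred from the schema rather than a documented example; splittingStyle1 is the same hypothetical style ID as above.

```xml
<split>
  <document select="true">
    <outlinelevel select="0" />
  </document>
  <!-- Start a new fragment at each paragraph using the custom style -->
  <section select="true">
    <splitstyle select="splittingStyle1" />
  </section>
</split>
```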