Word import config XSD

Version: 0.6.2

This file controls how Word docx files are converted into PSML. There are a number of ways to access it for editing. For users with admin rights on the server, the best option is through the same Developer interface used to manage publish scripts.

<xs:schema  elementFormDefault="unqualified"  version="0.6.2"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:xml="http://www.w3.org/XML/1998/namespace" />

<config>

The root element of the instance, <config>, is a container for the three key elements: <split>, <lists> and <styles>.

version provides a value that can be used for configuration management or technical support. 

<xs:element  name="config">
    <xs:complexType>
      <xs:sequence>
        <xs:element  ref="split"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  ref="lists"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  ref="styles"
               minOccurs="0"  maxOccurs="1" />
      </xs:sequence>
      <xs:attribute  name="version"
                     type="xs:string" />
    </xs:complexType>
  </xs:element>
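
As a rough illustration of how the three containers fit together, the sketch below shows a minimal instance that follows the sequence order declared above. The @version value is arbitrary and the child settings are assumptions chosen only to make the fragment complete, not recommended defaults.

```xml
<!-- Minimal sketch: version value and child settings are illustrative only -->
<config version="1.0">
  <split>
    <document select="true" />
  </split>
  <lists>
    <convert-to-list-roles select="false" />
  </lists>
  <styles>
    <default>
      <paragraphStyles value="para" />
    </default>
  </styles>
</config>
```

Because the schema uses xs:sequence, <split>, <lists> and <styles> must appear in that order when more than one is present.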

<split>

Is a container for the elements that determine how each imported document is stored. Options include:

  • Keep each Word file as a single PageSeeder document.
  • Split the source document into a linked collection of references and component documents.
  • Give the imported PSML documents a @type of "default".
  • Give the imported PSML documents a custom type.

<xs:element  name="split">
    <xs:complexType>
      <xs:sequence>
        <xs:element  ref="main"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  ref="document"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  ref="section"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  ref="mathml"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  ref="footnotes"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  ref="endnotes"
               minOccurs="0"  maxOccurs="1" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
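
The following sketch shows one way the child elements might be combined for the "split into references and component documents" option. The specific values are assumptions for illustration; the point is the order of the children, which the xs:sequence above requires.

```xml
<!-- Sketch only: values are illustrative; child order follows the schema -->
<split>
  <main>
    <type>references</type>
  </main>
  <document select="true">
    <outlinelevel select="0" />
  </document>
  <mathml select="true" output="generate-files" convert-to-mml="true" />
  <footnotes select="true" output="generate-fragments" />
  <endnotes select="true" output="generate-fragments" />
</split>
```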

<main>

Determines the nature of the main PSML file generated by the import process.  

type Specifies the document type for the primary file. When the incoming document is split to multiple files, the type will be references. For single file documents, it can be default or a custom type.

label Specifies the document labels to attach to the primary file, as a comma-separated list (see the example below).

<xs:element  name="main">
  <xs:complexType>
    <xs:sequence>
     <xs:element  name="type"
             minOccurs="0"
             maxOccurs="1"
                  type="xs:string" />
     <xs:element  name="label"
             minOccurs="0"
             maxOccurs="1"
                  type="xs:string" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

Example

<main>
  <type>references</type>
  <label>production,test</label>
</main>
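
For the single-file case described above, a sketch of the same element might look like this. It assumes the document is not being split, so the type is "default" rather than "references"; a custom type name could be used in its place.

```xml
<!-- Sketch for a single-file import: type is "default" instead of "references" -->
<main>
  <type>default</type>
</main>
```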

<mathml>

Controls the processing of any MathML objects in the Word file.

select use a value of "true" or "false" to determine if MathML content will be processed or ignored.

convert-to-mml use a value of "true" or "false" to determine whether MathML objects will be converted to MathML (mml) syntax or left as OfficeOpenXML syntax.

output use a value of "generate-files" to place each MathML object in a separate file under a mathml folder, or "generate-fragments" to place each object in a fragment in a mathml/mathm.psml file.

<xs:element  name="mathml">
    <xs:complexType>
      <xs:attribute  name="select"
                     type="xs:boolean" />
      <xs:attribute  name="convert-to-mml"
                     type="xs:boolean" />
      <xs:attribute  name="output">
        <xs:simpleType>
          <xs:restriction  base="xs:string">
            <xs:enumeration  value="generate-files" />
            <xs:enumeration  value="generate-fragments" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

Example

<mathml select="true" 
        output="generate-files" 
        convert-to-mml="true"/>

<footnotes>

Controls the processing of Word footnote markers.

select use a value of "true" or "false" to determine if Word footnotes will be processed or ignored.

output use a value of "generate-files" to place each footnote in a separate file under a footnotes folder, or "generate-fragments" to place each footnote in a fragment in a footnotes/footnotes.psml file.

<xs:element  name="footnotes">
    <xs:complexType>
      <xs:attribute  name="select"  type="xs:boolean" />
      <xs:attribute  name="output">
        <xs:simpleType>
          <xs:restriction  base="xs:string">
            <xs:enumeration  value="generate-files" />
            <xs:enumeration  value="generate-fragments" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

Example

<footnotes select="true" 
           output="generate-files"/>

<endnotes>

Controls the processing of Word endnote markers.

select use a value of "true" or "false" to determine if Word endnotes will be processed or ignored.

output use a value of "generate-files" to place each endnote in a separate file under an endnotes folder, or "generate-fragments" to place each endnote in a fragment in an endnotes/endnotes.psml file.

<xs:element  name="endnotes">
    <xs:complexType>
      <xs:attribute  name="select"  type="xs:boolean" />
      <xs:attribute  name="output">
        <xs:simpleType>
          <xs:restriction  base="xs:string">
            <xs:enumeration  value="generate-files" />
            <xs:enumeration  value="generate-fragments" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

Example

<endnotes select="true" 
           output="generate-files"/>

<document>

Determines how the Word document is split into component documents, which are compiled in a references document.

select use a value of "true" or "false" to determine whether or not the Word document will be split into component PSML documents.

use-real-titles use a value of "true" or "false" to determine whether the PSML Document title is extracted from the text of the Word document rather than generated from the docx filename.

<xs:element  name="document">
    <xs:complexType>
      <xs:sequence>
        <xs:element  ref="sectionbreak"
               minOccurs="0"  maxOccurs="unbounded" />
        <xs:element  ref="outlinelevel"
               minOccurs="0"  maxOccurs="unbounded" />
        <xs:element  ref="wordstyle"
               minOccurs="0"  maxOccurs="unbounded" />
        <xs:element  ref="splitstyle"
               minOccurs="0"  maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute  name="select"
                     type="xs:boolean" />
      <xs:attribute  name="use-real-titles"
                     type="xs:boolean" />
    </xs:complexType>
  </xs:element>
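
As a sketch of how the attributes and child elements might be combined, the fragment below enables splitting, takes titles from the Word text, and lists several split triggers. The style ID splittingStyle1 is hypothetical, and how the converter combines multiple triggers is not stated here, so treat this as a shape example only.

```xml
<!-- Sketch only: "splittingStyle1" is a hypothetical Word style ID -->
<split>
  <document select="true" use-real-titles="true">
    <sectionbreak select="nextPage" />
    <outlinelevel select="0" />
    <splitstyle select="splittingStyle1" />
  </document>
</split>
```

The children follow the declared sequence: any <sectionbreak> elements first, then <outlinelevel>, then <wordstyle>, then <splitstyle>.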

<section>

Determines how the PSML component document is split into fragments.

select use a value of "true" or "false" to determine whether the component document will be split into fragments.

use-real-titles use a value of "true" or "false"

<xs:element  name="section">
    <xs:complexType>
      <xs:sequence>
        <xs:element  ref="sectionbreak"
               minOccurs="0"  maxOccurs="unbounded" />
        <xs:element  ref="outlinelevel"
               minOccurs="0"  maxOccurs="unbounded" />
        <xs:element  ref="wordstyle"
               minOccurs="0"  maxOccurs="unbounded" />
        <xs:element  ref="splitstyle"
               minOccurs="0"  maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute  name="select" 
                     type="xs:boolean" />
      <xs:attribute  name="use-real-titles"
                     type="xs:boolean" />
    </xs:complexType>
  </xs:element>
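
A possible pairing of <document> and <section> is sketched below: the top outline level starts a new component document and the next level starts a new fragment within it. The choice of levels is an assumption for illustration.

```xml
<!-- Sketch only: outline levels chosen for illustration -->
<split>
  <document select="true">
    <outlinelevel select="0" />
  </document>
  <section select="true">
    <outlinelevel select="1" />
  </section>
</split>
```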

<sectionbreak>

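Controls which Word section breaks create new sections in PSML.

select use a value of "evenPage", "oddPage", "continuous", "nextColumn" or "nextPage" to specify the type of Word section break that will be used to create sections in PageSeeder. Repeat the element to select more than one type.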

<xs:element  name="sectionbreak">
    <xs:complexType>
      <xs:attribute  name="select">
        <xs:simpleType>
          <xs:restriction  base="xs:string">
            <xs:enumeration  value="evenPage" />
            <xs:enumeration  value="oddPage" />
            <xs:enumeration  value="continuous" />
            <xs:enumeration  value="nextColumn" />
            <xs:enumeration  value="nextPage" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

Example

<split>
    <document select="true">
      <sectionbreak select="evenPage" />
      <sectionbreak select="oddPage" />
    </document>
</split>

<outlinelevel>

Processes the outline levels that are attached to styles or directly to paragraphs.

select use a single outline level from "0" to "8", or a range of levels such as "0-2", to specify which Word outline levels will be used in the conversion.

<xs:element  name="outlinelevel">
    <xs:complexType>
      <xs:attribute  name="select">
        <xs:simpleType>
          <xs:restriction  base="xs:string">
            <xs:pattern  value="[0-8]" />
            <xs:pattern  value="[0-8]-[0-8]" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

Example

<split>
    <document select="true">
      <outlinelevel select="0" />
      <outlinelevel select="1" />
    </document>
</split>
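
The schema pattern also accepts a hyphenated pair of levels. The sketch below assumes that such a pair is read as an inclusive range, which the pattern suggests but does not state.

```xml
<!-- Sketch only: assumes "0-2" means outline levels 0 through 2 -->
<split>
    <document select="true">
      <outlinelevel select="0-2" />
    </document>
</split>
```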

<splitstyle>

Specifies a Word style whose explicit purpose is to mark where content is split.

select use a Word paragraph style ID to split the Word document at each paragraph that uses that style (note: the ID is different from the Word paragraph style name).

<xs:element  name="splitstyle">
    <xs:complexType>
      <xs:attribute  name="select"  type="xs:string" />
    </xs:complexType>
  </xs:element>

Example

<split>
    <document select="true">
      <splitstyle select="splittingStyle1" />
    </document>
</split>

<lists>

Determines how heading and list numbering is interpreted.

<xs:element  name="lists">
    <xs:complexType>
      <xs:sequence>
        <xs:element  ref="add-numbering-to-document-titles"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  ref="convert-to-list-roles"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  ref="convert-to-numbered-paragraphs"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  ref="convert-manual-numbering"
               minOccurs="0"  maxOccurs="1" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
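
To show how the four children sit together, here is a sketch of a complete <lists> element. The individual settings are illustrative assumptions; each child is described in its own section below.

```xml
<!-- Sketch only: settings are illustrative; child order follows the schema -->
<lists>
  <add-numbering-to-document-titles select="true" />
  <convert-to-list-roles select="false" />
  <convert-to-numbered-paragraphs select="true">
    <level value="1" output="numbering" />
  </convert-to-numbered-paragraphs>
  <convert-manual-numbering select="true">
    <value match="Note:\s*">
      <prefix />
    </value>
  </convert-manual-numbering>
</lists>
```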

<add-numbering-to-document-titles>

Adds numbering to the titles in the references document.

select use a value of "true" or "false" to determine whether numbering is added to the link titles in the references document.

<xs:element  name="add-numbering-to-document-titles">
    <xs:complexType>
      <xs:attribute  name="select"  type="xs:boolean" />
    </xs:complexType>
  </xs:element>

Example

<add-numbering-to-document-titles select="true"/>

<convert-to-list-roles>

Allows lists to contain a @role attribute set with the value of the Word paragraph style.

select use a value of "true" or "false" to determine whether PSML lists inherit the name of the Word list style as a @role or not. By default, this value is "false".

<xs:element  name="convert-to-list-roles">
    <xs:complexType>
      <xs:attribute  name="select"
                     type="xs:boolean" />
    </xs:complexType>
  </xs:element>

Example

<convert-to-list-roles select="false"/>

<convert-to-numbered-paragraphs>

Controls whether numbered Word paragraph styles are converted to numbered paragraphs or to lists in PageSeeder. To convert to numbered paragraphs, the @select attribute must be set to "true". With any other value, they are converted to <list> or <nlist>, depending on the type of numbering.

select use a value of "true" to convert numbered Word paragraph styles to numbered paragraphs, or "false" to convert them to lists.

<xs:element  name="convert-to-numbered-paragraphs">
    <xs:complexType>
      <xs:sequence>
        <xs:element  ref="level"
               minOccurs="0"  maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute  name="select"
                     type="xs:boolean" />
    </xs:complexType>
  </xs:element>

<convert-manual-numbering>

Converts manually typed (non-automated) numbering that can exist in a Word document.

select use a value of "true" or "false" to determine whether manual numbering in the Word document is converted.

<xs:element  name="convert-manual-numbering">
    <xs:complexType>
      <xs:sequence>
        <xs:element  ref="value"
               minOccurs="0"  maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute  name="select"  type="xs:boolean" />
    </xs:complexType>
  </xs:element>

<level>

Controls the processing of each level of paragraph.

value use "1" to "6" to specify the level of the paragraph.

output use "prefix" or "text" to convert the number from the Word paragraph to a PSML prefix or text. Use "numbering" to create numbered paragraphs in PSML and "inline=[label]" to wrap the number in a PSML inline label.

<xs:element  name="level">
    <xs:complexType>
      <xs:attribute  name="value">
        <xs:simpleType>
          <xs:restriction  base="xs:integer">
            <xs:minInclusive  value="1" />
            <xs:maxInclusive  value="6" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute  name="output">
        <xs:simpleType>
          <xs:restriction  base="xs:string">
            <xs:pattern 
        value="numbering|prefix|inline=[a-zA-Z0-9_\-]+|text" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

Example

<lists>
  <convert-to-numbered-paragraphs select="true">
    <level value="1" output="prefix" />
    <level value="2" output="text" />
    <level value="3" output="prefix" />
    <level value="4" output="numbering" />
    <level value="5" output="numbering" />
    <level value="6" output="inline=level6" />
  </convert-to-numbered-paragraphs>
</lists>

<value>

Defines a rule for manual numbering: text that matches the @match regular expression is converted according to the child element.

prefix use a <prefix> element to generate a prefix with the value of the current auto-numbering or manual numbering value for each of the Word numbered paragraphs.

autonumbering [Insert description]

match use a regular expression to select the text in the Word document to mark up.

<xs:element  name="value">
    <xs:complexType>
      <xs:sequence>
        <xs:element  ref="inline"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  name="prefix" 
                minOccurs="0"  maxOccurs="1" />
        <xs:element  name="autonumbering"
                minOccurs="0"  maxOccurs="1" />
      </xs:sequence>
      <xs:attribute  name="match"  type="xs:string" />
    </xs:complexType>
  </xs:element>

<inline>

Used to mark up content within a block of text, like a character style in Word documents.

label the label of the PSML inline element that wraps the matched text.

<xs:element  name="inline">
    <xs:complexType>
      <xs:attribute  name="label"  type="xs:string" />
    </xs:complexType>
  </xs:element>

Example

<convert-manual-numbering select="true">
        <value match="^[\(|\[|\{][a-z]+[\)|\]|\}]">
          <inline label="numbering-lowercase" />
        </value>
        <value match="^[\(|\[|\{][A-Z]+[\)|\]|\}]">
          <prefix />
        </value>
        <value match="^[\(|\[|\{][ivx]+[\)|\]|\}]">
          <list role="numbering-roman" />
        </value>
        <value match="Part&#160;[A-Z0-9]+">
          <prefix />
        </value>
        <value match="Note:\s*">
          <prefix />
        </value>
        <value match="\s*[0-9]+[A-Z]*$">
          <prefix />
        </value>
</convert-manual-numbering>

<styles>

Controls how styles from Word are translated to PSML.

<xs:element  name="styles">
    <xs:complexType>
      <xs:sequence>
        <xs:element  ref="ignore"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  ref="default"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  ref="wordstyle"
               minOccurs="0"  maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
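
A sketch of a small <styles> element is shown below, using only the <ignore> and <default> children. The style ID TOC1 and the chosen defaults are illustrative assumptions; <wordstyle> rules, when present, would follow <default>.

```xml
<!-- Sketch only: style ID and default values are illustrative -->
<styles>
  <ignore>
    <wordstyle value="TOC1" />
  </ignore>
  <default>
    <paragraphStyles value="para" />
    <characterStyles value="none" />
  </default>
</styles>
```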

<ignore>

Determines which content should not be processed. For example, the Word Table of Contents paragraphs can often be discarded.

<xs:element  name="ignore">
    <xs:complexType>
      <xs:sequence>
        <xs:element  ref="wordstyle"  maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

Example

        <ignore>
            <wordstyle value="TOC1" />
            <wordstyle value="TOC2" />
            <wordstyle value="TOC3" />
            <wordstyle value="TOC4" />
        </ignore>

<default>

Contains the settings for general transformations of docx to PSML.

<xs:element  name="default">
    <xs:complexType>
      <xs:sequence>
        <xs:element  ref="paragraphStyles"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  ref="characterStyles"
               minOccurs="0"  maxOccurs="1" />
        <xs:element  ref="smart-tag"
               minOccurs="0"  maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
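
The three children can be combined as in the sketch below, which keeps as much Word style information as the fallback rules allow. The values are assumptions chosen for illustration; each child is described in its own section below.

```xml
<!-- Sketch only: keeps un-mapped styles as block and inline labels -->
<default>
  <paragraphStyles value="block" />
  <characterStyles value="inline" />
  <smart-tag keep="true" />
</default>
```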

<paragraphStyles>

Defines a mapping for a paragraph style not mapped by <wordstyle> or <lists>.

value "para" transforms all un-mapped Word paragraph styles to a PSML <para> element. "block" transforms all un-mapped Word paragraph styles to a PSML <block> element with a label equal to the Word paragraph style ID (note: the ID is different from Word paragraph style name).

<xs:element  name="paragraphStyles">
    <xs:complexType>
      <xs:attribute  name="value">
        <xs:simpleType>
          <xs:restriction  base="xs:string">
            <xs:enumeration  value="para" />
            <xs:enumeration  value="block" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

Example

<paragraphStyles value="block" />
<paragraphStyles value="para" />

<characterStyles>

Defines a general rule for any character style not mapped with <wordstyle>.

value @value supports the following:

  • "none" – strips the markup for un-mapped Word character styles.
  • "inline" – transforms un-mapped Word character styles to a PSML <inline> element with a label equal to the Word character style ID (note: the ID is different from Word character style name).

<xs:element  name="characterStyles">
    <xs:complexType>
      <xs:attribute  name="value">
        <xs:simpleType>
          <xs:restriction  base="xs:string">
            <xs:enumeration  value="inline" />
            <xs:enumeration  value="none" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

Example

<characterStyles value="none" />
<characterStyles value="inline" />

<smart-tag>

Word smart tag information can be either discarded or captured in PageSeeder as an inline label with a value equal to that of the smart tag. To capture it, the @keep attribute must be set to "true". With any other value, the smart tag markup is discarded.

keep use a value of "true" or "false".

<xs:element  name="smart-tag">
    <xs:complexType>
      <xs:attribute  name="keep"  type="xs:boolean" />
    </xs:complexType>
  </xs:element>

Example

<smart-tag keep="true" />

<wordstyle>

These rules transform Word paragraph or character styles into PSML elements.

Example PSML elements include:

  • <para>
  • <heading>
  • <monospace>
  • <preformat>
  • <caption>
  • <block>
  • <inline>

type [Insert description]

select [Insert description]

value [Insert description]

name [Insert description]

table [Insert description]

psmlelement the PSML element to generate: "title", "para", "heading", "block", "inline", "monospace", "preformat" or "caption".

<xs:element  name="wordstyle">
    <xs:complexType>
        <xs:all>
          <xs:element  ref="label"
                 minOccurs="0"  maxOccurs="1" />
          <xs:element  name="type"
                  minOccurs="0"  maxOccurs="1" type="xs:string" />
          <xs:element  ref="level"
                 minOccurs="0"  maxOccurs="1" />
          <xs:element  ref="numbering"
                 minOccurs="0"  maxOccurs="1" />
          <xs:element  ref="indent"
                 minOccurs="0"  maxOccurs="1" />
        </xs:all>
        <xs:attribute  name="select"  type="xs:string" />
        <xs:attribute  name="value"  type="xs:string" />
        <xs:attribute  name="name"  type="xs:string" />
        <xs:attribute  name="table"  type="xs:string" />
        <xs:attribute  name="psmlelement">
            <xs:simpleType>
              <xs:restriction  base="xs:string">
                <xs:enumeration  value="title" />
                <xs:enumeration  value="para" />
                <xs:enumeration  value="heading" />
                <xs:enumeration  value="block" />
                <xs:enumeration  value="inline" />
                <xs:enumeration  value="monospace" />
                <xs:enumeration  value="preformat" />
                <xs:enumeration  value="caption" />
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
    </xs:complexType>
  </xs:element>
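
Since several attributes above are not yet described, the sketch below is only a guess at a typical rule and is limited to constructs the schema permits. It assumes that @name holds the Word style name ("Heading 1" is hypothetical) and that the rule maps that style to a level 1 PSML heading with its number stored as a prefix.

```xml
<!-- Sketch only: assumes @name identifies the Word style; "Heading 1" is hypothetical -->
<wordstyle name="Heading 1" psmlelement="heading">
  <level value="1" />
  <numbering select="true" value="prefix" />
</wordstyle>
```

Because the children are declared with xs:all, they may appear in any order, each at most once.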

<label>

Specifies the PSML label applied to the converted content.

type use a value of "block" or "inline".

value use a valid label name, made up of letters, digits, underscores and hyphens.

<xs:element  name="label">
    <xs:complexType  mixed="true">
      <xs:attribute  name="type">
        <xs:simpleType>
          <xs:restriction  base="xs:string">
            <xs:enumeration  value="block" />
            <xs:enumeration  value="inline" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute  name="value">
        <xs:simpleType>
          <xs:restriction  base="xs:string">
            <xs:pattern  value="[a-zA-Z0-9_\-]+" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

Example

<label type="block" value="chapter" />
<label type="inline" value="part_num" />

<numbering>

Supports a range of options for numbering headings and paragraphs.

select use a value of "true" or "false".

value possible values for @value are:

  • "inline" – wraps the number in an inline label specified by a nested <label value="[valid label name]"> element.
  • "text" – includes the number in the paragraph text.
  • "prefix" – inserts the number in the @prefix attribute.
  • "numbering" – adds @numbered="true".

<xs:element  name="numbering">
    <xs:complexType>
      <xs:sequence>
        <xs:element  ref="label"  minOccurs="0" />
      </xs:sequence>
      <xs:attribute  name="select"  type="xs:boolean" />
      <xs:attribute  name="value">
        <xs:simpleType>
          <xs:restriction  base="xs:string">
            <xs:enumeration  value="numbering" />
            <xs:enumeration  value="inline" />
            <xs:enumeration  value="text" />
            <xs:enumeration  value="prefix" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

Example

<numbering select="false" />
<numbering select="true" value="inline"/>
<numbering select="true" value="text"/>
<numbering select="true" value="prefix"/>
<numbering select="true" value="numbering" />
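
For the "inline" option, the description above calls for a nested <label> element. A sketch of that form follows; the label name part_num is reused from the <label> example and is illustrative only.

```xml
<!-- Sketch only: wraps the number in an inline label named "part_num" -->
<numbering select="true" value="inline">
  <label type="inline" value="part_num" />
</numbering>
```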

<indent>

[Insert description]

value use a value from "1" to "6".

<xs:element  name="indent">
    <xs:complexType>
      <xs:attribute  name="value">
        <xs:simpleType>
          <xs:restriction  base="xs:integer">
            <xs:minInclusive  value="1" />
            <xs:maxInclusive  value="6" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

Example

<indent value="2" />
