PSML split config usage
Overview
Importing a Word document involves two stages, each with their own configuration file:
- The first stage converts the DOCX format to PSML using the word-import-config.xml.
- The second stage splits the PSML files and fragments using the psml-split-config.xml.
The <split-config> element determines how the Word file is split into component documents, and how these are split into editable fragments. Splitting both objects (documents and fragments) is done with three different configuration elements:
- <container> – defines the default references document that contains xrefs to the component documents that make up the publication.
- <document> – defines the creation of new documents.
- <fragment> – defines the creation of new fragments.
Usage
The psml-split-config.xml can be edited by project managers and administrators only. To change the default configuration for everyone in a project:
Click Administration menu > Template > Template configuration, then in the Media types table, Split column, docx row, select override under the default option Only available on Template configuration page.
To revert to the default config:
Click
When editing this config file in PageSeeder, press ctrl-space to display auto-complete options for easier editing.
Default split config
The <split-config> element is the root element and must contain at least one empty <container> element (in other words, one that has no child elements). It can have an optional @version attribute.
The default PageSeeder split config follows. It creates a main container document, a folder called "components", a document for each heading level 1 and 2, and a fragment for each heading level 3 and 4:
<split-config version="0.7.0">
  <container />
  <document folder="components">
    <heading level="1"/>
    <heading level="2"/>
  </document>
  <fragment>
    <heading level="3"/>
    <heading level="4"/>
  </fragment>
</split-config>
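As a hedged illustration of how the same three elements can be rearranged, the following sketch creates a document for each heading level 1 only, and a fragment for each heading level 2 and 3. It is not a PageSeeder default, and the folder name "chapters" is an assumption chosen for illustration.

```xml
<split-config version="0.7.0">
  <!-- Main container: the references document at the publication root -->
  <container />
  <!-- One component document per level-1 heading, in a hypothetical "chapters" folder -->
  <document folder="chapters">
    <heading level="1"/>
  </document>
  <!-- One fragment per level-2 or level-3 heading -->
  <fragment>
    <heading level="2"/>
    <heading level="3"/>
  </fragment>
</split-config>
```

Compared with the default, this should produce fewer, longer component documents, each divided into more fragments.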
Comprehensive example
The following is a more comprehensive example, with multiple containers, folders, document types and labels. The sections after it explain each part in detail:
<split-config>
  <container type="contract" labels="contract" />
  <container type="schedules" labels="schedules" contains="schedule" />
  <container type="definitions" labels="clause" contains="term" folder="clauses">
    <start>
      <block label="def-start" />
    </start>
    <continue>
      <block label="def-end" />
    </continue>
  </container>
  <document type="schedule" labels="schedule" folder="schedules">
    <block label="schedule"/>
  </document>
  <document type="clause" labels="clause" folder="clauses" >
    <heading level="1"/>
  </document>
  <document type="term" labels="term" folder="terms">
    <block label="definition" />
  </document>
  <fragment type="mytype" labels="mylabel">
    <heading level="2"/>
    <heading level="3"/>
    <para numbered="true"/>
  </fragment>
  <fragment type="mytype2" labels="mylabel2">
    <block label="myblocklabel" />
  </fragment>
</split-config>
The root document
<container>
Defines the first (required) <container> element, which is the root PSML file, or publication root. When importing a DOCX file, this is a references document that contains xrefs to component documents. To make this document a publication, select the
If there is no value for the @folder attribute on the main <container> element, it is created in a folder that has the same name as the DOCX file. For example, without specifying a folder, a DOCX file of “movies.docx” generates a references document of “movies.psml” in a “movies” folder.
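A minimal sketch of overriding that behaviour follows. The folder name "film-library" is hypothetical and chosen only for illustration. With @folder set on the main container, the references document should be created in that folder rather than in one named after the DOCX file.

```xml
<split-config>
  <!-- @folder set explicitly, so the DOCX file name is not used as the folder name -->
  <container folder="film-library" />
  <document folder="components">
    <heading level="1"/>
  </document>
</split-config>
```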
Attributes
All attributes are optional:
- @type – specifies a document type for the container document – the default type is references.
- @labels – adds a comma-separated list of labels to the container document.
- @contains – determines the document type of documents for this container to reference.
- @folder – specifies a folder for the container document to be created in.
The Document title of the main PSML document is taken from uri/@title in the original PSML.
The conversion to PSML from DOCX uses the Word document title property (dc:title) as the title of the container document. If this property is empty in the Word document, the conversion uses the name of the file, minus the extension.
Example
Following is an example of the attributes for this element – all are optional:
<split-config>
  <container type="contract" labels="contract" />
  <container type="schedules" labels="schedules" contains="schedule" />
  ...
</split-config>
In this example:
- The main container document, the publication root, is of type contract and also has the label contract.
- The main container document xrefs to (contains) documents of type schedules and other document types defined in the following. The xref target documents in the main container can be any types not contained by other containers.
- The schedules container document has a label of schedules and xrefs to (contains) documents of type schedule.
<start>
Where there is content in a document that you want to handle differently, use the <start> element to define the creation of a new container. If no <start> element exists, the container starts with the first document of the type given in @contains. An xref in the parent container points to the new container and all content before the first document is put in the title section.
The <start> element has no attributes, but can contain multiple <heading> or <block> elements, described in the following.
The document title of each container PSML document uses the first 250 characters of content from the <heading> or <block> element. If no <start> element exists, the @type is used with a capitalized first letter.
<split-config>
  ...
  <container type="definitions" labels="clause" contains="term" folder="clauses">
    <start>
      <block label="def-start" />
    </start>
    <continue>
      <block label="def-end" />
    </continue>
  </container>
  ...
</split-config>
In this example, the definitions container document has a label of clause, and is in a clauses subfolder. This container xrefs to documents with a document type of term, and to invoke the split rule there must be a block label of def-start in the content.
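Because <start> accepts <heading> elements as well as <block> elements, a heading should also be usable as the trigger. The sketch below is an assumption-based variation rather than a tested configuration. The type, label and folder values ("appendices", "appendix") are hypothetical.

```xml
<split-config>
  <container />
  <!-- A new container starts at a level-1 heading; its title is taken from that heading -->
  <container type="appendices" labels="appendices" contains="appendix">
    <start>
      <heading level="1" />
    </start>
  </container>
  <!-- Component documents referenced by the "appendices" container -->
  <document type="appendix" labels="appendix" folder="appendices">
    <heading level="2" />
  </document>
</split-config>
```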
<continue>
Companion to the <start> element, the <continue> element defines where xrefs stop being inserted. If no <continue> element exists, the xrefs stop with the beginning of the next document not in @contains. Any content between the last xref and the next document is put in the content section at the end.
The <continue> element has no attributes but can contain multiple <heading> or <block> elements described in the following.
In the example, the xrefs stop where the block label def-end exists in the content.
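Since <continue> can also hold <heading> elements, the end of a container could be marked by a heading instead of a block label. The following is a sketch only, assuming the definitions section in the Word file is followed by a level-1 heading. The choice of level is illustrative.

```xml
<split-config>
  <container />
  <container type="definitions" contains="term">
    <start>
      <block label="def-start" />
    </start>
    <!-- xrefs to "term" documents stop at the next level-1 heading -->
    <continue>
      <heading level="1" />
    </continue>
  </container>
  <document type="term" folder="terms">
    <block label="definition" />
  </document>
</split-config>
```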
The component documents
<document>
Defines when to create a new component document. Component documents are the nodes or leaves of references documents. The import process creates these through the <document> element in the psml-split-config.xml.
If no <document> elements are defined, then all the content is in the single main container document and its type defaults to default instead of references.
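A sketch of that single-document case: with no <document> elements, the whole import should stay in the main container document (of type default), while <fragment> rules can still divide it into fragments. The heading levels used here are chosen only for illustration.

```xml
<split-config>
  <!-- No <document> elements: all content stays in this one document -->
  <container />
  <!-- Fragments are still created at level-1 and level-2 headings -->
  <fragment>
    <heading level="1"/>
    <heading level="2"/>
  </fragment>
</split-config>
```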
Attributes
All attributes are optional:
- @type – a document type that can be applied to the document – the default type is default.
- @labels – a comma-separated list of labels that can be applied to the document.
- @folder – specifies a folder to put the document in.
The <document> element can contain multiple <heading>, <block> or <inline> elements described in the following.
The document title of each component PSML document is based on the content in the original PSML inside the <heading> or <block> element specified (truncated to 250 characters if required). The filename is generated using the format [@type]-[00N].psml, or component-[00N].psml if no @type is specified.
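To make the naming rule concrete, the following sketch pairs one <document> rule that sets @type with one that does not. The block label "note" is hypothetical, and the comments simply restate the filename formats given above without assuming how [00N] is rendered.

```xml
<split-config>
  <container />
  <!-- @type is set: filenames follow the format clause-[00N].psml -->
  <document type="clause" folder="clauses">
    <heading level="1"/>
  </document>
  <!-- No @type: filenames follow the format component-[00N].psml -->
  <document folder="components">
    <block label="note"/>
  </document>
</split-config>
```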
Following is an example of options for this element.
<split-config>
  ...
  <document type="clause" labels="clause" folder="clauses" >
    <heading level="1"/>
  </document>
  <document type="schedule" labels="schedule" folder="schedules">
    <block label="schedule"/>
  </document>
  <document type="term" labels="term" folder="terms">
    <block label="definition" />
  </document>
  ...
</split-config>
In the example, a new document is created at each occurrence of <heading level="1"/>, <block label="schedule"/>, and <block label="definition" />. A new document ends only when another document or container starts.
<inline>
The <inline> element, previously available within <document>, is now deprecated. Instead, you can post-process the PSML using XSLT to insert <placeholder> elements.
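One possible shape for such a post-processing step is an identity transform with a single override. This is a sketch under stated assumptions, not the transform PageSeeder uses: it assumes the imported PSML marks the text with an inline label named "client-name" (hypothetical) and that the resulting <placeholder> should carry the same value in its name attribute.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything through unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace the hypothetical inline label with a placeholder of the same name -->
  <xsl:template match="inline[@label='client-name']">
    <placeholder name="client-name">
      <xsl:value-of select="."/>
    </placeholder>
  </xsl:template>

</xsl:stylesheet>
```

All other inline labels pass through untouched, so further templates can be added one label at a time.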
Managing fragments
<fragment>
Defines the creation of a new fragment that ends at the next new fragment or document.
Attributes
All attributes are optional:
- @type – a type that can be applied to the fragment.
- @labels – a comma-separated list of labels that can be applied to the fragment.
The <fragment> element can contain multiple <heading>, <block> or <para> elements described in the following.
Following is an example of the options for this element – all attributes are optional:
<split-config>
  ...
  <fragment type="mytype" labels="mylabel">
    <heading level="2" />
    <heading level="3" />
    <para numbered="true" />
  </fragment>
  <fragment type="mytype2" labels="mylabel2">
    <block label="myblocklabel" />
  </fragment>
  ...
</split-config>
In the example, a new fragment is created at each <heading level="2"/>, <heading level="3"/>, <para numbered="true"/> and each <block label="myblocklabel" />. In this example, the block label fragment is specified separately, as it has a different @type and @labels than the other fragments.
<heading>
Used to specify where a container, document or fragment starts.
Attributes
- @level – the @level the heading must have (required).
- @numbered – the @numbered the heading must have – if false, the heading must not have numbered="true"; if not specified, then any heading is matched (optional).
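The three states of @numbered can be sketched side by side. The split levels below are chosen only for illustration, and the comments restate the matching rules described above.

```xml
<split-config>
  <container />
  <document folder="components">
    <!-- Matches only level-1 headings that have numbered="true" -->
    <heading level="1" numbered="true"/>
  </document>
  <fragment>
    <!-- Matches only level-2 headings that do not have numbered="true" -->
    <heading level="2" numbered="false"/>
    <!-- @numbered omitted: matches any level-3 heading -->
    <heading level="3"/>
  </fragment>
</split-config>
```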
<block>
Used to specify where a container, document or fragment starts.
Attributes
- @label – the @label the block must have (required).
<para>
Used to specify where a fragment starts.
Attributes
- @numbered – the @numbered the para must have – if false, the para must not have numbered="true"; if not specified, then any para is matched (optional).
- @prefix – whether the para must have a @prefix attribute – if false, the para must not have a @prefix attribute; if not specified, then any para is matched (optional). Requires pso-psml version 0.7.0 or higher.
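A short sketch of both <para> attributes in use follows. The fragment type "list-item" is hypothetical, and the @prefix rule assumes an environment running pso-psml 0.7.0 or higher.

```xml
<split-config>
  <container />
  <!-- New fragment at each para that has a @prefix attribute -->
  <fragment type="list-item">
    <para prefix="true"/>
  </fragment>
  <!-- New fragment at each para that has numbered="true" -->
  <fragment>
    <para numbered="true"/>
  </fragment>
</split-config>
```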