PSML split config

The default config follows.

For further information regarding the use of this file, see:

Overview

Importing a Word document involves two configuration files:

  • First the DOCX document content is processed to PSML using the word-import-config.xml.
  • Then, that PSML is split using the psml-split-config.xml to give the final imported PSML.

The <split-config> element controls the points at which the Word file is split into component documents and, within these, where the content is split into editable fragments. Splitting is controlled by three different elements:

  • <container> – Defines when to create a new container. The default container is a references document containing xrefs to component documents that make up the publication.
  • <document> – Defines when to create a new document. 
  • <fragment> – Defines when to create a new fragment.

Usage

The psml-split-config.xml can be edited by project managers and administrators only: 

  1. To change the default configuration for everyone in a project, click Administration menu > Template > Types, then in the Media types table, Split column, docx row, select override under the default option (only available on the Types page). To revert to the default config, click delete in the Split column, docx row as above, or delete from the docx folder at Administration menu > [Project]Template > Template files.

When editing this config file in PageSeeder, pressing ctrl-space displays autocomplete options to make editing easier.

Default config

The <split-config> element is the root element and must contain at least one empty <container> element (in other words, that has no child elements). It can have an optional @version applied.

Following is the default split config used by PageSeeder. It creates a single main container document, a new document for each heading level 1 or 2, and a new fragment for each heading level 3 or 4:

<split-config version="0.7.0">
  <container />
  <document folder="components">
    <heading level="1"/>
    <heading level="2"/>
  </document>
  <fragment>
    <heading level="3"/>
    <heading level="4"/>
  </fragment>
</split-config>
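
To make the effect of the default config concrete, the following is a hypothetical, heavily simplified sketch of the intermediate PSML produced by the first import step. The wrapper elements, headings and text are assumptions made for illustration only. The comments mark where the default config would split.

```xml
<!-- Hypothetical input PSML (simplified); not taken from a real import -->
<document level="portable">
  <section id="content">
    <fragment id="1">
      <!-- heading level 1: a new document starts here -->
      <heading level="1">Introduction</heading>
      <para>Opening text.</para>
      <!-- heading level 3: a new fragment starts inside the current document -->
      <heading level="3">Background</heading>
      <para>Some background.</para>
      <!-- heading level 2: another new document starts here -->
      <heading level="2">Scope</heading>
      <para>Scope text.</para>
    </fragment>
  </section>
</document>
```

Under these assumptions the import would yield the main container plus two component documents in the components folder, the first of which holds two fragments.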

Comprehensive example

The following is a comprehensive example with multiple containers, folders, document types and labels. Each element is explained in more detail in the sections that follow.

<split-config>
  <container type="contract" labels="contract" />
  <container type="schedules"
             labels="schedule"
             contains="schedule" />
  <container type="definitions"
             labels="clause"
             contains="term"
             folder="clauses">
    <start>
      <block label="def-start" />
    </start>
    <continue>
      <block label="def-end" />
    </continue>
  </container>

  <document type="clause"
            labels="clause"
            folder="clauses" >
    <heading level="1"/>
  </document>
  <document type="schedule"
            labels="schedule"
            folder="schedules">
    <block label="schedule"/>
  </document>
  <document type="term"
            labels="term"
            folder="terms">
    <block label="definition" />
  </document>

  <fragment>
    <heading level="2"/>
    <heading level="3"/>
    <para numbered="true"/>
  </fragment>
  <fragment type="mytype" labels="mylabel">
    <block label="myblocklabel" />
  </fragment>
</split-config>

The document root

<container>

Defines when to create a new container. When importing a DOCX file, the root PSML file is a references document containing xrefs to component documents that make up the publication. The first (required) <container> element defines the main container (aka the root PSML file or publication root).

Attributes

All attributes are optional:

  • @type – A document type to apply to the container; the default type is references.
  • @labels – A comma-separated list of labels to apply to the container.
  • @contains – A document type that this container is to reference.
  • @folder – A folder to put the container document in.

The Document title of the main PSML document is taken from uri/@title in the original PSML.

If the original PSML was converted from DOCX, the title comes from the Word document title property (dc:title) if it exists; otherwise it is the filename without the extension.

Following is an example of the options for this element—all attributes are optional:

<split-config>
  <container type="contract" labels="contract" />
  <container type="schedules"
             labels="schedule"
             contains="schedule" />
  ...
</split-config>

In the previous example:

  • The main container document is of type contract and also has the label contract.
  • The main container xrefs to documents of type schedules and to the other document types defined later in the config.
  • The schedules document has a label of schedule and xrefs to documents of type schedule.

<start>

A <container> can have one <start> element, which defines where a new container is created. If there is no <start> element, the container starts where the first document in @contains begins. An xref pointing to the new container is created in the parent container, and any content before the first document goes in the title section.

The <start> element has no attributes but can contain multiple <heading> or <block> elements described in the following.

The document title of each container PSML document is based on the content in the original PSML inside the <heading> or <block> element specified (truncated to 250 characters if required). If there is no <start> element, the @type is used with a capitalized first letter.

<split-config>
  ...
  <container type="definitions"
             labels="clause"
             contains="term"
             folder="clauses">
    <start>
      <block label="def-start" />
    </start>
    <continue>
      <block label="def-end" />
    </continue>
  </container>
  ...
</split-config>

In the previous example, the definitions container document has a label of clause, is in a clauses subfolder and xrefs to documents of type term. It starts where the original content has the block label def-start.

<continue>

A <container> can have one <continue> element, which defines where xrefs stop being inserted. If there is no <continue> element, xrefs stop where the next document not in @contains begins. Any content between the last xref and the next document goes in the content section at the end.

The <continue> element has no attributes but can contain multiple <heading> or <block> elements described in the following.

In the previous example, the xrefs stop where the original content has the block label def-end.

The component documents

<document>

Defines when to create a new component document. Component documents are the nodes or leaves of references documents. The import process creates these through the <document> element in the psml-split-config.xml.

If no <document> elements are defined, all the content goes in the single main container document and its type defaults to default instead of references.
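
As a minimal sketch of that case, the following config defines only the required main container. It is an illustration based on the rules stated above, not a recommended setup.

```xml
<split-config>
  <!-- Required main container. With no <document> elements, all content
       stays in this single document, whose type defaults to "default". -->
  <container />
</split-config>
```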

Attributes

All attributes are optional:

  • @type – A document type to apply to the document; the default type is default.
  • @labels – A comma-separated list of labels to apply to the document.
  • @folder – A folder to put the document in.

The <document> element can contain multiple <heading>, <block> or <inline> elements described in the following.

The document title of each component PSML document is based on the content in the original PSML inside the <heading>, <block> or <inline> element specified (truncated to 250 characters if required). The filename is generated using the format [@type]-[00N].psml, or component-[00N].psml if there is no @type.

Following is an example of the options for this element.

<split-config>
  ...
  <document type="clause"
            labels="clause"
            folder="clauses" >
    <heading level="1"/>
  </document>
  <document type="schedule"
            labels="schedule"
            folder="schedules">
    <block label="schedule"/>
  </document>
  <document type="term"
            labels="term"
            folder="terms">
    <block label="definition" />
  </document>
  ...
</split-config>

In the previous example, a new document is created at each occurrence of <heading level="1"/>, <block label="schedule"/>, and <block label="definition" />. A new document ends only when another document or container starts.

<inline>

The <inline> element is now deprecated. Instead, post-process the PSML using XSLT to insert <placeholder> elements.

Managing fragments

<fragment>

Defines when to create a new fragment. A new fragment continues until the next fragment or document starts.

Attributes

All attributes are optional:

  • @type – A type to apply to the fragment.
  • @labels – A comma-separated list of labels to apply to the fragment.

The <fragment> element can contain multiple <heading>, <block> or <para> elements described in the following.

Following is an example of the options for this element—all attributes are optional:

<split-config>
  ...
  <fragment type="mytype" labels="mylabel">
    <heading level="2" />
    <heading level="3" />
    <para numbered="true" />
  </fragment>
  <fragment type="mytype" labels="mylabel">
    <block label="myblocklabel" />
  </fragment>
  ...
</split-config>

In the previous example, a new fragment is created at each <heading level="2"/>, <heading level="3"/>, <para numbered="true"/> and each <block label="myblocklabel" />.

<heading>

Used to specify where a container, document or fragment starts.

Attributes
  • @level – The level the heading must have (required).
  • @numbered – Whether the heading must be numbered (optional). If false, the heading must not have numbered="true"; if not specified, any heading of the given level is matched.
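
The following sketch shows one way the two attributes could be combined. The choice of levels and the components folder are illustrative assumptions; the matching behaviour described in the comments follows the attribute descriptions above.

```xml
<split-config>
  <container />
  <document folder="components">
    <!-- Matches only level 1 headings that have numbered="true" -->
    <heading level="1" numbered="true" />
  </document>
  <fragment>
    <!-- Matches level 2 headings that do not have numbered="true" -->
    <heading level="2" numbered="false" />
    <!-- No @numbered: matches any level 3 heading -->
    <heading level="3" />
  </fragment>
</split-config>
```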

<block>

Used to specify where a container, document or fragment starts.

Attributes
  • @label – The label the block must have (required).

<para>

Used to specify where a fragment starts.

Attributes
  • @numbered – Whether the para must be numbered (optional). If false, the para must not have numbered="true"; if not specified, any para is matched.
  • @prefix – Whether the para must have a @prefix attribute (optional). If false, the para must not have a @prefix attribute; if not specified, any para is matched. Requires pso-psml version 0.7.0 or higher.
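
As a hedged illustration of the <para> attributes, the sketch below starts a new fragment at numbered paragraphs and at paragraphs that carry a @prefix attribute. The two conditions are placed on separate <para> elements because how both attributes interact on a single element is not described here.

```xml
<split-config>
  <container />
  <fragment>
    <!-- New fragment at each para that has numbered="true" -->
    <para numbered="true" />
    <!-- New fragment at each para that has a @prefix attribute
         (requires pso-psml version 0.7.0 or higher) -->
    <para prefix="true" />
  </fragment>
</split-config>
```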