PSML split config usage

This page describes how to use the psml-split-config.xml file. The default config is listed under Default split config.

Overview

Importing a Word document involves two stages, each with its own configuration file:

  • The first stage converts the DOCX format to PSML using the word-import-config.xml.
  • The second stage splits the PSML files and fragments using the psml-split-config.xml.

The <split-config> element determines how the Word file is split into component documents, and how these are split into editable fragments. Splitting of both objects (documents and fragments) is controlled by three configuration elements:

  • <container> – defines the default references document that contains xrefs to the component documents that make up the publication.
  • <document> – defines the creation of new documents. 
  • <fragment> – defines the creation of new fragments.

Usage

The psml-split-config.xml can be edited by project managers and administrators only. To change the default configuration for everyone in a project:

Click Administration menu > Template > Template configuration, then in the Media types table, go to the Split column of the docx row and select override under the default option. This option is only available on the Template configuration page.

To revert to the default config:

Click Delete in the Split column of the docx row as above, or delete the file from the docx folder at Administration menu > [Project]Template > Template files.

When editing this config file in PageSeeder, press Ctrl+Space to display auto-complete options.

Default split config

The <split-config> element is the root element and must contain at least one empty <container> element (in other words, one that has no child elements). It can have an optional @version attribute.

The default PageSeeder split config follows. It creates a main container document, a folder called "components", a document for each heading level 1 and 2, and a fragment for each heading level 3 and 4:

<split-config version="0.7.0">
  <container /> 
  <document folder="components">
    <heading level="1"/>
    <heading level="2"/>
  </document>
  <fragment>
    <heading level="3"/>
    <heading level="4"/>
  </fragment>
</split-config>

Comprehensive example

The following is a more comprehensive example with multiple containers, folders, document types and labels. Each part of it is explained in the sections after it:

<split-config>
  <container type="contract" labels="contract" />
  <container type="schedules"
             labels="schedules"
             contains="schedule" />
  <container type="definitions"
             labels="clause"
             contains="term"
             folder="clauses">
    <start>
      <block label="def-start" />
    </start>
    <continue>
      <block label="def-end" />
    </continue>
  </container>
  
  <document type="schedule"
            labels="schedule"
            folder="schedules">
    <block label="schedule"/>
  </document>
  <document type="clause"
            labels="clause"
            folder="clauses" >
    <heading level="1"/>
  </document>
  <document type="term"
            labels="term"
            folder="terms">
    <block label="definition" />
  </document>

  <fragment type="mytype" labels="mylabel">
    <heading level="2"/>
    <heading level="3"/>
    <para numbered="true"/>
  </fragment>
  <fragment type="mytype2" labels="mylabel2">
    <block label="myblocklabel" />
  </fragment>  
</split-config>

The root document

<container>

The first <container> element is required and defines the root PSML file, or publication root. When importing a DOCX file, this is a references document that contains xrefs to the component documents. To make this document a publication, select the Document info and metadata panel and choose the option "Make this document a publication".

If there is no value for the @folder attribute on the main <container> element, the container document is created in a folder that has the same name as the DOCX file. For example, without a folder specified, a DOCX file named “movies.docx” generates a references document named “movies.psml” in a “movies” folder.
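
To override that default location, the main container can be given a @folder value. The following is a minimal sketch rather than a tested config: the folder name "films" is a hypothetical value chosen for illustration, and the exact resulting path depends on how the import resolves folders.

```xml
<split-config>
  <!-- Hypothetical folder name: asks for the root references document
       to be created in "films" instead of a folder named after the DOCX file -->
  <container folder="films" />
  <document folder="components">
    <heading level="1" />
  </document>
</split-config>
```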

Attributes

All attributes are optional:

  • @type – specifies a document type for the container document – the default type is references.
  • @labels – adds a comma-separated list of labels to the container document.
  • @contains – determines the document type of documents for this container to reference.
  • @folder – specifies a folder for the container document to be created in.

The Document title of the main PSML document is taken from uri/@title in the original PSML.

The conversion to PSML from DOCX uses the Word document title property (dc:title) as the title of the container document. If this property is empty in the Word document, the conversion uses the name of the file, minus the extension.

Example

The following example shows these attributes on two containers:

<split-config>
  <container type="contract" labels="contract" />
  <container type="schedules"
             labels="schedules"
             contains="schedule" />
  ...
</split-config>

In this example:

  • The main container document, the publication root, is of type contract and also has the label contract.
  • The main container document xrefs to (contains) documents of type schedules and the other document types defined in the sections that follow. The main container can xref to documents of any type that is not contained by another container.
  • The schedules container document has a label of schedules and xrefs to (contains) documents of type schedule.

<start>

Where there is content in a document that you want to handle differently, use the <start> element to define where a new container is created. If no <start> element exists, the container starts with the first document in @contains. An xref in the parent container points to the new container, and all content before the first document is put in the title section.

The <start> element has no attributes, but can contain multiple <heading> or <block> elements described in the following.

The document title of each container PSML document uses the first 250 characters of content from the <heading> or <block> element. If no <start> element exists, the @type is used with a capitalized first letter.

<split-config>
  ...
  <container type="definitions"
             labels="clause"
             contains="term"
             folder="clauses">
    <start>
      <block label="def-start" />
    </start>
    <continue>
      <block label="def-end" />
    </continue>
  </container>
  ...
</split-config>

In this example, the definitions container document has a label of clause and is in a clauses subfolder. This container xrefs to documents with a document type of term. To invoke the split rule, there must be a block label of def-start in the content.

<continue>

Companion to the <start> element, the <continue> element defines where xrefs stop being inserted. If no <continue> element exists, the xrefs stop with the beginning of the next document not in @contains. Any content between the last xref and the next document is put in the content section at the end.

The <continue> element has no attributes but can contain multiple <heading> or <block> elements described in the following.

In the example, the xrefs stop where the block label def-end exists in the content.

The component documents

<document>

Defines when to create a new component document. Component documents are the nodes or leaves of references documents. The import process creates these through the <document> element in the psml-split-config.xml.

If no <document> elements are defined, then all the content is in the single main container document and its type defaults to default instead of references.
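
As an illustration of that case, the following sketch defines no <document> elements. Based on the description above, all content would stay in the single main container document, and the <fragment> rules would only divide that document into fragments. The heading levels are arbitrary choices for the example.

```xml
<split-config>
  <container />
  <!-- No <document> rules: everything stays in the main document,
       whose type then defaults to "default" rather than "references" -->
  <fragment>
    <heading level="1" />
    <heading level="2" />
  </fragment>
</split-config>
```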

Attributes

All attributes are optional:

  • @type – applies a document type to the document – the default type is default.
  • @labels – applies a comma-separated list of labels to the document.
  • @folder – specifies a folder to put the document in.

The <document> element can contain multiple <heading>, <block> or <inline> elements described in the following.

The document title of each component PSML document is based on the content in the original PSML inside the specified <heading> or <block> element (truncated to 250 characters if required). The filename is generated using the format [@type]-[00N].psml, or component-[00N].psml if no @type is specified.
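
The naming rule can be read against a small config. This is a sketch only: the type name and block label are hypothetical, and the comments restate the filename pattern given above without asserting specific file names.

```xml
<split-config>
  <container />
  <!-- With @type: filenames follow the pattern clause-[00N].psml -->
  <document type="clause" folder="clauses">
    <heading level="1" />
  </document>
  <!-- Without @type: filenames follow the pattern component-[00N].psml -->
  <document folder="misc">
    <block label="note" />
  </document>
</split-config>
```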

Following is an example of options for this element.

<split-config>
  ...
  <document type="clause"
            labels="clause"
            folder="clauses" >
    <heading level="1"/>
  </document>
  <document type="schedule"
            labels="schedule"
            folder="schedules">
    <block label="schedule"/>
  </document>
  <document type="term"
            labels="term"
            folder="terms">
    <block label="definition" />
  </document>
  ...
</split-config>

In the example, a new document is created at each occurrence of <heading level="1"/>, <block label="schedule"/> and <block label="definition" />. A document ends only when another document or container starts.

<inline>

The <inline> element, previously available within <document>, is now deprecated. Instead, you can post-process the PSML using XSLT to insert <placeholder> elements.
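
A post-processing stylesheet for this could look like the following. It is a minimal sketch under stated assumptions, not a documented PageSeeder stylesheet: it assumes the converted PSML marks the text with an <inline> element carrying a label (the label client-name is hypothetical) and that the <placeholder> element takes a @name attribute. How the stylesheet is attached to the import is not covered here.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy everything unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: replace inline text labelled "client-name" (hypothetical)
       with a placeholder of the same name, keeping the text as its content -->
  <xsl:template match="inline[@label='client-name']">
    <placeholder name="client-name">
      <xsl:value-of select="." />
    </placeholder>
  </xsl:template>

</xsl:stylesheet>
```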

Managing fragments

<fragment>

Defines the creation of a new fragment that ends at the next new fragment or document.

Attributes

All attributes are optional:

  • @type – applies a type to the fragment.
  • @labels – applies a comma-separated list of labels to the fragment.

The <fragment> element can contain multiple <heading>, <block> or <para> elements described in the following.

Following is an example of the options for this element:

<split-config>
  ...
  <fragment type="mytype" labels="mylabel">
    <heading level="2" />
    <heading level="3" />
    <para numbered="true" />
  </fragment>
  <fragment type="mytype2" labels="mylabel2">
    <block label="myblocklabel" />
  </fragment> 
  ...
</split-config>

In the example, a new fragment is created at each <heading level="2"/>, <heading level="3"/>, <para numbered="true"/> and <block label="myblocklabel" />. The block label fragment is specified separately because it has a different @type and @labels than the other fragments.

<heading>

Used to specify where a container, document or fragment starts.

Attributes
  • @level – The @level the heading must have (required).
  • @numbered – The @numbered value the heading must have (optional). If true, the heading must have numbered="true". If false, the heading must not have numbered="true". If not specified, headings are matched regardless of numbering.
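
The effect of @numbered on a <heading> rule can be sketched as follows. The levels and the choice of which rule uses which value are illustrative assumptions only.

```xml
<split-config>
  <container />
  <document folder="components">
    <!-- Matches only level 1 headings that have numbered="true" -->
    <heading level="1" numbered="true" />
  </document>
  <fragment>
    <!-- Matches only level 2 headings that do not have numbered="true" -->
    <heading level="2" numbered="false" />
    <!-- No @numbered: matches any level 3 heading -->
    <heading level="3" />
  </fragment>
</split-config>
```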

<block>

Used to specify where a container, document or fragment starts.

Attributes
  • @label – The @label the block must have (required).

<para>

Used to specify where a fragment starts.

Attributes
  • @numbered – The @numbered value the para must have (optional). If true, the para must have numbered="true". If false, the para must not have numbered="true". If not specified, paras are matched regardless of numbering.
  • @prefix – Whether the para must have a @prefix attribute (optional). If true, the para must have a @prefix attribute. If false, the para must not have a @prefix attribute. If not specified, paras are matched regardless of @prefix. Requires pso-psml version 0.7.0 or higher.
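
A <para> rule using these attributes might look like the following sketch. The combination of rules is an illustrative assumption, and the @prefix rule needs pso-psml 0.7.0 or higher, as noted above.

```xml
<split-config>
  <container />
  <document folder="components">
    <heading level="1" />
  </document>
  <fragment>
    <!-- Start a fragment at each numbered para -->
    <para numbered="true" />
    <!-- Start a fragment at each para that has a @prefix attribute -->
    <para prefix="true" />
  </fragment>
</split-config>
```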