Task split

This task was introduced in PageSeeder version 5.9703.

Split a single PSML Universal Processed Format file into separate documents and fragments according to the rules defined in a psml-split-config file (see PSML split config usage).

Only the content of <fragment>, <properties-fragment> and <media-fragment> elements is preserved. Only the <metadata> element of the root document is preserved. The <section>/<title>, <anchor> and <xref-fragment> elements and their content (except for embedded or transcluded content) are removed.
Any attributes on <fragment> elements are removed and replaced by those defined in the split config file.

If there are internal xrefs in the PSML file, it must have been produced by the process task with at least the <xrefs types="" /> option to ensure that the xrefs still resolve after the split.
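As a sketch of that requirement, the build fragment below runs the process task with the minimal xrefs option and then splits the result. The folder paths and the file name mydoc.psml are hypothetical, and all other ps:process attributes are left at their defaults, which may not suit every export; check the process task documentation before relying on them.

```xml
<!-- Hypothetical paths: adjust to your own project -->
<ps:process src="c:\psml\download"
            dest="c:\psml\process">
  <!-- Minimal xrefs option so internal xrefs still resolve after the split -->
  <xrefs types="" />
</ps:process>

<!-- Split the processed file (mydoc.psml is a hypothetical name) -->
<ps:split src="c:\psml\process\mydoc.psml"
          dest="c:\psml\split"
          config="psml-split-config.xml" />
```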

Definition

Minimal definition:

<ps:split src="[source]"
          dest="[destination]"
          config="[config]" />

Full definition:

<ps:split src="[source]"
          dest="[destination]"
          config="[config]" 
          mediafolder="[folder name]"
          working="[working]" />

Attributes

Attribute	Description	Required	Default
src	The path of the file to split. For example: `c:\psml\download\mydoc.psml`	Yes
dest	The destination folder or the main file path (if a folder, the src filename is used for the main file). For example: `c:\psml\split` or `c:\psml\split\mysplitdoc.psml`	Yes
config	The path of the split config file. For example: `psml-split-config.xml`	Yes
mediafolder	The name of the folder, in the same folder as the src file, that contains the images (all images must be in this one folder)	No	`images`
working	The path of the working folder for the split. For example: `c:\psml\temp`	No
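For illustration, the full definition filled in with the sample values from the table might look like the following. It assumes the ps prefix is already bound to the PageSeeder Ant task library in the build file. Because dest is a folder in the first call, the main file is written using the src filename (c:\psml\split\mydoc.psml); the second call names the main file explicitly and relies on the default mediafolder.

```xml
<!-- dest is a folder: the main file takes the src filename -->
<ps:split src="c:\psml\download\mydoc.psml"
          dest="c:\psml\split"
          config="psml-split-config.xml"
          mediafolder="images"
          working="c:\psml\temp" />

<!-- dest is a file path: the main file is written as mysplitdoc.psml -->
<ps:split src="c:\psml\download\mydoc.psml"
          dest="c:\psml\split\mysplitdoc.psml"
          config="psml-split-config.xml" />
```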

Examples

Split when importing a DocX document as PSML

The following example is designed to run inside PageSeeder, so it uses ps- properties.
The word-import-config.xml file must have:

No <main>, <document> or <section> elements under <split> so that it produces a single PSML file. These elements are now obsolete.
A <references psmlelement="link" /> element under <styles><default> so that references are preserved as links to anchors (because fragments are not split yet).
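A partial sketch of those two settings is shown below. Only the elements named above are taken from this page; the assumption that <split> and <styles> sit directly under the config file's root element, and everything omitted, should be checked against the Word import configuration documentation.

```xml
<!-- Partial fragment of word-import-config.xml: root element and other settings omitted -->
<split>
  <!-- No <main>, <document> or <section> elements here,
       so import-docx produces a single PSML file -->
</split>

<styles>
  <default>
    <!-- Keep references as links to anchors, because fragments are not split yet -->
    <references psmlelement="link" />
  </default>
</styles>
```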

The splitting is then done by the split task, which is much faster and more flexible than the import-docx task. The attribute working="${temp-split}" is optional, but it is useful when debugging psml-split-config.xml because it lets you see the PSML at different stages of the split.

<property name="download"
          value="${ps-working}/download" />
<property name="output"
          value="${ps-working}/psml" />
<property name="split"
          value="${ps-working}/split" />
<property name="temp"
          value="${ps-working}/temp" />
<property name="temp-split" 
          value="${ps-working}/temp-split" />
...

<!-- replace spaces with underscores
     because import-docx does -->
<script language="javascript">
  project.setProperty("filename",
    project.getProperty("ps-uploadFilenameNoExt")
    .replaceAll(" ", "_").toLowerCase());
</script>

<psd:import-docx
          src="${download}/${ps-uploadPath}"
          dest="${output}"
          working="${temp}"
          config="word-import-config.xml" />
 
<ps:split src="${output}/${filename}.psml"
          dest="${split}"
          working="${temp-split}"
          config="psml-split-config.xml" />

Re-split an existing publication in PageSeeder

The following example is designed to run inside PageSeeder, so it uses ps- properties. The process task collects all the exported publication files into a single PSML file. The split task requires all images to be in a single folder, so the line <images src="filename" location="${process}/images" /> is used to put them there while preserving their filenames (-[n] is added if there is a name clash).

<property name="working"
          value="${ps.config.default.working}/download" />
<property name="download"
          value="${working}/download" />
<property name="process"
          value="${working}/process" />
<property name="split"
          value="${working}/split" />
<property name="temp-split"
          value="${working}/temp-split" />
...

<ps:process src="${download}"
            dest="${process}"
            processed="false"
            convertmarkdown="false"
            convertasciimath="false">
  <xrefs types="embed,transclude">
    <include name="${ps.config.default.uri.filename.no.ext}.psml"/>
  </xrefs>
  <!-- uncomment this to preserve publication heading levels
  <publication config="publication-config.xml"
               rootfile="${ps.config.default.uri.filename.no.ext}.psml"
               headingleveladjust="content" />
  -->
  <images src="filename"
          location="${process}/images" />
</ps:process>
 
<ps:split
     src="${process}/${ps.config.default.uri.filename.no.ext}.psml"
     dest="${split}"
     working="${temp-split}"
     config="psml-split-config.xml" />

To preserve transcludes, use <xrefs types="embed"> in the process task. This keeps the <blockxref type="transclude" ... > elements instead of replacing them with their content. However, transcludes directly under an <xref-fragment> would be lost, because the split task removes <xref-fragment> elements. To preserve them, add a <posttransform> to the process task with XSLT like the following, which copies each one inside a <fragment> element:

<xsl:template match="blockxref[@type='transclude' and
                     parent::xref-fragment]">
  <xsl:copy>
    <fragment id="preserve">
      <xsl:copy>
        <xsl:copy-of select="@*" />
        <xsl:value-of select="@urititle" />
      </xsl:copy>
    </fragment>
  </xsl:copy>
</xsl:template>
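If the posttransform XSLT has to copy everything else unchanged, the template above can sit alongside a standard identity template in a complete stylesheet, as in the sketch below. Whether the process task actually requires the identity template is an assumption here; the PSML element names are used without a namespace, as in the template above.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template (assumed needed): copy all other nodes and attributes unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Wrap transcludes directly under <xref-fragment> in a <fragment>
       so the split task keeps them -->
  <xsl:template match="blockxref[@type='transclude' and
                       parent::xref-fragment]">
    <xsl:copy>
      <fragment id="preserve">
        <xsl:copy>
          <xsl:copy-of select="@*" />
          <xsl:value-of select="@urititle" />
        </xsl:copy>
      </fragment>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The more specific blockxref template has a higher default priority than the identity template, so it wins for matching transcludes while every other node is copied as-is.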

Created on 29 April 2019, last edited on 11 June 2024 at 14:44
