
Task split

This task was introduced in PageSeeder version 5.9703.

Split a single PSML Universal Processed Format file into separate documents and fragments according to the rules defined in a psml-split-config file (see PSML split config usage).

  • Only the content of <fragment>, <properties-fragment> and <media-fragment> elements is preserved. Only the <metadata> element on the root document is preserved. The elements <section>/<title>, <anchor> and <xref-fragment>, together with their content (except for embedded/transcluded content), are removed.
  • Any attributes on <fragment> elements are removed and replaced by those defined in the split config file.

If the PSML file contains internal xrefs, it must have been produced by the process task with at least the <xrefs types="" /> option, to ensure that the xrefs resolve after the split.
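
As a minimal sketch of that requirement, the process task can be run with an empty xrefs types list before the split. The property names and the filename used here (${download}, ${process}, ${split}, mydoc.psml) are illustrative assumptions only; adjust them to the actual build script.

```xml
<!-- Sketch only: ${download}, ${process}, ${split} and mydoc.psml are hypothetical -->
<ps:process src="${download}"
            dest="${process}">
  <!-- empty types list: the minimum needed so internal xrefs resolve after the split -->
  <xrefs types="" />
</ps:process>

<!-- split the processed file, not the original download -->
<ps:split src="${process}/mydoc.psml"
          dest="${split}"
          config="psml-split-config.xml" />
```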

Definition

Minimal definition:

<ps:split src="[source]"
          dest="[destination]"
          config="[config]" />

Full definition:

<ps:split src="[source]"
          dest="[destination]"
          config="[config]" 
          mediafolder="[folder name]"
          working="[working]" />

Attributes

| Attribute | Description | Required | Default |
|---|---|---|---|
| src | The path of the file to split. For example: c:\psml\download\mydoc.psml | Yes | |
| dest | The destination folder or the main file path (if a folder, the src filename is used for the main file). For example: c:\psml\split or c:\psml\split\mysplitdoc.psml | Yes | |
| config | The path of the split config file. For example: psml-split-config.xml | Yes | |
| mediafolder | The name of the folder under src that contains images (all images must be in the same folder). | No | images |
| working | The path of the working folder for the split. For example: c:\psml\temp | No | |
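
The optional attributes can be combined with the required ones as in the sketch below. It assumes the images sit in a folder named media rather than the default images. That folder name, the property names and mydoc.psml are hypothetical values chosen for illustration.

```xml
<!-- Sketch only: paths, property names and the folder name "media" are hypothetical -->
<ps:split src="${process}/mydoc.psml"
          dest="${split}"
          config="psml-split-config.xml"
          mediafolder="media"
          working="${temp-split}" />
```

Leaving out mediafolder falls back to the default folder name, images.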

Examples

Split when importing a DocX document as PSML

The following example is designed to run inside PageSeeder, so it uses ps- properties.
The word-import-config.xml must have:

  • No <document> or <section> elements under <split> so that it produces a single PSML file. These elements are now deprecated.
  • A <references psmlelement="link" /> element under <styles><default> so that references are preserved as links to anchors (because fragments are not split yet).

The splitting is then done by the split task, which is much faster and more flexible than the import-docx task. The attribute working="${temp-split}" is not required, but it can be useful when debugging the psml-split-config.xml because it lets you see the PSML at different stages of the split.

<property name="download"
          value="${ps-working}/download" />
<property name="output"
          value="${ps-working}/psml" />
<property name="split"
          value="${ps-working}/split" />
<property name="temp"
          value="${ps-working}/temp" />
<property name="temp-split" 
          value="${ps-working}/temp-split" />
...

<!-- replace spaces with underscores
     because import-docx does -->
<script language="javascript">
  project.setProperty("filename",
    project.getProperty("ps-uploadFilenameNoExt")
    .replaceAll(" ", "_").toLowerCase());
</script>

<psd:import-docx
          src="${download}/${ps-uploadPath}"
          dest="${output}"
          working="${temp}"
          config="word-import-config.xml" />
 
<ps:split src="${output}/${filename}.psml"
          dest="${split}"
          working="${temp-split}"
          config="psml-split-config.xml" />

Re-split an existing publication in PageSeeder

The following example is designed to run inside PageSeeder, so it uses ps- properties. The process task collects all the exported publication files into a single PSML file. The split task requires all images to be in a single folder, so the line <images src="filename" location="${process}/images" /> is used to gather them while preserving their filenames (-[n] is added if there is a name clash).

<property name="working"
          value="${ps.config.default.working}/download" />
<property name="download"
          value="${working}/download" />
<property name="process"
          value="${working}/process" />
<property name="split"
          value="${working}/split" />
<property name="temp-split"
          value="${working}/temp-split" />
...

<ps:process src="${download}"
            dest="${process}"
            processed="false"
            convertmarkdown="false"
            convertasciimath="false">
  <xrefs types="embed,transclude">
    <include name="${ps.config.default.uri.filename.no.ext}.psml"/>
  </xrefs>
  <!-- uncomment this to preserve publication heading levels
  <publication config="publication-config.xml"
               rootfile="${ps.config.default.uri.filename.no.ext}.psml"
               headingleveladjust="content" />
  -->
  <images src="filename"
          location="${process}/images" />
</ps:process>
 
<ps:split
     src="${process}/${ps.config.default.uri.filename.no.ext}.psml"
     dest="${split}"
     working="${temp-split}"
     config="psml-split-config.xml" />

To preserve transcludes, use <xrefs types="embed"> in the process task. This preserves the <blockxref type="transclude" ... > elements instead of replacing them with their content. However, transcludes directly under <xref-fragment> are only preserved if they are copied inside a <fragment> element. To do this, add a <posttransform> to the process task with XSLT like:

<xsl:template match="blockxref[@type='transclude' and
                     parent::xref-fragment]">
  <xsl:copy>
    <fragment id="preserve">
      <xsl:copy>
        <xsl:copy-of select="@*" />
        <xsl:value-of select="@urititle" />
      </xsl:copy>
    </fragment>
  </xsl:copy>
</xsl:template>
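
For context, the template above could sit in a complete stylesheet such as the following sketch. The identity template and the version="2.0" attribute are assumptions added for illustration, so that all other content passes through unchanged. They are not taken from the PageSeeder documentation.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: identity template so everything else is copied unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Template from the text above: wraps transcludes found directly
       under xref-fragment in a fragment so the split task keeps them -->
  <xsl:template match="blockxref[@type='transclude' and
                       parent::xref-fragment]">
    <xsl:copy>
      <fragment id="preserve">
        <xsl:copy>
          <xsl:copy-of select="@*" />
          <xsl:value-of select="@urititle" />
        </xsl:copy>
      </fragment>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the blockxref pattern carries a predicate, it takes priority over the generic identity template under the standard XSLT default priority rules.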