Task split
This task was introduced in PageSeeder version 5.9703.
Split a single PSML Universal Processed Format file into separate documents and fragments according to the rules defined in a psml-split-config file (see PSML split config usage).
- Only the content of <fragment>, <properties-fragment> and <media-fragment> elements is preserved. Only the <metadata> element on the root document is preserved. The elements <section>/<title>, <anchor>, <xref-fragment> and their content (except for embedded/transcluded content) are removed.
- Any attributes on <fragment> elements are removed and replaced by those defined in the split config file.
If the PSML file contains internal xrefs, it must have been produced by the process task (see Task process) with at least the <xrefs types="" /> option, so that the xrefs resolve after the split.
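As a rough sketch of that requirement, the process task can be run just before the split with the minimum xrefs option. The property names (${export}, ${processed}, ${split}) and the file name mydocument.psml are hypothetical placeholders; only the task, element and attribute names come from this page, and the process task may need further options in practice.

```xml
<!-- Hypothetical properties: ${export}, ${processed} and ${split} are placeholders -->
<ps:process src="${export}" dest="${processed}">
  <!-- the minimum xrefs option required before splitting a file with internal xrefs -->
  <xrefs types="" />
</ps:process>

<!-- mydocument.psml is a hypothetical file name -->
<ps:split src="${processed}/mydocument.psml"
          dest="${split}"
          config="psml-split-config.xml" />
```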
Definition
Minimal definition:
<ps:split src="[source]" dest="[destination]" config="[config]" />
Full definition:
<ps:split src="[source]" dest="[destination]" config="[config]" mediafolder="[folder name]" working="[working]" />
Attributes
Attribute | Description | Required | Default |
---|---|---|---|
src | The path of the file to split. | Yes | |
dest | The destination folder or the main file path (if a folder, the src filename is used for the main file). | Yes | |
config | The path of the split config file. | Yes | |
mediafolder | The name of the folder under src that contains images (all images must be in the same folder). | No | images |
working | The path of the working folder for the split. | No | |
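To illustrate the two forms of dest described in the table, here is a sketch using hypothetical paths (build/processed, build/split, manual.psml, guide.psml). Only the attribute names are taken from the definition above.

```xml
<!-- dest is a folder: per the table, the src filename (manual.psml)
     is used for the main file -->
<ps:split src="build/processed/manual.psml"
          dest="build/split"
          config="psml-split-config.xml" />

<!-- dest is a main file path, with the optional attributes set explicitly;
     mediafolder="images" is the documented default -->
<ps:split src="build/processed/manual.psml"
          dest="build/split/guide.psml"
          config="psml-split-config.xml"
          mediafolder="images"
          working="build/temp-split" />
```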
Examples
Split when importing a DocX document as PSML
The following example is designed to run inside PageSeeder, so it uses ps- properties.
The word-import-config.xml must have:
- No <main>, <document> or <section> elements under <split> so that it produces a single PSML file. These elements are now obsolete.
- A <references psmlelement="link" /> element under <styles><default> so that references are preserved as links to anchors (because fragments are not split yet).
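The two requirements above might look like this in the import config. This is only a sketch: the <config> root element and the position of <split> relative to <styles> are assumptions not confirmed by this page, and a real word-import-config.xml will contain other settings that are omitted here.

```xml
<!-- Assumption: <config> as the root element; other settings omitted -->
<config>
  <styles>
    <default>
      <!-- keep references as links to anchors, since fragments are not split yet -->
      <references psmlelement="link" />
    </default>
  </styles>
  <split>
    <!-- no <main>, <document> or <section> elements here,
         so the import produces a single PSML file -->
  </split>
</config>
```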
The splitting is then done by the split task, which is much faster and more flexible than the import-docx task. The attribute working="${temp-split}" is not required, but it can be useful when debugging the psml-split-config.xml because it lets you see the PSML at different stages of the split.
<property name="download" value="${ps-working}/download" />
<property name="output" value="${ps-working}/psml" />
<property name="split" value="${ps-working}/split" />
<property name="temp" value="${ps-working}/temp" />
<property name="temp-split" value="${ps-working}/temp-split" />
...
<!-- replace spaces with underscores because import-docx does -->
<script language="javascript">
  project.setProperty("filename", project.getProperty("ps-uploadFilenameNoExt")
      .replaceAll(" ", "_").toLowerCase());
</script>
<psd:import-docx src="${download}/${ps-uploadPath}"
                 dest="${output}"
                 working="${temp}"
                 config="word-import-config.xml" />
<ps:split src="${output}/${filename}.psml"
          dest="${split}"
          working="${temp-split}"
          config="psml-split-config.xml" />
Re-split an existing publication in PageSeeder
The following example is designed to run inside PageSeeder, so it uses ps- properties. The process task collects all the exported publication files into a single PSML file. The split task requires all images to be in a single folder, so the line <images src="filename" location="${process}/images" /> is used to gather them while preserving their filenames (-[n] is added if there is a name clash).
<property name="working" value="${ps.config.default.working}/download" />
<property name="download" value="${working}/download" />
<property name="process" value="${working}/process" />
<property name="split" value="${working}/split" />
<property name="temp-split" value="${working}/temp-split" />
...
<ps:process src="${download}" dest="${process}" processed="false"
            convertmarkdown="false" convertasciimath="false">
  <xrefs types="embed,transclude">
    <include name="${ps.config.default.uri.filename.no.ext}.psml"/>
  </xrefs>
  <!-- uncomment this to preserve publication heading levels
  <publication config="publication-config.xml"
               rootfile="${ps.config.default.uri.filename.no.ext}.psml"
               headingleveladjust="content" />
  -->
  <images src="filename" location="${process}/images" />
</ps:process>
<ps:split src="${process}/${ps.config.default.uri.filename.no.ext}.psml"
          dest="${split}"
          working="${temp-split}"
          config="psml-split-config.xml" />
To preserve transcludes, use <xrefs types="embed"> in the process task. This preserves the <blockxref type="transclude" ... > elements instead of replacing them with their content. However, transcludes directly under <xref-fragment> must be copied inside a <fragment> element to preserve them. To do this, add a <posttransform> to the process task with XSLT like:
<xsl:template match="blockxref[@type='transclude' and parent::xref-fragment]">
  <xsl:copy>
    <fragment id="preserve">
      <xsl:copy>
        <xsl:copy-of select="@*" />
        <xsl:value-of select="@urititle" />
      </xsl:copy>
    </fragment>
  </xsl:copy>
</xsl:template>
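On its own, the template above is not a complete stylesheet. Here is a minimal sketch of a full one, assuming that the post-transform receives the whole processed PSML document and that all other content should pass through unchanged. Neither assumption is confirmed by this page, and the identity template and the version="2.0" declaration are additions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: copy everything else through unchanged (identity template) -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- From the example above: wrap a copy of each transclude that sits directly
       under an xref-fragment in a fragment so it survives the split -->
  <xsl:template match="blockxref[@type='transclude' and parent::xref-fragment]">
    <xsl:copy>
      <fragment id="preserve">
        <xsl:copy>
          <xsl:copy-of select="@*" />
          <xsl:value-of select="@urititle" />
        </xsl:copy>
      </fragment>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The more specific template takes precedence over the identity template because a match pattern with a predicate has a higher default priority than node().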