Task process

This task produces Universal Processed Format from Universal Portable Format. It can do the following, in order:

Create a PSML document from the manifest.xml.
Pre-process with a custom transformation.
Process cross-references.
Process paragraph and list numbering and table of contents.
Process image references.
Process metadata in or out.
Post-process with a custom transformation.

When processing documents that have cross-references (xrefs) with a @type of embed or transclude, the same fragment can occur multiple times in a single document. This is a legitimate use of these xref types. However, once a fragment that previously existed in only one place occurs in more than one, its address becomes ambiguous. Where such a fragment is the destination of other xrefs, the ambiguity prevents those xrefs from resolving.

To address this, the process examines sub-hierarchies for a location that contains both the xref and its target. If a sub-hierarchy can be processed without ambiguity on its own, its internal xrefs won’t change when it becomes a component of a larger hierarchy.

Also, if the same content has been both transcluded and embedded, the embedded content is used as the xref target in preference to the transcluded content. If the same content is embedded multiple times within the sub-hierarchy, then the first occurrence is used.
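
To surface ambiguous targets during a build instead of leaving xrefs silently unresolved, the `<error>` element described later on this page can be combined with `<xrefs>`. The following is a minimal sketch only; the root document name `spec.psml` and the folder paths are assumptions borrowed from the examples at the end of this page.

```xml
<ps:process src="c:\working\export"
            dest="c:\working\process">

  <!-- resolve embedded and transcluded content starting from spec.psml (assumed name) -->
  <xrefs types="embed,transclude">
    <include name="spec.psml" />
  </xrefs>

  <!-- raise an error when an xref target occurs in more than one location -->
  <error xrefambiguous="true" />

</ps:process>
```

Because `failonerror` defaults to `true`, an ambiguous target reported this way would be expected to stop the build.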

Definition

Minimal definition:

<ps:process src="[source]"
            dest="[destination]"/>

Full definition:

<ps:process src="[source]"
            dest="[destination]"
            embedlinkmetadata="[true|false]"
            stripmetadata="[true|false]"
            preservesrc="[true|false]"
            failonerror="[true|false]"
            placeholders="[true|false]"
            processed="[true|false]"
            convertmarkdown="[true|false]"
            convertasciimath="[true|false]"
            converttex="[true|false]"
            config="[config name]">

  <manifestdoc filename="[filename]">
    <include name="[name]" />
    <exclude name="[name]" />
  </manifestdoc>

  <pretransform xslt="[pre XSLT path]">
    <include name="[name]" />
    <exclude name="[name]" />
    <param name="[x]"
           expression="[y]" />
    ...
  </pretransform> 

  <xrefs types="[xref types]"
         xreffragment="[include,exclude,only]">
    <include name="[name]" />
    <exclude name="[name]" />
  </xrefs>

  <publication config="[publication config path]"
               rootfile="[path relative to src]"
               generatetoc="[true|false]"
               headingleveladjust="[numbering|content]" />

  <images src="[uriid|uriidfolders|
          permalink|filename|filenameencode|location]"
          location="[folder path]"
          embedmetadata="[true|false]">
    <include name="[name]" />
    <exclude name="[name]" />
  </images>

  <strip manifest="[true|false]"
         documentinfo="[all,docid,title,description,
         labels,publication,versions]"
         fragmentinfo="[all,labels]"
         xrefs="[all,docid,uriid,notfound,unresolved,reversexrefs]"
         images="[docid,uriid]" />
 
  <posttransform xslt="[post XSLT path]">
    <include name="[name]" />
    <exclude name="[name]" />
    <param name="[x]"
           expression="[y]" />
    ...
  </posttransform>

  <error imagenotfound="[true|false]"
         xrefambiguous="[true|false]"
         xrefnotfound="[true|false]" />

  <warning imagenotfound="[true|false]"
           xrefambiguous="[true|false]"
           xrefnotfound="[true|false]" />

</ps:process>

Attributes

Attribute	Description	Required	Default
src	The source folder on the file system of the universal portable format input. For example: `c:\working\portable`. It must not be a subfolder of the exported universal portable format files, as there may be references outside the subfolder.	Yes
dest	The destination folder on the file system for the universal processed format output. For example: `c:\working\process`	Yes
embedlinkmetadata	If `true`, embed the `<document level="metadata">...` for the target URL in the `<link>` element. URLs must have been exported using `xrefdepth=1` or `allurls=true`. Requires PageSeeder v5.99 or higher.	No	`false`
stripmetadata	If `true`, strip `manifest.xml` and all `<documentinfo>` and `<fragmentinfo>` elements.	No	`false`
generatetoc	If `true`, generate table of contents for every `<toc/>` element. This attribute is obsolete as of PageSeeder 5.9600. Use `<publication>` instead.	No	`false`
preservesrc	If `true`, keep the original source files, if `false`, remove them	No	`false`
failonerror	If `true`, stop the build process on error	No	`true`
placeholders	If `true`, then resolve `<placeholder>` elements.	No	`true`
processed	If `true`, the value of `<document>`/`@level` is changed to `processed`, plus `blockxref/@href`, `xref/@href` and `image/@src` are URL decoded.	No	`true`
convertmarkdown	If `true`, convert the syntax of the content where `<property datatype="markdown">` from markdown to valid PageSeeder markup language (XML)	No	`true`
convertasciimath	If `true`, convert `<inline label="asciimath">` or `<media-fragment mediatype="text/asciimath">` to MathML, modifying it to `@mediatype="application/mathml+xml"` and replacing the `<inline>` element with `<xref frag="media" type="math"><media-fragment id="media" mediatype="application/mathml+xml">`.	No	`true`
converttex	If `true`, convert `<inline label="tex">`, `<media-fragment mediatype="application/x-tex">` or `<xref type="math" mediatype="application/x-tex">` to MathML, adding or modifying to `<media-fragment mediatype="application/mathml+xml">` and replacing the `<inline>` element with `<xref frag="media" type="math"><media-fragment id="media" mediatype="application/mathml+xml">`. Requires PageSeeder v6 or higher.	No	`true`
config	Universal PS config name	No	`default`

Elements

The @includes or @excludes attributes are obsolete and replaced by the nested <include name="x" /> or <exclude name="x" /> elements described in the following sections to avoid problems with commas in filenames.
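
As an illustration of why nested elements are safer than a comma-separated attribute, the sketch below selects a document whose filename contains a comma. The filename `reports/sales, 2024.psml` is purely hypothetical.

```xml
<xrefs types="embed">
  <!-- each pattern is its own element, so a comma in a filename is not read as a separator -->
  <include name="reports/sales, 2024.psml" />
  <exclude name="_local/**" />
</xrefs>
```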

Element <include>

A pattern matching documents/folders to include. If not present, then all documents/folders are included. It cannot be used directly under the <ps:process> element.

Attribute	Description	Required
name	The pattern, in a format similar to the file selection in other Ant tasks. Examples: `*.psml` `archive` `folder1/*.psml` `**/*.psml` `????/*`	Yes

Element <exclude>

A pattern matching documents/folders to exclude. If not present, then no documents/folders are excluded. It cannot be used directly under the <ps:process> element.

Attribute	Description	Required
name	The pattern, in a format similar to the file selection in other Ant tasks. Examples: `_local/**` `_external/**` `META-INF/**`	Yes

Element <manifestdoc>

Creates a PSML document of type manifest containing a <blockxref> of type embed for each PSML file in the process src folder, in alphabetical order. The generated file is included in PSML files for subsequent processing.

This element might contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in the previous sections.

Attribute	Description	Required
filename	Filename for generated PSML document, for example: `manifest`	Yes

Element <pretransform>

Provides a mechanism to pre-process PSML files with XSLT. The generated XML must be valid PSML.

This element might contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in the previous sections.

This element might also contain nested <param name="x" expression="y" /> elements for passing XSLT parameters.

Attribute	Description	Required
xslt	Path to XSLT file to apply to all PSML files before all other processing (must produce valid PSML)	Yes

Files under the META-INF folder are also transformed by default. To not transform these files add <exclude name="META-INF/**" /> inside this element.
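
The options above can be combined in one element. This is a sketch under stated assumptions: the stylesheet path `c:\working\pre.xsl` and the parameter name `edition` are hypothetical and would need to match an `<xsl:param name="edition"/>` declared in your own XSLT.

```xml
<ps:process src="c:\working\export"
            dest="c:\working\process">

  <pretransform xslt="c:\working\pre.xsl">
    <!-- leave the files under META-INF untransformed -->
    <exclude name="META-INF/**" />
    <!-- hypothetical XSLT parameter passed to the stylesheet -->
    <param name="edition"
           expression="print" />
  </pretransform>

</ps:process>
```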

Element <xrefs>

Processes the cross-references in PSML files as follows:

Replaces the <xref> or <blockxref> element’s content with the PSML of the target document or fragment, provided the element has the following attributes: @frag, @href, @type, @uriid.
Prefixes fragment @id with [URI ID][_n]- so that all fragment IDs are unique.
For any xref of type none that points to content now included in the document (by the first step above), the @href is changed to the @id of the target document or fragment (for example #456, #456-3, #456_2-3).

This element might contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in previous sections.

By default, processing URL decodes the PSML attributes blockxref/@href, xref/@href and image/@src. Set processed="false" on <ps:process> to keep them URL encoded, for example if uploading the result to PageSeeder.

Attribute	Description	Required	Default
types	Comma-separated list of xref types to transclude in the included files. For example: `transclude,embed,alternate`	Yes
levels	Whether to modify heading levels in target documents based on the xref `@level` attribute. This attribute is obsolete as of PageSeeder 5.9600. If you need it to be `false` set `headingleveladjust="numbering"` instead. See `<publication>` below.	No	`true`
xreffragment	Defines how to handle xrefs in an `<xref-fragment>` element. Possible values are: `include` – process xrefs in an xref-fragment. `exclude` – do not process xrefs in an xref-fragment, and `only` – process xrefs only in an xref-fragment.	No	`include`
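
A sketch of how these attributes might be combined when the output is going to be uploaded back to PageSeeder. The document name `spec.psml` is an assumption, and whether xrefs inside `<xref-fragment>` elements should be skipped depends on your content.

```xml
<ps:process src="c:\working\export"
            dest="c:\working\process"
            processed="false">

  <!-- resolve embed and transclude xrefs, but do not process xrefs inside <xref-fragment> -->
  <xrefs types="embed,transclude"
         xreffragment="exclude">
    <include name="spec.psml" />
  </xrefs>

</ps:process>
```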

Element <number>

This element is obsolete as of PageSeeder 5.9600. Use <publication> instead.

Generates auto-numbering in PSML files.

This element might contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in previous sections.

Attribute	Description	Required
numberconfig	Path to numbering config file for numbering included files	Yes

Element <publication>

Generates table of contents and auto-numbering in PSML files. Available in PageSeeder v5.96 or higher.

Attribute	Description	Required	Default
config	Path to publication config file	Yes
rootfile	Path to root file of publication relative to `src`	Yes
generatetoc	If `true`, generate table of contents for the top `<toc/>` element. For details see Universal Processed Format	No	`false`
headingleveladjust	Overrides the `levels/@heading-adjust` attribute in the publication config file. Allowed values are `numbering` or `content`	No
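
For illustration, a `<publication>` element might be used together with `<xrefs>` as sketched below. The config path `c:\working\publication-config.xml` and the root file `spec.psml` are hypothetical; the contents of the publication config file are not shown here.

```xml
<ps:process src="c:\working\export"
            dest="c:\working\process">

  <xrefs types="embed,transclude">
    <include name="spec.psml" />
  </xrefs>

  <!-- rootfile is relative to src; generatetoc applies to the top <toc/> element -->
  <publication config="c:\working\publication-config.xml"
               rootfile="spec.psml"
               generatetoc="true"
               headingleveladjust="content" />

</ps:process>
```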

Element <images>

This element provides options to control how images are managed and rewrites the path accordingly.

This element might contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in previous sections.

Attribute	Description	Required
src	Format of `@src` attribute on `<image>` elements. Allowed values are: `uriid` – use the `[uriid].[ext]` format. `uriidfolders` – use `[uriid billions]/[uriid millions]/[uriid thousands]/[uriid].[ext]` with leading zeros on folders. For example, uriid 12345 would be `000/000/012/12345.png`. `permalink` – use the `[ps.site.prefix]/uri/[uriid].[ext]` format. `filename` – use `[filename][-N].[ext]` format, where `-N` is added for uniqueness if required (pso-psml `v0.5.0` or higher). `filenameencode` – same as `filename` but URL encodes characters, which is useful for DOCX, which doesn't allow spaces (pso-psml `v0.7.0-beta-5` or higher). `location` – the default; the `@src` attribute is not changed.	No
location	Move all `<image>` files to this folder path. If `@src` is `uriid`, `uriidfolders` or `permalink` rename them to `[uriid].[ext]`, if `filename` put them in a single folder, otherwise use their path relative to src.	Yes, if `src=uriid`
embedmetadata	If `true`, embed the `<document level="metadata">...` in the `<image>` element and apply src/location image processing to the `@href` of an `<xref>` where metadata has `type="alternate"` and `mediatype="image/*"`	No
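
A minimal sketch of the `uriidfolders` option, assuming a hypothetical target folder `c:\working\images`. With this setting an image with uriid 12345 would be referenced using the `000/000/012/12345.png` form given in the table above.

```xml
<ps:process src="c:\working\export"
            dest="c:\working\process">

  <!-- rewrite image/@src using the folder-per-thousand layout and move the files -->
  <images src="uriidfolders"
          location="c:\working\images" />

</ps:process>
```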

Element <strip>

Provides finer control over the elements to strip. Must not be used in conjunction with the @stripmetadata attribute.

Attribute	Description	Required	Default
documentinfo	Comma-separated list of items to strip. Allowed values: `all,docid,title,description,labels,publication,versions`. The value `all` strips `<documentinfo>` elements. The value `publication` requires PageSeeder v5.9900 or higher and `versions` requires v6.1 or higher.	No
fragmentinfo	Comma-separated list of items to strip. Allowed values: `all,labels`. The value `all` strips `<fragmentinfo>` elements.	No
manifest	If `true`, deletes the `META-INF/manifest.xml` file	No	`false`
xrefs	Comma-separated list of items to strip. Allowed values: `all,docid,uriid,notfound,unresolved,reversexrefs`. The values `all`, `notfound` and `unresolved` strip the `<xref>` markup but leave the content. The value `reversexrefs` requires PageSeeder v5.9708 or higher.	No
images	Comma-separated list of attributes to strip from `<image>`. Allowed values: `docid,uriid`. Requires PageSeeder v5.9900 or higher; `docid` requires v6.2000 or higher.	No

Element <posttransform>

Provides a mechanism to post-process PSML source files with XSLT. The generated XML must be valid PSML.

This element might contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in previous sections.

This element might also contain nested <param name="x" expression="y" /> elements for passing XSLT parameters.

Attribute	Description	Required
xslt	Path to XSLT file to apply to all PSML files after all other processing (must produce valid PSML)	Yes

Files under the META-INF folder are also transformed by default. To not transform these files add <exclude name="META-INF/**" /> inside this element.

Element <error>

The element is used to customize error handling.

Attribute	Description	Required	Default
imagenotfound	If `true`, raise an error if a referenced image is not in the export set	No	`false`
xrefambiguous	If `true`, raise an error if an xref target is ambiguous (see note at the top of this document)	No	`false`
xrefnotfound	If `true`, raise an error if an xref target file is not in the export set	No	`false`

Element <warning>

The element is used to customize warning handling. Requires PageSeeder v6.2 or higher.

Attribute	Description	Required	Default
imagenotfound	If `true`, log a warning if a referenced image is not in the export set	No	`true`
xrefambiguous	If `true`, log a warning if an xref target is ambiguous (see note at the top of this document)	No	`true`
xrefnotfound	If `true`, log a warning if an xref target file is not in the export set	No	`true`
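
The `<error>` and `<warning>` elements can appear in the same task. The sketch below uses only the attributes listed above; how the two elements interact when both are set for the same condition is not described on this page, so treat the combination as an assumption to verify.

```xml
<ps:process src="c:\working\export"
            dest="c:\working\process">

  <!-- raise an error if an xref target file is not in the export set -->
  <error xrefnotfound="true" />

  <!-- do not log warnings for missing images; other warnings keep their default of true -->
  <warning imagenotfound="false" />

</ps:process>
```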

Environment

This task uses the following PS config environment properties:

site.prefix – site prefix for the <images src="permalink"> option.

default – /ps

Errors

Possible errors are:

Required attribute or environment property missing.
Attribute or property invalid.
Required metadata missing.
Pre/post transform XSLT error.
Pre/post transform validation error.
Internal link pointing to URI x fragment y is ambiguous because this content appears in multiple locations.
Reference loop detected when resolving xref from x to y.
Xref target not found (see <error> element).
Image not found (see <error> element).

Examples

Consolidate documents

Process the xrefs so that the linked documents are concatenated into a single file. Extract all images into a single folder using URI ID notation.

<ps:process src="c:\working\export"
            dest="c:\working\process"
            stripmetadata="true">

  <xrefs types="embed,transclude">   
    <include name="spec.psml" />
  </xrefs>

  <images src="uriid"
          location="c:\working\images" />

</ps:process>

Remove Doc IDs and URI IDs

<ps:process src="c:\working\export"
            dest="c:\working\process">

  <strip documentinfo="docid"
         xrefs="docid,uriid" />

  <error imagenotfound="true"
         xrefnotfound="true" />

</ps:process>

Concatenate all downloaded PSML documents

<ps:process src="c:\working\export"
            dest="c:\working\process"
            stripmetadata="true">

  <manifestdoc filename="manifest">
    <exclude name="META-INF/**" />
  </manifestdoc>

  <xrefs types="embed">
    <include name="manifest.psml" />
  </xrefs>

</ps:process>

Copy documents to another group

To copy PSML documents to another PageSeeder group they must be exported, processed and then uploaded. To process PSML so that the copies stay independent from one another on the same server (i.e. they don’t reference each other) use the following options:

<ps:process src="${download}"
            dest="${process}"
            placeholders="false"
            processed="false"
            convertmarkdown="false"
            convertasciimath="false"
            converttex="false">
  <strip documentinfo="docid,publication"
         xrefs="docid,uriid"
         images="docid,uriid"
         manifest="true" />
</ps:process>

PageSeeder doesn't allow the same publicationid on different sets of documents in the same PageSeeder server domain. However, if you need publications to be preserved in the other group you could remove the documentinfo="publication" and add a <posttransform> with an XSLT that adds a suffix to the publication/@id when copying.
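
A post-transform stylesheet along these lines could add the suffix. This is a sketch, not a tested recipe: the parameter name `suffix` and its default value `-copy` are hypothetical, and it assumes the publication ID only needs changing where it appears as `publication/@id`.

```xml
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- hypothetical parameter; could be overridden with <param name="suffix" expression="..." /> -->
  <xsl:param name="suffix" select="'-copy'"/>

  <!-- append the suffix to the publication ID -->
  <xsl:template match="publication/@id">
    <xsl:attribute name="id" select="concat(., $suffix)"/>
  </xsl:template>

  <!-- copy all other attributes and nodes unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

To use it, the `<strip>` element would keep `documentinfo="docid"` without `publication`, and a `<posttransform xslt="[post XSLT path]">` element pointing at this stylesheet would be added to the task.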

For a more comprehensive example that allows creation of variations see the sample bundle Copy publication which can be viewed under the project admin Template files > Create button.

Use alternate images

This makes it possible to use one set of images for editing and web delivery while using higher resolution images for publishing to paper. The idea is that for an image whose metadata contains an alternate xref:

<image src="">..<metadata>..<xref type="alternate" href=""></image>

replace

image/@src

with

xref/@href

Process task

<ps:process src="c:\working\export"
            dest="c:\working\process"
            generatetoc="true">

  <xrefs types="embed,transclude">
    <include name="spec.psml" />
  </xrefs>

  <images embedmetadata="true"/>

  <posttransform xslt="c:\working\alternate-images.xsl">
    <include name="spec.psml" />
  </posttransform>

</ps:process>

alternate-images.xsl

<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- change image src to alternate image -->
  <xsl:template match="image[.//xref/@type='alternate']">
    <image src="{.//xref[@type='alternate']/@href}">
      <xsl:copy-of select="@*[not(name()='src')]"/>
    </image>
  </xsl:template>

  <!-- copy all other elements unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Image ‘shell’ documents

A shell document contains both a lo-res and hi-res image. For editing, the link typically refers to the lo-res with the hi-res used to publish to paper or PDF. Although the approach of alternate images (see previous example) is recommended generally, there are legitimate use cases for the shell approach.

Essentially the idea is to replace the references to:

<fragment id="lores"><image .../>

with

<fragment id="hires"><image .../>

Process task

<ps:process
    src="c:\working\export"
    dest="c:\working\process"
    stripmetadata="true">
  <xrefs
      types="embed,transclude">
    <include name="spec.psml" />
  </xrefs>
  <number
      numberconfig="c:\working\defaultNumberingConfig.xml" >
    <include name="spec.psml" />
  </number>
  <pretransform xslt="c:\working\hires-images.xsl" />
</ps:process>

hires-images.xsl

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- change blockxref to point to hires image -->
  <xsl:template match="blockxref[@frag='lores']">
    <blockxref frag="hires">
      <xsl:copy-of select="@*[not(name()='frag')]"/>
      <xsl:apply-templates />
    </blockxref>
  </xsl:template>

  <!-- copy all other elements unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Created on 25 October 2012, last edited on 24 June 2025 at 14:57
