Task process
Used to produce Universal Processed Format from Universal Portable Format, this task can be used to do the following, in order:
- Create a PSML document from the manifest.xml.
- Pre-process a custom transformation.
- Process cross-references.
- Process paragraph and list numbering and table of contents.
- Process image references.
- Process metadata in or out.
- Post-process a custom transformation.
When processing documents that have cross-references (xrefs) with a @type of embed or transclude, the same fragment can occur multiple times in a single document. This is a legitimate use of these xref types. However, a fragment that previously existed in only one place but now occurs in more than one has an ambiguous address. Where such fragments are the destination of other xrefs, the ambiguity prevents those xrefs from resolving.
To address this, the process examines sub-hierarchies for a location that contains both the xref and its target. If a sub-hierarchy can be processed without ambiguity on its own, its internal xrefs won’t change when it becomes a component of a larger hierarchy.
Also, if the same content has been both transcluded and embedded, the embedded content is used as the xref target in preference to the transcluded content. If the same content is embedded multiple times within the sub-hierarchy, the first occurrence is used.
Definition
Minimal definition:
<ps:process src="[source]" dest="[destination]"/>
Full definition:
<ps:process src="[source]"
            dest="[destination]"
            embedlinkmetadata="[true|false]"
            stripmetadata="[true|false]"
            preservesrc="[true|false]"
            failonerror="[true|false]"
            placeholders="[true|false]"
            processed="[true|false]"
            convertmarkdown="[true|false]"
            convertasciimath="[true|false]"
            converttex="[true|false]"
            config="[config name]">
  <manifestdoc filename="[filename]">
    <include name="[name]" />
    <exclude name="[name]" />
  </manifestdoc>
  <pretransform xslt="[pre XSLT path]">
    <include name="[name]" />
    <exclude name="[name]" />
    <param name="[x]" expression="[y]" />
    ...
  </pretransform>
  <xrefs types="[xref types]" xreffragment="[include,exclude,only]">
    <include name="[name]" />
    <exclude name="[name]" />
  </xrefs>
  <publication config="[publication config path]"
               rootfile="[path relative to src]"
               generatetoc="[true|false]"
               headingleveladjust="[numbering|content]" />
  <images src="[uriid|uriidfolders|permalink|filename|filenameencode|location]"
          location="[folder path]"
          embedmetadata="[true|false]">
    <include name="[name]" />
    <exclude name="[name]" />
  </images>
  <strip manifest="[true|false]"
         documentinfo="[all,docid,title,description,labels,publication,versions]"
         fragmentinfo="[all,labels]"
         xrefs="[all,docid,uriid,notfound,unresolved,reversexrefs]"
         images="uriid" />
  <posttransform xslt="[post XSLT path]">
    <include name="[name]" />
    <exclude name="[name]" />
    <param name="[x]" expression="[y]" />
    ...
  </posttransform>
  <error xrefnotfound="true" xrefambiguous="true" imagenotfound="true" />
</ps:process>
Attributes
Attribute | Description | Required | Default |
---|---|---|---|
src |
The source folder on the file system of the universal portable format input. For example:
It must not be a subfolder of the exported universal portable format files as there may be references outside the subfolder. | Yes | |
dest |
The destination folder on the file system for the universal processed format output. For example:
| Yes | |
embedlinkmetadata | If true, embed the <document level="metadata">... for the target URL in the <link> element. URLs must have been exported using xrefdepth=1 or allurls=true. Requires PageSeeder v5.99 or higher. | No | false |
stripmetadata | If true, strip manifest.xml and all <documentinfo> and <fragmentinfo> elements | No | false |
generatetoc | If true, generate a table of contents. This attribute is obsolete as of PageSeeder 5.9600. Use the generatetoc attribute of <publication> instead. | No | false |
preservesrc | If true, keep the original source files; if false, remove them | No | false |
failonerror | If true, stop the build process on error | No | true |
placeholders | If true, resolve <placeholder> elements | No | true |
processed | If true, the value of <document>/@level is changed to processed, and blockxref/@href, xref/@href and image/@src are URL decoded | No | true |
convertmarkdown | If true, convert the content of <property datatype="markdown"> from Markdown to valid PageSeeder markup language (XML) | No | true |
convertasciimath | If true, convert <inline label="asciimath"> or <media-fragment mediatype="text/asciimath"> to MathML, modifying it to @mediatype="application/mathml+xml" and replacing the <inline> element with <xref frag="media" type="math"><media-fragment id="media" mediatype="application/mathml+xml"> | No | true |
converttex | If true, convert <inline label="tex">, <media-fragment mediatype="application/x-tex"> or <xref type="math" mediatype="application/x-tex"> to MathML, adding or modifying to <media-fragment mediatype="application/mathml+xml"> and replacing the <inline> element with <xref frag="media" type="math"><media-fragment id="media" mediatype="application/mathml+xml">. Requires PageSeeder v6 or higher. | No | true |
config | Universal PS config name | No | default |
Elements
The @includes and @excludes attributes are obsolete and replaced by the nested <include name="x" /> or <exclude name="x" /> elements described in the following sections, to avoid problems with commas in filenames.
Element <include>
A pattern matching documents/folders to include. If not present, all documents/folders are included. It cannot be used directly under the <ps:process> element.
Attribute | Description | Required |
---|---|---|
name |
The pattern format is similar to the file selection in other Ant tasks. Examples:
????/* | Yes |
Element <exclude>
A pattern matching documents/folders to exclude. If not present, no documents/folders are excluded. It cannot be used directly under the <ps:process> element.
Attribute | Description | Required |
---|---|---|
name |
The pattern format is similar to the file selection in other Ant tasks. Examples:
| Yes |
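As an illustration of nested patterns, the fragment below restricts xref processing to one folder while skipping a subfolder. It is a sketch only: the folder names manuals and drafts are hypothetical, and the ** wildcard is assumed to behave as in standard Ant file selection, which this page says the pattern format resembles.

```xml
<!-- Sketch: folder names are hypothetical -->
<xrefs types="embed">
  <!-- process everything under manuals/ ... -->
  <include name="manuals/**" />
  <!-- ... except anything under manuals/drafts/ -->
  <exclude name="manuals/drafts/**" />
</xrefs>
```

Because each pattern is its own element, a filename containing a comma does not need any escaping.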
Element <manifestdoc>
Creates a PSML document of type manifest containing a <blockxref> of type embed for each PSML file in the process src folder, in alphabetical order. The generated file is included in the PSML files for subsequent processing.
This element can contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in the previous sections.
Attribute | Description | Required |
---|---|---|
filename |
Filename for generated PSML document, for example:
| Yes |
Element <pretransform>
Provides a mechanism to pre-process PSML files with XSLT. The generated XML must be valid PSML.
This element can contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in the previous sections.
This element can also contain nested <param name="x" expression="y" /> elements for passing XSLT parameters.
Attribute | Description | Required |
---|---|---|
xslt | Path to XSLT file to apply to all PSML files before all other processing (must produce valid PSML) | Yes |
Files under the META-INF folder are also transformed by default. To skip these files, add <exclude name="META-INF/**" /> inside this element.
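A pre-transform that skips META-INF and passes one parameter might look like the sketch below. The property ${xslt.dir}, the stylesheet name cleanup.xsl, and the parameter edition with its value print are all hypothetical; the stylesheet is assumed to declare a matching global <xsl:param name="edition"/>.

```xml
<!-- Sketch: path, file name and parameter are hypothetical -->
<pretransform xslt="${xslt.dir}/cleanup.xsl">
  <!-- leave the manifest and other META-INF files untouched -->
  <exclude name="META-INF/**" />
  <!-- assumed to be received by <xsl:param name="edition"/> in cleanup.xsl -->
  <param name="edition" expression="print" />
</pretransform>
```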
Element <xrefs>
Processes the cross-references in PSML files as follows:
- Replaces the <xref> or <blockxref> element’s content with the PSML of the target document or fragment, provided the element has the following attributes: @frag, @href, @type, @uriid.
- Prefixes fragment @id with [URI ID][_n]- so that all fragment IDs are unique.
- For any xref of type none that points to content now included in the document (by item 1), the @href is changed to the @id of the target document or fragment (for example #456, #456-3, #456_2-3).
This element can contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in previous sections.
By default, processing URL decodes the PSML attributes blockxref/@href, xref/@href and image/@src. Set processed="false" on <ps:process> to keep them URL encoded, for example if uploading the result to PageSeeder.
Attribute | Description | Required | Default |
---|---|---|---|
types |
Comma-separated list of xref types to transclude in the included files. For example:
| Yes | |
levels |
Whether to modify heading levels in target documents based on the xref This attribute is obsolete as of PageSeeder 5.9600. If you need it to be | No | true |
xreffragment |
Defines how to handle xrefs in an
| No | include |
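The ID prefixing and href rewriting described above can be pictured with the hand-written sketch below. It is illustrative only and not actual task output: the file name chapter.psml, the link text and the surrounding markup are hypothetical, while the URI ID 456, fragment 3 and the resulting #456-3 follow the examples given in the list.

```xml
<!-- Before processing (hypothetical): a type="none" xref to fragment 3 of the document with URI ID 456 -->
<xref frag="3" href="chapter.psml" type="none" uriid="456">See chapter</xref>

<!-- After processing, assuming that document was embedded or transcluded into the same file: -->
<!-- the embedded fragment ID carries the [URI ID]- prefix ... -->
<fragment id="456-3">...</fragment>
<!-- ... and the xref now points at it locally -->
<xref frag="3" href="#456-3" type="none" uriid="456">See chapter</xref>
```

A second copy of the same content in the document would carry the [_n] part of the prefix, giving an ID such as 456_2-3.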
Element <number>
This element is obsolete as of PageSeeder 5.9600. Use <publication> instead.
Generates auto-numbering in PSML files.
This element can contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in previous sections.
Attribute | Description | Required |
---|---|---|
numberconfig | Path to numbering config file for numbering included files | Yes |
Element <publication>
Generates table of contents and auto-numbering in PSML files. Available in PageSeeder v5.96 or higher.
Attribute | Description | Required | Default |
---|---|---|---|
config | Path to publication config file | Yes | |
rootfile | Path to root file of publication relative to src | Yes | |
generatetoc | If true, generate table of contents for the top <toc/> element. For details see Universal Processed Format. | No | false |
headingleveladjust | Overrides the levels/@heading-adjust attribute in the publication config file. Allowed values are numbering or content. | No | |
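A minimal sketch of a publication step is shown below, assuming the root document is manual.psml at the top of the src folder. The properties ${export}, ${process} and ${config.dir} and both file names are hypothetical.

```xml
<!-- Sketch: properties and file names are hypothetical -->
<ps:process src="${export}" dest="${process}">
  <xrefs types="embed,transclude">
    <include name="manual.psml" />
  </xrefs>
  <!-- rootfile is relative to src; generatetoc fills the top <toc/> element -->
  <publication config="${config.dir}/publication-config.xml"
               rootfile="manual.psml"
               generatetoc="true"
               headingleveladjust="numbering" />
</ps:process>
```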
Element <images>
This element provides options to control how images are managed and rewrites the path accordingly.
This element can contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in previous sections.
Attribute | Description | Required |
---|---|---|
src | Format of the @src attribute on <image> elements. Allowed values are: uriid, uriidfolders, permalink, filename, filenameencode, location. | No |
location | Move all <image> files to this folder path. If @src is uriid, uriidfolders or permalink, rename them to [uriid].[ext]; if filename, put them in a single folder; otherwise use their path relative to src. | Yes, if src=uriid |
embedmetadata | If true, embed the <document level="metadata">... in the <image> element and apply src/location image processing to the @href of an <xref> where metadata has type="alternate" and mediatype="image/*" | No |
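For example, the fragment below would collect images into one folder by filename, following the location rule in the table above. The property ${images.dir} and the archive folder are hypothetical.

```xml
<!-- Sketch: property and folder names are hypothetical -->
<images src="filename" location="${images.dir}">
  <!-- leave images under archive/ where they are -->
  <exclude name="archive/**" />
</images>
```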
Element <strip>
Provides finer control over the elements to strip. Must not be used in conjunction with the @stripmetadata attribute.
Attribute | Description | Required | Default |
---|---|---|---|
documentinfo | Comma-separated list of items to strip. Allowed values: all, docid, title, description, labels, publication, versions. | No | none |
fragmentinfo | Comma-separated list of items to strip. Allowed values: all, labels. | No | none |
manifest | If true, deletes the META-INF/manifest.xml file | No | false |
xrefs | Comma-separated list of items to strip. Allowed values: all, docid, uriid, notfound, unresolved, reversexrefs. | No | none |
images | If uriid, strips @uriid from <image>. Requires PageSeeder v5.9900 or higher. | No | |
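A selective strip might be configured as in the sketch below, which combines values taken from the full definition earlier on this page. The particular combination is only an illustration, and the enclosing <ps:process> is assumed not to set stripmetadata.

```xml
<!-- Sketch: remove the manifest, document IDs, fragment labels, and xref docid/uriid values -->
<strip manifest="true"
       documentinfo="docid"
       fragmentinfo="labels"
       xrefs="docid,uriid" />
```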
Element <posttransform>
Provides a mechanism to post-process PSML source files with XSLT. The generated XML must be valid PSML.
This element can contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in previous sections.
This element can also contain nested <param name="x" expression="y" /> elements for passing XSLT parameters.
Attribute | Description | Required |
---|---|---|
xslt | Path to XSLT file to apply to all PSML files after all other processing (must produce valid PSML) | Yes |
Files under the META-INF folder are also transformed by default. To skip these files, add <exclude name="META-INF/**" /> inside this element.
Element <error>
The element is used to customize error handling.
Attribute | Description | Required | Default |
---|---|---|---|
xrefnotfound | If true, raise an error if an xref target file is not in the export set | No | false |
xrefambiguous | If true, raise an error if an xref target is ambiguous (see note at the top of this document) | No | false |
imagenotfound | If true, raise an error if a referenced image is not in the export set | No | false |
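Since all three checks default to false, a strict build has to enable them explicitly. The sketch below does so and relies on failonerror keeping its default of true so that a raised error stops the build; the properties ${export} and ${process} and the file root.psml are hypothetical.

```xml
<!-- Sketch: properties and file name are hypothetical -->
<ps:process src="${export}" dest="${process}" failonerror="true">
  <xrefs types="embed,transclude">
    <include name="root.psml" />
  </xrefs>
  <!-- raise an error on missing xref targets, ambiguous targets and missing images -->
  <error xrefnotfound="true" xrefambiguous="true" imagenotfound="true" />
</ps:process>
```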
Environment
This task uses the following PS config environment properties:
- site.prefix – site prefix for the <images src="permalink"> option. Default – /ps
Errors
Possible errors are:
- Required attribute or environment property missing.
- Attribute or property invalid.
- Required metadata missing.
- Pre/post transform XSLT error.
- Pre/post transform validation error.
- Internal link pointing to URI x fragment y is ambiguous because this content appears in multiple locations.
- Reference loop detected when resolving xref from x to y.
- Xref target not found (see <error> element).
- Image not found (see <error> element).
Examples
Consolidate documents
Process the xrefs so that the linked documents are concatenated into a single file. Extract all images into a single folder, named by URI ID.
<ps:process src="c:\working\export" dest="c:\working\process" stripmetadata="true">
  <xrefs types="embed,transclude">
    <include name="spec.psml" />
  </xrefs>
  <images src="uriid" location="c:\working\images" />
</ps:process>
Remove Doc IDs and URI IDs
<ps:process src="c:\working\export" dest="c:\working\process">
  <strip documentinfo="docid" xrefs="docid,uriid" />
  <error imagenotfound="true" xrefnotfound="true" />
</ps:process>
Concatenate all downloaded PSML documents
<ps:process src="c:\working\export" dest="c:\working\process" stripmetadata="true">
  <manifestdoc filename="manifest">
    <exclude name="META-INF/**" />
  </manifestdoc>
  <xrefs types="embed">
    <include name="manifest.psml" />
  </xrefs>
</ps:process>
Copy documents to another group
To copy PSML documents to another PageSeeder group they must be exported, processed and then uploaded. To process PSML so that the copies stay independent from one another on the same server (i.e. they don’t reference each other) use the following options:
<ps:process src="${download}"
            dest="${process}"
            placeholders="false"
            processed="false"
            convertmarkdown="false"
            convertasciimath="false"
            converttex="false">
  <strip documentinfo="docid,publication" xrefs="docid,uriid" images="uriid" manifest="true" />
</ps:process>
PageSeeder doesn't allow the same publication ID on different sets of documents in the same PageSeeder server domain. However, if you need publications to be preserved in the other group, you could remove publication from the documentinfo attribute and add a <posttransform> with an XSLT that adds a suffix to the publication/@id when copying.
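A stylesheet for that suffix could be sketched as follows. This is one possible approach rather than a supplied stylesheet: the parameter name suffix and its default value -copy are hypothetical, and the match pattern is taken from the publication/@id path mentioned above.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- hypothetical parameter; can be overridden with a nested <param> in <posttransform> -->
  <xsl:param name="suffix" select="'-copy'"/>

  <!-- append the suffix to every publication ID -->
  <xsl:template match="publication/@id">
    <xsl:attribute name="id" select="concat(., $suffix)"/>
  </xsl:template>

  <!-- copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

It would be wired in with something like <posttransform xslt="[post XSLT path]"><param name="suffix" expression="-groupb" /></posttransform>, where the value -groupb is again only an example.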
Use alternate images
This makes it possible to use one set of images for editing and web delivery while keeping higher-resolution images for publishing to paper. The idea is that, given:
<image src="">..<metadata>..<xref type="alternate" href=""></image>
image/@src is replaced with xref/@href.
Process task
<ps:process src="c:\working\export" dest="c:\working\process" generatetoc="true">
  <xrefs types="embed,transclude">
    <include name="spec.psml" />
  </xrefs>
  <images embedmetadata="true"/>
  <posttransform xslt="c:\working\alternate-images.xsl">
    <include name="spec.psml" />
  </posttransform>
</ps:process>
alternate-images.xsl
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- change image src to alternate image -->
  <xsl:template match="image[.//xref/@type='alternate']">
    <image src="{.//xref[@type='alternate']/@href}">
      <xsl:copy-of select="@*[not(name()='src')]"/>
    </image>
  </xsl:template>

  <!-- copy all other elements unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
Image ‘shell’ documents
A shell document contains both a lo-res and hi-res image. For editing, the link typically refers to the lo-res with the hi-res used to publish to paper or PDF. Although the approach of alternate images (see previous example) is recommended generally, there are legitimate use cases for the shell approach.
Essentially the idea is to replace the references to:
<fragment id="lores"><image .../>
with
<fragment id="hires"><image .../>
Process task
<ps:process src="c:\working\export" dest="c:\working\process" stripmetadata="true">
  <xrefs types="embed,transclude">
    <include name="spec.psml" />
  </xrefs>
  <number numberconfig="c:\working\defaultNumberingConfig.xml">
    <include name="spec.psml" />
  </number>
  <pretransform xslt="c:\working\hires-images.xsl" />
</ps:process>
hires-images.xsl
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- change blockxref to point to hires image -->
  <xsl:template match="blockxref[@frag='lores']">
    <blockxref frag="hires">
      <xsl:copy-of select="@*[not(name()='frag')]"/>
      <xsl:apply-templates />
    </blockxref>
  </xsl:template>

  <!-- copy all other elements unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>