 Publishing

Publishing PageSeeder data to print, the Web or both

Task process

This task produces Universal Processed Format from Universal Portable Format. It can do the following, in order:

  1. Create a PSML document from the manifest.xml.
  2. Pre-process a custom transformation.
  3. Process cross-references.
  4. Process paragraph and list numbering and table of contents.
  5. Process image references.
  6. Process metadata in or out.
  7. Post-process a custom transformation.

When processing documents that have cross-references (xrefs) with a @type of embed or transclude, the same fragment can occur multiple times in a single document. This is a legitimate use of these xref types. However, a fragment that previously existed in only one place but now occurs in several no longer has a unique address. Where such a fragment is the destination of other xrefs, this ambiguity prevents those xrefs from resolving.

To address this, the process examines sub-hierarchies for a location that contains both the xref and its target. If a sub-hierarchy can be processed without ambiguity on its own, its internal xrefs won’t change when it becomes a component of a larger hierarchy.

Also, if the same content has been both transcluded and embedded, the embedded content is used as the xref target in preference to the transcluded content. If the same content is embedded multiple times within the sub-hierarchy, the first occurrence is used.

Definition

Minimal definition:

<ps:process src="[source]"
            dest="[destination]"/>

 Full definition:

<ps:process src="[source]"
            dest="[destination]"
            embedlinkmetadata="[true|false]"
            stripmetadata="[true|false]"
            preservesrc="[true|false]"
            failonerror="[true|false]"
            placeholders="[true|false]"
            processed="[true|false]"
            convertmarkdown="[true|false]"
            convertasciimath="[true|false]"
            converttex="[true|false]"
            config="[config name]">

  <manifestdoc filename="[filename]">
    <include name="[name]" />
    <exclude name="[name]" />
  </manifestdoc>

  <pretransform xslt="[pre XSLT path]">
    <include name="[name]" />
    <exclude name="[name]" />
    <param name="[x]"
           expression="[y]" />
    ...
  </pretransform> 

  <xrefs types="[xref types]"
         xreffragment="[include,exclude,only]">
    <include name="[name]" />
    <exclude name="[name]" />
  </xrefs>

  <publication config="[publication config path]"
               rootfile="[path relative to src]"
               generatetoc="[true|false]"
               headingleveladjust="[numbering|content]" />

  <images src="[uriid|uriidfolders|
          permalink|filename|filenameencode|location]"
          location="[folder path]"
          embedmetadata="[true|false]">
    <include name="[name]" />
    <exclude name="[name]" />
  </images>

  <strip manifest="[true|false]"
         documentinfo="[all,docid,title,description,
         labels,publication]"
         fragmentinfo="[all,labels]"
         xrefs="[all,docid,uriid,notfound,unresolved,reversexrefs]"
         images="uriid" />
 
  <posttransform xslt="[post XSLT path]">
    <include name="[name]" />
    <exclude name="[name]" />
    <param name="[x]"
           expression="[y]" />
    ...
  </posttransform>

  <error xrefnotfound="true"
         xrefambiguous="true"
         imagenotfound="true" />
</ps:process>

Attributes

Attribute | Description | Required | Default
src | The source folder on the file system of the universal portable format input, for example c:\working\portable. It must not be a subfolder of the exported universal portable format files, as there may be references outside the subfolder. | Yes |
dest | The destination folder on the file system for the universal processed format output, for example c:\working\process. | Yes |
embedlinkmetadata | If true, embed the <document level="metadata">... for the target URL in the <link> element. URLs must have been exported using xrefdepth=1 or allurls=true. Requires PageSeeder v5.99 or higher. | No | false
stripmetadata | If true, strip manifest.xml and all <documentinfo> and <fragmentinfo> elements. | No | false
generatetoc | If true, generate a table of contents for every <toc/> element. This attribute is obsolete as of PageSeeder 5.9600. Use <publication> instead. | No | false
preservesrc | If true, keep the original source files; if false, remove them. | No | false
failonerror | If true, stop the build process on error. | No | true
placeholders | If true, resolve <placeholder> elements. | No | true
processed | If true, the value of <document>/@level is changed to processed, and blockxref/@href, xref/@href and image/@src are URL decoded. | No | true
convertmarkdown | If true, convert content where <property datatype="markdown"> from Markdown to valid PageSeeder markup language (XML). | No | true
convertasciimath | If true, convert <inline label="asciimath"> or <media-fragment mediatype="text/asciimath"> to MathML, changing the media type to @mediatype="application/mathml+xml" and replacing the <inline> element with <xref frag="media" type="math"><media-fragment id="media" mediatype="application/mathml+xml">. | No | true
converttex | If true, convert <inline label="tex">, <media-fragment mediatype="application/x-tex"> or <xref type="math" mediatype="application/x-tex"> to MathML, adding or changing the media type to <media-fragment mediatype="application/mathml+xml"> and replacing the <inline> element with <xref frag="media" type="math"><media-fragment id="media" mediatype="application/mathml+xml">. Requires PageSeeder v6 or higher. | No | true
config | Universal PS config name. | No | default

Elements

The @includes and @excludes attributes are obsolete. They are replaced by the nested <include name="x" /> and <exclude name="x" /> elements described in the following sections, which avoid problems with commas in filenames.

Element <include>

A pattern matching documents/folders to include. If not present, then all documents/folders are included. It cannot be used directly under the <ps:process> element.

Attribute | Description | Required
name | The pattern, in a format similar to the file selection in other Ant tasks. Examples: *.psml, archive, folder1/*.psml, **/*.psml, ????/* | Yes

Element <exclude>

A pattern matching documents/folders to exclude. If not present, then no documents/folders are excluded. It cannot be used directly under the <ps:process> element.

Attribute | Description | Required
name | The pattern, in a format similar to the file selection in other Ant tasks. Examples: _local/**, _external/** | Yes
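
As a sketch of how the two elements can be combined, the fragment below nests both inside <xrefs>. The paths follow the examples later on this page, and the choice of patterns is only illustrative. It assumes that, as with Ant filesets, an exclusion takes precedence over an inclusion.

<ps:process src="c:\working\export"
            dest="c:\working\process">

  <xrefs types="embed">
    <!-- consider every PSML file at any depth -->
    <include name="**/*.psml" />
    <!-- assumed to override the include for these two folders -->
    <exclude name="_local/**" />
    <exclude name="_external/**" />
  </xrefs>

</ps:process>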

Element <manifestdoc>

Creates a PSML document of type manifest from the manifest.xml containing a <blockxref> of type embed for each PSML file, in alphabetical order. The generated file is included in PSML files for subsequent processing.

This element can contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in the previous sections.

Attribute | Description | Required
filename | Filename for the generated PSML document, for example manifest. | Yes

Element <pretransform>

Provides a mechanism to pre-process PSML files with XSLT. The generated XML must be valid PSML.

This element can contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in the previous sections.

This element can also contain nested <param name="x" expression="y" /> elements for passing XSLT parameters.

Attribute | Description | Required
xslt | Path to the XSLT file to apply to all PSML files before all other processing (must produce valid PSML). | Yes

Files under the META-INF folder are also transformed by default. To skip these files, add <exclude name="META-INF/**" /> inside this element.
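
As a rough sketch of parameter passing, the fragment below hands a value to a pre-transform stylesheet. The stylesheet path, the parameter name label and the value draft are hypothetical. The sketch also assumes that a nested <param> is received by an <xsl:param> of the same name, as with Ant's own xslt task.

<pretransform xslt="c:\working\drop-inline.xsl">
  <!-- leave the files under META-INF untouched -->
  <exclude name="META-INF/**" />
  <!-- hypothetical parameter read by the stylesheet -->
  <param name="label"
         expression="draft" />
</pretransform>

drop-inline.xsl

<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- value supplied by the nested <param>; empty when not supplied -->
  <xsl:param name="label" select="''"/>

  <!-- remove <inline> elements that carry the given label -->
  <xsl:template match="inline[@label = $label]"/>

  <!-- copy all other elements unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Whatever the stylesheet does, its output still has to be valid PSML, so a rule like this one is only suitable where removing the element leaves a valid document.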

Element <xrefs>

Processes the cross-references in PSML files as follows:

  1. Replaces the <xref> or <blockxref> content with the PSML of the target document or fragment.
  2. Prefixes fragment @id with [URI ID][_n]- so that all fragment IDs are unique.
  3. For any xref of type none that points to content now included in the document (by item 1) the @href is changed to the @id of the target document or fragment (for example #456, #456-3, #456_2-3).

This element might contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in previous sections.

By default, processing URL decodes the PSML attributes blockxref/@href, xref/@href and image/@src. Set processed="false" on <ps:process> to URL encode them, for example when uploading the result to PageSeeder.

Attribute | Description | Required | Default
types | Comma-separated list of xref types to resolve in the included files, for example transclude,embed,alternate. | Yes |
levels | Whether to modify heading levels in target documents based on the xref @level attribute. This attribute is obsolete as of PageSeeder 5.9600. Use publication config instead. See <publication> below. | No | true
xreffragment | Defines how to handle xrefs in an <xref-fragment> element. Possible values are include (process xrefs in an xref-fragment), exclude (do not process xrefs in an xref-fragment) and only (process xrefs only in an xref-fragment). | No | include
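
For illustration, the following sketch resolves embed and transclude xrefs starting from a single document while leaving any xrefs inside an <xref-fragment> unprocessed. The file name spec.psml is borrowed from the examples at the end of this page and stands in for whatever root document applies.

<ps:process src="c:\working\export"
            dest="c:\working\process">

  <!-- xrefs inside <xref-fragment> elements are skipped -->
  <xrefs types="embed,transclude"
         xreffragment="exclude">
    <include name="spec.psml" />
  </xrefs>

</ps:process>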

Element <number>

This element is obsolete as of PageSeeder 5.9600. Use <publication> instead.

Generates auto-numbering in PSML files.

This element might contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in previous sections.

Attribute | Description | Required
numberconfig | Path to the numbering config file for numbering included files. | Yes

Element <publication>

Generates table of contents and auto-numbering in PSML files. Available in PageSeeder v5.96 or higher.

Attribute | Description | Required | Default
config | Path to the publication config file. | Yes |
rootfile | Path to the root file of the publication, relative to src. | Yes |
generatetoc | If true, generate a table of contents for the top <toc/> element. For details see Universal Processed Format. | No | false
headingleveladjust | Overrides the levels/@heading-adjust attribute in the publication config file. Allowed values are numbering or content. | No |
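
A minimal sketch of this element in context follows. The publication config path and the root file name are placeholders, not values defined by PageSeeder.

<ps:process src="c:\working\export"
            dest="c:\working\process">

  <xrefs types="embed,transclude">
    <include name="spec.psml" />
  </xrefs>

  <!-- hypothetical config path; rootfile is relative to src -->
  <publication config="c:\working\publication-config.xml"
               rootfile="spec.psml"
               generatetoc="true"
               headingleveladjust="numbering" />

</ps:process>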

Element <images>

This element provides options to control how images are managed and rewrites the path accordingly.

This element might contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in previous sections.

Attribute | Description | Required
src | Format of the @src attribute on <image> elements. The allowed values are listed below; the default is location. | No
location | Move all <image> files to this folder path. If @src is uriid, uriidfolders or permalink, rename them to [uriid].[ext]; if filename, put them in a single folder; otherwise use their path relative to src. | Yes, if src=uriid
embedmetadata | If true, embed the <document level="metadata">... in the <image> element and apply src/location image processing to the @href of an <xref> where metadata has type="alternate" and mediatype="image/*". | No

Allowed values for src:

  • uriid – use the [uriid].[ext] format.
  • uriidfolders – use [uriid billions]/[uriid millions]/[uriid thousands]/[uriid].[ext] with leading zeros on folders. For example, uriid 12345 would be 000/000/012/12345.png.
  • permalink – use the [ps.site.prefix]/uri/[uriid].[ext] format.
  • filename – use the [filename][-N].[ext] format, where -N is added for uniqueness if required (pso-psml v0.5.0 or higher).
  • filenameencode – same as filename but URL encodes characters, which is useful for DOCX because it doesn't allow spaces (pso-psml v0.7.0-beta-5 or higher).
  • location – the default. The @src attribute is not changed.
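
As a sketch, the fragment below moves images to one folder and rewrites each @src using the uriidfolders format. The folder path is a placeholder. Going by the description of uriidfolders above, an image with URI ID 12345 would end up with an @src of 000/000/012/12345.png.

<ps:process src="c:\working\export"
            dest="c:\working\process">

  <!-- placeholder folder; image paths are rewritten to match -->
  <images src="uriidfolders"
          location="c:\working\images" />

</ps:process>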

Element <strip>

Provides finer control over the elements to strip. Must not be used in conjunction with the @stripmetadata attribute.

Attribute | Description | Required | Default
documentinfo | Comma-separated list of items to strip, for example all,docid,title,description,labels,publication. The value all strips <documentinfo> elements. The value publication requires PageSeeder v5.9900 or higher. | No | none
fragmentinfo | Comma-separated list of items to strip, for example all,labels. The value all strips <fragmentinfo> elements. | No | none
manifest | If true, deletes the META-INF/manifest.xml file. | No | false
xrefs | Comma-separated list of items to strip, for example all,docid,uriid,notfound,unresolved,reversexrefs. The values all, notfound and unresolved strip the <xref> markup but leave the content. The value reversexrefs requires PageSeeder v5.9708 or higher. | No | none
images | If uriid, strips @uriid from <image>. Requires PageSeeder v5.9900 or higher. | No |
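
The following sketch shows selective stripping. The particular combination of values is illustrative only. Note that @stripmetadata is left off <ps:process>, since the two must not be combined.

<ps:process src="c:\working\export"
            dest="c:\working\process">

  <!-- remove Doc IDs and labels, and unwrap xrefs that did not resolve -->
  <strip documentinfo="docid,labels"
         fragmentinfo="labels"
         xrefs="notfound,unresolved" />

</ps:process>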

Element <posttransform>

Provides a mechanism to post-process PSML source files with XSLT. The generated XML must be valid PSML.

This element can contain multiple nested <include name="" /> or <exclude name="" /> elements as defined in previous sections.

This element can also contain nested <param name="x" expression="y" /> elements for passing XSLT parameters.

Attribute | Description | Required
xslt | Path to the XSLT file to apply to all PSML files after all other processing (must produce valid PSML). | Yes

Files under the META-INF folder are also transformed by default. To skip these files, add <exclude name="META-INF/**" /> inside this element.

Element <error>

This element is used to customize error handling.

Attribute | Description | Required | Default
xrefnotfound | If true, raise an error if an xref target file is not in the export set. | No | false
xrefambiguous | If true, raise an error if an xref target is ambiguous (see note at the top of this document). | No | false
imagenotfound | If true, raise an error if a referenced image is not in the export set. | No | false
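
A sketch of a stricter configuration is shown below. It raises errors for missing and ambiguous xref targets but tolerates missing images. It assumes that a raised error is then handled according to @failonerror on <ps:process>, which this page does not state explicitly.

<ps:process src="c:\working\export"
            dest="c:\working\process"
            failonerror="true">

  <xrefs types="embed,transclude">
    <include name="spec.psml" />
  </xrefs>

  <!-- imagenotfound is left at its default of false -->
  <error xrefnotfound="true"
         xrefambiguous="true" />

</ps:process>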

Environment

This task uses the following PS config environment properties:

  • site.prefix – site prefix for the <images src="permalink"> option. The default is /ps.

Errors

Possible errors are:

  • Required attribute or environment property missing.
  • Attribute or property invalid.
  • Required metadata missing.
  • Pre/post transform XSLT error.
  • Pre/post transform validation error.
  • Internal link pointing to URI x fragment y is ambiguous because this content appears in multiple locations.
  • Reference loop detected when resolving xref from x to y.
  • Xref target not found (see <error> element).
  • Image not found (see <error> element).

Examples

Consolidate documents

Process the xrefs so that the linked documents are concatenated into a single file. Extract all images into a single folder, naming them with the uriid format.

<ps:process src="c:\working\export"
            dest="c:\working\process"
            stripmetadata="true">

  <xrefs types="embed,transclude">   
    <include name="spec.psml" />
  </xrefs>

  <images src="uriid"
          location="c:\working\images" />

 </ps:process>

Remove Doc IDs and URI IDs

<ps:process src="c:\working\export"
            dest="c:\working\process">

  <strip documentinfo="docid"
         xrefs="docid,uriid" />

  <error imagenotfound="true"
         xrefnotfound="true" />
 
</ps:process>

Concatenate all downloaded files

<ps:process src="c:\working\export"
            dest="c:\working\process"
            stripmetadata="true">

  <manifestdoc filename="manifest"/>

  <xrefs types="embed">
    <include name="manifest.psml" />
  </xrefs>

</ps:process>

Copy documents to another group

To copy PSML documents to another PageSeeder group they must be exported, processed and then uploaded. To process PSML so that the copies stay independent from one another on the same server (i.e. they don’t reference each other) use the following options:

<ps:process src="${download}"
            dest="${process}"
            processed="false"
            convertmarkdown="false"
            convertasciimath="false"
            converttex="false">
  <strip documentinfo="docid,publication"
         xrefs="docid,uriid"
         images="uriid"
         manifest="true" />
</ps:process>

PageSeeder doesn't allow the same publication ID on different sets of documents in the same PageSeeder server domain. However, if you need publications to be preserved in the other group, you could remove publication from the documentinfo attribute of <strip> and add a <posttransform> with an XSLT that adds a suffix to the publication/@id when copying.
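
One way this might look is sketched below. The stylesheet name, the parameter name suffix and the value -copy are hypothetical, and the sketch assumes that the publication ID is carried in an id attribute on a <publication> element, as the publication/@id path above suggests.

<ps:process src="${download}"
            dest="${process}"
            processed="false"
            convertmarkdown="false"
            convertasciimath="false"
            converttex="false">
  <!-- publication is no longer stripped -->
  <strip documentinfo="docid"
         xrefs="docid,uriid"
         images="uriid"
         manifest="true" />
  <posttransform xslt="${basedir}/publication-suffix.xsl">
    <param name="suffix"
           expression="-copy" />
  </posttransform>
</ps:process>

publication-suffix.xsl

<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- suffix supplied by the nested <param>; hypothetical default -->
  <xsl:param name="suffix" select="'-copy'"/>

  <!-- append the suffix to every publication ID -->
  <xsl:template match="publication/@id">
    <xsl:attribute name="id" select="concat(., $suffix)"/>
  </xsl:template>

  <!-- copy all other attributes and nodes unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

The suffix needs to differ for each copy made on the same server, otherwise the copies would collide with one another.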

Use alternate images

This makes it possible to keep lower-resolution images for editing and web delivery while using higher-resolution images for publishing to paper. The idea is that when an image has an alternate xref in its metadata, as in:

<image src="">..<metadata>..<xref type="alternate" href=""></image>

replace

image/@src

with

xref/@href

Process task

<ps:process src="c:\working\export"
            dest="c:\working\process"
            generatetoc="true">

  <xrefs types="embed,transclude">
    <include name="spec.psml" />
  </xrefs>

  <images embedmetadata="true"/>

  <posttransform xslt="c:\working\alternate-images.xsl">
    <include name="spec.psml" />
  </posttransform>

</ps:process>

alternate-images.xsl

<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- change image src to alternate image -->
  <xsl:template match="image[.//xref/@type='alternate']">
    <image src="{.//xref[@type='alternate']/@href}">
      <xsl:copy-of select="@*[not(name()='src')]"/>
    </image>
  </xsl:template>

  <!-- copy all other elements unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Image ‘shell’ documents

A shell document contains both a lo-res and hi-res image. For editing, the link typically refers to the lo-res with the hi-res used to publish to paper or PDF. Although the approach of alternate images (see previous example) is recommended generally, there are legitimate use cases for the shell approach. 

Essentially the idea is to replace the references to: 

<fragment id="lores"><image .../>

with

<fragment id="hires"><image .../>

Process task

<ps:process
    src="c:\working\export"
    dest="c:\working\process"
    stripmetadata="true">
  <xrefs
      types="embed,transclude">
    <include name="spec.psml" />
  </xrefs>
  <number
      numberconfig="c:\working\defaultNumberingConfig.xml" >
    <include name="spec.psml" />
  </number>
  <pretransform xslt="c:\working\hires-images.xsl" />
</ps:process>

hires-images.xsl

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- change blockxref to point to hires image -->
  <xsl:template match="blockxref[@fragment='lores']">
    <blockxref fragment="hires">
      <xsl:copy-of select="@*[not(name()='fragment')]"/>
      <xsl:apply-templates />
    </blockxref>
  </xsl:template>

  <!-- copy all other elements unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>