 Universal format

Portable, metadata and processed PSML

Universal Portable Format

For PageSeeder data, the Universal Portable Format is the metaphorical equivalent of a USB drive: it lets users transport a collection of documents and images from one server to another. All contents are packaged in a single ZIP file. The features of the universal portable format are:

  • A consistent, standalone, server-independent representation of a PageSeeder file set.
  • Isomorphic conversion to/from PageSeeder (except for edit history and comments which belong to the group, not the document), including:
    • All the necessary files, plus a manifest.
    • Links fully resolved using relative paths.
    • Uses the widely supported ‘zip’ format.

Processing the universal portable format is standard for several PageSeeder interfaces, including: 

  • Upload (input).
  • Export (output).
  • Start and end points for a number of Apache Ant tasks.

Definition

The universal portable format zip package has the following folders and files:

  • META-INF – a folder containing manifest.xml and a PSML metadata file for every folder and non-PSML file in the package. For example:
    • folder – myspec/images.psml
    • file – myspec/images/figure1.jpg.psml

Any META-INF files are optional when uploading the package to PageSeeder.

  • META-INF/_urls – a folder containing a representation of every URL in the package, using the convention [scheme]/[host]/[port]/[Unique ID].psml. For example:
    • scheme folder – https
    • host folder – en.wikipedia.org
    • port folder – 443
    • unique ID file – 210022.psml 

The unique ID can be any string unique for scheme/host/port. For an xref to reference a URL, the @href attribute must match the URL.

When uploading, there is no need to reproduce the URL structure; the URL metadata files can sit directly under the META-INF/_urls path, for example:

META-INF/_urls/210022.psml

  • The actual files in the package, arranged relative to a specified context path as described under Context path below.
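
The META-INF naming conventions in the list above can be sketched in a few lines of Python. This is an illustration only, not PageSeeder code: the function names metadata_path and url_metadata_path, the nested flag, the sample Wikipedia URL and the fallback to default ports (80 for http, 443 for https) when a URL has no explicit port are all assumptions made for the example.

```python
from urllib.parse import urlsplit

# Assumption: when the URL carries no explicit port, use the scheme default.
DEFAULT_PORTS = {"http": 80, "https": 443}


def metadata_path(package_path):
    """Return the META-INF metadata path for a folder or non-PSML file.

    Returns None for PSML files, because the format description only
    lists metadata files for folders and non-PSML files.
    """
    clean = package_path.strip("/")
    if clean.lower().endswith(".psml"):
        return None
    return f"META-INF/{clean}.psml"


def url_metadata_path(url, unique_id, nested=True):
    """Return the META-INF/_urls path for a URL metadata file.

    nested=True follows [scheme]/[host]/[port]/[Unique ID].psml;
    nested=False is the flat layout that is accepted on upload.
    """
    if not nested:
        return f"META-INF/_urls/{unique_id}.psml"
    parts = urlsplit(url)
    # Raises KeyError for a scheme other than http/https without a port.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return f"META-INF/_urls/{parts.scheme}/{parts.hostname}/{port}/{unique_id}.psml"


print(metadata_path("myspec/images"))
print(metadata_path("myspec/images/figure1.jpg"))
print(url_metadata_path("https://en.wikipedia.org/wiki/XML", "210022"))
print(url_metadata_path("https://en.wikipedia.org/wiki/XML", "210022", nested=False))
```

The unique ID is passed in rather than generated, since the format only requires it to be unique for a given scheme, host and port.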

Context path

The location of all files in the export set is defined by the context path.

The context path always starts with /ps (the site prefix), and:

  • When exporting a document – the default value is the parent folder of the document.
  • When exporting a folder – the default value is the folder itself.

For example, the file /ps/acme/specs/documents/spec.psml is represented as follows:

  • context path – /ps/acme/specs
  • package path – documents/spec.psml

Where files are in the same group, but outside the specified context, including them in the export requires using the _local folder. For example, exporting the file /ps/acme/specs/images/figure1.jpg from a context of /ps/acme/specs/documents would produce the package path _local/images/figure1.jpg.

A file that is not in the same group must use the _external folder. For example, the path /ps/acme/products/images/figure2.jpg breaks down as:

  • site prefix – /ps
  • project – /acme
  • group – /products
  • folder – /images
  • file – /figure2.jpg

If the specified context, within the source group, is /ps/acme/specs/documents, the file figure2.jpg would have the package path:

_external/acme/products/images/figure2.jpg
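
The three cases above (inside the context, same group but outside the context, and a different group) can be sketched as a single path-mapping function. This is a minimal illustration under stated assumptions, not the PageSeeder implementation: the function name package_path is hypothetical, and the group path is passed in explicitly because it cannot be derived reliably from the file path alone.

```python
from pathlib import PurePosixPath


def _relative(path, base):
    """Return path relative to base, or None when path is not under base."""
    try:
        return PurePosixPath(path).relative_to(base)
    except ValueError:
        return None


def package_path(file_path, context_path, group_path, site_prefix="/ps"):
    """Map a server path to its path inside the package (illustrative)."""
    # Case 1: the file sits under the context path.
    inside = _relative(file_path, context_path)
    if inside is not None:
        return str(inside)
    # Case 2: same group, outside the context -> _local, relative to the group.
    local = _relative(file_path, group_path)
    if local is not None:
        return f"_local/{local}"
    # Case 3: another group -> _external, relative to the site prefix.
    external = _relative(file_path, site_prefix)
    if external is None:
        raise ValueError(f"{file_path} is not under {site_prefix}")
    return f"_external/{external}"


context = "/ps/acme/specs/documents"
group = "/ps/acme/specs"
print(package_path("/ps/acme/specs/documents/spec.psml", group, group))
print(package_path("/ps/acme/specs/images/figure1.jpg", context, group))
print(package_path("/ps/acme/products/images/figure2.jpg", context, group))
```

The three calls reproduce the examples in this section: documents/spec.psml, _local/images/figure1.jpg and _external/acme/products/images/figure2.jpg.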

The manifest.xml lists all the documents in the export set using the following format:

<uris>
  <uri id="123"
    scheme="http" host="acme.com" port="80"
    path="/ps/acme/specs/documents/my%20spec.psml"
    decodedpath="/ps/acme/specs/documents/my spec.psml"
    mediatype="application/vnd.pageseeder.psml+xml"
    documenttype="spec" />
    ...
</uris>
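
As a rough illustration of how such a manifest might be consumed, the sketch below parses the sample entry with Python's standard library and checks that, in this sample, path is the percent-encoded form of decodedpath. The function name read_manifest is hypothetical, and the elided entries ("...") are simply left out of the sample string.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# The sample manifest from above, without the elided entries.
MANIFEST = """<uris>
  <uri id="123"
    scheme="http" host="acme.com" port="80"
    path="/ps/acme/specs/documents/my%20spec.psml"
    decodedpath="/ps/acme/specs/documents/my spec.psml"
    mediatype="application/vnd.pageseeder.psml+xml"
    documenttype="spec" />
</uris>"""


def read_manifest(xml_text):
    """Return one attribute dictionary per <uri> element."""
    root = ET.fromstring(xml_text)
    return [dict(uri.attrib) for uri in root.iter("uri")]


entries = read_manifest(MANIFEST)
for entry in entries:
    # quote() keeps "/" and percent-encodes the space as %20.
    encoded_ok = quote(entry["decodedpath"]) == entry["path"]
    print(entry["id"], entry["decodedpath"], encoded_ok)
```

For a real package, the same function could be fed the bytes returned by zipfile.ZipFile(...).read("META-INF/manifest.xml").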

Non-PSML document metadata is expressed as the <document> element with the attribute level="metadata". There are no required sub-elements. The elements <documentinfo>, <fragmentinfo>, <metadata> and <fragments> are optional. 

<document level="metadata">
  <documentinfo>
    <uri id="234"
         docid="fig2"
         scheme="http"
         host="acme.com" port="80"
         path="/ps/acme/products/images/figure%202.jpg"
         decodedpath="/ps/acme/products/images/figure 2.jpg"
         mediatype="image/jpg">
      <displaytitle>Figure 2</displaytitle> 
      <description>Overall system diagram</description> 
      <labels>Spec,System</labels>
    </uri>
  </documentinfo>
</document>

All PageSeeder PSML files have the attribute level="portable" on the <document> element. The only required sub-element is <section>. The elements <documentinfo>, <fragmentinfo>, <metadata> and <toc> are optional.
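
The two level rules described above can be expressed as a small check. This is only a sketch of those two statements, not a PSML validator: the function name check_document is hypothetical, it assumes <section> is a direct child of <document>, and it does not check any other level value.

```python
import xml.etree.ElementTree as ET


def check_document(xml_text):
    """Return a list of problems found against the rules stated above."""
    root = ET.fromstring(xml_text)
    if root.tag != "document":
        return [f"root element is <{root.tag}>, expected <document>"]
    problems = []
    level = root.get("level")
    if level == "portable":
        # Portable PSML: <section> is the only required sub-element.
        if root.find("section") is None:
            problems.append("portable document has no <section> element")
    # level="metadata" has no required sub-elements, so nothing to check.
    # Other level values are outside the scope of this sketch.
    return problems


print(check_document('<document level="portable"><section/></document>'))
print(check_document('<document level="portable"/>'))
print(check_document('<document level="metadata"/>'))
```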

Example

Source

Group          Path (in group)      References
acme-specs     documents/book.psml  graph.jpg, figure1.JPG, figure2.jpg
acme-specs     documents/graph.jpg
acme-specs     images/figure1.JPG
acme-products  images/figure2.jpg

Specifications

Parameter         Value
Source path       /ps/acme/specs/documents/book.psml
Context path      /ps/acme/specs/documents
Destination path  {OUT}/

Output

{OUT}/META-INF/manifest.xml
{OUT}/META-INF/graph.jpg.psml
{OUT}/META-INF/_local/images.psml
{OUT}/META-INF/_local/images/figure1.JPG.psml
{OUT}/META-INF/_external/acme/products.psml
{OUT}/META-INF/_external/acme/products/images.psml
{OUT}/META-INF/_external/acme/products/images/figure2.jpg.psml
{OUT}/book.psml (References: graph.jpg, _local/images/figure1.JPG,
                 _external/acme/products/images/figure2.jpg)
{OUT}/graph.jpg
{OUT}/_local/images/figure1.JPG
{OUT}/_external/acme/products/images/figure2.jpg