Tutorials

Task-driven tutorials and recipes for PageSeeder

How to convert XML data to PSML documents with XRefs

Skills required: XML, XSLT
Time required (minutes): 30
Intended audience: Developer
Difficulty: Medium
Category: Document

Objective

PSML documents are often linked together using cross-references (XRefs). This tutorial demonstrates how to use XSLT to convert two XML files generated by Wikipedia into PSML documents linked together with XRefs. The PSML is then uploaded to PageSeeder to check the results.

Prerequisites

This tutorial requires:

  • Software to read and write zip files.
  • Access to the command prompt on the computer running the tutorial (on Windows, run cmd).
  • A capable text editor such as Notepad++ or Sublime, rather than Windows Notepad, which has problems with the line endings.
  • Access to a PageSeeder server with at least a contributor role on the tutorial group.

All the necessary files for this tutorial are on GitHub.

Tutorial

Installing an XSLT processor

XSLT is a W3C standard programming language primarily designed to process XML content. It is particularly well suited to converting XML to other syntaxes, such as plain text, or to alternate structures, such as HTML.

Interpreting and running XSLT code requires an XSLT processor. This tutorial uses the Saxon XSLT processor on Windows, which needs Java.

  • If it is not already present, install Java as follows:
    1. Go to https://www.oracle.com/java and choose Java for Developers / Java SE.
    2. Download the JDK or JRE .exe for Windows (x64 for 64-bit) and run it.

Run the XSLT code

  • Copy the following files from GitHub to a tutorial folder, for example c:\ps\xref:
    • wikipediafilms.xml – the source XML film data.
    • wikipediabios.xml – the source XML bio data.
    • films-bios.xsl – the XSLT code.
  • Open a command prompt in the folder with the files from the previous step and use Saxon to process the XML with the XSLT code. The example below assumes the Saxon jar is located at c:\saxon\saxon9he.jar:
> java -jar c:\saxon\saxon9he.jar -s:wikipediafilms.xml -xsl:films-bios.xsl -o:output.txt
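
The same command can also be driven from a script, which is convenient when the conversion is rerun often. The Python sketch below is an illustration only and is not part of the tutorial files: the function names are hypothetical, the paths are the example paths used above, and it relies only on the -s, -xsl and -o options already shown.

```python
import shutil
import subprocess


def build_saxon_command(saxon_jar, source, stylesheet, output):
    """Build the same Saxon command line as the example above."""
    return [
        "java", "-jar", saxon_jar,
        f"-s:{source}",
        f"-xsl:{stylesheet}",
        f"-o:{output}",
    ]


def run_saxon(saxon_jar, source, stylesheet, output, cwd="."):
    """Run the transformation in `cwd`; raises if Java is missing or Saxon fails."""
    if shutil.which("java") is None:
        raise RuntimeError("java was not found on the PATH; install Java first")
    command = build_saxon_command(saxon_jar, source, stylesheet, output)
    # check=True turns a non-zero exit code from Saxon into an exception
    subprocess.run(command, cwd=cwd, check=True)
```

With the example paths, the call would look like run_saxon(r"c:\saxon\saxon9he.jar", "wikipediafilms.xml", "films-bios.xsl", "output.txt", cwd=r"c:\ps\xref").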

This should create files in the current folder according to the following naming pattern:

bios/bio-[n].psml
films/film-[n].psml

For example:

bios/bio-1.psml
bios/bio-2.psml
bios/bio-3.psml
etc.
films/film-1.psml
films/film-2.psml
films/film-3.psml
etc.

Before continuing, open some of the generated files in a text editor to check the content. For reference on the XSLT conversion code or the PSML markup, also review wikipediafilms.xml, wikipediabios.xml and films-bios.xsl.
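
Checking by eye can be supplemented with a small script. The Python sketch below is a hedged example, not part of the tutorial files: it assumes the generated XRefs are xref elements carrying a relative href attribute, and the function name check_xrefs is hypothetical. It parses every .psml file one folder below the tutorial folder (which also confirms each file is well-formed XML) and reports XRefs whose target file does not exist.

```python
from pathlib import Path
from urllib.parse import unquote
import xml.etree.ElementTree as ET


def check_xrefs(root_dir):
    """Return (file, href) pairs for XRefs whose target file is missing."""
    root = Path(root_dir)
    broken = []
    for psml in sorted(root.glob("*/*.psml")):
        tree = ET.parse(psml)  # raises ParseError if the file is not well-formed
        for xref in tree.iter("xref"):
            href = xref.get("href")
            if not href:
                continue
            # Assumption: href is a path relative to the referencing document
            target = psml.parent / unquote(href.split("#")[0])
            if not target.exists():
                broken.append((psml.relative_to(root).as_posix(), href))
    return broken
```

Calling check_xrefs(r"c:\ps\xref") should return an empty list when every XRef points at a generated file.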

Include images

Adding images to the collection requires storing the image files in a path relative to the text. For example:

c:\ps\xref\bios\images
c:\ps\xref\films\images

Copy the film images from GitHub to these folders.
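
The copy can be done by hand or scripted. The Python sketch below is one possible approach under stated assumptions: the images have already been downloaded to a local folder, the function name copy_images and the list of image suffixes are hypothetical, and which images belong in which folder is left to the caller.

```python
import shutil
from pathlib import Path

# Assumption: the downloaded images use one of these common extensions
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}


def copy_images(download_dir, images_dir):
    """Copy image files from download_dir into images_dir, creating it if needed."""
    target = Path(images_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(download_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

For example, copy_images(r"c:\downloads\film-images", r"c:\ps\xref\films\images") would fill the films image folder, where the download path is a placeholder.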

Package and upload the PSML

The final step requires moving the data from a local file system into PageSeeder. Do this via the following steps:

  • Zip the films and bios folders into a single zip archive.
  • Upload the archive to the PageSeeder group and select the unzip icon (see image below).
  • After unzipping the archive, simply continue through the upload.

[Image: ps_upload-films.jpg]
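
The zipping step can also be scripted instead of using a desktop zip tool. The Python sketch below is a minimal example using the standard zipfile module; the function name zip_folders and the archive name are hypothetical. It stores entries with paths relative to the tutorial folder, so the archive keeps the bios and films folders (including their images subfolders) at its top level.

```python
import zipfile
from pathlib import Path


def zip_folders(base_dir, folders, archive_name):
    """Zip the given sub-folders of base_dir into base_dir/archive_name."""
    base = Path(base_dir)
    archive = base / archive_name
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for folder in folders:
            for path in sorted((base / folder).rglob("*")):
                if path.is_file():
                    # Forward-slash names relative to base_dir keep the folder layout
                    zf.write(path, path.relative_to(base).as_posix())
    return archive
```

For example, zip_folders(r"c:\ps\xref", ["bios", "films"], "films-bios.zip") would create the archive ready for upload.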
