How to convert XML data to PSML documents with XRefs
|Skills required|XML, XSLT|
|Time required (minutes)|30|
PSML documents are often linked together using cross-references (XRefs). This tutorial demonstrates how to use XSLT to convert two XML files generated by Wikipedia into PSML documents linked together with XRefs. The PSML is then uploaded to PageSeeder to check the results.
Completing this tutorial requires:
- Software to read and write zip files.
- Access to the command prompt on the computer running the tutorial (on Windows run
- A decent text editing program such as Notepad++, Sublime or something that is not Windows Notepad. Notepad will have problems with the line endings.
- Access to a PageSeeder server with at least a contributor role on the tutorial group.
All the necessary files for this tutorial are on GitHub.
Installing an XSLT processor
XSLT is a W3C standard programming language primarily designed to process XML content. One task it is particularly well suited to is converting XML to another syntax, such as plain text, or to an alternate structure, such as HTML.
Interpreting and running XSLT code requires an XSLT processor. The steps below install the Saxon XSLT processor on Windows.
- If it is not already present, install Java as follows:
- Go to https://www.oracle.com/java and choose Java for Developers / Java SE.
- Download the JDK or JRE .exe for Windows (x64 for 64 bit) and run it.
- Go to http://sourceforge.net/projects/saxon/files/ , download the latest SaxonHE .zip file, unzip and install it.
Run the XSLT code
- Copy the following files from GitHub to a tutorial folder, for example
- wikipediafilms.xml – the source XML film data.
- wikipediabios.xml – the source XML bio data.
- films-bios.xsl – the XSLT code.
- Open a command prompt in the folder with the files from the previous step and use Saxon to process the XML with the XSLT code. For example:
> java -jar c:\saxon\saxon9he.jar -s:wikipediafilms.xml -xsl:films-bios.xsl -o:output.txt
This should create files in the current folder according to the following naming pattern:
bios/bio-1.psml, bios/bio-2.psml, bios/bio-3.psml, etc.
films/film-1.psml, films/film-2.psml, films/film-3.psml, etc.
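Opening every generated file by hand gets tedious, so a short script can do a first pass. The following is a minimal Python sketch, not part of the tutorial files: it checks that each file under bios/ and films/ is well-formed XML and reports XRefs whose target file is missing. It assumes the generated PSML uses `xref` elements with an `href` attribute holding a relative file path; the function names `check_psml` and `broken_xrefs` are hypothetical.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def check_psml(root_dir="."):
    """Split the .psml files under bios/ and films/ into well-formed and malformed."""
    ok, bad = [], []
    for folder in ("bios", "films"):
        folder_path = Path(root_dir, folder)
        if not folder_path.is_dir():
            continue  # the folder was not created by the transformation
        for path in sorted(folder_path.glob("*.psml")):
            try:
                ET.parse(path)  # raises ParseError if the XML is not well-formed
                ok.append(path)
            except ET.ParseError:
                bad.append(path)
    return ok, bad


def broken_xrefs(path):
    """Return the href values in one file that point to a missing file.

    Assumption: XRefs are <xref> elements whose href is a path
    relative to the document that contains them.
    """
    path = Path(path)
    missing = []
    for xref in ET.parse(path).iter("xref"):
        href = xref.get("href")
        if href and not (path.parent / href).exists():
            missing.append(href)
    return missing


if __name__ == "__main__":
    ok, bad = check_psml()
    print(f"{len(ok)} well-formed, {len(bad)} malformed")
    for path in bad:
        print("  malformed:", path)
    for path in ok:
        for href in broken_xrefs(path):
            print(f"  {path}: missing XRef target {href}")
```

Run it from the tutorial folder after the Saxon command has finished. It only checks that files parse and that link targets exist; it does not validate the content against the PSML schema.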
Before continuing, open some files in a text editor to check the content. Also as reference for XSLT conversion code or PSML markup, review
Adding images to the collection requires storing the image files in a path relative to the text. For example, the following:
Copy the film images from GitHub to this folder.
Package and upload the PSML
The final step requires moving the data from a local file system into PageSeeder. Do this via the following steps:
- 'Zip' the films and bios folders into a single zip archive.
- Upload the archive to the PageSeeder group and select the unzip icon.
- After unzipping the archive, simply continue through the upload.
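The packaging step can also be scripted instead of using a zip utility. The following is a minimal Python sketch using the standard `zipfile` module. The function name `zip_folders` and the archive name `psml-upload.zip` are hypothetical, and the folder list assumes the bios and films folders produced earlier. If images are stored in a separate folder, add it to the list.

```python
from pathlib import Path
import zipfile


def zip_folders(root_dir, folders, archive_name):
    """Zip the given folders into one archive, keeping paths relative to root_dir."""
    root = Path(root_dir)
    archive = root / archive_name
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for folder in folders:
            for path in sorted((root / folder).rglob("*")):
                if path.is_file():
                    # Store forward-slash relative paths so folder structure
                    # (and therefore relative XRef paths) survives unzipping.
                    zf.write(path, path.relative_to(root).as_posix())
    return archive
```

Calling `zip_folders(".", ["bios", "films"], "psml-upload.zip")` from the tutorial folder writes the archive next to the two folders. Keeping the folder names inside the archive matters because the XRefs between the films and bios rely on relative paths.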