Validating documents
Overview
PageSeeder has built-in document validation technology based on Schematron. Schematron is an ISO standard for rule-based validation of XML documents. The following screenshot shows a PSML document validated using the Best Practice schema that comes with PageSeeder.
This schema demonstrates a small sample of Schematron capabilities and can be customized to suit more specific needs. When authoring documents, there are several advantages to Schematron when compared to conventional XML schemas (W3C XSD files), including:
- Much better error and warning messages – instead of cryptic parser messages designed for programmers, Schematron messages are written for specific circumstances (for example, "do not embed a photo without including a credit and copyright statement"). This improves productivity and reduces training and author frustration.
- Ability to prioritize information – not all information is equal. So while all documents created in PageSeeder are always valid against the PSML schema, Schematron makes it straightforward to add extra checking to areas of specific importance such as complex references, metadata or detailed content such as addresses or part numbers.
Although different to the approach used by conventional XML editing tools, Schematron allows developers to validate for more constructs than W3C XML schemas and requires less time to implement.
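As an illustration of an author-friendly message, the photo credit example above could be sketched as the schema below. It assumes a hypothetical house convention (not defined by PageSeeder) in which the credit line is the paragraph immediately following the image and contains the © character. Adjust the test to match how credits are actually recorded in your documents.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:title>Photo credit check (illustrative sketch)</sch:title>
  <sch:pattern id="photo-credit">
    <!-- Fires once for every image in the document -->
    <sch:rule context="image">
      <!-- Assumption: the credit is the para directly after the image and contains "©" -->
      <sch:assert test="following-sibling::*[1][self::para][contains(., '©')]">
        Do not embed a photo without including a credit and copyright statement
        (image: <sch:value-of select="@src"/>).
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The message is written for the author, not the parser: it names the problem and the image involved.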
Usage
Single file validation
PageSeeder provides a
Batch validation
A more powerful feature of PageSeeder is the ability to validate an entire collection of documents. This can be done by selecting specific documents from search results or by validating a folder.
Batch validation is very useful for QA or when the structure or semantics of documents need to evolve.
For example, batch validation can be used for:
- Checking that assets match specific constraints (for example, their dimension or resolution) before being published.
- Ensuring all xrefs are resolved.
- Diagnosing any structural issue that might not be supported in a publish process.
- Enforcing domain-specific semantics.
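As a rough sketch of one of these checks, a rule that flags unresolved cross-references across a batch might look like the following. It assumes that PSML marks such references with an unresolved="true" attribute on xref and blockxref. Verify this against the PSML reference for your PageSeeder version before relying on it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:title>Unresolved xref check (illustrative sketch)</sch:title>
  <sch:pattern id="xref-resolution">
    <sch:rule context="xref | blockxref">
      <!-- Assumption: unresolved references carry unresolved="true" -->
      <sch:report test="@unresolved = 'true'">
        Cross-reference to <sch:value-of select="@href"/> is unresolved.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```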
Configuring a Schematron schema
The easiest way to configure a Schematron schema for a specific document type is to use the Template configuration page.
Special URLs
When running Schematron validations in PageSeeder, the document() function can be used with the following special URLs to get additional data from PageSeeder:
[URI path or URL]
A URI path for a document (for example /ps/acme/specs/mydoc.psml) or a full URL returns the PSML for that document or URL. In PageSeeder 5.9811 or higher, non-PSML documents and URLs return the Universal Metadata Format. Example:
<!-- Match cross references -->
<sch:rule context="xref|blockxref">
  <!-- Check referenced document contains a heading2 -->
  <sch:assert test="document(@href)//heading[@level='2']">Document <sch:value-of select="@href"/> has no heading2.</sch:assert>
</sch:rule>
ps:search
This returns search results from the group where the validation was launched and can use any parameters from the Group search service. Example:
<sch:let name="definitions" value="document('ps:search?filters=psdocumenttype:definition')" />
If the parameter project=myproject is used, then all groups the user can view under that project are searched. To restrict the groups, use the groups parameter with a comma-separated list of group names. To maximize performance, the results of up to 30 searches are cached and reused for the validation of each single folder or batch search. Requires PageSeeder 5.9800 or higher.
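A project-wide search restricted to particular groups could be sketched as follows. The group names myproject-specs and myproject-glossary are hypothetical placeholders. Because the URL sits inside an XML attribute, the parameter separators are written as &amp;amp; rather than a bare ampersand.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Hypothetical group names; replace with groups that exist under your project -->
  <sch:let name="definitions"
           value="document('ps:search?project=myproject&amp;groups=myproject-specs,myproject-glossary&amp;filters=psdocumenttype:definition')" />
  <sch:pattern id="definition-checks">
    <!-- Rules that compare document content against $definitions go here -->
  </sch:pattern>
</sch:schema>
```

Declaring the variable once at schema level keeps the number of distinct searches low, which fits the caching behavior described above.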
ps:source-metadata
This uses the same parameters and returns the same XML as the Get externalURI source metadata forURL service. Requires PageSeeder 5.9807 or higher. Example:
<sch:let name="url-metadata" value="document(concat('ps:source-metadata?method=head&amp;url=', encode-for-uri(@href)))"/>
As of PageSeeder 5.9900, if no url parameter is supplied, the URL for the URI being validated is used. This is recommended when validating URLs because it also throttles the validation so that a host isn’t sent too many requests. Example:
<sch:let name="url-metadata" value="document('ps:source-metadata?method=head')"/>
ps:self
This returns the same XML as the Get self service. Requires PageSeeder 5.9811 or higher. Example:
<sch:let name="self" value="document('ps:self')"/>
<sch:let name="memberid" value="$self/member/@id" />
ps:workflow
This returns the same XML as the Get URI workflow service. Requires PageSeeder 5.9811 or higher. Example:
<sch:let name="workflow" value="document('ps:workflow')" />
ps:publications
This returns the same XML as the Get URI publications service with no parameters. Requires PageSeeder 6.1 or higher. Example:
<sch:let name="pubs" value="document('ps:publications')" />
Quickfix
Added in PageSeeder v6, a quickfix is an XSLT stylesheet that transforms a fragment to fix a validation error. Quickfixes are placed in the quickfix root folder of the project template, in a file with the same name as the ID specified in the schema properties. For example, remove-markup-from-heading.xsl for the following code:
<sch:properties xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:property id="remove-markup-from-heading" role="quickfix">
    <description>Remove the markup from the heading</description>
    <parameter name="uripath"><sch:value-of select="/document/documentinfo/uri/@path"/></parameter>
  </sch:property>
</sch:properties>
This is bound to a Schematron assertion using @properties as follows:
<sch:assert test="not(heading) or heading[not(*)]"
            subject="heading"
            properties="fragment remove-markup-from-heading">
  Heading should not contain markup
</sch:assert>
In PageSeeder v6.1 or higher, the special URLs supported in Schematron are also supported in the quickfix XSLT, but may require a <parameter> to be specified in the properties. For example, the uripath parameter from the previous example could be used in the following XSLT:
<xsl:param name="uripath" />
...
<xsl:variable name="doc" select="document($uripath)" />
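For context, a complete remove-markup-from-heading.xsl might look something like the sketch below. It assumes the quickfix receives the offending fragment as its input and that the transformed result replaces it; that contract is an assumption here, so check it against the Schematron code samples. The stylesheet uses a standard identity transform and only overrides headings that contain child elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Matches the <parameter name="uripath"> declared in the schema properties;
       declared for completeness, not used by this particular fix -->
  <xsl:param name="uripath" />

  <!-- Identity template: copy everything unchanged by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Headings containing markup: keep the element and its attributes,
       but replace the content with its plain text value -->
  <xsl:template match="heading[*]">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:value-of select="." />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

After this transform, every heading in the fragment satisfies heading[not(*)], so the assertion shown earlier should pass on revalidation.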
Sample code
There are several examples of Schematron rules in the Schematron code samples.