Configuration manual for PageSeeder

URL types

URL types can be used to:

  • Store metadata about any URL using all the features of PSML properties.
  • Extract, and store as editable properties, metadata from a URL where the source is HTML.

URL types can only be used from the version 6 user interface in PageSeeder v5.99 or higher.

Extraction

PageSeeder can extract metadata when a URL is created, or from existing URLs by reprocessing them: on the Global template page, click Manage global types, then the Reprocess button.

Reprocessing is useful:

  • When your URL template has been changed, or
  • When the source websites might have been updated and have new or changed metadata.

To preview the raw metadata, click show more on the display page for a URL. Alternatively, the metadata properties derived from the raw metadata can be viewed anywhere the Document properties are displayed.

When uploading PSML containing <link> elements, the URL metadata is automatically extracted where possible but by default won’t overwrite metadata for an existing URL.

If the user has permission to edit all URLs, then:

  • To overwrite existing metadata, on the Upload document dialog, select Developer options, then select the “Overwrite metadata and document properties (title, docid, publication id/type, description, labels)” option.
  • Any PSML metadata for URLs included in the upload is used instead of extracting it from the URL source.
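
As an illustration of the upload case above, the following is a minimal sketch of a PSML document containing a <link> element. The URL, the link text and the fragment ID are hypothetical. On upload, PageSeeder would attempt to extract metadata for the linked URL unless metadata already exists for it.

```xml
<document level="portable">
  <section id="content">
    <fragment id="1">
      <!-- Hypothetical URL: metadata is extracted for it on upload where possible -->
      <para>See <link href="https://www.example.org/article.html">the example article</link>.</para>
    </fragment>
  </section>
</document>
```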

Following are example URL metadata field names for HTML sources but these vary widely, especially between different domains.

apple-touch-icon
byl
canonical
content-language
description
image
license
news_keywords
og:description
og:image
og:image:alt
og:title
og:type
og:url
pdate
shortcut icon
size
thumbnail
title
twitter:card
twitter:description
twitter:image
twitter:image:alt
twitter:title
twitter:url
url

Configuration

Configuration of a URL type is global for the whole server and is done with the following files in the Global template under the url/[url type] folder:

  • url-config.xml
  • url-template.psml
  • editor-config.xml
  • *.sch (schematron files)

To create these files, go to the Global template page, click Manage global types, then Create URL type. Then click create in the column for that type.

URL config

When URLs of any type are created, the options specified in the URL config are applied. The URL config follows this structure:

<url-config>
  <creation> ... </creation>
  <labeling> ... </labeling>
  <publishing> ... </publishing>
</url-config>

These elements are used to configure the following:

  • <creation> – which domains and media types the type can be used for (required).
  • <labeling> – which labels are available on this URL (optional). It has the same format as for the document config except the labels/@type must be url.
  • <publishing> – which publishing options are available to a particular type (optional).
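
As a sketch of the <labeling> option, the following URL config makes two labels available on URLs of a type. The label names and the type title are hypothetical, and the comma-separated list is assumed to follow the document config format. The type attribute is set to url as required above.

```xml
<url-config>
  <creation>
    <title>Article</title>
  </creation>
  <labeling>
    <!-- Assumed format: comma-separated label names, as in the document config -->
    <labels type="url">reviewed,archived</labels>
  </labeling>
</url-config>
```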

<creation>

The <creation> element has the following structure:

<creation [disable="true"]>
  <title> ... </title>?
  <description> ... </description>?
  <domain name="..."/>*
  <media type="..."/>*
</creation>

All the following elements are optional:

Element          Description
<title>          The title of this URL type
<description>    A description of this URL type
<domain>         Which domains this type can be used on
<media>          Which media types this type can be used for

  • @disable – set this boolean attribute to true to stop URLs of this type from being created. This can be used to disable built-in URL types.
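
For example, a url-config.xml that disables a URL type might be as small as the following sketch. It assumes the file is placed in the url/[url type] folder of the type to be disabled, and that no child elements are needed because they are all optional.

```xml
<url-config>
  <!-- disable="true" stops this URL type from being used for new URLs -->
  <creation disable="true"/>
</url-config>
```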

<title>

The title gives the PageSeeder UI a user-friendly name for the URL type. Unlike the name of the URL type, the title has no character restrictions.

It defaults to the name of the URL type.

<description>

The description is displayed to the user and is a good way to document what the URL type is for. It is recommended that every URL type have a description.

<domain> & <media>

The <domain> and <media> elements determine what type is used for which URL based on its domain (for example youtube.com) and media type (for example application/pdf) using the following rules.

The type assigned to a URL is the first URL type, in alphabetical order by name:

  1. With matching domain and media type, then
  2. With matching domain and no media type, then
  3. With no domain and matching media type.

If no type matches, then the default URL type is used.
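
To illustrate the first rule, the following sketch shows a url-config.xml for a hypothetical type named report that specifies both a domain and a media type. The domain example.org and the type name are assumptions for illustration only.

```xml
<url-config>
  <creation>
    <title>Report</title>
    <description>PDF reports hosted on example.org.</description>
    <!-- A PDF served from example.org matches both the domain and the media type (rule 1) -->
    <domain name="example.org"/>
    <media type="application/pdf"/>
  </creation>
</url-config>
```

A type that declares only <media type="application/pdf"/> and no <domain> would be considered later, under rule 3.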

Following is an example url-config.xml file.

<url-config>
  <creation>
    <title>YouTube</title>
    <description>Allows YouTube videos to be played in PageSeeder.</description>
    <domain name="youtu.be"/>
    <domain name="youtube.com"/>
    <domain name="www.youtube.com"/>
  </creation>
</url-config>

URL template

The url-template.psml controls the processing of the metadata fields for each URL type. By default, there are no url-template.psml files.

They follow the same format as document templates except that @level on <document> must be metadata. The metadata fields are inserted by using {$meta.[field name]} for attributes and <t:value name="meta.[field name]" /> for content.

After creating or modifying a URL template, all relevant URLs can be updated on the Global template page. Click the Manage global types button then the Reprocess button for the type and choose one or more of the following options:

  • “Overwrite URL properties (title, description, labels)”
  • “Update the type of existing URLs to this one if their domain/media type matches”
  • Select the metadata properties to have their value updated.

When reprocessing the metadata, properties that are no longer in the template are always deleted, and new properties that are not selected for update are added with empty values.

Defaults

Any of the following that are not specified in the template are extracted from the URL source metadata. This is the case for the default URL type unless it has been overridden in the Global template.

  • uri/@title is set from twitter:title if it exists, otherwise og:title, otherwise dc:title, otherwise title.
  • uri/description is set from twitter:description if it exists, otherwise og:description, otherwise dc:description, otherwise dc:description.abstract, otherwise description.
  • uri/@size is set from size.
  • uri/@mediatype is set from media-type.
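
As a sketch of how these defaults interact with a template, the following url-template.psml specifies only the title. The type name article is hypothetical. Under the rules above, the description, size and media type are not specified, so they would be expected to come from the default extraction.

```xml
<document type="article"
          level="metadata"
          xmlns:t="http://pageseeder.com/psml/template">
  <documentinfo>
    <!-- The title is specified here, so the default chain starting at twitter:title is not used for it -->
    <uri title="{$meta.og:title}"/>
  </documentinfo>
</document>
```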

Following is an example url-template.psml file.

<document type="video"
          level="metadata"
          xmlns:t="http://pageseeder.com/psml/template">
  <documentinfo>
    <uri title="{$meta.twitter:title}">
      <description>
        <t:value name="meta.twitter:description" />
      </description>
      <labels>video</labels>
    </uri>
  </documentinfo>
  <metadata>
    <properties>
      <property name="content" title="Content"
                value="{$meta.og:video:content}" />
      <property name="width" title="Width"
                value="{$meta.og:video:width}" />
      <property name="height" title="Height"
                value="{$meta.og:video:height}" />
      <property name="format" title="Format"
                value="{$meta.og:video:type}" />
      <property name="category" title="Category"
                multiple="true">
        <value>
          <t:value name="meta.category" />
        </value>
      </property>
    </properties>
  </metadata>
</document>

Editor config

Metadata for a URL can be edited through the Document info & metadata panel. Alternatively, the Edit sheet supports bulk editing of properties in a grid view.

The metadata editor behavior is configurable through the editor-config.xml for the editor name PSMLMetadata using the same options as the PSML properties editor.

Following is an example editor-config.xml file.

<editor-configs>
  <editor-config name="PSMLMetadata">
    <field name="width" type="text" label="Width"
           pattern="[0-9]+" />
    <field name="height" type="text" label="Height"
           pattern="[0-9]+" />
    <field name="format" type="select" label="Format">
      <value>video/3gpp</value>
      <value>video/mp4</value>
      <value>video/mpeg</value>
      <value>video/quicktime</value>
      <value>video/vivio</value>
    </field>
  </editor-config>
</editor-configs>