 Configuration

Configuration manual for PageSeeder

URL types

URL types can be used to:

  • Store metadata about any URL using all the features of PSML properties.
  • Extract, and store as editable properties, metadata from a URL where the source is HTML.
  • Display the correct audio or video player for certain built-in types.

The built-in URL types include SoundCloud, Spotify, YouTube and Vimeo.

URL types can only be used from the version 6 user interface in PageSeeder v5.99 or higher.

Extraction

PageSeeder can extract metadata when a URL is created. For existing URLs, use the Reprocess option under Manage global types on the Global template administration page.

To preview the raw metadata, click show more on the display page for a URL. Alternatively, the metadata properties derived from the raw metadata can be viewed anywhere the Document properties are displayed.

When uploading PSML containing <link> elements, the URL metadata is automatically extracted where possible. By default, this does not overwrite the metadata of an existing URL.

If the user has permission to edit all URLs then:

  • Ticking Overwrite metadata under Developer options on the Document upload dialogue overwrites existing metadata.
  • Any PSML metadata for URLs included in the upload will be used instead of extracting it from the URL source.
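
As a rough illustration of the upload case, the following is a minimal sketch of a PSML document containing a <link> element. The URL, the IDs and the text are hypothetical placeholders, not values from PageSeeder.

```xml
<!-- Hypothetical PSML upload: the URL, IDs and text are placeholders -->
<document level="portable">
  <section id="content">
    <fragment id="1">
      <para>See the <link href="https://example.org/article.html">example article</link>.</para>
    </fragment>
  </section>
</document>
```

On upload, metadata for the linked URL would be extracted where possible, subject to the overwrite rules described above.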

The following are example URL metadata field names for HTML sources. The available names vary widely, especially between domains.

apple-touch-icon
byl
canonical
content-language
description
image
license
news_keywords
og:description
og:image
og:image:alt
og:title
og:type
og:url
pdate
shortcut icon
size
thumbnail
title
twitter:card
twitter:description
twitter:image
twitter:image:alt
twitter:title
twitter:url
url

Configuration

Configuration of a URL type is global for the whole server and is done with the following files in the Global template under the url/[url type] folder:

  • url-config.xml
  • url-template.psml
  • editor-config.xml
  • *.sch (schematron files)

To create these files, click Create URL type under Manage global types on the Global template administration page, then click create in the column for the file.

URL config

When URLs of any type are created, the options specified in the URL config are applied. The URL config follows this structure:

<url-config>
  <creation> ... </creation>
  <labeling> ... </labeling>
  <publishing> ... </publishing>
</url-config>

These elements are used to configure the following:

  • <creation> – which domains and media types the type can be used for (required).
  • <labeling> – which labels are available on this URL (optional). It has the same format as for the document config except the labels/@type must be url.
  • <publishing> – which publishing options should be available to a particular type (optional).
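
The following is a minimal sketch of a URL config with a <labeling> element. It assumes that labels are given as a comma-separated list, as in the document config; the title and label names are hypothetical.

```xml
<url-config>
  <creation>
    <title>Video</title>
  </creation>
  <labeling>
    <!-- @type must be url; the label names are hypothetical -->
    <labels type="url">reviewed,archived</labels>
  </labeling>
</url-config>
```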

<creation>

The <creation> element has the following structure:

<creation [disable="true"]>
  <title> ... </title>?
  <description> ... </description>?
  <domain name="..."/>*
  <media type="..."/>*
</creation>

All of the following elements are optional:

Element          Description
<title>          The title of this URL type.
<description>    A description of this URL type.
<domain>         The domains this type can be used on.
<media>          The media types this type can be used for.
  • @disable – set this boolean attribute to true to stop the creation of URLs of this type. This can be used to disable built-in URL types.
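
A minimal sketch of a URL config that disables a type is shown below. It assumes the file is saved as url-config.xml in the folder of the URL type to be disabled.

```xml
<url-config>
  <!-- Stops URLs of this type from being created -->
  <creation disable="true"/>
</url-config>
```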

<title>

The title gives the URL type a user-friendly name in the PageSeeder UI. Unlike the name of the type, the title has no character restrictions.

It defaults to the name of the URL type.

<description>

The description is displayed to users and is a good way to document what the URL type is for. It is recommended that every URL type have a description.

<domain> & <media>

The <domain> and <media> elements determine which type is used for a URL, based on its domain (for example youtube.com) and media type (for example application/pdf), using the following rules.

The type assigned to a URL is the first URL type, in alphabetical order by name:

  1. with matching domain and media type, then
  2. with matching domain and no media type, then
  3. with no domain and matching media type.

If no type matches, the default URL type is used.

Following is an example url-config.xml file.

<url-config>
  <creation>
    <title>YouTube</title>
    <description>Allows YouTube videos to be played in PageSeeder.</description>
    <domain name="youtu.be"/>
    <domain name="youtube.com"/>
    <domain name="www.youtube.com"/>
  </creation>
</url-config>
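
The example above matches on domain only. For comparison, here is a minimal sketch of a type that matches on media type only. The type name pdf, its folder, its title and its description are hypothetical.

```xml
<!-- url/pdf/url-config.xml: hypothetical type named "pdf" -->
<url-config>
  <creation>
    <title>PDF</title>
    <description>PDF files hosted on any domain.</description>
    <media type="application/pdf"/>
  </creation>
</url-config>
```

Read against the rules above, and assuming both files exist on the same server, a PDF served from another domain would fall through to rule 3 and get this type. A PDF served from youtube.com would still get the YouTube type under rule 2, because that type names the domain and no media type.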

URL template

The url-template.psml controls the processing of the metadata fields for each URL type. By default there are no url-template.psml files.

They follow the same format as document templates except that @level on <document> must be metadata. The metadata fields are inserted by using {$meta.[field name]} for attributes and <t:value name="meta.[field name]" /> for content.

After creating or modifying a URL template, all relevant URLs can be updated using the Reprocess link on the Manage global types page and choosing one or more of the following options:

  • "Overwrite URL properties (title, description, labels)"
  • "Update the type of existing URLs to this one if their domain/media type matches"
  • Select the metadata properties to have their value updated.

When reprocessing, metadata properties that are no longer in the template are always deleted, and new properties that are not selected to be updated are added empty.

Defaults

Any of the following that are not specified in the template will be extracted from the URL source metadata. This is the case for the default URL type unless it has been overridden in the Global template.

  • uri/@title: will be set from twitter:title if it exists, otherwise og:title, otherwise dc:title, otherwise title.
  • uri/description: will be set from twitter:description if it exists, otherwise og:description, otherwise dc:description, otherwise dc:description.abstract, otherwise description.
  • uri/@size: will be set from size.
  • uri/@mediatype: will be set from media-type.

Following is an example url-template.psml file.

<document type="video" level="metadata"
          xmlns:t="http://pageseeder.com/psml/template">
  <documentinfo>
    <uri title="{$meta.twitter:title}">
       <description>
         <t:value name="meta.twitter:description" />
       </description>
       <labels>video</labels>
    </uri>
  </documentinfo>
  <metadata>
    <properties>
      <property name="content" title="Content"
                value="{$meta.og:video:content}" />
      <property name="width" title="Width"
                value="{$meta.og:video:width}" />
      <property name="height" title="Height"
                value="{$meta.og:video:height}" />
      <property name="format" title="Format"
                value="{$meta.og:video:type}" />
      <property name="category" title="Category" multiple="true">
        <value><t:value name="meta.category" /></value>
      </property>
    </properties>
  </metadata>
</document>
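
By contrast, a template can leave out the <uri> title and description and rely on the defaults listed above. The following is a minimal sketch under that assumption. The type name article and the property names are hypothetical, and it has not been confirmed that <documentinfo> can be omitted entirely.

```xml
<document type="article" level="metadata"
          xmlns:t="http://pageseeder.com/psml/template">
  <!-- No uri title or description here, so the defaults would apply -->
  <metadata>
    <properties>
      <property name="language" title="Language"
                value="{$meta.content-language}" />
      <property name="license" title="License"
                value="{$meta.license}" />
    </properties>
  </metadata>
</document>
```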

Editor config

Metadata for a URL can be edited through the Information panel available on the Document View page. Alternatively, the Edit sheet, available on the Search page, supports bulk editing of properties in a grid view.

The metadata editor behavior is configurable through the editor-config.xml for the editor name PSMLMetadata using the same options as the PSML properties editor.

Following is an example editor-config.xml file.

<editor-configs>
  <editor-config name="PSMLMetadata">
    <field name="width" type="text" label="Width"
           pattern="[0-9]+" />
    <field name="height" type="text" label="Height"
           pattern="[0-9]+" />
    <field name="format" type="select" label="Format">
      <value>video/3gpp</value>
      <value>video/mp4</value>
      <value>video/mpeg</value>
      <value>video/quicktime</value>
      <value>video/vnd.vivo</value>
    </field>
  </editor-config>
</editor-configs>

Schematron

Coming soon.
