Configuration

Configuration manual for PageSeeder

Media types

Overview

Media types can be used to:

  • Store metadata about any non-PSML file using all the features of PSML properties.
  • Extract metadata from the following binary file types and store it as properties:
    • Microsoft Office (.docx, .xlsx, .pptx)
    • PDF (.pdf)
    • Images (.jpg, .jpeg, .gif, .png)

Extraction

Metadata can be extracted when a file is loaded into PageSeeder, or later by reprocessing the metadata of existing files from the Dev > Document config page.

Before extraction, the raw metadata can be viewed via the 'eye' icon (in the Developer perspective) on the Upload documents or Document properties pages.

Below are the field names of metadata that can be extracted for each file type.

Microsoft Office

dc-title
dc-description
dc-creator
dc-subject
cp-keywords
cp-category
cp-revision
cp-version
dcterms-created
dcterms-modified

PDF

docinfo-title
docinfo-subject
docinfo-keywords
docinfo-author
docinfo-creator
docinfo-producer
docinfo-creationdate
docinfo-moddate
docinfo-[any custom property]

Images

iptc-keywords
exif-image-description
exif-user-comment
exif-artist
exif-date-time
exif-date-time-original
exif-image-width
exif-image-height
exif-x-resolution (dpi)
exif-y-resolution (dpi)
exif-copyright
exif-focal-length (mm)
exif-f-number
exif-iso-speed-ratings
exif-gps-altitude (meters)
exif-gps-altitude-ref
exif-gps-latitude
exif-gps-latitude-ref
exif-gps-longitude
exif-gps-longitude-ref
exif-gps-dest-latitude
exif-gps-dest-latitude-ref
exif-gps-dest-longitude
exif-gps-dest-longitude-ref

Note

The following clean-up is applied to metadata values when they are used in document templates:

  • cp-keywords, docinfo-keywords, iptc-keywords:
    • replace ';' with ','
    • then remove any spaces before or after ','
    • then replace any other non-label characters with '_'
    • then truncate to 250 characters at the last comma.
  • dc-title, docinfo-title, exif-image-description:
    • truncate to 250 characters.

Configuration

Which metadata is extracted and stored can be configured using media-template.psml files. These are listed at the bottom of the Document config page, which is located under the Dev tab in the Developer perspective.

Below are the default media-template.psml files for different file extensions. They follow the same format as document templates, except that @level on <document> must be "metadata".

Metadata fields are inserted using {$meta.[field name]} in attribute values and <t:value name="meta.[field name]" /> in element content.

To change these, click Create media type and override one of the existing media templates. Once a media template has been created or modified, all the associated documents can be updated by using the reprocess link and choosing either:

  • Add new metadata properties only (preserve existing), or
  • Overwrite all metadata and document properties (title, docid, description, labels)
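
As an illustration of such an override, here is a minimal sketch of a custom media-template.psml for .pdf files. It reuses the default .pdf template and adds two properties. The field names docinfo-producer and docinfo-creationdate come from the PDF field list above; the property names "producer" and "created" and their titles are hypothetical choices, not part of any default.

```xml
<!-- Sketch of an overriding media-template.psml for .pdf files.
     The property names "producer" and "created" are illustrative only. -->
<document xmlns:t="http://pageseeder.com/psml/template" level="metadata">
  <documentinfo>
    <uri title="{$meta.docinfo-title}">
      <description><t:value name="meta.docinfo-subject" /></description>
      <labels><t:value name="meta.docinfo-keywords" /></labels>
    </uri>
  </documentinfo>
  <metadata>
    <properties>
      <property name="author" title="Author" value="{$meta.docinfo-author}" />
      <!-- Added: extra PDF fields, inserted with the attribute syntax -->
      <property name="producer" title="Producer" value="{$meta.docinfo-producer}" />
      <property name="created" title="Created" value="{$meta.docinfo-creationdate}" />
    </properties>
  </metadata>
</document>
```

After saving a template like this, reprocessing with the first option should add the new properties to existing PDF documents while leaving their current values untouched.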

.docx, .pptx, .xlsx

<document xmlns:t="http://pageseeder.com/psml/template" level="metadata">
  <documentinfo>
    <uri title="{$meta.dc-title}">
      <description><t:value name="meta.dc-description" /></description>
      <labels><t:value name="meta.cp-keywords" /></labels>
    </uri>
  </documentinfo>
  <metadata>    
    <properties>
      <property name="author" title="Author" value="{$meta.dc-creator}" />
    </properties>
  </metadata>
</document>

.pdf

<document xmlns:t="http://pageseeder.com/psml/template" level="metadata">
  <documentinfo>
    <uri title="{$meta.docinfo-title}">
      <description><t:value name="meta.docinfo-subject" /></description>
      <labels><t:value name="meta.docinfo-keywords" /></labels>
    </uri>
  </documentinfo>
  <metadata>    
    <properties>
      <property name="author" title="Author" value="{$meta.docinfo-author}" />
    </properties>
  </metadata>
</document>

.gif, .jpg, .png

<document xmlns:t="http://pageseeder.com/psml/template" level="metadata">
  <documentinfo>
    <uri title="{$meta.exif-image-description}">
      <description><t:value name="meta.exif-user-comment" /></description>
      <labels><t:value name="meta.iptc-keywords" /></labels>
    </uri>
  </documentinfo>
  <metadata>    
    <properties>
      <property name="author" title="Author" value="{$meta.exif-artist}" />
      <property name="latitude" title="Latitude"
        value="{$meta.exif-gps-latitude}{$meta.exif-gps-latitude-ref}" />
      <property name="longitude" title="Longitude"
        value="{$meta.exif-gps-longitude}{$meta.exif-gps-longitude-ref}" />
    </properties>
  </metadata>
</document>
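
The image field list also includes dimensions, which the default template does not store. The following sketch extends the default image template with exif-image-width and exif-image-height. The property names "width" and "height" are assumptions, chosen here only because the editor-config.json example later in this section defines fields with those names.

```xml
<!-- Sketch only: the default image template plus two extra properties.
     The property names "width" and "height" are assumptions. -->
<document xmlns:t="http://pageseeder.com/psml/template" level="metadata">
  <documentinfo>
    <uri title="{$meta.exif-image-description}">
      <description><t:value name="meta.exif-user-comment" /></description>
      <labels><t:value name="meta.iptc-keywords" /></labels>
    </uri>
  </documentinfo>
  <metadata>
    <properties>
      <property name="author" title="Author" value="{$meta.exif-artist}" />
      <!-- Added: image dimensions taken from the EXIF fields listed above -->
      <property name="width" title="Width" value="{$meta.exif-image-width}" />
      <property name="height" title="Height" value="{$meta.exif-image-height}" />
    </properties>
  </metadata>
</document>
```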

Note

Extracting metadata can make an upload take longer. To save time when metadata is not required, override the default media template and remove the references to meta.[field name] fields, or disable metadata altogether for an extension by using this media-template.psml:

<document level="metadata">
</document>
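
The other option in this note, keeping a media template but removing its meta references, might look like the sketch below. It assumes that fixed property values are accepted in media templates as they are in document templates; the "status" property and its value are hypothetical.

```xml
<!-- Sketch only: a media template with no meta references.
     The "status" property and its value are hypothetical. -->
<document level="metadata">
  <metadata>
    <properties>
      <property name="status" title="Status" value="unreviewed" />
    </properties>
  </metadata>
</document>
```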

Editing

Metadata can be edited individually via the Edit document properties page (Properties tab on document view) or in bulk via the Edit sheet (accessed via links on Documents, Search or Images pages).

The behavior of the metadata editor can be configured using an editor-config.json file with the editor name PSMLMetadataConfig and the same options as the PSML properties editor. To create this file, click Create media type on the Document config page. Below is an example editor-config.json file.

{
  "PSMLMetadataConfig": {
    "fields": {
      "width": {
        "type": "text",
        "label": "Width",
        "pattern": "[0-9]*"
      },
      "height": {
        "type": "text",
        "label": "Height",
        "pattern": "[0-9]*"
      },
      "hi-res": {
        "type": "xref",
        "label": "Hi-res",
        "autosuggest": {
          "with": {
            "pssubtype": "image"
          }
        }
      },
      "action": {
        "type": "select",
        "label": "Action",
        "values": ["None", "Zoom", "Fullscreen"]
      }
    }
  }
}
