How to create a Berlioz index using PSML content

Skills required: XML
Time required (minutes): 15
Intended audience: Developer
Difficulty: Easy
Category: Document

Objective

This tutorial will explain how to create a Berlioz index using PSML content.

Prerequisites:

PageSeeder (version 5.7 or above) installed; see the Installation guide for setting up PageSeeder;

Jetty with the Berlioz webapp installed;

PSML content published to the psml folder of the webapp. This tutorial assumes that the folder is located under /WEB-INF/psml (the default location).
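The stylesheet in this tutorial indexes PSML documents of type film. For orientation, such a document might look something like the sketch below. The section IDs (details, summary, image) are the ones the indexing templates match on; the property names, values, fragment IDs and image path are invented for illustration and are not taken from real content.

```xml
<!-- Hypothetical film document; all names and values below are illustrative only -->
<document type="film">
  <documentinfo>
    <uri title="Metropolis" />
  </documentinfo>
  <section id="details">
    <properties-fragment id="1">
      <!-- Single-valued property: the value is held in the @value attribute -->
      <property name="director" title="Director" value="Fritz Lang" />
      <!-- Multi-valued property: each value is held in a child <value> element -->
      <property name="genre" title="Genre" multiple="true">
        <value>Science fiction</value>
        <value>Drama</value>
      </property>
    </properties-fragment>
  </section>
  <section id="summary">
    <fragment id="2">
      <para>A short description of the film.</para>
    </fragment>
  </section>
  <section id="image">
    <fragment id="3">
      <image src="images/metropolis.jpg" />
    </fragment>
  </section>
</document>
```

The two property shapes matter later on: the details template handles a property with child value elements differently from one that only carries a value attribute.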

Tutorial

Implement indexing XSLT

The indexing XSLT transforms the PSML content into XML that Berlioz parses to build the index. Create the stylesheet in the ixml folder, at WEB-INF/ixml/default.xsl. The output format is ixml, which is defined by a DTD. This example uses ixml version 3.0, so the output declaration should be the following:

<xsl:output method="xml" indent="no" encoding="utf-8"
            doctype-public="-//Weborganic//DTD::Flint Index Documents 3.0//EN"
            doctype-system="http://weborganic.org/schema/flint/index-documents-3.0.dtd"/>
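As a minimal sketch, the declaration sits at the top level of the stylesheet, inside the xsl:stylesheet root element. The wrapper below is an assumption rather than something the tutorial prescribes: version 2.0 is chosen because the title field later in this tutorial uses an XPath 2.0 if/then/else expression, which requires an XSLT 2.0 processor.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Assumed wrapper for WEB-INF/ixml/default.xsl -->
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Output declaration for ixml 3.0 -->
  <xsl:output method="xml" indent="no" encoding="utf-8"
              doctype-public="-//Weborganic//DTD::Flint Index Documents 3.0//EN"
              doctype-system="http://weborganic.org/schema/flint/index-documents-3.0.dtd"/>

  <!-- The parameters and templates from the rest of this tutorial go here -->

</xsl:stylesheet>
```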

The Berlioz indexer passes the following parameters, which describe the document being indexed:

<xsl:param name="_src"          />
<xsl:param name="_path"         />
<xsl:param name="_filename"     />
<xsl:param name="_visibility"   />
<xsl:param name="_lastmodified" />
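If you want to try the stylesheet outside Berlioz (for example with a command-line XSLT processor), one option is to give each parameter a fallback value. A value supplied by the caller always overrides the default in select, so this variant should behave the same inside Berlioz. The placeholder strings below are purely hypothetical and do not reflect the format of the values Berlioz actually sends.

```xml
<!-- Hypothetical fallbacks, used only when the caller supplies no value -->
<xsl:param name="_src"          select="'placeholder-src'" />
<xsl:param name="_path"         select="'placeholder-path'" />
<xsl:param name="_filename"     select="'placeholder-filename'" />
<xsl:param name="_visibility"   select="'placeholder-visibility'" />
<xsl:param name="_lastmodified" select="'placeholder-lastmodified'" />
```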

In our example, the content to be indexed consists of PSML documents of type film. The PSML properties are indexed as separate fields, along with the summary section (indexed as the "about" field), the links and the image (if there is one):

<xsl:template match="/">
  <documents version="3.0">
    <document>
      <!-- Common fields -->
      <field name="_src"          tokenize="false" stored="false"><xsl:value-of select="$_src"/></field>
      <field name="_path"         tokenize="false"><xsl:value-of select="$_path"/></field>
      <field name="_filename"     tokenize="false"><xsl:value-of select="$_filename"/></field>
      <field name="_lastmodified" tokenize="false"><xsl:value-of select="$_lastmodified"/></field>
      <field name="_visibility"   tokenize="false"><xsl:value-of select="$_visibility"/></field>
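      <!-- The context node here is the document root, so step into the root document element first -->
      <field name="type"          tokenize="false"><xsl:value-of select="document/@type"/></field>
      <field name="title"         tokenize="false"><xsl:value-of select="if (.//heading) then (.//heading)[1] else
                                                                         if (document/documentinfo/uri/@title) then document/documentinfo/uri/@title else $_filename"/></field>
      <!-- Index the sections of film documents using the templates below -->
      <xsl:apply-templates select="document[@type = 'film']/section" />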
    </document>
  </documents>
</xsl:template>

<!-- Details -->
<xsl:template match="section[@id='details']">
  <xsl:for-each select="properties-fragment/property">
    <xsl:choose>
      <xsl:when test="value">
        <xsl:variable name="field" select="@name" />
        <xsl:for-each select="value">
          <field name="{$field}" tokenize="false"><xsl:value-of select="."/></field>
        </xsl:for-each>
      </xsl:when>
      <xsl:otherwise>
        <field name="{@name}" tokenize="false"><xsl:value-of select="@value"/></field>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>
</xsl:template>

<!-- Summary -->
<xsl:template match="section[@id='summary']">
  <field name="about"><xsl:value-of select="string(.)" /></field>
</xsl:template>

<!-- Images -->
<xsl:template match="section[@id='image']">
  <xsl:if test=".//image">
    <field name="image" tokenize="false"><xsl:value-of select="(.//image)[1]/@src" /></field>
  </xsl:if>
</xsl:template>
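One thing to watch for: the root template applies templates to every section of a film document, but only the details, summary and image sections have a dedicated template. For any other section, XSLT's built-in rules copy the section's text straight into the output, outside any field element. If your documents contain such sections, a catch-all template along the following lines should suppress them. This is an optional addition, not part of the original stylesheet.

```xml
<!-- Hypothetical catch-all: output nothing for sections without a dedicated template -->
<xsl:template match="section" />
```

This relies on XSLT's default priorities: a pattern with a predicate such as section[@id='details'] has a higher default priority (0.5) than the bare pattern section (0), so the three specific templates still take precedence.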
