Web service API

How to use PageSeeder's Web service API

group search

/groups/{group}/search [GET]

com.pageseeder.search.Search

Description

This service queries the index of a single group using a question, optionally combined with facets, filters and ranges.

For additional information, see Index fields and Index field definitions.

Question

This parameter performs a full-text search on the index. It is mostly useful for tokenized fields such as pstitle and pscontent. The question is usually based on user input and typically consists of one or more terms separated by spaces.

The questionfields parameter is a comma-separated list of the fields in which to search for the terms of the question. By default, the question is processed as a full-text search over the title (pstitle) and content (pscontent) fields. Fields listed in questionfields should be analyzed and indexed; a field must also be stored if its contents are to be retrieved rather than only searched.

The Lucene query produced will be the equivalent of:

+(field1:term1 field1:term2 field2:term1 field2:term2 ...)
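As a rough illustration, the Python sketch below expands a question and a list of question fields into a predicate of that shape. The function name question_predicate is hypothetical, and the sketch simply splits on whitespace, whereas the service analyzes the question text before querying the index.

```python
from __future__ import annotations


def question_predicate(question: str, fields: list[str]) -> str:
    """Build a +(field:term ...) predicate in which every term is tried in every field.

    Hypothetical helper for illustration: it splits on whitespace and does not
    analyze (tokenize or normalise) terms the way the real index does.
    """
    terms = question.split()
    # Outer loop over fields gives field1:term1 field1:term2 field2:term1 ...
    clauses = [f"{field}:{term}" for field in fields for term in terms]
    # The group as a whole is required (+); the clauses inside it are ORed.
    return "+(" + " ".join(clauses) + ")"


print(question_predicate("hello world", ["pstitle", "pscontent"]))
# +(pstitle:hello pstitle:world pscontent:hello pscontent:world)
```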

Boolean operators

Full-text search using the question parameter supports the AND and OR operators, which are parsed by a simple lexer. The default operator is OR.

User input | Normalised | Lucene predicate
--- | --- | ---
cheese crackers | cheese OR crackers | field:cheese field:crackers
rat AND mouse | rat AND mouse | +field:rat +field:mouse
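The two rows of this table can be reproduced by a very small normaliser such as the Python sketch below. It is not PageSeeder's lexer: the function name normalise is hypothetical, it ignores phrases and bracket grouping, and it treats any mix of AND and OR as the default OR.

```python
from __future__ import annotations


def normalise(question: str, field: str) -> tuple[str, str]:
    """Return (normalised question, Lucene predicate) for a simple boolean question.

    Hypothetical sketch: plain terms joined by AND or OR only;
    phrases ("...") and bracket grouping are not supported here.
    """
    tokens = question.split()
    terms = [t for t in tokens if t not in ("AND", "OR")]
    # Only a question whose operators are all AND becomes a conjunction;
    # anything else, including no operator at all, uses the default OR.
    conjunctive = "AND" in tokens and "OR" not in tokens
    normalised = (" AND " if conjunctive else " OR ").join(terms)
    prefix = "+" if conjunctive else ""
    predicate = " ".join(f"{prefix}{field}:{term}" for term in terms)
    return normalised, predicate


print(normalise("cheese crackers", "field"))  # ('cheese OR crackers', 'field:cheese field:crackers')
print(normalise("rat AND mouse", "field"))    # ('rat AND mouse', '+field:rat +field:mouse')
```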

Phrases

Using double quotes around terms specifies a phrase. For example: "cheese crackers".

Grouping

Use brackets to group queries that combine multiple boolean operators. For example: (rat OR mouse) AND cheese.

Facets

This service supports parameters for computing facet values and filtering the results.

The cardinality of each facet value is calculated automatically by counting how many of the overall results match that value. Use the facets parameter to specify the index fields to compute facets for, preferably fields that are indexed but not analyzed.

For example, facets=psauthor instructs the servlet to compute the values of the psauthor field and the number of results for each unique value.

Specifying facets to compute does not affect the search results. To narrow the results by a facet value, use filters.

Wildcard characters

Facet field names can end with a wildcard character '*' which means that facets will be computed for all fields in the result set starting with that name. For example facets=psproperty-* would compute facets for psproperty-year, psproperty-genre, etc. if they occur in the result set.
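The wildcard behaves like a prefix match over the field names present in the result set. The Python sketch below illustrates this; the helper name expand_facet_fields is hypothetical, and returning the matched fields in alphabetical order is an assumption made only to keep the output deterministic.

```python
from __future__ import annotations


def expand_facet_fields(requested: list[str], result_fields: set[str]) -> list[str]:
    """Expand facet names ending in '*' to the matching fields seen in the results.

    Hypothetical helper illustrating prefix matching; not PageSeeder code.
    """
    expanded: list[str] = []
    for name in requested:
        if name.endswith("*"):
            prefix = name[:-1]
            candidates = sorted(f for f in result_fields if f.startswith(prefix))
        else:
            candidates = [name]
        for field in candidates:
            if field not in expanded:  # keep each field once, in request order
                expanded.append(field)
    return expanded


fields_in_results = {"psproperty-year", "psproperty-genre", "pstitle", "psauthor"}
print(expand_facet_fields(["psproperty-*", "psauthor"], fields_in_results))
# ['psproperty-genre', 'psproperty-year', 'psauthor']
```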

Range facets (experimental)

Range facets can be used to count results in multiple ranges for string, numeric or date fields. The brackets have the following meaning:

  • [ = include start term for all ranges (e.g. [0;10;20;30} = 0-9.999, 10-19.999, 20-29.999)
  • ] = include end term in final range (e.g. [0;10;20;30] = 0-9.999, 10-19.999, 20-30)
  • { = exclude start term for all ranges (e.g. {0;10;20;30] = 0.001-10, 10.001-20, 20.001-30)
  • } = exclude end term in final range (e.g. {0;10;20;30} = 0.001-10, 10.001-20, 20.001-29.999)

In other words, with [ each range includes its start and excludes its end (except the final range when ] is used), and with { each range excludes its start and includes its end (except the final range when } is used).

Example:

facets=pstitle-sort:[a;l;zz],psxrefcount:[0;1;10;100;1000;]
facets=psmodifieddate:[2016-01-01;2017-01-01;]
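The bracket rules above can be checked with a small in-memory counter. The Python sketch below is an illustration under stated assumptions: it handles numeric bounds only, does not support an empty (open-ended) last bound such as the trailing ";" in the psxrefcount example, and the function names range_buckets and count_in_ranges are hypothetical.

```python
from __future__ import annotations


def range_buckets(spec: str) -> list[tuple[float, float, bool, bool]]:
    """Split a numeric range facet such as "[0;10;20;30}" into buckets.

    Each bucket is (low, high, include_low, include_high). Hypothetical sketch:
    numeric bounds only, no open-ended (empty) bounds.
    """
    start, end = spec[0], spec[-1]
    bounds = [float(b) for b in spec[1:-1].split(";")]
    buckets = []
    for i in range(len(bounds) - 1):
        last = i == len(bounds) - 2
        if start == "[":
            # Each range keeps its start; only the final range may keep its end.
            include_low, include_high = True, last and end == "]"
        else:
            # Each range drops its start and keeps its end, unless '}' drops the final end.
            include_low, include_high = False, not (last and end == "}")
        buckets.append((bounds[i], bounds[i + 1], include_low, include_high))
    return buckets


def count_in_ranges(values: list[float], spec: str) -> list[int]:
    """Count how many values fall into each bucket of the range facet spec."""
    counts = []
    for low, high, include_low, include_high in range_buckets(spec):
        n = sum(
            1
            for v in values
            if (v >= low if include_low else v > low)
            and (v <= high if include_high else v < high)
        )
        counts.append(n)
    return counts


print(count_in_ranges([0, 5, 10, 20, 30], "[0;10;20;30}"))  # [2, 1, 1]
print(count_in_ranges([0, 5, 10, 20, 30], "{0;10;20;30]"))  # [2, 1, 1]
```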

Interval facets (experimental)

Interval facets can be used to count results within fixed intervals for string, numeric or date fields. The format is x;y|z or x|z, where x is the start term, y is the optional end term and z is the interval (1 = 1 character/number, 2 = 2 characters/numbers, etc.; 1Y = 1 year, 1M = 1 month, 1D = 1 day, etc.). The brackets have the following meaning:

  • [ = include start term for all intervals (e.g. [0;30|10} = 0-9.999, 10-19.999, 20-29.999)
  • ] = include end term in final interval (e.g. [0;30|10] = 0-9.999, 10-19.999, 20-30)
  • { = exclude start term for all intervals (e.g. {0;30|10] = 0.001-10, 10.001-20, 20.001-30)
  • } = exclude end term in final interval (e.g. {0;30|10} = 0.001-10, 10.001-20, 20.001-29.999)

Example:

facets=pstitle-sort:[a|1]
facets=psmodifieddate:[2016-01-01T00:00:00Z|1M]
facets=pstitle-sort:[a;zz|5],psxrefcount:[0;100|10]
facets=psmodifieddate:[2016-01-01T00:00:00Z;2017-01-01T00:00:00Z|1M]

Numeric intervals must have both a start and an end. For other intervals, if there is no end, the intervals are measured forward and backward from the start.

For string intervals, the interval only applies to the first character (e.g. [aa|1] = aa, ba, ca, etc.).
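Judging by the examples, a numeric interval facet is shorthand for a range facet with evenly spaced bounds ([0;30|10} counts the same buckets as [0;10;20;30}). The Python sketch below shows that rewrite and the first-character stepping used for strings. Both function names are hypothetical; the sketch assumes integer bounds and a required end term, and does not cover date intervals such as 1M.

```python
from __future__ import annotations


def interval_to_range_spec(spec: str) -> str:
    """Rewrite a numeric interval facet "[x;y|z}" as the equivalent range facet.

    Hypothetical sketch: integer bounds only, and an end term is required,
    as it is for numeric intervals.
    """
    start, end = spec[0], spec[-1]
    bounds_part, step_part = spec[1:-1].split("|")
    low_text, high_text = bounds_part.split(";")
    low, high, step = int(low_text), int(high_text), int(step_part)
    bounds = list(range(low, high, step)) + [high]  # last interval may be shorter
    return start + ";".join(str(b) for b in bounds) + end


def string_interval_starts(start: str, step: int, count: int) -> list[str]:
    """First terms of successive string intervals: only the first character advances."""
    return [chr(ord(start[0]) + i * step) + start[1:] for i in range(count)]


print(interval_to_range_spec("[0;30|10}"))  # [0;10;20;30}
print(string_interval_starts("aa", 1, 3))   # ['aa', 'ba', 'ca']
```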

Warning!

Range and interval facets are experimental features that may change in future versions, so they should not be used on production systems.

Computing cardinality

There are two methods for computing the cardinality of the facets:

  • Including the facet from the base query in the computation (result facet)
  • Excluding the facet from the base query in the computation (flexible facet)

Result facets

Result facets are more efficient to compute, but they do not allow the user to move "across" result sets or provide a UI that combines facets of the same field. They are useful to help users narrow or widen their search, since the cardinality of each facet value counts the results in the current result set.

Example:

Filter | Facets | Facet values | Result set
--- | --- | --- | ---
(empty) | animal,color | bird:2, cat:1, dog:1, black:2, yellow:1, brown:1 | 1, bird A, yellow; 2, bird B, black; 3, cat C, black; 4, dog D, brown
animal:bird | animal,color | bird:2, black:1, yellow:1 | 1, bird A, yellow; 2, bird B, black
color:black | animal,color | bird:1, cat:1, black:2 | 2, bird B, black; 3, cat C, black
animal:bird,color:black | animal,color | bird:1, black:1 | 2, bird B, black
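The table above can be reproduced in memory: result facets simply count values over the results that match every filter. The Python sketch below uses the four example results; the names RESULTS and result_facets are hypothetical, and it only supports one value per filtered field, which is enough for this example but simpler than the service.

```python
from __future__ import annotations

from collections import Counter

# The four example results from the table: id, animal, color.
RESULTS = [
    {"id": 1, "animal": "bird", "color": "yellow"},
    {"id": 2, "animal": "bird", "color": "black"},
    {"id": 3, "animal": "cat", "color": "black"},
    {"id": 4, "animal": "dog", "color": "brown"},
]


def result_facets(docs: list[dict], filters: dict[str, str], fields: list[str]) -> dict[str, dict[str, int]]:
    """Count facet values over the results that match *all* filters."""
    matching = [d for d in docs if all(d[f] == v for f, v in filters.items())]
    return {field: dict(Counter(d[field] for d in matching)) for field in fields}


print(result_facets(RESULTS, {"color": "black"}, ["animal", "color"]))
# {'animal': {'bird': 1, 'cat': 1}, 'color': {'black': 2}}
```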

Flexible facets

For flexible facets, the base query is recomputed for each facet field so that, if that field is used in a filter, the filter on that field is excluded from the query. The cardinality of each facet value then counts the matching results using this recomputed query.

Example:

Filter | Flexible facets | Facet values | Result set
--- | --- | --- | ---
(empty) | animal,color | bird:2, cat:1, dog:1, black:2, yellow:1, brown:1 | 1, bird A, yellow; 2, bird B, black; 3, cat C, black; 4, dog D, brown
animal:bird | animal,color | bird:2, cat:1, dog:1, black:1, yellow:1 | 1, bird A, yellow; 2, bird B, black
color:black | animal,color | bird:1, cat:1, black:2, yellow:1, brown:1 | 2, bird B, black; 3, cat C, black
animal:bird,color:black | animal,color | bird:1, cat:1, black:1, yellow:1 | 2, bird B, black
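The only change from result facets is that each facet drops the filter on its own field before counting. The Python sketch below reproduces the flexible facet values in the table from the same four example results; the names RESULTS and flexible_facets are hypothetical, and one value per filtered field is assumed for simplicity.

```python
from __future__ import annotations

from collections import Counter

# The four example results from the table: id, animal, color.
RESULTS = [
    {"id": 1, "animal": "bird", "color": "yellow"},
    {"id": 2, "animal": "bird", "color": "black"},
    {"id": 3, "animal": "cat", "color": "black"},
    {"id": 4, "animal": "dog", "color": "brown"},
]


def flexible_facets(docs: list[dict], filters: dict[str, str], fields: list[str]) -> dict[str, dict[str, int]]:
    """Count facet values, ignoring the filter on the facet's own field."""
    facets = {}
    for field in fields:
        # Drop the filter on this facet's own field, keep every other filter.
        relaxed = {f: v for f, v in filters.items() if f != field}
        matching = [d for d in docs if all(d[f] == v for f, v in relaxed.items())]
        facets[field] = dict(Counter(d[field] for d in matching))
    return facets


print(flexible_facets(RESULTS, {"animal": "bird", "color": "black"}, ["animal", "color"]))
# {'animal': {'bird': 1, 'cat': 1}, 'color': {'yellow': 1, 'black': 1}}
```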

Filters

Filters are specific terms that the results must match. They are specified with the filters parameter as a comma-separated list of Lucene index terms, each expressed as [field]:[term].

For example, the filters parameter value psauthor:john,pspriority:High only returns results whose author is 'john' and whose priority is 'High'.

When selecting facets, the Lucene query produced will be the equivalent of:

+(facet1:value1) +(facet2:value2 facet2:value3) ...

The default behaviour of filters is to use:

  • AND for different field names
  • OR for the same field

Filter terms can also be prefixed with '+' (required) or '-' (prohibited) to further qualify them.

User input | Meaning | Lucene predicate
--- | --- | ---
label:abc,label:xyz | label:abc OR label:xyz | label:abc label:xyz
+label:abc,+label:xyz | label:abc AND label:xyz | +label:abc +label:xyz
label:abc,label:xyz,type:mno | (label:abc OR label:xyz) AND type:mno | +(label:abc label:xyz) +type:mno
label:abc,-type:mno | label:abc NOT type:mno | label:abc -type:mno
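The grouping rules (OR within a field, AND across fields, '+' and '-' passed through) can be sketched as follows. This Python function is a hypothetical illustration of the query shape, not the service's parser: it does not handle escaped commas, and it always wraps unprefixed terms in a required group, so its output is equivalent to, rather than identical with, some predicates in the table.

```python
from __future__ import annotations


def filters_to_lucene(filters: str) -> str:
    """Turn a filters value such as "label:abc,label:xyz,type:mno" into a Lucene query.

    Hypothetical sketch: unprefixed terms on the same field are ORed inside a required
    group, different fields are ANDed, and '+'/'-' terms are passed through as-is.
    Escaped commas (\\,) are not handled.
    """
    groups: dict[str, list[str]] = {}
    qualified: list[str] = []
    for item in filters.split(","):
        if not item:
            continue
        if item[0] in "+-":
            qualified.append(item)  # explicitly required or prohibited term
        else:
            field, _, term = item.partition(":")
            groups.setdefault(field, []).append(f"{field}:{term}")
    required_groups = ["+(" + " ".join(terms) + ")" for terms in groups.values()]
    return " ".join(required_groups + qualified)


print(filters_to_lucene("label:abc,label:xyz,type:mno"))  # +(label:abc label:xyz) +(type:mno)
print(filters_to_lucene("label:abc,-type:mno"))           # +(label:abc) -type:mno
```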

Ranges

The ranges parameter can be used to filter using a set of ranges. It is a comma separated list of range searches with format:

field:[|{(lower)?;(upper)?}|]

where [ or ] means include limit value and { or } means exclude limit value. For example:

ranges=psproperty-expires:[2015-03-20;2016-01-01],psproperty-title:[A;C}
ranges=psxrefcount:{50;],psreversexrefcount:[;10]
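Each entry follows the field:[lower;upper] pattern, with an empty limit leaving that side open. The Python sketch below parses one entry and rewrites it in Lucene's classic range syntax purely to show the meaning of the brackets; the function name range_to_lucene is hypothetical, and the query the service builds internally (see the ranges element of the Response) may look different.

```python
import re

# field:<open bracket><lower>;<upper><close bracket>, either limit may be empty
RANGE = re.compile(r"([^:]+):([\[{])([^;]*);([^;]*)([\]}])")


def range_to_lucene(spec: str) -> str:
    """Rewrite one ranges entry in Lucene's classic range syntax (illustrative only)."""
    match = RANGE.fullmatch(spec)
    if match is None:
        raise ValueError(f"invalid range: {spec!r}")
    field, opening, lower, upper, closing = match.groups()
    # An empty limit leaves that side open, written as * in Lucene syntax.
    return f"{field}:{opening}{lower or '*'} TO {upper or '*'}{closing}"


for entry in "psxrefcount:{50;],psreversexrefcount:[;10]".split(","):
    print(range_to_lucene(entry))
# psxrefcount:{50 TO *]
# psreversexrefcount:[* TO 10]
```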

Date ranges

This service does not support the from, last or to parameters of the generic search. Use the ranges parameter to perform a date range search on the last modified date (psmodifieddate) instead, for example:

Generic search | Ranges
--- | ---
from=2017-04-03T14:00 | psmodifieddate:[2017-04-03T14:00:00;]
from=2017-04-03&to=2017-05-03 | psmodifieddate:[2017-04-03;2017-05-03]

To replace the last parameter, you need to compute the lower end of the range from the current date and time.
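Computing that lower end is simple date arithmetic. The Python sketch below builds a ranges value for "modified in the last N days or hours"; the function name last_to_range is hypothetical, and it assumes timestamps are sent without a time zone in the format used in the table above, interpreted consistently with the dates stored in the index.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def last_to_range(period: timedelta, now: datetime | None = None) -> str:
    """Build a ranges value covering `period` up to now on psmodifieddate.

    Assumption: the timestamp format and time zone match those of the index.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    lower = now - period
    # Open upper limit: everything modified since `lower`.
    return "psmodifieddate:[" + lower.strftime("%Y-%m-%dT%H:%M:%S") + ";]"


print(last_to_range(timedelta(days=30), datetime(2017, 5, 3, 14, 0)))
# psmodifieddate:[2017-04-03T14:00:00;]
```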

Size ranges

The service does not support the min-size or max-size parameters. Use a range search on the pssize field instead.

Organizing search results

Search results can be paged and ordered.

The page and pagesize parameters control which part of the search results is returned. They must be positive integers and default to 1 and 100 respectively.
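The paging attributes of the results element in the Response section (page 3 of 645 results with 100 per page gives 7 pages and results 201 to 300) are consistent with the arithmetic below. This Python function is a hypothetical sketch checked only against that one example, not the service's own code.

```python
def page_window(page: int, page_size: int, total_results: int) -> dict:
    """Derive total-pages, first-result and last-result for one page of results."""
    if page < 1 or page_size < 1:
        raise ValueError("page and pagesize must be positive integers")
    total_pages = (total_results + page_size - 1) // page_size  # ceiling division
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total_results)  # the final page may be partial
    return {"total-pages": total_pages, "first-result": first, "last-result": last}


print(page_window(3, 100, 645))
# {'total-pages': 7, 'first-result': 201, 'last-result': 300}
```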

The sortfields parameter is a comma-separated list of fields used to sort the results; if it is not specified, results are sorted by relevance.

Parameters

Name | Description | Required | Type | Default value
--- | --- | --- | --- | ---
facets | A comma-separated list of fields to use as result facets. | no | strings | -
facetsize | The maximum number of facet values to load (max 1000). | no | integer | 10
filters | A comma-separated list of field:term pairs to use as filters. | no | strings | -
flexiblefacets | A comma-separated list of fields to use as flexible facets. | no | strings | -
fieldsize | The maximum number of characters allowed in a result field (max 10000). | no | integer | 1000
page | The current page to view. | no | integer | 1
pagesize | The number of results per page. | no | integer | 100
question | The question to search for. | no | string | -
questionfields | A comma-separated list of fields to search the question in. | no | strings | pstitle,pscontent
ranges | A comma-separated list of range searches. | no | strings | -
sortfields | A comma-separated list of fields to sort the results by. | no | strings | -
suggestsize | The maximum number of suggestions to load (only for question queries). | no | integer | 20
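Since this is a GET service, the parameters above travel in the query string. The Python sketch below builds the relative path and query string with the standard library; the group name acme-docs is hypothetical, and the host and any service prefix placed in front of /groups/{group}/search depend on the deployment and are deliberately left out.

```python
from urllib.parse import quote, urlencode


def search_path(group: str, **params: object) -> str:
    """Relative path and query string for /groups/{group}/search.

    The host and any service prefix in front of this path are deployment-specific
    and deliberately left out.
    """
    return f"/groups/{quote(group, safe='')}/search?{urlencode(params)}"


print(search_path("acme-docs", question="cheese crackers", facets="psauthor", pagesize=20))
# /groups/acme-docs/search?question=cheese+crackers&facets=psauthor&pagesize=20
```

Note that urlencode percent-encodes the commas in list-valued parameters (for example facets=animal,color becomes facets=animal%2Ccolor), which is standard URL encoding.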

Escaping characters

In the facets, flexiblefacets, filters and ranges parameters, the characters ',', ';', '|' and '\' must be preceded by a backslash (e.g. pstitle:one\,two\\three\|four\;five means the term one,two\three|four;five).
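A client building these parameters from arbitrary terms needs to apply that escaping itself. The Python sketch below does so character by character and joins (field, term) pairs into a filters value; both function names are hypothetical helpers, and field names are left unescaped on the assumption that they follow the field-name pattern given under questionfields.

```python
from __future__ import annotations


def escape_term(term: str) -> str:
    """Prefix ',', ';', '|' and '\\' with a backslash, as the parameters require."""
    return "".join("\\" + ch if ch in ",;|\\" else ch for ch in term)


def filter_value(pairs: list[tuple[str, str]]) -> str:
    """Join (field, term) pairs into a filters value, escaping each term."""
    return ",".join(f"{field}:{escape_term(term)}" for field, term in pairs)


print(filter_value([("pstitle", "one,two\\three|four;five")]))
# pstitle:one\,two\\three\|four\;five
```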

question

An optional parameter specifying the terms to search for in the text. It can include the boolean operators AND and OR, phrases, and grouping with brackets.

questionfields

A comma-separated list of fields in which to search for the question. Each field name must match the regular expression ^([a-zA-Z0-9\-]+)$. If not specified, the default fields for the question are "pstitle,pscontent".
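A client can check field names against the same pattern before sending a request. The Python sketch below applies it with re.fullmatch (equivalent to the anchored ^...$ form, but without $ accepting a trailing newline) and falls back to the documented defaults; the function names are hypothetical, and rejecting the whole value on one bad name is an assumption about desirable client behaviour, not the service's.

```python
from __future__ import annotations

import re

FIELD_NAME = re.compile(r"[a-zA-Z0-9\-]+")  # same as ^([a-zA-Z0-9\-]+)$, via fullmatch
DEFAULT_QUESTION_FIELDS = ["pstitle", "pscontent"]


def is_valid_field(name: str) -> bool:
    return FIELD_NAME.fullmatch(name) is not None


def question_fields(value: str) -> list[str]:
    """Split a questionfields value, falling back to the documented defaults."""
    if not value:
        return list(DEFAULT_QUESTION_FIELDS)
    fields = value.split(",")
    invalid = [f for f in fields if not is_valid_field(f)]
    if invalid:
        raise ValueError(f"invalid question fields: {invalid}")
    return fields


print(question_fields("pstitle,psproperty-year"))  # ['pstitle', 'psproperty-year']
```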

facets

A comma-separated list of field names to include as facets of the current result set.

flexiblefacets

A comma-separated list of field names to include as facets of the result set excluding any filter based on these field names.

facetsize

The maximum number of facet values returned per facet or flexible facet. Setting this to zero should improve performance as no values are computed, but the has-results attribute is still returned.
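The effect of facetsize on the facet element of the Response can be sketched in memory: the most frequent values are kept up to facetsize, while has-results is still reported when facetsize is zero. In the Python sketch below, the function name is hypothetical, and reading total-terms as the number of distinct values and has-results as "at least one value exists" are assumptions, not documented definitions.

```python
from __future__ import annotations

from collections import Counter

MAX_FACET_SIZE = 1000  # maximum stated for facetsize in the parameter table


def facet_terms(values: list[str], facetsize: int = 10) -> dict:
    """Top facet values by cardinality, truncated to facetsize (hypothetical sketch)."""
    if not 0 <= facetsize <= MAX_FACET_SIZE:
        raise ValueError("facetsize must be between 0 and 1000")
    counts = Counter(values)
    return {
        # Reported even when facetsize is 0 and no terms are returned.
        "has-results": bool(counts),
        "total-terms": len(counts),  # assumption: number of distinct values
        "terms": counts.most_common(facetsize),
    }


print(facet_terms(["comment", "task", "comment", "document", "comment", "task"], 2))
# {'has-results': True, 'total-terms': 3, 'terms': [('comment', 3), ('task', 2)]}
```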

filters

A comma-separated list of field:term pairs used to filter the results (see Filters).

ranges

A comma-separated list of range searches.

sortfields

A comma-separated list of fields to sort the results by. Prefix a field name with a minus (-) to sort in descending order; otherwise the order is ascending. If not specified, results are sorted by relevance.
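The minus-prefix convention maps naturally onto (field, descending) pairs, and a multi-field sort can be emulated in memory with successive stable sorts. The Python sketch below shows both; the function names are hypothetical and only illustrate the parameter's semantics, since the actual sorting happens in the index.

```python
from __future__ import annotations

from operator import itemgetter


def parse_sort_fields(value: str) -> list[tuple[str, bool]]:
    """Return (field, descending) pairs; an empty value means sort by relevance."""
    if not value:
        return []
    return [(f[1:], True) if f.startswith("-") else (f, False) for f in value.split(",")]


def apply_sort(docs: list[dict], value: str) -> list[dict]:
    """Sort in-memory results by the given fields (hypothetical illustration)."""
    ordered = list(docs)  # with no sort fields, keep the incoming (relevance) order
    # Stable sorts applied from the last key to the first give a multi-key ordering.
    for field, descending in reversed(parse_sort_fields(value)):
        ordered.sort(key=itemgetter(field), reverse=descending)
    return ordered


print(parse_sort_fields("-psmodifieddate,pstitle-sort"))
# [('psmodifieddate', True), ('pstitle-sort', False)]
```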

suggestsize

The maximum number of suggestions.

page

The current page to view.

pagesize

The number of results per page.

Permission

Permission requirements are to be updated.

Response

The XML response is:

<search [reindexing="true"]>
    <query> ... </query>
    <suggestions> ... </suggestions>
    <facets> ... </facets>
    <results> ... </results>
</search>

Query

<query empty="false"
       predicate="[lucene_predicate]">
    <question empty="false">
        <field boost="1.0"
               name="pscontent"/>
        <field boost="1.0"
               name="psfilename"/>
        <field boost="1.0"
               name="psid"/>
        <field boost="1.0"
               name="psdocid"/>
        <field boost="1.0"
               name="pstitle"/>
        <text>hello</text>
    </question>
    <filters>
        <filter field="[field_name]">
            <term text="[value1]"
                  occur="[must|
                         must_not|
                         should]"/>
            <term text="[value2]"
                  occur="[must|
                         must_not|
                         should]"/>
        </filter> 
        ...
    </filters>
    <ranges>
        <date-range field="[field_name]"
                    from="[from]"
                    to="[to]"/>
        <numeric-range type="integer"
                       field="[field_name]"
                       min="[min]"
                       min-included="[true|false]"
                       max="[max]"
                       max-included="[true|false]"/>
      ...
    </ranges>
    <sort by="relevance"/>
</query>

Suggestions

<suggestions>
    <suggestion question="get"
                cardinality="510"/>
    <suggestion question="list"
                cardinality="408"/>
    <suggestion question="must"
                cardinality="360"/>
    ...
</suggestions>

Facets

<facets>
    <facet name="[field]" 
           type="field"
           flexible="[true|false]"
           has-results="[true|false]"
           total-terms="[total_number_of_terms_for_facets]">
        <term text="comment"
              cardinality="1098"/>
        <term text="task"
              cardinality="388"/>
        <term text="document"
              cardinality="168"/>
        ...
    </facet>
    ...
</facets>

Results

<results page="3"
         page-size="100"
         total-pages="7"
         total-results="645"
         first-result="201" 
         last-result="300">
    <result score="2.34">
        <field|extract name="[name]">[value]</field>
        ...
    </result>
    ...
</results>
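A client can read this structure with Python's standard xml.etree.ElementTree module. The sketch below parses a made-up response fragment that follows the structure above (the title and extract values are invented for illustration), converts the paging attributes to integers, and collects each result's score together with its named field and extract values; the function name read_results is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Made-up response following the structure above (values are illustrative only).
SAMPLE = """<search>
  <results page="3" page-size="100" total-pages="7" total-results="645"
           first-result="201" last-result="300">
    <result score="2.34">
      <field name="pstitle">Example title</field>
      <extract name="pscontent">... example extract ...</extract>
    </result>
  </results>
</search>"""


def read_results(xml_text: str) -> tuple[dict[str, int], list[dict[str, object]]]:
    root = ET.fromstring(xml_text)
    results = root.find("results")
    if results is None:
        raise ValueError("response has no <results> element")
    # All paging attributes on <results> are integers.
    paging = {name: int(value) for name, value in results.attrib.items()}
    rows = []
    for result in results.findall("result"):
        row: dict[str, object] = {"score": float(result.get("score", "0"))}
        for child in result:  # <field> and <extract> both carry a name attribute
            row[child.get("name")] = child.text
        rows.append(row)
    return paging, rows


paging, rows = read_results(SAMPLE)
print(paging["total-results"], rows[0]["pstitle"])  # 645 Example title
```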

Error Handling

Code | Condition
--- | ---
0x1501 | An invalid group is specified.
0x1502 | No indexes are selected.
0x1503 | The predicate is not valid.
0x1504 | A sort field cannot be used for sorting.
0x1505 | The facet size specified is larger than the global facet size property (maxFacetSize).
0x1506 | A numeric field is specified as a facet.
0x6502 | Missing catalog; please re-index the group.
