Web service API

How to use PageSeeder's Web service API

group search

/groups/{group}/search [GET]

com.pageseeder.search.Search

Description

Search a single group by question.

Warning!

This class is not part of the public API, and may be subject to change!

Question

This parameter is used to perform a full-text search on the index. It is most useful for tokenized fields such as pstitle and pscontent. The question is typically one or more terms separated by spaces, usually entered directly by the user.

Each term in the question is converted into a Lucene term for each field specified in the question-fields parameter. For a full-text search, the fields would typically include the title (pstitle) and the content (pscontent). The fields should be indexed by Lucene and, preferably, analyzed. To generate extracts, they must also be stored.

The Lucene query produced will be the equivalent of:

+(field1:term1 field1:term2 field2:term1 field2:term2 ...)
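As a rough illustration, the expansion above can be sketched in Python. The function name question_query is hypothetical. The sketch simply splits on whitespace and ignores boolean operators, phrases and whatever analyzer the service applies to each field.

```python
def question_query(question: str, fields: list[str]) -> str:
    """Build the Lucene predicate equivalent to a plain question.

    Hypothetical helper: whitespace splitting only, no analysis,
    no AND/OR, phrases or brackets.
    """
    terms = question.split()
    # One optional clause per (field, term) pair, in field-major order.
    clauses = [f"{field}:{term}" for field in fields for term in terms]
    # The whole disjunction is required: at least one clause must match.
    return "+(" + " ".join(clauses) + ")"


print(question_query("cheese crackers", ["pstitle", "pscontent"]))
# +(pstitle:cheese pstitle:crackers pscontent:cheese pscontent:crackers)
```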

Boolean operators

Full-text search using the question parameter supports the AND and OR operators through a simple lexer. The default operator is OR.

User             Normalised          Lucene predicate
cheese crackers  cheese OR crackers  field:cheese field:crackers
rat AND mouse    rat AND mouse       +field:rat +field:mouse

Phrases

Phrases can be specified using double quotes, for example, "cheese crackers".

Grouping

To support multiple boolean operators, the question can use brackets to set the order of evaluation, for example (rat OR mouse) AND cheese.
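The operators, phrases and brackets described above can be illustrated with a small recursive-descent parser. This is not PageSeeder's lexer. The class and function names are hypothetical, and the output targets a single field (named field, as in the table above). The rule that AND binds more tightly than OR is also an assumption made for this sketch.

```python
import re

# Tokens: brackets, double-quoted phrases, or runs of other non-space characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


class QuestionParser:
    """Hypothetical parser: AND binds tighter than OR; juxtaposition means OR."""

    def __init__(self, question: str):
        self.tokens = TOKEN.findall(question)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        node = self.parse_or()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return node

    def parse_or(self):
        children = [self.parse_and()]
        while self.peek() not in (None, ")"):
            if self.peek() == "OR":
                self.take()  # explicit OR; a bare space means OR as well
            children.append(self.parse_and())
        return children[0] if len(children) == 1 else ("or", children)

    def parse_and(self):
        children = [self.parse_atom()]
        while self.peek() == "AND":
            self.take()
            children.append(self.parse_atom())
        return children[0] if len(children) == 1 else ("and", children)

    def parse_atom(self):
        token = self.take()
        if token is None or token == ")":
            raise ValueError("expected a term, phrase or '('")
        if token == "(":
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("missing closing bracket")
            return node
        if token.startswith('"'):
            return ("phrase", token[1:-1])
        return ("term", token)


def render(node, field: str) -> str:
    kind, value = node
    if kind == "term":
        return f"{field}:{value}"
    if kind == "phrase":
        return f'{field}:"{value}"'
    parts = []
    for child in value:
        text = render(child, field)
        if child[0] in ("and", "or"):
            text = f"({text})"  # keep nested groups together
        parts.append("+" + text if kind == "and" else text)
    return " ".join(parts)


def to_lucene(question: str, field: str = "field") -> str:
    return render(QuestionParser(question).parse(), field)


print(to_lucene("(rat OR mouse) AND cheese"))
# +(field:rat field:mouse) +field:cheese
```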

Facets

This service can be used for faceted search by computing facet values and filtering the results using selected facets.

Facet cardinality, that is, the number of search results matching a facet value within the current results, is calculated automatically from the values of the fields specified as facets. Use the facets parameter to specify which fields of the index to use. It is generally preferable to use fields that are indexed but not analyzed.

As an example, if the facets parameter includes the 'psauthor' field, the service calculates, for each possible value of the 'psauthor' field, how many of the current results have that value.

Specifying facets to compute does not affect the search results. To apply a facet, use the filters parameter.

Wildcard characters

Facet field names can end with a wildcard character '*' which means that facets will be computed for all fields in the result set starting with that name. For example facets=psproperty-* would compute facets for psproperty-year, psproperty-genre, etc. if they occur in the result set.
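Here is a minimal sketch of the wildcard expansion. It assumes the set of field names present in the result set is already known. The helper name is hypothetical, and sorting the expanded fields alphabetically is a choice made for this sketch.

```python
def expand_facet_fields(facets: list[str], result_fields: set[str]) -> list[str]:
    """Expand facet names ending in '*' into the matching fields of the result set."""
    expanded = []
    for facet in facets:
        if facet.endswith("*"):
            prefix = facet[:-1]
            # Only fields that actually occur in the results are expanded.
            expanded.extend(sorted(f for f in result_fields if f.startswith(prefix)))
        else:
            expanded.append(facet)
    return expanded


fields = {"pstitle", "psauthor", "psproperty-year", "psproperty-genre"}
print(expand_facet_fields(["psauthor", "psproperty-*"], fields))
# ['psauthor', 'psproperty-genre', 'psproperty-year']
```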

Range facets (experimental)

Range facets can be used to count results in multiple ranges for string, numeric or date fields. The brackets have the following meaning:

  • [ = include the start term of every range (e.g. [0;10;20;30} = 0-9.999, 10-19.999, 20-29.999)
  • ] = include the end term in the final range (e.g. [0;10;20;30] = 0-9.999, 10-19.999, 20-30)
  • { = exclude the start term of every range (e.g. {0;10;20;30] = 0.001-10, 10.001-20, 20.001-30)
  • } = exclude the end term in the final range (e.g. {0;10;20;30} = 0.001-10, 10.001-20, 20.001-29.999)

Example:

facets=pstitle-sort:[a;l;zz],psxrefcount:[0;1;10;100;1000;]
facets=psmodifieddate:[2016-01-01;2017-01-01;]
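To make the bracket semantics concrete, here is a client-side sketch for numeric fields only. It is not how the service computes range facets. The function is hypothetical, and string and date ranges are not handled. Treating an empty bound as open-ended is an assumption based on the trailing ';' in the examples above.

```python
def range_facet_counts(spec: str, values: list[float]) -> list[tuple]:
    """Count numeric values per range for a spec such as '[0;10;20;30}'.

    An empty bound (e.g. the trailing ';' in '[0;1000;]') is treated as open.
    """
    opening, closing = spec[0], spec[-1]
    if opening not in "[{" or closing not in "]}":
        raise ValueError(f"invalid range facet: {spec!r}")
    bounds = [float(b) if b else None for b in spec[1:-1].split(";")]
    counts = []
    for i in range(len(bounds) - 1):
        low, high = bounds[i], bounds[i + 1]
        final = i == len(bounds) - 2
        include_low = opening == "["
        # '[' ranges are [low, high); '{' ranges are (low, high].
        # Only the final range takes its end behaviour from the closing bracket.
        include_high = (closing == "]") if final else (opening == "{")
        n = sum(
            1
            for v in values
            if (low is None or (v >= low if include_low else v > low))
            and (high is None or (v <= high if include_high else v < high))
        )
        counts.append(((low, high), n))
    return counts


values = [0, 5, 10, 20, 30]
print([n for _, n in range_facet_counts("[0;10;20;30}", values)])  # [2, 1, 1]
```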

Interval facets (experimental)

Interval facets can be used to count results within fixed-size intervals for string, numeric or date fields. The format is x;y|z or x|z, where x is the start term, y is the optional end term and z is the interval (1 = 1 character or number, 2 = 2 characters or numbers, etc.; 1Y = 1 year, 1M = 1 month, 1D = 1 day, etc.). The brackets have the following meaning:

  • [ = include the start term of every interval (e.g. [0;30|10} = 0-9.999, 10-19.999, 20-29.999)
  • ] = include the end term in the final interval (e.g. [0;30|10] = 0-9.999, 10-19.999, 20-30)
  • { = exclude the start term of every interval (e.g. {0;30|10] = 0.001-10, 10.001-20, 20.001-30)
  • } = exclude the end term in the final interval (e.g. {0;30|10} = 0.001-10, 10.001-20, 20.001-29.999)

Example:

facets=pstitle-sort:[a|1]
facets=psmodifieddate:[2016-01-01T00:00:00Z|1M]
facets=pstitle-sort:[a;zz|5],psxrefcount:[0;100|10]
facets=psmodifieddate:[2016-01-01T00:00:00Z;2017-01-01T00:00:00Z|1M]

Numeric intervals must have a start and an end. If there is no end, the interval is measured forward and backward from the start.

For string intervals, the interval only applies to the first character (e.g. [aa|1] = aa, ba, ca, etc.).
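In the bracket examples above, [0;30|10] yields the same buckets as the range facet [0;10;20;30]. A numeric interval with a start and an end can therefore be sketched as a conversion into range-facet bounds. The function name is hypothetical. Two further points are assumptions of the sketch rather than documented behaviour: the general equivalence between the two notations, and the handling of a final interval shorter than the step. Date and string intervals are not covered.

```python
def _fmt(x: float) -> str:
    # Print whole numbers without a decimal point, e.g. 10.0 -> "10".
    return str(int(x)) if x.is_integer() else repr(x)


def interval_to_range_spec(spec: str) -> str:
    """Expand a numeric interval facet such as '[0;30|10]' into range bounds.

    Assumes the x;y|z form (start and end), as numeric intervals require.
    """
    opening, closing = spec[0], spec[-1]
    body, step_text = spec[1:-1].split("|")
    parts = body.split(";")
    if len(parts) != 2:
        raise ValueError("numeric intervals need a start and an end")
    start, end, step = float(parts[0]), float(parts[1]), float(step_text)
    if step <= 0:
        raise ValueError("the interval must be positive")
    bounds = []
    i = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while start + i * step < end:
        bounds.append(start + i * step)
        i += 1
    bounds.append(end)  # the final interval may be shorter than the step
    return opening + ";".join(_fmt(b) for b in bounds) + closing


print(interval_to_range_spec("[0;30|10}"))  # [0;10;20;30}
```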

Warning!

Range and interval facets are experimental features that may change in future versions, so they should not be used on production systems.

Computing method

There are two methods for computing the cardinality of the facets:

  • Including the facet from the base query in the computation (result facet)
  • Excluding the facet from the base query in the computation (flexible facet)

Result facets

Result facets are more efficient to compute, but they do not allow the user to move "across" result sets, nor can they be used to provide a UI that combines facets of the same field. They help users narrow or widen their search, since the cardinality of each facet value counts the results in the current result set.

Example:

Filter                   facets        facet values   Result set
(empty)                  animal,color  bird:2         1, bird A, yellow
                                       cat:1          2, bird B, black
                                       dog:1          3, cat C, black
                                       black:2        4, dog D, brown
                                       yellow:1
                                       brown:1
animal:bird              animal,color  bird:2         1, bird A, yellow
                                       black:1        2, bird B, black
                                       yellow:1
color:black              animal,color  bird:1         2, bird B, black
                                       cat:1          3, cat C, black
                                       black:2
animal:bird,color:black  animal,color  bird:1         2, bird B, black
                                       black:1
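The result-facet table above can be reproduced with an in-memory sketch. It filters the documents first, then counts facet values over what remains. The data structures and function names are hypothetical. The service computes these counts from the Lucene index, not from Python dictionaries.

```python
from collections import Counter

# The four documents from the example table above.
DOCS = [
    {"id": 1, "name": "bird A", "animal": "bird", "color": "yellow"},
    {"id": 2, "name": "bird B", "animal": "bird", "color": "black"},
    {"id": 3, "name": "cat C", "animal": "cat", "color": "black"},
    {"id": 4, "name": "dog D", "animal": "dog", "color": "brown"},
]


def matches(doc: dict, filters: dict) -> bool:
    # AND across fields, OR across the values of the same field.
    return all(doc[field] in values for field, values in filters.items())


def result_facets(docs, facets, filters):
    """Count facet values over the current (fully filtered) result set."""
    results = [d for d in docs if matches(d, filters)]
    return {facet: dict(Counter(d[facet] for d in results)) for facet in facets}


print(result_facets(DOCS, ["animal", "color"], {"animal": {"bird"}}))
# {'animal': {'bird': 2}, 'color': {'yellow': 1, 'black': 1}}
```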

Flexible facets

For flexible facets, the base query is recomputed for each facet field so that any filter on that field is excluded from the query. The cardinality of each facet value counts the matching results of this recomputed query.

Example:

Filter                   flexible-facets  facet values   Result set
(empty)                  animal,color     bird:2         1, bird A, yellow
                                          cat:1          2, bird B, black
                                          dog:1          3, cat C, black
                                          black:2        4, dog D, brown
                                          yellow:1
                                          brown:1
animal:bird              animal,color     bird:2         1, bird A, yellow
                                          cat:1          2, bird B, black
                                          dog:1
                                          black:1
                                          yellow:1
color:black              animal,color     bird:1         2, bird B, black
                                          cat:1          3, cat C, black
                                          black:2
                                          yellow:1
                                          brown:1
animal:bird,color:black  animal,color     bird:1         2, bird B, black
                                          cat:1
                                          black:1
                                          yellow:1
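The flexible-facet table above can be sketched the same way. For each facet field, the filter on that field is dropped before counting, while the result set itself still uses every filter. As with result facets, the names and in-memory data are hypothetical and only illustrate the counting rule.

```python
from collections import Counter

# The same four documents as in the example table above.
DOCS = [
    {"id": 1, "name": "bird A", "animal": "bird", "color": "yellow"},
    {"id": 2, "name": "bird B", "animal": "bird", "color": "black"},
    {"id": 3, "name": "cat C", "animal": "cat", "color": "black"},
    {"id": 4, "name": "dog D", "animal": "dog", "color": "brown"},
]


def matches(doc: dict, filters: dict) -> bool:
    # AND across fields, OR across the values of the same field.
    return all(doc[field] in values for field, values in filters.items())


def flexible_facets(docs, facets, filters):
    """For each facet field, ignore the filter on that field before counting."""
    counts = {}
    for facet in facets:
        relaxed = {f: v for f, v in filters.items() if f != facet}
        results = [d for d in docs if matches(d, relaxed)]
        counts[facet] = dict(Counter(d[facet] for d in results))
    return counts


print(flexible_facets(DOCS, ["animal", "color"], {"color": {"black"}}))
# {'animal': {'bird': 1, 'cat': 1}, 'color': {'yellow': 1, 'black': 2, 'brown': 1}}
```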


Filters

Filters are specific terms that the results must match. They are specified with the filters parameter as a comma-separated list of Lucene index terms, each expressed as [field]:[term].

For example, the filters value psauthor:john,pspriority:High only returns results whose author is 'john' and whose priority is 'High'.

When filters are applied, the Lucene query produced is the equivalent of:

+(facet1:value1) +(facet2:value2 facet2:value3) ...

The default behaviour of filters is to use:

  • AND for different field names
  • OR for the same field

Each filter can also be prefixed with '+' (required) or '-' (prohibited) to further qualify it.

User                          Means                                  Lucene predicate
label:abc,label:xyz           label:abc OR label:xyz                 label:abc label:xyz
+label:abc,+label:xyz         label:abc AND label:xyz                +label:abc +label:xyz
label:abc,label:xyz,type:mno  (label:abc OR label:xyz) AND type:mno  label:abc label:xyz type:mno
label:abc,-type:mno           label:abc NOT type:mno                 label:abc -type:mno
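The default behaviour and the '+'/'-' prefixes can be sketched as a small translator. It follows the general form +(facet1:value1) +(facet2:value2 facet2:value3) given above. The function name is hypothetical, and escaped characters are ignored. The predicate text it produces may differ from the one the service reports; the table above, for instance, writes the plain OR case without the required group.

```python
def filters_to_lucene(filters: str) -> str:
    """Translate a filters value into a Lucene-style predicate (hypothetical helper).

    Unprefixed terms are grouped per field as +(f:a f:b); terms prefixed
    with '+' or '-' are emitted as-is. Escaped commas are not handled.
    """
    order = []   # clauses in first-seen order: ("group", field) or ("single", text)
    groups = {}  # field -> list of "field:term" for unprefixed filters
    for item in filters.split(","):
        if not item:
            continue
        prefix = item[0] if item[0] in "+-" else ""
        field, sep, term = item[len(prefix):].partition(":")
        if not sep:
            raise ValueError(f"expected field:term, got {item!r}")
        clause = f"{field}:{term}"
        if prefix:
            order.append(("single", prefix + clause))
        else:
            if field not in groups:
                groups[field] = []
                order.append(("group", field))
            groups[field].append(clause)  # OR within the same field
    parts = [
        "+(" + " ".join(groups[value]) + ")" if kind == "group" else value
        for kind, value in order
    ]
    return " ".join(parts)  # AND across fields via the required groups


print(filters_to_lucene("label:abc,label:xyz,type:mno"))
# +(label:abc label:xyz) +(type:mno)
```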


Ranges

The ranges parameter can be used to filter results using a set of ranges. It is a comma-separated list of range searches with the format:

field:[|{(lower)?;(upper)?}|]

where [ or ] means the limit value is included, and { or } means it is excluded. Either limit can be omitted to leave that end of the range open. For example:

ranges=psproperty-expires:[2015-03-20;2016-01-01],psproperty-title:[A;C}
ranges=psxrefcount:{50;],psreversexrefcount:[;10]
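A client might validate a ranges value before sending it. The sketch below parses the format above with a regular expression. The function and dictionary keys are hypothetical, the field-name pattern [a-zA-Z0-9-]+ is assumed, and backslash escaping is not handled.

```python
import re

# field:[lower;upper] with [ or ] for inclusive and { or } for exclusive limits.
RANGE = re.compile(r"([a-zA-Z0-9\-]+):([\[{])([^;]*);([^;]*)([\]}])")


def parse_ranges(value: str) -> list[dict]:
    """Parse a ranges value into dictionaries (escaped characters are not handled)."""
    ranges = []
    for item in value.split(","):
        match = RANGE.fullmatch(item)
        if match is None:
            raise ValueError(f"invalid range: {item!r}")
        field, opening, lower, upper, closing = match.groups()
        ranges.append({
            "field": field,
            "lower": lower or None,  # an empty limit leaves that end open
            "upper": upper or None,
            "lower_included": opening == "[",
            "upper_included": closing == "]",
        })
    return ranges


for r in parse_ranges("psxrefcount:{50;],psreversexrefcount:[;10]"):
    print(r)
```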

Date ranges

The service does not support the from, last or to parameters. Use the ranges parameter instead to perform a date range search on a date field, such as the last modified date or a date property. See the examples below.

Generic search                   Ranges
from=2017-04-03T14:00            psproperty-expires:[2017-04-03T14:00:00;]
from=2017-04-03&to=2017-05-03    psproperty-expires:[2017-04-03;2017-05-03]

To replace the last parameter, you need to compute the lower end of the range from the current date and time.
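Here is a minimal sketch of that computation. It assumes the date-time format of the examples above (no time-zone suffix) and uses psmodifieddate as the field. How the service interprets time zones is not documented here, so computing in UTC is an assumption. The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def last_to_range(field: str, period: timedelta, now: Optional[datetime] = None) -> str:
    """Build an open-ended range covering the last `period` up to now."""
    now = now or datetime.now(timezone.utc)
    lower = (now - period).strftime("%Y-%m-%dT%H:%M:%S")
    # No upper limit: "[lower;]" includes everything from lower onwards.
    return f"{field}:[{lower};]"


# Results modified in the last 7 days, relative to a fixed "now" for reproducibility.
fixed_now = datetime(2017, 5, 3, 14, 0, tzinfo=timezone.utc)
print(last_to_range("psmodifieddate", timedelta(days=7), fixed_now))
# psmodifieddate:[2017-04-26T14:00:00;]
```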

Size ranges

The service does not support the min-size or max-size parameters. Use a range search on the pssize field instead.

Organizing search results

Search results can be paged and ordered.

The page and page-size parameters control which part of the search results is returned. They must be positive integers and default to 1 and 100 respectively.

The sortby parameter is a comma separated list of fields to sort the results; the results are sorted by relevance if no sortby parameter is specified.
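The paging arithmetic can be checked against the attributes of the <results> element shown in the Response section (page 3 of 645 results at 100 per page). The helper is hypothetical and only reproduces that arithmetic. Behaviour for a page beyond the last one is not documented and is not modelled.

```python
def page_window(page: int, page_size: int, total_results: int) -> dict:
    """Derive the paging attributes of <results> from page, page-size and hit count."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page-size must be positive integers")
    total_pages = -(-total_results // page_size)  # ceiling division
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total_results)
    return {"total-pages": total_pages, "first-result": first, "last-result": last}


print(page_window(3, 100, 645))
# {'total-pages': 7, 'first-result': 201, 'last-result': 300}
```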

Parameters

Name            Description                                                      Required  Type     Default value
facets          A comma-separated list of fields to use as result facets         no        strings
facetsize       The max number of facet values to load (max 1000)                no        integer  10
filters         A comma-separated list of field:term pairs to use as filters     no        strings
flexiblefacets  A comma-separated list of fields to use as flexible facets       no        strings
page            The current page to view                                         no        integer  1
pagesize        How many results a page contains                                 no        integer  100
question        The question to search for                                       no        string
questionfields  A comma-separated list of fields to search the question in       no        strings  pstitle,pscontent
ranges          A comma-separated list of range searches                         no        strings
sortfields      A comma-separated list of fields to sort the results             no        strings
suggestsize     The max number of suggestions to load (only for question query)  no        integer  20

Escaping characters

In the facets, flexiblefacets, filters and ranges parameters, the characters ',', ';', '|' and '\' must be escaped with a backslash (e.g. pstitle:one\,two\\three\|four\;five means the term one,two\three|four;five).
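Below is a sketch of escaping a term before building a filters value, and of splitting such a value on unescaped commas. Both function names are hypothetical. The set of special characters is the one listed above. The split removes the escaping backslashes, so it suits a filters value. Nested separators such as ';' inside ranges would need a split that preserves escapes.

```python
SPECIAL = set(",;|\\")


def escape_term(term: str) -> str:
    """Prefix each special character with a backslash."""
    return "".join("\\" + c if c in SPECIAL else c for c in term)


def split_escaped(value: str, sep: str = ",") -> list[str]:
    """Split on unescaped separators and remove the escaping backslashes."""
    parts, current, i = [], [], 0
    while i < len(value):
        c = value[i]
        if c == "\\" and i + 1 < len(value):
            current.append(value[i + 1])  # keep the escaped character literally
            i += 2
            continue
        if c == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(c)
        i += 1
    parts.append("".join(current))
    return parts


filters = "pstitle:" + escape_term("one,two\\three|four;five") + ",psauthor:john"
print(filters)                 # pstitle:one\,two\\three\|four\;five,psauthor:john
print(split_escaped(filters))  # ['pstitle:one,two\\three|four;five', 'psauthor:john']
```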

question

An optional parameter to specify the terms to search for in the text. It can include the boolean operators AND and OR, phrases, and grouping with brackets.

question-fields

A comma-separated list of fields to search for the question. Each field name must match the regular expression ^([a-zA-Z0-9\-]+)$. If not specified, the default fields for the question are "pstitle,pscontent".
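Client-side validation of question-fields can mirror the rule above. The helper below is hypothetical. It uses fullmatch, which has the same effect as the anchored pattern but also rejects a trailing newline, which Python's $ would otherwise accept.

```python
import re

FIELD_NAME = re.compile(r"[a-zA-Z0-9\-]+")  # same rule as ^([a-zA-Z0-9\-]+)$
DEFAULT_QUESTION_FIELDS = ["pstitle", "pscontent"]


def parse_question_fields(value: str = "") -> list[str]:
    """Return the question fields, falling back to the documented defaults."""
    if not value:
        return list(DEFAULT_QUESTION_FIELDS)
    fields = value.split(",")
    invalid = [f for f in fields if not FIELD_NAME.fullmatch(f)]
    if invalid:
        raise ValueError(f"invalid field names: {invalid}")
    return fields


print(parse_question_fields("pstitle,psproperty-year"))
# ['pstitle', 'psproperty-year']
```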

facets

A comma-separated list of field names to include as facets of the current result set.

flexiblefacets

A comma-separated list of field names to include as facets of the result set excluding any filter based on these field names.

facetsize

The maximum number of facet values returned per facet or flexible facet. Setting this to zero should improve performance as no values are computed, but the has-results attribute is still returned.

filters

A comma-separated list of terms to select facets.

ranges

A comma separated list of range searches.

sort-fields

A comma-separated list of fields to sort the results.

suggest-size

The maximum number of suggestions.

page

The current page to view.

page-size

The number of results per page.

Permission

The permission requirements are not documented.

Response

The XML response is:

<search>
  <query> ... </query>
  <suggestions> ... </suggestions>
  <facets> ... </facets>
  <results>  ... </results>
</search>

Query

<query empty="false" predicate="[lucene_predicate]">
   <question empty="false">
      <field boost="1.0" name="pscontent"/>
      <field boost="1.0" name="psfilename"/>
      <field boost="1.0" name="psid"/>
      <field boost="1.0" name="psdocid"/>
      <field boost="1.0" name="pstitle"/>
      <text>hello</text>
   </question>
   <filters>
      <filter field="[field_name]">
         <term text="[value1]" occur="[must|must_not|should]"/>
         <term text="[value2]" occur="[must|must_not|should]"/>
      </filter>
      ...
   </filters>
   <ranges>
      <date-range field="[field_name]" from="[from]" to="[to]"/>
      <numeric-range type="integer" field="[field_name]"
                     min="[min]" min-included="[true|false]"
                     max="[max]" max-included="[true|false]"/>
      ...
   </ranges>
   <sort by="relevance"/>
</query>

Facets

<facets>
  <facet name="[field]" type="field"
        flexible="[true|false]" has-results="[true|false]"
        total-terms="[total_number_of_terms_for_facets]">
    <term text="comment" cardinality="1098"/>
    <term text="task" cardinality="388"/>
    <term text="document" cardinality="168"/>
    ...
  </facet>
  ...
</facets>

Results

<results page="3" page-size="100" total-pages="7"
         total-results="645" first-result="201" last-result="300">
   <result score="2.34">
      <field name="[name]">[value]</field>
      <extract name="[name]">[value]</extract>
      ...
   </result>
   ...
</results>
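A client can read the facets and results elements with Python's standard xml.etree.ElementTree module. The sample response below is hypothetical: the facet name pstype and the result title are invented, while the paging attributes are copied from the example above. The summarize function is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: facet name and title invented, paging copied from above.
SAMPLE = """<search>
  <facets>
    <facet name="pstype" type="field" flexible="false" has-results="true" total-terms="2">
      <term text="comment" cardinality="1098"/>
      <term text="task" cardinality="388"/>
    </facet>
  </facets>
  <results page="3" page-size="100" total-pages="7"
           total-results="645" first-result="201" last-result="300">
    <result score="2.34">
      <field name="pstitle">Hello world</field>
    </result>
  </results>
</search>"""


def summarize(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    # facet name -> {term text: cardinality}
    facets = {
        facet.get("name"): {t.get("text"): int(t.get("cardinality")) for t in facet.findall("term")}
        for facet in root.findall("facets/facet")
    }
    results = root.find("results")
    titles = [r.findtext("field[@name='pstitle']") for r in results.findall("result")]
    return {
        "total-results": int(results.get("total-results")),
        "facets": facets,
        "titles": titles,
    }


print(summarize(SAMPLE))
# {'total-results': 645, 'facets': {'pstype': {'comment': 1098, 'task': 388}}, 'titles': ['Hello world']}
```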

Error Handling

0x1501 If there is an invalid group specified
0x1502 If there are no indexes selected
0x1503 If the predicate is not valid
0x1504 If a sort field cannot be used for sorting
0x1505 If the facet size specified is bigger than the global property for facet size (maxFacetSize)
0x1506 If a numeric field is specified as a facet
