Servlet: GenericSearch

com.pageseeder.search.GenericSearch

Description

Search the specified index using the Lucene search engine.

Select the index

The current implementation of PageSeeder produces one Lucene index per group. Use the groups parameter to specify which Lucene index to search. The groups parameter can be either a group id or a group name.

Note that while it is possible to search across multiple groups, this class cannot currently collate the results from different indexes; specifying multiple groups is therefore not recommended.

The question

The main parameters for this servlet are question and fields, which together form the main predicate for the search.

The question is typically one or multiple terms, separated with spaces, and may also come directly from the user.

Each term in the question will be converted into Lucene Terms for each field specified in the fields parameter. The fields parameter would typically include the title (pstitle) and content (pscontent) for a full text search. The fields should be indexed by Lucene, and it is preferable that they are analyzed. In order to generate extracts, they must also be stored.

The Lucene query produced will be the equivalent of:

+(field1:term1 field1:term2 field2:term1 field2:term2 ...)
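As an illustration only, the expansion described above can be sketched in a few lines of Python. The function name build_question_query is hypothetical, and the sketch ignores analysis and escaping, which the servlet presumably handles through Lucene itself.

```python
def build_question_query(question, fields):
    """Pair every field with every term, in the order shown above."""
    terms = question.split()  # terms are separated by spaces
    clauses = [f"{field}:{term}" for field in fields for term in terms]
    if not clauses:
        return ""
    return "+(" + " ".join(clauses) + ")"


print(build_question_query("lucene search", ["pstitle", "pscontent"]))
```

With the question "lucene search" and the fields pstitle and pscontent, the sketch yields four clauses, one per field and term combination.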

Document types

The types parameter restricts which types of Lucene documents can be returned. It is a comma separated list of values; each value should match the value of a 'pstype' field. Valid types include document, comment and task.

The Lucene query produced will be the equivalent of:

+(pstype:type1 pstype:type2 ...)

Using facets

This servlet can be used for a faceted search.

Facet cardinality, that is, the number of search results matching a facet within the current results, is calculated automatically based on the values of the fields specified as facets. Use the facets parameter to specify which fields in the index to use; generally, it is preferable to use fields which are indexed but not analyzed.

As an example, if the facets parameter includes the 'psauthor' field, this servlet will calculate, for each possible value of the 'psauthor' field, how many results within the current results have that value.

Specifying facets to compute does not affect the search results.
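The cardinality calculation can be pictured with a small Python sketch. It is not the servlet's implementation: results are modelled here as plain dictionaries and the function name facet_counts is hypothetical.

```python
from collections import Counter


def facet_counts(results, facet_fields):
    """Count, for each facet field, how many results carry each value."""
    counts = {field: Counter() for field in facet_fields}
    for doc in results:
        for field in facet_fields:
            value = doc.get(field)
            if value is not None:  # results without the field are not counted
                counts[field][value] += 1
    return counts


results = [
    {"psauthor": "john", "pspriority": "high"},
    {"psauthor": "john"},
    {"psauthor": "mary", "pspriority": "low"},
]
print(facet_counts(results, ["psauthor", "pspriority"]))
```

The list of results is left untouched; only the counts are derived from it, which mirrors the fact that requesting facets does not change what is returned.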

To affect search results, a particular facet must be selected with the select parameter. The select parameter is a comma separated list of Lucene index terms including the field name: [field]:[term] .

For example, the select parameter value psauthor:john,pspriority:high will only display results that are by author 'john' and have priority 'high'.

When selecting facets, the Lucene query produced will be the equivalent of:

+facet1:value1 +facet2:value2 ...

Note

In the select parameter, commas inside terms can be escaped with a backslash (\). For example: psauthor:smith\,john,pspriority:high
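A client that needs to read or check a select value could split it as in the following Python sketch. The function names are hypothetical, and the sketch assumes that the field name never contains a colon and that the backslash is only used to escape commas.

```python
import re


def parse_select(select):
    """Split a select value into (field, term) pairs, honouring escaped commas."""
    pairs = []
    # Split only on commas that are not preceded by a backslash.
    for item in re.split(r"(?<!\\),", select):
        if not item:
            continue
        field, _, term = item.partition(":")
        pairs.append((field, term.replace("\\,", ",")))
    return pairs


def select_query(pairs):
    """Render the pairs in the +field:value form shown above."""
    return " ".join(f"+{field}:{term}" for field, term in pairs)


print(parse_select(r"psauthor:smith\,john,pspriority:high"))
```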

Date filtering

Results can be filtered by date by using either the from and to parameters or the last parameter.

The from and to parameters use dates in the ISO-8601 extended format, for example 2010-10-25 or 2010-10-25T12:26. Open ended date ranges are possible if only one of from or to is specified.

The last parameter automatically generates the date range based on the current date using the specified duration. The duration should match [span][unit], where [span] is the length of time (digits only) and [unit] is the unit of time. Valid units are "years", "months", "days", "hours", "minutes" and "seconds".

Examples: 30years, 15days, 2hours .
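The [span][unit] format can be validated on the client side before a request is sent. The following Python sketch shows one way to do so; the function name parse_last is hypothetical and the servlet may be more or less strict than this pattern.

```python
import re

UNITS = ("years", "months", "days", "hours", "minutes", "seconds")
DURATION = re.compile(r"(\d+)(" + "|".join(UNITS) + r")")


def parse_last(value):
    """Return (span, unit) for a value such as '15days', or raise ValueError."""
    match = DURATION.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid duration: {value!r}")
    return int(match.group(1)), match.group(2)


print(parse_last("15days"))
```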

When filtering by date, the Lucene query produced will be the equivalent of:

psdate:[start TO end]

Note

When using date filtering, only results which have a date will be returned; in other words, if the Lucene document does not have a date, it will not be considered by the search.
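The effect of the filter, including the exclusion of results without a date, can be sketched in Python as follows. This is an illustration rather than the servlet's logic: the function name in_date_range is hypothetical, both bounds are assumed to be inclusive, and a date-only value is read as midnight on that day.

```python
from datetime import datetime


def in_date_range(doc_date, start=None, end=None):
    """Return True if doc_date falls within the (possibly open ended) range."""
    if doc_date is None:
        return False  # results without a date are never returned
    when = datetime.fromisoformat(doc_date)
    if start is not None and when < datetime.fromisoformat(start):
        return False
    if end is not None and when > datetime.fromisoformat(end):
        return False
    return True


print(in_date_range("2010-10-25T12:26", start="2010-10-25"))
```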

Size filtering

The min-size and max-size parameters can be used to filter documents by byte size; values must be positive integers. These parameters only apply to URIs (pstype=document) which have a byte size, such as images, videos, etc.

When filtering by size, the Lucene query produced will be the equivalent of:

pssize:[minsize TO maxsize]

Note

When using size filtering, only results which have a size will be returned.

Ranges

The ranges parameter can be used to filter using a set of ranges. It is a comma separated list of range searches with format:

field:[|{(lower)?;(upper)?}|]

where [ or ] means that the limit value is included and { or } means that the limit value is excluded. Either limit can be omitted to leave that end of the range open. For example:

ranges=psproperty-expires:[2015-03-20;2016-01-01],psproperty-title:[A;C}
ranges=psxrefcount:{50;],psreversexrefcount:[;10]
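To make the notation concrete, the following Python sketch parses a ranges value into its parts. The function name parse_ranges is hypothetical, and the sketch assumes that limit values contain neither ';' nor ','.

```python
import re

# field:[|{ lower? ; upper? }|]
RANGE = re.compile(
    r"(?P<field>[^:]+):(?P<open>[\[{])(?P<lower>[^;]*);(?P<upper>[^;]*)(?P<close>[\]}])"
)


def parse_ranges(ranges):
    """Parse a comma separated list of range searches into dictionaries."""
    parsed = []
    for expr in ranges.split(","):
        match = RANGE.fullmatch(expr.strip())
        if match is None:
            raise ValueError(f"invalid range: {expr!r}")
        parsed.append({
            "field": match["field"],
            "lower": match["lower"] or None,  # None means open ended
            "upper": match["upper"] or None,
            "include_lower": match["open"] == "[",
            "include_upper": match["close"] == "]",
        })
    return parsed


for item in parse_ranges("psxrefcount:{50;],psreversexrefcount:[;10]"):
    print(item)
```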

 

Organizing search results

Results can be paged and ordered.

The page and page-size parameters control which part of the search results will be returned. They must be positive integers and default to 1 and 100 respectively.

The sortby parameter is a comma separated list of fields to sort the results by; the results are sorted by relevance if no sortby parameter is specified.
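The relationship between page, page-size and the portion of results returned can be sketched in Python as follows. The function name page_slice is hypothetical, and the sketch assumes that pages are numbered from 1, as the default value suggests.

```python
def page_slice(total, page=1, page_size=100):
    """Return the (start, end) offsets of the results shown on a page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page-size must be positive integers")
    start = (page - 1) * page_size
    # Clamp both offsets so that a page past the end is simply empty.
    return min(start, total), min(start + page_size, total)


print(page_slice(250, page=3))
```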

HTTP Method: GET

Same as POST method.

HTTP Method: POST

Performs a search based on the specified parameters and returns the search results as XML.

HTTP Parameters

| Name | Description | Required | Type | Default |
|---|---|---|---|---|
| groups | A list of groups to search (comma-separated list of IDs or names) | yes | strings | |
| question | The query entered by the user (not a Lucene query) | no | string | |
| fields | A comma separated list of fields to search for the question | no | strings | |
| types | A comma separated list of the types of documents to search | no | enums | |
| facets | A comma separated list of fields to include as facets | no | strings | |
| facet-size | The maximum number of facet values returned per facet | no | string | 10 |
| ranges | A comma separated list of range searches (format is field:[\|{(lower)? ; (upper)?}\|] where { or } means exclude limit value) | no | strings | |
| select | A comma separated list of terms to select facets | no | strings | |
| sortby | A comma separated list of fields to sort the results, results are sorted by relevance if unspecified | no | strings | |
| from | An ISO-8601 date to specify the start of a date range filter | no | date | |
| to | An ISO-8601 date to specify the end of a date range filter | no | date | |
| last | A duration to filter the result | no | enum | |
| min-size | The minimum size the document must have | no | int | |
| max-size | The maximum size the document must have | no | int | |
| page | The current page to view | no | int | 1 |
| page-size | How many results does a page contain | no | int | 100 |
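As a rough illustration of how the parameters above combine into a request, the following Python sketch builds a query string with the standard library. The base URL is a placeholder, not the servlet's actual address, and the function name build_search_url is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder only: replace with the address of the servlet on your server.
BASE_URL = "https://example.org/search"


def build_search_url(base_url, params):
    """Join list values with commas and URL-encode the parameters."""
    query = {}
    for name, value in params.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        query[name] = str(value)
    if "groups" not in query:
        raise ValueError("the groups parameter is required")
    return base_url + "?" + urlencode(query)


print(build_search_url(BASE_URL, {
    "groups": ["mygroup"],
    "question": "lucene search",
    "fields": ["pstitle", "pscontent"],
    "page-size": 20,
}))
```

Parameters are passed as a dictionary rather than keyword arguments because names such as from and page-size are not valid Python identifiers.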

types

Supported types include document, comment and task.

last

The value must follow the format [span][unit], where [span] is the length of time (digits only) and [unit] is the unit of time. Valid units are "years", "months", "days", "hours", "minutes" and "seconds".

Examples: "30years", "15days", "2hours", "5years"
