group search
API Support | Available since | Last updated | Output |
---|---|---|---|
5.9400 |
Description
The purpose of this service is to query the index of a single group by a question.
For additional information see Index fields and Index field definitions.
Question
This parameter performs a full-text search on the index. It is mostly useful for tokenized fields such as pstitle and pscontent. The question typically consists of one or more terms separated by spaces and is usually based on input from the user.
The questionfields parameter is a comma-separated list of fields in which to look for the values from the question. By default, the terms in the question are processed as a full-text search, meaning title (pstitle) and content (pscontent). The fields listed in questionfields should be analyzed and indexed. To retrieve the contents of a field rather than search it, the field must be stored.
The Lucene query produced is the equivalent of:
+(field1:term1 field1:term2 field2:term1 field2:term2 ...)
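The shape of that predicate can be illustrated with a minimal Python sketch. The function name question_predicate is hypothetical, and the sketch only splits on whitespace; it does not model boolean operators, phrases, wildcards or the analysis the index applies to each term.

```python
def question_predicate(question, fields=("pstitle", "pscontent")):
    """Build a string shaped like +(field1:term1 field1:term2 field2:term1 ...)."""
    terms = question.split()
    if not terms:
        return ""
    # Every term is paired with every field, field by field.
    clauses = [f"{field}:{term}" for field in fields for term in terms]
    return "+(" + " ".join(clauses) + ")"


print(question_predicate("cheese crackers"))
# +(pstitle:cheese pstitle:crackers pscontent:cheese pscontent:crackers)
```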
Boolean operators
Full-text search using the question parameter supports AND and OR (in capitals) using a lexer. The default remains AND.
User Input | Normalised | Lucene predicate |
---|---|---|
cheese crackers | cheese AND crackers | +field:cheese +field:crackers |
rat OR mouse | rat OR mouse | field:rat field:mouse |
Phrases
Using double quotes around terms specifies a phrase. For example: "cheese crackers".
Grouping
Use brackets to express queries composed of multiple boolean operators. For example: (rat OR mouse) AND cheese.
Wildcard characters
The character ‘*’ matches zero or more characters and ‘?’ matches any single character. For example, the question sec*r?? would match secret and secured but not security. Requires PageSeeder v5.98 or newer.
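The wildcard semantics can be checked locally with Python's fnmatch module, which uses the same meaning for ‘*’ and ‘?’. This is only an approximation of the service: it does not model the index analysis (tokenizing, case folding) that applies to real searches.

```python
from fnmatch import fnmatchcase

pattern = "sec*r??"
words = ("secret", "secured", "security")

# '*' matches zero or more characters, '?' matches exactly one character.
matches = {word: fnmatchcase(word, pattern) for word in words}
print(matches)
# {'secret': True, 'secured': True, 'security': False}
```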
Facets
This service supports parameters for computing facet values and filtering the results.
Cardinality is calculated automatically by counting the number of results that match a facet out of the overall results. Use the facets parameter to specify fields in the index, preferably using fields which are indexed but not analyzed.
An example of using the facets parameter would be to process the 'psauthor' field. This would instruct the servlet to calculate all possible 'psauthor' values and the number of results for each unique value.
Specifying facet values to compute does not affect the search results. To apply a facet use filters.
Wildcard characters
Facet field names can end with a wildcard character '*', which means that facets will be computed for all fields in the result set starting with that name. For example, facets=psproperty-* would compute facets for psproperty-year, psproperty-genre, etc. if they occur in the result set.
Range facets (experimental)
Range facets can be used to count results in multiple ranges for string, numeric or date fields. The brackets have the following meaning:
- [ = include start term for all ranges (e.g. [0;10;20;30} = 0-9.999, 10-19.999, 20-29.999)
- ] = include end term in final range (e.g. [0;10;20;30] = 0-9.999, 10-19.999, 20-30)
- { = exclude start term for all ranges (e.g. {0;10;20;30] = 0.001-10, 10.001-20, 20.001-30)
- } = exclude end term in final range (e.g. {0;10;20;30} = 0.001-10, 10.001-20, 20.001-29.999)
Example:
facets=pstitle-sort:[a;l;zz],psxrefcount:[0;1;10;100;1000;]
facets=psmodifieddate:[2016-01-01;2017-01-01;]
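The bracket rules for range facets can be sketched in Python as follows. The function parse_range_facet is hypothetical and not part of the service. Treating an empty bound (as in the trailing ‘;’ of the examples) as an open end is an assumption made for this sketch.

```python
def parse_range_facet(spec):
    """Split a spec such as '[0;10;20;30}' into
    (lower, upper, lower_included, upper_included) tuples."""
    open_bracket, close_bracket = spec[0], spec[-1]
    if open_bracket not in ("[", "{") or close_bracket not in ("]", "}"):
        raise ValueError(f"Unexpected brackets in {spec!r}")
    bounds = spec[1:-1].split(";")
    include_start = open_bracket == "["
    ranges = []
    for i in range(len(bounds) - 1):
        is_final = i == len(bounds) - 2
        # Non-final ranges include their end only when starts are excluded,
        # so adjacent ranges never overlap. The closing bracket decides
        # whether the final range includes its end.
        include_end = (close_bracket == "]") if is_final else not include_start
        # Assumption: an empty bound means the range is open on that side.
        ranges.append((bounds[i] or None, bounds[i + 1] or None,
                       include_start, include_end))
    return ranges


for spec in ("[0;10;20;30}", "{0;10;20;30]", "[0;1;10;100;1000;]"):
    print(spec, parse_range_facet(spec))
```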
Interval facets (experimental)
Interval facets can be used to count results within certain intervals for string, numeric or date fields. The format is x;y|z or x|z where x is the start term, y is the optional ending term and z is the interval (1 = 1 character/number, 2 = 2 characters/numbers, etc. and 1Y = 1 year, 1M = 1 month, 1D = 1 day, etc.). The brackets have the following meaning:
- [ = include start term for all intervals (e.g. [0;30|10} = 0-9.999, 10-19.999, 20-29.999)
- ] = include end term in final interval (e.g. [0;30|10] = 0-9.999, 10-19.999, 20-30)
- { = exclude start term for all intervals (e.g. {0;30|10] = 0.001-10, 10.001-20, 20.001-30)
- } = exclude end term in final interval (e.g. {0;30|10} = 0.001-10, 10.001-20, 20.001-29.999)
Example:
facets=pstitle-sort:[a|1]
facets=psmodifieddate:[2016-01-01T00:00:00Z|1M]
facets=pstitle-sort:[a;zz|5],psxrefcount:[0;100|10]
facets=psmodifieddate:[2016-01-01T00:00:00Z;2017-01-01T00:00:00Z|1M]
Numeric intervals must have a start and end. If there is no end the interval is measured forward and backward from the start.
For string intervals, the interval only applies to the first character (e.g. [aa|1] = aa, ba, ca, etc.).
Range and interval facets are experimental features that may change in the future and should not be used on production systems.
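For numeric fields, an interval facet with a start and an end can be read as a list of evenly spaced boundaries. The sketch below shows that reading for integer values only; numeric_interval_bounds is a hypothetical helper. The source does not say how an end that is not a whole number of intervals from the start is handled, so the sketch simply stops at the last boundary that fits.

```python
def numeric_interval_bounds(spec):
    """Expand a numeric spec such as '[0;30|10}' into its boundary values."""
    limits, step_text = spec[1:-1].split("|")
    start_text, end_text = limits.split(";")
    start, end, step = int(start_text), int(end_text), int(step_text)
    if step <= 0:
        raise ValueError("interval must be a positive number")
    # Boundaries run from the start term up to and including the end term.
    return list(range(start, end + 1, step))


print(numeric_interval_bounds("[0;30|10}"))  # [0, 10, 20, 30]
```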
Computing cardinality
There are two methods for computing the cardinality of the facets:
- Including the facet from the base query in the computation (result facet)
- Excluding the facet from the base query in the computation (flexible facet)
Result facets
These facets are more efficient to compute, but they do not allow the user to move "across" result sets, nor do they let the UI combine facets of the same field. They are useful for helping users narrow or widen their search, since the cardinality of the facet counts the results in the current result set.
Example:
Filter | facets | animal facet values | color facet values | Result set |
---|---|---|---|---|
(empty) | animal,color | bird:2 cat:1 dog:1 | black:2 yellow:1 brown:1 | 1, bird A, yellow 2, bird B, black 3, cat C, black 4, dog D, brown |
animal:bird | animal,color | bird:2 | black:1 yellow:1 | 1, bird A, yellow 2, bird B, black |
color:black | animal,color | bird:1 cat:1 | black:2 | 2, bird B, black 3, cat C, black |
animal:bird,color:black | animal,color | bird:1 | black:1 | 2, bird B, black |
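The counts in this table can be reproduced with a small in-memory sketch. The DOCS list mirrors the four results of the example, and result_facets is a hypothetical function written for illustration; the service itself computes these counts from the Lucene index.

```python
from collections import Counter

DOCS = [
    {"id": 1, "animal": "bird", "name": "A", "color": "yellow"},
    {"id": 2, "animal": "bird", "name": "B", "color": "black"},
    {"id": 3, "animal": "cat", "name": "C", "color": "black"},
    {"id": 4, "animal": "dog", "name": "D", "color": "brown"},
]


def result_facets(docs, filters, facet_fields):
    """Count facet values over the documents that satisfy every filter."""
    matching = [d for d in docs if all(d[f] == v for f, v in filters.items())]
    return {field: Counter(d[field] for d in matching) for field in facet_fields}


facets = result_facets(DOCS, {"animal": "bird"}, ["animal", "color"])
print(dict(facets["animal"]), dict(facets["color"]))
```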
Flexible facets
In the case of flexible facets, the base query is recomputed so that the field is excluded from the query if the field is used in the filter. The cardinality of the facet counts the matching results in the result set using the recomputed filter.
Example:
Filter | flexiblefacets | animal facet values | color facet values | Result set |
---|---|---|---|---|
(empty) | animal,color | bird:2 cat:1 dog:1 | black:2 yellow:1 brown:1 | 1, bird A, yellow 2, bird B, black 3, cat C, black 4, dog D, brown |
animal:bird | animal,color | bird:2 cat:1 dog:1 | black:1 yellow:1 | 1, bird A, yellow 2, bird B, black |
color:black | animal,color | bird:1 cat:1 | black:2 yellow:1 brown:1 | 2, bird B, black 3, cat C, black |
animal:bird,color:black | animal,color | bird:1 cat:1 | black:1 yellow:1 | 2, bird B, black |
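The recomputation can be sketched in Python over the same four example results. For each facet field, the filter on that field is dropped before counting. The names DOCS and flexible_facets are hypothetical and exist only for this illustration.

```python
from collections import Counter

DOCS = [
    {"id": 1, "animal": "bird", "name": "A", "color": "yellow"},
    {"id": 2, "animal": "bird", "name": "B", "color": "black"},
    {"id": 3, "animal": "cat", "name": "C", "color": "black"},
    {"id": 4, "animal": "dog", "name": "D", "color": "brown"},
]


def flexible_facets(docs, filters, facet_fields):
    """Count each facet with the filter on its own field removed."""
    result = {}
    for field in facet_fields:
        # Keep every filter except the one on the facet's own field.
        relaxed = {f: v for f, v in filters.items() if f != field}
        matching = [d for d in docs if all(d[f] == v for f, v in relaxed.items())]
        result[field] = Counter(d[field] for d in matching)
    return result


facets = flexible_facets(DOCS, {"animal": "bird", "color": "black"}, ["animal", "color"])
print(dict(facets["animal"]), dict(facets["color"]))
```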
Filters
Filters are specific terms that the query should match. They are specified with the filters parameter as a comma-separated list of Lucene index terms, each expressed as [field]:[term].
For example, the filters parameter value psauthor:john,pspriority:High will only display results that are by author 'john' and have priority 'High'.
When selecting facets, the Lucene query produced will be the equivalent of:
+(facet1:value1) +(facet2:value2 facet2:value3) ...
The default behaviour of filters is to use:
- AND for different field names
- OR for the same field
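This default grouping can be sketched as follows. The function filters_predicate is hypothetical; it ignores the '+' and '-' prefixes and backslash escaping, and only shows how terms on the same field end up in one group.

```python
def filters_predicate(filters):
    """Group 'field:term' pairs by field: OR inside a field, AND across fields."""
    grouped = {}
    for pair in filters.split(","):
        field, term = pair.split(":", 1)
        grouped.setdefault(field, []).append(f"{field}:{term}")
    # Each field becomes one required group; terms inside a group are optional.
    return " ".join("+(" + " ".join(terms) + ")" for terms in grouped.values())


print(filters_predicate("facet1:value1,facet2:value2,facet2:value3"))
# +(facet1:value1) +(facet2:value2 facet2:value3)
```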
Also, selected facets can be prefixed with '+' (required) and '-' (prohibited) to further qualify the facet.
User | Means | Lucene predicate |
---|---|---|
label:abc,label:xyz | label:abc OR label:xyz | label:abc label:xyz |
+label:abc,+label:xyz | label:abc AND label:xyz | +label:abc +label:xyz |
label:abc,label:xyz,type:mno | (label:abc OR label:xyz) AND type:mno | +(label:abc label:xyz) +(type:mno) |
label:abc,-type:mno | label:abc NOT type:mno | label:abc -type:mno |
When inserting ‘+’ in a URL it must be encoded as ‘%2B’ otherwise it will be interpreted as a space.
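As an illustration, Python's standard urlencode function percent-encodes the ‘+’ automatically, which avoids the problem when building the query string in code. Only the parameter value comes from the table above; the rest is a generic sketch.

```python
from urllib.parse import urlencode

params = {"filters": "+label:abc,+label:xyz"}

# urlencode escapes '+' as %2B, as well as ':' and ','.
query_string = urlencode(params)
print(query_string)
# filters=%2Blabel%3Aabc%2C%2Blabel%3Axyz
```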
Ranges
The ranges parameter can be used to filter using a set of ranges. It is a comma separated list of range searches with format:
field:[|{(lower)?;(upper)?}|]
where [ or ] means the limit value is included and { or } means the limit value is excluded. For example:
ranges=psproperty-expires:[2015-03-20;2016-01-01],psproperty-title:[A;C}
ranges=psxrefcount:{50;],psreversexrefcount:[;10]
Date ranges
The service does not support the from, last or to parameters. Use the ranges parameter to perform a date range search on the last modified date. See the example below.
GenericSearch | Ranges |
---|---|
from=2017-04-03T14:00 | psmodifieddate:[2017-04-03T14:00:00;] |
from=2017-04-03&to=2017-05-03 | psmodifieddate:[2017-04-03;2017-05-03] |
To replace the last parameter, you need to compute the lower end of the range from the current date and time.
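A minimal sketch of that computation in Python is shown below. The function modified_since and its days argument are hypothetical, and the timestamp format follows the table above. How the service interprets time zones is not stated here, so the sketch uses a naive datetime.

```python
from datetime import datetime, timedelta


def modified_since(now, days):
    """Build a ranges value matching items modified in the last `days` days."""
    lower = now - timedelta(days=days)
    stamp = lower.strftime("%Y-%m-%dT%H:%M:%S")
    # An empty upper limit leaves the range open-ended.
    return f"psmodifieddate:[{stamp};]"


print(modified_since(datetime(2017, 5, 3, 14, 0, 0), 30))
# psmodifieddate:[2017-04-03T14:00:00;]
```

In practice `now` would be datetime.now(); a fixed value is used here so the output is reproducible.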
Size ranges
The service does not support the min-size or max-size parameters. Use a range search on the pssize field instead.
Organizing search results
Search results can be paged and ordered.
The page and pagesize parameters control which part of the search results is returned. They must be positive integers and default to 1 and 100 respectively.
The sortfields parameter is a comma-separated list of fields to sort the results by; the results are sorted by relevance if no sortfields parameter is specified.
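The relationship between the paging parameters and the paging attributes reported in the response can be sketched as below. The function page_window is hypothetical, and the sketch assumes results are numbered from 1, as in the Results example of the response.

```python
import math


def page_window(page, pagesize, total_results):
    """Compute the paging values for a given page of results."""
    total_pages = math.ceil(total_results / pagesize)
    first = (page - 1) * pagesize + 1
    # The last page may hold fewer than `pagesize` results.
    last = min(page * pagesize, total_results)
    return {"total-pages": total_pages, "first-result": first, "last-result": last}


print(page_window(3, 100, 645))
# {'total-pages': 7, 'first-result': 201, 'last-result': 300}
```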
Parameters
Name | Description | Required | Type | Default value |
---|---|---|---|---|
facets | A comma-separated list of fields to use as result facets | no | strings | |
facetsize | The max number of facet values to load (ignored if larger than global property maxFacetSize - default 1000 ) | no | integer | 10 |
filters | A comma-separated list of field:term pairs to use as filters | no | strings | |
flexiblefacets | A comma-separated list of fields to use as flexible facets | no | strings | |
fieldsize | Maximum number of characters allowed in a result field (max 10000) | no | integer | 1000 |
page | The current page to view | no | integer | 1 |
pagesize | The number of results per page | no | integer | 100 |
question | The question to search for | no | string | |
questionfields | A comma-separated list of fields to search the question in | no | strings | pstitle,pscontent |
ranges | A comma-separated list of range searches | no | strings | |
sortfields | A comma-separated list of fields to sort the results | no | strings | |
suggestsize | The max number of suggestions to load (only for question query) | no | integer | 20 |
Escaping characters
In the facets, flexiblefacets, filters and ranges parameters, the characters ‘,’, ‘;’, ‘|’ and ‘\’ must be preceded by a backslash (e.g. pstitle:one\,two\\three\|four\;five means the term one,two\three|four;five).
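The escaping rule can be applied with a small helper before a term is placed in one of these parameters. The function escape_term is a hypothetical name used for this sketch.

```python
def escape_term(value):
    """Prefix ',', ';', '|' and backslash with a backslash."""
    out = []
    for ch in value:
        if ch in ",;|\\":
            out.append("\\")
        out.append(ch)
    return "".join(out)


print("pstitle:" + escape_term(r"one,two\three|four;five"))
# pstitle:one\,two\\three\|four\;five
```

The escaped value still needs to be URL-encoded when it is placed in a query string.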
question
An optional parameter to specify terms to search for in the text. It can include the boolean operators AND and OR, phrases, and grouping using brackets.
questionfields
A comma-separated list of fields to search for the question. Each field must match ^([a-zA-Z0-9\-]+)$. If not specified, the default fields for the question are "pstitle,pscontent".
facets
A comma-separated list of field names to include as facets of the current result set.
flexiblefacets
A comma-separated list of field names to include as facets of the result set excluding any filter based on these field names.
facetsize
The maximum number of facet values returned per facet or flexible facet. Setting this to zero improves performance as no values are computed, but the has-results attribute is still returned.
filters
A comma-separated list of terms to select facets.
ranges
A comma-separated list of range searches.
sortfields
A comma-separated list of fields to sort the results. Prefix the field name with a minus (-) to use descending order, otherwise it is ascending. If not specified, results are sorted by relevance.
suggestsize
The maximum number of suggestions.
page
The current page to view.
pagesize
The number of results per page.
Permission
This service is restricted to guest and higher unless the group is accessible to public.
Response
The XML response is:
<search indexes="[urls,][comma separated group IDs]" [reindexing="true"] [warning=""]>
  <query> ... </query>
  <suggestions> ... </suggestions>
  <facets> ... </facets>
  <results> ... </results>
</search>
Query
<query empty="false" predicate="[lucene_predicate]">
  <question empty="false">
    <field boost="1.0" name="pscontent"/>
    <field boost="1.0" name="psfilename"/>
    <field boost="1.0" name="psid"/>
    <field boost="1.0" name="psdocid"/>
    <field boost="1.0" name="pstitle"/>
    <text>hello</text>
  </question>
  <filters>
    <filter field="[field_name]">
      <term text="[value1]" occur="[must|must_not|should]"/>
      <term text="[value2]" occur="[must|must_not|should]"/>
    </filter>
    ...
  </filters>
  <ranges>
    <date-range field="[field_name]" from="[from]" to="[to]"/>
    <numeric-range type="integer" field="[field_name]" min="[min]" min-included="[true|false]" max="[max]" max-included="[true|false]"/>
    ...
  </ranges>
  <sort by="relevance"/>
</query>
Suggestions
<suggestions>
  <suggestion question="get" cardinality="510"/>
  <suggestion question="list" cardinality="408"/>
  <suggestion question="must" cardinality="360"/>
  ...
</suggestions>
Facets
<facets>
  <facet name="[field]" type="field" flexible="[true|false]" has-results="[true|false]" total-terms="[total_number_of_terms_for_facets]" [datatype="[data type]"]>
    <term text="comment" cardinality="1098"/>
    <term text="task" cardinality="388"/>
    <term text="document" cardinality="168"/>
    ...
  </facet>
  ...
</facets>
Results
<results page="3" page-size="100" total-pages="7" total-results="645" first-result="201" last-result="300">
  <result score="2.34">
    <field|extract name="[name]">[value]</field>
    ...
  </result>
  ...
</results>
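A client can read these elements with any XML parser. The sketch below uses Python's standard ElementTree on a hand-written sample that follows the structure described above. The sample values, including the facet field name pstype and the title text, are invented for illustration and are not real service output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample following the documented structure (values are invented).
SAMPLE = """<search indexes="123">
  <facets>
    <facet name="pstype" type="field" flexible="false" has-results="true" total-terms="2">
      <term text="comment" cardinality="1098"/>
      <term text="task" cardinality="388"/>
    </facet>
  </facets>
  <results page="1" page-size="100" total-pages="1" total-results="1" first-result="1" last-result="1">
    <result score="2.34">
      <field name="pstitle">Hello world</field>
    </result>
  </results>
</search>"""

root = ET.fromstring(SAMPLE)

# Map each facet name to its {term text: cardinality} pairs.
facets = {
    facet.get("name"): {t.get("text"): int(t.get("cardinality")) for t in facet.findall("term")}
    for facet in root.findall("facets/facet")
}

# Collect the title field of every result.
titles = [f.text for f in root.findall("results/result/field[@name='pstitle']")]
total = int(root.find("results").get("total-results"))

print(facets, titles, total)
```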
Error Handling
Code | Description |
---|---|
0x1501 | If there is an invalid group specified |
0x1502 | If there are no indexes selected |
0x1503 | If the predicate is not valid |
0x1504 | If a sort field cannot be used for sorting |
0x1505 | If the facet size specified is bigger than the global property for facet size (maxFacetSize) |
0x1506 | If a numeric field is specified as a facet |
0x6502 | Missing catalog, re-index group |