mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 07:37:19 -04:00
190 lines
6 KiB
Text
190 lines
6 KiB
Text
[[query-dsl-combined-fields-query]]
|
|
=== Combined fields
|
|
++++
|
|
<titleabbrev>Combined fields</titleabbrev>
|
|
++++
|
|
|
|
The `combined_fields` query supports searching multiple text fields as if their
|
|
contents had been indexed into one combined field. The query takes a term-centric
|
|
view of the input string: first it analyzes the query string into individual terms,
|
|
then looks for each term in any of the fields. This query is particularly
|
|
useful when a match could span multiple text fields, for example the `title`,
|
|
`abstract`, and `body` of an article:
|
|
|
|
[source,console]
|
|
----
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"combined_fields" : {
|
|
"query": "database systems",
|
|
"fields": [ "title", "abstract", "body"],
|
|
"operator": "and"
|
|
}
|
|
}
|
|
}
|
|
----
|
|
|
|
The `combined_fields` query takes a principled approach to scoring based on the
|
|
simple BM25F formula described in
|
|
http://www.staff.city.ac.uk/~sb317/papers/foundations_bm25_review.pdf[The Probabilistic Relevance Framework: BM25 and Beyond].
|
|
When scoring matches, the query combines term and collection statistics across
|
|
fields to score each match as if the specified fields had been indexed into a
|
|
single, combined field. This scoring is a best attempt; `combined_fields` makes
|
|
some approximations and scores will not obey the BM25F model perfectly.
|
|
|
|
// tag::max-clause-limit[]
|
|
[WARNING]
|
|
.Field number limit
|
|
===================================================
|
|
By default, there is a limit to the number of clauses a query can contain. This
|
|
limit is defined by the
|
|
<<indices-query-bool-max-clause-count,`indices.query.bool.max_clause_count`>>
|
|
setting, which defaults to `4096`. For `combined_fields` queries, the number of
|
|
clauses is calculated as the number of fields multiplied by the number of terms.
|
|
===================================================
|
|
// end::max-clause-limit[]
|
|
|
|
==== Per-field boosting
|
|
|
|
Field boosts are interpreted according to the combined field model. For example,
|
|
if the `title` field has a boost of 2, the score is calculated as if each term
|
|
in the title appeared twice in the synthetic combined field.
|
|
|
|
[source,console]
|
|
----
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"combined_fields" : {
|
|
"query" : "distributed consensus",
|
|
"fields" : [ "title^2", "body" ] <1>
|
|
}
|
|
}
|
|
}
|
|
----
|
|
<1> Individual fields can be boosted with the caret (`^`) notation.
|
|
|
|
NOTE: The `combined_fields` query requires that field boosts are greater than
|
|
or equal to 1.0. Field boosts are allowed to be fractional.
|
|
|
|
[[combined-field-top-level-params]]
|
|
==== Top-level parameters for `combined_fields`
|
|
|
|
`fields`::
|
|
(Required, array of strings) List of fields to search. Field wildcard patterns
|
|
are allowed. Only <<text,`text`>> fields are supported, and they must all have
|
|
the same search <<analyzer,`analyzer`>>.
|
|
|
|
`query`::
|
|
+
|
|
--
|
|
(Required, string) Text to search for in the provided `<fields>`.
|
|
|
|
The `combined_fields` query <<analysis,analyzes>> the provided text before
|
|
performing a search.
|
|
--
|
|
|
|
`auto_generate_synonyms_phrase_query`::
|
|
+
|
|
--
|
|
(Optional, Boolean) If `true`, <<query-dsl-match-query-phrase,match phrase>>
|
|
queries are automatically created for multi-term synonyms. Defaults to `true`.
|
|
|
|
See <<query-dsl-match-query-synonyms,Use synonyms with match query>> for an
|
|
example.
|
|
--
|
|
|
|
`operator`::
|
|
+
|
|
--
|
|
(Optional, string) Boolean logic used to interpret text in the `query` value.
|
|
Valid values are:
|
|
|
|
`or` (Default)::
|
|
For example, a `query` value of `database systems` is interpreted as `database
|
|
OR systems`.
|
|
|
|
`and`::
|
|
For example, a `query` value of `database systems` is interpreted as `database
|
|
AND systems`.
|
|
--
|
|
|
|
`minimum_should_match`::
|
|
+
|
|
--
|
|
(Optional, string) Minimum number of clauses that must match for a document to
|
|
be returned. See the <<query-dsl-minimum-should-match, `minimum_should_match`
|
|
parameter>> for valid values and more information.
|
|
--
|
|
|
|
`zero_terms_query`::
|
|
+
|
|
--
|
|
(Optional, string) Indicates whether no documents are returned if the `analyzer`
|
|
removes all tokens, such as when using a `stop` filter. Valid values are:
|
|
|
|
`none` (Default)::
|
|
No documents are returned if the `analyzer` removes all tokens.
|
|
|
|
`all`::
|
|
Returns all documents, similar to a <<query-dsl-match-all-query,`match_all`>>
|
|
query.
|
|
|
|
See <<query-dsl-match-query-zero>> for an example.
|
|
--
|
|
|
|
===== Comparison to `multi_match` query
|
|
|
|
The `combined_fields` query provides a principled way of matching and scoring
|
|
across multiple <<text, `text`>> fields. To support this, it requires that all
|
|
fields have the same search <<analyzer,`analyzer`>>.
|
|
|
|
If you want a single query that handles fields of different types like
|
|
keywords or numbers, then the <<query-dsl-multi-match-query,`multi_match`>>
|
|
query may be a better fit. It supports both text and non-text fields, and
|
|
accepts text fields that do not share the same analyzer.
|
|
|
|
The main `multi_match` modes `best_fields` and `most_fields` take a
|
|
field-centric view of the query. In contrast, `combined_fields` is
|
|
term-centric: `operator` and `minimum_should_match` are applied per-term,
|
|
instead of per-field. Concretely, a query like
|
|
|
|
[source,console]
|
|
----
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"combined_fields" : {
|
|
"query": "database systems",
|
|
"fields": [ "title", "abstract"],
|
|
"operator": "and"
|
|
}
|
|
}
|
|
}
|
|
----
|
|
|
|
is executed as:
|
|
|
|
[source,txt]
|
|
----
|
|
+(combined("database", fields:["title" "abstract"]))
|
|
+(combined("systems", fields:["title", "abstract"]))
|
|
----
|
|
|
|
In other words, each term must be present in at least one field for a
|
|
document to match.
|
|
|
|
The `cross_fields` `multi_match` mode also takes a term-centric approach and
|
|
applies `operator` and `minimum_should_match per-term`. The main advantage of
|
|
`combined_fields` over `cross_fields` is its robust and interpretable approach
|
|
to scoring based on the BM25F algorithm.
|
|
|
|
[NOTE]
|
|
.Custom similarities
|
|
===================================================
|
|
The `combined_fields` query currently only supports the BM25 similarity,
|
|
which is the default unless a <<index-modules-similarity, custom similarity>>
|
|
is configured. <<similarity, Per-field similarities>> are also not allowed.
|
|
Using `combined_fields` in either of these cases will result in an error.
|
|
===================================================
|