38 commits

Author SHA1 Message Date
Julie Tibshirani
509b636d25
Correct docs for multi_match scoring (#91430)
The documentation claimed that for the most_fields type, the score is equal to
the sum of all matches divided by the number of matches. This is not correct:
we do not actually divide by the number of matches.

This line in the documentation was added several years ago as part of a large
PR, and was likely just a mistake.
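
As a rough illustration of the corrected statement, the sketch below contrasts the plain sum that `most_fields` uses with the average the docs used to describe. The function names and the per-field scores are invented for illustration and are not taken from the Elasticsearch code base.

```python
def most_fields_score(field_scores):
    """Combine per-field scores as the corrected docs describe: a plain sum."""
    return sum(field_scores)


def averaged_score(field_scores):
    """What the old docs wrongly described: the sum divided by the match count."""
    return sum(field_scores) / len(field_scores)


scores = [1.0, 2.0, 3.0]  # hypothetical scores from three matching fields
print(most_fields_score(scores))  # the sum of all matches
print(averaged_score(scores))     # not what most_fields does
```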
2022-11-10 08:16:56 -08:00
Julie Tibshirani
3c1b070329
Avoid negative scores with cross_fields type (#89016)
The cross_fields scoring type can produce negative scores when some documents
are missing fields. When blending term document frequencies, we take the maximum
document frequency across all fields. If one field appears in fewer documents
than another, this means that its IDF can become negative. This is because IDF
is calculated as `Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))`.

This change adjusts the docFreq for each field to `Math.min(docCount, docFreq)`
so that the IDF can never become negative. It makes sense that the term document
frequency should never exceed the number of documents containing the field.
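
The effect can be checked numerically. The following is a minimal sketch of the formula quoted above, written in Python rather than Java; the helper names and the example statistics are assumptions chosen only to show the sign change.

```python
import math


def idf(doc_count, doc_freq):
    """IDF exactly as quoted in the commit message."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def clamped_idf(doc_count, doc_freq):
    """Same formula, with docFreq capped at docCount as the change describes."""
    return idf(doc_count, min(doc_count, doc_freq))


# Hypothetical statistics: the field appears in 10 documents, but the blended
# (maximum) document frequency taken from another field is 50.
field_doc_count, blended_doc_freq = 10, 50
print(idf(field_doc_count, blended_doc_freq))          # negative
print(clamped_idf(field_doc_count, blended_doc_freq))  # small but positive
```

When `docFreq` is already at most `docCount`, the clamp is a no-op and both functions return the same value.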
2022-09-06 13:02:24 -07:00
James Rodewig
dd1ed30731
[DOCS] Fix combined_fields query ref in multi_match query docs (#81456)
The current `multi_match` docs contain an erroneous reference to the `combined_fields` query. This updates the reference to point to the correct query.

Relates to https://github.com/elastic/elasticsearch/pull/76893
2021-12-07 16:47:44 -05:00
Adam Locke
1056c857ee
[DOCS] Update combined fields wording (#76893)
* [DOCS] Update combined fields wording

* Clarifications from review feedback
2021-08-26 13:16:55 -04:00
Adrien Grand
feb6620d14
indices.query.bool.max_clause_count now limits all query clauses (#75297)
In the upcoming Lucene 9 release, `indices.query.bool.max_clause_count` is
going to apply to the entire query tree rather than per `bool` query. In order
to avoid breaks, the limit has been bumped from 1024 to 4096.

The semantics will effectively change when we upgrade to Lucene 9; this PR
is only about agreeing on a migration strategy and documenting this change.

To avoid further breaks, I am leaning towards keeping the current setting name
even though it contains `bool`. I believe that it still makes sense given that
`bool` queries are typically the main contributors to high numbers of clauses.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-07-21 12:16:30 +02:00
Julie Tibshirani
318bf14126
Introduce combined_fields query (#71213)
This PR introduces a new query called `combined_fields` for searching multiple
text fields. It takes a term-centric view, first analyzing the query string
into individual terms, then searching for each term in any of the fields as
though they were one combined field. It is based on Lucene's `CombinedFieldQuery`,
which takes a principled approach to scoring based on the BM25F formula.

This query provides an alternative to the `cross_fields` `multi_match` mode. It
has simpler behavior and a more robust approach to scoring.

Addresses #41106.
2021-04-14 13:33:19 -07:00
James Rodewig
693807a6d3
[DOCS] Fix double spaces (#71082) 2021-03-31 09:57:47 -04:00
Julie Tibshirani
da668e134a
Correct cross_fields docs on how analyzer groups are combined. (#69936)
When performing a multi_match in cross_fields mode, we group fields based on
their analyzer and create a blended query per group. Our docs claimed that the
group scores were combined through a boolean query, but they are actually
combined through a dismax that incorporates the tiebreaker parameter.

This commit updates the docs and adds a test verifying the behavior.
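
For readers unfamiliar with dismax scoring, here is a small sketch of how a disjunction-max combination with a tiebreaker works in general: the best score, plus the tiebreaker times the sum of the remaining scores. The function name and the group scores are illustrative assumptions, not code from this commit.

```python
def dis_max_score(group_scores, tie_breaker=0.0):
    """Best group score plus tie_breaker times the sum of the other scores."""
    best = max(group_scores)
    return best + tie_breaker * (sum(group_scores) - best)


group_scores = [2.0, 1.0, 0.5]  # hypothetical scores, one per analyzer group
print(dis_max_score(group_scores))                   # only the best group counts
print(dis_max_score(group_scores, tie_breaker=0.5))  # others contribute partially
print(dis_max_score(group_scores, tie_breaker=1.0))  # same as a plain sum
```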
2021-03-08 14:56:17 -08:00
Josh Devins
9b8b20a32b
[DOCS] Clarifies the effect of per-field boosting (#63733)
The original description of per-field boosting is incorrect. Boosting a
field does not imply that it is more important relative to other fields.
It simply means that the score is multiplied by the supplied boost
value. Due to the differences in each field's term and document
statistics, it's not possible to imply relative importance of fields
based on the per-field boost value alone.
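
A tiny numeric sketch of this point: the boost only multiplies a field's score, so a boosted field can still score lower than an unboosted one. The raw scores and field names below are made up for illustration.

```python
def boosted_score(raw_score, boost=1.0):
    """A per-field boost simply multiplies the field's score."""
    return raw_score * boost


# Hypothetical raw scores; they differ because each field has its own term
# and document statistics.
title_score = boosted_score(0.5, boost=3.0)  # boosted field
body_score = boosted_score(2.0)              # unboosted field

print(title_score, body_score)  # the boosted field still scores lower here
```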
2020-10-15 09:24:32 -04:00
Dan Hermann
9397510778
[DOCS] Update tie_breaker defaults for bool_prefix and most_fields query types (#61112) 2020-08-19 07:55:54 -05:00
homersimpsons
b0cc62f69e
[DOCS] Fix rewrite => fuzzy_rewrite in multi match query docs (#60175) 2020-07-27 12:13:33 -04:00
James Rodewig
2774cd6938
[DOCS] Swap [float] for [discrete] (#60124)
Changes instances of `[float]` in our docs to `[discrete]`.

Asciidoctor prefers the `[discrete]` tag for floating headings:
https://asciidoctor.org/docs/asciidoc-asciidoctor-diffs/#blocks
2020-07-23 11:48:22 -04:00
Mayya Sharipova
620996287a
Remove docs related to index time boosting (#51704)
As there is no real index-time boosting
(boosts are only applied at query time),
this removes mentions of index-time boosting.
2020-01-31 09:01:52 -05:00
James Rodewig
5c78f606c2
[DOCS] Change // CONSOLE comments to [source,console] (#46440) 2019-09-09 10:45:37 -04:00
James Rodewig
ec37a9cea0
[DOCS] Make Query DSL titles consistent (#43935) 2019-07-18 10:18:11 -04:00
Marios Trivyzas
69993049a8
[Docs] Fix reference to boost and slop params (#42803)
For `multi_match` query: link `boost` param to the generic reference
for query usage and `slop` to the `match_phrase` query where its usage
is documented.

Fixes: #40091
2019-06-03 22:56:39 +02:00
Marios Trivyzas
6dd4d2b7a6
Remove CommonTermsQuery and cutoff_frequency param (#42654)
Remove `common` query and `cutoff_frequency` parameter of
`match` and `multi_match` queries. Both have already been
deprecated for the next 7.x version.

Closes: #37096
2019-05-31 17:06:06 +02:00
James Rodewig
adf67053f4
[DOCS] Add anchors for Asciidoctor migration (#41648) 2019-04-30 10:19:09 -04:00
Andy Bristol
d51cbc664e
fix summary of phrase_prefix scoring (#40567)
The language here implies that phrase_prefix scoring works like
most_fields, but it actually works like best_fields
2019-04-01 12:03:25 -07:00
Andy Bristol
6bba9fc83b
search as you type fieldmapper (#35600)
Adds the search_as_you_type field type that acts like a text field optimized
for as-you-type search completion. It creates a couple of subfields that analyze
the indexed terms as shingles, against which full terms are queried, and a
prefix subfield that analyzes terms as the largest shingle size used and
edge-ngrams, against which partial terms are queried.

Adds a match_bool_prefix query type that creates a boolean clause of a term
query for each term except the last, for which a boolean clause with a prefix
query is created.

The match_bool_prefix query is the recommended way of querying a search as you
type field, which will boil down to term queries for each shingle of the input
text on the appropriate shingle field, and the final (possibly partial) term
as a term query on the prefix field. This field type also supports phrase and
phrase prefix queries however
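
The clause structure described above for match_bool_prefix can be sketched as follows. This is only an approximation: whitespace splitting stands in for real analysis, and the helper name is hypothetical.

```python
def bool_prefix_clauses(text, field):
    """Build a term clause for every term except the last, and a prefix
    clause for the last (possibly partial) term."""
    terms = text.split()  # stand-in for the field's analyzer
    if not terms:
        return {"bool": {"should": []}}
    clauses = [{"term": {field: term}} for term in terms[:-1]]
    clauses.append({"prefix": {field: terms[-1]}})
    return {"bool": {"should": clauses}}


print(bool_prefix_clauses("quick brown f", "message"))
```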
2019-03-27 10:03:30 -07:00
Christoph Büscher
113af7996c
Make limit on number of expanded fields configurable (#35284)
In #26541 we introduced a hard limit of 1024 on the number of fields a query
can be expanded to. Instead of using a hard limit, we should make this
configurable. This change removes the hard limit check and uses the existing
`max_clause_count` setting instead.

Closes #34778
2018-11-08 17:04:40 +01:00
Abdon Pijpelink
32ee6148d2 [DOCS] Clarify scoring for multi_match phrase type (#32672)
The original statement "Runs a match_phrase query on each field and combines the _score from each field." for the phrase type is a bit misleading. The phrase type behaves like the best_fields type and does not combine the scores of each field.
2018-09-18 16:57:33 +02:00
Jim Ferenczi
53462f6499
Make fields optional in multi_match query and rely on index.query.default_field by default (#27380)
* Make fields optional in multi_match query and rely on index.query.default_field by default

This commit adds the ability to send `multi_match` query without providing any `fields`.
When no fields are provided the `multi_match` query will use the fields defined in the index setting `index.query.default_field`
(which in turn defaults to `*`).
The same behavior is already implemented in `query_string` and `simple_query_string` so this change just applies
the heuristic to `multi_match` queries.
Relying on `index.query.default_field` rather than `*` is safer for big mappings that break the 1024 field expansion limit added in 7.0 for all
text queries. For these kinds of mappings the admin can change the `index.query.default_field` in order to make sure that exploratory queries using
`multi_match`, `query_string` or `simple_query_string` do not throw an exception.
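
To make the two pieces concrete, here is a sketch of the request bodies involved, expressed as Python dictionaries. The query text and field names are invented, and the list form of the `index.query.default_field` value is an assumption about the settings format rather than something stated in this commit.

```python
import json

# A multi_match request body with no `fields`; per the commit message it
# falls back to the fields named by index.query.default_field.
search_body = {"query": {"multi_match": {"query": "quick brown fox"}}}

# Hypothetical settings update that narrows the default fields on a big mapping.
settings_body = {"index": {"query": {"default_field": ["title", "body"]}}}

print(json.dumps(search_body, indent=2))
print(json.dumps(settings_body, indent=2))
```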
2017-11-17 10:25:21 +01:00
Jim Ferenczi
792641a6e3 [Docs] #26541: add warning regarding the limit on the number of fields that can be queried at once in the multi_match query. 2017-10-30 18:03:56 +01:00
Alexander Kazakov
9c95e91471 Expose fuzzy_transpositions parameter in fuzzy queries (#26870)
Add fuzzy_transpositions parameter to multi_match and query_string queries.
Add fuzzy_transpositions, fuzzy_prefix_length and fuzzy_max_expansions
parameters to simple_query_string query.
2017-10-05 09:01:09 +02:00
Jim Ferenczi
a7e1610134 Add support for auto_generate_synonyms_phrase_query in match_query, multi_match_query, query_string and simple_query_string (#26097)
* Add support for auto_generate_synonyms_phrase_query in match_query, multi_match_query, query_string and simple_query_string

This change adds a new parameter called auto_generate_synonyms_phrase_query (defaults to true).
This option can be used in conjunction with the synonym_graph token filter to generate phrase queries
when multi-term synonyms are encountered.
For example, a synonym like "ny, new york" would produce the following boolean query when "ny city" is parsed:
((ny OR "new york") AND city)

Note how the multi-term synonym "new york" produces a phrase query.
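
The example above can be reproduced with a toy query builder. This is a sketch of the idea only: the function, the synonym table format, and the string rendering are assumptions, not how the actual query parser works.

```python
def build_boolean_query(tokens, synonyms):
    """Render tokens as an AND of alternatives, quoting multi-term synonyms
    so that they read as phrase queries."""
    parts = []
    for token in tokens:
        alternatives = [token] + synonyms.get(token, [])
        rendered = [f'"{alt}"' if " " in alt else alt for alt in alternatives]
        if len(rendered) > 1:
            parts.append("(" + " OR ".join(rendered) + ")")
        else:
            parts.append(rendered[0])
    return "(" + " AND ".join(parts) + ")"


# The synonym rule "ny, new york" applied to the input "ny city".
print(build_boolean_query(["ny", "city"], {"ny": ["new york"]}))
```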
2017-08-09 12:15:09 +02:00
David Pilato
475a7ca84f Add documentation for lenient in multimatch
`lenient` option is documented for `match` query but not for `multi_match` query.
2016-11-17 08:35:20 +01:00
David Pilato
c946094d5b Add documentation for lenient in multimatch
`lenient` option is documented for `match` query but not for `multi_match` query.
2016-11-16 16:15:28 +01:00
Isabel Drost-Fromm
4c02e97bcd Add back doc execution to query dsl.
Relates to #18211

This reverts commit 20aafb1196.
2016-05-24 12:43:41 +02:00
Isabel Drost-Fromm
20aafb1196 Revert "Add Autosense annotation for query dsl testing" 2016-05-17 20:55:56 +02:00
Isabel Drost-Fromm
2d402c732c Merge branch 'master' into docs/add_autosense_to_query_dsl 2016-05-17 11:59:50 +02:00
Christoph Büscher
a40c397c67 Don't allow fuzziness for multi_match types cross_fields, phrase and phrase_prefix
Currently `fuzziness` is not supported for the `cross_fields` type
of the `multi_match` query since it complicates the logic that
blends the term queries that cross_fields uses internally. At the
moment, this combination is silently ignored, which can lead to
confusion. Instead we should throw an exception in this case.
The same is true for the phrase and phrase_prefix types.
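
A minimal sketch of the kind of check this describes, failing loudly instead of silently ignoring the parameter. The function name, exception type, and error message are hypothetical and do not mirror the actual implementation.

```python
# multi_match types for which the commit message says fuzziness is rejected.
UNSUPPORTED_FUZZY_TYPES = {"cross_fields", "phrase", "phrase_prefix"}


def validate_multi_match(query_type, fuzziness=None):
    """Raise instead of silently ignoring an unsupported fuzziness setting."""
    if fuzziness is not None and query_type in UNSUPPORTED_FUZZY_TYPES:
        raise ValueError(
            f"fuzziness is not allowed for multi_match type [{query_type}]"
        )


validate_multi_match("best_fields", fuzziness="AUTO")  # accepted
validate_multi_match("cross_fields")                   # accepted, no fuzziness
```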

Closes #7764
2016-05-13 17:32:14 +02:00
Isabel Drost-Fromm
a865090cf3 CONSOLE is the new AUTOSENSE 2016-05-10 12:42:17 +02:00
Isabel Drost-Fromm
e486560ea8 Add Autosense annotation for query dsl testing
this adds the autosense annotation to a couple of query dsl
docs files and fixes the snippets to work in the tests along
the way.
2016-05-10 11:54:48 +02:00
Adrien Grand
8d5fff37ae multi_match query applies boosts too many times.
The `multi_match` query groups terms that have the same analyzer together and
then applies the boost of the first query in each group. This is not necessary
given that boosts for each term are already applied another way.
2015-08-06 19:07:12 +02:00
Clinton Gormley
0b0846f84b Updated multi-match-query.asciidoc
Corrected note about which field is boosted in a cross-fields multi_match query.

Relates to #12294
2015-08-05 10:52:56 +02:00
Clinton Gormley
171687d207 Docs: Reorganised the Query DSL docs into families and explaining query vs filter context 2015-06-04 01:59:37 +02:00
Adrien Grand
a0af88e996 Query DSL: Remove filter parsers.
This commit makes queries and filters parsed the same way using the
QueryParser abstraction. This allowed us to remove duplicate code that we had
for similar queries/filters such as `range`, `prefix` or `term`.
2015-05-07 20:14:34 +02:00
Renamed from docs/reference/query-dsl/queries/multi-match-query.asciidoc