elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-04-24 23:27:25 -04:00

Author	SHA1	Message	Date
Julie Tibshirani	509b636d25	Correct docs for multi_match scoring (#91430) The documentation claimed that for the most_fields type, the score is equal to the sum of all matches divided by the number of matches. This is not correct; we actually don't divide by the number of matches. This line in the documentation was added several years ago as part of a large PR, and was likely just a mistake.	2022-11-10 08:16:56 -08:00
Julie Tibshirani	3c1b070329	Avoid negative scores with cross_fields type (#89016) The cross_fields scoring type can produce negative scores when some documents are missing fields. When blending term document frequencies, we take the maximum document frequency across all fields. If one field appears in fewer documents than another, this means that its IDF can become negative. This is because IDF is calculated as `Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))`. This change adjusts the docFreq for each field to `Math.min(docCount, docFreq)` so that the IDF can never become negative. It makes sense that the term document frequency should never exceed the number of documents containing the field.	2022-09-06 13:02:24 -07:00
James Rodewig	dd1ed30731	[DOCS] Fix `combined_fields` query ref in `multi_match` query docs (#81456) The current `multi_match` docs contain an erroneous reference to the `combined_fields` query. This updates the reference to reference the correct query. Relates to https://github.com/elastic/elasticsearch/pull/76893	2021-12-07 16:47:44 -05:00
Adam Locke	1056c857ee	[DOCS] Update combined fields wording (#76893) * [DOCS] Update combined fields wording * Clarifications from review feedback	2021-08-26 13:16:55 -04:00
Adrien Grand	feb6620d14	`indices.query.bool.max_clause_count` now limits all query clauses (#75297) In the upcoming Lucene 9 release, `indices.query.bool.max_clause_count` is going to apply to the entire query tree rather than per `bool` query. In order to avoid breaks, the limit has been bumped from 1024 to 4096. The semantics will effectively change when we upgrade to Lucene 9, this PR is only about agreeing on a migration strategy and documenting this change. To avoid further breaks, I am leaning towards keeping the current setting name even though it contains `bool`. I believe that it still makes sense given that `bool` queries are typically the main contributors to high numbers of clauses. Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>	2021-07-21 12:16:30 +02:00
Julie Tibshirani	318bf14126	Introduce `combined_fields` query (#71213) This PR introduces a new query called `combined_fields` for searching multiple text fields. It takes a term-centric view, first analyzing the query string into individual terms, then searching for each term in any of the fields as though they were one combined field. It is based on Lucene's `CombinedFieldQuery`, which takes a principled approach to scoring based on the BM25F formula. This query provides an alternative to the `cross_fields` `multi_match` mode. It has simpler behavior and a more robust approach to scoring. Addresses #41106.	2021-04-14 13:33:19 -07:00
James Rodewig	693807a6d3	[DOCS] Fix double spaces (#71082)	2021-03-31 09:57:47 -04:00
Julie Tibshirani	da668e134a	Correct cross_fields docs on how analyzer groups are combined. (#69936) When performing a multi_match in cross_fields mode, we group fields based on their analyzer and create a blended query per group. Our docs claimed that the group scores were combined through a boolean query, but they are actually combined through a dismax that incorporates the tiebreaker parameter. This commit updates the docs and adds a test verifying the behavior.	2021-03-08 14:56:17 -08:00
Josh Devins	9b8b20a32b	[DOCS] Clarifies the effect of per-field boosting (#63733) The original description of per-field boosting is incorrect. Boosting a field does not imply that it is more important relative to other fields. It simply means that the score is multiplied by the supplied boost value. Due to the differences in each field's term and document statistics, it's not possible to imply relative importance of fields based on the per-field boost value alone.	2020-10-15 09:24:32 -04:00
Dan Hermann	9397510778	[DOCS] Update tie_breaker defaults for bool_prefix and most_fields query types (#61112)	2020-08-19 07:55:54 -05:00
homersimpsons	b0cc62f69e	[DOCS] Fix `rewrite` => `fuzzy_rewrite` in multi match query docs (#60175)	2020-07-27 12:13:33 -04:00
James Rodewig	2774cd6938	[DOCS] Swap `[float]` for `[discrete]` (#60124) Changes instances of `[float]` in our docs for `[discrete]`. Asciidoctor prefers the `[discrete]` tag for floating headings: https://asciidoctor.org/docs/asciidoc-asciidoctor-diffs/#blocks	2020-07-23 11:48:22 -04:00
Mayya Sharipova	620996287a	Remove docs related to index time boosting (#51704) As there is no real index time boosting, since boost is only applied at query time, this removes mentions of index time boosting.	2020-01-31 09:01:52 -05:00
James Rodewig	5c78f606c2	[DOCS] Change // CONSOLE comments to [source,console] (#46440)	2019-09-09 10:45:37 -04:00
James Rodewig	ec37a9cea0	[DOCS] Make Query DSL titles consistent (#43935)	2019-07-18 10:18:11 -04:00
Marios Trivyzas	69993049a8	[Docs] Fix reference to `boost` and `slop` params (#42803) For `multi_match` query: link `boost` param to the generic reference for query usage and `slop` to the `match_phrase` query where its usage is documented. Fixes: #40091	2019-06-03 22:56:39 +02:00
Marios Trivyzas	6dd4d2b7a6	Remove CommonTermsQuery and cutoff_frequency param (#42654) Remove `common` query and `cutoff_frequency` parameter of `match` and `multi_match` queries. Both have already been deprecated for the next 7.x version. Closes: #37096	2019-05-31 17:06:06 +02:00
James Rodewig	adf67053f4	[DOCS] Add anchors for Asciidoctor migration (#41648)	2019-04-30 10:19:09 -04:00
Andy Bristol	d51cbc664e	fix summary of phrase_prefix scoring (#40567) The language here implies that phrase_prefix scoring works like most_fields, but it actually works like best_fields	2019-04-01 12:03:25 -07:00
Andy Bristol	6bba9fc83b	search as you type fieldmapper (#35600) Adds the search_as_you_type field type that acts like a text field optimized for as-you-type search completion. It creates a couple of subfields that analyze the indexed terms as shingles, against which full terms are queried, and a prefix subfield that analyzes terms as the largest shingle size used and edge-ngrams, against which partial terms are queried. Adds a match_bool_prefix query type that creates a boolean clause of a term query for each term except the last, for which a boolean clause with a prefix query is created. The match_bool_prefix query is the recommended way of querying a search as you type field, which will boil down to term queries for each shingle of the input text on the appropriate shingle field, and the final (possibly partial) term as a term query on the prefix field. This field type also supports phrase and phrase prefix queries however	2019-03-27 10:03:30 -07:00
Christoph Büscher	113af7996c	Make limit on number of expanded fields configurable (#35284) Currently we introduced a hard limit of 1024 to the number of fields a query can be expanded to in #26541. Instead of using a hard limit, we should make this configurable. This change removes the hard limit check and uses the existing `max_clause_count` setting instead. Closes #34778	2018-11-08 17:04:40 +01:00
Abdon Pijpelink	32ee6148d2	[DOCS] Clarify scoring for multi_match phrase type (#32672) The original statement "Runs a match_phrase query on each field and combines the _score from each field." for the phrase type is a bit misleading. The phrase type behaves like the best_fields type and does not combine the scores of each field.	2018-09-18 16:57:33 +02:00
Jim Ferenczi	53462f6499	Make fields optional in multi_match query and rely on index.query.default_field by default (#27380) * Make fields optional in multi_match query and rely on index.query.default_field by default This commit adds the ability to send `multi_match` query without providing any `fields`. When no fields are provided the `multi_match` query will use the fields defined in the index setting `index.query.default_field` (which in turn defaults to `*`). The same behavior is already implemented in `query_string` and `simple_query_string` so this change just applies the heuristic to `multi_match` queries. Relying on `index.query.default_field` rather than `*` is safer for big mappings that break the 1024 field expansion limit added in 7.0 for all text queries. For these kinds of mappings the admin can change the `index.query.default_field` in order to make sure that exploratory queries using `multi_match`, `query_string` or `simple_query_string` do not throw an exception.	2017-11-17 10:25:21 +01:00
Jim Ferenczi	792641a6e3	[Docs] #26541: add warning regarding the limit on the number of fields that can be queried at once in the multi_match query.	2017-10-30 18:03:56 +01:00
Alexander Kazakov	9c95e91471	Expose `fuzzy_transpositions` parameter in fuzzy queries (#26870) Add fuzzy_transpositions parameter to multi_match and query_string queries. Add fuzzy_transpositions, fuzzy_prefix_length and fuzzy_max_expansions parameters to simple_query_string query.	2017-10-05 09:01:09 +02:00
Jim Ferenczi	a7e1610134	Add support for auto_generate_synonyms_phrase_query in match_query, multi_match_query, query_string and simple_query_string (#26097) * Add support for auto_generate_synonyms_phrase_query in match_query, multi_match_query, query_string and simple_query_string This change adds a new parameter called auto_generate_synonyms_phrase_query (defaults to true). This option can be used in conjunction with synonym_graph token filter to generate phrase queries when multi terms synonyms are encountered. For example, a synonym like "ny, new york" would produce the following boolean query when "ny city" is parsed: ((ny OR "new york") AND city) Note how the multi terms synonym "new york" produces a phrase query.	2017-08-09 12:15:09 +02:00
David Pilato	475a7ca84f	Add documentation for lenient in multimatch `lenient` option is documented for `match` query but not for `multi_match` query.	2016-11-17 08:35:20 +01:00
David Pilato	c946094d5b	Add documentation for lenient in multimatch `lenient` option is documented for `match` query but not for `multi_match` query.	2016-11-16 16:15:28 +01:00
Isabel Drost-Fromm	4c02e97bcd	Add back doc execution to query dsl. Relates to #18211 This reverts commit `20aafb1196`.	2016-05-24 12:43:41 +02:00
Isabel Drost-Fromm	20aafb1196	Revert "Add Autosense annotation for query dsl testing"	2016-05-17 20:55:56 +02:00
Isabel Drost-Fromm	2d402c732c	Merge branch 'master' into docs/add_autosense_to_query_dsl	2016-05-17 11:59:50 +02:00
Christoph Büscher	a40c397c67	Don't allow `fuzziness` for `multi_match` types cross_fields, phrase and phrase_prefix Currently `fuzziness` is not supported for the `cross_fields` type of the `multi_match` query since it complicates the logic that blends the term queries that cross_fields uses internally. At the moment using this combination is silently ignored, which can lead to confusions. Instead we should throw an exception in this case. The same is true for phrase and phrase_prefix type. Closes #7764	2016-05-13 17:32:14 +02:00
Isabel Drost-Fromm	a865090cf3	CONSOLE is the new AUTOSENSE	2016-05-10 12:42:17 +02:00
Isabel Drost-Fromm	e486560ea8	Add Autosense annotation for query dsl testing this adds the autosense annotation to a couple of query dsl docs files and fixes the snippets to work in the tests along the way.	2016-05-10 11:54:48 +02:00
Adrien Grand	8d5fff37ae	`multi_match` query applies boosts too many times. The `multi_match` query groups terms that have the same analyzer together and then applies the boost of the first query in each group. This is not necessary given that boosts for each term are already applied another way.	2015-08-06 19:07:12 +02:00
Clinton Gormley	0b0846f84b	Updated multi-match-query.asciidoc Corrected note about which field is boosted in a cross-fields multi_match query. Relates to #12294	2015-08-05 10:52:56 +02:00
Clinton Gormley	171687d207	Docs: Reorganised the Query DSL docs into families and explaining query vs filter context	2015-06-04 01:59:37 +02:00
Adrien Grand	a0af88e996	Query DSL: Remove filter parsers. This commit makes queries and filters parsed the same way using the QueryParser abstraction. This allowed us to remove duplicate code that we had for similar queries/filters such as `range`, `prefix` or `term`.	2015-05-07 20:14:34 +02:00

38 commits
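
Note on commit 3c1b070329: the IDF expression quoted in that message can be checked with a short sketch. This is only an illustration of the arithmetic described in the commit message, not the actual Elasticsearch or Lucene code; the class name `IdfClampSketch`, the method names, and the sample numbers are hypothetical.

```java
// Minimal sketch of the IDF clamp described in commit 3c1b070329.
// Class, method names, and sample values are hypothetical illustrations.
public class IdfClampSketch {

    // IDF exactly as quoted in the commit message.
    static double idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Same formula, with docFreq capped at docCount as the commit describes.
    static double clampedIdf(long docCount, long docFreq) {
        return idf(docCount, Math.min(docCount, docFreq));
    }

    public static void main(String[] args) {
        // Assumed scenario: the field exists in 10 documents, but the blended
        // document frequency (the maximum across fields) is 100.
        long docCount = 10;
        long blendedDocFreq = 100;

        System.out.println("unclamped IDF: " + idf(docCount, blendedDocFreq));
        System.out.println("clamped IDF:   " + clampedIdf(docCount, blendedDocFreq));
    }
}
```

With these assumed inputs the unclamped value is negative because the argument of the logarithm drops below 1, while the clamped value stays positive because docFreq can no longer exceed docCount.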