The cross_fields scoring type can produce negative scores when some documents
are missing fields. When blending term document frequencies, we take the maximum
document frequency across all fields. If one field appears in fewer documents
than another, the blended docFreq can exceed that field's docCount, and its IDF
can become negative. This is because IDF is calculated as
`Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))`.
This change adjusts the docFreq for each field to `Math.min(docCount, docFreq)`
so that the IDF can never become negative. It makes sense that the term document
frequency should never exceed the number of documents containing the field.
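
To make the arithmetic concrete, here is a minimal Python sketch of the IDF expression quoted above. The function name `idf` and the sample counts are illustrative assumptions, not taken from the Elasticsearch code base.

```python
import math


def idf(doc_count: int, doc_freq: int) -> float:
    """Python transcription of the IDF expression quoted in this message."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical numbers: the field exists in 5 documents, but the blended
# (maximum) document frequency borrowed from another field is 20.
doc_count, blended_doc_freq = 5, 20

unclamped = idf(doc_count, blended_doc_freq)
clamped = idf(doc_count, min(doc_count, blended_doc_freq))

print(f"unclamped IDF: {unclamped:.4f}")  # below zero
print(f"clamped IDF:   {clamped:.4f}")    # above zero
```

Because the clamped docFreq never exceeds docCount, the argument of the logarithm stays above 1 and the result stays positive.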
The current docs mention that Elasticsearch indexes prefixes between 2 and 5 characters in a separate field. 2 and 5 are default values, and the size of the indexed prefixes depends on the configuration settings.
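
As an illustration, the prefix lengths are controlled by the `index_prefixes` mapping parameter on a `text` field. The sketch below builds such a mapping as a Python dict; the field name `full_name` and the chosen lengths are placeholders.

```python
import json

# Hypothetical mapping body; "full_name" is a placeholder field name.
# Omitting min_chars/max_chars falls back to the defaults of 2 and 5.
mapping = {
    "mappings": {
        "properties": {
            "full_name": {
                "type": "text",
                "index_prefixes": {"min_chars": 1, "max_chars": 10},
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```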
Documents the `EMPTY` and `NONE` `flags` values for the `regexp` query.
Also documents the `""` (empty string) value, which is an alias for `ALL`.
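
A short sketch of how these values appear in a request body, written as Python dicts. The helper `regexp_query`, the field `user.id`, and the pattern are illustrative assumptions.

```python
def regexp_query(field: str, value: str, flags: str) -> dict:
    """Build a regexp query body (hypothetical helper, not a client API)."""
    return {"query": {"regexp": {field: {"value": value, "flags": flags}}}}


# "ALL" and "" (empty string) are documented as equivalent.
all_explicit = regexp_query("user.id", "k.*y", "ALL")
all_via_empty = regexp_query("user.id", "k.*y", "")

# "NONE" disables the optional operators; "EMPTY" enables only the
# empty-language operator.
no_optional_operators = regexp_query("user.id", "k.*y", "NONE")
empty_operator_only = regexp_query("user.id", "k.*y", "EMPTY")
```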
Closes #81978.
(cherry picked from commit e53ecc3f43)
Changes:
* Notes that the query string query's `default_field` and `fields` parameters support wildcards.
* Adds an xref to the `index.query.default_field` docs to the `default_field` parameter.
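
For illustration, wildcard patterns in both parameters could look like the following request bodies, written as Python dicts. The field names are placeholders.

```python
# Hypothetical field names; "content.*" expands to every matching field.
with_fields = {
    "query": {
        "query_string": {
            "query": "quick brown",
            "fields": ["title", "content.*"],
        }
    }
}

with_default_field = {
    "query": {
        "query_string": {
            "query": "quick brown",
            "default_field": "content.*",
        }
    }
}
```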
(cherry picked from commit f5f76ff1ca)
The current `multi_match` docs contain an erroneous reference to the `combined_fields` query. This updates the reference to point to the correct query.
Relates to https://github.com/elastic/elasticsearch/pull/76893
# Conflicts:
# docs/reference/query-dsl/combined-fields-query.asciidoc
Changes:
* Documents the `wildcard` parameter for the `wildcard` query. This parameter is an alias for the `value` parameter.
* Reorders the parameters alphabetically.
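
A sketch of the two equivalent spellings, as Python dicts. The field `user.id` and the pattern are placeholders.

```python
# Both bodies express the same wildcard pattern; "wildcard" is the alias
# for "value" described above.
by_value = {"query": {"wildcard": {"user.id": {"value": "ki*y"}}}}
by_alias = {"query": {"wildcard": {"user.id": {"wildcard": "ki*y"}}}}

pattern_from_value = by_value["query"]["wildcard"]["user.id"]["value"]
pattern_from_alias = by_alias["query"]["wildcard"]["user.id"]["wildcard"]
```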
Closes #79711
The script only has access to the nested document, so this should be
documented.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Co-authored-by: Alexander Reelsen <alexander@reelsen.net>
Adds additional information about how Elasticsearch uses polygon orientation. Elasticsearch only uses a polygon's orientation to determine if it crosses the international dateline. If so, Elasticsearch splits the polygon at the dateline.
Closes #74891
# Conflicts:
# docs/reference/mapping/types/geo-shape.asciidoc
Changes:
* Notes that you can't use cross-cluster search to run a terms lookup on a remote index.
* Removes a redundant sentence noting `_source` is enabled by default.
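
A minimal sketch of a terms lookup body in Python, with a client-side check for the restriction noted above. The helper `terms_lookup` and the index, id, and field names are hypothetical; the check assumes remote indices are written in the `cluster:index` form.

```python
def terms_lookup(field: str, index: str, doc_id: str, path: str) -> dict:
    """Build a terms lookup query body (hypothetical helper)."""
    if ":" in index:
        # "cluster:index" would point at a remote cluster, which a terms
        # lookup cannot read from.
        raise ValueError(f"terms lookup needs a local index, got {index!r}")
    return {
        "query": {
            "terms": {field: {"index": index, "id": doc_id, "path": path}}
        }
    }


local = terms_lookup("color", "my-index-000001", "2", "color")
```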
Closes #61364.
Changes:
* Use "geopoint" when not referring to the literal field type
* Use "geoshape" when not referring to the literal field type or query type
* Use "GeoJSON" consistently
# Conflicts:
# docs/reference/ingest/processors/enrich.asciidoc
The current `ids` option doesn't allow pinning a specific document in a
single index when searching over multiple indices. This introduces a
`documents` option, which is an array of `_id` and `_index`
fields to allow index-specific pins.
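
As a rough illustration, an index-specific pin could be written like this. The option name `documents` is taken from this message and may differ in a given release, so check the `pinned` query reference; the index names, ids, and organic query are placeholders.

```python
# Assumption: the option is spelled "documents" as described in this message.
pinned = {
    "query": {
        "pinned": {
            "documents": [
                {"_index": "my-index-000001", "_id": "1"},
                {"_index": "my-index-000002", "_id": "1"},
            ],
            "organic": {"match": {"description": "iphone"}},
        }
    }
}

# The same _id can be pinned in two different indices.
pinned_pairs = [(d["_index"], d["_id"]) for d in pinned["query"]["pinned"]["documents"]]
```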
Closes https://github.com/elastic/elasticsearch/issues/67855.
Co-authored-by: David Harsha <david.harsha@elastic.co>
`field_masking_span` is the only span query that does not begin with
`span_`. This commit deprecates the existing name and adds a new
name `span_field_masking` to better fit with the other queries.
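
For illustration, a `span_near` query using the new name might look like the following Python dict. The field names `text` and `text.stems` and the terms are placeholders.

```python
# The masked clause searches "text.stems" but reports itself as "text",
# so it can be combined with the other span clause on "text".
span_query = {
    "query": {
        "span_near": {
            "clauses": [
                {"span_term": {"text": "quick brown"}},
                {
                    "span_field_masking": {
                        "query": {"span_term": {"text.stems": "fox"}},
                        "field": "text",
                    }
                },
            ],
            "slop": 5,
            "in_order": False,
        }
    }
}

clause_names = [next(iter(c)) for c in span_query["query"]["span_near"]["clauses"]]
```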
The `geo_bounding_box` query's `type` parameter is currently ignored and has no
effect on the query. This documents the deprecation of the parameter in 7.14.0.
The parameter will be removed in 8.0.0.
Relates to #70561. Backport of #74008.
This PR introduces a new query called `combined_fields` for searching multiple
text fields. It takes a term-centric view, first analyzing the query string
into individual terms, then searching for each term in any of the fields as though
they were one combined field. It is based on Lucene's `CombinedFieldQuery`,
which takes a principled approach to scoring based on the BM25F formula.
This query provides an alternative to the `cross_fields` `multi_match` mode. It
has simpler behavior and a more robust approach to scoring.
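
A minimal request body for the new query, written as a Python dict. The query text and field names are placeholders.

```python
# Each analyzed term ("database", "systems") is looked up across all three
# fields as if they were a single combined field.
combined = {
    "query": {
        "combined_fields": {
            "query": "database systems",
            "fields": ["title", "abstract", "body"],
            "operator": "and",
        }
    }
}

searched_fields = combined["query"]["combined_fields"]["fields"]
```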
Addresses #41106.
This adds a "note" on the docs for the script query pointing folks to
runtime fields because they are more flexible. It also translates the
request example into runtime fields.
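
A sketch of what such a translation can look like, as a Python dict: the script moves into `runtime_mappings` and the query becomes an ordinary `range` filter on the runtime field. The field names and the script are illustrative and not necessarily the exact example added to the docs.

```python
# Painless script for a hypothetical runtime field "amount.signed".
script = """
double amount = doc['amount'].value;
if (doc['type'].value == 'expense') {
  amount *= -1;
}
emit(amount);
"""

search_body = {
    "runtime_mappings": {
        "amount.signed": {"type": "double", "script": script}
    },
    "query": {
        "bool": {"filter": {"range": {"amount.signed": {"lt": 10}}}}
    },
    "fields": [{"field": "amount.signed"}],
}
```

Unlike a `script` query, the runtime field can also be returned in `fields`, aggregated on, or sorted on.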
Co-authored-by: Adam Locke <adam.locke@elastic.co>
When performing a multi_match in cross_fields mode, we group fields based on
their analyzer and create a blended query per group. Our docs claimed that the
group scores were combined through a boolean query, but they are actually
combined through a dismax that incorporates the tiebreaker parameter.
This commit updates the docs and adds a test verifying the behavior.
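
For reference, a dismax combination takes the best group score and adds the remaining scores scaled by the tiebreaker. The Python sketch below uses made-up group scores and a hypothetical helper name; it is not the Elasticsearch implementation.

```python
def dis_max_score(group_scores: list[float], tie_breaker: float = 0.0) -> float:
    """Best score plus tie_breaker times the sum of the other scores."""
    best = max(group_scores)
    return best + tie_breaker * (sum(group_scores) - best)


group_scores = [2.0, 1.0]  # hypothetical scores of two analyzer groups

only_best = dis_max_score(group_scores, tie_breaker=0.0)  # best group only
blended = dis_max_score(group_scores, tie_breaker=0.5)    # partial credit
summed = dis_max_score(group_scores, tie_breaker=1.0)     # plain sum
```

With a tiebreaker of 1.0 the result matches a simple sum of the group scores; any smaller value weights the best group more heavily.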
The docs are misleading. They state that the following intervals search returns documents containing `my favorite food` **immediately** followed by `hot water` or `cold porridge`.
However, `max_gaps` applies only to the `match` rule it is set on and is not used to check proximity to the other rule. The example given in the docs shows this: `This search would match a my_text value of my favorite food is cold`
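
A sketch of the kind of request being discussed, as a Python dict. Note that `max_gaps` sits inside the first `match` rule only, so it constrains the words of `my favorite food` and says nothing about the distance to the second rule.

```python
intervals_query = {
    "query": {
        "intervals": {
            "my_text": {
                "all_of": {
                    "ordered": True,
                    "intervals": [
                        {
                            "match": {
                                "query": "my favorite food",
                                "max_gaps": 0,  # applies to this rule only
                                "ordered": True,
                            }
                        },
                        {
                            "any_of": {
                                "intervals": [
                                    {"match": {"query": "hot water"}},
                                    {"match": {"query": "cold porridge"}},
                                ]
                            }
                        },
                    ],
                }
            }
        }
    }
}

all_of = intervals_query["query"]["intervals"]["my_text"]["all_of"]
```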
Co-authored-by: Julien Guay <guay_j@yahoo.fr>
The original description of per-field boosting is incorrect. Boosting a
field does not imply that it is more important relative to other fields.
It simply means that the score is multiplied by the supplied boost
value. Due to the differences in each field's term and document
statistics, it's not possible to infer the relative importance of fields
based on the per-field boost value alone.
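
A small numeric sketch of this point in Python. The raw scores are invented for illustration; the boost uses the `field^boost` syntax from the `fields` parameter.

```python
# Request body boosting "title" by a factor of 2 (placeholder field names).
query = {
    "query": {
        "multi_match": {"query": "quick fox", "fields": ["title^2", "body"]}
    }
}

# Hypothetical raw per-field scores for one document. They differ because
# each field has its own term and document statistics.
raw_scores = {"title": 0.4, "body": 3.0}
boosts = {"title": 2.0, "body": 1.0}

# The boost only multiplies the field's score.
boosted = {field: raw_scores[field] * boosts[field] for field in raw_scores}
```

Even with the boost, the `title` score here stays below the `body` score, so the boost value by itself does not say which field dominates.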
Co-authored-by: Josh Devins <josh.devins@elastic.co>