Improvements include:
1. Remove reference to Lucene queries (this information is not necessary
for Elastic users, and can be outdated)
2. For `span_field_masking`, include a note to use the
`"require_field_match" : false` parameter for highlighters to work.
Closes #101804
Return matched_queries for named queries in Percolator.
In a response, each hit will contain, in addition to the
`_percolator_document_slot` field,
`_percolator_document_slot_<slotNumber>_matched_queries` fields that show
which sub-queries matched each percolated document.
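A minimal sketch of how a client might group these fields by slot. The hit below is hand-written for illustration, not captured from a cluster. The helper `matched_queries_by_slot` and the query names `query_a` and `query_b` are hypothetical, and the sketch assumes the new fields sit next to `_percolator_document_slot` under the hit's `fields` object.

```python
import re

# Field name pattern described above; the slot number is captured as group 1.
SLOT_FIELD = re.compile(r"^_percolator_document_slot_(\d+)_matched_queries$")


def matched_queries_by_slot(hit):
    """Map each percolated document slot to the named sub-queries that matched it."""
    result = {}
    for name, value in hit.get("fields", {}).items():
        match = SLOT_FIELD.match(name)
        if match:
            result[int(match.group(1))] = list(value)
    return result


# Hand-written hit (assumed shape, hypothetical query names).
example_hit = {
    "_id": "1",
    "fields": {
        "_percolator_document_slot": [0, 2],
        "_percolator_document_slot_0_matched_queries": ["query_a"],
        "_percolator_document_slot_2_matched_queries": ["query_a", "query_b"],
    },
}

print(matched_queries_by_slot(example_hit))
```

The plain `_percolator_document_slot` field does not match the pattern, so only the per-slot fields end up in the result.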
Closes #10163
This introduces a new knn query:
- The knn query is executed during the Query phase, similar to all other queries.
- There is no k parameter; k defaults to size.
- num_candidates is the size of the queue of candidates to consider while
searching the graph on each shard.
- For aggregations: "size" results are collected on each shard, so
aggregations will see size * shards results in total.
- All filters from the DSL are applied as post-filters, with two exceptions
that are applied as pre-filters: 1) an alias filter and 2) a filter provided
as a parameter inside the knn query.
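A rough sketch of the points above, built as a plain Python dict rather than sent to a cluster. The helper names are hypothetical, and the `field`, `query_vector` and `filter` keys are assumptions about the request shape; only `num_candidates`, `size` and the absence of `k` come from the description above.

```python
def knn_search_body(field, query_vector, num_candidates, size, knn_filter=None):
    """Build a search body with a knn query; note there is no k, it defaults to size."""
    knn = {
        "field": field,                  # assumed key name
        "query_vector": query_vector,    # assumed key name
        "num_candidates": num_candidates,
    }
    if knn_filter is not None:
        # A filter passed inside the knn query is applied as a pre-filter.
        knn["filter"] = knn_filter       # assumed key name
    return {"size": size, "query": {"knn": knn}}


def docs_seen_by_aggregations(size, shards):
    """Each shard collects `size` results, so aggregations see size * shards."""
    return size * shards


body = knn_search_body("vector", [0.1, 0.2, 0.3], num_candidates=50, size=10)
print(body)
print(docs_seen_by_aggregations(size=10, shards=3))
```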
Currently, pinned queries require either the `ids` or the `docs` parameter.
`docs` allows pinning documents from specific indices. However, for
`docs` the `_index` field is always required. The following request:
```
GET test/_search
{
"query": {
"pinned": {
"organic": {
"query_string": {
"query": "something"
}
},
"docs": [
{ "_id": "1" }
]
}
}
}
```
returns an error:
```
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[10:22] [pinned] failed to parse field [docs]",
"line": 10,
"col": 22
}
],
"type": "parsing_exception",
"reason": "[10:22] [pinned] failed to parse field [docs]",
"line": 10,
"col": 22,
"caused_by": {
"type": "x_content_parse_exception",
"reason": "[10:22] [pinned] failed to parse field [docs]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Required [_index]"
}
}
},
"status": 400
}
```
The proposal here is to make `_index` optional. I don't think we have a
strong reason for making `_index` required. When it was initially
introduced in https://github.com/elastic/elasticsearch/pull/74873, we
mostly wanted the ability to pin docs from specific indices.
Making `_index` optional can give more flexibility to use a combination
of pinned documents from specific indices or just document ids. This
change can also help with pinned query rules. Currently pinned query
rules can accept either `ids` or `docs`. If multiple pinned query rules
match and they use a combination of `ids` and `docs`, we cannot build a
pinned query and we would need to return an error. This is because a
pinned query cannot accept both `ids` and `docs`. By making `_index`
optional we would no longer need to return an error when pinned query
rules use a combination of `ids` and `docs`, because we can easily
translate `ids` into `docs`.
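A minimal sketch of that translation, assuming each matching rule contributes an actions dict holding either `ids` or `docs`. The helper `merge_pinned_actions` is hypothetical and is not the actual implementation.

```python
def merge_pinned_actions(actions):
    """Merge the actions of all matching pinned rules into one `docs` list.

    Entries from `docs` are copied as is; each entry in `ids` becomes a
    `docs` entry without `_index`, which is valid once `_index` is optional.
    Rule order is preserved.
    """
    docs = []
    for action in actions:
        docs.extend(dict(doc) for doc in action.get("docs", []))
        docs.extend({"_id": doc_id} for doc_id in action.get("ids", []))
    return docs


# One rule pins via `docs`, another via `ids`.
matching_actions = [
    {"docs": [{"_index": "singers", "_id": "1"}]},
    {"ids": [2]},
]

print(merge_pinned_actions(matching_actions))
```

The merged list can then be passed as the `docs` parameter of a single pinned query.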
The following pinned queries would be equivalent:
```
GET test/_search
{
"query": {
"pinned": {
"organic": {
"query_string": {
"query": "something"
}
},
"docs": [
{ "_id": "1" }
]
}
}
}
GET test/_search
{
"query": {
"pinned": {
"organic": {
"query_string": {
"query": "something"
}
},
"ids": [1]
}
}
}
```
The scores should be consistent when using a combination of `docs` entries
that may or may not specify `_index`; see the example:
<details> <summary>Example </summary>
```
PUT test-1/_doc/1
{ "title": "doc 1" }
PUT test-1/_doc/2
{ "title": "doc 2" }
PUT test-2/_doc/1
{ "title": "doc 1" }
PUT test-2/_doc/3
{ "title": "lalala" }
POST test-1,test-2/_search
{
  "query": { "pinned": {
    "organic": { "query_string": { "query": "lalala" } },
    "docs": [ { "_id": "2", "_index": "test-1" }, { "_id": "1" } ]
  } }
}
```
response:
```
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 2, "successful": 2, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 4, "relation": "eq" },
    "max_score": 1.7014124e+38,
    "hits": [
      { "_index": "test-1", "_id": "2", "_score": 1.7014124e+38, "_source": { "title": "doc 2" } },
      { "_index": "test-1", "_id": "1", "_score": 1.7014122e+38, "_source": { "title": "doc 1" } }, // same score as doc with id 1 from test-2
      { "_index": "test-2", "_id": "1", "_score": 1.7014122e+38, "_source": { "title": "doc 1" } }, // same score as doc with id 1 from test-1
      { "_index": "test-2", "_id": "3", "_score": 0.8025915, "_source": { "title": "lalala" } } // organic result
    ]
  }
}
```
</details>
For query rules, if we have two query rules that both match and use a
combination of `ids` and `docs`:
```
PUT _query_rules/test-ruleset
{
"ruleset_id": "test-ruleset",
"rules": [
{
"rule_id": "1",
"type": "pinned",
"criteria": [
{
"type": "exact",
"metadata": "query_string",
"value": "country"
}
],
"actions": {
"docs": [
{ "_index": "singers", "_id": "1" }
]
}
},
{
"rule_id": "2",
"type": "pinned",
"criteria": [
{
"type": "exact",
"metadata": "query_string",
"value": "country"
}
],
"actions": {
"ids": [
2
]
}
}
]
}
```
and the following query:
```
POST singers/_search
{
"query": {
"rule_query": {
"organic": {
"query_string": {
"default_field": "name",
"query": "country"
}
},
"match_criteria": {
"query_string": "country"
},
"ruleset_id": "test-ruleset"
}
}
}
```
then this would get translated into the following pinned query:
```
POST singers/_search
{
"query": {
"pinned": {
"organic": {
"query_string": {
"default_field": "name",
"query": "country"
}
},
"docs": [
{ "_index": "singers", "_id": "1" },
{"_id": 2 }
]
}
}
}
```
I think we can also simplify the pinned query rule so that it only
receives `docs`:
```
PUT _query_rules/test-ruleset
{
"ruleset_id": "test-ruleset",
"rules": [
{
"rule_id": "1",
"type": "pinned",
"criteria": [
{
"type": "exact",
"metadata": "query_string",
"value": "country"
}
],
"actions": {
"docs": [
{ "_id": "1" },
{ "_id": "2", "_index": "singers" }
]
}
}
]
}
```
This change adds a new REST parameter called `rest_include_named_queries_score` that, when set, includes the score of each named query that matched the document.
Note that with this change, the score of named queries is always returned when using the transport client. The REST layer can set the format of
the `matched_queries` section for BWC (kept as is by default).
Closes #65563
The documentation claimed that for the most_fields type, the score is equal to
the sum of all matches divided by the number of matches. This is not correct:
we do not divide by the number of matches.
This line in the documentation was added several years ago as part of a large
PR, and was likely just a mistake.
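As a simplified illustration of the difference, treating each matching field's contribution as a plain number (the per-field scores are made up, and boosts or other scoring details are ignored):

```python
def most_fields_score(field_scores):
    """Actual behaviour described above: the per-field scores are summed."""
    return sum(field_scores)


def previously_documented_score(field_scores):
    """What the docs used to claim: the sum divided by the number of matches."""
    return sum(field_scores) / len(field_scores)


scores = [1.5, 0.5, 1.0]  # hypothetical per-field match scores
print(most_fields_score(scores))
print(previously_documented_score(scores))
```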
* Refine geo-point and geo-shape docs
While reviewing the docs for another issue, some deprecated
references to prefix-trees were discovered, leading to interest
in bringing the docs a little more up-to-date.
* Update docs/reference/mapping/types/geo-point.asciidoc
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
* Update docs/reference/mapping/types/geo-shape.asciidoc
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
The cross_fields scoring type can produce negative scores when some documents
are missing fields. When blending term document frequencies, we take the maximum
document frequency across all fields. If one field appears in fewer documents
than another, this means that its IDF can become negative. This is because IDF
is calculated as `Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))`.
This change adjusts the docFreq for each field to `Math.min(docCount, docFreq)`
so that the IDF can never become negative. It makes sense that the term document
frequency should never exceed the number of documents containing the field.
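A small sketch of the arithmetic, using the formula quoted above. The function names and the sample counts are illustrative only; this is not the actual scoring code.

```python
import math


def idf(doc_count, doc_freq):
    """IDF as quoted above: log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def clamped_idf(doc_count, doc_freq):
    """IDF with docFreq limited to min(docCount, docFreq), as in this change."""
    return idf(doc_count, min(doc_count, doc_freq))


# A field present in only 5 documents, but a blended docFreq of 10
# taken from another field that appears in more documents.
print(idf(5, 10))          # negative before the change
print(clamped_idf(5, 10))  # non-negative after clamping
```

When docFreq does not exceed docCount, clamping leaves the value unchanged.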
This change removes some redundant words.
(cherry picked from commit e1e5398051)
Co-authored-by: Adnan Ashraf <adnan.ashraff1@gmail.com>
The current docs state that Elasticsearch indexes prefixes between 2 and 5 characters in a separate field. However, 2 and 5 are only the default values; the size of the indexed prefixes depends on the configuration settings.
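A simplified sketch of which prefixes a given length range covers. The helper `indexed_prefixes` is hypothetical, its parameter names `min_chars` and `max_chars` are assumed names for the configured bounds, and the real analysis chain is not modeled here.

```python
def indexed_prefixes(term, min_chars=2, max_chars=5):
    """Return the prefixes of `term` whose length lies in [min_chars, max_chars]."""
    upper = min(max_chars, len(term))
    return [term[:n] for n in range(min_chars, upper + 1)]


print(indexed_prefixes("quickly"))                          # default bounds 2 and 5
print(indexed_prefixes("quickly", min_chars=1, max_chars=3))  # custom bounds
```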
* Soft-deprecation of point/geo_point formats
Since GeoJSON and WKT are now common formats for all three types
(geo_shape, geo_point and point), we decided to soft-deprecate the other
point formats by ordering the formats as follows:
* GeoJSON (object with keys `type` and `coordinates`)
* WKT `POINT(x y)`
* Object with keys `lat` and `lon` (or `x` and `y` for point)
* Array [lon,lat]
* String `"lat,lon"` (or `"x,y"` in point)
* String with geohash (only in `geo_point`)
The geohash is last because it is only supported by one field type.
The string version is second to last because it is the most controversial,
being the only format that reverses the coordinate order relative to all
other formats (for geo_point only, since the coordinates are not reversed
in point).
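To make the ordering concrete, here is the same geo_point written in the first five formats, in the order listed above. The coordinates are arbitrary sample values, and the geohash form is left out because this sketch does not compute one.

```python
LAT, LON = 41.12, -71.34  # arbitrary sample coordinates

geo_point_formats = [
    ("GeoJSON", {"type": "Point", "coordinates": [LON, LAT]}),
    ("WKT", f"POINT ({LON} {LAT})"),
    ("object", {"lat": LAT, "lon": LON}),
    ("array", [LON, LAT]),
    ("string", f"{LAT},{LON}"),  # the only format with latitude first
]

for name, value in geo_point_formats:
    print(f"{name}: {value!r}")
```

Only the string form puts latitude before longitude, which is the reversal described above.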
In addition we replaced many examples in both documentation and tests
to prioritize WKT over the plain string format.
Many remaining examples of array format or object with keys still exist
and could be replaced by, for example, GeoJSON, if we feel the need.
* Incorrect quote position
Documents the `EMPTY` and `NONE` `flag` values for the `regexp` query.
Also documents the `""` (empty string) value, which is an alias for `ALL`.
Closes #81978.
Changes:
* Notes that the query string query's `default_field` and `fields` parameters support wildcards.
* Adds an xref to the `index.query.default_field` docs to the `default_field` parameter.
The current `multi_match` docs contain an erroneous reference to the `combined_fields` query. This change updates the reference to point to the correct query.
Relates to https://github.com/elastic/elasticsearch/pull/76893
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.
Relates to #79309, #31619
Changes:
* Documents the `wildcard` parameter for the `wildcard` query. This parameter is an alias for the `value` parameter.
* Reorders the parameters alphabetically.
Closes #79711
The script only has access to the nested document, so this should be
documented.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Adds additional information about how Elasticsearch uses polygon orientation. Elasticsearch only uses a polygon's orientation to determine if it crosses the international dateline. If so, Elasticsearch splits the polygon at the dateline.
Closes #74891