* Refactor semantic text field to align with text field behaviour (#119183)
Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
* Fix compilation after backport
* Fix compilation after backport (bis)
---------
* Term Stats documentation
* Update docs/reference/reranking/learning-to-rank-model-training.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Fix query example.
---------
(cherry picked from commit 0416812456)
Co-authored-by: Aurélien FOUCRET <aurelien.foucret@gmail.com>
Since https://github.com/apache/lucene-solr/pull/620, intervals disjunctions are automatically rewritten to handle cases where minimizations can miss valid matches.
This change updates the documentation to take this behaviour into account (users don't need to manually pull intervals disjunctions to the top anymore).
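As a rough illustration of the kind of query that no longer needs manual restructuring, the sketch below builds an `intervals` query body in which an `any_of` disjunction stays nested inside an ordered `all_of`. The field name and the search terms are assumptions chosen for illustration; they do not come from this change.

```python
import json


def nested_disjunction_query(field):
    """Build an intervals query whose any_of rule is nested inside all_of.

    The terms ("big", "big bad", "wolf") are illustrative assumptions.
    """
    return {
        "query": {
            "intervals": {
                field: {
                    "all_of": {
                        "ordered": True,
                        "max_gaps": 0,
                        "intervals": [
                            # The disjunction is left where it is; it is not
                            # pulled to the top of the rule tree by hand.
                            {"any_of": {"intervals": [
                                {"match": {"query": "big"}},
                                {"match": {"query": "big bad"}},
                            ]}},
                            {"match": {"query": "wolf"}},
                        ],
                    }
                }
            }
        }
    }


# "my_text" is a hypothetical field name.
print(json.dumps(nested_disjunction_query("my_text"), indent=2))
```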
* Cleanup: Remove pinned IDs from applied rules in favor of single applied docs
* Add support for query rules of type exclude, to exclude specified documents from result sets
* Support excluded documents that specify the _index as well as the _id
* Cleanup
* Update docs/changelog/111420.yaml
* Update docs
* Spotless
* PR feedback - docs updates
* Apply PR feedback
* PR feedback
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
indices.query.bool.max_clause_count is now set automatically and no
longer defaults to 4096. This removes mentions of 4096
from the query documentation.
Relates to PR #91811
Introduce an optional k param for knn query
If `k` is not set, the knn query keeps its previous behaviour:
- `num_candidates` docs are collected from each shard. These `num_candidates` docs
are used for combining results with other queries and aggregations on each shard.
- docs from all shards are merged to produce the top global `size` results.
If `k` is set, the behaviour is instead as follows:
- `k` docs are collected from each shard. These `k` docs are used for
combining results with other queries and aggregations on each shard.
- similarly, docs from all shards are merged to produce the top global `size`
results.
Having a `k` param makes it more intuitive for users to address their needs.
They can also skip the `num_candidates` param for this query, as it is
more of an internal detail for tuning how knn search operates.
Closes #108473
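A minimal sketch of a request body that uses the optional `k` param, written as a small Python helper. The helper name, the field name `image_vector`, and the vector values are assumptions for illustration only.

```python
import json


def knn_query(field, query_vector, k=None, num_candidates=None):
    """Build a knn query clause; k and num_candidates are only sent if given."""
    clause = {"field": field, "query_vector": list(query_vector)}
    if k is not None:
        clause["k"] = k
    if num_candidates is not None:
        clause["num_candidates"] = num_candidates
    return {"knn": clause}


# Hypothetical field and vector: k is set and num_candidates is skipped.
body = {"query": knn_query("image_vector", [0.1, 0.2, 0.3], k=10), "size": 5}
print(json.dumps(body, indent=2))
```

Omitting both optional arguments produces a clause with only `field` and `query_vector`, which corresponds to the "previous behaviour" case described above.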
* Remove `es-test-dir` book-scoped variable
* Remove `plugins-examples-dir` book-scoped variable
* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables
- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to a fuller path.
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated in the same way.
* Replace `es-repo-dir` with `es-ref-dir`
* Move `:include-xpack: true` to the few files that use it, remove from index.asciidoc
* Add modelId and modelText to KnnVectorQueryBuilder
Use QueryVectorBuilder within KnnVectorQueryBuilder to make it
possible to perform knn queries also when a query vector is not
immediately available. Supplying a text_embedding query_vector_builder
with model_text and model_id instead of the query_vector will result
in the generation of a query_vector by calling inference on the
specified model_id with the supplied model_text (during query
rewrite). This is consistent with the way query vectors are built
from model_id / model_text in KnnSearchBuilder (DFS phase).
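The request shape described above could look roughly like the following sketch. The helper name, the field name, and the model id are hypothetical; only `query_vector_builder`, `text_embedding`, `model_id`, and `model_text` come from the description.

```python
import json


def knn_query_from_text(field, model_id, model_text, num_candidates):
    """Build a knn query that supplies model text instead of a query_vector."""
    return {
        "query": {
            "knn": {
                "field": field,
                "num_candidates": num_candidates,
                # No query_vector here: the vector is expected to be produced
                # by inference on model_id during query rewrite.
                "query_vector_builder": {
                    "text_embedding": {
                        "model_id": model_id,
                        "model_text": model_text,
                    }
                },
            }
        }
    }


# "title_vector" and "my-text-embedding-model" are illustrative assumptions.
body = knn_query_from_text(
    "title_vector", "my-text-embedding-model", "mountain lake at sunrise", 50
)
print(json.dumps(body, indent=2))
```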
This enhancement adds a new abstraction to the _search API called "retriever." A
retriever is something that returns top hits. This adds three initial retrievers called
"standard", "knn", and "rrf". The retrievers use a parser-only approach where they
are parsed and then translated into a SearchSourceBuilder to execute the actual
search.
---------
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
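To make the three retriever names concrete, here is a hedged sketch of a `_search` body that combines a "standard" and a "knn" retriever under "rrf". The helper name, field names, and parameter values are assumptions for illustration.

```python
import json


def rrf_search(text_field, text, vector_field, query_vector, k, num_candidates):
    """Build a _search body that fuses a standard and a knn retriever with rrf."""
    return {
        "retriever": {
            "rrf": {
                "retrievers": [
                    # Standard retriever: wraps an ordinary query.
                    {"standard": {"query": {"match": {text_field: text}}}},
                    # knn retriever: returns the nearest-neighbour top hits.
                    {"knn": {
                        "field": vector_field,
                        "query_vector": list(query_vector),
                        "k": k,
                        "num_candidates": num_candidates,
                    }},
                ]
            }
        }
    }


# Hypothetical fields and values.
body = rrf_search("title", "mountain lake", "title_vector", [0.1, 0.2, 0.3], 10, 50)
print(json.dumps(body, indent=2))
```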
With this commit we amend the docs for the `exists` query to clarify
that it works with either `index` *or* `doc_values` set to `true` in the
mapping. The `exists` query stops working only if both are disabled.
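The rule stated above reduces to a logical OR over the two mapping parameters. A tiny sketch, with a hypothetical helper name:

```python
def exists_query_supported(index, doc_values):
    """Return True if an exists query can work on a field with this mapping.

    Per the clarification above, one of the two settings is enough.
    """
    return bool(index) or bool(doc_values)


# Only the combination with both settings disabled is unsupported.
for index in (True, False):
    for doc_values in (True, False):
        print(index, doc_values, exists_query_supported(index, doc_values))
```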
Improvements include:
1. Remove references to Lucene queries (this information is not necessary
for Elastic users, and can become outdated)
2. For `span_field_masking`, include a note to use the
"require_field_match" : false parameter for highlighters to work.
Closes #101804
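A sketch of what the note in item 2 amounts to in a request body: a `span_field_masking` clause combined with a highlighter that sets `require_field_match` to false. The helper name, field names, terms, and the `slop` value are assumptions for illustration.

```python
import json


def masked_span_search(field, masked_field, first_term, second_term):
    """Build a span_near search that masks one clause and highlights all fields."""
    return {
        "query": {
            "span_near": {
                "clauses": [
                    {"span_term": {field: first_term}},
                    {"span_field_masking": {
                        # The inner query targets masked_field but reports
                        # itself as matching on field.
                        "query": {"span_term": {masked_field: second_term}},
                        "field": field,
                    }},
                ],
                "slop": 5,
                "in_order": False,
            }
        },
        "highlight": {
            # Without this, the highlighter only considers the field the
            # query claims to match on.
            "require_field_match": False,
            "fields": {"*": {}},
        },
    }


# Hypothetical fields and terms.
body = masked_span_search("text", "text.stems", "quick", "fox")
print(json.dumps(body, indent=2))
```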
Return matched_queries for named queries in Percolator.
In a response, each hit together with
a `_percolator_document_slot` field will contain
`_percolator_document_slot_<slotNumber>_matched_queries` fields that will show
which sub-queries matched each percolated document.
Closes #10163
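The field naming scheme can be read back from a hit with a small helper. This is a sketch only: the helper names are hypothetical, and it assumes both kinds of fields appear under the hit's `fields` object.

```python
def matched_queries_field(slot):
    """Name of the per-slot field that lists matched named sub-queries."""
    return f"_percolator_document_slot_{slot}_matched_queries"


def matched_queries_by_slot(hit):
    """Map each percolated document slot in a hit to its matched query names."""
    fields = hit.get("fields", {})  # assumption: fields live under "fields"
    slots = fields.get("_percolator_document_slot", [])
    return {slot: fields.get(matched_queries_field(slot), []) for slot in slots}


# Hand-written sample hit for illustration; not real output.
sample_hit = {
    "_id": "1",
    "fields": {
        "_percolator_document_slot": [0, 2],
        "_percolator_document_slot_0_matched_queries": ["query_a"],
        "_percolator_document_slot_2_matched_queries": ["query_a", "query_b"],
    },
}
print(matched_queries_by_slot(sample_hit))
```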
This introduces a new knn query:
- The knn query is executed during the Query phase, similar to all other queries.
- There is no k parameter; k defaults to size.
- num_candidates is the size of the queue of candidates to consider while
searching the graph on each shard.
- For aggregations: "size" results are collected per shard, so aggregations
will see size * shards results in total.
- All filters from the DSL are applied as post-filters, except: 1) an alias filter,
which is applied as a pre-filter, or 2) a filter provided as a parameter
inside the knn query.
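The aggregation arithmetic from the list above, as a tiny sketch. The function name is hypothetical, and the result is an upper bound under the assumption that every shard has at least `size` matching docs.

```python
def docs_seen_by_aggregations(size, shards):
    """Docs visible to aggregations when each shard contributes `size` hits."""
    return size * shards


# For example, size=10 on an index with 3 shards.
print(docs_seen_by_aggregations(10, 3))
```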
Currently pinned queries require either the `ids` or `docs` parameter.
`docs` allows pinning documents from specific indices. However, for
`docs` the `_index` field is always required, so the following request:
```
GET test/_search
{
"query": {
"pinned": {
"organic": {
"query_string": {
"query": "something"
}
},
"docs": [
{ "_id": "1" }
]
}
}
}
```
returns an error:
```
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[10:22] [pinned] failed to parse field [docs]",
"line": 10,
"col": 22
}
],
"type": "parsing_exception",
"reason": "[10:22] [pinned] failed to parse field [docs]",
"line": 10,
"col": 22,
"caused_by": {
"type": "x_content_parse_exception",
"reason": "[10:22] [pinned] failed to parse field [docs]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Required [_index]"
}
}
},
"status": 400
}
```
The proposal here is to make `_index` optional. I don't think we have a
strong reason to keep `_index` required. When it was initially
introduced in https://github.com/elastic/elasticsearch/pull/74873, we
mostly wanted the ability to pin docs from specific indices.
Making `_index` optional can give more flexibility to use a combination
of pinned documents from specific indices or just document ids. This
change can also help with pinned query rules. Currently pinned query
rules can accept either `ids` or `docs`. If multiple pinned query rules
match and they use a combination of `ids` and `docs`, we cannot build a
pinned query and we would need to return an error. This is because a
pinned query cannot accept both `ids` and `docs`. By making `_index`
optional we would no longer need to return an error when pinned query
rules use a combination of `ids` and `docs`, because we can easily
translate `ids` into `docs`.
The following pinned queries would be equivalent:
```
GET test/_search
{
"query": {
"pinned": {
"organic": {
"query_string": {
"query": "something"
}
},
"docs": [
{ "_id": "1" }
]
}
}
}
GET test/_search
{
"query": {
"pinned": {
"organic": {
"query_string": {
"query": "something"
}
},
"ids": [1]
}
}
}
```
The scores should be consistent when using a combination of `docs` entries that
may or may not specify `_index`. See the example:
<details> <summary>Example </summary>
```
PUT test-1/_doc/1
{ "title": "doc 1" }
PUT test-1/_doc/2
{ "title": "doc 2" }
PUT test-2/_doc/1
{ "title": "doc 1" }
PUT test-2/_doc/3
{ "title": "lalala" }
POST test-1,test-2/_search
{
  "query": { "pinned": {
    "organic": { "query_string": { "query": "lalala" } },
    "docs": [ { "_id": "2", "_index": "test-1" }, { "_id": "1" } ]
  } }
}
```
response:
```
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 2, "successful": 2, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 4, "relation": "eq" },
    "max_score": 1.7014124e+38,
    "hits": [
      { "_index": "test-1", "_id": "2", "_score": 1.7014124e+38,
        "_source": { "title": "doc 2" } },
      { "_index": "test-1", "_id": "1", "_score": 1.7014122e+38, // same score as doc with id 1 from test-2
        "_source": { "title": "doc 1" } },
      { "_index": "test-2", "_id": "1", "_score": 1.7014122e+38, // same score as doc with id 1 from test-1
        "_source": { "title": "doc 1" } },
      { "_index": "test-2", "_id": "3", "_score": 0.8025915, // organic result
        "_source": { "title": "lalala" } }
    ]
  }
}
```
</details>
For query rules, if we have two query rules that both match and use a
combination of `ids` and `docs`:
```
PUT _query_rules/test-ruleset
{
"ruleset_id": "test-ruleset",
"rules": [
{
"rule_id": "1",
"type": "pinned",
"criteria": [
{
"type": "exact",
"metadata": "query_string",
"value": "country"
}
],
"actions": {
"docs": [
{ "_index": "singers", "_id": "1" }
]
}
},
{
"rule_id": "2",
"type": "pinned",
"criteria": [
{
"type": "exact",
"metadata": "query_string",
"value": "country"
}
],
"actions": {
"ids": [
2
]
}
}
]
}
```
and the following query:
```
POST singers/_search
{
"query": {
"rule_query": {
"organic": {
"query_string": {
"default_field": "name",
"query": "country"
}
},
"match_criteria": {
"query_string": "country"
},
"ruleset_id": "test-ruleset"
}
}
}
```
then this would get translated into the following pinned query:
```
POST singers/_search
{
"query": {
"pinned": {
"organic": {
"query_string": {
"default_field": "name",
"query": "country"
}
},
"docs": [
{ "_index": "singers", "_id": "1" },
{"_id": 2 }
]
}
}
}
```
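The translation from matching rules to a single `docs` list can be sketched as follows. This is an illustrative helper with a hypothetical name, not the actual implementation; it assumes each matching rule carries `actions` with `docs`, `ids`, or both, and that each id simply becomes a `docs` entry without `_index`.

```python
def merge_pinned_actions(rules):
    """Combine the actions of all matching pinned rules into one docs list."""
    docs = []
    for rule in rules:
        actions = rule.get("actions", {})
        # docs entries are kept as they are, with or without _index.
        docs.extend(actions.get("docs", []))
        # ids become docs entries that carry only an _id.
        docs.extend({"_id": doc_id} for doc_id in actions.get("ids", []))
    return docs


# The two matching rules from the ruleset above, reduced to their actions.
matching_rules = [
    {"rule_id": "1", "actions": {"docs": [{"_index": "singers", "_id": "1"}]}},
    {"rule_id": "2", "actions": {"ids": [2]}},
]
print(merge_pinned_actions(matching_rules))
```

The merged list has the same entries, in the same order, as the `docs` array in the translated pinned query above.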
I think we can also simplify the pinned query rule so that it only
receives `docs`:
```
PUT _query_rules/test-ruleset
{
"ruleset_id": "test-ruleset",
"rules": [
{
"rule_id": "1",
"type": "pinned",
"criteria": [
{
"type": "exact",
"metadata": "query_string",
"value": "country"
}
],
"actions": {
"docs": [
{ "_id": "1" },
{ "_id": "2", "_index": "singers" }
]
}
}
]
}
```