This action solely needs the cluster state, it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
The `?local` parameter becomes a no-op and is marked as deprecated.
Relates #101805
Relates #107984
Today the docs for the `?wait_for_active_shards` parameter say that it
must be a positive integer, proscribing `0`, yet `0` is a legitimate
value for this parameter. This commit fixes this point and rewords the
docs slightly for clarity.
Introduce an optional k param for knn query
If k is not set, knn query has the previous behaviour:
- `num_candidates` docs is collected from each shard. This `num_candidates` docs
are used for combining with results with other queries and aggregations on each shard.
- docs from all shards are merged to produce the top global `size` results
If k is set, the behaviour instead is following:
- `k` docs is collected from each shard. This `k` docs are used for
combining results with other queries and aggregations on each shard.
- similarly, docs from all shards are merged to produce the top global `size`
results.
Having `k` param makes it more intuitive for users to address their needs.
They also don't need to care and can skip `num_candidates` param for this query
as it is of more internal details to tune how knn search operates.
Closes#108473
The cluster level dense vector stats returns the total number of dense vector indices globally including the replicas.
This commit fixes the total to only include the value count of the primary indices.
This change aligns with the docs stats which also reports the number of primary documents when used in cluster stats.
The indices stats API still reports granular results for replicas and primaries so the information is not lost.
* Add SparseVectorStats
* Update to use mappings in engine
* Update to be unique to primary shards
* Fix doc
* Fix null error in test
* Cleanup
* fix yaml
* remove comment
* add version to yaml
* Revert whitespace changes to stats doc
* fix yml test
* Checkstyle
* Fix NPE in test
* Update docs/changelog/108793.yaml
* Add link to sparse_vector field type in docs
* PR feedback
* Flesh out test a bit more
* PR feedback - alphabetize placement in docs
* Fix doc change
APIs which perform cluster state updates typically accept the
`?master_timeout=` and `?timeout=` parameters to respectively set the
pending task queue timeout and the acking timeout for the cluster state
update. Both of these parameters accept the value `-1`, but
`?master_timeout=-1` means to wait indefinitely whereas `?timeout=-1`
means the same thing as `?timeout=0`, namely that acking times out
immediately on commit.
There are some situations where it makes sense to wait for as long as
possible for nodes to ack a cluster state update. In practice this wait
is bounded by other mechanisms (e.g. the lag detector will remove the
node from the cluster after a couple of minutes of failing to apply
cluster state updates) but these are not really the concern of clients.
Therefore with this commit we change the meaning of `?timeout=-1` to
mean that the acking timeout is infinite.
* Remove `es-test-dir` book-scoped variable
* Remove `plugins-examples-dir` book-scoped variable
* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables
- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to fuller path
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated idem
* Replace `es-repo-dir` with `es-ref-dir`
* Move `:include-xpack: true` to few files that use it, remove from index.asciidoc
Specifying `?master_timeout=-1` on an API which performs a cluster state
update means that the cluster state update task will never time out
while waiting in the pending tasks queue. However this parameter is also
re-used in a few places where a timeout of `-1` means something else,
typically to timeout immediately. This commit fixes those places so that
`?master_timeout=-1` consistently means to wait forever.
* Add modelId and modelText to KnnVectorQueryBuilder
Use QueryVectorBuilder within KnnVectorQueryBuilder to make it
possible to perform knn queries also when a query vector is not
immediately available. Supplying a text_embedding query_vector_builder
with model_text and model_id instead of the query_vector will result
in the generation of a query_vector by calling inference on the
specified model_id with the supplied model_text (during query
rewrite). This is consistent with the way query vectors are built
from model_id / model_text in KnnSearchBuilder (DFS phase).
This enhancement adds a new abstraction to the _search API called "retriever." A
retriever is something that returns top hits. This adds three initial retrievers called
"standard", "knn", and "rrf". The retrievers use a parser-only approach where they
are parsed and then translated into a SearchSourceBuilder to execute the actual
search.
---------
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
Adding equations to the docs around how to best calculate similarity & score. The similarity parameter for search was added in 8.8.
The max-inner-product mentions will be removed for all versions before 8.11 when backporting.
closes: https://github.com/elastic/elasticsearch/issues/102924
This adds a new parameter to `knn` that allows filtering nearest neighbor results that are outside a given similarity.
`num_candidates` and `k` are still required as this controls the nearest-neighbor vector search accuracy and exploration. For each shard the query will search `num_candidates` and only keep those that are within the provided `similarity` boundary, and then finally reduce to only the global top `k` as normal.
For example, when using the `l2_norm` indexed similarity value, this could be considered a `radius` post-filter on `knn`.
relates to: https://github.com/elastic/elasticsearch/issues/84929 && https://github.com/elastic/elasticsearch/pull/93574
Added Cartesian support for centroid aggregation
* First draft of cartesian-centroid docs
However, this is largely a duplicate of geo-centroid docs since they are essentially identical behaviour. We should consider merging them.
* Work on isAggregatable caused a minor logic conflict. When that work was done, Point and Shape were not aggregatable, but now they are.
This removes "data streams" from the docs for the `index`, `delete`,
and `update` actions because data streams only support the `update`
action.
Closes#87231
The current `ignore_unavailable` definition is a bit misleading. The parameter primarily determines if a request that targets a missing or closed index returns an error.
This change allows to not open scroll while reindex/delete_by_query/update_by_query
if configured max_docs if less then or equal to the number of documents returned by the scroll batch.
PR #55884 removed documentation for several query parameters from the search API
docs. During tests, I failed to notice that these are valid parameters but require other parameters to use.
Changes:
* Notes the following search API parameters require the `q` query string parameter:
* `analyzer`
* `analyze_wildcard`
* `default_operator`
* `df`
* `lenient`
* Notes the following search API parameters require the `suggest_field` and `suggest_text` query parameters:
* `suggest_mode`
* `suggest_size`
* Re-adds the above parameters to the search API docs.
These changes also affect API documentation that reuses the search API parameters:
* Delete by query API
* Update by query API
* Count API
* Explain API
* Validate API
Closes#79674