The completion_time is computed as the start_time (already present) plus the 'took'
time set in the SearchResponse object. It is only set when isRunning is false,
since took is set even for in-progress searches.
We use the 'took' field because it is based on relative time, not absolute wall clock time
which can go backwards due to NTP issues. See the comments in TransportSearchAction about
the SearchTimeProvider for details.
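A minimal sketch of the rule described above, written in Python for illustration only. The function name and the millisecond units are assumptions, not the actual Elasticsearch code:

```python
from typing import Optional


def completion_time_millis(start_time_millis: int, took_millis: int, is_running: bool) -> Optional[int]:
    """Return start + took once the search has finished, otherwise None."""
    if is_running:
        # 'took' is populated even for in-progress searches, so it must not
        # be used to derive a completion time yet.
        return None
    # 'took' comes from a relative clock, so the sum does not depend on
    # wall-clock adjustments made while the search was running.
    return start_time_millis + took_millis
```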
Closes #88640
Previously we tracked max_score in collapse when requested (track_scores=true)
or when there was no sort in collapse (see PR #27122). This feature
was lost through refactoring and later changes.
This PR restores it.
Closes #97653
There are situations in which the terminate_after functionality causes
the collection to keep on going although there is nothing to collect,
with the only goal of incrementing the counter of collected docs and
eventually early terminating which sets the `terminated_early` flag
in the search response to true.
When docs collection early terminates, we should rather honor the
corresponding `CollectionTerminatedException` that is thrown, and
adjust expectations around the fact that `terminate_after` affects
actual collection of documents, meaning that it can't be honored if
the threshold has not been reached by the time the collection early
terminates for other reasons.
This commit adjusts the QueryPhaseCollector behavior to do that, which
allows for some additional simplifications.
Closes #97269
Added a clusterAlias to the Painless execute Request object, so that index
expressions in the request of the form "myremote:myindex" will be parsed to
set clusterAlias to "myremote" and the index to "myindex".
If clusterAlias is null, then it is executed against a shard on the local cluster, as before.
If clusterAlias is non-null, then the SingleShardTransportAction is sent to the remote cluster,
where it will run the full request (doing remote coordination). Note that the new clusterAlias
field is not Writeable so that when it is sent to the remote cluster it will only see the index
name, not the clusterAlias (which it wouldn't know how to handle correctly).
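As an illustration of the parsing described above, here is a simplified Python sketch. The names `ParsedIndex` and `parse_index_expression` are hypothetical, and the sketch ignores edge cases (for example expressions that contain a colon for other reasons):

```python
from typing import NamedTuple, Optional


class ParsedIndex(NamedTuple):
    cluster_alias: Optional[str]  # None means "run on the local cluster"
    index: str


def parse_index_expression(expression: str) -> ParsedIndex:
    """Split "myremote:myindex" into a cluster alias and an index name."""
    alias, separator, index = expression.partition(":")
    if not separator:
        # No remote prefix: the whole expression is a local index name.
        return ParsedIndex(None, expression)
    return ParsedIndex(alias, index)
```

With this sketch, "myremote:myindex" yields the alias "myremote" and the index "myindex", while a plain "myindex" yields no alias.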
Added PainlessExecuteIT test that tests cross-cluster calls
Updated painless-execute-script end user docs to indicate support for cross-cluster executions
Currently the prefix passed to the _terms_enum endpoint is not limited in size.
Since the prefix is run against a keyword field and used to build automata, this can lead
to high memory consumption and the danger of running OOM. This change checks the size of
the prefix early in the rest request and throws a validation error in case it exceeds
IndexWriter.MAX_TERM_LENGTH, which is the same limit we apply to the length of
keyword field values anyway, so this comes at no loss in functionality.
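The check can be pictured roughly as follows. This is a Python sketch, not the actual REST-layer code: `validate_prefix` is a hypothetical name, and measuring the prefix in UTF-8 bytes is an assumption made here because Lucene's limit is expressed in bytes:

```python
# Value of Lucene's IndexWriter.MAX_TERM_LENGTH (maximum term length in bytes).
MAX_TERM_LENGTH = 32766


def validate_prefix(prefix: str) -> None:
    """Reject over-long prefixes before any automaton is built."""
    # Assumption: length is measured on the UTF-8 encoding of the prefix.
    length = len(prefix.encode("utf-8"))
    if length > MAX_TERM_LENGTH:
        raise ValueError(
            f"prefix length {length} exceeds the maximum of {MAX_TERM_LENGTH}"
        )
```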
Closes #96572
The query phase uses a number of different collectors and combines them together, pretty much one per feature that the search API exposes: there is a collector for post_filter, one for min_score, one for terminate_after, one for aggs. While this is very flexible, we always combine such collectors together in the same way (e.g. terminate_after must be the first one, post_filter is only applied to top docs collection, min score is applied to both aggs and top docs). This means that although we could flexibly compose collectors, we need to apply each feature predictably, which makes the composability unnecessary. Furthermore, composability adds complexity.
The terminate_after functionality is a clear example of complexity introduced as a consequence of having a complex collector tree: it relies on a multi collector, and throws an exception to force terminating the collection for all other collectors in the tree. If there was a single collector aware of post_filter, min_score and terminate_after at the same time, we could simply reuse Lucene mechanisms to early terminate the collection (CollectionTerminatedException) instead of forcing the termination throwing an exception that Lucene does not handle.
Furthermore, MultiCollector is a complex, generic collector for combining multiple collectors, while we only ever combine at most two collectors with it, which are more or less fixed (e.g. top docs and aggs).
This PR introduces a new top-level collector that is inspired by MultiCollector in that it holds the top docs and the optional aggs collector and applies post_filter, min_score as well as terminate_after as part of its execution. This gives us a specialized collector for our needs, with less flexibility and more control. This surfaced some strange behaviour that we may want to change as a follow-up in how terminate_after makes us collect docs even when all possible collections have been early terminated. The goal of this PR though is to have feature parity with query phase before the refactoring, without any change of behaviour.
A nice benefit of this work is that it allows us to rely on CollectionTerminatedException for the terminate_after functionality. This simplifies the introduction of multi-threaded collector managers when it comes to handling exceptions.
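To make the idea of a single feature-aware collector concrete, here is a toy Python model. It is not the real collector: the class and function names are hypothetical, `CollectionTerminated` stands in for Lucene's CollectionTerminatedException, and the exact point at which docs are counted for terminate_after is an assumption:

```python
class CollectionTerminated(Exception):
    """Stand-in for Lucene's CollectionTerminatedException."""


class QueryPhaseCollectorSketch:
    def __init__(self, post_filter=None, min_score=None, terminate_after=0, with_aggs=False):
        self.post_filter = post_filter          # predicate on doc id, top docs only
        self.min_score = min_score              # applies to top docs and aggs
        self.terminate_after = terminate_after  # 0 disables the feature
        self.top_docs = []
        self.aggs = [] if with_aggs else None   # optional aggs "collector"
        self.num_collected = 0
        self.terminated_early = False

    def collect(self, doc_id, score):
        # terminate_after is checked first, before any other feature.
        if self.terminate_after > 0 and self.num_collected >= self.terminate_after:
            self.terminated_early = True
            raise CollectionTerminated()
        self.num_collected += 1
        if self.min_score is not None and score < self.min_score:
            return  # below min_score: hidden from both top docs and aggs
        if self.aggs is not None:
            self.aggs.append(doc_id)  # aggs ignore post_filter
        if self.post_filter is None or self.post_filter(doc_id):
            self.top_docs.append((doc_id, score))


def run_collection(collector, hits):
    """Feed (doc_id, score) pairs to the collector, ending quietly on early termination."""
    try:
        for doc_id, score in hits:
            collector.collect(doc_id, score)
    except CollectionTerminated:
        pass  # treated as a normal end of collection, as Lucene's search loop does
```

In this model the terminated_early flag is only set when the threshold is actually reached while docs are still being offered to the collector.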
Added additional fields to SearchProfileResults for XContent output: node_id, cluster, index, shard_id.
It parses the existing composite ID using the new parseProfileShardId method, which reverses
the SearchShardTarget.toString method.
No new information is added here; the change merely splits out the four pieces of information
in the profile shards "composite" id that is created by the SearchShardTarget.toString method.
Profile/shards output now has the form:
```
"profile": {
  "shards": [
    {
      "id": "[2m7SW9oIRrirdrwirM1mwQ][blogs][0]",
      "node_id": "2m7SW9oIRrirdrwirM1mwQ",
      "shard_id": "0",
      "index": "blogs",
      "cluster": "(local)",
      "searches": [ ... ]
      ...
    },
    {
      "id": "[UngEVXTBQL-7w5j_tftGAQ][remote1:blogs][2]",
      "node_id": "UngEVXTBQL-7w5j_tftGAQ",
      "shard_id": "2",
      "index": "blogs",
      "cluster": "remote1",
      "searches": [ ... ]
      ...
```
where the latter is on a remote cluster and you can see that as the prefix on the index name.
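The splitting of the composite id can be sketched as follows. This is an illustrative Python version based only on the two id shapes shown above; the function name and the regular expression are assumptions, not the actual parseProfileShardId implementation:

```python
import re

# Matches "[node_id][index-or-cluster:index][shard_number]".
_COMPOSITE_ID = re.compile(r"^\[([^\]]+)\]\[([^\]]+)\]\[(\d+)\]$")


def parse_profile_shard_id(composite_id: str) -> dict:
    match = _COMPOSITE_ID.match(composite_id)
    if match is None:
        raise ValueError(f"unexpected profile shard id: {composite_id}")
    node_id, index_expression, shard_id = match.groups()
    cluster, separator, index = index_expression.partition(":")
    if not separator:
        # No "remote:" prefix, so the shard lives on the local cluster.
        cluster, index = "(local)", index_expression
    return {"node_id": node_id, "shard_id": shard_id, "index": index, "cluster": cluster}
```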
Partially addresses #25896
Added yamlRestTest for the new fields in the profile response.
Added documentation to search-across-clusters.asciidoc showing that async-search
can now support the ccs_minimize_roundtrips=true flag and how it behaves relative to
async CCS when ccs_minimize_roundtrips=true.
I also updated the "Don't minimize network roundtrips" section to reflect the fact that the
REST based Search Shards API is no longer called but rather an internal transport-layer only
version of search_shards.
* Support CCS minimize round trips in async search
This commit makes the smallest set of changes to allow async-search based cross-cluster search
to work with the CCS minimize_round_trips feature without changing the internals/architecture of
the search action.
When ccsMinimizeRoundtrips is set to true on SubmitAsyncSearchRequest, the AsyncSearchTask on the
primary CCS coordinator sends a synchronous SearchRequest to all clusters for a remote coordinator
to orchestrate and return the entire result set to the CCS coordinator as a single response.
This is the same functionality provided by synchronous CCS search using minimize_roundtrips.
Since this is an async search, it means that the async search coordinator has no visibility
into search progress on the remote clusters while they are running the search, thus losing one of
the key features of async search. However, this is a good first approach for improving overall search
latency for cross cluster searches that query a large number of shards on remote clusters, since
Kibana does not currently expose incremental progress of an async search to users.
Relates #73971
* Update term-suggest.asciidoc
It is really easy to miss the fact that this is the default setting, since it is not highlighted or called out in any way.
* Apply review suggestion
---------
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
This change at a high level adds global ranking on the coordinating node at the end of query reduction
prior to the fetch phase. Individual rank methods are defined in plugins.
The first rank plugin added as part of this change is reciprocal rank fusion (RRF). RRF uses a relatively
simple formula for merging 1...n result sets together with sum(1/(k+d)) where k is a ranking constant
and d is a document's scored position within a result set from a query.
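The formula can be illustrated with a few lines of Python. This is a sketch only: the function name is hypothetical, positions are assumed to be 1-based, the default k=60 is an assumption, and ties are broken by document id purely to keep the output deterministic:

```python
from collections import defaultdict


def rrf_merge(result_sets, k=60):
    """Merge ranked lists of doc ids using sum(1 / (k + d)) per document."""
    scores = defaultdict(float)
    for results in result_sets:
        # d is the document's position within this result set (assumed 1-based).
        for position, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + position)
    # Highest fused score first; doc id breaks ties.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A document that appears near the top of several result sets accumulates more score than one that ranks first in a single set only when the other positions are high enough, which is what makes the constant k matter.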
This adds a new parameter to `knn` that allows filtering nearest neighbor results that are outside a given similarity.
`num_candidates` and `k` are still required as they control the nearest-neighbor vector search accuracy and exploration. For each shard the query will search `num_candidates` and only keep those that are within the provided `similarity` boundary, and then finally reduce to only the global top `k` as normal.
For example, when using the `l2_norm` indexed similarity value, this could be considered a `radius` post-filter on `knn`.
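The per-shard filtering followed by the global reduction can be modelled with a brute-force Python sketch. All names here are hypothetical, the exhaustive scan replaces the approximate vector search that is really used, and `similarity` is treated as a maximum `l2_norm` distance:

```python
import heapq
import math


def shard_knn(query, vectors, num_candidates, similarity):
    """Return (distance, doc_id) pairs for one shard that fall within the radius."""
    # Take the num_candidates nearest vectors on this shard (brute force here).
    candidates = heapq.nsmallest(
        num_candidates,
        ((math.dist(query, vector), doc_id) for doc_id, vector in vectors.items()),
    )
    # Keep only candidates inside the similarity boundary.
    return [(distance, doc_id) for distance, doc_id in candidates if distance <= similarity]


def global_knn(query, shards, k, num_candidates, similarity):
    """Merge the filtered per-shard hits and keep the global top k doc ids."""
    merged = [hit for vectors in shards for hit in shard_knn(query, vectors, num_candidates, similarity)]
    return [doc_id for _, doc_id in heapq.nsmallest(k, merged)]
```

Note that fewer than `k` results can come back when not enough candidates fall inside the boundary.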
relates to: https://github.com/elastic/elasticsearch/issues/84929 && https://github.com/elastic/elasticsearch/pull/93574
The _terms_enum API currently does not support ip fields. However,
type-ahead-like completion is useful for UI purposes.
This change adds the ability to query ip fields via the _terms_enum API by
leveraging the terms enumeration available when doc_values are enabled on the
field, which is the default. In order to make prefix filtering fast, we
internally create a fast prefix automaton from the user-supplied prefix that
gets intersected with the shard's terms enumeration, similar to what we do for
keyword fields already.
Closes #89933
The text_embedding query vector builder that can be used with
KNN search to deliver a semantic search solution will be experimental
for its first release.
The _terms_enum API currently only supports the keyword, constant_keyword
and flattened field types. This change adds support for the `version` field type
that sorts according to the semantic versioning definition.
Closes #83403
This was only needed because the percolator uses a MemoryIndex which did
not support stored fields, and so when it ran a highlighting phase it needed to
force it to read from source. MemoryIndex added stored fields support in
Lucene 9.5, so we can remove this internal parameter.
The parameter remains available, but deprecated, via the rest layer, and no
longer has any effect.