Fixes two bugs in _resolve/cluster.
First, the code that detects older clusters versions and does a fallback to the _resolve/index
endpoint was using an outdated string match for error detection. That has been adjusted.
Second, upon security exceptions, the _resolve/cluster endpoint was marking the clusters as connected: true,
under the assumption that all security exceptions related to cross cluster calls and remote index access were
coming from the remote cluster, but that is not always the case. Some cross-cluster security violations can
be detected on the local querying cluster after issuing the remoteClient.execute call but before the transport
layer actually sends the request remotely. So we now mark the connected status as false for all ElasticsearchSecurityException cases. End user docs have been updated with this information.
* [Connector API] Add interface for soft-deletes
* Define connector deleted system index
* Got soft-delete logic working
* Add unit tests
* Add yaml e2e test and attempt to update permissions
* Fix permissions
* Update docs
* Fix docs
* Update docs/changelog/118282.yaml
* Change logic
* Fix tests
* Remove unnecessary privilege from yaml rest test
* Update changelog
* Update docs/changelog/118669.yaml
* Adapt yaml tests
* Undo changes to muted-tests.yml
* Fix compilation issue after other PR got merged
* Exclude soft-deleted connector from checks about index_name already in use
* Update docs/reference/connector/apis/get-connector-api.asciidoc
Co-authored-by: Tim Grein <tim@4greins.de>
* Update rest-api-spec/src/main/resources/rest-api-spec/api/connector.list.json
Co-authored-by: Tim Grein <tim@4greins.de>
* Adapt comments, add connector wire serializing test
* Introduce new transport versions for passing the delete flag
* Get rid of wire serialisation, use include_deleted instead of deleted flag
* Remove unused import
* Final tweaks
* Adapt variable name in rest layer
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Tim Grein <tim@4greins.de>
I believe this is a typo, as in our 8.16.1 cluster this field appears to be called `combined_coordinating_and_primary`
Co-authored-by: Ian Lee <IanLee1521@gmail.com>
This PR introduces a new syntactical feature to index expression resolution: The selector.
Selectors, denoted with a :: followed by a recognized suffix will allow users to specify which component of
an index abstraction they would like to operate on within an API call. In this case, an index abstraction is a
concrete index, data stream, or alias; Any abstraction that can be resolved to a set of indices/shards. We
define a component of an index abstraction to be some searchable unit of the index abstraction.
This exposes new OTel node and index based metrics for indexing failures due to version conflicts.
In addition, the /_cat/shards, /_cat/indices and /_cat/nodes APIs also expose the same metric, under the newly added column iifvc.
Relates: #107601
When originally added, the knn query didn't apply `top-k` restrictions
to the query. Instead it would allow the resulting `num_candidate` to be
combined with sibling queries without restricting to `top-size` results
ahead of time.
This honestly is confusing behavior and leads to some bugs in understand
how it all works.
This commit addresses this by eagerly gathering only `size` results when
`k==null` before combining with other queries.
To achieve the previous behavior, this can be done directly by setting
`k==num_candidates` in the query.
Late-interaction models are powerful rerankers. While their size and
overall cost doesn't lend itself for HNSW indexing, utilizing them as
second order "brute-force" reranking can provide excellent boosts in
relevance. At generally lower inference times than large cross-encoders.
This commit exposes a new experimental `rank_vectors` field that allows
for maxSim operations. This unlocks the initial, and most common use of
late-interaction dense-models.
For example, this is how you would use it via the API:
```
PUT index
{
"mappings": {
"properties": {
"late_interaction_vectors": {
"type": "rank_vectors"
}
}
}
}
```
Then to index:
```
POST index/_doc
{
"late_interaction_vectors": [[0.1, ...],...]
}
```
For querying, scoring can be exposed with scripting:
```
POST index/_search
{
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "maxSimDotProduct(params.query_vector, 'my_vector')",
"params": {
"query_vector": [[0.42, ...], ...]
}
}
}
}
}
```
Of course, the initial ranking should be done before re-scoring or
combining via the `rescore` parameter, or simply passing whatever first
phase retrieval you want as the inner query in `script_score`.
Shutdown metadata is keyed on node id. This makes sense since only one
node with a given node id can exist within a cluster. However, it is
possible that shutdown was initiated for one instance of a node, but
that node is restarted. This commit adds the ephemeral node id to
shutdown metadata so that nodes with the same id but different ephemeral
id can be distinguished.
* (Doc+) Enrich run on ingest+data nodes not coordinating-only
👋 howdy, team! I'm not otherwise finding it so documenting https://github.com/elastic/elasticsearch/issues/95969 in ES docs
> Currently we tell users of enrich that they should co-locate the nodes that perform the enrichment (ingest nodes) with the actual enrich data so that enrich operations don't require a remote search operation.
* feedback
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
---------
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
This action solely needs the cluster state, it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
The `?local` parameter becomes a no-op and is marked as deprecated.
Relates #101805
Relates #107984
This ensures the usage API docs tests are passing again. We achieve this
by: 1. ignoring the contents of `inference.models` because the models
might not yet have been initialized and 2. adding missing fields to the
`logsdb` usage.
This pull request introduces a new retriever called `rescorer`, which leverages the `rescore` functionality of the search request.
The `rescorer` retriever re-scores only the top documents retrieved by its child retriever, offering fine-tuned scoring capabilities.
All rescorers supported in the `rescore` section of a search request are available in this retriever, and the same format is used to define the rescore configuration.
<details>
<summary>Example:</summary>
```yaml
- do:
search:
index: test
body:
retriever:
rescorer:
rescore:
window_size: 10
query:
rescore_query:
rank_feature:
field: "features.second_stage"
linear: { }
query_weight: 0
retriever:
standard:
query:
rank_feature:
field: "features.first_stage"
linear: { }
size: 2
```
</details>
Closes#118327
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
* Replace 'ent-search-generic' with 'search-default' pipeline
* missed one
* [CI] Auto commit changes from spotless
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
#118443 added a new index version for indices that can be opened in read-only mode by Lucene. This change adds this information to the discovery node's VersionInformation and the transport serialization logic.
In a short future we'd like to use this information in methods like IndexMetadataVerifier#checkSupportedVersion and NodeJoineExecutor to allow opening indices in N-2 versions as read-only indices on ES V9.