This adds a new transport action to get the status of a migration
reindex (started via the API at #118109), and a new rest action to use
it. The rest action accepts the data stream or index name, and returns
the status. For example, if a reindex task exists for data stream
`my-data-stream`:
```
GET /_migration/reindex/my-data-stream/_status?pretty
```
returns
```
{
"start_time" : 1733519098570,
"complete" : true,
"total_indices" : 1,
"total_indices_requiring_upgrade" : 0,
"successes" : 0,
"in_progress" : 0,
"pending" : 0,
"errors" : [ ]
}
```
If a reindex task does not exist:
```
GET _migration/reindex/my-data-stream/_status?pretty
```
Then a 404 is returned:
```
{
"error" : {
"root_cause" : [
{
"type" : "resource_not_found_exception",
"reason" : "No migration reindex status found for [my-data-stream]"
}
],
"type" : "resource_not_found_exception",
"reason" : "No migration reindex status found for [my-data-stream]"
},
"status" : 404
}
```
Removes the old `_knn_search` API, which never left tech preview and
was deprecated throughout the v8 cycle.
The API can still be accessed by setting `compatible-with=8`.
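For reference, a compatibility request against the removed endpoint could look something like the following sketch (assumes a local cluster with an index `my-index` and a `dense_vector` field named `vector`):
```
curl -X POST "localhost:9200/my-index/_knn_search" \
  -H "Accept: application/vnd.elasticsearch+json;compatible-with=8" \
  -H "Content-Type: application/vnd.elasticsearch+json;compatible-with=8" \
  -d '{"knn": {"field": "vector", "query_vector": [0.3, 0.1], "k": 10, "num_candidates": 100}}'
```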
This measurably improves BBQ by switching the underlying algorithm to an
optimized per-vector scalar quantization.
This is a brand new way to quantize vectors. Instead of there being a
global set of upper and lower quantile bands, these are optimized and
calculated per individual vector. Additionally, vectors are centered on
a common centroid.
This allows for an almost 32x reduction in memory, and even better
recall than before at the cost of slightly increasing indexing time.
Additionally, this new approach is easily generalizable to various other
bit sizes (e.g. 2 bits, etc.). While not taken advantage of yet, we may
update our scalar quantized indices in the future to use this new
algorithm, giving significant boosts in recall.
The recall gains range from 2% to almost 10% for certain datasets, with
an additional 5-10% indexing cost when indexing with HNSW, compared
with current BBQ.
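As an illustration of the idea only (not the actual Lucene/Elasticsearch implementation), a minimal per-vector scalar quantizer can center vectors on a shared centroid and then compute the quantization bounds per vector instead of globally:

```python
import numpy as np

def quantize_per_vector(vectors, bits=1):
    """Illustrative per-vector scalar quantization (a sketch, not BBQ itself).

    Vectors are centered on the dataset centroid; the lower/upper bounds
    used for quantization are computed per individual vector.
    """
    centroid = vectors.mean(axis=0)
    centered = vectors - centroid
    levels = (1 << bits) - 1                      # number of quantization steps
    lower = centered.min(axis=1, keepdims=True)   # per-vector lower bound
    upper = centered.max(axis=1, keepdims=True)   # per-vector upper bound
    step = (upper - lower) / levels
    codes = np.round((centered - lower) / step).astype(np.int8)
    return codes, lower, step, centroid

rng = np.random.default_rng(0)
vecs = rng.normal(size=(4, 8)).astype(np.float32)
codes, lower, step, centroid = quantize_per_vector(vecs, bits=1)
```

At 1 bit per dimension, each float32 dimension collapses to a single bit, which is where the roughly 32x memory reduction comes from.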
* Adding a _migration/reindex endpoint
* Adding rest api spec and test
* Adding a feature flag for reindex data streams
* updating json spec
* fixing a typo
* Changing mode to an enum
* Moving ParseFields into public static finals
* Commenting out test that leaves task running, until we add a cancel API
* Removing persistent task id from output
* replacing a string with a variable
This PR introduces an option for `sparse_vector` to store its values separately from `_source` by using term vectors.
This capability is primarily needed by the semantic text field.
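For illustration, the mapping might look like the following sketch (the `store` parameter name here is an assumption; check the PR for the final syntax):
```
PUT my-index
{
  "mappings": {
    "properties": {
      "embeddings": {
        "type": "sparse_vector",
        "store": true
      }
    }
  }
}
```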
This will make `TransportLocalClusterStateAction` wait for a new state
that is not blocked. This means we need a timeout (again). For
consistency's sake, we're reusing the REST param `master_timeout` for
this timeout as well.
The only class that was using `TransportLocalClusterStateAction` was
`TransportGetAliasesAction`, so its request needed to accept a timeout
again as well.
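For example, an illustrative get-aliases request with the reused parameter:
```
GET /_alias/my-alias?master_timeout=30s
```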
* Parse the contents of dynamic objects for [subobjects:false]
* Update docs/changelog/117762.yaml
* add tests
* tests
* test dynamic field
* test dynamic field
* fix tests
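To illustrate the behavior (a sketch based on documented `subobjects: false` semantics), an object sent to such an index is now dynamically mapped as flattened, dotted field names rather than rejected:
```
PUT my-index
{
  "mappings": {
    "subobjects": false
  }
}

POST my-index/_doc
{
  "metrics": { "time": { "max": 100 } }
}
```
Here `metrics.time.max` is dynamically mapped as a single leaf field with dots in its name.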
Remove to, from, include_lower, include_upper range query params.
These params were removed from our documentation in v0.90.4 (d6ecdec)
and deprecated in 8.16 in #113286.
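Equivalent queries should use the remaining `gt`/`gte`/`lt`/`lte` parameters instead, e.g.:
```
GET my-index/_search
{
  "query": {
    "range": {
      "age": { "gte": 10, "lt": 20 }
    }
  }
}
```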
The current name doesn't allow skipping it to work around compatibility
test failures:
```
> Task :rest-api-spec:yamlRestCompatTestTransform FAILED
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':rest-api-spec:yamlRestCompatTestTransform'.
> class com.fasterxml.jackson.databind.node.ObjectNode cannot be cast to class com.fasterxml.jackson.databind.node.ArrayNode (com.fasterxml.jackson.databind.node.ObjectNode and com.fasterxml.jackson.databind.node.ArrayNode are in unnamed module of loader org.gradle.internal.classloader.VisitableURLClassLoader$InstrumentingVisitableURLClassLoader @15eaac09)
```
This adds a new `multi_dense_vector` field that focuses on the maxSim
use case provided by Col[BERT|Pali].
Indexing these vectors in HNSW currently makes no sense, either
performance-wise or in terms of cost. However, we should fully support
rescoring and brute-force search over vectors with maxSim.
This is step one of many. Behind a feature flag, this adds support for
indexing any number of vectors of the same dimension.
Supports bit/byte/float.
Scripting support will be a follow up.
Marking as a non-issue, as it's behind a flag and currently unusable.
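For illustration only (the field is behind a feature flag and the final syntax may differ; the `dims` parameter here is assumed by analogy with `dense_vector`):
```
PUT my-index
{
  "mappings": {
    "properties": {
      "token_vectors": {
        "type": "multi_dense_vector",
        "dims": 3
      }
    }
  }
}
```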
Currently dense_vector fields don't support docvalue_fields.
This adds that support for debugging purposes: users can inspect the
raw values of their vectors even if the source is disabled.
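An illustrative request, assuming a `dense_vector` field named `my_vector`:
```
GET my-index/_search
{
  "_source": false,
  "docvalue_fields": ["my_vector"]
}
```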
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
* Track source for objects and fields with [synthetic_source_keep:arrays] in arrays as ignored
* Update TransportResumeFollowActionTests.java
* rest compat fixes
* rest compat fixes
* update test
A Lucene commit doesn't contain sync ids in `SegmentInfos` anymore, so we can't rely on them during recovery. The field was marked as deprecated in #102343.
A change made in 8.0 was intended to deprecate this parameter. However,
because the new code only checked for the presence of the parameter
and never consumed it, the effect was actually to remove support for
the parameter. This code therefore basically does nothing and can be
removed.
When ingesting logs, it's important to ensure that documents are not dropped due to mapping issues, also when dealing with dynamically mapped fields. Elasticsearch provides two key settings that help manage the total number of field mappings and handle situations where this limit might be exceeded:
1. **`index.mapping.total_fields.limit`**: This setting defines the maximum number of fields allowed in an index. If this limit is reached, any further mapped fields would cause indexing to fail.
2. **`index.mapping.total_fields.ignore_dynamic_beyond_limit`**: This setting determines whether Elasticsearch should ignore any dynamically mapped fields that exceed the limit defined by `index.mapping.total_fields.limit`. If set to `false`, indexing will fail once the limit is surpassed. However, if set to `true`, Elasticsearch will continue indexing the document but will silently ignore any additional dynamically mapped fields beyond the limit.
To prevent indexing failures due to dynamic mapping issues, especially in logs where the schema might change frequently, we change the default value of **`index.mapping.total_fields.ignore_dynamic_beyond_limit` from `false` to `true` in LogsDB**. This change ensures that even when the number of dynamically mapped fields exceeds the set limit, documents will still be indexed, and additional fields will simply be ignored rather than causing an indexing failure.
This adjustment is important for LogsDB, where dynamically mapped fields are common and we want to avoid documents being dropped.
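For example, the same behavior can be set explicitly on any index using the settings named above (values shown are illustrative):
```
PUT my-logs-index
{
  "settings": {
    "index.mapping.total_fields.limit": 1000,
    "index.mapping.total_fields.ignore_dynamic_beyond_limit": true
  }
}
```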
We can randomly inject a global template that defaults to 2 shards
instead of 1. This causes the lookup index YAML tests to fail. To avoid
this, the change requires specifying the `default_shards` setting for
these tests.
This change introduces a new index mode, lookup, for indices intended
for lookup operations in ES|QL. Lookup indices must have a single shard
and be replicated to all data nodes by default. Aside from these
requirements, they function as standard indices. Documentation will be
added later when the lookup operator in ES|QL is implemented.
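Creating such an index might look like this (illustrative):
```
PUT my-lookup-index
{
  "settings": {
    "index.mode": "lookup"
  }
}
```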
It was deprecated in #104209 (8.13) and shouldn't be set or returned in 9.0.
The Desired Nodes API is an internal API, and users shouldn't depend on its backward compatibility.