* Updating error message when index field type is unknown
* Fix style issue
* Add yaml test for invalid field type error message
* Update docs/changelog/122860.yaml
* Updating error message for runtime and multi field type parser
* add and fix yaml tests
* Fix code styles by running spotlessApply
* Update changelog
* Updating the test in yml
* Updating error message for runtime
* Fix failing yaml tests
* Update error message to Fix unit tests
* fix serverless qa test
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
The keyword doc values field gets an extra sorted doc values field that encodes the order in which array values were specified at index time. This also captures duplicate values. The order is stored as an offset-to-ordinal array that gets zigzag vint encoded into a sorted doc values field.
For example, take the following string array for a keyword field: ["c", "b", "a", "c"].
The sorted set doc values are ["a", "b", "c"] with ordinals 0, 1 and 2. The offset array will be: [2, 1, 0, 2]
Null values are also supported. For example, ["c", "b", null, "c"] results in sorted set doc values ["b", "c"] with ordinals 0 and 1. The offset array will be: [1, 0, -1, 1]
Empty arrays are also supported by encoding a zigzag vint array of zero elements.
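The encoding described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in this PR: the function names are hypothetical, and the byte layout (an element count followed by one zigzag vint per offset) is an assumption.

```python
def compute_offsets(values):
    """Return (sorted distinct values, offset-to-ordinal array) for one array."""
    # Sorted set doc values keep each distinct non-null value once, in sorted order.
    distinct = sorted({v for v in values if v is not None})
    ordinal = {v: i for i, v in enumerate(distinct)}
    # One entry per original position; -1 marks a null.
    offsets = [-1 if v is None else ordinal[v] for v in values]
    return distinct, offsets


def zigzag(n):
    """Map a signed 32-bit int to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def vint(n):
    """Variable-length int: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def encode_offsets(offsets):
    # Assumed layout: element count, then each offset as a zigzag vint.
    return vint(len(offsets)) + b"".join(vint(zigzag(o)) for o in offsets)
```

Under these assumptions, `compute_offsets(["c", "b", None, "c"])` yields the values and offsets from the null example, and an empty array encodes to a single zero byte.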
Limitations:
* Currently only doc values based array support for the keyword field mapper.
* Multi-level leaf arrays are flattened. For example: [[b], [c]] -> [b, c]
* Arrays are always synthesized as one type. In the case of a keyword field, [1, 2] gets synthesized as ["1", "2"].
These limitations can be addressed, but some require more complexity and/or additional storage.
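The flattening and single-type limitations can be shown with a small Python sketch. The helper names are hypothetical and this only models the observable result for a keyword field, not how synthetic source is actually built.

```python
def flatten(value):
    """Yield the leaf values of arbitrarily nested lists, in order."""
    if isinstance(value, list):
        for item in value:
            yield from flatten(item)
    else:
        yield value


def synthesize_keyword_array(values):
    # Nulls stay null; every other leaf is rendered as a string,
    # because a keyword field synthesizes all values as one type.
    return [None if v is None else str(v) for v in flatten(values)]
```

With this model, `[["b"], ["c"]]` comes back as `["b", "c"]` and `[1, 2]` comes back as `["1", "2"]`.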
With this PR, keyword field arrays will no longer be stored in ignored source; instead, array offsets are tracked in an adjacent sorted doc values field. This only applies if index.mapping.synthetic_source_keep is set to arrays (default for logsdb).
The `local` param for the `GetFieldMapping` API was deprecated in #55014
and I think #57265 aimed to propagate that deprecation to the REST API
spec, but it changed `get_mapping.json` instead of
`get_field_mapping.json`. #55100 removed the `local` param for the
_field_ mapping API so we can safely remove the field from the spec and
remove the YAML test.
We shouldn't run the post-snapshot-delete cleanup work on the master
thread, since it can be quite expensive and need not block subsequent
cluster state updates. This commit forks it onto a `SNAPSHOT` thread.
This PR extends the work done in #121751 by enabling a sparse doc values index for the @timestamp field in LogsDB.
Similar to the previous PR, the setting index.mapping.use_doc_values_skipper will override the index mapping parameter when all of the following conditions are met:
* The index mode is LogsDB.
* The field name is @timestamp.
* Index sorting is configured on @timestamp (regardless of whether it is a primary sort field or not).
* Doc values are enabled.
This ensures that only one index structure is defined on the @timestamp field:
* If the conditions above are met, the inverted index is replaced with a sparse doc values index.
* This prevents both the inverted index and sparse doc values index from being enabled together, reducing unnecessary storage overhead.
This change aligns with our goal of optimizing LogsDB for storage efficiency while possibly maintaining reasonable query latency performance. It will enable us to run benchmarks and evaluate the impact of sparse indexing on the @timestamp field as well.
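The conditions listed above amount to a single predicate. A minimal Python sketch follows, assuming a hypothetical `FieldContext` holder and assuming the index mode is spelled `logsdb`; the real check lives in the Java mapper code and may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class FieldContext:
    # Hypothetical container for the inputs the conditions refer to.
    index_mode: str
    field_name: str
    sort_fields: list
    doc_values: bool
    use_doc_values_skipper: bool  # value of index.mapping.use_doc_values_skipper


def use_sparse_doc_values_index(ctx):
    """True when the sparse doc values index replaces the inverted index."""
    return (
        ctx.use_doc_values_skipper
        and ctx.index_mode == "logsdb"
        and ctx.field_name == "@timestamp"
        # Any position in the sort counts, primary or not.
        and "@timestamp" in ctx.sort_fields
        and ctx.doc_values
    )
```

If any one condition fails, the predicate is false and the field keeps the index structure defined by its mapping.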
This patch removes the check that fails requests that attempt to use fields of type: nested within indices with mode time_series.
This patch also updates TimeSeriesIdFieldMapper#postParse to set the _id field on child documents once it's calculated.
Closes #120874
When utilizing synthetic source with nested fields, we attempt to
rebuild the child values in addition to all the parent values.
While this generally works well, it's possible that certain values are
missing from some child docs. Consequently, we will attempt to
iterate the vector values incorrectly, resulting in seemingly missing
values or potentially exceptions indicating EOFs.
closes: #122383
We experimented with using synthetic source for recovery and observed quite positive impact
on indexing throughput by means of our nightly Rally benchmarks. As a result, here we enable
it by default when synthetic source is used. To be more precise, if `index.mapping.source.mode`
setting is `synthetic` we enable recovery source by means of synthetic source.
Moreover, enabling synthetic source recovery is done behind a feature flag. That would allow us
to enable it in snapshot builds which in turn will allow us to see performance results in Rally nightly
benchmarks.
This simplifies the setup and relaxes the similarity check.
We can restrict the similarity check once we evolve the quantization
algorithm in the future.
This action solely needs the cluster state, so it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
This attempts to fix a flaky test where the term_freq returned by the
multiple terms vectors API was `null`. I was not able to reproduce the
failure, but this proposes a fix based on the following running theory:
- an Elasticsearch cluster comprised of at least 2 nodes
- we create a couple of indices with 1 primary and 1 replica
- we index a document that was acknowledged only by the primary
  (because `wait_for_active_shards` defaults to `1`)
- the test executes the multiple terms vectors API and it hits the node
  hosting the replica shard, which hasn't yet received the document we
  ingested in the primary shard.
This race condition between the document replication and the test
running the terms vectors API on the replica shard could yield a `null`
value for the term's `term_freq` (as the replica shard contains 0
documents).
This PR proposes we change the `wait_for_active_shards` value to `all`
so each write is acknowledged by all replicas before the client receives
the response.
Fixes #113325
Currently if a document has duplicate suggestions across different
contexts, only the first gets indexed, and when a user tries to
search using the second context, she will get 0 results.
This PR addresses this by adding support for duplicate suggestions
across different contexts, so documents like below with duplicate inputs
can be searched across all provided contexts.
```json
{
  "my_suggest": [
    {
      "input": [
        "foox",
        "boo"
      ],
      "weight": 2,
      "contexts": {
        "color": [
          "red"
        ]
      }
    },
    {
      "input": [
        "foox"
      ],
      "weight": 3,
      "contexts": {
        "color": [
          "blue"
        ]
      }
    }
  ]
}
```
Closes #82432
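To make the before/after behaviour concrete, here is a toy Python model of the deduplication. It is not the completion field mapper's implementation: `index_suggestions`, `suggest`, and the `dedupe_on_context` flag are hypothetical names, and the "index" is just a list of tuples.

```python
def index_suggestions(entries, dedupe_on_context):
    """Return indexed (input, context name, context value, weight) tuples."""
    seen = set()
    indexed = []
    for entry in entries:
        for text in entry["input"]:
            for ctx_name, ctx_values in entry["contexts"].items():
                for ctx_value in ctx_values:
                    # Old behaviour keys on the input text alone, so a repeated
                    # input is dropped even when its context differs.
                    key = (text, ctx_name, ctx_value) if dedupe_on_context else text
                    if key in seen:
                        continue
                    seen.add(key)
                    indexed.append((text, ctx_name, ctx_value, entry["weight"]))
    return indexed


def suggest(indexed, prefix, ctx_name, ctx_value):
    """Return inputs matching the prefix within one context."""
    return [
        text
        for text, name, value, _weight in indexed
        if name == ctx_name and value == ctx_value and text.startswith(prefix)
    ]
```

Fed the document above, the model with `dedupe_on_context=False` returns nothing for prefix `foo` in context `color: blue`, while `dedupe_on_context=True` returns `foox`.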
Updated indices.resolve_cluster.json to match new resolve/cluster spec.
Added new test for the no-index-expression endpoint.
Adjust syntax in 10_basic_resolve_cluster.yml so that the elasticsearch-specification validation tests pass.
This action solely needs the cluster state, so it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
Add documentation for new REST endpoints related to data stream upgrade.
Endpoints:
- /_migration/reindex
- /_migration/reindex/{index}/_status
- /_migration/reindex/{index}/_cancel
- /_create_from/{source}/{dest}
A new query parameter `?include_source_on_error` was added for the create / index, update and bulk REST APIs to control
whether the document source is included in the error response in case of parsing errors. The default value is `true`.
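As a sketch of how a client might set the parameter, the Python helper below builds the request path for an index request. The helper name and the index and document id used in the usage note are placeholders; only the parameter name and its default come from the description above.

```python
from urllib.parse import urlencode


def index_doc_path(index, doc_id, include_source_on_error=True):
    """Build the path for indexing one document with the new query parameter."""
    # Query parameter booleans are sent as lowercase "true" / "false".
    query = urlencode({"include_source_on_error": str(include_source_on_error).lower()})
    return f"/{index}/_doc/{doc_id}?{query}"
```

For example, `index_doc_path("my-index", "1", include_source_on_error=False)` produces a path that asks for the source to be left out of parsing error responses.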
Add capability to stop async query on demand
The theory:
- User initiates async search request
- User sends the stop request (POST _query/async/<ID>/stop)
- If the async query has finished by that time, the request behaves like a regular async get
- If it's not finished, the sinks are closed and the request is forcefully finished
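The flow above can be modelled as a tiny state machine. This Python sketch is purely illustrative: the `AsyncQuery` class and its fields are hypothetical and stand in for the real query execution and its sinks.

```python
class AsyncQuery:
    """Toy model of an async query that collects rows through open sinks."""

    def __init__(self):
        self.finished = False
        self.sinks_open = True
        self.rows = []

    def add_rows(self, rows):
        # Rows arriving after the sinks are closed are dropped.
        if self.sinks_open:
            self.rows.extend(rows)

    def finish(self):
        self.sinks_open = False
        self.finished = True

    def get(self):
        return {"is_running": not self.finished, "rows": list(self.rows)}

    def stop(self):
        # Already finished: identical to a regular async get.
        # Still running: close the sinks and finish with what has been collected.
        if not self.finished:
            self.finish()
        return self.get()
```

In this model, stopping a running query returns whatever rows were gathered so far, and later rows are ignored.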
Add a new boolean parameter remove_index_blocks to the _create_from API.
If this parameter is set to true, all index blocks will be filtered out when creating the destination index.
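A minimal Python sketch of the filtering, assuming that "index blocks" means settings under the `index.blocks.` prefix and that settings are available as a flat dictionary. The function name is hypothetical and the real logic may differ.

```python
def filter_index_blocks(source_settings, remove_index_blocks):
    """Return the settings to apply to the destination index."""
    if not remove_index_blocks:
        return dict(source_settings)
    # Drop every block setting, e.g. index.blocks.write or index.blocks.read_only.
    return {
        key: value
        for key, value in source_settings.items()
        if not key.startswith("index.blocks.")
    }
```

All other settings are copied through unchanged in either case.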
This change disables auto_expand_replicas on lookup indices to enhance
the lookup join user experience. Users can, however, enable this setting
at any time to optimize performance.
This adds a sentence to `redirects.asciidoc` explaining what frozen
indices were - otherwise, everything will point to the message about
the unfreeze API having gone away, which is not very helpful. Some
cross-references are updated to point to this rather than to the
notice about the removal of the unfreeze API.
ES-9736 #comment Removed `_unfreeze` REST endpoint in https://github.com/elastic/elasticsearch/pull/119227
This action solely needs the cluster state, so it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary
work after a client failure or timeout. The `?local` parameter
becomes a no-op and is marked as deprecated.
We didn't have a YAML test for this API, and we're reusing the YAML
tests elsewhere as a smoke test for functionality, so it's helpful to
have one for this API.