Currently, the raw path is only available from the RestRequest. This
makes it challenging to determine whether a handler supports streaming.
This commit moves the raw path into the pre-request object to make the
streaming-support logic easier to evaluate.
The failure store status is a flag that indicates how the failure store was used, or how it could have been used had it been enabled. The user can be informed about the usage of the failure store in the following way:
When relevant, we add the optional field `failure_store`. The field will be omitted when the use of the failure store is not relevant, for example, if a document was successfully indexed in a data stream, if a failure concerns an index rather than a data stream, or if the opType is not `index` or `create`. In more detail:
- when we have a “success” create/index response, the field `failure_store` will not be present if the document was indexed in a backing index. Otherwise, if it got stored in the failure store, it will have the value `used`.
- when we have a “rejected” create/index response, meaning the document was not persisted in Elasticsearch, we return the field `failure_store` with one of two values: `not_enabled`, if the document could have ended up in the failure store had it been enabled, or `failed`, if something went wrong and the document was not persisted in the failure store, for example, because the cluster is out of space and in read-only mode.
We chose to make it an optional field to reduce the impact of this field on a bulk response. The value will still exist in the Java object, but it will not be returned to the user. The only values that will be displayed are:
- `used`: meaning this document was indexed in the failure store.
- `not_enabled`: meaning this document was rejected, but it could have been stored in the failure store had it been enabled.
- `failed`: meaning this failed document could not be stored in the failure store either.
Example:
```
"errors": true,
"took": 202,
"items": [
  {
    "create": {
      "_index": ".fs-my-ds-2024.09.04-000002",
      "_id": "iRDDvJEB_J3Inuia2zgH",
      "_version": 1,
      "result": "created",
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "_seq_no": 6,
      "_primary_term": 1,
      "status": 201,
      "failure_store": "used"
    }
  },
  {
    "create": {
      "_index": "ds-no-fs",
      "_id": "hxDDvJEB_J3Inuia2jj3",
      "status": 400,
      "error": {
        "type": "document_parsing_exception",
        "reason": "[1:153] failed to parse field [count] of type [long] in document with id 'hxDDvJEB_J3Inuia2jj3'. Preview of field's value: 'bla'",
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": "For input string: \"bla\""
        }
      }
    },
    "failure_store": "not_enabled"
  },
  {
    "create": {
      "_index": ".ds-my-ds-2024.09.04-000001",
      "_id": "iBDDvJEB_J3Inuia2jj3",
      "_version": 1,
      "result": "created",
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "_seq_no": 7,
      "_primary_term": 1,
      "status": 201
    }
  }
]
```
Several `TransportNodesAction` implementations do some kind of top-level
computation in addition to fanning out requests to individual nodes.
Today they all have to do this once the node-level fanout is complete,
but in most cases the top-level computation can happen in parallel with
the fanout. This commit adds support for an additional `ActionContext`
object, created when starting to process the request and exposed to
`newResponseAsync()` at the end, to allow this parallelization.
All implementations use `(Void) null` for this param, except for
`TransportClusterStatsAction` which now parallelizes the computation of
the cluster-state-based stats with the node-level fanout.
Introduces per-field param `synthetic_source_keep` that overrides the
behavior for keeping the field source in synthetic source mode:
- `none`: no source is stored
- `arrays`: the incoming source is recorded as-is for arrays of a given field
- `all`: the incoming source is recorded as-is for both singleton and array values of a given field
Related to #112012
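For illustration, a minimal sketch of a mapping that uses the new param; the index name and field are made up, and the `_source.mode: synthetic` mapping shown here to enable synthetic source is an assumption about the setup rather than part of this change:
```
PUT my-index
{
  "mappings": {
    "_source": {
      "mode": "synthetic"
    },
    "properties": {
      "tags": {
        "type": "keyword",
        "synthetic_source_keep": "arrays"
      }
    }
  }
}
```
With `arrays`, a document indexed with `"tags": ["a", "b"]` would have its array source recorded as-is, while a singleton `"tags": "a"` would still be synthesized normally.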
In synthetic source, storing array elements in `_ignored_source` may
hide other, regular elements from showing up during source synthesis.
This is because contents from `_ignored_source` take precedence over
matching fields from regular source loading.
To avoid this, arrays are pre-emptively tracked and marked for source
storing if any of their elements needs to store its source. A second
doc-parsing phase is introduced that checks for fields with missing
values and records their source, while skipping objects and arrays that
don't contain any such fields.
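As a hypothetical illustration of the problem (the field names and the malformed value are made up): if the second element below is malformed for a `long` field with `ignore_malformed` and gets recorded in `_ignored_source`, the first, regular element could previously be hidden when the source was synthesized:
```
{
  "path": {
    "to": [
      { "field": 123 },
      { "field": "malformed" }
    ]
  }
}
```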
Fixes #112374
Currently, the entire close pipeline is not hooked up in case of a
channel close while a request is being buffered or executed. This commit
resolves the issue by hooking the channel close into the stream closure.
Currently, unless a rest handler specifies that it handles "unsafe"
buffers, we must copy the http buffers in releaseAndCopy. Unfortunately,
the original content was slipping through in the initial stream PR. This
leads to memory corruption on index and update requests, which depend on
buffers being copied.
The header validator is very aggressive about adjusting autoread, in
the belief that it is the only place where autoread is tweaked. However,
with stream backpressure, we should only change it when we are starting
or finishing header validation.
Currently the `rest.incremental_bulk` setting is read in two different
places. This means that it will be applied in two steps, introducing
unpredictable behavior. This commit ensures that it is only read in a
single place.
Allow a single bulk request to be passed to Elasticsearch in multiple
parts. Once a certain memory threshold or number of operations has been
reached, the request can be split and submitted for processing.
This commit adds a module emitting a deprecation warning when a
dot-prefixed index is manually or automatically created, or when a
composable index template with an index pattern that uses a dot-prefix
is created. The warning states that in the future these indices will
not be allowed. In a future breaking change (10.0.0 maybe?) the
deprecation can then be changed to an exception.
These deprecations are only displayed when a non-operator user is using
the API (one that does not set the `X-elastic-product-origin` header).
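For illustration, a hypothetical composable index template that would now draw the deprecation warning because of its dot-prefixed pattern (the template name and body are made up):
```
PUT _index_template/my-dot-template
{
  "index_patterns": [".my-internal-*"],
  "template": {
    "settings": {
      "number_of_shards": 1
    }
  }
}
```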
Mainly motivated by simplifying the reference chains for Netty buffers
and making heap dumps easier to analyze in some spots, but also a small
performance win in and of itself.
Some small speedups in here from pre-evaluating `isFiltered(properties)`
in lots of spots and from not creating an unused `SimpleKey` in
`toConcreteKey`, which performs costly string interning at some rate.
Other than that, obvious deduplication using existing utilities and
adding obvious missing overloads for them.
[TEST] Assert DSL merge policy respects end date
Backing indexes with an end date in the future may still get writes,
so DSL should not apply the merge policy (first configuring the
settings on the index, then doing the force merge) until that time has
passed. The implementation already does this, because
`DataStreamLifecycleService.run()` calls
`timeSeriesIndicesStillWithinTimeBounds` and adds the resulting
indices to `indicesToExcludeForRemainingRun` before calling
`maybeExecuteForceMerge`. This change simply adds a unit test to
ensure that this behaviour does not regress.
Closes #109030
The edgeNGram and NGram tokenizers and token filters were deprecated. They have not been supported in indices created since 8.0,
hence their support can be entirely removed from main.
The version-related logic around the min grams can also be removed, as it refers to 7.x, which we no longer need to support.
Relates to #50376, #50862, #43568
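For reference, the snake_case variants remain supported; a minimal sketch of an index using the `edge_ngram` tokenizer (the index and analyzer names are illustrative):
```
PUT my-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "autocomplete_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 5
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "autocomplete_tokenizer"
        }
      }
    }
  }
}
```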
* Fix verbose get data stream API not requiring extra privileges
When a user uses the `GET /_data_stream?verbose` API to retrieve the verbose version of the response (which includes the `maximum_timestamp`, as added in #112303), the response should be built with the same privilege-checking as the rest of the get-data-stream API, meaning that no extra privileges should be required to return the field.
This commit makes the Transport action use an entitled client so that extra privileges are not required, and adds a test to ensure that it works.
* Update docs/changelog/112973.yaml
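For context, a sketch of the request and the relevant part of the verbose response; the data stream name and timestamp value are illustrative, and the other fields of the response are elided:
```
GET /_data_stream/my-ds?verbose

{
  "data_streams": [
    {
      "name": "my-ds",
      "maximum_timestamp": 1725446400000
    }
  ]
}
```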
Here we test reindexing logsdb indices, and creating and restoring
snapshots. Note that logsdb uses synthetic source, and restoring
source-only snapshots fails due to the missing `_source`.
Dropping support for pre-8.12 requests from remote nodes, and also
cleaning up some unnecessary abstraction in the request builder
hierarchy.
Relates #101815
Relates #107984 (drops some unnecessary trappy timeouts)
All nodes in a cluster involving v9 nodes will understand that
`MINUS_ONE` means an infinite master-node timeout, so there's no need to
fall back to `MAX_VALUE` when talking to older nodes. This commit
removes the unnecessary bwc code.
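For context, an infinite master-node timeout is requested by passing `-1`, which is transported as `MINUS_ONE`; the health endpoint here is just an arbitrary example of an API that accepts the parameter:
```
GET _cluster/health?master_timeout=-1
```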
Replaces the somewhat-awkward API on `ClusterAdminClient` for
manipulating ingest pipelines with some test-specific utilities that are
easier to use.
Relates #107984 in that this change massively reduces the noise that
would otherwise result from removing the trappy timeouts in these APIs.