Commit graph

8180 commits

Author SHA1 Message Date
Tim Brooks
c5caf84e2d
Move raw path into HttpPreRequest (#113231)
Currently, the raw path is only available from the RestRequest, which
makes the logic that determines whether a handler supports streaming
harder to evaluate. This commit moves the raw path into HttpPreRequest
to simplify that streaming-support logic.
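Below is a minimal Python sketch of why having the raw path on the pre-request helps: the streaming decision can be made from request metadata alone, before any body bytes arrive. `PreRequest`, `supports_streaming`, and the `/_bulk` rule are hypothetical illustrations, not the Elasticsearch API. It also assumes the raw path is the URI with its query string removed.

```python
from dataclasses import dataclass

# Hypothetical set of raw paths whose handlers accept a streamed body (illustrative only).
STREAMING_PATHS = {"/_bulk"}


@dataclass(frozen=True)
class PreRequest:
    """Stand-in for HttpPreRequest: only what is known before the body is read."""
    method: str
    uri: str

    def raw_path(self) -> str:
        # Assumption: the raw path is the URI without its query string.
        return self.uri.split("?", 1)[0]


def supports_streaming(pre: PreRequest) -> bool:
    # Only the pre-request is consulted, so no RestRequest (and no body) is needed.
    path = pre.raw_path()
    return pre.method in ("POST", "PUT") and (path in STREAMING_PATHS or path.endswith("/_bulk"))
```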
2024-09-21 05:32:45 +10:00
Mary Gouseti
f4f075a2cc
Add failure store status in index response of data streams (#112816)
The failure store status is a flag that indicates how the failure store was used or could be used if enabled. The user can be informed about the usage of the failure store in the following way:

When relevant, we add the optional field `failure_store`. The field is omitted when the failure store is not relevant: for example, if a document was successfully indexed in a data stream, if a failure concerns an index rather than a data stream, or if the opType is not index or create. In more detail:
- when we have a “success” create/index response, the field `failure_store` will not be present if the document was indexed in a backing index. Otherwise, if it was stored in the failure store, it will have the value `used`.
- when we have a “rejected” create/index response, meaning the document was not persisted in Elasticsearch, we return the field `failure_store` with the value `not_enabled` if the document could have ended up in the failure store had it been enabled, or `failed` if something went wrong and the document was not persisted in the failure store either, for example because the cluster is out of space and in read-only mode.

We chose to make it an optional field to reduce the impact of this field on a bulk response. When the field is not relevant, a value still exists in the Java object, but it is not returned to the user. The only values that will be displayed are:

- `used`: meaning this document was indexed in the failure store
- `not_enabled`: meaning this document was rejected but could have been stored in the failure store if the failure store had been enabled.
- `failed`: meaning this document failed and could not be stored in the failure store either.
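The rules above can be condensed into one decision function. This is a minimal Python sketch, not the Java implementation: the function name `failure_store_field` and its boolean parameters are hypothetical and only model the cases described in this commit message. Returning `None` stands for omitting the field.

```python
from typing import Optional


def failure_store_field(
    op_type: str,
    targets_data_stream: bool,
    indexed_in_backing_index: bool,
    indexed_in_failure_store: bool,
    failure_store_enabled: bool,
) -> Optional[str]:
    """Return the value of the optional `failure_store` field, or None to omit it."""
    # Only index/create operations against a data stream are relevant.
    if op_type not in ("index", "create") or not targets_data_stream:
        return None
    # Success in a regular backing index: the failure store played no role.
    if indexed_in_backing_index:
        return None
    # The document was redirected to, and persisted in, the failure store.
    if indexed_in_failure_store:
        return "used"
    # Rejected, but it could have been redirected had the failure store been enabled.
    if not failure_store_enabled:
        return "not_enabled"
    # Rejected even though the failure store is enabled (e.g. the cluster is read-only).
    return "failed"
```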

Example:
```
{
  "errors": true,
  "took": 202,
  "items": [
    {
      "create": {
        "_index": ".fs-my-ds-2024.09.04-000002",
        "_id": "iRDDvJEB_J3Inuia2zgH",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 6,
        "_primary_term": 1,
        "status": 201,
        "failure_store": "used"
      }
    },
    {
      "create": {
        "_index": "ds-no-fs",
        "_id": "hxDDvJEB_J3Inuia2jj3",
        "status": 400,
        "error": {
          "type": "document_parsing_exception",
          "reason": "[1:153] failed to parse field [count] of type [long] in document with id 'hxDDvJEB_J3Inuia2jj3'. Preview of field's value: 'bla'",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "For input string: \"bla\""
          }
        },
        "failure_store": "not_enabled"
      }
    },
    {
      "create": {
        "_index": ".ds-my-ds-2024.09.04-000001",
        "_id": "iBDDvJEB_J3Inuia2jj3",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 7,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}
```
2024-09-20 10:53:39 +03:00
Mikhail Berezovskiy
cbe7ea0718
Unmute channel when flush last http stream chunk (#113222) 2024-09-19 15:48:13 -07:00
Lee Hinman
4f221bb4c6
Mark data streams stats API as internal-only (again) (#112712)
This is a redo of https://github.com/elastic/elasticsearch/pull/108745 which was reverted. Now that https://github.com/elastic/elasticsearch/pull/112303 has been merged, there is an alternative to retrieve the `maximum_timestamp`.
2024-09-19 13:24:02 -06:00
David Turner
33a366a256
Add extra context to TransportNodesAction invocations (#113140)
Several `TransportNodesAction` implementations do some kind of top-level
computation in addition to fanning out requests to individual nodes.
Today they all have to do this once the node-level fanout is complete,
but in most cases the top-level computation can happen in parallel with
the fanout. This commit adds support for an additional `ActionContext`
object, created when starting to process the request and exposed to
`newResponseAsync()` at the end, to allow this parallelization.

All implementations use `(Void) null` for this param, except for
`TransportClusterStatsAction` which now parallelizes the computation of
the cluster-state-based stats with the node-level fanout.
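The parallelization described above can be sketched in Python with `concurrent.futures`: the top-level computation starts when the request arrives and runs alongside the per-node fanout, and both results meet only when the final response is built. `execute_nodes_action`, `create_action_context`, and `new_response` are hypothetical names that loosely mirror `ActionContext` and `newResponseAsync()`; this is not the Elasticsearch code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, TypeVar

C = TypeVar("C")


def execute_nodes_action(
    nodes: List[str],
    node_operation: Callable[[str], int],
    create_action_context: Callable[[], C],
    new_response: Callable[[C, Dict[str, int]], dict],
) -> dict:
    with ThreadPoolExecutor() as pool:
        # Start the top-level computation as soon as the request is processed...
        context_future = pool.submit(create_action_context)
        # ...while the per-node requests fan out in parallel with it.
        node_futures = {node: pool.submit(node_operation, node) for node in nodes}
        node_responses = {node: f.result() for node, f in node_futures.items()}
        # The context is handed to the response builder only at the end.
        return new_response(context_future.result(), node_responses)
```

Implementations with no top-level work would simply return `None` from `create_action_context`, matching the `(Void) null` case.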
2024-09-19 17:33:38 +01:00
Kostas Krikellas
e244216c0f
Configure keeping source in FieldMapper (#112706)
Introduces per-field param `synthetic_source_keep` that overrides the
behavior for keeping the field source in synthetic source mode:
- `none`: no source is stored
- `arrays`: the incoming source is recorded as-is for arrays of a given field
- `all`: the incoming source is recorded as-is for both singleton and array values of a given field

Related to #112012
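The three `synthetic_source_keep` modes reduce to a simple per-value check. This Python sketch models only the documented semantics; `should_keep_source` is a hypothetical helper, and a Python `list` stands in for an array value of the field.

```python
from typing import Any

VALID_MODES = ("none", "arrays", "all")


def should_keep_source(synthetic_source_keep: str, value: Any) -> bool:
    """Decide whether the incoming source for a field value is recorded as-is."""
    if synthetic_source_keep not in VALID_MODES:
        raise ValueError(f"unknown synthetic_source_keep value: {synthetic_source_keep!r}")
    if synthetic_source_keep == "all":
        # Both singleton and array values keep their original source.
        return True
    if synthetic_source_keep == "arrays":
        # Only arrays keep their original source.
        return isinstance(value, list)
    # "none": nothing is stored; the value is synthesized as usual.
    return False
```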
2024-09-19 23:29:09 +10:00
Kostas Krikellas
4ff4384550
Retrieve the source for objects and arrays within arrays in a separate parsing phase (#113027)
In synthetic source, storing array elements to `_ignored_source` may
hide other, regular elements from showing up during source synthesizing.
This is due to contents from `_ignored_source` taking precedence over
matching fields from regular source loading. 

To avoid this, arrays are pre-emptively tracked and marked for source
storing, if any of their elements needs to store its source. A second
doc parsing phase is introduced that checks for fields missing values
and records their source, while skipping objects and arrays that don't
contain any such fields.

Fixes #112374
2024-09-19 20:07:31 +10:00
Mikhail Berezovskiy
8e9e6532fe
Release netty ByteBufs in Netty4HttpRequestBodyStreamTests (#113161) 2024-09-19 00:05:36 -07:00
Tim Brooks
3e6acdd48f Merge branch 'partial-rest-requests-rebase' 2024-09-18 16:27:52 -06:00
Tim Brooks
529d349a25 Fix spotless in netty stream class
Spotless broke during a rebase. Fixing in this commit.
2024-09-18 13:59:12 -06:00
Mikhail Berezovskiy
dce8a0bfd3 merge main 2024-09-18 13:52:10 -06:00
Tim Brooks
58e3a39392 Ensure partial bulks released if channel closes (#112724)
Currently, the close pipeline is not fully hooked up when a channel
closes while a request is being buffered or executed. This commit
resolves the issue by connecting channel closure to stream closure.
2024-09-18 13:52:09 -06:00
Tim Brooks
2dbbd7dd45 Ensure http content copied for safe buffers (#112767)
Currently, unless a rest handler specifies that it handles "unsafe"
buffers, we must copy the http buffers in releaseAndCopy. Unfortunately,
the original content was slipping through in the initial stream PR. This
led to memory corruption on index and update requests, which depend on
buffers being copied.
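A minimal Python sketch of the copy-unless-unsafe rule: a handler that accepts unsafe buffers gets a zero-copy view of the pooled buffer, while every other handler gets a private copy taken before the pooled buffer is released. The `release` callback and the recycling behaviour in the usage below are hypothetical stand-ins for returning a buffer to a pool.

```python
from typing import Callable, Union


def release_and_copy(
    pooled: bytearray,
    allows_unsafe_buffers: bool,
    release: Callable[[bytearray], None],
) -> Union[bytes, memoryview]:
    if allows_unsafe_buffers:
        # The handler promises not to retain the bytes, so share the pooled buffer.
        return memoryview(pooled)
    # Copy first, then release: the pooled buffer may be reused and overwritten.
    copied = bytes(pooled)
    release(pooled)
    return copied
```

If the original buffer "slipped through" instead of the copy, the caller would observe whatever the pool later wrote into it, which is the corruption this commit fixes.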
2024-09-18 13:52:09 -06:00
Mikhail Berezovskiy
0d55dc6de4 fix leaking listener (#112629) 2024-09-18 13:51:56 -06:00
Joe Gallo
4d50ab3770
Rework close and shutdown for the geoip processor (#113138) 2024-09-18 15:46:36 -04:00
Tim Brooks
ce2d648d8e Reduce autoread changes in header validator (#112608)
The header validator is very aggressive about adjusting autoread, on the
assumption that it is the only place where autoread is changed. However,
with stream backpressure, it should only change autoread when header
validation starts or finishes.
2024-09-18 13:40:39 -06:00
Tim Brooks
95b42a7129 Ensure incremental bulk setting is set atomically (#112479)
Currently the `rest.incremental_bulk` setting is read in two different
places, so a change to it can take effect in one step before the other,
introducing unpredictable behavior. This commit ensures that it is only
read in a single place.
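The fix amounts to taking a single snapshot of the dynamic setting and deriving every decision for a request from it. In this Python sketch, `plan_request` and the two derived decisions are hypothetical; `read_setting` stands for reading the current value of `rest.incremental_bulk`.

```python
from typing import Callable, Tuple


def plan_request(read_setting: Callable[[], bool]) -> Tuple[str, str]:
    # Read the setting exactly once; a concurrent update cannot split the decisions.
    incremental = read_setting()
    parser = "streaming" if incremental else "full-body"
    executor = "incremental" if incremental else "single-shot"
    return parser, executor
```

Had the setting been read twice, a value flipping between the reads could pair a streaming parser with a single-shot executor.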
2024-09-18 13:40:39 -06:00
Tim Brooks
a03fb12b09 Incremental bulk integration with rest layer (#112154)
Integrate the incremental bulks into RestBulkAction
2024-09-18 13:40:39 -06:00
Mikhail Berezovskiy
cbcbc34863 release stream chunk queue on bad request (#112227) 2024-09-18 13:40:39 -06:00
Mikhail Berezovskiy
1b77421cf8 handle 100-continue and oversized streaming request (#112179) 2024-09-18 13:40:39 -06:00
Tim Brooks
478baf1459 Allow incremental bulk request execution (#111865)
Allow a single bulk request to be passed to Elasticsearch in multiple
parts. Once a certain memory threshold or number of operations have
been received, the request can be split and submitted for processing.
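The splitting rule can be sketched as a generator that emits a sub-request whenever the buffered operations reach either threshold. `max_bytes` and `max_ops` are hypothetical parameters standing in for the memory threshold and operation count; the real limits and how they are configured are not described here.

```python
from typing import Iterable, Iterator, List


def split_bulk(operations: Iterable[bytes], max_bytes: int, max_ops: int) -> Iterator[List[bytes]]:
    batch: List[bytes] = []
    size = 0
    for op in operations:
        batch.append(op)
        size += len(op)
        # Submit what has been received once either threshold is reached.
        if size >= max_bytes or len(batch) >= max_ops:
            yield batch
            batch, size = [], 0
    # Whatever remains when the request ends forms the final part.
    if batch:
        yield batch
```

For example, operations of 2, 3, 1 and 4 bytes with `max_bytes=5` are submitted as two parts of two operations each.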
2024-09-18 13:40:37 -06:00
Mikhail Berezovskiy
5e1f6554a2 Add http request content stream support (#111438) 2024-09-18 13:38:36 -06:00
Lee Hinman
b94720dca5
Deprecate dot-prefixed indices and composable template index patterns (#112571)
This commit adds a module emitting a deprecation warning when a
dot-prefixed index is manually or automatically created, or when a
composable index template with an index pattern that uses a dot-prefix
is created. The warning states that in the future these indices will not
be allowed. In a future breaking change (10.0.0 maybe?) the deprecation
can then be changed to an exception.

These deprecations are only displayed when a non-operator user is using
the API (one that does not set the `X-elastic-product-origin` header).
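A minimal Python sketch of the check described above: warn for dot-prefixed index names or template index patterns, unless the caller sets the `X-elastic-product-origin` header. `dot_prefix_deprecation` and the warning wording are hypothetical, and the sketch ignores any exemptions the real module may apply. Header names are compared case-insensitively, as HTTP requires.

```python
from typing import Mapping, Optional

PRODUCT_ORIGIN_HEADER = "x-elastic-product-origin"


def dot_prefix_deprecation(name: str, headers: Mapping[str, str]) -> Optional[str]:
    # Callers that set the product-origin header are not shown the deprecation.
    if any(key.lower() == PRODUCT_ORIGIN_HEADER for key in headers):
        return None
    if name.startswith("."):
        # Hypothetical wording; the real deprecation message may differ.
        return f"[{name}] is dot-prefixed; this will not be allowed in a future version"
    return None
```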
2024-09-19 05:29:53 +10:00
David Turner
079d680319 Revert "Add extra context to TransportNodesAction invocations (#113086)"
This reverts commit 3fdc8ef554.
2024-09-18 19:28:38 +01:00
David Turner
3fdc8ef554
Add extra context to TransportNodesAction invocations (#113086)
Several `TransportNodesAction` implementations do some kind of top-level
computation in addition to fanning out requests to individual nodes.
Today they all have to do this once the node-level fanout is complete,
but in most cases the top-level computation can happen in parallel with
the fanout. This commit adds support for an additional `ActionContext`
object, created when starting to process the request and exposed to
`newResponseAsync()` at the end, to allow this parallelization.

All implementations use `(Void) null` for this param, except for
`TransportClusterStatsAction` which now parallelizes the computation of
the cluster-state-based stats with the node-level fanout.
2024-09-18 19:07:26 +01:00
Armin Braun
90e343cfef
Use ChannelFutureListener in Netty code to reduce capturing lambdas (#112967)
Mainly motivated by simplifying the reference chains for Netty buffers
and making heap dumps easier to analyze in some spots, but this is also
a small performance win in and of itself.
2024-09-18 18:32:04 +02:00
Armin Braun
e5bcb0c5b3
Remove duplication in settings code and some minor setting speedups (#112897)
Some small speedups come from pre-evaluating `isFiltered(properties)`
in many places and from no longer creating an unused `SimpleKey` in
`toConcreteKey`, which incurred a costly string interning. Other than
that, this is obvious deduplication using existing utilities, plus some
obvious missing overloads for them.
2024-09-18 15:01:49 +02:00
Pete Gillin
81041b47d4
[TEST] Assert DSL merge policy respects end date (#113038)
Backing indexes with an end date in the future may still get writes,
so DSL should not apply the merge policy (first configuring the
settings on the index, then doing the force merge) until that time has
passed. The implementation already does this, because
`DataStreamLifecycleService.run()` calls
`timeSeriesIndicesStillWithinTimeBounds` and adds the resulting
indices to `indicesToExcludeForRemainingRun` before calling
`maybeExecuteForceMerge`. This change simply adds a unit test to
ensure that this behaviour does not regress.

Closes #109030
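The exclusion the new test guards can be sketched in Python: backing indices whose end time is still in the future are left out before any force merge is considered. `indices_eligible_for_force_merge` is a hypothetical helper that loosely mirrors the roles of `timeSeriesIndicesStillWithinTimeBounds` and `indicesToExcludeForRemainingRun`; an end time of `None` models an index without time bounds.

```python
from datetime import datetime
from typing import Dict, List, Optional


def indices_eligible_for_force_merge(
    end_times: Dict[str, Optional[datetime]], now: datetime
) -> List[str]:
    # Indices still within their time bounds may still receive writes.
    still_within_bounds = {
        name for name, end in end_times.items() if end is not None and end > now
    }
    # Everything else may have the merge policy applied in this run.
    return [name for name in end_times if name not in still_within_bounds]
```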
2024-09-18 12:03:49 +01:00
Luca Cavanna
ef37511f0a
Remove deprecations and 7.x related code from analysis common (#113009)
The edgeNGram and NGram tokenizers and token filters were deprecated and have not been supported in indices created from 8.0 onwards,
so their support can be removed entirely from main.

The version-related logic around min grams can also be removed, as it refers to 7.x, which we no longer need to support.

Relates to #50376, #50862, #43568
2024-09-18 09:03:08 +02:00
Nhat Nguyen
af7ed9515f
Enable ignore_malformed in logsdb (#113072)
This change enables ignore_malformed by default for newly created 
logsdb indices.

Closes #106822
2024-09-17 22:41:41 -07:00
Joe Gallo
ab4c0276d0
Make the GeoIpCache more generic (#113053) 2024-09-17 21:09:37 -04:00
Joe Gallo
e952b76fd6
There's no need to BufferedInputStream within a GZIPInputStream (#113052) 2024-09-17 19:06:50 -04:00
Joe Gallo
ea896f90e7
Rework interfaces for the geoip processor (#113045) 2024-09-17 15:33:11 -04:00
Lee Hinman
4a0ccbf4b4
Fix verbose get data stream API not requiring extra privileges (#112973)
* Fix verbose get data stream API not requiring extra privileges

When a user calls the `GET /_data_stream?verbose` API to retrieve the verbose version of the response (which includes the `maximum_timestamp`, as added in #112303), the request should be subject to the same privilege checks as the get-data-stream API, meaning that no extra privileges should be required to return the field.

This commit makes the Transport action use an entitled client so that extra privileges are not required, and adds a test to ensure that it works.

* Update docs/changelog/112973.yaml
2024-09-17 10:03:44 -06:00
Salvatore Campagna
f7880ae85f
LogsDB data migration integration testing (#112710)
Here we test reindexing logsdb indices and creating and restoring
snapshots. Note that logsdb uses synthetic source, so restoring
source-only snapshots fails due to the missing _source.
2024-09-17 16:26:48 +02:00
Iraklis Psaroudakis
32937109ac
Support writeAtomicBlob from InputStream for repository blob container interface (#112754)
Mostly for fs and hdfs repos, similar to how writeAtomicBlob from
bytes is implemented (write temp file and rename atomically).

Relates ES-9248
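The "write temp file and rename atomically" approach can be sketched in Python for a filesystem-backed container: the stream is copied into a temporary file in the target directory, flushed to disk, then renamed over the final name, so readers see either no blob or the complete blob. `write_atomic_blob` is a hypothetical helper, not the repository interface, and this sketch always overwrites an existing blob.

```python
import os
import shutil
import tempfile
from typing import BinaryIO


def write_atomic_blob(directory: str, blob_name: str, stream: BinaryIO) -> None:
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=f"pending-{blob_name}-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            shutil.copyfileobj(stream, tmp)  # stream the input, no full in-memory copy
            tmp.flush()
            os.fsync(tmp.fileno())
        # Atomic rename into place.
        os.replace(tmp_path, os.path.join(directory, blob_name))
    except BaseException:
        # Never leave a partial temp file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```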
2024-09-17 16:08:51 +03:00
Joe Gallo
5efa212db8
Tidy some assertions and code in the GeoIpProcessorTests (#112971) 2024-09-16 18:08:26 -04:00
Oleksandr Kolomiiets
7923870b42
Fix license headers in test files (#112965) 2024-09-16 13:45:33 -07:00
Oleksandr Kolomiiets
9de285e0f1
Add LogsDB challenge tests for reindexing (#112849) 2024-09-16 13:32:40 -07:00
Joe Gallo
e47365f413
Fix getDatabaseType for unusual MMDBs (#112888) 2024-09-16 14:14:34 -04:00
David Turner
3a8835853f
Remove unnecessary bwc from get-aliases API (#112797)
Dropping support for pre-8.12 requests from remote nodes, and also
cleaning up some unnecessary abstraction in the request builder
hierarchy.

Relates #101815
Relates #107984 (drops some unnecessary trappy timeouts)
2024-09-16 06:31:37 +01:00
David Turner
16dcaef3db
Clean up master timeouts bwc in v9 (#112790)
All nodes in a cluster involving v9 nodes will understand that
`MINUS_ONE` means an infinite master-node timeout, so there's no need to
fall back to `MAX_VALUE` when talking to older nodes. This commit
removes the unnecessary bwc code.
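A minimal Python sketch of what the `MINUS_ONE` sentinel means to a node that waits for the master: no deadline at all. With every node understanding this, the value can be sent as-is, with no need to substitute `MAX_VALUE`. `master_deadline` and the millisecond representation are hypothetical illustrations.

```python
from typing import Optional

MINUS_ONE = -1  # sentinel: no master-node timeout


def master_deadline(now_millis: int, timeout_millis: int) -> Optional[int]:
    """Return the absolute deadline for waiting on the master, or None to wait forever."""
    if timeout_millis == MINUS_ONE:
        return None
    return now_millis + timeout_millis
```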
2024-09-16 06:31:23 +01:00
Armin Braun
7c6b493829
Remove unused painless package (2 unused classes) (#112898)
It's in the title, this is simply unused code.
2024-09-16 00:30:54 +10:00
Mark Vieira
a59c182f9f
Add AGPLv3 as a supported license 2024-09-13 15:29:46 -07:00
Kostas Krikellas
86a88d735f
Fix synthetic source field names for multi-fields (#112850)
* Fix synthetic source field names for multi-fields

* enable logsdb in randomized tests

* Revert "enable logsdb in randomized tests"

This reverts commit 2e2c22e2bb.

* Update docs/changelog/112850.yaml

* fix
2024-09-13 15:00:55 +03:00
Quentin Pradet
d18fc4c563
Remove unused parameter in char filter YAML test (#112852) 2024-09-13 12:50:23 +04:00
Oleksandr Kolomiiets
44c9271562
Ensure that fields copied using copy_to are not present in synthetic source (#112625) 2024-09-12 12:18:25 -07:00
Simon Cooper
0d8042e98e
Update last few references in yaml tests from ROOT locale to ENGLISH (#112791) 2024-09-12 11:36:42 +01:00
David Turner
8607d40679
Introduce test utils for ingest pipelines (#112733)
Replaces the somewhat-awkward API on `ClusterAdminClient` for
manipulating ingest pipelines with some test-specific utilities that are
easier to use.

Relates #107984 in that this change massively reduces the noise that
would otherwise result from removing the trappy timeouts in these APIs.
2024-09-12 08:22:50 +01:00
Mark Vieira
4ce661cc48
Bump Elasticsearch version to 9.0.0 (#112570) 2024-09-11 09:40:11 -07:00