Commit graph

8978 commits

Author SHA1 Message Date
Ryan Ernst
05d18a2981
Remove SecurityManager code from ingest attachment (#127291)
Now that SecurityManager is gone, there is no longer a need for a
specialized access control context for interacting with tika.
2025-04-24 06:22:10 -07:00
David Turner
b028c0af56
Upgrade repository-s3 to AWS SDK v2 (#126843)
Closes #120993
2025-04-24 21:21:03 +10:00
Keith Massey
ee2d2f313d
Adding settings to data streams (#126947) 2025-04-23 13:27:40 -05:00
David Turner
85a87e71d6
Add end-to-end bulk splitting test (#127237)
Today we do not have a test that verifies the Netty HTTP pipeline
interacts properly with the incremental bulk handling service and splits
requests when the watermark is hit. This commit adds such a test.
2025-04-24 01:42:38 +10:00
Oleksandr Kolomiiets
5e2b199b94
[TEST] Move test data generation out of logsdb namespace (#119994) 2025-04-23 08:29:32 -07:00
Mary Gouseti
db2992f0f8
[Failure Store] Expose failure store lifecycle information via the GET data stream API (#126668)
To retrieve the effective configuration, use the `GET` data streams
API. For example, a data stream with empty data stream options might
still have the failure store enabled by a cluster setting. By default
the failure store is managed by a lifecycle with infinite (for now)
retention, so the response will look like this:

```
GET _data_stream/*
{
  "data_streams": [
    {
      "name": "my-data-stream",
      "timestamp_field": {
        "name": "@timestamp"
      },
      .....
      "failure_store": {
        "enabled": true,
        "lifecycle": {
          "enabled": true
        },
        "rollover_on_write": false,
        "indices": [
          {
            "index_name": ".fs-my-data-stream-2099.03.08-000003",
            "index_uuid": "PA_JquKGSiKcAKBA8DJ5gw",
            "managed_by": "Data stream lifecycle"
          }
        ]
      }
    },...
  ]
}
```

If a failure index is managed by ILM, the failure index info will be
displayed as follows:

```
        {
          "index_name": ".fs-my-data-stream-2099.03.08-000002",
          "index_uuid": "PA_JquKGSiKcAKBA8DJ5gw",
          "prefer_ilm": true,
          "ilm_policy": "my-lifecycle-policy",
          "managed_by": "Index Lifecycle Management"
        }
```
2025-04-23 23:44:46 +10:00
Niels Bauman
4207cee3eb
Rename data stream transport actions (#127222)
The new action names are more consistent with the rest of the codebase.
2025-04-23 12:40:38 +02:00
Mary Gouseti
b9917086e1
Create dedicated factory methods for data lifecycle (#126487)
The class `DataStreamLifecycle` currently captures the lifecycle
configuration that manages all data stream indices, but soon it will be
split into two variants, the data and the failures lifecycle.

Some pre-work has been done already but as we are progressing in our
POC, we see that it will be really useful if the `DataStreamLifecycle`
is "aware" of the target index component. This will allow us to
correctly apply global retention or to throw an error if a downsampling
configuration is provided to a failure lifecycle.

In this PR, we perform a small refactoring to reduce the noise in
https://github.com/elastic/elasticsearch/pull/125658. Here we introduce
the following:

- A factory method that creates a data lifecycle; for now it is trivial, but it will be more useful soon.
- We rename the "empty" builder to explicitly mention the index component it refers to.
2025-04-23 20:00:25 +10:00
David Turner
21813604b4
Skip listing MPUs if TTL set to -1 (#127166)
Recent versions of MinIO will sometimes leak multi-part uploads under
concurrent load, leaving them in the `ListMultipartUploads` output even
though they cannot be aborted. Today this causes repository analysis to
fail since compare-and-exchange operations will not even start if there
are any pre-existing uploads. This commit makes it possible to skip this
pre-flight check (and accept the performance consequences) by adjusting
the relevant settings.

Workaround for minio/minio#21189
Closes #122670
2025-04-23 06:33:40 +01:00
Armin Braun
cd609533bf
Fix duplicate strings in SearchHit serialization (#127180)
The map key is always the field name. We exploited this fact in the get results but not in
search hits, leading to a lot of duplicate strings in many heap dumps.
We could do much better here since the names generally come from a known, limited set of names,
but as a first step let's at least align the get and search responses and save a non-trivial amount of bytes
in a number of use cases. Plus, having a single string instance is faster on lookup etc. and also saves
CPU.
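
The idea can be sketched as follows. This is a minimal illustration, not the actual `SearchHit` code: `NamedField` and `FieldMaps` are hypothetical names, and the sketch only shows the map key reusing the string instance that the field value already holds.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a document field; not the real Elasticsearch class.
final class NamedField {
    final String name;
    final List<Object> values;

    NamedField(String name, List<Object> values) {
        this.name = name;
        this.values = values;
    }
}

final class FieldMaps {
    // Keys each entry by the name instance the field already holds, so the map
    // keeps one String per field name rather than two equal copies.
    static Map<String, NamedField> byName(List<NamedField> fields) {
        Map<String, NamedField> result = new LinkedHashMap<>();
        for (NamedField field : fields) {
            result.put(field.name, field);
        }
        return result;
    }
}
```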
2025-04-22 22:43:27 +02:00
David Turner
f9d813a443
Improve Netty4IncrementalRequestHandlingIT (#127111)
* Verifies that each call to `Netty4HttpRequestBodyStream#next` yields
  exactly one chunk (or the stream is closed) since the
  `IncrementalBulkService` relies on this property.

* Replaces several busy-waits with ones that block on a future for
  faster test execution.

* Replaces several hard-coded constants with randomized values to
  clarify that the precise value does not matter to the test.

* Reduces the use of unnecessary abbreviations in names.

* Reduces the use of global static state in favour of node-local
  components.
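
As an illustration of the busy-wait replacement, here is a minimal sketch that blocks on a future instead of polling. `ChunkWaiter` is a hypothetical helper and is not taken from the test itself.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: lets a test thread wait for a chunk produced elsewhere.
final class ChunkWaiter {
    private final CompletableFuture<String> firstChunk = new CompletableFuture<>();

    // Called by the producer thread when the chunk arrives.
    void onChunk(String chunk) {
        firstChunk.complete(chunk);
    }

    // Blocks until the chunk arrives or the timeout elapses, rather than
    // re-checking a flag in a sleep loop.
    String awaitChunk(long timeoutSeconds) {
        return firstChunk.orTimeout(timeoutSeconds, TimeUnit.SECONDS).join();
    }
}
```

The waiting thread wakes as soon as the future completes, so the test does not pay for a polling interval.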
2025-04-22 21:37:32 +10:00
James Baiera
7b89f4d4a6
Add ability to redirect ingestion failures on data streams to a failure store (#126973)
Removes the feature flags and guards that prevent the new failure store functionality 
from operating in production runtimes.
2025-04-18 16:33:03 -04:00
James Baiera
d928d1a418
Add node feature for failure store, refactor capability names (#126885)
Adds a node feature that is conditionally added to the cluster state if the failure store 
feature flag is enabled. Requires all nodes in the cluster to have the node feature 
present in order to redirect failed documents to the failure store from the ingest node 
or from shard level bulk failures.
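
The "all nodes must have the feature" rule can be sketched like this. `FeatureGate` and the per-node feature sets are hypothetical, and the sketch assumes that an empty cluster view counts as not ready.

```java
import java.util.Collection;
import java.util.Set;

// Hypothetical gate; the real check works on the cluster state.
final class FeatureGate {
    // True only when every node advertises the feature.
    // Assumption: an empty view of the cluster is treated as "not ready".
    static boolean clusterHasFeature(Collection<Set<String>> featuresPerNode, String feature) {
        return !featuresPerNode.isEmpty()
            && featuresPerNode.stream().allMatch(features -> features.contains(feature));
    }
}
```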
2025-04-18 13:42:48 -04:00
Armin Braun
f461f90d48
Remove redundant marker interfaces that extend Bucket (#127038)
No need to have these marker interfaces around when we're not using them anywhere; all they do is hide a lot of code duplication. Removing them sets up the possible removal of hundreds of lines of downstream code.
2025-04-18 18:26:39 +02:00
Joe Gallo
b46bee4e47
Correctly handle non-integers in nested paths in the remove processor (#127006) 2025-04-18 11:46:54 -04:00
Lorenzo Dematté
69f6520b0c
[Entitlements] Validation checks on paths (#126852)
With this PR we restrict the paths we allow access to, forbidding plugins from specifying or requesting entitlements for reading or writing to specific protected directories.

I added this validation to EntitlementInitialization, as I wanted to fail fast and this is the earliest point where we have all we need: PathLookup to resolve relative paths, policies (for plugins, server, agents) and the Paths for the specific directories we want to protect.

Relates to ES-10918
2025-04-18 15:36:07 +02:00
elasticsearchmachine
36af046441 Merge patch/serverless-fix into main 2025-04-18 04:30:44 +00:00
Brian Seeders
2a243d8492
Revert #126441 Add flow-control and remove auto-read in netty4 HTTP pipeline (#127030)
* Revert "Release buffers in netty test (#126744)"

This reverts commit f9f3defe92.

* Revert "Add flow-control and remove auto-read in netty4 HTTP pipeline (#126441)"

This reverts commit c8805b85d2.
2025-04-17 12:37:26 -07:00
Nick Tindall
d378185054
Fix GCS tests broken by idempotency token (#126972) 2025-04-17 04:42:32 +02:00
Nick Tindall
17c6e10846
GCS: Use idempotency token to identify requests (#126887) 2025-04-16 15:56:47 +10:00
Mikhail Berezovskiy
5a7a425bd0
Refactor GCS fixture multipart parser (#125828) 2025-04-15 10:09:53 -07:00
David Turner
aa40147142
Add integ tests for ftp:// URL repository (#126757)
We document support for snapshot repositories using `ftp://` URLs but it
seems this functionality has not worked for many years because of
security-manager restrictions, although nobody noticed because it was
not covered by any tests. The migration to the Entitlements framework
means that this functionality now works again, so this commit adds tests
to make sure we do not break it again in future.
2025-04-15 12:57:00 +01:00
Ryan Ernst
83ce15ae06
Make TransportRequest an interface (#126733)
In order to support a future TransportRequest variant that accepts the
response type, TransportRequest needs to be an interface. This commit
adds AbstractTransportRequest as a concrete implementation and makes
TransportRequest a simple interface that joins together the parent
interfaces from TransportMessage.

Note that this was done entirely in Intellij using structural find and
replace.
2025-04-14 14:22:28 -07:00
Mikhail Berezovskiy
f9f3defe92
Release buffers in netty test (#126744) 2025-04-14 13:09:12 -07:00
Brendan Cully
d02b65308e
S3BlobContainer: Revert broadened exception handler (#126731)
Catching Exception instead of AmazonClientException in copyBlob and
executeMultipart led to failures in S3RepositoryAnalysisRestIT due to
the injected exceptions getting wrapped in IOExceptions that prevented
them from being caught and handled in BlobAnalyzeAction.

Closes #126576
2025-04-14 19:20:11 +02:00
Ignacio Vera
ffdfcec334
Upgrade to Lucene 10.2.0 (#126594)
This commit upgrades Elasticsearch to Lucene 10.2.0.
2025-04-14 13:50:52 +02:00
Mikhail Berezovskiy
c8805b85d2
Add flow-control and remove auto-read in netty4 HTTP pipeline (#126441) 2025-04-11 14:54:22 -07:00
Jack Conradson
c1ecafad6a
Fix painless return type cast for list shortcut (#126724)
This fixes an issue where, if a Painless getter method return type
didn't match a Java getter method return type, we add a cast.
Currently this is adding an extraneous cast.

Closes: #70682
2025-04-11 13:50:19 -07:00
Martijn van Groningen
6012590929
Improve resiliency of UpdateTimeSeriesRangeService (#126637)
If updating the `index.time_series.end_time` fails for one data stream,
then UpdateTimeSeriesRangeService should continue updating this setting for other data streams.
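
The intended behaviour can be sketched as a per-stream try/catch. This is only an illustration under assumed names (`PerStreamUpdater`, `updateEndTime`) and not the service's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper illustrating "fail one, continue with the rest".
final class PerStreamUpdater {
    // Applies the update to every data stream and returns the names that failed,
    // so one bad stream cannot stop the others from being updated.
    static List<String> updateAll(List<String> dataStreams, Consumer<String> updateEndTime) {
        List<String> failed = new ArrayList<>();
        for (String name : dataStreams) {
            try {
                updateEndTime.accept(name);
            } catch (RuntimeException e) {
                failed.add(name); // a real service would also log the exception here
            }
        }
        return failed;
    }
}
```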

The following error was observed in the wild:

```
[2025-04-07T08:50:39,698][WARN ][o.e.d.UpdateTimeSeriesRangeService] [node-01] failed to update tsdb data stream end times
java.lang.IllegalArgumentException: [index.time_series.end_time] requires [index.mode=time_series]
        at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:636) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:619) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.settings.Setting.get(Setting.java:563) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.settings.Setting.get(Setting.java:535) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService.updateTimeSeriesTemporalRange(UpdateTimeSeriesRangeService.java:111) ~[?:?]
        at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService$UpdateTimeSeriesExecutor.execute(UpdateTimeSeriesRangeService.java:210) ~[?:?]
        at org.elasticsearch.cluster.service.MasterService.innerExecuteTasks(MasterService.java:1075) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:1038) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService.executeAndPublishBatch(MasterService.java:245) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.lambda$run$2(MasterService.java:1691) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.run(MasterService.java:1688) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$5.lambda$doRun$0(MasterService.java:1283) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$5.doRun(MasterService.java:1262) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-8.17.3.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1575) ~[?:?]
```

As a result, the `index.time_series.end_time` index setting was not updated for any data stream. This in turn caused data loss, because metrics couldn't be indexed when no suitable backing index could be resolved:

```
the document timestamp [2025-03-26T15:26:10.000Z] is outside of ranges of currently writable indices [[2025-01-31T07:22:43.000Z,2025-02-15T07:24:06.000Z][2025-02-15T07:24:06.000Z,2025-03-02T07:34:07.000Z][2025-03-02T07:34:07.000Z,2025-03-10T12:45:37.000Z][2025-03-10T12:45:37.000Z,2025-03-10T14:30:37.000Z][2025-03-10T14:30:37.000Z,2025-03-25T12:50:40.000Z][2025-03-25T12:50:40.000Z,2025-03-25T14:35:40.000Z
```
2025-04-11 12:58:10 +02:00
Armin Braun
dd1db5031e
Move calls to FeatureFlag.enabled to class-load time (#125885)
I noticed that we tend to create the flag instance and call this method
everywhere. This doesn't compile the same way as a real boolean constant
unless you're running with `-XX:+TrustFinalNonStaticFields`.
For most of the code spots changed here that's irrelevant but at least
the usage in the mapper parsing code is a little hot and gets a small
speedup from this potentially.
Also we're simply wasting some bytes for the static footprint of ES by
using the `FeatureFlag` indirection instead of just a boolean.
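
A minimal sketch of the pattern, assuming a hypothetical `readFlag` helper in place of the real feature flag check and an illustrative property name:

```java
// Hypothetical example class; not part of the Elasticsearch codebase.
final class MapperFlags {
    // Stands in for the real feature flag lookup (an assumption for this sketch).
    static boolean readFlag(String name) {
        return Boolean.parseBoolean(System.getProperty(name, "false"));
    }

    // Evaluated once, when the class is initialized. Because it is a
    // static final boolean, the JIT can treat it as a constant in hot paths.
    static final boolean NEW_MAPPER_ENABLED = readFlag("example.new_mapper_enabled");

    static String parse(String value) {
        return NEW_MAPPER_ENABLED ? "new:" + value : "old:" + value;
    }
}
```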
2025-04-11 01:46:28 +02:00
David Turner
b10b35fccd
Fix S3RepositoryAnalysisRestIT (#126593)
- Translate a 404 during a multipart copy into a `FileNotFoundException`

- Use multiple threads in `S3HttpHandler` to avoid `CopyObject`/`PutObject` deadlock

Closes #126576
2025-04-11 05:41:20 +10:00
Mary Gouseti
78ac5d58ef
[Failure store] Support failure store for system data streams (#126585)
In this PR we add support for the failure store for system data streams.
Specifically:

- We pass the system descriptor so the failure index can be created based on that.
- We extend the tests to ensure it works
- We remove a guard we had but I wasn't able to test it because it only gets triggered if the data stream gets created right after a failure in the ingest pipeline, and I didn't see how to add one (yet).
- We extend the system data stream migration to ensure this is also working.
2025-04-11 05:14:11 +10:00
Jack Conradson
3d54cc3e52
Add leniency to missing array values in mustache (#126550)
In mustache, this change returns null values which convert to empty strings 
instead of throwing an exception when users have a template with 
something like a.8 where the index 8 is out of bounds. This matches the 
behavior for non-existent keys like a.d.
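
The lenient lookup can be sketched as below. `LenientPath` is a hypothetical helper written for illustration; it is not the mustache module's implementation.

```java
import java.util.List;
import java.util.Map;

// Hypothetical resolver for dotted paths over nested maps and lists.
final class LenientPath {
    // Resolves a path such as "a.8". Missing keys and out-of-bounds
    // indices both yield null instead of throwing.
    static Object resolve(Object root, String path) {
        Object current = root;
        for (String part : path.split("\\.")) {
            if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(part);
            } else if (current instanceof List) {
                List<?> list = (List<?>) current;
                int index = parseIndex(part);
                current = (index >= 0 && index < list.size()) ? list.get(index) : null;
            } else {
                return null;
            }
        }
        return current;
    }

    // Returns -1 when the part is not an integer.
    private static int parseIndex(String part) {
        try {
            return Integer.parseInt(part);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // A null value renders as the empty string.
    static String render(Object root, String path) {
        Object value = resolve(root, path);
        return value == null ? "" : value.toString();
    }
}
```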

Closes #55200
2025-04-09 14:51:26 -07:00
Brendan Cully
c1a71ff45c
BlobContainer: add copyBlob method (#125737)
* BlobContainer: add copyBlob method

If a container implements copyBlob, then the copy is
performed by the store, without client-side IO. If the store
does not provide a copy operation then the default implementation
throws UnsupportedOperationException.
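
A simplified sketch of this design follows. `SimpleBlobContainer` and `InMemoryContainer` are hypothetical and much smaller than the real `BlobContainer` API; they only illustrate the default-method approach.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified container interface; the real BlobContainer differs.
interface SimpleBlobContainer {
    byte[] read(String name);

    // Containers without a store-side copy inherit this default.
    default void copyBlob(String source, String target) {
        throw new UnsupportedOperationException("copyBlob is not supported by this container");
    }
}

// An in-memory container that copies inside the "store", with no client-side IO.
final class InMemoryContainer implements SimpleBlobContainer {
    private final Map<String, byte[]> blobs = new HashMap<>();

    void write(String name, byte[] data) {
        blobs.put(name, data);
    }

    @Override
    public byte[] read(String name) {
        return blobs.get(name);
    }

    @Override
    public void copyBlob(String source, String target) {
        byte[] data = blobs.get(source);
        if (data == null) {
            throw new IllegalArgumentException("no such blob: " + source);
        }
        blobs.put(target, data.clone());
    }
}
```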

This change provides implementations for the FS and S3 blob containers.
More will follow.

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: David Turner <david.turner@elastic.co>
2025-04-09 10:33:01 -07:00
Alexey Ivanov
ecf9adfc78
[main] System data streams are not being upgraded in the feature migration API (#126409)
This commit adds support for system data streams reindexing. The system data stream migration extends the existing system indices migration task and uses the data stream reindex API.
The system index migration task starts a reindex data stream task and tracks its status every second. Only one system index or system data stream is migrated at a time. If a data stream migration fails, the entire system index migration task will also fail.

Port of #123926
2025-04-08 20:42:58 +02:00
David Turner
aab40b1247
Introduce TestBlobContainerBuilder (#126445)
The mostly-optional parameters to `createBlobContainer` are getting
rather numerous in this test harness which makes the tests hard to read.
This commit introduces a builder to help name the provided parameters
and skip the omitted ones.
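
The builder approach might look like the sketch below. `ContainerSettings` and its fields are hypothetical and do not reflect the real `TestBlobContainerBuilder` parameters.

```java
// Hypothetical test-harness settings; names and meanings are illustrative only.
final class ContainerSettings {
    final Integer maxRetries;      // null means "use the harness default"
    final Long readTimeoutMillis;  // null means "use the harness default"
    final String path;             // null means "choose a path randomly"

    private ContainerSettings(Builder builder) {
        this.maxRetries = builder.maxRetries;
        this.readTimeoutMillis = builder.readTimeoutMillis;
        this.path = builder.path;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private Integer maxRetries;
        private Long readTimeoutMillis;
        private String path;

        Builder maxRetries(int value) {
            this.maxRetries = value;
            return this;
        }

        Builder readTimeoutMillis(long value) {
            this.readTimeoutMillis = value;
            return this;
        }

        Builder path(String value) {
            this.path = value;
            return this;
        }

        ContainerSettings build() {
            return new ContainerSettings(this);
        }
    }
}
```

A call such as `ContainerSettings.builder().maxRetries(3).path("base/path").build()` names the parameters it sets and leaves the rest unset.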
2025-04-09 01:52:16 +10:00
Joe Gallo
450516d675
Fix a RemoveProcessor test that never ran (#126464) 2025-04-08 11:21:04 -04:00
Dianna Hohensee
4b2867a0ef
Support maxConnections override in AbstractBlobContainerRetriesTestCase tests (#126435) 2025-04-08 09:55:01 -04:00
Mary Gouseti
060a9b746a
[DLM]Use default lifecycle instance instead of default constructor (#126461)
When creating an empty lifecycle we used to use the default
constructor; we now use the default lifecycle instance. This is not just
for efficiency: it will also allow us to separate the default data and
failures lifecycles in the future.
2025-04-08 23:37:30 +10:00
Ryan Ernst
991e80d56e
Remove unnecessary generic params from action classes (#126364)
Transport actions have associated request and response classes. However,
the base type restrictions are not necessary to duplicate when creating
a map of transport actions. Relatedly, the ActionHandler class doesn't
actually need strongly typed action type and classes since they are lost
when shoved into the node client map. This commit removes these type
restrictions and generic parameters.
2025-04-07 16:22:56 -07:00
Joe Gallo
bead858ccd
Correctly handle nulls in nested paths in the remove processor (#126417) 2025-04-07 16:54:07 -04:00
David Turner
fbbbdd7eec
Allow overriding blob container path in tests (#126391)
Some `AbstractBlobContainerRetriesTestCase#createBlobContainer`
implementations choose a path for the container randomly, but we have a
need for a test which re-creates the same container against a different
`S3Service` and `BlobStore` and must therefore specify the same path
each time. This commit exposes a parameter that lets callers specify a
container path.
2025-04-08 03:54:37 +10:00
Mary Gouseti
a525b3d924
Fix test to anticipate force merge failure (#126282)
This test had a copy paste mistake. When the cluster has only one data
node the replicas cannot be assigned so we end up with a force merge
error. In the case of the failure store this was not asserted correctly.

On the other hand, this test only checked for the existence of an error
and it was not ensuring that the current error is not the rollover error
that should have recovered. We make this test a bit more explicit.

Fixes: https://github.com/elastic/elasticsearch/issues/126252
2025-04-05 05:26:58 +11:00
Alexey Ivanov
fd7efe587e
[main] Move system indices migration to migrate plugin (#125437)
* [main] Move system indices migration to migrate plugin

It seems the best way to fix #122949 is to use the existing data stream reindex API. However, this API is located in the migrate x-pack plugin. This commit moves the system indices migration logic (REST handlers, transport actions, and task) to the migrate plugin.

Port of #123551

* [CI] Auto commit changes from spotless

* Fix compilation

* Fix tests

* Fix test

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-04-04 18:49:38 +01:00
David Turner
7239540c91
Replace region with regionSupplier in all AWS tests (#126285)
Rather than hard-coding a region name we should always auto-generate it
randomly during test execution. This commit replaces the remaining fixed
`String` arguments with a `Supplier<String>` argument to enable this.
2025-04-05 02:27:28 +11:00
David Turner
3e35900b07
Add missing test security policies (#126309)
Relates #126274 Closes #126301 Closes #126302 Closes #126303 Closes
#126304 Closes #126305 Closes #126306
2025-04-05 02:27:17 +11:00
David Turner
7402dfdf65
Introduce qa subprojects of :modules:repository-s3 (#126274)
Today we have some special-case test classes in `:modules:repository-s3`
within the same source root as the regular tests, with some trickery to
define separate Gradle tasks to run them with their special-case
configs. This commit simplifies the build by just moving each of these
classes into its own Gradle project.
2025-04-04 21:29:05 +11:00
David Turner
896598570c
Reinstate S3SearchableSnapshotsCredentialsReloadIT in FIPS JVMs (#126109)
These tests fail in a FIPS JVM only because they use a secret key
that is unacceptably short. This commit replaces the relevant uses of
`randomIdentifier` with `randomSecretKey` so they work whether in FIPS
mode or not.
2025-04-04 18:42:09 +11:00
David Turner
7eee6502de
Misc cleanups in S3BlobContainerRetriesTests (#126101)
- Simplify multi-object-delete request detection
- Replace `AtomicBoolean` with volatile field
- Make `ThrottlingDeleteHandler` static
2025-04-04 18:39:51 +11:00
Mikhail Berezovskiy
70654a3633
Add GCS telemetry with ThreadLocal (#125452) 2025-04-03 23:46:06 -07:00