Commit graph

8867 commits

Author SHA1 Message Date
James Baiera
7b89f4d4a6
Add ability to redirect ingestion failures on data streams to a failure store (#126973)
Removes the feature flags and guards that prevent the new failure store functionality 
from operating in production runtimes.
2025-04-18 16:33:03 -04:00
James Baiera
d928d1a418
Add node feature for failure store, refactor capability names (#126885)
Adds a node feature that is conditionally added to the cluster state if the failure store 
feature flag is enabled. Requires all nodes in the cluster to have the node feature 
present in order to redirect failed documents to the failure store from the ingest node 
or from shard level bulk failures.
2025-04-18 13:42:48 -04:00
Armin Braun
f461f90d48
Remove redundant marker interfaces that extend Bucket (#127038)
No need to have these marker interfaces around when we're not using them anywhere; all they do is hide a lot of code duplication. Removing them sets up the possible removal of hundreds of lines of downstream code, it seems.
2025-04-18 18:26:39 +02:00
Joe Gallo
b46bee4e47
Correctly handle non-integers in nested paths in the remove processor (#127006) 2025-04-18 11:46:54 -04:00
Lorenzo Dematté
69f6520b0c
[Entitlements] Validation checks on paths (#126852)
With this PR we restrict the paths we allow access to, forbidding plugins from specifying/requesting entitlements for reading or writing to specific protected directories.

I added this validation to EntitlementInitialization, as I wanted to fail fast and this is the earliest occurrence where we have all we need: PathLookup to resolve relative paths, policies (for plugins, server, agents) and the Paths for the specific directories we want to protect.

Relates to ES-10918
2025-04-18 15:36:07 +02:00
elasticsearchmachine
36af046441 Merge patch/serverless-fix into main 2025-04-18 04:30:44 +00:00
Brian Seeders
2a243d8492
Revert #126441 Add flow-control and remove auto-read in netty4 HTTP pipeline (#127030)
* Revert "Release buffers in netty test (#126744)"

This reverts commit f9f3defe92.

* Revert "Add flow-control and remove auto-read in netty4 HTTP pipeline (#126441)"

This reverts commit c8805b85d2.
2025-04-17 12:37:26 -07:00
Nick Tindall
d378185054
Fix GCS tests broken by idempotency token (#126972) 2025-04-17 04:42:32 +02:00
Nick Tindall
17c6e10846
GCS: Use idempotency token to identify requests (#126887) 2025-04-16 15:56:47 +10:00
Mikhail Berezovskiy
5a7a425bd0
Refactor GCS fixture multipart parser (#125828) 2025-04-15 10:09:53 -07:00
David Turner
aa40147142
Add integ tests for ftp:// URL repository (#126757)
We document support for snapshot repositories using `ftp://` URLs but it
seems this functionality has not worked for many years because of
security-manager restrictions, although nobody noticed because it was
not covered by any tests. The migration to the Entitlements framework
means that this functionality now works again, so this commit adds tests
to make sure we do not break it again in future.
2025-04-15 12:57:00 +01:00
Ryan Ernst
83ce15ae06
Make TransportRequest an interface (#126733)
In order to support a future TransportRequest variant that accepts the
response type, TransportRequest needs to be an interface. This commit
adds AbstractTransportRequest as a concrete implementation and makes
TransportRequest a simple interface that joins together the parent
interfaces from TransportMessage.

Note that this was done entirely in Intellij using structural find and
replace.
2025-04-14 14:22:28 -07:00
Mikhail Berezovskiy
f9f3defe92
Release buffers in netty test (#126744) 2025-04-14 13:09:12 -07:00
Brendan Cully
d02b65308e
S3BlobContainer: Revert broadened exception handler (#126731)
Catching Exception instead of AmazonClientException in copyBlob and
executeMultipart led to failures in S3RepositoryAnalysisRestIT due to
the injected exceptions getting wrapped in IOExceptions that prevented
them from being caught and handled in BlobAnalyzeAction.

Closes #126576
2025-04-14 19:20:11 +02:00
Ignacio Vera
ffdfcec334
Upgrade to Lucene 10.2.0 (#126594)
This commit upgrades Elasticsearch to Lucene 10.2.0.
2025-04-14 13:50:52 +02:00
Mikhail Berezovskiy
c8805b85d2
Add flow-control and remove auto-read in netty4 HTTP pipeline (#126441) 2025-04-11 14:54:22 -07:00
Jack Conradson
c1ecafad6a
Fix painless return type cast for list shortcut (#126724)
This fixes an issue where, if a Painless getter method return type
didn't match a Java getter method return type, we add a cast.
Currently this is adding an extraneous cast.

Closes: #70682
2025-04-11 13:50:19 -07:00
Martijn van Groningen
6012590929
Improve resiliency of UpdateTimeSeriesRangeService (#126637)
If updating the `index.time_series.end_time` fails for one data stream,
then UpdateTimeSeriesRangeService should continue updating this setting for other data streams.

The following error was observed in the wild:

```
[2025-04-07T08:50:39,698][WARN ][o.e.d.UpdateTimeSeriesRangeService] [node-01] failed to update tsdb data stream end times
java.lang.IllegalArgumentException: [index.time_series.end_time] requires [index.mode=time_series]
        at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:636) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:619) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.settings.Setting.get(Setting.java:563) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.settings.Setting.get(Setting.java:535) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService.updateTimeSeriesTemporalRange(UpdateTimeSeriesRangeService.java:111) ~[?:?]
        at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService$UpdateTimeSeriesExecutor.execute(UpdateTimeSeriesRangeService.java:210) ~[?:?]
        at org.elasticsearch.cluster.service.MasterService.innerExecuteTasks(MasterService.java:1075) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:1038) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService.executeAndPublishBatch(MasterService.java:245) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.lambda$run$2(MasterService.java:1691) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.run(MasterService.java:1688) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$5.lambda$doRun$0(MasterService.java:1283) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.cluster.service.MasterService$5.doRun(MasterService.java:1262) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) ~[elasticsearch-8.17.3.jar:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-8.17.3.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1575) ~[?:?]
```

This resulted in the `index.time_series.end_time` index setting not being updated for any data stream, which in turn caused data loss: metrics couldn't be indexed because no suitable backing index could be resolved:

```
the document timestamp [2025-03-26T15:26:10.000Z] is outside of ranges of currently writable indices [[2025-01-31T07:22:43.000Z,2025-02-15T07:24:06.000Z][2025-02-15T07:24:06.000Z,2025-03-02T07:34:07.000Z][2025-03-02T07:34:07.000Z,2025-03-10T12:45:37.000Z][2025-03-10T12:45:37.000Z,2025-03-10T14:30:37.000Z][2025-03-10T14:30:37.000Z,2025-03-25T12:50:40.000Z][2025-03-25T12:50:40.000Z,2025-03-25T14:35:40.000Z
```
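
The per-data-stream isolation described in this commit message can be sketched roughly as follows. This is a minimal illustration and not the actual `UpdateTimeSeriesRangeService` code: `ResilientUpdater`, `updateAll`, and the `Consumer`-based updater are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: apply an update to each data stream independently,
// so that a failure for one stream does not prevent updates to the others.
final class ResilientUpdater {

    // Returns the names of the data streams whose update failed.
    static List<String> updateAll(List<String> dataStreams, Consumer<String> updater) {
        List<String> failed = new ArrayList<>();
        for (String dataStream : dataStreams) {
            try {
                updater.accept(dataStream);
            } catch (RuntimeException e) {
                // Record the failure and continue with the next data stream.
                failed.add(dataStream);
            }
        }
        return failed;
    }
}
```

The key point is that the `try`/`catch` sits inside the loop, so one invalid setting no longer aborts the whole batch.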
2025-04-11 12:58:10 +02:00
Armin Braun
dd1db5031e
Move calls to FeatureFlag.enabled to class-load time (#125885)
I noticed that we tend to create the flag instance and call this method
everywhere. This doesn't compile the same way as a real boolean constant
unless you're running with `-XX:+TrustFinalNonStaticFields`.
For most of the code spots changed here that's irrelevant but at least
the usage in the mapper parsing code is a little hot and gets a small
speedup from this potentially.
Also we're simply wasting some bytes for the static footprint of ES by
using the `FeatureFlag` indirection instead of just a boolean.
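
As a rough illustration of the pattern, the flag can be evaluated once into a `static final boolean` when the class is initialized. `DemoFeatureFlag`, `MapperParser`, and the `demo.feature.` system property prefix below are hypothetical stand-ins, not the real Elasticsearch classes or property names.

```java
import java.util.Locale;

// Hypothetical stand-in for a feature flag class; not the real FeatureFlag.
final class DemoFeatureFlag {
    private final boolean enabled;

    DemoFeatureFlag(String name) {
        // Assumption for this sketch: the flag is read from a system property.
        this.enabled = Boolean.parseBoolean(System.getProperty("demo.feature." + name, "false"));
    }

    boolean isEnabled() {
        return enabled;
    }
}

// Hypothetical caller on a hot code path.
final class MapperParser {
    // Evaluated once, when the class is initialized. No flag object is kept
    // around, and the JIT can treat a static final boolean as a constant.
    private static final boolean FANCY_ENABLED = new DemoFeatureFlag("fancy").isEnabled();

    static String parse(String input) {
        return FANCY_ENABLED ? input.toUpperCase(Locale.ROOT) : input;
    }
}
```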
2025-04-11 01:46:28 +02:00
David Turner
b10b35fccd
Fix S3RepositoryAnalysisRestIT (#126593)
- Translate a 404 during a multipart copy into a `FileNotFoundException`

- Use multiple threads in `S3HttpHandler` to avoid `CopyObject`/`PutObject` deadlock

Closes #126576
2025-04-11 05:41:20 +10:00
Mary Gouseti
78ac5d58ef
[Failure store] Support failure store for system data streams (#126585)
In this PR we add support for the failure store for system data streams.
Specifically:

- We pass the system descriptor so the failure index can be created based on that.
- We extend the tests to ensure it works
- We remove a guard we had but I wasn't able to test it because it only gets triggered if the data stream gets created right after a failure in the ingest pipeline, and I didn't see how to add one (yet).
- We extend the system data stream migration to ensure this is also working.
2025-04-11 05:14:11 +10:00
Jack Conradson
3d54cc3e52
Add leniency to missing array values in mustache (#126550)
In mustache, this change returns null values which convert to empty strings 
instead of throwing an exception when users have a template with 
something like a.8 where the index 8 is out of bounds. This matches the 
behavior for non-existent keys like a.d.

Closes #55200
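
The lenient lookup described above might look something like the following sketch. It is not the actual mustache integration: `LenientPathLookup`, `resolve`, and `render` are hypothetical names, and the data model is assumed to be nested `Map` and `List` values.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a lenient dotted-path lookup over maps and lists.
final class LenientPathLookup {

    // Resolves a path such as "a.8". Missing keys and out-of-bounds
    // indices yield null instead of throwing an exception.
    static Object resolve(Object root, String path) {
        Object current = root;
        for (String part : path.split("\\.")) {
            if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(part);
            } else if (current instanceof List) {
                List<?> list = (List<?>) current;
                int index;
                try {
                    index = Integer.parseInt(part);
                } catch (NumberFormatException e) {
                    return null;
                }
                current = (index >= 0 && index < list.size()) ? list.get(index) : null;
            } else {
                return null;
            }
        }
        return current;
    }

    // Renders the resolved value; null becomes the empty string.
    static String render(Object root, String path) {
        Object value = resolve(root, path);
        return value == null ? "" : value.toString();
    }
}
```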
2025-04-09 14:51:26 -07:00
Brendan Cully
c1a71ff45c
BlobContainer: add copyBlob method (#125737)
* BlobContainer: add copyBlob method

If a container implements copyBlob, then the copy is
performed by the store, without client-side IO. If the store
does not provide a copy operation then the default implementation
throws UnsupportedOperationException.

This change provides implementations for the FS and S3 blob containers.
More will follow.
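
The "default implementation throws" design can be sketched with a Java default method. The interface below is a simplified, hypothetical one: the real `BlobContainer` has many more methods, and its `copyBlob` signature is not shown in this commit message.

```java
import java.io.IOException;

// Hypothetical, simplified interface for illustration only.
interface SimpleBlobContainer {
    byte[] readBlob(String name) throws IOException;

    void writeBlob(String name, byte[] data) throws IOException;

    // Store-side copy. Containers backed by a store with a native copy
    // operation override this; all others inherit the default, which throws.
    default void copyBlob(SimpleBlobContainer source, String sourceName, String targetName) throws IOException {
        throw new UnsupportedOperationException("this blob container does not support copying blobs");
    }
}
```

With this shape, adding support for another store is a matter of overriding `copyBlob` in that store's container, without touching callers.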

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: David Turner <david.turner@elastic.co>
2025-04-09 10:33:01 -07:00
Alexey Ivanov
ecf9adfc78
[main] System data streams are not being upgraded in the feature migration API (#126409)
This commit adds support for system data streams reindexing. The system data stream migration extends the existing system indices migration task and uses the data stream reindex API.
The system index migration task starts a reindex data stream task and tracks its status every second. Only one system index or system data stream is migrated at a time. If a data stream migration fails, the entire system index migration task will also fail.

Port of #123926
2025-04-08 20:42:58 +02:00
David Turner
aab40b1247
Introduce TestBlobContainerBuilder (#126445)
The mostly-optional parameters to `createBlobContainer` are getting
rather numerous in this test harness which makes the tests hard to read.
This commit introduces a builder to help name the provided parameters
and skip the omitted ones.
2025-04-09 01:52:16 +10:00
Joe Gallo
450516d675
Fix a RemoveProcessor test that never ran (#126464) 2025-04-08 11:21:04 -04:00
Dianna Hohensee
4b2867a0ef
Support maxConnections override in AbstractBlobContainerRetriesTestCase tests (#126435) 2025-04-08 09:55:01 -04:00
Mary Gouseti
060a9b746a
[DLM] Use default lifecycle instance instead of default constructor (#126461)
When creating an empty lifecycle we used to use the default
constructor. Using the default instance instead is not just for
efficiency; it will also allow us to separate the default data and
failures lifecycle in the future.
2025-04-08 23:37:30 +10:00
Ryan Ernst
991e80d56e
Remove unnecessary generic params from action classes (#126364)
Transport actions have associated request and response classes. However,
the base type restrictions are not necessary to duplicate when creating
a map of transport actions. Relatedly, the ActionHandler class doesn't
actually need strongly typed action type and classes since they are lost
when shoved into the node client map. This commit removes these type
restrictions and generic parameters.
2025-04-07 16:22:56 -07:00
Joe Gallo
bead858ccd
Correctly handle nulls in nested paths in the remove processor (#126417) 2025-04-07 16:54:07 -04:00
David Turner
fbbbdd7eec
Allow overriding blob container path in tests (#126391)
Some `AbstractBlobContainerRetriesTestCase#createBlobContainer`
implementations choose a path for the container randomly, but we have a
need for a test which re-creates the same container against a different
`S3Service` and `BlobStore` and must therefore specify the same path
each time. This commit exposes a parameter that lets callers specify a
container path.
2025-04-08 03:54:37 +10:00
Mary Gouseti
a525b3d924
Fix test to anticipate force merge failure (#126282)
This test had a copy paste mistake. When the cluster has only one data
node the replicas cannot be assigned so we end up with a force merge
error. In the case of the failure store this was not asserted correctly.

On the other hand, this test only checked for the existence of an error
and it was not ensuring that the current error is not the rollover error
that should have recovered. We make this test a bit more explicit.

Fixes: https://github.com/elastic/elasticsearch/issues/126252
2025-04-05 05:26:58 +11:00
Alexey Ivanov
fd7efe587e
[main] Move system indices migration to migrate plugin (#125437)
* [main] Move system indices migration to migrate plugin

It seems the best way to fix #122949 is to use the existing data stream reindex API. However, this API is located in the migrate x-pack plugin. This commit moves the system indices migration logic (REST handlers, transport actions, and task) to the migrate plugin.

Port of #123551

* [CI] Auto commit changes from spotless

* Fix compilation

* Fix tests

* Fix test

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-04-04 18:49:38 +01:00
David Turner
7239540c91
Replace region with regionSupplier in all AWS tests (#126285)
Rather than hard-coding a region name we should always auto-generate it
randomly during test execution. This commit replaces the remaining fixed
`String` arguments with a `Supplier<String>` argument to enable this.
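
A minimal sketch of the difference, assuming a hypothetical fixture class `RegionCheckingFixture` (not one of the real AWS test fixtures): with a `Supplier<String>`, the region is looked up when it is needed rather than fixed when the fixture is constructed.

```java
import java.util.function.Supplier;

// Hypothetical test fixture, for illustration only.
final class RegionCheckingFixture {
    private final Supplier<String> regionSupplier;

    RegionCheckingFixture(Supplier<String> regionSupplier) {
        this.regionSupplier = regionSupplier;
    }

    // The expected region is obtained from the supplier on every call, so
    // the test can generate it after the fixture has been constructed.
    boolean isExpectedRegion(String requestRegion) {
        return regionSupplier.get().equals(requestRegion);
    }
}
```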
2025-04-05 02:27:28 +11:00
David Turner
3e35900b07
Add missing test security policies (#126309)
Relates #126274 Closes #126301 Closes #126302 Closes #126303 Closes
#126304 Closes #126305 Closes #126306
2025-04-05 02:27:17 +11:00
David Turner
7402dfdf65
Introduce qa subprojects of :modules:repository-s3 (#126274)
Today we have some special-case test classes in `:modules:repository-s3`
within the same source root as the regular tests, with some trickery to
define separate Gradle tasks to run them with their special-case
configs. This commit simplifies the build by just moving each of these
classes into its own Gradle project.
2025-04-04 21:29:05 +11:00
David Turner
896598570c
Reinstate S3SearchableSnapshotsCredentialsReloadIT in FIPS JVMs (#126109)
These tests fail in a FIPS JVM only because they use a secret key
that is unacceptably short. This commit replaces the relevant uses of
`randomIdentifier` with `randomSecretKey` so they work whether in FIPS
mode or not.
2025-04-04 18:42:09 +11:00
David Turner
7eee6502de
Misc cleanups in S3BlobContainerRetriesTests (#126101)
- Simplify multi-object-delete request detection
- Replace `AtomicBoolean` with volatile field
- Make `ThrottlingDeleteHandler` static
2025-04-04 18:39:51 +11:00
Mikhail Berezovskiy
70654a3633
Add GCS telemetry with ThreadLocal (#125452) 2025-04-03 23:46:06 -07:00
Joe Gallo
950456d38b
Cleanup community_id processor (#126247) 2025-04-03 19:59:09 -04:00
Sam Xiao
b6c6db9861
Add multi-project support for health indicator data_stream_lifecycle (#126056) 2025-04-03 16:26:22 -04:00
Mary Gouseti
488951edf3
Data stream lifecycle does not record error in failure store rollover (#126229)
**Issue** The data stream lifecycle does not correctly register rollover
errors for the failure store.

**Observed behaviour** When data stream lifecycle encounters a rollover
error it records it unless it sees that the current write index of this
data stream doesn't match the source index of the request. However, the
write index check does not use the failure write index but the write
backing index, so the failure gets ignored.

**Desired behaviour** When data stream lifecycle encounters a rollover
error it will check the relevant write index before it determines if it
should be recorded or not.
2025-04-04 03:44:09 +11:00
David Turner
69f9914403
Migrate tests away from S3 SDK MD5DigestCalculatingInputStream (#126099)
`S3BlobContainerRetriesTests` uses `MD5DigestCalculatingInputStream`
from the AWS v1 SDK to compute a MD5 checksum, but this feature is not
available in the v2 SDK. With this commit we remove this dependency and
compute the MD5 checksums directly instead.
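
Computing an MD5 checksum directly with the JDK might look like the sketch below. `Md5Util` and `md5Hex` are hypothetical names, and the actual test code may well compute or encode the digest differently.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: MD5 via the JDK's MessageDigest, with no SDK dependency.
final class Md5Util {

    // Returns the MD5 digest of the given bytes as a lowercase hex string.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] hash = digest.digest(data);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // Mask to an unsigned value so each byte yields two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest is not available in this JVM", e);
        }
    }
}
```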
2025-04-03 14:11:00 +01:00
Mary Gouseti
95257bbf07
Make data stream options multi-project aware (#126141) 2025-04-03 14:33:40 +03:00
Mary Gouseti
25050495b9
Data stream options: convert javaRestTests to yamlRestTests. (#126037)
In this PR we introduce the data stream API in the `es-rest-api` using
the feature flag feature. This enables us to use the `yamlRestTests`
tests instead of the `javaRestTests`.
2025-04-03 01:32:54 +11:00
David Turner
15899afd26
Remove testWriteBlobWithExceptionThrownAtClosingTime (#126096)
Reverts the test added in #123505 - this is not behaviour on which we
rely any more, and it does not apply with SDKv2 anyway.
2025-04-02 09:43:04 +01:00
Nick Tindall
58c8f4abae
Upgrade to latest GCS SDK (#126087)
Upgrades google cloud SDK used by repository-gcs to com.google.cloud:google-cloud-storage-bom:2.50.0

Closes: ES-9287
2025-04-02 15:41:50 +11:00
Nick Tindall
28dd8e1bae
Make GCS HttpHandler more compliant (#126007)
- Fixed bug where 416 was being erroneously returned for zero-length blobs even with no Range header
- Fixed bug where partial upload wouldn't be completed if the last PUT included no data
- Return 206 (partial content) status when a Range header is specified
- Return an ETag on object get (BlobReadChannel uses this to ensure we fail when the blob is updated between successive chunks being fetched)
- The 416 on zero-length blobs was one of(?) the causes of #125668
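
The status-code behaviour in the list above can be sketched as a small decision function. This is an illustration based on general HTTP range semantics, not the fixture's actual code: `RangeStatus` and `statusFor` are hypothetical names, and only the start of the requested range is modelled.

```java
// Hypothetical sketch of choosing a response status for a blob read.
final class RangeStatus {

    // rangeStart is null when the request carries no Range header.
    static int statusFor(long blobLength, Long rangeStart) {
        if (rangeStart == null) {
            // No Range header: always a plain 200, even for zero-length blobs.
            return 200;
        }
        if (rangeStart >= blobLength) {
            // The requested range starts beyond the end of the blob.
            return 416;
        }
        // A satisfiable range was requested: partial content.
        return 206;
    }
}
```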
2025-04-02 13:05:23 +11:00
David Turner
0d64aab4cc
Clean up request parsing in S3HttpHandler (#126034)
The `METHOD /path/components?and=query` string representation of a
request is becoming increasingly difficult to parse, with slight
variations in parsing between the implementation in `S3HttpHandler` and
the various other implementations. This commit gets rid of the
string-concatenate-and-split behaviour in favour of a proper object that
has predicates for testing all the different kinds of request that might
be made against S3.
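
A request object with predicates, in the spirit of this change, could look roughly like the sketch below. `ParsedS3Request` and its methods are hypothetical names and do not reflect the actual class in `S3HttpHandler`. The predicates follow the S3 REST conventions for multipart uploads, and query values are not URL-decoded here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: keep method, path, and query separate and expose
// predicates, instead of matching on a concatenated "METHOD /path?query" string.
final class ParsedS3Request {
    private final String method;
    private final String path;
    private final Map<String, String> query;

    private ParsedS3Request(String method, String path, Map<String, String> query) {
        this.method = method;
        this.path = path;
        this.query = query;
    }

    static ParsedS3Request parse(String method, String rawPathAndQuery) {
        int q = rawPathAndQuery.indexOf('?');
        String path = q < 0 ? rawPathAndQuery : rawPathAndQuery.substring(0, q);
        Map<String, String> query = new HashMap<>();
        if (q >= 0 && q + 1 < rawPathAndQuery.length()) {
            for (String pair : rawPathAndQuery.substring(q + 1).split("&")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    query.put(pair, ""); // flag-style parameter such as "uploads"
                } else {
                    query.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
        return new ParsedS3Request(method, path, query);
    }

    String path() {
        return path;
    }

    boolean isInitiateMultipartUpload() {
        return method.equals("POST") && query.containsKey("uploads");
    }

    boolean isUploadPart() {
        return method.equals("PUT") && query.containsKey("uploadId") && query.containsKey("partNumber");
    }

    boolean isCompleteMultipartUpload() {
        return method.equals("POST") && query.containsKey("uploadId");
    }

    boolean isAbortMultipartUpload() {
        return method.equals("DELETE") && query.containsKey("uploadId");
    }
}
```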
2025-04-02 05:49:50 +11:00
Jack Conradson
24e4887748
Remove extraneous Painless code (#126057)
This removes some leftover remnants from using StringBuilder 
as part of String concatenation. Since we no longer support JDK 8, 
this code can be safely removed.
2025-04-01 11:41:54 -07:00