Commit graph

5877 commits

Author SHA1 Message Date
Niels Bauman
1d3cab14e8
Make ESQL join operators project-aware (#130040)
Allows both the enrich and lookup join operators to work with multiple
projects.
2025-06-28 00:17:14 +10:00
Lorenzo Dematté
d6e2b575b7
Stop RecordingApmServer message processing before returning from tests (#130007) 2025-06-27 02:37:32 +10:00
Gal Lalouche
6970bd24a0
ESQL: Aggressive release of shard contexts (#129454)
Keep better track of shard contexts using RefCounted, so they can be released more aggressively during operator processing. For example, during TopN, we can potentially release some contexts if they don't pass the limit filter.

This is done in preparation for the TopN fetch optimization, which will delay the fetching of additional columns to the data node coordinator, instead of doing it in each individual worker, thereby reducing IO. Since the node coordinator would need to maintain the shard contexts for a potentially longer duration, it is important we try to release what we can earlier.

An even more advanced optimization is to delay fetching to the main cluster coordinator, but that would be more involved, since we need to first figure out how to transport the shard contexts between nodes.

Summary of main changes:

* DocVector now maintains a RefCounted instance per shard.
* Components that can build or release DocVectors (e.g., LuceneSourceOperator, TopNOperator) can also hold RefCounted instances, so they can pass them to DocVector and also ensure contexts aren't released while they may still be used later.
* Driver's main loop iteration (runSingleLoopIteration) now closes its operators even between the processing of different operators. This is extra aggressive, and was mostly done to improve testability.
* Added a couple of tests to TopNOperator and a new integration test, EsqlTopNShardManagementIT, which uses the pausable plugin framework to check that TopNOperator releases things as early as possible.
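
The reference-counting idea behind these changes can be sketched as follows. This is a minimal illustration, not the Elasticsearch RefCounted implementation: the class `CountedShardContext` and its release callback are hypothetical names, and the sketch assumes the creator holds the first reference.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a ref-counted shard context (not the real RefCounted API). */
final class CountedShardContext {
    // The creator holds the first reference (assumption of this sketch).
    private final AtomicInteger refs = new AtomicInteger(1);
    private final Runnable onRelease;

    CountedShardContext(Runnable onRelease) {
        this.onRelease = onRelease;
    }

    /** Takes an additional reference; fails if the context was already released. */
    void incRef() {
        int current;
        do {
            current = refs.get();
            if (current <= 0) {
                throw new IllegalStateException("already released");
            }
        } while (!refs.compareAndSet(current, current + 1));
    }

    /** Drops one reference; returns true if this call released the context. */
    boolean decRef() {
        int remaining = refs.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("too many decRef calls");
        }
        if (remaining == 0) {
            onRelease.run(); // last holder is gone, free the underlying resources
            return true;
        }
        return false;
    }

    boolean hasReferences() {
        return refs.get() > 0;
    }
}
```

An operator that no longer needs a shard (for example, because none of its rows passed the limit filter) would call `decRef()` right away instead of waiting for the whole driver to finish.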
2025-06-26 09:49:40 +10:00
Keith Massey
528bd9c234
Adding mappings to data streams (#129787) 2025-06-25 15:03:28 -05:00
David Turner
138f350840
Upgrade tests to MinIO RELEASE.2025-06-13T11-33-47Z (#129920)
New MinIO release just dropped, migrating the tests to use it and
dropping the workaround for known issues in older versions.
2025-06-25 19:22:41 +10:00
Ievgen Degtiarenko
56d5009924
Add query plans to profile output (#128828) 2025-06-25 10:50:04 +02:00
Yang Wang
e1c930f8c1
Make RepositoriesService project-aware (#129821)
This PR makes RepositoriesService project aware so that the basic Put,
Get, Delete and Verify repository actions are now project scoped. 

It intentionally leaves the following aspects out of scope for the
current changes:
* Repository stats reporting
* Repository clean-up, analysis and integrity verification
* Repository usages for searchable snapshots and CCR

They will be worked on separately. One main reason for leaving them out
is that they are not needed by OBS which is currently blocked by
repository/snapshot changes. They may also have their own complexities,
e.g. stats reporting.

Resolves: ES-10478
2025-06-25 10:34:34 +10:00
Brian Rothermich
0f39ff586c
Bring over merge metrics from stateless (#128617)
Relates to an effort to combine the merge schedulers from stateless and stateful. The stateless merge scheduler has MergeMetrics that we want in both stateless and stateful. This PR copies over the merge metrics from the stateless merge scheduler into the combined merge scheduler.

Relates ES-9687
2025-06-23 19:42:01 -04:00
Mark J. Hoy
a671505c8a
Update sparse_vector field mapping to include default setting for token pruning (#129089)
* Initial checkin of refactored index_options code

* [CI] Auto commit changes from spotless

* initial unit testing

* complete unit tests; add yaml tests

* [CI] Auto commit changes from spotless

* register test feature for sparse vector

* Update docs/changelog/129089.yaml

* update changelog

* add docs

* explicit set default index_options if null

* [CI] Auto commit changes from spotless

* update yaml tests; update docs

* fix yaml tests

* readd auth for teardown

* only serialize index options if not default

* [CI] Auto commit changes from spotless

* serialization refactor; pass index version around

* [CI] Auto commit changes from spotless

* fix transport versions merge

* fix up docs

* [CI] Auto commit changes from spotless

* fix docs; add include_defaults unit and yaml test

* [CI] Auto commit changes from spotless

* override getIndexReaderManager for SemanticQueryBuilderTests

* [CI] Auto commit changes from spotless

* cleanup mapper/builder/tests; index vers. in type

still need to refactor / clean YAML tests

* [CI] Auto commit changes from spotless

* cleanups to mapper tests for clarity

* [CI] Auto commit changes from spotless

* move feature into mappers; fix yaml tests

* cleanups; add comments; remove redundant test

* [CI] Auto commit changes from spotless

* escape more periods in the YAML tests

* cleanup mapper and type tests

* [CI] Auto commit changes from spotless

* rename mapping for previous index test

* set explicit number of shards for yaml test

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
2025-06-24 08:21:32 +10:00
Mary Gouseti
d859366d4b
Add an error margin when comparing floats. (#129721)
We add a margin of error when comparing floats to the DynamicFieldMatcher to account for a small loss of accuracy when loading fields from synthetic source.
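
A comparison with an error margin could look like the sketch below. The helper name `closeEnough` and the use of an absolute margin are assumptions for illustration; the commit message does not state how DynamicFieldMatcher computes or sizes its margin.

```java
/** Hypothetical helper illustrating a float comparison with an absolute error margin. */
final class FloatMargin {
    /** True when a and b are equal or differ by at most {@code margin}. */
    static boolean closeEnough(float a, float b, float margin) {
        // Float.compare treats NaN as equal to NaN and handles equal infinities,
        // which the subtraction below would turn into NaN.
        if (Float.compare(a, b) == 0) {
            return true;
        }
        return Math.abs(a - b) <= margin;
    }
}
```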
2025-06-23 18:46:46 +03:00
Jonathan Buttner
c7a5c5923c
[ML] Removing Custom Service Feature Flag (#129780)
* Removing feature flag

* Removing missed references
2025-06-23 10:44:59 -04:00
Mikhail Berezovskiy
eeca493860
Move HTTP content aggregation from Netty into RestController (#129302) 2025-06-19 09:05:17 -07:00
Albert Zaharovits
083326e658
Threadpool merge executor does not block aborted merges (#129613)
This PR addresses a bug where aborted merges are blocked if there's
insufficient disk space.

Previously, the merge disk space estimation did not consider if the
operation has been aborted when/while it was enqueued for execution.
Consequently, aborted merges, e.g. when closing a shard, were
blocked if their disk space estimation was exceeding the available disk
space threshold. In this case, the shard close operation would itself
block.

This fix estimates a disk space budget of `0` for aborted merges, and it
periodically checks if any enqueued merge tasks have been aborted (more
generally, it checks if the budget estimate for any merge tasks has
changed, and reorders the queue if so). This way aborted merges are
prioritized and are never blocked.
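
The effect of a zero budget on queue ordering can be sketched as below. All names (`MergeTaskSketch`, `MergeQueueSketch`) are hypothetical, and the sketch re-queues a task eagerly when it is aborted, whereas the commit describes a periodic check; it is an illustration of the idea, not the executor's code.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical merge task: an aborted task needs no disk space at all. */
final class MergeTaskSketch {
    final String name;
    final long estimatedBytes;
    boolean aborted;

    MergeTaskSketch(String name, long estimatedBytes) {
        this.name = name;
        this.estimatedBytes = estimatedBytes;
    }

    long budget() {
        return aborted ? 0L : estimatedBytes;
    }
}

/** Hypothetical queue ordered by disk budget, smallest first. */
final class MergeQueueSketch {
    private final PriorityQueue<MergeTaskSketch> queue =
        new PriorityQueue<>(Comparator.comparingLong(MergeTaskSketch::budget));

    void enqueue(MergeTaskSketch task) {
        queue.add(task);
    }

    /** Marks a task aborted and re-inserts it so the queue sees its new budget of 0. */
    void abort(MergeTaskSketch task) {
        boolean wasQueued = queue.remove(task); // remove before changing the sort key
        task.aborted = true;
        if (wasQueued) {
            queue.add(task);
        }
    }

    /** Returns the next task if its budget fits in the available space, else null. */
    MergeTaskSketch pollIfFits(long availableBytes) {
        MergeTaskSketch head = queue.peek();
        if (head == null || head.budget() > availableBytes) {
            return null;
        }
        return queue.poll();
    }
}
```

Because an aborted task always has a budget of 0, it sorts to the front and passes the space check regardless of how little disk is free.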

Closes https://github.com/elastic/elasticsearch/issues/129335
2025-06-20 00:51:13 +10:00
Mary Gouseti
ee5d652411
Increase node up timeout in AbstractLocalClusterFactory (#129639)
In the last two months a lot of tests were converted to use the newer rest test framework. Some tests start 1 node, others start 3 nodes, others even more. The framework runs tests in parallel, but it doesn't know how many nodes each test needs, so running 3 tests in parallel, for example, can produce very different load depending on whether they use single-node or 3-node clusters. During this execution we saw 3x more CPU load than we would ideally want.

Currently there is no good solution for this: if we dial down the concurrency we will use the nodes inefficiently, but if we keep the concurrency where it is we risk longer start-up times. Considering that the start-up time of Elasticsearch is not related to what these tests check, we choose to increase the timeout to reduce the noise.
2025-06-19 17:37:37 +03:00
Yang Wang
0932beb1f8
Remove obsolete Metadata.FORMAT field and usages (#129519)
The only production usage is for cleaning up all global state files. It
is replaced by directly calling the relevant method without creating the
FORMAT instance. Test-only usages are either replaced by equivalent
method calls or dropped.

Relates: #114698
2025-06-19 15:38:33 +10:00
Yang Wang
92b32b535b
Passing in project-id when creating s3 client (#129301)
Enables creating different s3 clients for different projects

Relates: #127631
2025-06-19 12:34:11 +10:00
Keith Massey
92e4244f8e
Putting the ingest otel processor behind the logs stream feature flag (#129667) 2025-06-18 17:40:12 -05:00
Niels Bauman
398da36f49
Make use of new projectClient method and remove old one (#129393)
We added a new `projectClient` method on `Client` in #129174. We now
update the usages of the old method (on `ProjectResolver`) to use the
new one and we delete the old method.
2025-06-17 13:39:04 +02:00
Carson Ip
466afbab20
[apm-data] Set event.dataset if empty for logs (#129074)
For APM logs, set event.dataset to data_stream.dataset if event.dataset is empty, to satisfy Anomaly Detection's requirement to have event.dataset in every logs-* data stream.
2025-06-17 10:33:46 +01:00
Yang Wang
d4981a40b0
Rename shard heap to estimated heap (#129514)
Relates: ES-11445
2025-06-17 19:33:26 +10:00
Yang Wang
adf4d1005f
Setting for estimated shard heap allocation decider (#128722)
This PR adds a new setting to toggle the collection for shard heap
usages as well as wiring ShardHeapUsage into ClusterInfoSimulator.

Relates: #128723
2025-06-17 13:28:00 +10:00
Ankit Sethi
e437163148
Fixing some test issues (#129295)
* increase the number of retries for apt-get

* Update docs/changelog/129295.yaml

* try without sudo

* increase timeout to see if that fixes it

* mention a couple of more tests

* try inline option

* try inline option

* Delete docs/changelog/129295.yaml

* bump this too

* possibly ok test now
2025-06-16 13:06:48 -05:00
Niels Bauman
19a4ed0188
Remove test dependencies on cluster state API master waiting (#129118)
As preparation for running the cluster state API on the local node, we
need to update these tests that currently depend on that API running on
(and waiting for) the master node.

Relates #127212
2025-06-16 16:04:39 +02:00
Oleksandr Kolomiiets
b24bb3566e
Add new recovery source for reshard split target shards (#129159) 2025-06-13 12:27:18 -07:00
Niels Bauman
af26920f5b
Fix tests depending on _ilm/status API (#129416)
Since #129367 we run the `_ilm/status` API on the local node, which
could cause issues in tests that assume the API runs on the master node
(i.e. they assumed that once the assertion passed, all nodes in the
cluster would have that cluster state, which is not true).
2025-06-13 17:51:07 +02:00
Moritz Mack
9e5cac34a4
Expand bcUpgradeTask to run more test suites. (#128983)
Relates to ES-11904

#128984 contains the changes to the PR buildkite pipeline to test this change while the buildkite changes are not merged yet.
2025-06-13 12:58:49 +02:00
Niels Bauman
87c6fa7e9b
Introduce projectClient method on Client (#129174)
We originally defined the `projectClient` method on `ProjectResolver` as
a convenience method to execute API calls for specific projects. That
method requires a reference to both a `ProjectResolver` and a `Client`.

We now introduce the same method directly on the `Client` interface and
inject a `ProjectResolver` there, removing the need for a
`ProjectResolver` reference in places that just want to execute API
requests on a specific project.

To reduce the number of changes, this change solely focuses on
introducing the new method. Future changes will migrate the uses of the
original method to the new one and remove the original altogether.
2025-06-12 15:19:16 -03:00
David Turner
668fe29a00
Revert MinIO tests to RELEASE.2024-12-18T13-15-44Z (#129337)
This reverses the upgrade in #128424, except it puts the workaround in
the correct test suite and leaves several comments in place.

Relates https://github.com/minio/minio/issues/21377
Closes #129127
Closes #129157
2025-06-13 01:02:31 +10:00
Tim Vernum
f16c2ffcaa
Add a Multi-Project Search Rest Test (#128657)
This commit adds a Rest IT specifically for search in MultiProject.
Everything was already working as expected, but we were a bit light on
explicit testing for search, which as _the_ core capability of
Elasticsearch is worth testing thoroughly and clearly.
2025-06-12 15:09:54 +02:00
Nick Tindall
0702e429f0
Add heap usage estimate to ClusterInfo (#128723)
Co-authored-by: ywangd <yang.wang@elastic.co>
Co-authored-by: rjernst <ryan@elastic.co>
Relates: ES-11445
2025-06-12 13:45:57 +10:00
Lorenzo Dematté
385e0d9259
[BC Upgrade] Fix incorrect version parsing in tests (#129243)
This PR introduces several fixes to various IT tests, related to the use and misuse of the version identifier for the start cluster:

    wherever we can, we replace uses of versions in test code with features
    where we can't, we make sure we use the actual stack version (the one provided by -Dtests.bwc.main.version and not the bogus "0.0.0" version string)
    when requesting the cluster version we make sure we do use the "unresolved" version identifier (the value of the tests.old_cluster_version system property e.g. 0.0.0 ) so we resolve the right distribution

These changes enabled the tests to be used in BC upgrade tests (and potentially in serverless upgrade tests too, where they would have also failed)

Relates to ES-12010

Precedes #128614, #128823 and #128983
2025-06-11 17:22:54 +02:00
Dimitris Rempapis
0193dadae8
Enable Shard-Level Search-load rate metric (#128660)
Introduces a new search load metric to the stats infrastructure, measured and tracked on a per-shard basis. The metric represents the Exponentially Weighted Moving Rate (EWMR) of search operations, calculated using the "took" time from each completed search phase.
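
One common way to compute an exponentially weighted moving rate is sketched below. The class name, the decay parameter `lambda`, and the exact formula are assumptions for illustration; the commit message does not spell out how the metric is calculated. Each increment would be the "took" time of a completed search phase, and timestamps are assumed to be non-decreasing.

```java
/** Hypothetical sketch of an exponentially weighted moving rate (EWMR). */
final class MovingRateSketch {
    private final double lambda;   // decay constant per time unit, must be > 0
    private final long startTime;  // time at which tracking began
    private double weightedSum;    // sum of increments, each decayed by its age
    private long lastTime;         // time of the most recent update

    MovingRateSketch(double lambda, long startTime) {
        if (lambda <= 0) {
            throw new IllegalArgumentException("lambda must be positive");
        }
        this.lambda = lambda;
        this.startTime = startTime;
        this.lastTime = startTime;
    }

    /** Records an increment at the given time, decaying older contributions first. */
    void addIncrement(double increment, long time) {
        weightedSum = weightedSum * Math.exp(-lambda * (time - lastTime)) + increment;
        lastTime = time;
    }

    /** Returns the weighted rate per time unit as observed at the given time. */
    double getRate(long time) {
        double elapsed = time - startTime;
        if (elapsed <= 0) {
            return 0.0;
        }
        double decayedSum = weightedSum * Math.exp(-lambda * (time - lastTime));
        // Normalise by the total weight available since startTime, so that a
        // constant input rate r yields approximately r.
        return lambda * decayedSum / (1.0 - Math.exp(-lambda * elapsed));
    }
}
```

With this normalisation, recent search activity dominates the value while older activity fades out smoothly instead of dropping off at a window boundary.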
2025-06-11 16:19:48 +03:00
Moritz Mack
b3becfa678
Update ReproduceInfoPrinter to correctly print a reproduction line for Lucene & build candidate upgrade tests (#129044) 2025-06-10 15:42:08 +02:00
Mike Pellegrini
99d7a90e4f
Update Test Framework To Handle Query Rewrites That Rely on Non-Null Searchers (#129160) 2025-06-10 09:02:39 -04:00
Patrick Doyle
7ec8fccf94
Refactor before entitlements for testing (#129099)
* Support multiple plugin source paths

* Refactor: remove unnecessary PathLookup method.

It's only called in one place, and there's no need to override it for testing.
Removing it just makes things simpler.

* Refactor: local var for pathLookup

* Fix bugs in test build info parsing

* Fix representative_class in test

* Move BridgeUtilTests.

Tests in org.elasticsearch.entitlement.bridge are going to be uniquely hard to
test once we patch the bridge into java.base, due to Java's prohibition on
split packages.

Let's just move this guy to another package.

* Upcast (?!) Java23EntitlementChecker to EntitlementChecker

* Empty TestPathLookup

* Create PolicyManager during bootstrap, allowing us to share initialization

* Use empty component path list instead of null

* Downcast to the class of the check method.

In our unit test, we have a mock checker that doesn't extend
EntitlementChecker, so downcasting to that would require us to needlessly
rework the unit test.

* Fix javadoc typos
2025-06-09 18:56:07 +02:00
Jeremy Dahlgren
1b49eabc98
Allow missing shard stats for restarted nodes for _snapshot/_status (#128399)
Returns an empty shard stats for shard entries where stats were
unavailable in the case where a node has been restarted or left
the cluster.  The change adds a 'description' field to the
SnapshotIndexShardStatus class that is used to include a message
indicating why the stats are empty. This change was motivated by
a desire to reduce latency for getting the stats for currently
running snapshots.  The stats can still be loaded from the
repository via a _snapshot/<repository>/snapshot/_status call.

Closes ES-10982

Co-authored-by: Dianna Hohensee <artemisapple@gmail.com>
2025-06-09 12:05:41 -04:00
Albert Zaharovits
53f3ab2b01
Threadpool merge executor is aware of available disk space (#127613)
This PR introduces 3 new settings:
indices.merge.disk.check_interval, indices.merge.disk.watermark.high, and indices.merge.disk.watermark.high.max_headroom
that control if the threadpool merge executor starts executing new merges when the disk space is getting low.

The intent of this change is to avoid the situation where in-progress merges exhaust the available disk space on the node's local filesystem.
To this end, the thread pool merge executor periodically monitors the available disk space, as well as the current disk space estimates required by all in-progress (currently running) merges on the node, and will NOT schedule any new merges if the disk space is getting low (by default below the 5% limit of the total disk space, or 100 GB, whichever is smaller (same as the disk allocation flood stage level)).
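
The default threshold described above can be sketched as a small calculation. The names below are hypothetical, "100 GB" is assumed to mean 100 GiB, and the way in-progress estimates are subtracted is an assumption about how the pieces fit together rather than the executor's actual logic.

```java
/** Hypothetical sketch of the "5% or 100 GB, whichever is smaller" check. */
final class MergeDiskCheckSketch {
    static final long HUNDRED_GB = 100L * 1024 * 1024 * 1024; // assumed binary gigabytes

    /** Free space that must remain untouched: 5% of the disk, capped at 100 GB. */
    static long requiredFreeBytes(long totalBytes) {
        return Math.min(totalBytes / 20, HUNDRED_GB);
    }

    /** True if a new merge fits after reserving the headroom and in-progress merges. */
    static boolean mayStartMerge(long totalBytes, long availableBytes,
                                 long inProgressEstimateBytes, long newMergeEstimateBytes) {
        long budget = availableBytes - requiredFreeBytes(totalBytes) - inProgressEstimateBytes;
        return newMergeEstimateBytes <= budget;
    }
}
```

For a 1 TiB disk the headroom is 5% (about 51 GiB); for a 10 TiB disk the 100 GB cap applies instead.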
2025-06-08 17:26:10 +03:00
Niels Bauman
10af017f12
Make ILM ClusterStateWaitStep project-aware (#129042)
This is part of an iterative process to make ILM project-aware.
2025-06-06 18:56:44 +02:00
Benjamin Trent
155c0da00a
Vector test tools (#128934)
This adds some testing tools for verifying vector recall and latency
directly without having to spin up an entire ES node and running a rally
track.

It's pretty barebones and takes inspiration from lucene-util, but I
wanted access to our own formats and tooling to make our lives easier.

Here is an example config file. This will build the initial index, run
queries at num_candidates: 50, then again at num_candidates 100 (without
reindexing, and re-using the cached nearest neighbors).

```
[{
  "doc_vectors" : "path",
  "query_vectors" : "path",
  "num_docs" : 10000,
  "num_queries" : 10,
  "index_type" : "hnsw",
  "num_candidates" : 50,
  "k" : 10,
  "hnsw_m" : 16,
  "hnsw_ef_construction" : 200,
  "index_threads" : 4,
  "reindex" : true,
  "force_merge" : false,
  "vector_space" : "maximum_inner_product",
  "dimensions" : 768
},
{
  "doc_vectors" : "path",
  "query_vectors" : "path",
  "num_docs" : 10000,
  "num_queries" : 10,
  "index_type" : "hnsw",
  "num_candidates" : 100,
  "k" : 10,
  "hnsw_m" : 16,
  "hnsw_ef_construction" : 200,
  "vector_space" : "maximum_inner_product",
  "dimensions" : 768
}
]
```

To execute:

```
./gradlew :qa:vector:checkVec --args="/Path/to/knn_tester_config.json"
```

Calling `./gradlew :qa:vector:checkVecHelp` gives some guidance on how
to use it, additionally providing a way to run it via java directly
(useful to bypass gradlew guff).
2025-06-07 02:07:32 +10:00
Nik Everett
f2e4201730
ESQL: Check for errors while loading blocks (#129016)
Runs a sanity check after loading a block of values. Previously we were
doing a quick check if assertions were enabled. Now we do two quick
checks all the time. Better - we attach information about how a block
was loaded when there's a problem.

Relates to #128959
2025-06-06 17:38:46 +02:00
Rene Groeschke
342083100b
[Build] Add support for publishing to maven central (#128659)
This ensures we package an aggregation zip with all artifacts we want to publish to maven central as part of a release.
Running zipAggregation will produce a zip file in the build/nmcp/zip folder. The content of this zip is meant to match the maven artifacts we have currently declared as dra maven artifacts.
2025-06-06 17:35:44 +02:00
Niels Bauman
3f037751b4
Remove non-test usages of Metadata.Builder#putCustom (#128801)
This removes all non-test usages of
```
Metadata.Builder.putCustom(String type, ProjectCustom custom)
```
And replaces it with appropriate calls to the equivalent method on
`ProjectMetadata.Builder`.

In most cases this _does not_ make the code project aware, but does
reduce the number of deprecated methods in use.
2025-06-06 09:00:24 +02:00
Jordan Powers
496fb2d5a4
Skip UTF8 to UTF16 conversion during document indexing (#126492)
When parsing documents, we receive the document as UTF-8 encoded data which
we then parse and convert the fields to java-native UTF-16 encoded Strings. 
We then convert these strings back to UTF-8 for storage in lucene.

This patch skips the redundant conversion, instead passing lucene a
direct reference to the received UTF-8 bytes when possible.
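
Why the conversion is redundant can be seen in a small sketch: for well-formed UTF-8 input, decoding to a String and encoding again yields the same bytes that were received. The class and method names are hypothetical and this is not the parser code from the patch.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of the UTF-8 -> UTF-16 -> UTF-8 round trip. */
final class Utf8RoundTripSketch {
    /** The old path: decode into a UTF-16 String, then encode back to UTF-8. */
    static byte[] viaString(byte[] utf8) {
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    /** The new path: hand the received bytes on unchanged. */
    static byte[] direct(byte[] utf8) {
        return utf8;
    }
}
```

Malformed input is the exception: decoding replaces invalid sequences, so the round trip would not reproduce the original bytes in that case.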
2025-06-05 19:50:09 -07:00
Nhat Nguyen
923f029745
Fix block loader with missing ignored source (#129006)
We miss appending null when ignored_source is not available. Our 
randomized tests already cover this case, but we do not check it when
loading fields.

I labelled this non-issue for an unreleased bug.

Closes #128959
Relates #119546
2025-06-06 03:09:58 +02:00
Simon Chase
ee716f11b9
transport: edit TransportConnectionListener for close exceptions (#129015)
The TransportConnectionListener interface has previously included the
Transport.Connection being closed and unregistered in its onNodeDisconnected
callback. This is not in use, and can be removed as it is also available in the
onConnectionClosed callback. It is being replaced with a Nullable exception that
caused the close. This is being used in pending work (ES-11448) to differentiate
network issues from node restarts.

Closes ES-12007
2025-06-05 15:20:08 -07:00
Niels Bauman
560f706801
Make ILM ClusterStateActionStep project-aware (#128880)
This is part of an iterative process to make ILM project-aware.
2025-06-05 06:11:00 -04:00
Jordan Powers
de40ac45d1
Move Text class to libs/xcontent (#128780)
This PR is a precursor to #126492.

It does three things:
1. Move org.elasticsearch.common.text.Text from :server to
   org.elasticsearch.xcontent.Text in :libs:x-content.
2. Refactor the Text class to use a new EncodedBytes record instead of
   the elasticsearch BytesReference.
3. Add the XContentString interface, with the Text class implementing
   that interface.

These changes were originally implemented in #127666 and #128316,
however they were reverted in #128484 due to problems caused by the
mutable nature of java ByteBuffers. This is resolved by instead using a
new immutable EncodedBytes record.
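
The difference between the two approaches can be sketched as follows. `EncodedBytesSketch` is a hypothetical record whose field layout is an assumption, not the record added by this PR; the demo class shows the kind of hidden state a ByteBuffer carries.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical immutable view over encoded bytes: offset and length never move. */
record EncodedBytesSketch(byte[] bytes, int offset, int length) {
    String decode() {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}

final class BufferMutabilityDemo {
    /** Reading from a ByteBuffer advances its position, so nothing remains afterwards. */
    static int remainingAfterRead(ByteBuffer buffer) {
        byte[] sink = new byte[buffer.remaining()];
        buffer.get(sink); // relative get: mutates the buffer's position
        return buffer.remaining();
    }
}
```

A record like this can be read any number of times with the same result, although the contents of the wrapped array could still be modified by whoever holds a reference to it.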
2025-06-04 11:22:03 -07:00
Rene Groeschke
2856923ef0
[Gradle] Use variant aware resolution for deps on hdfs-fixture (#128860)
This reworks the dependency resolution for hdfs fixture dependencies to use Gradle's variant-aware dependency resolution instead of relying on outgoing configuration names.
2025-06-04 11:47:26 +03:00
Mayya Sharipova
080a0cdd89
Enable sort optimization on int, short and byte fields (#127968)
Before this PR, sorting on integer, short and byte field types used
SortField.Type.LONG. This made sort optimization impossible for these
field types.

This PR uses SortField.Type.INT for integer, short and byte fields. This
enables sort optimization.

There are several caveats with changing sort type that are addressed:
- Before, mixed sort on integer and long fields was automatically
  supported, as both field types used SortField.Type.LONG. Now when
  merging results from different shards, we need to convert sort to LONG
  and results to long values.
- Similar for collapsing when there is mixed INT and LONG sort types.
- Index sorting. Similarly, before, for index sorting on an integer
  field, SortField.Type.LONG was used. This sort type is stored in the
  index writer config on disk and can't be modified. Now when providing
  sortField() for index sorting, we need to account for index version:
  for older indices return sort with SortField.Type.LONG and for new
  indices return SortField.Type.INT.
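
The first caveat, merging shard results that mix int and long sort values, can be sketched as widening everything to long before comparing. The class and method names are hypothetical, and the real merge happens inside the search coordination code rather than on plain lists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: merge sort values from shards that use INT and LONG sorts. */
final class MixedSortMergeSketch {
    static List<Long> mergeAscending(List<? extends Number> shardA, List<? extends Number> shardB) {
        List<Long> merged = new ArrayList<>();
        for (Number value : shardA) {
            merged.add(value.longValue()); // widen int sort values to long
        }
        for (Number value : shardB) {
            merged.add(value.longValue());
        }
        Collections.sort(merged);
        return merged;
    }
}
```

Widening keeps the relative order of all values, so a shard that reports Integer.MAX_VALUE for a missing field still sorts before a shard that reports Long.MAX_VALUE.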

---

There is only 1 change that may be considered not backwards compatible:
Before, if an integer field was [missing a
value](https://www.elastic.co/docs/reference/elasticsearch/rest-apis/sort-search-results#_missing_values),
its sort value in a search response was Long.MAX_VALUE. With this
change, its sort value is Integer.MAX_VALUE. But I think this change is
ok, as our documentation doesn't specify what value will be returned;
it only says the document will be sorted last.

---

Also closes #127965 (as same-type validation is added for collapse
queries)
2025-06-03 07:50:11 +10:00
Ben Chaplin
13bce60be9
Fix inner hits + aggregations concurrency bug (#128036)
Fork InnerHitSubContext instances before source is fetched in 
aggregations to prevent inter-segment race conditions.

Relates to #122419
2025-06-02 16:44:53 -04:00