Commit graph

2514 commits

Author SHA1 Message Date
Benjamin Trent
f478f849e3
Allow reading vectors where dim is in the file (#130138)
This allows configuration to have a `-1` dim to read files that have the
`dim` in the file.

Additionally, allows setting numQuerys to `0` to skip the search phase
easily.
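
As an illustration of what "the `dim` is in the file" can mean, here is a minimal Python sketch. It assumes an fvecs-style layout, where every vector is prefixed by its dimension as a little-endian 32-bit integer; the commit does not state the actual file layout, and `read_vectors` and `encode_vectors` are hypothetical helpers, not part of the checkVec tool.

```python
import struct
from typing import List, Optional


def read_vectors(data: bytes, dim: int = -1, limit: Optional[int] = None) -> List[List[float]]:
    """Parse float32 vectors from raw bytes.

    Assumption: an fvecs-style layout where each vector is stored as a
    little-endian int32 dimension followed by that many float32 values.
    dim == -1 means "trust the dimension stored in the file"; any other
    value is checked against the stored one.
    """
    vectors: List[List[float]] = []
    offset = 0
    while offset < len(data) and (limit is None or len(vectors) < limit):
        (stored_dim,) = struct.unpack_from("<i", data, offset)
        offset += 4
        if dim != -1 and stored_dim != dim:
            raise ValueError(f"expected dim {dim}, file says {stored_dim}")
        values = struct.unpack_from(f"<{stored_dim}f", data, offset)
        offset += 4 * stored_dim
        vectors.append(list(values))
    return vectors


def encode_vectors(vectors: List[List[float]]) -> bytes:
    """Inverse of read_vectors, used here only to build sample input."""
    out = bytearray()
    for vec in vectors:
        out += struct.pack("<i", len(vec))
        out += struct.pack(f"<{len(vec)}f", *vec)
    return bytes(out)
```

With `dim=-1` the reader accepts whatever dimension each record declares; passing an explicit `dim` turns the stored value into a consistency check.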
2025-06-27 08:38:21 +10:00
Martijn van Groningen
eef5c14876
Include mapper extras yaml tests into mixed cluster qa module. (#130023) 2025-06-26 10:48:33 +02:00
Lorenzo Dematté
1edf77c1df
Mute testSnapshotRestore in bcUpgradeTest (#129767) 2025-06-20 19:04:09 +01:00
Carlos Delgado
6952cf2b63
Add IVF feature flag to IT tests (#129766) 2025-06-20 23:47:01 +10:00
Rene Groeschke
29db3f3464
[Build] Extract logsdb rolling-upgrade tests (#129673)
- introduce separate subproject for testing logsdb rolling-upgrade tests
- should reduce :qa:rolling-upgrade test task durations
2025-06-19 22:04:36 +02:00
Yang Wang
6858c32529
[Test] Allow allocation in mixed cluster (#129680)
The RunningSnapshotIT upgrade test adds shutdown markers to all nodes
and removes them once all nodes are upgraded. If an index gets created
in a mixed cluster, for example by ILM or deprecation messages, the
index cannot be allocated because all nodes are shutting down. Since the
cluster ready check between node upgrades expects a yellow cluster, the
unassigned index prevents the ready check from succeeding, and it
eventually times out. This PR fixes that by removing the shutdown marker
from the first upgraded node so that it can host new indices.

Resolves: #129644 Resolves: #129645 Resolves: #129646
2025-06-19 20:13:11 +10:00
Carlos Delgado
f56c6f1b0e
Fix #129104 by adding a FeatureFlag to YAML tests (#129569) 2025-06-18 16:32:10 +02:00
Benjamin Trent
2407358fe0
Adding profiling option to checkVec task (#129502)
Adds simple profiling to checkVec.

```
DO_PROFILING=true ./gradlew :qa:vector:checkVec --args=/path/to/config.json
```
2025-06-17 05:14:03 +10:00
Tommaso Teofili
629a366baa
Make dense_vector fields updatable to bbq_flat/bbq_hnsw (#128291) 2025-06-16 17:15:59 +02:00
Moritz Mack
9e5cac34a4
Expand bcUpgradeTask to run more test suites. (#128983)
Relates to ES-11904

#128984 contains the changes to the PR buildkite pipeline to test this change while the buildkite changes are not merged yet.
2025-06-13 12:58:49 +02:00
John Wagster
be703a034f
Switch IVF Writer to ES Logger (#129224)
Update to use the ES logger instead of infostream, and fix native access warnings.
2025-06-11 17:36:47 -05:00
Lorenzo Dematté
385e0d9259
[BC Upgrade] Fix incorrect version parsing in tests (#129243)
This PR introduces several fixes to various IT tests, related to the use and misuse of the version identifier for the start cluster:

- Wherever we can, we replace the use of versions in test code with features.
- Where we can't, we make sure we use the actual stack version (the one provided by -Dtests.bwc.main.version and not the bogus "0.0.0" version string).
- When requesting the cluster version, we make sure we use the "unresolved" version identifier (the value of the tests.old_cluster_version system property, e.g. 0.0.0) so that we resolve the right distribution.

These changes enabled the tests to be used in BC upgrade tests (and potentially in serverless upgrade tests too, where they would have also failed)

Relates to ES-12010

Precedes #128614, #128823 and #128983
2025-06-11 17:22:54 +02:00
John Wagster
47d4b983af
IVF Hierarchical KMeans Flush & Merge (#128675)
Added hierarchical kmeans as a clustering algorithm to better partition the space when running IVF on flush and merge.
2025-06-10 15:19:27 -05:00
Benjamin Trent
57ef140d2f
Correct index path validation (#129144)
All we care about is if reindex is true or false. We shouldn't worry
about force merge, because if reindex is true we will create the
directory; if it's false, we won't.
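
A minimal Python sketch of this rule follows. The function name, the error message, and the assumption that a missing index is an error when `reindex` is false are illustrative; they are not taken from the tool's source.

```python
from pathlib import Path


def prepare_index_path(index_path: Path, reindex: bool) -> Path:
    """Decide what to do with the index directory based only on `reindex`.

    Hypothetical helper: `force_merge` is deliberately not a parameter,
    because it has no bearing on whether the directory should exist.
    """
    if reindex:
        # Reindexing builds the index, so create the directory if needed.
        index_path.mkdir(parents=True, exist_ok=True)
    elif not index_path.is_dir():
        # Assumption: without reindexing, an existing index is required.
        raise FileNotFoundError(f"no index at {index_path}; set reindex to true")
    return index_path
```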
2025-06-09 23:52:16 +10:00
Benjamin Trent
155c0da00a
Vector test tools (#128934)
This adds some testing tools for verifying vector recall and latency
directly without having to spin up an entire ES node and running a rally
track.

It's pretty barebones and takes inspiration from lucene-util, but I
wanted access to our own formats and tooling to make our lives easier.

Here is an example config file. This will build the initial index, run
queries at num_candidates: 50, then again at num_candidates 100 (without
reindexing, and re-using the cached nearest neighbors).

```
[{
  "doc_vectors" : "path",
  "query_vectors" : "path",
  "num_docs" : 10000,
  "num_queries" : 10,
  "index_type" : "hnsw",
  "num_candidates" : 50,
  "k" : 10,
  "hnsw_m" : 16,
  "hnsw_ef_construction" : 200,
  "index_threads" : 4,
  "reindex" : true,
  "force_merge" : false,
  "vector_space" : "maximum_inner_product",
  "dimensions" : 768
},
{
  "doc_vectors" : "path",
  "query_vectors" : "path",
  "num_docs" : 10000,
  "num_queries" : 10,
  "index_type" : "hnsw",
  "num_candidates" : 100,
  "k" : 10,
  "hnsw_m" : 16,
  "hnsw_ef_construction" : 200,
  "vector_space" : "maximum_inner_product",
  "dimensions" : 768
}
]
```

To execute:

```
./gradlew :qa:vector:checkVec --args="/Path/to/knn_tester_config.json"
```

Calling `./gradlew :qa:vector:checkVecHelp` gives some guidance on how
to use it, additionally providing a way to run it via java directly
(useful to bypass gradlew guff).
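
For longer sweeps, a config of this shape could be generated rather than written by hand. The Python sketch below mirrors the example config: only the first entry carries the index-building keys (`index_threads`, `reindex`, `force_merge`), and later entries differ only in `num_candidates`. `build_sweep` is a hypothetical helper, and treating omitted keys as "do not rebuild" is an assumption based on the example above.

```python
import json
from typing import Dict, List


def build_sweep(base: Dict[str, object], num_candidates: List[int]) -> List[Dict[str, object]]:
    """Return one config entry per num_candidates value.

    Mirrors the example config: only the first entry carries the
    index-building keys, so later entries search the index it built.
    """
    build_only_keys = {"index_threads", "reindex", "force_merge"}
    entries = []
    for i, candidates in enumerate(num_candidates):
        entry = dict(base, num_candidates=candidates)
        if i > 0:
            entry = {k: v for k, v in entry.items() if k not in build_only_keys}
        entries.append(entry)
    return entries


base = {
    "doc_vectors": "path",
    "query_vectors": "path",
    "num_docs": 10000,
    "num_queries": 10,
    "index_type": "hnsw",
    "k": 10,
    "hnsw_m": 16,
    "hnsw_ef_construction": 200,
    "index_threads": 4,
    "reindex": True,
    "force_merge": False,
    "vector_space": "maximum_inner_product",
    "dimensions": 768,
}
config = build_sweep(base, [50, 100])
print(json.dumps(config, indent=2))
```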
2025-06-07 02:07:32 +10:00
Martijn van Groningen
e23a5e7661
Enable security in a number of logsdb and tsdb integration tests. (#128877)
This change enables security in a number of tsdb and logsdb integration tests: a number of Java/YAML REST tests in the logsdb module, plus the logsdb and tsdb rolling upgrade tests.

A recent bug (#128050) wouldn't have happened if logsdb rolling upgrade tests ran with security enabled.
2025-06-04 17:23:25 +03:00
Mayya Sharipova
080a0cdd89
Enable sort optimization on int, short and byte fields (#127968)
Before this PR, sorting on integer, short and byte field types used
SortField.Type.LONG. This made sort optimization impossible for these
field types.

This PR uses SortField.Type.INT for integer, short and byte fields. This
enables sort optimization.

There are several caveats with changing the sort type that are addressed:

- Before, mixed sort on integer and long fields was automatically
  supported, as both field types used SortField.Type.LONG. Now, when
  merging results from different shards, we need to convert the sort to
  LONG and the results to long values.
- The same applies to collapsing when there are mixed INT and LONG sort
  types.
- Index sorting. Similarly, index sorting on an integer field previously
  used SortField.Type.LONG. This sort type is stored in the index writer
  config on disk and can't be modified. Now, when providing sortField()
  for index sorting, we need to account for the index version: for older
  indices return a sort with SortField.Type.LONG, and for new indices
  return SortField.Type.INT.

---

There is only one change that may be considered not backwards compatible:
Before, if an integer field was [missing a
value](https://www.elastic.co/docs/reference/elasticsearch/rest-apis/sort-search-results#_missing_values),
its sort value was returned as Long.MAX_VALUE in a search response. With
this change, its sort value is returned as Integer.MAX_VALUE. But I think
this change is ok, as our documentation doesn't say what value will be
returned; it only says it will be sorted last.

---

Also closes #127965 (as the same type validation is added for collapse
queries)
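
The three rules described in this commit (version-gated index sort type, widening to LONG when shards disagree, and the changed missing-value sentinel) can be sketched in a few lines of Python. This is an illustration only: the function names are invented, and `INT_SORT_INDEX_VERSION` is a hypothetical cutoff, since the commit does not name the real index version constant.

```python
INT_MAX = 2**31 - 1    # Java's Integer.MAX_VALUE
LONG_MAX = 2**63 - 1   # Java's Long.MAX_VALUE

# Hypothetical cutoff: the real index version constant is not given in the commit.
INT_SORT_INDEX_VERSION = 100


def index_sort_type(field_type: str, index_version: int) -> str:
    """Pick the sort type for index sorting on a numeric field."""
    if field_type in ("integer", "short", "byte") and index_version >= INT_SORT_INDEX_VERSION:
        return "INT"
    # Older indices stored LONG in the index writer config, which cannot change.
    return "LONG"


def merge_sort_type(shard_sort_types):
    """When shards disagree (INT vs LONG), merge everything as LONG."""
    return "INT" if all(t == "INT" for t in shard_sort_types) else "LONG"


def missing_sort_value(sort_type: str) -> int:
    """Sort value reported for a document with no value, as described in the commit."""
    return INT_MAX if sort_type == "INT" else LONG_MAX
```

For example, `missing_sort_value("INT")` is 2147483647 where the same document previously reported 9223372036854775807; the document still sorts last either way.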
2025-06-03 07:50:11 +10:00
Moritz Mack
cdd208704c
Add initial bcUpgradeTask (#128588) 2025-06-02 11:21:51 +02:00
Rene Groeschke
38c90ca8d4
Restructure docker files for docker distributions (#127960)
Restructures docker files for docker distributions

- Put Dockerfiles in distro-specific folders, keeping the "Dockerfile" naming convention
- Allows better ide support
- Allows easier renovate integration
- Explicitly set base image in dockerfile
- simplify renovate configuration
- Cleanup DockerBase file to not contain ess fips base image information

This lives now in the Dockerfile content directly

* Workaround docker test issue

* Fix labels for fips image
2025-05-19 19:47:34 +02:00
Rene Groeschke
f0d7ec47b5
[Test] Rework detecting elasticsearch process in docker tests (#128013)
* [Test] Rework detecting elasticsearch process in docker tests

This tweaks detection of the elasticsearch process id by using jps instead of ps, which has been problematic in the past: its output exceeded the available COLUMN sizes because the es command line invocation keeps getting longer.

* Remove few muted tests

* Reuse ps for detecting processes but use pipe to find the right one

jps doesn't work well with different users

* Tweak java command running lookup to work with wolfi

* Cleanup changes

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-05-16 09:05:59 -07:00
Ryan Ernst
9537388897
Remove doPrivileged uses from server (#127781)
Now that SecurityManager is no longer used, doPrivileged is no longer
necessary. This commit removes uses of it from core and server
2025-05-07 07:24:53 -07:00
Ryan Ernst
22a52a9c64
Remove security manager policy files (#127727)
Now that security manager is gone, the policy files are no longer
needed. This commit removes the server, test and plugin specific policy
files
2025-05-06 19:37:46 +02:00
Ryan Ernst
8047250e4c
Remove SecurityManager policy classes (#127653)
Now that SecurityManager is no longer used, we can remove the
Elasticsearch policy classes and helpers.
2025-05-05 11:43:32 -07:00
Christoph Büscher
529daca66c
Increase timeout for index migration in FullClusterRestartSystemIndexCompatibilityIT (#127710)
This test occasionally fails on version 8.19 clusters when we are waiting
for system index migration to finish. This changes the wait time from the
10s default to 30s to account for the occasional super slow cluster in tests.

Closes #127245
2025-05-05 19:37:31 +02:00
Ryan Ernst
4d907ce2a2
Remove Security manager bootstrap (#127590)
Furthering cleanup of the now unused security manager, this commit removes
the bootstrap Security class that set up SecurityManager.
2025-05-01 12:42:56 -07:00
Ben Chaplin
053895854d
Always log data node failures (#127420)
Log search exceptions as they occur on the data node no matter the value 
of error_trace.
2025-04-29 09:40:31 -04:00
Dianna Hohensee
0700b24dd0
Create some general test utilities (#127407)
Moving around and adding some test utilities.
2025-04-28 14:10:28 -04:00
Niels Bauman
c72d00fd39
Don't start a new node in InternalTestCluster#getClient (#127318)
This method would default to starting a new node when the cluster was
empty. This is pretty trappy as `getClient()` (or things like
`getMaster()` that depend on `getClient()`) don't look at all like
something that would start a new node.

In any case, the intention of tests is much clearer when they explicitly
define a cluster configuration.
2025-04-25 10:07:52 +02:00
Chris Hegarty
19550a838f
Add dense vector off-heap stats to Node stats and Index stats APIs (#126704)
This change enhances the dense_vector section of the Nodes stats and Index stats APIs so that they report the desired size of off-heap memory for all indexed vectors. The dense_vector section of the Cluster stats API remains unchanged.

The retrieval mechanism and structure of the new stats are the same across the three stats APIs, but more fine-grained information is disclosed when moving from Cluster -> Node -> Index API.

For Node stats, we aggregate the total byte sizes for all vectors, categorised by the data type. For example:

"dense_vector" : {
  "value_count" : 5,
  "off_heap" : {
    "total_size_in_bytes" : 27,
    "total_veb_size_in_bytes" : 3,
    "total_vec_size_in_bytes" : 23,
    "total_veq_size_in_bytes" : 0,
    "total_vex_size_in_bytes" : 1
  }
}
Index stats: same as Node stats, with an additional per-field breakdown. For example:

"dense_vector" : {
  "value_count" : 5,
  "off_heap" : {
    "total_size_in_bytes" : 27,
    "total_veb_size_in_bytes" : 3,
    "total_vec_size_in_bytes" : 23,
    "total_veq_size_in_bytes" : 0,
    "total_vex_size_in_bytes" : 1,
    "fielddata" : {
      "bar" : {
        "veb_size_in_bytes" : 3,
        "vec_size_in_bytes" : 14,
        "vex_size_in_bytes" : 1
      },
      "foo" : {
        "vec_size_in_bytes" : 9
      }
    }
  }
}
The implementation accesses the actual statistics through reflection. This will be completely removed when Lucene exposes this, which is expected in Lucene 10.3
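
The totals in the example responses above are consistent with the per-field breakdown, which a short Python sketch can confirm. This is arithmetic on the sample values only; `aggregate_off_heap` is a hypothetical helper and says nothing about how Elasticsearch gathers the numbers.

```python
from collections import Counter
from typing import Dict


def aggregate_off_heap(fielddata: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Roll per-field sizes up into the totals shown in the example responses."""
    per_ext = Counter()
    for sizes in fielddata.values():
        per_ext.update(sizes)  # adds each field's byte counts per extension
    totals = {"total_size_in_bytes": sum(per_ext.values())}
    for ext in ("veb", "vec", "veq", "vex"):
        # Extensions with no data (veq here) report 0.
        totals[f"total_{ext}_size_in_bytes"] = per_ext[f"{ext}_size_in_bytes"]
    return totals


fielddata = {
    "bar": {"veb_size_in_bytes": 3, "vec_size_in_bytes": 14, "vex_size_in_bytes": 1},
    "foo": {"vec_size_in_bytes": 9},
}
totals = aggregate_off_heap(fielddata)
```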
2025-04-23 15:04:44 +01:00
Ryan Ernst
b5e92db171
Remove security manager from tests (#127087)
Now that entitlements are always used, there is no need to run tests
with security manager (a future enhancement will run tests with
entitlements). This commit removes setting up security manager from
tests.
2025-04-22 18:08:09 +02:00
James Baiera
7b89f4d4a6
Add ability to redirect ingestion failures on data streams to a failure store (#126973)
Removes the feature flags and guards that prevent the new failure store functionality 
from operating in production runtimes.
2025-04-18 16:33:03 -04:00
Lorenzo Dematté
69f6520b0c
[Entitlements] Validation checks on paths (#126852)
With this PR we restrict the paths we allow access to, forbidding plugins from specifying or requesting entitlements for reading or writing to specific protected directories.

I added this validation to EntitlementInitialization, as I wanted to fail fast and this is the earliest occurrence where we have all we need: PathLookup to resolve relative paths, policies (for plugins, server, agents) and the Paths for the specific directories we want to protect.

Relates to ES-10918
2025-04-18 15:36:07 +02:00
David Turner
1461820dac
Fix race condition in RestCancellableNodeClient (#126686)
Today we rely on registering the channel after registering the task to
be cancelled to ensure that the task is cancelled even if the channel is
closed concurrently. However the client may already have processed a
cancellable request on the channel and therefore this mechanism doesn't
work. With this change we make sure not to register another task after
draining the registrations in order to cancel them.
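
The idea of refusing new registrations once the existing ones have been drained can be sketched as follows. This is a simplified Python illustration under assumed names (`ChannelTasks`, `register`, `on_channel_closed`), not the actual RestCancellableNodeClient code.

```python
import threading
from typing import Callable, List


class ChannelTasks:
    """Tracks cancellable tasks for one client channel (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._cancel_callbacks: List[Callable[[], None]] = []
        self._drained = False

    def register(self, cancel: Callable[[], None]) -> bool:
        """Register a task; if the channel already closed, cancel it right away."""
        with self._lock:
            if not self._drained:
                self._cancel_callbacks.append(cancel)
                return True
        # Registrations were already drained: never add another one.
        cancel()
        return False

    def on_channel_closed(self) -> None:
        """Drain all registrations exactly once and cancel them."""
        with self._lock:
            self._drained = True
            to_cancel, self._cancel_callbacks = self._cancel_callbacks, []
        for cancel in to_cancel:
            cancel()
```

Because the drained flag and the list are updated under the same lock, a task that arrives after the close is cancelled immediately instead of being stored where nothing would ever cancel it.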

Closes #88201
2025-04-12 00:59:46 +10:00
Jordan Powers
4c174a891f
Use Lucene101 postings format by default (#126080)
Update the PerFieldFormatSupplier so that new standard indices use the
Lucene101PostingsFormat instead of the current default ES812PostingsFormat.

Currently, use of the new codec is gated behind a feature flag.
2025-04-04 12:41:27 -07:00
Alexey Ivanov
fd7efe587e
[main] Move system indices migration to migrate plugin (#125437)
* [main] Move system indices migration to migrate plugin

It seems the best way to fix #122949 is to use existing data stream reindex API. However, this API is located in the migrate x-pack plugin. This commit moves the system indices migration logic (REST handlers, transport actions, and task) to the migrate plugin.

Port of #123551

* [CI] Auto commit changes from spotless

* Fix compilation

* Fix tests

* Fix test

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-04-04 18:49:38 +01:00
Ben Chaplin
9f6eb1d4e3
Log stack traces on data nodes before they are cleared for transport (#125732)
We recently cleared stack traces on data nodes before transport back to the coordinating node when error_trace=false to reduce unnecessary data transfer and memory on the coordinating node (#118266). However, all logging of exceptions happens on the coordinating node, so stack traces disappeared from any logs. This change logs stack traces directly on the data node when error_trace=false.
2025-04-03 13:45:09 -04:00
Niels Bauman
483f97915c
Run TransportGetIndexAction on local node (#125652)
This action solely needs the cluster state, it can run on any node.
Since this is the last class/action that extends the `ClusterInfo`
abstract classes, we remove those classes too as they're not required
anymore.

Relates #101805
2025-04-02 18:41:35 +01:00
Niels Bauman
eb4d64f94a
Run TransportGetSettingsAction on local node (#126051)
This action solely needs the cluster state, it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.

Relates #101805
2025-04-02 15:05:31 +01:00
Lorenzo Dematté
40dd91b800
[Entitlements] Replace Permissions with Entitlements in InstallPluginAction (#125207)
This PR replaces the parsing and formatting of SecurityManager policies with the parsing and formatting of Entitlements policy during plugin installation.

Relates to ES-10923
2025-04-02 11:03:27 +01:00
Armin Braun
fd2cc97541
Introduce batched query execution and data-node side reduce (#121885)
This change moves the query phase to a single roundtrip per node, just like can_match or field_caps work already.
As a result of executing multiple shard queries from a single request, we can also partially reduce each node's query results on the data node side before responding to the coordinating node.

As a result this change significantly reduces the impact of network latencies on the end-to-end query performance, reduces the amount of work done (memory and cpu) on the coordinating node and the network traffic by factors of up to the number of shards per data node!

Benchmarking shows up to orders of magnitude improvements in heap and network traffic dimensions in querying across a larger number of shards.
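
A toy Python model of the two ideas, one request per node and a partial top-k reduce before responding, is sketched below. All names are hypothetical and hits are reduced to `(score, doc id)` pairs; the real query phase carries far more state.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc id)


def group_by_node(shard_to_node: Dict[str, str]) -> Dict[str, List[str]]:
    """One request per node instead of one per shard."""
    by_node: Dict[str, List[str]] = defaultdict(list)
    for shard, node in shard_to_node.items():
        by_node[node].append(shard)
    return dict(by_node)


def reduce_top_k(hit_lists: List[List[Hit]], k: int) -> List[Hit]:
    """Keep the k best hits; used on data nodes and again on the coordinator."""
    return heapq.nlargest(k, (hit for hits in hit_lists for hit in hits))


def search(shard_to_node: Dict[str, str], shard_hits: Dict[str, List[Hit]], k: int) -> List[Hit]:
    node_results = []
    for node, shards in group_by_node(shard_to_node).items():
        # Data-node side partial reduce: at most k hits leave each node.
        node_results.append(reduce_top_k([shard_hits[s] for s in shards], k))
    # Coordinator merges one small list per node rather than one per shard.
    return reduce_top_k(node_results, k)
```

The final result is unchanged because every hit in the global top k is also in the top k of its own node; only the number of responses and the amount of data sent to the coordinator shrink.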
2025-03-29 16:53:18 +01:00
Carlos Delgado
968bddc462
Non existing synonyms sets do not fail shard recovery (#125659) 2025-03-27 18:04:20 +02:00
Yang Wang
0c8daaeca5
Make SnapshotsInProgress project compatible (#125470)
This PR adds project-id to both SnapshotsInProgress and Snapshot so that
they are aware of projects and ready to handle snapshots from multiple
projects.

Relates: ES-10224
2025-03-27 10:54:53 +11:00
Mark Vieira
0388a5980c
Migrate legacy QA projects to new test clusters framework (#125545) 2025-03-26 10:05:56 -07:00
Keith Massey
13110d778b
Using a consistent index template name to avoid undefined behavior (#125624) 2025-03-26 01:00:27 +02:00
Mark Vieira
65751062f7
Re-enable VerifyVersionConstantsIT (#125605) 2025-03-25 12:16:53 -07:00
Armin Braun
50437e79d3
Cleanup missing use of StandardCharsets (#125424)
Random annoyance that I figured I'd just fix globally:
We can do a bit of a cleaner job when doing byte <-> string conversion here and there.
2025-03-21 20:10:15 +01:00
Tanguy Leroux
8a56518034
[CI] Reenable N-2 BWC tests for non-snapshot builds (#125296)
We can reenable those tests for `release-test`, now that the
code exists in the 8.18, 8.x and 9.0 branches.

Closes #119550
2025-03-21 10:18:10 +01:00
Rene Groeschke
38f243944e
Fix failing docker packaging tests due to too long commandline (#125053)
* Increase COLUMN for detecting running elasticsearch instance
* Unmute DockerTest test
2025-03-20 08:27:40 +01:00
Martijn van Groningen
ae16016290
Update disable assertion jvm args from bwc/mixed cluster setups. (#125074)
Remove `-da:org.elasticsearch.index.mapper.DocumentMapper` and `-da:org.elasticsearch.index.mapper.MapperService` from mixed cluster/bwc cluster setups, given that #122606 is now backported to the 8.18 branch.
2025-03-19 08:10:50 +01:00
Rene Groeschke
ae569def9c
[Build] Require reason for usesDefaultDistribution (#124707)
This makes using usesDefaultDistribution in our test setup more explicit by requiring a reason why it's needed.
This is helpful as part of revisiting the need for all those usages in our code base.
2025-03-17 08:25:39 +01:00