Commit graph

16923 commits

Author SHA1 Message Date
Mike Pellegrini
52495aa5fc
Fix incorrect accounting of semantic text indexing memory pressure (#130221) 2025-06-27 14:29:54 -04:00
Tim Brooks
ea2e7b4382
Reapply "Dispatch ingest work to coordination thread pool (#130152)
This reverts commit 73b0a60.

Additionally, it adds thread pool documentation.
2025-06-27 11:34:28 -06:00
Keith Massey
21bb836ce5
Adding actions to get and update data stream mappings (#130042) 2025-06-27 10:29:06 -05:00
Ignacio Vera
ce74df5c0c
Fix iterating for best centroid when algorithm is neighbour aware and decrease SAMPLES_PER_CLUSTER_DEFAULT (#130069)
* KMeansIntermediate shares assignments
2025-06-27 13:28:12 +02:00
Sam Xiao
3200abc4ce
Make EnterpriseGeoIpDownloaderLicenseListener project aware (#129992) 2025-06-27 16:12:09 +08:00
Jim Ferenczi
93e4e01277
Fix ES818BinaryQuantizedVectorsReader to not use directIO during merge (#130114)
This commit fixes the BBQ reader to **not** use directIO when merging the original float vectors.
2025-06-27 09:03:16 +01:00
Simon Cooper
ff65fd1133
Turn direct IO for BBQ rescoring off by default (#130014)
Add a changelog entry for direct io option
2025-06-27 08:31:20 +01:00
Nick Tindall
77b459c454
Improve accuracy of write load forecast when shard numbers change (#129990) 2025-06-27 13:04:50 +10:00
Joe Gallo
5931f08129
Tidy up project metadata fetching (#130130) 2025-06-26 19:00:22 -04:00
Jordan Powers
40a7d02269
Pull match_only_text fixes into main (#130049)
This brings in the fixes from #130020, with minor fixes to address review
nits from that PR.

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
2025-06-27 04:31:33 +10:00
Tim Brooks
1d3bd46c6a
Allow larger write queues for large nodes (#130061)
With the rise of larger CPU count nodes our current write queue size
might be too conservative. Indexing pressure will still provide protection
against out-of-memory errors.
2025-06-26 12:18:38 -06:00
Albert Zaharovits
0a77bdfbb1
Fix ThreadPoolMergeSchedulerTests testSchedulerCloseWaitsForRunningMerge (#130078)
This fixes a race condition in the test scenario, between the merge
scheduler closing and the merge task being scheduled to run. The test
scenario expects that the merge task runs when the scheduler is closed.
If the merge scheduler is closed before the merge task is scheduled, the
merge task will instead be scheduled as aborted.

Fixes: https://github.com/elastic/elasticsearch/issues/125236
2025-06-27 02:05:38 +10:00
elasticsearchmachine
f70ff89456 Bump to version 9.2.0 2025-06-26 15:32:23 +00:00
Keith Massey
49506362cf
Fixing ComposableIndexTemplateTests (#130052) 2025-06-26 09:41:07 -05:00
Nik Everett
59133d5e00
ESQL: Prepare for backport of documents_found (#130039)
Prepares the `main` branch for the backport of #125631. Specifically,
this adds the version constant for 8.19 to main and the serialization
code that lets main talk to 8.19.
2025-06-26 07:51:17 -04:00
Albert Zaharovits
a6004a6067
Fix ThreadPoolMergeExecutorServiceDiskSpaceTests testAvailableDiskSpaceMonitorWhenFileSystemStatErrors (#130025)
The test method needs to distinguish between the available disk space update values, when they are coming from either FS.
So the update values from the 2 FSs mustn't be equal.

Fixes #129149
2025-06-26 14:05:40 +03:00
Tanguy Leroux
65788cb6fe
Make Engine.awaitPendingClose protected (#130004)
So that the method can be used by classes implementing the Engine
abstract class.
2025-06-26 20:53:21 +10:00
Nick Tindall
4bbdfac252
Add WriteLoadConstraintSettings (#130056)
Relates: ES-11989
2025-06-26 05:14:23 +01:00
Parker Timmins
9aaba25d58
Simple version of patterned_text with a single doc value for arguments (#129292)
Initial version of patterned_text mapper. Behaves similarly to match_only_text. This version uses a single SortedSetDocValues for a template and another for arguments. It splits the message by delimiters, then classifies a token as an argument if it contains a digit. All arguments are concatenated and inserted as a single doc value. A single inverted index is used, without positions. Phrase queries are still possible, using the SourceConfirmedTextQuery, but are not fast.
2025-06-25 21:31:32 -05:00
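
The splitting rule described in commit 9aaba25d58 (split by delimiters, treat any token containing a digit as an argument, concatenate the arguments) can be illustrated with a minimal Python sketch. This is not the mapper's actual code: the delimiter set, the `<arg>` placeholder, the space used to join arguments, and the function name are all assumptions made for illustration.

```python
import re

# Hypothetical delimiter set; the commit message does not list the real one.
DELIMITERS = re.compile(r"([\s,;:=\[\]()]+)")
# Hypothetical marker written into the template where an argument was removed.
PLACEHOLDER = "<arg>"


def split_template_and_args(message: str) -> tuple[str, str]:
    """Split a message into a template and one concatenated argument string."""
    template_parts = []
    args = []
    # re.split with a capturing group keeps the delimiters in the result,
    # so the template can be rebuilt with its original spacing.
    for piece in DELIMITERS.split(message):
        if any(ch.isdigit() for ch in piece):
            # A token containing a digit is classified as an argument.
            args.append(piece)
            template_parts.append(PLACEHOLDER)
        else:
            template_parts.append(piece)
    # All arguments are concatenated into a single value (joined by a space here).
    return "".join(template_parts), " ".join(args)
```

For example, `split_template_and_args("user 42 logged in from 10.0.0.1")` yields the template `user <arg> logged in from <arg>` and the argument string `42 10.0.0.1`, so messages that differ only in their numbers share one template value.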
Gal Lalouche
6970bd24a0
ESQL: Aggressive release of shard contexts (#129454)
Keep better track of shard contexts using RefCounted, so they can be released more aggressively during operator processing. For example, during TopN, we can potentially release some contexts if they don't pass the limit filter.

This is done in preparation for the TopN fetch optimization, which will delay the fetching of additional columns to the data node coordinator, instead of doing it in each individual worker, thereby reducing IO. Since the node coordinator would need to maintain the shard contexts for a potentially longer duration, it is important we try to release what we can earlier.

An even more advanced optimization is to delay fetching to the main cluster coordinator, but that would be more involved, since we need to first figure out how to transport the shard contexts between nodes.

Summary of main changes:

* DocVector now maintains a RefCounted instance per shard.
* Things which can build or release DocVectors (e.g., LuceneSourceOperator, TopNOperator), can also hold RefCounted instances, so they can pass them to DocVector and also ensure contexts aren't released if they can still be potentially used later.
* Driver's main loop iteration (runSingleLoopIteration), now closes its operators even between different operator processing. This is extra aggressive, and was mostly done to improve testability.
* Added a couple of tests to TopNOperator and a new integration test EsqlTopNShardManagementIT, which uses the pausable plugin framework to check that TopNOperator releases things as early as possible.
2025-06-26 09:49:40 +10:00
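
The reference-counting idea behind commit 6970bd24a0 can be sketched in a few lines of Python. This is only an illustration of the general pattern and not Elasticsearch's `RefCounted` interface: the class below, its method names, and the release callback are hypothetical.

```python
class RefCounted:
    """Hypothetical reference counter that releases a resource when unused."""

    def __init__(self, on_release):
        # The creator holds the first reference.
        self._refs = 1
        self._on_release = on_release

    def inc_ref(self) -> None:
        """Take an extra reference, e.g. when a vector keeps the context alive."""
        if self._refs <= 0:
            raise RuntimeError("resource already released")
        self._refs += 1

    def dec_ref(self) -> bool:
        """Drop one reference; return True if this call released the resource."""
        if self._refs <= 0:
            raise RuntimeError("resource already released")
        self._refs -= 1
        if self._refs == 0:
            self._on_release()
            return True
        return False
```

In this sketch an operator would create the counter for a shard context, anything that still needs the context would call `inc_ref()`, and the context is released as soon as the last holder calls `dec_ref()`, rather than at the end of the whole query.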
Keith Massey
528bd9c234
Adding mappings to data streams (#129787) 2025-06-25 15:03:28 -05:00
Albert Zaharovits
72b3343ed3
Fix ThreadPoolMergeExecutorServiceDiskSpaceTests testUnavailableBudgetBlocksNewMergeTasksFromStartingExecution (#130001)
The test might produce over-budget tasks that cannot run even if all the
other tasks that were blocked (and hold up budget) while running
complete. Rather than prevent submitting such over-budget tasks, this
fix simply sets the merge task's queue available budget to
`Long.MAX_VALUE`, in order to ensure that all merge tasks run before the
test ends.

Fixes https://github.com/elastic/elasticsearch/issues/129148
2025-06-26 04:46:43 +10:00
Albert Zaharovits
98a6354ad4
Fix ThreadPoolMergeExecutorServiceDiskSpaceTests testAbortingOrRunningMergeTaskHoldsUpBudget (#129979)
When there are multiple FS with the same available disk space but different total sizes, it's unpredictable (and irrelevant) which one the checker uses.

Fixes #129823
2025-06-25 12:06:10 +03:00
Panagiotis Bailis
f095b3c592
Fix for DenseVectorFieldMapperTests to properly initialize random vector given the dimensions in mappings (#129912) 2025-06-25 19:05:11 +10:00
Ievgen Degtiarenko
56d5009924
Add query plans to profile output (#128828) 2025-06-25 10:50:04 +02:00
Tim Vernum
8b62a55f2f
Watch SSL files instead of directories (#129738)
With the introduction of entitlements (#120243) and exclusive file
access (#123087) it is no longer safe to watch a whole directory.

In a lot of deployments, the parent directory for SSL config files
will be the main config directory, which also contains exclusive files
such as SAML realm metadata or File realm users. Watching that
directory will cause entitlement warnings because it is not
permissible for core/ssl-config to read files that are exclusively
owned by the security module (or other modules)
2025-06-25 18:24:57 +10:00
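
The difference between watching individual files and watching a whole directory, as motivated in commit 8b62a55f2f, can be sketched with a simple polling approach in Python. This is an assumption-laden illustration and not how Elasticsearch's resource watcher is implemented: the function names are hypothetical, and polling `os.stat` stands in for whatever mechanism the real code uses.

```python
import os


def snapshot(paths):
    """Return {path: (mtime_ns, size)} for the listed files only.

    Only the given paths are inspected; sibling files in the same
    directory are never read or stat-ed. Missing files map to None.
    """
    state = {}
    for path in paths:
        try:
            st = os.stat(path)
            state[path] = (st.st_mtime_ns, st.st_size)
        except FileNotFoundError:
            state[path] = None
    return state


def changed_files(before, after):
    """List the watched paths whose recorded state differs between snapshots."""
    return sorted(p for p in after if before.get(p) != after[p])
```

A caller would take a snapshot of just the configured SSL files, take another one later, and reload only when `changed_files` returns something. Changes to unrelated files in the same directory never show up, because they are not in the watched list.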
Yang Wang
e1c930f8c1
Make RepositoriesService project-aware (#129821)
This PR makes RepositoriesService project aware so that the basic Put,
Get, Delete and Verify repository actions are now project scoped. 

It intentionally leaves the following aspects out of scope for the
current changes:

* Repository stats reporting
* Repository clean-up, analysis and integrity verification
* Repository usages for searchable snapshots and CCR

They will be worked on separately. One main reason for leaving them out
is that they are not needed by OBS which is currently blocked by
repository/snapshot changes. They may also have their own complexities,
e.g. stats reporting.

Resolves: ES-10478
2025-06-25 10:34:34 +10:00
David Kyle
3a1551e0ef
[ML] Move to the Cohere V2 API for new inference endpoints (#129884) 2025-06-25 07:51:05 +10:00
Brendan Cully
73b0a60a77
Revert "Dispatch ingest work to coordination thread pool (#129820)" (#129949)
This reverts commit 53dae7a3a2.
2025-06-24 14:38:50 -07:00
HYUNSANG HAN (한현상, Travis)
d16271b78d
Add RemoveBlock API to allow DELETE /{index}/_block/{block} (#129128)
Introduces a new `RemoveBlock` API that complements the existing `AddBlock` API by allowing users to remove index blocks using `DELETE /{index}/_block/{block}`.

Resolves #128966

---------

Co-authored-by: Niels Bauman <nielsbauman@gmail.com>
2025-06-25 06:16:14 +10:00
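
As a rough illustration of the route named in commit d16271b78d, the following Python sketch builds (but does not send) a `DELETE /{index}/_block/{block}` request using only the standard library. The helper name and base URL are hypothetical, and the set of block names is the one accepted by the existing add-block API; that the remove API accepts the same names is an assumption here.

```python
from urllib.parse import quote
from urllib.request import Request

# Block names accepted by the existing add-block API; assumed to apply to removal too.
KNOWN_BLOCKS = {"metadata", "read", "read_only", "write"}


def remove_block_request(base_url: str, index: str, block: str) -> Request:
    """Build a DELETE /{index}/_block/{block} request without sending it."""
    if block not in KNOWN_BLOCKS:
        raise ValueError(f"unknown block: {block!r}")
    # Percent-encode path segments so unusual index names stay inside the path.
    url = f"{base_url.rstrip('/')}/{quote(index, safe='')}/_block/{quote(block, safe='')}"
    return Request(url, method="DELETE")
```

For instance, `remove_block_request("http://localhost:9200", "my-index", "write")` produces a request whose method is `DELETE` and whose URL is `http://localhost:9200/my-index/_block/write`. Passing it to `urllib.request.urlopen` would send it, with authentication and TLS settings left to the caller.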
Tim Grein
3b51dd568c
[EIS] Dense Text Embedding task type integration (#129847) 2025-06-24 21:38:16 +02:00
elasticsearchmachine
ba50e26252 Bump versions after 8.18.3 release 2025-06-24 18:30:29 +00:00
elasticsearchmachine
7c13a1553e Bump versions after 9.0.3 release 2025-06-24 18:12:45 +00:00
elasticsearchmachine
8f1d593119 Bump versions after 8.17.8 release 2025-06-24 17:58:58 +00:00
David Turner
ba103f1c24
Reverse disordered-version warning message (#129904)
The comment in `TransportHandshaker` indicates (correctly) that we emit
a warning when talking to a chronologically-newer-yet-numerically-older
version, but the wording of the warning message is inverted and says
that the remote is chronologically-older-yet-numerically-newer. This
commit straightens out the message to match the situation it is
describing.

Relates #123397
2025-06-24 18:30:11 +01:00
Alexey Ivanov
876c456ac1
Move per-project settings out of ProjectMetadata (#129068)
To better support project restoration after deletion, this change moves project Settings from ProjectMetadata to the new custom in the ClusterState. It also introduces a new transport version for cluster state serialization. Reserved cluster state for project settings remains within ProjectMetadata.

Note: In mixed-version multiproject clusters, this may cause existing settings for projects to temporarily disappear until all nodes have been upgraded and restarted.
2025-06-24 18:06:45 +01:00
Panagiotis Bailis
07f65e978a
Fixing race condition in DynamicMappingIT when checking for updates in mappings (#129931) 2025-06-25 02:31:09 +10:00
Keith Massey
0b58a53a98
Adding the ability to unset data stream settings (#129677) 2025-06-24 10:30:15 -05:00
Panagiotis Bailis
b855266bd1
Make bbq_hnsw the default index option for dense-vector fields with more than 384 dimensions (#129825) 2025-06-24 12:20:16 +03:00
Niels Bauman
5ccb772468
Remove unused BulkProcessor (#129875)
The `BulkProcessor` and `BulkRequestHandler` classes were unused and
could thus be removed along with their test classes.
2025-06-24 10:02:02 +10:00
Brian Rothermich
0f39ff586c
Bring over merge metrics from stateless (#128617)
Relates to an effort to combine the merge schedulers from stateless and stateful. The stateless merge scheduler has MergeMetrics that we want in both stateless and stateful. This PR copies over the merge metrics from the stateless merge scheduler into the combined merge scheduler.

Relates ES-9687
2025-06-23 19:42:01 -04:00
Mark J. Hoy
a671505c8a
Update sparse_vector field mapping to include default setting for token pruning (#129089)
* Initial checkin of refactored index_options code

* [CI] Auto commit changes from spotless

* initial unit testing

* complete unit tests; add yaml tests

* [CI] Auto commit changes from spotless

* register test feature for sparse vector

* Update docs/changelog/129089.yaml

* update changelog

* add docs

* explicit set default index_options if null

* [CI] Auto commit changes from spotless

* update yaml tests; update docs

* fix yaml tests

* readd auth for teardown

* only serialize index options if not default

* [CI] Auto commit changes from spotless

* serialization refactor; pass index version around

* [CI] Auto commit changes from spotless

* fix transport versions merge

* fix up docs

* [CI] Auto commit changes from spotless

* fix docs; add include_defaults unit and yaml test

* [CI] Auto commit changes from spotless

* override getIndexReaderManager for SemanticQueryBuilderTests

* [CI] Auto commit changes from spotless

* cleanup mapper/builder/tests; index vers. in type

still need to refactor / clean YAML tests

* [CI] Auto commit changes from spotless

* cleanups to mapper tests for clarity

* [CI] Auto commit changes from spotless

* move feature into mappers; fix yaml tests

* cleanups; add comments; remove redundant test

* [CI] Auto commit changes from spotless

* escape more periods in the YAML tests

* cleanup mapper and type tests

* [CI] Auto commit changes from spotless

* rename mapping for previous index test

* set explicit number of shards for yaml test

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
2025-06-24 08:21:32 +10:00
Pat Whelan
aeb37189af
[ML] SageMaker Elastic Payload (#129413)
Send the Elastic API Payload to a SageMaker endpoint, and parse the
response as if it were an Elastic API response.

- SageMaker now supports all task types in the Elastic API format.
- Streaming is supported using the SageMaker client/server rpc,
  rather than SSE. Payloads must be in a complete and valid JSON
  structure.
- Task Settings can be used for additional passthrough settings, but
  they will not be saved alongside the model. Elastic cannot make
  guarantees on the structure or contents of this payload, so Elastic
  will treat it like the other input payloads and only allow them during
  inference.
2025-06-24 06:43:24 +10:00
Julian Kiryakov
caae426cf7
Pushdown for LIKE (LIST) (#129557)
Improved performance of LIKE (LIST) by pushing an Automaton down to Lucene to do the evaluation.
2025-06-23 14:35:09 -04:00
Ignacio Vera
ffea6ca2bf
Introduce an int4 off-heap vector scorer (#129824)
* Introduce an int4 off-heap vector scorer

* iter

* Update server/src/main/java/org/elasticsearch/index/codec/vectors/DefaultIVFVectorsReader.java

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>

---------

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2025-06-23 18:44:12 +02:00
Tim Brooks
53dae7a3a2
Dispatch ingest work to coordination thread pool (#129820)
The vast majority of ingest pipelines are light CPU
operations. We don't want these to be put behind IO work on the write
executor. Instead, execute these on the coordination pool.
2025-06-23 09:31:36 -06:00
Panagiotis Bailis
7d4bbcc4bb
Fix for RescoreKnnVectorQueryIT to ensure that BBQ_IVF format is enabled (#129830) 2025-06-23 17:57:31 +03:00
Keith Massey
2f3b2b39c5
Using the STREAMS_LOGS_SUPPORT_8_19 transport version (#129796)
* Using the STREAMS_LOGS_SUPPORT_8_19 transport version

* Update StreamsMetadata.java

Returning null from getMinimalSupportedVersion

* Return minimal supported version as 8.19 for metadata object to fix test fail

---------

Co-authored-by: Luke Whiting <luke.whiting@elastic.co>
2025-06-24 00:20:20 +10:00
Jan Kuipers
a3dac7434b
TransportVersion for backporting ES|QL sample (#129831) 2025-06-23 15:28:14 +02:00
Ignacio Vera
72b488cfa9
[IVF] Improve the format of the tmp file written during merging (#129828)
This commit separates vectors and docIds in the tmp file.
2025-06-23 14:44:00 +02:00