Commit graph

177 commits

Author SHA1 Message Date
Chris Hegarty
1255a64832
Upgrade to Lucene 10.2.2 (#129546)
This commit upgrades to Lucene 10.2.2.

With the release of 10.2.2, we no longer need to work around the Lucene bug mentioned in #128671.
2025-06-22 13:37:22 +01:00
Jonathan Buttner
d9b34d43a5
[ML] Custom service add support for input_type, top_n, and return_documents (#129441)
* Making progress on different request parameters

* Working tests

* Adding custom service validator for rerank

* Fixing embedding bug

* Adding transport version check

* Fixing tests

* Fixing license header

* Fixing writeTo

* Moving file and removing commented code

* Fixing test

* Fixing tests

* Refactoring and tests

* Fixing test
2025-06-20 12:23:48 -04:00
Jeremy Dahlgren
d43198ea3e
Add 'state' query param to GET snapshots API (#128635)
This change introduces a new optional 'state' query parameter for the Get Snapshots API,
allowing users to filter snapshots by state.  The parameter accepts comma-separated
values for states: SUCCESS, IN_PROGRESS, FAILED, PARTIAL, INCOMPATIBLE (case-insensitive).

A new 'snapshots.get.state_parameter' NodeFeature has been added with this change.
The new state query parameter will only be supported in clusters where all nodes support
this feature.
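
As a rough illustration of the parsing rule described above (comma-separated, case-insensitive), here is a minimal Python sketch. It is not the Elasticsearch implementation: the names `SnapshotState`, `parse_states`, and `filter_snapshots` are hypothetical, and trimming whitespace and rejecting unknown or empty values are assumptions.

```python
from enum import Enum


class SnapshotState(Enum):
    # The five states named in the commit message.
    SUCCESS = "SUCCESS"
    IN_PROGRESS = "IN_PROGRESS"
    FAILED = "FAILED"
    PARTIAL = "PARTIAL"
    INCOMPATIBLE = "INCOMPATIBLE"


def parse_states(param):
    """Parse a comma-separated, case-insensitive list of state names."""
    states = set()
    for token in param.split(","):
        name = token.strip().upper()  # assumption: surrounding whitespace is ignored
        try:
            states.add(SnapshotState[name])
        except KeyError:
            # assumption: unknown or empty values are rejected
            raise ValueError(f"unknown snapshot state: {token!r}") from None
    return states


def filter_snapshots(snapshots, param):
    """Keep only snapshots whose 'state' is one of the requested states."""
    wanted = parse_states(param)
    return [s for s in snapshots if SnapshotState[s["state"]] in wanted]
```

For example, `filter_snapshots(snaps, "success,Partial")` would keep only the entries whose state is SUCCESS or PARTIAL.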

---------

Co-authored-by: Elena Stoeva <elenastoeva99@gmail.com>
2025-06-16 17:07:39 -04:00
Benjamin Trent
155c0da00a
Vector test tools (#128934)
This adds some testing tools for verifying vector recall and latency
directly, without having to spin up an entire ES node and run a Rally
track.

It's pretty bare-bones and takes inspiration from lucene-util, but I
wanted access to our own formats and tooling to make our lives easier.

Here is an example config file. This will build the initial index, run
queries at num_candidates: 50, then again at num_candidates: 100 (without
reindexing, and re-using the cached nearest neighbors).

```
[{
  "doc_vectors" : "path",
  "query_vectors" : "path",
  "num_docs" : 10000,
  "num_queries" : 10,
  "index_type" : "hnsw",
  "num_candidates" : 50,
  "k" : 10,
  "hnsw_m" : 16,
  "hnsw_ef_construction" : 200,
  "index_threads" : 4,
  "reindex" : true,
  "force_merge" : false,
  "vector_space" : "maximum_inner_product",
  "dimensions" : 768
},
{
  "doc_vectors" : "path",
  "query_vectors" : "path",
  "num_docs" : 10000,
  "num_queries" : 10,
  "index_type" : "hnsw",
  "num_candidates" : 100,
  "k" : 10,
  "hnsw_m" : 16,
  "hnsw_ef_construction" : 200,
  "vector_space" : "maximum_inner_product",
  "dimensions" : 768
}
]
```
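
If more `num_candidates` values need to be swept, a config like the one above could be generated rather than hand-edited. The following Python sketch is only an illustration: the helper `make_runs` is hypothetical, the field names are copied from the example, and it assumes (as the example suggests) that only the first entry carries the indexing options so that later entries reuse the index.

```python
import json

# Fields shared by every run, copied from the example config above.
BASE = {
    "doc_vectors": "path",
    "query_vectors": "path",
    "num_docs": 10000,
    "num_queries": 10,
    "index_type": "hnsw",
    "k": 10,
    "hnsw_m": 16,
    "hnsw_ef_construction": 200,
    "vector_space": "maximum_inner_product",
    "dimensions": 768,
}


def make_runs(num_candidates_values):
    """Build one config entry per num_candidates value."""
    runs = []
    for i, num_candidates in enumerate(num_candidates_values):
        run = dict(BASE, num_candidates=num_candidates)
        if i == 0:
            # Assumption: only the first run builds the index.
            run.update(index_threads=4, reindex=True, force_merge=False)
        runs.append(run)
    return runs


config_json = json.dumps(make_runs([50, 100]), indent=2)
```

Writing `config_json` to a file gives something that can be passed as the config path in the command below.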

To execute:

```
./gradlew :qa:vector:checkVec --args="/Path/to/knn_tester_config.json"
```

Calling `./gradlew :qa:vector:checkVecHelp` gives some guidance on how
to use it, additionally providing a way to run it via java directly
(useful to bypass gradlew guff).
2025-06-07 02:07:32 +10:00
Mark Vieira
8fce15d067
Allow project reserved state handlers to update any cluster state (#128636) 2025-06-03 12:08:03 -07:00
Benjamin Trent
2a44166a2c
Applying Apache Lucene fix: https://github.com/apache/lucene/pull/14732 (#128671)
* Applying Apache Lucene fix: https://github.com/apache/lucene/pull/14732

* fixing test

* fixing annot
2025-06-02 09:50:25 -04:00
Benjamin Trent
1324ee0115
Reapply "Adds new unexposed and experimental IVF format (#127528)" (#128005) (#128051)
This reverts commit 8a17a5ed5f.

Reapplies the IVF format, but with a fix.
2025-05-14 08:47:59 +10:00
John Wagster
8a17a5ed5f
Revert "Adds new unexposed and experimental IVF format (#127528)" (#128005)
This reverts commit ebe8ea6136.
2025-05-09 17:10:11 -05:00
Benjamin Trent
ebe8ea6136
Adds new unexposed and experimental IVF format (#127528) 2025-05-07 14:59:57 -04:00
Ryan Ernst
60ad8ba744
Remove custom SecurityManager (#127778)
Since SecurityManager is no longer used, the custom subclass of
SecurityManager, SecureSM, is no longer needed.
2025-05-06 16:16:46 -07:00
Martijn van Groningen
065c5830cb
First step optimizing tsdb doc values codec merging. (#125403)
The doc values codec iterates several times over the doc values instance that needs to be written to disk. When merging with index sorting enabled, this is much more expensive, because each iteration over the doc values instance performs a merge sort (in order to get the doc IDs of the new segment in index-sort order).

There are several reasons why the doc values instance is iterated multiple times:
* To compute the stats (number of values, number of docs with a value) required for writing values to disk.
* To write the bitset that indicates which documents have a value (indexed DISI, jump table).
* To write the actual values to disk.
* To write the addresses to disk (in case docs have multiple values).

This applies to numeric doc values, but also to the ordinals of sorted (set) doc values.

This PR addresses the first reason the doc values instance needs to be iterated. The optimization applies only when merging, and only when the segments being merged also use es87 doc values, the codec version is the same, and there are no deletes. Note that this optimized merge is behind a feature flag for now.
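
To illustrate why the first pass can be skipped under those conditions, here is a minimal Python sketch. It is not the codec's implementation: it assumes the stats can be summed from per-segment metadata when nothing has been deleted, and `SegmentStats`, `stats_by_iteration`, and `stats_from_metadata` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    # Hypothetical per-segment metadata; field names are illustrative only.
    codec_version: int
    has_deletes: bool
    num_docs_with_value: int
    num_values: int


def stats_by_iteration(docs):
    """Slow path: one full pass over the (merged) per-document value lists."""
    num_docs_with_value = 0
    num_values = 0
    for values in docs:
        if values:
            num_docs_with_value += 1
            num_values += len(values)
    return num_docs_with_value, num_values


def stats_from_metadata(segments):
    """Fast path: sum stored stats, or return None if the shortcut does not apply."""
    if not segments:
        return None
    version = segments[0].codec_version
    for seg in segments:
        # Deletes would make the stored counts stale; mixed versions are not handled.
        if seg.has_deletes or seg.codec_version != version:
            return None
    return (
        sum(seg.num_docs_with_value for seg in segments),
        sum(seg.num_values for seg in segments),
    )
```

When `stats_from_metadata` returns a result, it matches what a full iteration would compute, without triggering the merge sort.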
2025-04-09 07:50:16 +02:00
Alexey Ivanov
ecf9adfc78
[main] System data streams are not being upgraded in the feature migration API (#126409)
This commit adds support for reindexing system data streams. The system data stream migration extends the existing system indices migration task and uses the data stream reindex API.
The system index migration task starts a reindex data stream task and tracks its status every second. Only one system index or system data stream is migrated at a time. If a data stream migration fails, the entire system index migration task will also fail.

Port of #123926
2025-04-08 20:42:58 +02:00
Alexey Ivanov
fd7efe587e
[main] Move system indices migration to migrate plugin (#125437)
* [main] Move system indices migration to migrate plugin

It seems the best way to fix #122949 is to use the existing data stream reindex API. However, this API is located in the migrate x-pack plugin. This commit moves the system indices migration logic (REST handlers, transport actions, and task) to the migrate plugin.

Port of #123551

* [CI] Auto commit changes from spotless

* Fix compilation

* Fix tests

* Fix test

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-04-04 18:49:38 +01:00
Niels Bauman
483f97915c
Run TransportGetIndexAction on local node (#125652)
This action needs only the cluster state, so it can run on any node.
Since this is the last class/action that extends the `ClusterInfo`
abstract classes, we remove those classes too, as they're no longer
required.

Relates #101805
2025-04-02 18:41:35 +01:00
Martijn van Groningen
52d68392d0
Prepare tsdb doc values format for merging optimizations. (#125933)
This change contains the following:

- The numDocsWithField field moved from SortedNumericEntry to NumericEntry, making this statistic always available.
- Store the jump table after the values in ES87TSDBDocValuesConsumer#writeField(...). Currently it is stored before the values. This will later allow us to iterate over the SortedNumericDocValues only once. When merging, each iteration is expensive because a merge sort is executed on the fly.

This change enables all the optimizations that are listed in #125403.
2025-04-02 13:39:41 +02:00
Luca Cavanna
8fdf44d708
Remove completion postings format extension (#125253)
A while ago we introduced a completion postings format extension to eventually
be able to customize how completion FSTs are loaded. See #111494.

We have never leveraged this extension, and meanwhile Lucene is moving
to always load FSTs off-heap and no longer allow on-heap loading.
See https://github.com/apache/lucene/pull/14364 .

This commit removes the SPI extension as it is no longer needed.
2025-03-20 15:07:53 +01:00
Simon Cooper
135b1658c2
Use valid REST version when determining capabilities (#123864)
If the REST API version is not specified, infer the correct one to use from the major versions present in the cluster, determined using features.
2025-03-07 08:18:24 +00:00
Tim Vernum
7e890acabb Export reservedstate.service to serverless (MP-1956) 2025-02-20 10:46:23 +01:00
Niels Bauman
da7d58c06c Merge main into multi-project 2025-01-31 11:21:48 +10:00
Chris Hegarty
4baffe4de1
Upgrade to Lucene 10.1.0 (#119308)
This commit upgrades to Lucene 10.1.0.
2025-01-30 13:41:02 +00:00
Tim Vernum
18528dde76 Merge main into multi-project 2025-01-17 16:32:24 +11:00
Ryan Ernst
ac687e0984
Move query interceptor to an internal plugin interface (#120308)
Query interceptor is meant for internal modules to implement, not any
external plugin. Yet it is defined on SearchPlugin that is available to
all plugin authors. This commit creates an InternalSearchPlugin
interface and moves the query interceptor method to that.
2025-01-17 00:58:54 +00:00
Simon Cooper
5a70623d8d Merge remote-tracking branch 'upstream-main/main' into merge-main-16-01-25 2025-01-16 09:23:46 +00:00
Simon Cooper
a2d84b1b90
Remove assumed features in server for 9.0 (#119946)
All features added before 8.18 can now be assumed and removed in 9.0
2025-01-15 08:37:04 +00:00
Simon Cooper
b0cd47de08 Merge remote-tracking branch 'upstream-main/main' into merge-main-10-01-25T17 2025-01-10 17:00:07 +00:00
Kostas Krikellas
5baf5af757
Configure index sorting through index settings for logsdb (#118968)
* Skip injecting `host.name` for incompatible mappings in logsdb mode

* spotless

* Update docs/changelog/118856.yaml

* fix tsdb

* Configure index sorting through index settings for logsdb

* fix synthetic source usage

* skip injecting host.name

* fix test

* fix compat

* more tests

* add index versioning

* add index versioning

* add index versioning

* minor refactoring

* Update docs/changelog/118968.yaml

* address comments

* inject host.name when possible

* check subobjects

* private settings
2025-01-10 13:22:54 +02:00
Simon Cooper
980959b654 Create multi-project FileSettingsService for project settings files (MP-1865)
Create a multi-project version of `FileSettingsService` to handle individual project settings files.
At the moment, there are no handlers for project-specific data, but it will create projects in the cluster based on the settings files that exist in the directory.
2025-01-10 09:16:00 +00:00
Tim Vernum
4ff691f066 Merge revision 7fb6ca447a into multi-project 2024-12-31 15:41:02 +11:00
Nikolaj Volgushev
257ad517d2
Bring back automaton minimization (#119309)
The security codebase relies heavily on automata and on caching them. The
Lucene 10 upgrade removed automaton minimization, which can result in a
memory usage increase of >5x, especially for roles with many application
privileges.

This PR brings back Automaton minimization to avoid the explosion in
roles cache size.

Relates: ES-10451
2024-12-31 01:52:58 +11:00
Yang Wang
fda1fa19d4 Merge main into multi-project 2024-12-13 12:15:25 +11:00
Kathleen DeRusso
c9a6a2c841
Add match support for semantic_text fields (#117839)
* Added query name to inference field metadata

* Fix build error

* Added query builder service

* Add query builder service to query rewrite context

* Updated match query to support querying semantic text fields

* Fix build error

* Fix NPE

* Update the POC to rewrite to a bool query when combined inference and non-inference fields

* Separate clause for each inference index (to avoid inference ID clashes)

* Simplify query builder service concept to a single default inference query

* Rename QueryBuilderService, remove query name from inference metadata

* Fix too many rewrite rounds error by injecting booleans in constructors for match query builder and semantic text

* Fix test compilation errors

* Fix tests

* Add yaml test for semantic match

* Add NodeFeature

* Fix license headers

* Spotless

* Updated getClass comparison in MatchQueryBuilder

* Cleanup

* Add Mock Inference Query Builder Service

* Spotless

* Cleanup

* Update docs/changelog/117839.yaml

* Update changelog

* Replace the default inference query builder with a query rewrite interceptor

* Cleanup

* Some more cleanup/renames

* Some more cleanup/renames

* Spotless

* Checkstyle

* Convert List<QueryRewriteInterceptor> to Map keyed on query name, error on query name collisions

* PR feedback - remove check on QueryRewriteContext class only

* PR feedback

* Remove intercept flag from MatchQueryBuilder and replace with wrapper

* Move feature to test feature

* Ensure interception happens only once

* Rename InterceptedQueryBuilderWrapper to AbstractQueryBuilderWrapper

* Add lenient field to SemanticQueryBuilder

* Clean up yaml test

* Add TODO comment

* Add comment

* Spotless

* Rename AbstractQueryBuilderWrapper back to InterceptedQueryBuilderWrapper

* Spotless

* Didn't mean to commit that

* Remove static class wrapping the InterceptedQueryBuilderWrapper

* Make InterceptedQueryBuilderWrapper part of QueryRewriteInterceptor

* Refactor the interceptor to be an internal plugin that cannot be used outside inference plugin

* Fix tests

* Spotless

* Minor cleanup

* C'mon spotless

* Test spotless

* Cleanup InternalQueryRewriter

* Change if statement to assert

* Simplify template of InterceptedQueryBuilderWrapper

* Change constructor of InterceptedQueryBuilderWrapper

* Refactor InterceptedQueryBuilderWrapper to extend QueryBuilder

* Cleanup

* Add test

* Spotless

* Rename rewrite to interceptAndRewrite in QueryRewriteInterceptor

* DOESN'T WORK - for testing

* Add comment

* Getting closer - match on single typed fields works now

* Deleted line by mistake

* Checkstyle

* Fix over-aggressive IntelliJ Refactor/Rename

* And another one

* Move SemanticMatchQueryRewriteInterceptor.SEMANTIC_MATCH_QUERY_REWRITE_INTERCEPTION_SUPPORTED to Test feature

* PR feedback

* Require query name with no default

* PR feedback & update test

* Add rewrite test

* Update server/src/main/java/org/elasticsearch/index/query/InnerHitContextBuilder.java

Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>

---------

Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
2024-12-12 16:55:00 +01:00
Niels Bauman
33f48b728f Merge main into multi-project 2024-12-10 05:23:29 +00:00
Benjamin Trent
5e859d9301
Even better(er) binary quantization (#117994)
This measurably improves BBQ by adjusting the underlying algorithm to an
optimized per vector scalar quantization.

This is a brand new way to quantize vectors. Instead of there being a
global set of upper and lower quantile bands, these are optimized and
calculated per individual vector. Additionally, vectors are centered on
a common centroid. 

This allows for an almost 32x reduction in memory, and even better
recall than before at the cost of slightly increasing indexing time.

Additionally, this new approach is easily generalizable to various other
bit sizes (e.g. 2 bits, etc.). While not taken advantage of yet, we may
update our scalar quantized indices in the future to use this new
algorithm, giving significant boosts in recall.

The recall gains range from 2% to almost 10% for certain datasets, with
an additional 5-10% indexing cost when indexing with HNSW, compared
with current BBQ.
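
The general idea of per-vector scalar quantization around a common centroid can be sketched as follows. This is not the BBQ algorithm from this commit: the real interval is optimized per vector, whereas this simplified Python sketch just uses each centered vector's min and max as a stand-in, and the function names are hypothetical.

```python
def quantize(vector, centroid, bits=1):
    """Quantize one vector using bounds computed from that vector alone."""
    centered = [v - c for v, c in zip(vector, centroid)]
    # Simplification: min/max instead of an optimized per-vector interval.
    lo, hi = min(centered), max(centered)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((x - lo) / step))) for x in centered]
    # lo and hi must be kept per vector so the codes can be interpreted later.
    return codes, lo, hi


def dequantize(codes, lo, hi, centroid, bits=1):
    """Approximately reconstruct the original vector from its codes."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 0.0
    return [lo + q * step + c for q, c in zip(codes, centroid)]
```

With `bits=1`, each 32-bit float component shrinks to a single bit. The small per-vector values stored alongside the codes (here `lo` and `hi`) are one plausible reason the reduction is "almost" rather than exactly 32x. The same functions work unchanged for `bits=2` and larger, which is the sense in which such a scheme generalizes to other bit sizes.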
2024-12-10 03:06:27 +11:00
Niels Bauman
04da446e42 Merge main into multi-project 2024-12-04 23:18:13 +01:00
Niels Bauman
032b42fcf7
Make TransportLocalClusterStateAction wait for cluster to unblock (#117230)
This will make `TransportLocalClusterStateAction` wait for a new state
that is not blocked. This means we need a timeout (again). For
consistency's sake, we're reusing the REST param `master_timeout` for
this timeout as well.

The only class that was using `TransportLocalClusterStateAction` was
`TransportGetAliasesAction`, so its request needed to accept a timeout
again as well.
2024-12-04 12:17:13 +01:00
Simon Cooper
73645b2daf Merge remote-tracking branch 'upstream-main/main' into merge-main-031224 2024-12-03 15:48:16 +00:00
Benjamin Trent
6c2f6071b2
Refactor/bbq format (#117847)
* Refactor bbq format to be contained in a package

* fixing license headers

* fixing module

* fix style
2024-12-02 16:04:31 -05:00
Yang Wang
92867cdf50 Merge main into multi-project 2024-11-29 08:50:54 +11:00
John Verwolf
8350ff29ba
Extensible Completion Postings Formats (#111494)
Allows the Completion Postings Format to be extended by providing an implementation of the CompletionsPostingsFormatExtension SPI.
2024-11-28 13:25:02 -08:00
Martijn van Groningen
6a4b68d263
Add source mode stats to MappingStats (#117463) 2024-11-28 10:53:39 +01:00
Tim Vernum
4cfb619448 Merge main into multi-project 2024-11-19 18:22:02 +11:00
Simon Cooper
cc35f1dc6a
Remove transport versions fixup listener and associated code (#116941) 2024-11-18 16:19:14 +00:00
Simon Cooper
c832572709
Remove some historical features (#116926)
Historical features are now trivially true on v9, so we can remove the features and the check.
Historical features do not affect cluster state, so this has no compatibility restrictions.
2024-11-18 14:33:05 +00:00
Niels Bauman
0edb9fa778 Merge remote-tracking branch 'public/main' into merge-main
# Conflicts:
#	server/src/main/java/org/elasticsearch/action/search/TransportSearchShardsAction.java
#	server/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationStatsService.java
#	server/src/main/java/org/elasticsearch/gateway/GatewayMetaState.java
#	server/src/main/java/org/elasticsearch/plugins/Plugin.java
#	server/src/test/java/org/elasticsearch/gateway/GatewayMetaStateTests.java
#	server/src/test/java/org/elasticsearch/ingest/IngestMetadataTests.java
2024-11-18 10:53:12 +01:00
Alexis Charveriat
e0af1238fc
Index stats enhancement: creation date and tier_preference (#116339)
* Expose tier preference as part of the index stats
* Also expose index creation date in index stats
* Added test
2024-11-15 09:08:42 +01:00
Tim Vernum
da5da54f3f Merge main into multi-project 2024-11-06 16:05:33 +11:00
Patrick Doyle
338c0538b7
Dynamic entitlement agent (#116125)
* Refactor: treat "maybe" JVM options uniformly

* WIP

* Get entitlement running with bridge all the way through, with qualified
exports

* Cosmetic changes to SystemJvmOptions

* Disable entitlements by default

* Bridge module comments

* Fixup forbidden APIs

* spotless

* Rename EntitlementChecker

* Fixup InstrumenterTests

* exclude recursive dep

* Fix some compliance stuff

* Rename asm-provider

* Stop using bridge in InstrumenterTests

* Generalize readme for asm-provider

* InstrumenterTests doesn't need EntitlementCheckerHandle

* Better javadoc

* Call parseBoolean

* Add entitlement to internal module list

* Docs as requested by Lorenzo

* Changes from Jack

* Rename ElasticsearchEntitlementChecker

* Remove logging javadoc

* exportInitializationToAgent should reference EntitlementInitialization, not EntitlementBootstrap.

They're currently in the same module, but if that ever changes, this code would have become wrong.

* Some suggestions from Mark

---------

Co-authored-by: Ryan Ernst <ryan@iernst.net>
2024-11-06 00:07:52 +01:00
Nhat Nguyen
fa6c5296d4
Add num docs and size to logsdb telemetry (#116128)
Follow-up on #115994 to add telemetry for the total number of documents 
and size in bytes of logsdb indices.

Relates #115994
2024-11-05 08:46:46 -08:00
Tim Vernum
2ba2d2a995 Merge main into multi-project 2024-10-31 11:55:04 +11:00
Ying Mao
4ecdfbb214
[Inference API] Add API to get configuration of inference services (#114862)
* Adding API to get list of service configurations

* Update docs/changelog/114862.yaml

* Fixing some configurations

* PR feedback -> Stream.of

* PR feedback -> singleton

* Renaming ServiceConfiguration to SettingsConfiguration. Adding TaskSettingsConfiguration

* Adding task type settings configuration to response

* PR feedback
2024-10-30 13:29:58 -04:00