Backporting #126637 to 8.18 branch.
If updating the `index.time_series.end_time` setting fails for one data stream,
then `UpdateTimeSeriesRangeService` should continue updating this setting for the other data streams.
The following error was observed in the wild:
```
[2025-04-07T08:50:39,698][WARN ][o.e.d.UpdateTimeSeriesRangeService] [node-01] failed to update tsdb data stream end times
java.lang.IllegalArgumentException: [index.time_series.end_time] requires [index.mode=time_series]
at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:636) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:619) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.settings.Setting.get(Setting.java:563) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.settings.Setting.get(Setting.java:535) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService.updateTimeSeriesTemporalRange(UpdateTimeSeriesRangeService.java:111) ~[?:?]
at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService$UpdateTimeSeriesExecutor.execute(UpdateTimeSeriesRangeService.java:210) ~[?:?]
at org.elasticsearch.cluster.service.MasterService.innerExecuteTasks(MasterService.java:1075) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:1038) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService.executeAndPublishBatch(MasterService.java:245) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.lambda$run$2(MasterService.java:1691) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.run(MasterService.java:1688) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$5.lambda$doRun$0(MasterService.java:1283) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$5.doRun(MasterService.java:1262) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-8.17.3.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1575) ~[?:?]
```
This resulted in a situation where the `index.time_series.end_time` index setting was not updated for any data stream, which in turn caused data loss: metrics couldn't be indexed because no suitable backing index could be resolved:
```
the document timestamp [2025-03-26T15:26:10.000Z] is outside of ranges of currently writable indices [[2025-01-31T07:22:43.000Z,2025-02-15T07:24:06.000Z][2025-02-15T07:24:06.000Z,2025-03-02T07:34:07.000Z][2025-03-02T07:34:07.000Z,2025-03-10T12:45:37.000Z][2025-03-10T12:45:37.000Z,2025-03-10T14:30:37.000Z][2025-03-10T14:30:37.000Z,2025-03-25T12:50:40.000Z][2025-03-25T12:50:40.000Z,2025-03-25T14:35:40.000Z
```
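The fix can be sketched as follows (a minimal sketch with hypothetical helper names, not the actual `UpdateTimeSeriesRangeService` code): wrap the per-data-stream update in its own try/catch so a validation failure on one data stream is logged and skipped instead of aborting the whole batch.
```
import java.util.List;
import java.util.function.Consumer;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Illustrative sketch only: the update for each data stream runs in its own try/catch,
// so one failing data stream no longer blocks the update for the rest.
class TsdbEndTimeUpdaterSketch {
    private static final Logger logger = LogManager.getLogger(TsdbEndTimeUpdaterSketch.class);

    static void updateAll(List<String> dataStreams, Consumer<String> updateEndTime) {
        for (String dataStream : dataStreams) {
            try {
                updateEndTime.accept(dataStream); // may throw, e.g. the IllegalArgumentException in the log above
            } catch (Exception e) {
                // previously this exception escaped the loop and prevented updates for all remaining data streams
                logger.warn(() -> "failed to update [index.time_series.end_time] for data stream [" + dataStream + "]", e);
            }
        }
    }
}
```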
* Log stack traces on data nodes before they are cleared for transport (#125732)
We recently cleared stack traces on data nodes before transport back to the coordinating node
when error_trace=false to reduce unnecessary data transfer and memory usage on the coordinating
node (#118266). However, all logging of exceptions happens on the coordinating node, so stack
traces disappeared from the logs entirely. This change logs stack traces directly on the data node
when error_trace=false.
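A minimal sketch of the idea, assuming hypothetical helper names rather than the real search transport code: log the full exception on the data node first, then strip the stack trace for the response sent back to the coordinating node.
```
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Hypothetical sketch of the ordering described above, not the actual data-node transport code.
class DataNodeFailureLoggingSketch {
    private static final Logger logger = LogManager.getLogger(DataNodeFailureLoggingSketch.class);

    static Exception prepareForTransport(Exception failure, boolean errorTraceRequested) {
        if (errorTraceRequested == false) {
            // log with the full stack trace on the data node first: the coordinating node only
            // ever receives the stripped exception, so it can no longer log the trace itself
            logger.warn("shard search failure (stack trace retained in data node logs)", failure);
            failure.setStackTrace(new StackTraceElement[0]); // stand-in for the real trace-clearing step
        }
        return failure;
    }
}
```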
(cherry picked from commit 9f6eb1d4e3)
When `ExecutorScalingQueue` rejects work to make the worker pool scale up while the pool is already at max size (so a new worker cannot be added), the available workers might time out at just about the same time as the task is force-queued by `ForceQueuePolicy`. This has caused starvation of work, as observed for `masterService#updateTask` in #124667, where a max pool size of 1 is used. That configuration is the most likely to expose the bug.
This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case.
If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`.
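For the special case, a minimal sketch (keep-alive and naming are illustrative, not the actual `EsExecutors.newScaling` code) is shown below; the probing path for max pool size > 1 is omitted.
```
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the max-pool-size-1 special case: with core size 0 and max size 1, a plain
// LinkedTransferQueue is enough, because ThreadPoolExecutor itself starts a worker whenever a
// task is queued while the pool is empty, so queued work can never be stranded without a worker.
class ScalingExecutorSketch {
    static ExecutorService singleWorkerScaling() {
        return new ThreadPoolExecutor(
            0, 1,                         // the configuration from #124667: core 0, max 1
            30, TimeUnit.SECONDS,         // keep-alive value is illustrative
            new LinkedTransferQueue<>()); // regular queue; no rejection-based scaling needed
    }
}
```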
Fixes #124667
Relates to #18613
(cherry picked from commit 36874e8663)
# Conflicts:
# test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java
* Fix concurrency issue in ScriptSortBuilder (#123757)
Inter-segment concurrency is disabled whenever sort by field, including script sorting, is used in a search request.
The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is built per slice and the different search threads don't share information about the documents they have already visited, etc.
The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses that concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover sort scripts that rely on _score being available and top_hits aggregations with a scripted sort clause.
* iter
All CLIs in Elasticsearch support command-line flags for controlling the
output level. When --silent is used, the expectation is that normal
logging is omitted. Yet the log4j logger is still configured to output
error-level logs. This commit sets the appropriate log level for log4j
depending on the Terminal log level.
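A rough sketch of the mapping, assuming a stand-in verbosity enum and an illustrative level mapping rather than the actual CLI code:
```
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;

// Illustrative sketch: keep the log4j root level in step with the CLI output level so that
// --silent also silences log4j output. The enum is a stand-in for the Terminal's verbosity.
class CliLoggingSketch {
    enum Verbosity { SILENT, NORMAL, VERBOSE }

    static void configureRootLevel(Verbosity verbosity) {
        Level level = switch (verbosity) {
            case SILENT -> Level.OFF;    // --silent: suppress even error-level log4j output
            case VERBOSE -> Level.DEBUG; // --verbose: illustrative mapping
            case NORMAL -> Level.INFO;
        };
        Configurator.setRootLevel(level); // log4j-core API
    }
}
```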
Rather than checking the license (updating the usage map) on every
single shard, just do it once at the start of a computation that needs
to forecast write loads.
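In sketch form (hypothetical names, not the actual allocation code): resolve the license check once per computation and reuse the result in the per-shard loop.
```
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical sketch of the reordering described above.
class WriteLoadForecastSketch {
    static void forecastAll(BooleanSupplier licenseCheck, List<String> shardIds, Consumer<String> forecast) {
        boolean licensed = licenseCheck.getAsBoolean(); // single license check (and usage-map update) per computation
        if (licensed == false) {
            return;                                     // nothing to forecast without the license
        }
        for (String shardId : shardIds) {
            forecast.accept(shardId);                   // per-shard work no longer touches the license state
        }
    }
}
```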
Backport of #123346 to 8.x
Closes #123247
In test-scoped internal ITs the `cluster().assertAfterTest()` method was invoked
*after* the cluster nodes were closed. Consequently, the assertions that iterated
over the internal nodes (and asserted some state on nodes after the test) were
all effectively noops.
This PR reverses that order, so that after-test assertions are effective again.
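The corrected ordering, as a simplified sketch (the interface below is a stand-in for the internal test cluster):
```
import java.io.Closeable;

// Simplified sketch of the teardown ordering described above.
interface TestClusterHandle extends Closeable {
    void assertAfterTest() throws Exception; // iterates the internal nodes and checks per-node state
}

class AfterTestOrderingSketch {
    static void afterTest(TestClusterHandle cluster) throws Exception {
        cluster.assertAfterTest(); // runs while the nodes are still alive, so the assertions are effective
        cluster.close();           // only then are the nodes shut down (previously this happened first)
    }
}
```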
This PR addresses issues around aggregations cancellation, mentioned in https://github.com/elastic/elasticsearch/issues/108701 and other places. In brief, during aggregations collection time, we respect cancellation via the mechanisms in the searcher to poison cancelled queries. But once the aggregation finishes collection, there is no further need to interact with the searcher, so we cannot rely on that for cancellation checking. In particular, deeply nested aggregations can spend a long time constructing the results tree.
Checking for cancellation is a trade-off, as the check itself is somewhat expensive (it involves a volatile read), so we want to balance checking often enough that cancelled queries aren't taking up resources for a long time, but not so frequently that it slows down most aggregation queries. Our first attempt at this is to check once when we go to build sub-aggregations, as the worst cases we've seen involve needing to build deep sub-aggregation trees. Checking at sub-aggregation construction time also provides a conveniently centralized method call to add the check to.
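A hedged sketch of where such a check could sit (names are illustrative; the real hook is where aggregators build their sub-aggregation results):
```
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Illustrative sketch only: one cancellation check per sub-aggregation build keeps the cost to a
// single volatile read while bounding how long a cancelled query can keep constructing results.
class SubAggCancellationSketch {
    private final BooleanSupplier isCancelled; // backed by a volatile flag on the search task

    SubAggCancellationSketch(BooleanSupplier isCancelled) {
        this.isCancelled = isCancelled;
    }

    <T> T buildSubAggs(Supplier<T> builder) {
        if (isCancelled.getAsBoolean()) {
            // surfacing the cancellation here stops deep result-tree construction early
            throw new RuntimeException("task cancelled while building aggregation results");
        }
        return builder.get();
    }
}
```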
---------
Conflicts:
test/framework/src/main/java/org/elasticsearch/search/aggregations/AggregatorTestCase.java
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
* Rename environment dir accessors (#121803)
The node environment has many paths. The accessors for these currently use a "file" suffix, but they are always directories. This commit renames the accessors to make it clear these paths are directories.
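Roughly, the change looks like this (the accessor names below are an assumption made for illustration, not necessarily the exact ones in the PR):
```
import java.nio.file.Path;

// Hypothetical before/after of the rename: the returned paths are directories,
// so the old "File" suffix was misleading.
interface EnvironmentPathsSketch {
    Path configDir();  // formerly configFile()
    Path logsDir();    // formerly logsFile()
    Path[] dataDirs(); // formerly dataFiles()
}
```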
* [CI] Auto commit changes from spotless
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Under very unfortunate conditions, tests that check xContent object
roundtrip parsing (e.g. SearchHitsTests#testFromXContent)
can fail when we happen to randomly pick the YAML xContent type and create
random (realistic) Unicode character sequences that may contain the
character U+0085 (133) from the Latin1 code page. That specific character
doesn't get parsed back to its original form for YAML xContent, which can
lead to rare but hard-to-diagnose test failures.
This change adds logic to AbstractXContentTestCase#test(), which lies at
the core of most of our xContent roundtrip tests, that disallows test
instances containing that particular character when the YAML xContent
type is used.
Closes #97716
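A sketch of the kind of guard this adds (not the exact AbstractXContentTestCase change):
```
import java.util.function.Supplier;

// Sketch only: when the randomly picked xContent type is YAML, re-draw any random string
// containing U+0085 (NEL), because the YAML parser does not round-trip that character.
class YamlSafeRandomStrings {
    static String randomYamlSafe(Supplier<String> randomString) {
        String value = randomString.get();
        while (value.indexOf('\u0085') >= 0) {
            value = randomString.get();
        }
        return value;
    }
}
```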
Adds non-grouping support for min, max, sum, and count, using
CompositeBlock as the underlying block type and an internal
FromAggregateMetricDouble function to handle converting from
CompositeBlock to the correct metric subfields.
Closes #110649
* Support ignore_above for keywords in test data generation (#119416)
(cherry picked from commit d3f2956116)
* Update DefaultMappingParametersHandler.java
A new query parameter `?include_source_on_error` was added for the create/index,
update and bulk REST APIs to control whether to include the document source
in the error response in case of parsing errors. The default value is `true`.
Relates to ES-9186.
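For example, a client can opt out of echoing the source back on parsing errors like this (a sketch using the low-level Java REST client; host, index name and document are placeholders):
```
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

// Sketch only: index a document with include_source_on_error=false so a parsing failure
// does not echo the document source back in the error response.
class IncludeSourceOnErrorSketch {
    static void indexWithoutSourceEcho() throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200)).build()) {
            Request request = new Request("POST", "/my-index/_doc");   // placeholder index name
            request.addParameter("include_source_on_error", "false");  // the new query parameter
            request.setJsonEntity("{\"@timestamp\": \"not-a-date\"}"); // malformed on purpose
            client.performRequest(request);                            // fails with a parsing error; source not echoed
        }
    }
}
```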
* Enable queryable built-in roles feature by default (#120323)
This enables the `es.queryable_built_in_roles_enabled` feature flag by default.
This feature makes the built-in roles automatically indexed in the `.security` index and available
for querying via the Query Role API. A consequence of this is that the `.security` index is now
created eagerly (if it does not already exist) on cluster formation.
In order to keep the scope of this PR small, the feature is disabled for some of the tests,
because they are either non-trivial to adjust or the gain is not worth the effort right now.
The tests will be adjusted in a follow-up PR, and later the flag will be removed completely.
Relates to #117581
(cherry picked from commit 52e0f21bdd)
# Conflicts:
# modules/dot-prefix-validation/build.gradle
# test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java
# x-pack/plugin/security/src/internalClusterTest/java/org/elasticsearch/xpack/security/authc/esnative/ReservedRealmElasticAutoconfigIntegTests.java
* Update InternalTestCluster.java
remove a line that snuck in after resolving merge conflicts
* Update build.gradle
fix build.gradle
* Update build.gradle
fix build.gradle by removing invalid task
* remove non-existing timeout parameter on 8.x branch
This patch adds a property to CountedKeywordMapper to track the
synthetic_source_keep index setting. This property is then used to properly
implement synthetic source support in the counted_keyword field type, with
fallback to the ignore_source mechanism when synthetic_source_keep is set
in either the field mapping or the index settings.
Applies the fix in `SourceMatcher` from #120756, along with disabling
`SCALED_FLOAT` and `HALF_FLOAT`, which have accuracy issues leading to
false positives.
* ES|QL async queries: Partial result on demand (#118122)
Add capability to stop async query on demand
The theory:
- User initiates an async search request
- User sends the stop request (POST _query/async/<ID>/stop)
- If the async query has finished by that time, it behaves like a regular async get
- If it hasn't finished, the sinks are closed and the request is forcefully finished
(cherry picked from commit f27f74666f)
# Conflicts:
# x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlQueryResponse.java
# x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/action/EsqlQueryResponseTests.java
# x-pack/plugin/security/qa/multi-cluster/src/javaRestTest/java/org/elasticsearch/xpack/remotecluster/CrossClusterEsqlRCS1UnavailableRemotesIT.java
# x-pack/plugin/security/qa/multi-cluster/src/javaRestTest/java/org/elasticsearch/xpack/remotecluster/CrossClusterEsqlRCS2UnavailableRemotesIT.java
* fix tests
* [CI] Auto commit changes from spotless
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This call has the side effect that if you are iterating over a number of hits
and calling this method, you will increase memory usage by a non-trivial
amount, which in most cases is unwanted. Therefore this commit
removes this caching altogether and adds an assertion so the method is
called only once during the lifetime of the object.
backport #119888
* Test ML model server (#120270)
* Fix model downloading for very small models.
* Test MlModelServer
* Tiny ELSER
* unmute TextEmbeddingCrudIT and DefaultEndPointsIT
* update ELSER
* Improve MlModelServer
* tiny E5
* more logging
* improved E5 model
* tiny reranker
* scan for ports
* [CI] Auto commit changes from spotless
* Serve default models when optimized model is requested
* @ClassRule
* polish code
* Respect dynamic setting ML model repo
* fix metadata for optimized models
* improve logging
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
* backport HttpHeaderParser
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
* Rebuild Inference Metadata Fields During Snapshot Recovery (#120045)
This PR introduces support for reconstructing inference metadata fields that are removed from `_source` by `SourceFieldMapper#applyFilters` during operations recovery.
The inference metadata fields are retrieved using value fetchers and are re-added to `_source` under the `_inference_fields` metadata field.
* fix compil
Here we move the `index.mapping.source.mode` setting to `IndexSettings` because of dependencies
and because of the initialisation order of static fields in the `IndexSettings` and `SourceFieldMapper` classes.
Not initialising the settings `index.mode`, `index.mapping.source.mode`, and `index.recovery.use_synthetic_source`
in the right order results in multiple `NullPointerException`s.
This work is done to simplify another PR, #119110.
Closes elastic/security-team#11102
Closes elastic/security-team#11104
This allows agentless integrations (via elastic/beats#41446, elastic/kibana#203810) to write to agentless-* indices. Each index is created on-demand by the filebeat client and kibana conditionally extends the API key permissions to allow writing to the index.
(cherry picked from commit 3c184b912c)
# Conflicts:
# docs/reference/rest-api/security/get-service-accounts.asciidoc
# x-pack/plugin/security/qa/service-account/src/javaRestTest/java/org/elasticsearch/xpack/security/authc/service/ServiceAccountIT.java
# x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/service/ElasticServiceAccounts.java
* Do not try to enable SecurityManager on JDK 24 (#117999)
* cleanup
* [CI] Auto commit changes from spotless
* more
* [CI] Auto commit changes from spotless
---------
Co-authored-by: Lorenzo Dematté <lorenzo.dematte@elastic.co>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
* Metrics for indexing failures due to version conflicts (#119067)
This exposes new OTel node- and index-based metrics for indexing failures due to version conflicts.
In addition, the /_cat/shards, /_cat/indices and /_cat/nodes APIs also expose the same metric, under the newly added column `iifvc`.
Relates: #107601
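For instance, the new column can be requested explicitly from the cat APIs (a sketch using the low-level Java REST client; host is a placeholder):
```
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

// Sketch only: ask /_cat/indices for the newly added version-conflict column (iifvc);
// the same column is available on /_cat/shards and /_cat/nodes.
class VersionConflictCatSketch {
    static String versionConflictFailuresPerIndex() throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200)).build()) {
            Request request = new Request("GET", "/_cat/indices");
            request.addParameter("h", "index,iifvc"); // select the columns to return
            request.addParameter("v", "true");        // include the header row
            Response response = client.performRequest(request);
            return EntityUtils.toString(response.getEntity());
        }
    }
}
```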
(cherry picked from commit 12eb1cfda1)
# Conflicts:
# server/src/main/java/org/elasticsearch/TransportVersions.java
* types
* Fix NodeIndexingMetricsIT
* [CI] Auto commit changes from spotless
* Fix RestShardsActionTests
* Fix test/cat.shards/10_basic.yml for bwc
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
* Add new experimental rank_vectors mapping for late-interaction second order ranking (#118804)
Late-interaction models are powerful rerankers. While their size and
overall cost don't lend themselves to HNSW indexing, utilizing them for
second-order "brute-force" reranking can provide excellent boosts in
relevance, at generally lower inference time than large cross-encoders.
This commit exposes a new experimental `rank_vectors` field that allows
for maxSim operations. This unlocks the initial, and most common use of
late-interaction dense-models.
For example, this is how you would use it via the API:
```
PUT index
{
"mappings": {
"properties": {
"late_interaction_vectors": {
"type": "rank_vectors"
}
}
}
}
```
Then to index:
```
POST index/_doc
{
"late_interaction_vectors": [[0.1, ...],...]
}
```
For querying, scoring can be exposed with scripting:
```
POST index/_search
{
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "maxSimDotProduct(params.query_vector, 'my_vector')",
"params": {
"query_vector": [[0.42, ...], ...]
}
}
}
}
}
```
Of course, the initial ranking should be done before re-scoring or
combining via the `rescore` parameter, or simply passing whatever first
phase retrieval you want as the inner query in `script_score`.
* Update docs/changelog/119601.yaml
When synthetic source is used, the translog operations created via peer
recovery may differ from those created through replication. This change
relaxes the translog operation assertion to account for synthetic source,
allowing these operations to be considered equivalent.
Closes #119191