This lowers the number of documents used to test lookup because we have
seen a few failures over the last few months. These are all cases that we
expect to *pass*, so fewer documents should make them even more likely to
pass.
Closes #125913
Closes #125779
Backporting #126637 to the 8.18 branch.
If updating the `index.time_series.end_time` setting fails for one data stream,
then `UpdateTimeSeriesRangeService` should continue updating this setting for the other data streams.
The following error was observed in the wild:
```
[2025-04-07T08:50:39,698][WARN ][o.e.d.UpdateTimeSeriesRangeService] [node-01] failed to update tsdb data stream end times
java.lang.IllegalArgumentException: [index.time_series.end_time] requires [index.mode=time_series]
at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:636) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:619) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.settings.Setting.get(Setting.java:563) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.settings.Setting.get(Setting.java:535) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService.updateTimeSeriesTemporalRange(UpdateTimeSeriesRangeService.java:111) ~[?:?]
at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService$UpdateTimeSeriesExecutor.execute(UpdateTimeSeriesRangeService.java:210) ~[?:?]
at org.elasticsearch.cluster.service.MasterService.innerExecuteTasks(MasterService.java:1075) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:1038) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService.executeAndPublishBatch(MasterService.java:245) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.lambda$run$2(MasterService.java:1691) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.run(MasterService.java:1688) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$5.lambda$doRun$0(MasterService.java:1283) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$5.doRun(MasterService.java:1262) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-8.17.3.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1575) ~[?:?]
```
This resulted in a situation in which the `index.time_series.end_time` index setting was not updated for any data stream. That in turn caused data loss, as metrics couldn't be indexed because no suitable backing index could be resolved:
```
the document timestamp [2025-03-26T15:26:10.000Z] is outside of ranges of currently writable indices [[2025-01-31T07:22:43.000Z,2025-02-15T07:24:06.000Z][2025-02-15T07:24:06.000Z,2025-03-02T07:34:07.000Z][2025-03-02T07:34:07.000Z,2025-03-10T12:45:37.000Z][2025-03-10T12:45:37.000Z,2025-03-10T14:30:37.000Z][2025-03-10T14:30:37.000Z,2025-03-25T12:50:40.000Z][2025-03-25T12:50:40.000Z,2025-03-25T14:35:40.000Z
```
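A hedged sketch of the fix's shape (method and field names are illustrative, not the actual code in `UpdateTimeSeriesRangeService`): isolate the failure per data stream so one invalid index no longer aborts the whole batch.
```java
// Sketch: keep updating the remaining data streams when one of them fails.
ClusterState updateAll(ClusterState state, List<DataStream> tsdbDataStreams) {
    for (DataStream dataStream : tsdbDataStreams) {
        try {
            state = updateTimeSeriesTemporalRange(state, dataStream);
        } catch (Exception e) {
            // Log and continue instead of propagating, so the remaining
            // data streams still get their end_time advanced.
            logger.warn(() -> "failed to update [index.time_series.end_time] for ["
                + dataStream.getName() + "]", e);
        }
    }
    return state;
}
```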
* Log stack traces on data nodes before they are cleared for transport (#125732)
We recently started clearing stack traces on data nodes before transport back to the coordinating node
when `error_trace=false`, to reduce unnecessary data transfer and memory on the coordinating
node (#118266). However, all logging of exceptions happens on the coordinating node, so stack
traces disappeared from the logs entirely. This change logs stack traces directly on the data node when
`error_trace=false`.
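A hedged sketch of the idea (names are illustrative, not the actual ES code path): when `error_trace=false`, the data node is the last place the stack trace exists, so log it there before stripping it.
```java
// Sketch: log on the data node, then clear the trace before transport (#118266).
void onShardFailure(Exception e, boolean errorTraceRequested) {
    if (errorTraceRequested == false) {
        // Keep the trace in the data node's own logs...
        logger.debug(() -> "shard search failure", e);
        // ...then drop it to save transport bytes and coordinator memory.
        e.setStackTrace(new StackTraceElement[0]);
    }
    sendFailureToCoordinatingNode(e); // illustrative helper
}
```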
(cherry picked from commit 9f6eb1d4e3)
For creating and deleting projects in multi-project tests, we need to
create and delete settings and secrets files on the fly. This PR adds
such a feature to the Java test cluster, with an option to specify the
config directory.
(cherry picked from commit a1b0ed104b)
Co-authored-by: Yang Wang <yang.wang@elastic.co>
When `ExecutorScalingQueue` rejects work to make the worker pool scale up while already being at max pool size (so a new worker cannot be added), available workers might time out at just about the same time as the task is then force-queued by `ForceQueuePolicy`. This has caused starvation of work, as observed for `masterService#updateTask` in #124667, where max pool size 1 is used. This configuration is the most likely to expose the bug.
This PR changes `EsExecutors.newScaling` to not use `ExecutorScalingQueue` if max pool size is 1 (and core pool size is 0). A regular `LinkedTransferQueue` works perfectly fine in this case.
If max pool size > 1, a probing approach is used to ensure the worker pool is adequately scaled to at least 1 worker after force queueing work in `ForceQueuePolicy`.
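A hedged sketch of the queue selection described above (simplified; the probing in `ForceQueuePolicy` is more involved than the comment suggests):
```java
// Sketch: pick the work queue based on the pool sizing.
BlockingQueue<Runnable> workQueue =
    (corePoolSize == 0 && maxPoolSize == 1)
        // A single-worker pool can't race scale-up against worker timeout,
        // so a plain queue is safe and avoids the starvation window.
        ? new LinkedTransferQueue<>()
        // Otherwise keep the scaling queue (it rejects offers to trigger
        // worker creation), and have ForceQueuePolicy probe the pool after
        // force-queueing so at least one worker is alive to drain the queue.
        : new ExecutorScalingQueue<>();
```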
Fixes #124667
Relates to #18613
(cherry picked from commit 36874e8663)
# Conflicts:
# test/framework/src/main/java/org/elasticsearch/test/transport/MockTransportService.java
When a file with `.attach_pid` in its name was stored in the distribution
and then deleted, the resulting exception could stop the copying/linking of
files without any sign of a problem. The files were then missing in the cluster
used in the test, sometimes causing tests to fail (depending on which
files hadn't been copied).
When using `Files.walk` it is impossible to catch the IOException and
continue walking through the files conditionally. It has been replaced with
a `FileVisitor` implementation that can continue if the exception is
caused by files left behind temporarily by the JVM but no longer available.
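A minimal sketch of the `FileVisitor` approach (assuming `source` and `target` paths; the real copy logic is more involved):
```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

static void copyTree(Path source, Path target) throws IOException {
    Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
        @Override
        public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
            Files.createDirectories(target.resolve(source.relativize(dir).toString()));
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
            Files.copy(file, target.resolve(source.relativize(file).toString()));
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException {
            // A file that vanished mid-walk (e.g. a JVM ".attach_pid" marker) is
            // safe to skip; Files.walk had no hook like this and aborted instead.
            if (exc instanceof NoSuchFileException && file.getFileName().toString().contains(".attach_pid")) {
                return FileVisitResult.CONTINUE;
            }
            throw exc;
        }
    });
}
```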
* Fix concurrency issue in ScriptSortBuilder (#123757)
Inter-segment concurrency is disabled whenever sort-by-field, including script sorting, is used in a search request.
The reason why sort-by-field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is built per slice and the different search threads don't share information about the documents they have already visited, etc.
The reason why script sort has concurrency disabled is that the script sorting implementation was not thread-safe. This commit addresses that concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover sort scripts that rely on _score being available and top_hits aggregations with a scripted sort clause.
* Fix Gradle deprecation warning: declaring an `is`-prefixed property with a `Boolean` type has been deprecated.
* Make use of the new `layout.settingsFolder` API to address some cross-project references.
* Fix the buildParams snapshot check for multi-project builds.
(cherry picked from commit e19b2264af)
# Conflicts:
# build-tools-internal/gradle/wrapper/gradle-wrapper.properties
# build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/BaseInternalPluginBuildPlugin.java
# build-tools-internal/src/main/resources/minimumGradleVersion
# docs/build.gradle
# gradle/wrapper/gradle-wrapper.properties
# plugins/examples/gradle/wrapper/gradle-wrapper.properties
# qa/lucene-index-compatibility/build.gradle
# x-pack/qa/multi-project/core-rest-tests-with-multiple-projects/build.gradle
# x-pack/qa/multi-project/xpack-rest-tests-with-multiple-projects/build.gradle
All CLIs in Elasticsearch support command-line flags for controlling the
output level. When `--silent` is used, the expectation is that normal
logging is omitted. Yet the log4j logger is still configured to output
error-level logs. This commit sets the appropriate log level for log4j
depending on the Terminal log level.
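A hedged sketch of the mapping (the verbosity accessor and the exact levels chosen are assumptions, not the commit's code):
```java
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;

// Sketch: derive the log4j root level from the CLI's Terminal verbosity.
Level level = switch (terminal.getVerbosity()) {
    case SILENT -> Level.OFF;     // --silent: suppress log4j output as well
    case VERBOSE -> Level.DEBUG;  // --verbose
    default -> Level.INFO;        // normal
};
Configurator.setRootLevel(level);
```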
Rather than checking the license (updating the usage map) on every
single shard, just do it once at the start of a computation that needs
to forecast write loads.
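A hedged sketch of the hoisting (names are illustrative, not the actual licensing API):
```java
// Sketch: one license check, and hence one usage-map update, per computation...
boolean licensed = licenseService.isForecastAllowed();
for (ShardRouting shard : shardsToForecast) {
    // ...instead of re-checking the license once per shard inside this loop.
    double writeLoad = licensed ? forecastWriteLoad(shard) : 0.0;
    applyForecast(shard, writeLoad);
}
```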
Backport of #123346 to 8.x
Closes #123247
The reserved roles are already returned from the `ReservedRolesStore`
in `TransportGetRolesAction`. There is no need to query and deserialize
reserved roles from the `.security` index just to merge them with the "static" definitions.
In test-scoped internal ITs, the `cluster().assertAfterTest()` method was invoked
*after* the cluster nodes were closed. Consequently, the assertions that iterated
over the internal nodes (and asserted some state on the nodes after the test) were
all effectively no-ops.
This PR reverses that order, so that after-test assertions are effective again.
This PR addresses issues around aggregations cancellation, mentioned in https://github.com/elastic/elasticsearch/issues/108701 and other places. In brief, during aggregations collection time, we respect cancellation via the mechanisms in the searcher to poison cancelled queries. But once the aggregation finishes collection, there is no further need to interact with the searcher, so we cannot rely on it for cancellation checking. In particular, deeply nested aggregations can spend a long time constructing the results tree.
Checking for cancellation is a trade-off, as the check itself is somewhat expensive (it involves a volatile read), so we want to balance checking often enough that cancelled queries aren't taking up resources for a long time, but not so frequently that it slows down most aggregation queries. Our first attempt at this is to check once when we go to build sub-aggregations, as the worst cases we've seen involve needing to build deep sub-aggregation trees. Checking at sub-aggregation construction time also provides a conveniently centralized method call to add the check to.
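A hedged sketch of the centralized check (placement and names are illustrative; it assumes the ES `TaskCancelledException` is on the classpath):
```java
import java.util.function.BooleanSupplier;
import org.elasticsearch.tasks.TaskCancelledException;

// Sketch: one volatile read per sub-aggregation build, not per document.
final class CancellationCheck {
    private final BooleanSupplier isCancelled; // wired from the search task

    CancellationCheck(BooleanSupplier isCancelled) {
        this.isCancelled = isCancelled;
    }

    // Call this where sub-aggregation results are built: deep sub-aggregation
    // trees are the observed worst case for long post-collection work.
    void check() {
        if (isCancelled.getAsBoolean()) {
            throw new TaskCancelledException("cancelled building aggregation results");
        }
    }
}
```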
---------
Conflicts:
test/framework/src/main/java/org/elasticsearch/search/aggregations/AggregatorTestCase.java
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
* Rename environment dir accessors (#121803)
The node environment has many paths. The accessors for these currently use a "file" suffix, but they are always directories. This commit renames the accessors to make it clear these paths are directories.
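An illustrative before/after (the specific accessor shown is an assumption, not necessarily one of the renamed methods):
```java
import java.nio.file.Path;

// Before: a "File" suffix on something that is always a directory.
// Path config = environment.configFile();
// After: the accessor name says what the path actually is.
Path config = environment.configDir();
```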
* [CI] Auto commit changes from spotless
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This causes the ESQL heap attack tests to grow their memory usage if
they don't cause a circuit-breaking exception at first. They just try again
with more data. That's slow, but it should stop this from failing quite
as much. And it'll give us even more information about failures.
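A hedged sketch of the grow-and-retry loop (starting size and query helper are made up; the real tests are more elaborate):
```java
import java.io.IOException;
import org.elasticsearch.client.ResponseException;

// Sketch: double the data until the circuit breaker actually trips (HTTP 429).
void assertCircuitBreaks() throws IOException {
    int docs = 1_000; // hypothetical starting size
    while (true) {
        try {
            runHeapAttackQuery(docs); // illustrative helper issuing the query
        } catch (ResponseException e) {
            if (e.getResponse().getStatusLine().getStatusCode() == 429) {
                return; // circuit_breaking_exception, as intended
            }
            throw e;
        }
        docs *= 2; // no break yet: slow, but retry with more data
    }
}
```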
Closes #121465
Under very unfortunate conditions, tests that check xContent objects'
round-trip parsing (e.g. SearchHitsTests#testFromXContent)
can fail when we happen to randomly pick the YAML xContent type and create
random (realistic) Unicode character sequences that contain the
character U+0085 (133) from the Latin1 code page. That specific character
doesn't get parsed back to its original form for YAML xContent, which can
lead to rare but hard-to-diagnose test failures.
This change adds logic to AbstractXContentTestCase#test(), which lies at
the core of most of our xContent round-trip tests, that disallows test
instances containing that particular character when using the YAML xContent
type.
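A minimal sketch of the guard (the real check lives in AbstractXContentTestCase#test() and may be shaped differently):
```java
// U+0085 (NEL) is treated as a line break by YAML, so strings containing it
// don't survive a YAML serialize/parse round trip unchanged.
static boolean yamlRoundTripSafe(String s) {
    return s.indexOf('\u0085') < 0;
}
// When the randomly picked type is YAML, regenerate random instances until
// none of their string fields contain U+0085.
```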
Closes #97716
Reenables some heap attack tests, bumping their memory requirements to
try to force a failure on all CI machines. Previously some CI machines
weren't failing, invalidating the test on those machines.
Closes #121481
Closes #121465
* ESQL: Expand HeapAttack for LOOKUP
This expands the heap attack tests for LOOKUP. Now there are three
flavors:
1. LOOKUP a single geo_point - about 30 bytes or so.
2. LOOKUP a 1MB string.
3. LOOKUP no fields - just JOIN to alter cardinality.
Fetching a geo_point is fine with about 500 repeated docs before it
circuit breaks, which works out to about 256MB of buffered results.
That's sensible on our 512MB heap and likely to work OK for most folks.
We'll flip to a streaming method eventually and this won't be a problem
any more. But for now, we buffer.
The no-fields flavor is fine with something like 7500 matches per incoming
row. That's quite a lot, really.
The 1MB string is trouble! We circuit break properly, which is great and
safe, but if you join 1MB worth of columns in LOOKUP you are going to
need bigger heaps than our test. Again, we'll move from buffering these
results to streaming them and it'll work better, but for now we buffer.
Adds non-grouping support for min, max, sum, and count, using
CompositeBlock as the underlying block type and an internal
FromAggregateMetricDouble function to handle converting from
CompositeBlock to the correct metric subfields.
Closes #110649
* Support ignore_above for keywords in test data generation (#119416)
(cherry picked from commit d3f2956116)
* Update DefaultMappingParametersHandler.java
A new query parameter, `?include_source_on_error`, was added for the create/index,
update and bulk REST APIs to control whether to include the document source
in the error response in case of parsing errors. The default value is `true`.
Relates to ES-9186.
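A hedged usage sketch with the low-level Java REST client (index name and body are made up), given a `RestClient client`:
```java
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;

// Sketch: suppress echoing the document source back on a parsing error.
Request request = new Request("POST", "/my-index/_doc");
request.addParameter("include_source_on_error", "false"); // default is true
request.setJsonEntity("{\"@timestamp\": \"not-a-timestamp\"}");
Response response = client.performRequest(request);
// If parsing fails, the error response no longer contains the offending source.
```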
* Enable queryable built-in roles feature by default (#120323)
This makes the `es.queryable_built_in_roles_enabled` feature flag enabled by default.
This feature makes the built-in roles automatically indexed in the `.security` index and available
for querying via the Query Role API. A consequence of this is that the `.security` index is now
created eagerly (if it doesn't exist) on cluster formation.
In order to keep the scope of this PR small, the feature is disabled for some of the tests,
because they are either non-trivial to adjust or the gain is not worth the effort right now.
The tests will be adjusted in a follow-up PR, and later the flag will be removed completely.
Relates to #117581
(cherry picked from commit 52e0f21bdd)
# Conflicts:
# modules/dot-prefix-validation/build.gradle
# test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java
# x-pack/plugin/security/src/internalClusterTest/java/org/elasticsearch/xpack/security/authc/esnative/ReservedRealmElasticAutoconfigIntegTests.java
* Update InternalTestCluster.java
remove a line that snuck in while resolving merge conflicts
* Update build.gradle
fix build.gradle
* Update build.gradle
fix build.gradle by removing an invalid task
* remove non-existent timeout parameter on the 8.x branch
This patch adds a property to `CountedKeywordMapper` to track the
`synthetic_source_keep` index setting. This property is then used to properly
implement synthetic source support in the `counted_keyword` field type, with
fallback to the `ignore_source` mechanism when `synthetic_source_keep` is set
in either the field mapping or the index settings.
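A hedged sketch of that decision (names are illustrative; the actual mapper wiring differs):
```java
// Sketch: fall back to storing ignored source when synthetic_source_keep is
// set to "all" on the field, or inherited from the index-level setting.
boolean fallbackToIgnoredSource =
    fieldSourceKeepMode == SourceKeepMode.ALL
        || (fieldSourceKeepMode == null && indexSourceKeepMode == SourceKeepMode.ALL);
```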