This updates the Gradle wrapper to 8.12.
We addressed deprecation warnings caused by the update, including:
- Fix change in the TestOutputEvent API
- Fix deprecations in Groovy syntax
- Use latest ospackage plugin containing our fix
- Remove project usages at execution time
- Fix deprecated project references in repository-old-versions
(cherry picked from commit ba61f8c7f7)
* Propagate scoring function through random sampler.
* Update docs/changelog/116957.yaml
* Correct score mode in random sampler weight
* Fix random sampling with scores and p=1.0
* Unit test with scores
* YAML test
* Add capability
Today the overloads of `XContentBuilder#timeField` do two rather
different things: one formats an object as a `String` representation of
a time (where the object is either an unambiguous time object or else a
`long`) and the other formats only a `long` as one or two fields
depending on the `?human` flag.
This is trappy in a number of ways:
- `long` means an absolute (epoch) time, but sometimes folks will
mistakenly use this for time intervals too.
- `long` means only milliseconds; there is no facility to specify a
different unit.
- the dependence on the `?human` flag in exactly one of the overloads is
kinda weird.
This commit removes the confusion by dropping support for considering a
`Long` as a valid representation of a time at all, and instead requiring
callers to either convert it into a proper time object or else call a
method that is explicitly expecting an epoch time in milliseconds.
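A minimal before/after sketch of what this means for callers; the method name on the explicit-epoch side is an illustrative assumption, not necessarily the real new API:

```java
// Before: a raw long was accepted as a time value, which was easy to misuse.
builder.timeField("start_time", startTimeMillis);

// After: either convert to a proper time object...
builder.timeField("start_time", Instant.ofEpochMilli(startTimeMillis));
// ...or call a method that is explicit about epoch milliseconds
// ("timestampField" is a hypothetical name for illustration).
builder.timestampField("start_time_millis", startTimeMillis);
```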
It's in the title: lots of duplication here that deserves cleanup
in isolation. Also, bucket instances are a perpetual source of
memory consumption in aggs. There are lots of possible improvements
we can make to reduce their footprint; drying up this code enables
cleaner PRs for those improvements.
* Add an override to the aggs tests to override the allow list default setting. This makes it possible to run the scripted metric aggs tests on Serverless, even though we disallow these aggs by default on Serverless.
* Move the allow list tests next to the scripted metric tests, since these belong together.
After running the elastic/logs track with logs index mode enabled, I noticed that _source was still getting stored.
The issue was that index modes other than time_series weren't propagated to the IndexMetadata and IndexSettings classes. Additionally, the synthetic source defaults in SourceFieldMapper were geared towards the time series index mode only. This change addresses both issues.
Time series dimensions are by definition single-valued fields. Therefore let's take advantage of that property in time-series
aggregation and stop trying to iterate over dimension doc values. This change might bring better performance.
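A rough sketch of the shape of this optimization against Lucene's doc values API (field and variable names are assumptions):

```java
// Dimensions are single-valued, so read exactly one value per matching doc
// instead of looping over docValueCount() entries.
SortedNumericDocValues dim = DocValues.getSortedNumeric(ctx.reader(), "host.id"); // assumed field
if (dim.advanceExact(doc)) {
    long value = dim.nextValue(); // single-valued: the first value is the only value
    // ... collect value ...
}
```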
We want to validate stats formatting before we serialize to XContent, as chunked x-content serialization
assumes that we don't throw exceptions at that point. It is not necessary to do it in the StreamInput constructor,
as that one deserializes from an already-validated object.
This commit adds stats formatting validation to the standard InternalStats constructor.
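A minimal sketch of the idea, with assumed helper and parameter names; the point is that formatting failures surface at construction time rather than mid-serialization:

```java
// Format each stat once up front; if the DocValueFormat cannot handle these
// values we fail here, not while writing chunked XContent.
private static void verifyFormattingStats(DocValueFormat format, double min, double max, double avg, double sum) {
    if (format != DocValueFormat.RAW) {
        format.format(min);
        format.format(max);
        format.format(avg);
        format.format(sum);
    }
}
```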
During aggregation collection, we use BigArrays to hold the values in a compact way for metrics aggregations.
We are currently resizing those arrays whenever the collect method is called, regardless of whether there is
an actual value in the provided doc.
This can be wasteful for sparse fields, as we might never see a value yet we still resize those arrays.
Therefore this commit moves the resize to after checking that there is a value in the provided document.
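A minimal sketch of the reordering (the collector shape and field names are assumptions):

```java
@Override
public void collect(int doc, long bucket) throws IOException {
    // Check for a value first; only then grow the backing BigArrays storage.
    if (values.advanceExact(doc)) {
        sums = bigArrays.grow(sums, bucket + 1); // resize only when we will write
        sums.increment(bucket, values.doubleValue());
    }
}
```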
To simplify the migration away from version-based skip checks in YAML specs,
this PR adds a synthetic version feature `gte_vX.Y.Z` for any version at or before 8.14.0.
New test specs for 8.14 or later are expected to use respective new cluster features,
or a test-only feature supplied via ESRestTestCase#createAdditionalFeatureSpecifications
where that is sufficient.
Some tests rely on the default number_of_shards being 1. This may not
hold if the default number_of_shards changes. This PR removes that
assumption in the tests by explicitly configuring the number_of_shards
to 1 at index creation time.
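A sketch of the fix pattern (index name illustrative):

```java
// Pin the shard count explicitly instead of relying on the cluster default.
createIndex("test-index", Settings.builder()
    .put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, 1)
    .build());
```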
Relates: #100171
Relates: ES-7911
After tsid hashing was introduced (#98023), the time series aggregator generates the tsid (from all dimension fields) instead of using the value from the _tsid field directly. This generation of the tsid happens for every time series, parent bucket and segment combination.
This change alters that by only generating the tsid once per time series and segment. This is done by just locally recording the current tsid.
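A minimal sketch of the local recording (names assumed):

```java
private BytesRef currentTsid;    // tsid recorded for the current segment
private int currentTsidOrd = -1;

BytesRef tsid(SortedDocValues tsidValues, int ord) throws IOException {
    if (ord != currentTsidOrd) { // regenerate only when the tsid actually changes
        currentTsidOrd = ord;
        currentTsid = BytesRef.deepCopyOf(tsidValues.lookupOrd(ord));
    }
    return currentTsid;
}
```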
* Use String.replace() instead of replaceAll() for non-regexp replacements
When arguments do not make use of regexp features, replace() is a more efficient option, especially the char variant.
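For example:

```java
String path = "a/b/c";
String r1 = path.replaceAll("/", "_"); // compiles "/" as a regex on every call
String r2 = path.replace("/", "_");    // literal replacement, no regex machinery
String r3 = path.replace('/', '_');    // char variant: the cheapest option
```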
We are building a list of InternalAggregations from a list of Buckets; therefore we can use an AbstractList to create the actual list and save some allocations.
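A sketch of the pattern (the bucket accessor name is an assumption); the view is computed lazily, so no intermediate ArrayList is allocated:

```java
List<InternalAggregations> aggregations = new AbstractList<>() {
    @Override
    public InternalAggregations get(int index) {
        return buckets.get(index).getAggregations(); // assumed accessor
    }

    @Override
    public int size() {
        return buckets.size();
    }
};
```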
A Lucene limitation on doc values for UTF-8 fields does not allow us to
write keyword fields whose size is larger than 32K. This limits our
ability to map more than a certain number of dimension fields for time
series indices. Before this change the tsid was created as a
concatenation of dimension field names and values into a keyword field.
To overcome this limitation we hash the tsid. This PR is intended to be
used as a draft to test different options.
Note that, as a side effect, this reduces the size of the tsid field as
a result of storing far less data when the tsid is hashed. However, we
expect tsid hashing to affect compression of doc values, resulting in a
larger storage footprint. The effect on query latency needs to be
evaluated too.
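A rough sketch of the hashing approach (helper names assumed; the actual encoding may differ):

```java
// Hash the concatenated dimension names and values so the stored _tsid is
// small and fixed-size, comfortably under Lucene's 32K doc-values limit.
BytesRef raw = catenateDimensions(dimensions); // assumed helper: names + values
MurmurHash3.Hash128 hash = MurmurHash3.hash128(raw.bytes, raw.offset, raw.length, 0, new MurmurHash3.Hash128());
// encode hash.h1 / hash.h2 into the keyword field instead of the raw catenation
```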
Resolves #93564
A number of aggregations that rely on deferred collection don't work
with the time series index searcher and will produce incorrect results.
These aggregation usages should fail. The documentation has been updated
to describe these limitations.
In the case of the multi terms aggregation, depth-first collection is
forcefully used when time series aggregation is involved. This behaviour
is in line with the terms aggregation.
`StreamInput#readMap()` is quite different from the other `readMap`
overloads, and pairs up with `StreamOutput#writeGenericMap`. This commit
renames it to avoid accidental misuse and so that the names line up
better between writer and reader.
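A sketch of the pairing after the rename (the new reader name here is assumed to mirror the writer):

```java
out.writeGenericMap(map);                      // writer
Map<String, Object> map = in.readGenericMap(); // reader, renamed from readMap()
```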
* Convert Collections.sort() to List.sort()
* Use Map.computeIfAbsent()
* Use primitive double over Double
* Replace some lambdas with method references.
* Replace indexed for loops with enhanced (for-each) loops.
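Representative before/after examples of these mechanical cleanups:

```java
List<String> names = new ArrayList<>(List.of("b", "a"));
Collections.sort(names); // before
names.sort(null);        // after: List.sort with natural ordering

Map<String, List<String>> groups = new HashMap<>();
groups.computeIfAbsent("k", k -> new ArrayList<>()).add("v"); // replaces containsKey/put/get

// before: indexed loop
for (int i = 0; i < names.size(); i++) {
    System.out.println(names.get(i));
}
// after: for-each
for (String name : names) {
    System.out.println(name);
}
```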