* ESQL: Speed up VALUES for many buckets (#123073)
Speeds up the VALUES agg when collecting from many buckets.
Specifically, this speeds up the algorithm used to `finish` the
aggregation. Most specifically, this makes the algorithm more tollerant
to large numbers of groups being collected. The old algorithm was
`O(n^2)` with the number of groups. The new one is `O(n)`
```
(groups)
1 219.683 ± 1.069 -> 223.477 ± 1.990 ms/op
1000 426.323 ± 75.963 -> 463.670 ± 7.275 ms/op
100000 36690.871 ± 4656.350 -> 7800.332 ± 2775.869 ms/op
200000 89422.113 ± 2972.606 -> 21920.288 ± 3427.962 ms/op
400000 timed out at 10 minutes -> 40051.524 ± 2011.706 ms/op
```
The `1` group version was not changed at all. That's just noise in the
measurement. The small bump in the `1000` case is almost certainly worth
it and real. The huge drop in the `100000` case is quite real.
* Fix
* Compile
Rather than checking the license (updating the usage map) on every
single shard, just do it once at the start of a computation that needs
to forecast write loads.
Backport of #123346 to 8.x
Closes#123247
These things can be quite expensive and there's no need to recompute
them in parallel across all management threads as done today. This
commit adds a deduplicator to avoid redundant work.
Backport of #123246 to `8.x`
* Bump json-smart and oauth2-oidc-sdk (#122737)
* Bump json-smart and oauth2-oidc-sdk
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
(cherry picked from commit e16664573e)
# Conflicts:
# gradle/verification-metadata.xml
* fixup! Add back verification data for test dep
If metrics that have the same timestamp and dimensions aren't grouped into the same document, ES will consider them to be a duplicate.
The _metric_names_hash field will be set by the OTel ES exporter.
As it's mapped as a time_series_dimensions, it creates a different _tsid for documents with different sets of metrics.
The tradeoff is that if the composition of the metrics grouping changes over time, a different _tsid will be created.
That has an impact on the rate aggregation for counters.
This commit makes sure we reuse the existing static instance when deserializing to avoid excessive heap usage.
# Conflicts:
# server/src/main/java/org/elasticsearch/ingest/IngestStats.java
This PR addresses issues around aggregations cancellation, mentioned in https://github.com/elastic/elasticsearch/issues/108701 and other places. In brief, during aggregations collection time, we respect cancellation via the mechanisms in the searcher to poison cancelled queries. But once the aggregation finishes collection, there is no further need to interact with the searcher, so we cannot rely on that for cancellation checking. In particular, deeply nested aggregations can spend a long time constructing the results tree.
Checking for cancellation is a trade off, as the check itself is somewhat expensive (it involves a volatile read), so we want to balance checking often enough that cancelled queries aren't taking up resources for a long time, but not so frequently that it slows down most aggregation queries. Our first attempt to this is to check once when we go to build sub-aggregations, as the worst cases for this that we've seen involve needing to build deep sub-aggregation trees. Checking at sub-aggregation construction time also provides a conveniently centralized method call to add the check to.
---------
Conflicts:
server/src/main/java/org/elasticsearch/search/aggregations/bucket/BucketsAggregator.java
test/framework/src/main/java/org/elasticsearch/search/aggregations/AggregatorTestCase.java
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
* [8.x] ESQL: use field_caps native nested fields filtering (#117201) (#117375) (#121645)
* Just filter the nested fields natively with field_caps support
(cherry picked from commit 73381dbeb1)
* Add import
If the `MasterService` needs to log a create-snapshot task description
then it will call `CreateSnapshotTask#toString`, which today calls
`RepositoryData#toString` which is not overridden so ends up calling
`RepositoryData#hashCode`. This can be extraordinarily expensive in a
large repository. Worse, if there's masses of create-snapshot tasks to
execute then it'll do this repeatedly, because each one only ends up
yielding a short hex string so we don't reach the description length
limit very easily.
With this commit we provide a more efficient implementation of
`CreateSnapshotTask#toString` and also override
`RepositoryData#toString` to protect against some other caller running
into the same issue.
Adding `security_solution-*-*` in list of index nae to avoid the pattern collisions.
(cherry picked from commit 0638d3977a)
Co-authored-by: Smriti <152067238+smriti0321@users.noreply.github.com>
* Improve memory aspects of enrich cache (#120256)
This commit reduces the occupied heap space of the enrich cache and
corrects inaccuracies in tracking the occupied heap space (for cache
size limitation purposes).
---------
Co-authored-by: Joe Gallo <joegallo@gmail.com>
* Fix compilation
---------
Co-authored-by: Joe Gallo <joegallo@gmail.com>