Commit graph

17076 commits

Author SHA1 Message Date
Nik Everett
3263429a78
ESQL: Speed up VALUES for many buckets (#123073) (#123229)
* ESQL: Speed up VALUES for many buckets (#123073)

Speeds up the VALUES agg when collecting from many buckets.
Specifically, this speeds up the algorithm used to `finish` the
aggregation. Most specifically, this makes the algorithm more tolerant
of large numbers of groups being collected. The old algorithm was
`O(n^2)` with the number of groups. The new one is `O(n)`.

```
(groups)
      1     219.683 ±    1.069  ->   223.477 ±    1.990 ms/op
   1000     426.323 ±   75.963  ->   463.670 ±    7.275 ms/op
 100000   36690.871 ± 4656.350  ->  7800.332 ± 2775.869 ms/op
 200000   89422.113 ± 2972.606  -> 21920.288 ± 3427.962 ms/op
 400000 timed out at 10 minutes -> 40051.524 ± 2011.706 ms/op
```

The `1` group version was not changed at all. That's just noise in the
measurement. The small bump in the `1000` case is almost certainly worth
it and real. The huge drop in the `100000` case is quite real.
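
The message does not describe the algorithm itself, so the following is only an illustrative sketch of how a `finish` step can go from quadratic to linear in the number of groups. It assumes the collected data is a flat list of (group id, value) pairs, and it ignores the deduplication that VALUES performs. The class and method names are hypothetical and are not taken from the Elasticsearch code.

```java
final class GroupedValues {
    /** Quadratic shape: every group rescans all collected pairs. */
    static int[][] finishByRescanning(int[] groupIds, int[] values, int groupCount) {
        int[][] result = new int[groupCount][];
        for (int g = 0; g < groupCount; g++) {
            int size = 0;
            for (int id : groupIds) {
                if (id == g) {
                    size++;
                }
            }
            int[] out = new int[size];
            int next = 0;
            for (int i = 0; i < groupIds.length; i++) {
                if (groupIds[i] == g) {
                    out[next++] = values[i];
                }
            }
            result[g] = out;
        }
        return result;
    }

    /** Linear shape: count per group first, then place each pair exactly once. */
    static int[][] finishByCounting(int[] groupIds, int[] values, int groupCount) {
        int[] counts = new int[groupCount];
        for (int id : groupIds) {
            counts[id]++;
        }
        int[][] result = new int[groupCount][];
        for (int g = 0; g < groupCount; g++) {
            result[g] = new int[counts[g]];
        }
        int[] next = new int[groupCount]; // next free slot per group
        for (int i = 0; i < groupIds.length; i++) {
            int g = groupIds[i];
            result[g][next[g]++] = values[i];
        }
        return result;
    }
}
```

Both methods return the same per-group arrays. The first does work proportional to groups times pairs, the second to groups plus pairs.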

* Fix

* Compile
2025-02-27 07:35:57 +11:00
Ioana Tagirta
e40319c7a0
Remove references to doc types in percolator docs (#123508) (#123529) 2025-02-27 03:26:57 +11:00
David Turner
19402e2c68
Reduce licence checks in LicensedWriteLoadForecaster (#123369) (#123408)
Rather than checking the license (updating the usage map) on every
single shard, just do it once at the start of a computation that needs
to forecast write loads.
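
As a rough illustration of the change described above, the sketch below hoists a check out of a per-shard loop. `WriteLoadForecastSketch`, its methods, and the `BooleanSupplier` standing in for the license check are assumptions made for this example; they are not the actual `LicensedWriteLoadForecaster` API.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

final class WriteLoadForecastSketch {
    /** Before: the (hypothetical) license check runs once per shard. */
    static double totalCheckingPerShard(List<Double> shardWriteLoads, BooleanSupplier licenseCheck) {
        double total = 0;
        for (double load : shardWriteLoads) {
            if (licenseCheck.getAsBoolean()) {
                total += load;
            }
        }
        return total;
    }

    /** After: the check runs once at the start of the whole computation. */
    static double totalCheckingOnce(List<Double> shardWriteLoads, BooleanSupplier licenseCheck) {
        if (!licenseCheck.getAsBoolean()) {
            return 0;
        }
        double total = 0;
        for (double load : shardWriteLoads) {
            total += load;
        }
        return total;
    }
}
```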

Backport of #123346 to 8.x
Closes #123247
2025-02-26 06:59:14 +11:00
Joe Gallo
b8f8723e6c
Register IngestGeoIpMetadata as a NamedXContent (#123079) (#123329) 2025-02-25 12:53:21 +11:00
Nik Everett
e12d7775e7
ESQL: Add known issue for slow VALUES (#123222)
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2025-02-24 16:41:09 +00:00
David Turner
cc3c3870ec
Deduplicate allocation stats calls (#123267) (#123280)
These things can be quite expensive and there's no need to recompute
them in parallel across all management threads as done today. This
commit adds a deduplicator to avoid redundant work.
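
One common way to build such a deduplicator is a "single flight" wrapper: callers that arrive while a computation is in progress share its result instead of starting another one. The sketch below is a minimal version of that idea under stated assumptions; `SingleFlight` and `execute` are hypothetical names, and the commit's actual implementation may differ.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class SingleFlight<T> {
    // Holds the shared future while a computation is running, null otherwise.
    private final AtomicReference<CompletableFuture<T>> inFlight = new AtomicReference<>();

    CompletableFuture<T> execute(Supplier<CompletableFuture<T>> computation) {
        while (true) {
            CompletableFuture<T> current = inFlight.get();
            if (current != null) {
                return current; // join the computation that is already running
            }
            CompletableFuture<T> shared = new CompletableFuture<>();
            if (inFlight.compareAndSet(null, shared)) {
                CompletableFuture<T> source;
                try {
                    source = computation.get();
                } catch (RuntimeException e) {
                    inFlight.compareAndSet(shared, null);
                    shared.completeExceptionally(e);
                    return shared;
                }
                source.whenComplete((value, error) -> {
                    // Free the slot first so later callers trigger a fresh computation.
                    inFlight.compareAndSet(shared, null);
                    if (error != null) {
                        shared.completeExceptionally(error);
                    } else {
                        shared.complete(value);
                    }
                });
                return shared;
            }
            // Lost the race to another caller: loop and join its future.
        }
    }
}
```

Results are not cached: once a computation completes, the next call starts a new one, so callers only ever share work that overlaps in time.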

Backport of #123246 to `8.x`
2025-02-25 03:33:42 +11:00
Oleksandr Kolomiiets
9cc75734d0
fix stale data in synthetic source for string stored field (#123105) (#123277)
Co-authored-by: jeffganmr <106223805+jeffganmr@users.noreply.github.com>
2025-02-25 03:26:32 +11:00
Johannes Fredén
33f973ba70
[8.16] Bump json-smart and oauth2-oidc-sdk (#122737) (#122915)
* Bump json-smart and oauth2-oidc-sdk (#122737)

* Bump json-smart and oauth2-oidc-sdk

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
(cherry picked from commit e16664573e)

# Conflicts:
#	gradle/verification-metadata.xml

* fixup! Add back verification data for test dep
2025-02-19 09:54:53 +01:00
Felix Barnsteiner
bfd77c9485
Add _metric_names_hash field to OTel metric mappings (#120952) (#122881)
If metrics that have the same timestamp and dimensions aren't grouped into the same document, ES will consider them to be a duplicate.
The _metric_names_hash field will be set by the OTel ES exporter.
As it's mapped as a time_series_dimension, it creates a different _tsid for documents with different sets of metrics.
The tradeoff is that if the composition of the metrics grouping changes over time, a different _tsid will be created.
That has an impact on the rate aggregation for counters.
2025-02-19 05:40:06 +11:00
Mike Pellegrini
4d408d4591
[8.16] Fix ArrayIndexOutOfBoundsException in ShardBulkInferenceActionFilter (#122538) (#122854)
* Fix ArrayIndexOutOfBoundsException in ShardBulkInferenceActionFilter (#122538)

(cherry picked from commit 229d392e63)

# Conflicts:
#	x-pack/plugin/inference/src/internalClusterTest/java/org/elasticsearch/xpack/inference/action/filter/ShardBulkInferenceActionFilterIT.java

* Fix compilation & test failures
2025-02-19 02:26:23 +11:00
Joe Gallo
a55e76936c
Fix redact processor arraycopy bug (#122640) (#122767) 2025-02-18 03:21:45 +11:00
Johannes Fredén
4f9c33f546
Improve jwt logging on failed auth (#122247) (#122784)
Update docs/changelog/122247.yaml
2025-02-18 03:18:57 +11:00
Joe Gallo
4b338f88ae
Canonicalize processor names and types in IngestStats (#122610) (#122633) 2025-02-15 05:38:06 +11:00
Ignacio Vera
963f2556e9
Deduplicate IngestStats and IngestStats.Stats identity records when deserializing (#122496) (#122516)
This commit makes sure we reuse the existing static instance when deserializing to avoid excessive heap usage.
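
The idea can be sketched as a factory that hands back one shared instance whenever the deserialized values equal the identity (all-zero) stats. `StatsSketch`, its two fields, and the `ByteBuffer`-based wire format below are assumptions for illustration only; they do not mirror the real `IngestStats.Stats` record or Elasticsearch's stream classes.

```java
import java.nio.ByteBuffer;

final class StatsSketch {
    /** Single shared instance for "nothing recorded yet". */
    static final StatsSketch IDENTITY = new StatsSketch(0, 0);

    final long count;
    final long timeInMillis;

    private StatsSketch(long count, long timeInMillis) {
        this.count = count;
        this.timeInMillis = timeInMillis;
    }

    /** Returns the shared IDENTITY instead of allocating a new all-zero object. */
    static StatsSketch of(long count, long timeInMillis) {
        if (count == 0 && timeInMillis == 0) {
            return IDENTITY;
        }
        return new StatsSketch(count, timeInMillis);
    }

    /** Deserialization goes through the same factory, so zeros are never duplicated. */
    static StatsSketch readFrom(ByteBuffer in) {
        long count = in.getLong();
        long timeInMillis = in.getLong();
        return of(count, timeInMillis);
    }
}
```

When many deserialized entries carry all-zero stats, they all point at the same object rather than each holding its own copy.
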
# Conflicts:
#	server/src/main/java/org/elasticsearch/ingest/IngestStats.java
2025-02-13 18:36:56 +01:00
elasticsearchmachine
c0da9daf91 Prune changelogs after 8.16.4 release 2025-02-11 20:19:19 +00:00
elasticsearchmachine
8350b129ee Finalize release notes for v8.16.4 2025-02-12 06:00:19 +11:00
Luigi Dell'Aquila
622c3c924d
EQL: fix JOIN command validation (not supported) (#122011) (#122172) 2025-02-11 01:23:37 +11:00
elasticsearchmachine
17baef4d53
Update docs for v8.16.4 release (#122106) 2025-02-10 11:33:56 +01:00
Luigi Dell'Aquila
e1176cdfce
ES|QL: fix ENRICH validation for use of wildcards (#121911) (#122020) 2025-02-07 23:46:35 +11:00
Mark Tozzi
cf36d97a32
Aggregations cancellation after collection (#120944) (#121936)
This PR addresses issues around aggregations cancellation, mentioned in https://github.com/elastic/elasticsearch/issues/108701 and other places. In brief, during aggregations collection time, we respect cancellation via the mechanisms in the searcher to poison cancelled queries. But once the aggregation finishes collection, there is no further need to interact with the searcher, so we cannot rely on that for cancellation checking. In particular, deeply nested aggregations can spend a long time constructing the results tree.

Checking for cancellation is a trade-off, as the check itself is somewhat expensive (it involves a volatile read), so we want to balance checking often enough that cancelled queries aren't taking up resources for a long time, but not so frequently that it slows down most aggregation queries. Our first attempt at this is to check once when we go to build sub-aggregations, as the worst cases for this that we've seen involve needing to build deep sub-aggregation trees. Checking at sub-aggregation construction time also provides a conveniently centralized method call to add the check to.
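
A minimal sketch of that placement, assuming a simplified tree of aggregation nodes: the volatile flag is read once each time sub-aggregations are about to be built, not once per bucket or per document. `AggNode`, `ResultTreeBuilder`, and the use of `CancellationException` are hypothetical stand-ins, not the classes touched by this PR.

```java
import java.util.List;
import java.util.concurrent.CancellationException;

final class AggNode {
    final List<AggNode> subAggregations;

    AggNode(List<AggNode> subAggregations) {
        this.subAggregations = subAggregations;
    }
}

final class ResultTreeBuilder {
    private volatile boolean cancelled; // set from another thread when the task is cancelled

    void cancel() {
        cancelled = true;
    }

    /** Builds results for a node and everything below it; returns the number of nodes built. */
    int build(AggNode node) {
        return 1 + buildSubAggregations(node);
    }

    /** The one centralized place where cancellation is checked. */
    private int buildSubAggregations(AggNode node) {
        if (cancelled) { // one volatile read per sub-aggregation build
            throw new CancellationException("aggregation cancelled");
        }
        int built = 0;
        for (AggNode sub : node.subAggregations) {
            built += build(sub);
        }
        return built;
    }
}
```

A deep tree hits the check at every level, so a cancelled request stops soon after the flag is set, while a shallow aggregation pays for only a handful of reads.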

---------

# Conflicts:
#	server/src/main/java/org/elasticsearch/search/aggregations/bucket/BucketsAggregator.java
#	test/framework/src/main/java/org/elasticsearch/search/aggregations/AggregatorTestCase.java

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-02-07 06:51:21 +11:00
Andrei Stefan
bb77d4979e
ESQL: use field_caps native nested fields filtering (#121918)
* [8.x] ESQL: use field_caps native nested fields filtering (#117201) (#117375) (#121645)

* Just filter the nested fields natively with field_caps support

(cherry picked from commit 73381dbeb1)

* Add import
2025-02-06 19:39:53 +02:00
Oleksandr Kolomiiets
28635f09d8
[8.16] Fix synthetic source issue with deeply nested ignored source fields (#121715) (#121790)
* Fix synthetic source issue with deeply nested ignored source fields (#121715)

* Fix synthetic source issue with deeply nested ignored source fields

* Update docs/changelog/121715.yaml

* fix tests
2025-02-06 07:13:24 +11:00
Joe Gallo
24c39085ca
Update geolocation database documentation (#121472) (#121671) 2025-02-05 02:22:49 +11:00
Simon Cooper
9fa215a68f
[8.16] Update transport and index version id numbers to S_PP (#121380) (#121523)
Backport #121380 to 8.16
2025-02-03 13:56:48 +00:00
David Turner
12a39baef2
Cheaper snapshot-related toString() impls (#121283) (#121308)
If the `MasterService` needs to log a create-snapshot task description
then it will call `CreateSnapshotTask#toString`, which today calls
`RepositoryData#toString` which is not overridden so ends up calling
`RepositoryData#hashCode`. This can be extraordinarily expensive in a
large repository. Worse, if there are masses of create-snapshot tasks to
execute then it'll do this repeatedly, because each one only ends up
yielding a short hex string so we don't reach the description length
limit very easily.

With this commit we provide a more efficient implementation of
`CreateSnapshotTask#toString` and also override
`RepositoryData#toString` to protect against some other caller running
into the same issue.
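
The underlying Java behaviour is that the default `Object#toString` returns the class name followed by `@` and the hex form of `hashCode()`, so an expensive `hashCode` is paid on every description. The sketch below demonstrates this with hypothetical classes (`LargeData`, `DescribedLargeData`); they only stand in for `RepositoryData` and are not its real implementation.

```java
import java.util.Arrays;

class LargeData {
    final int[] contents;
    int hashCodeCalls; // counts how often the expensive hash is computed

    LargeData(int[] contents) {
        this.contents = contents;
    }

    @Override
    public int hashCode() {
        hashCodeCalls++;
        return Arrays.hashCode(contents); // walks the whole structure
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof LargeData && Arrays.equals(contents, ((LargeData) other).contents);
    }
    // No toString override: Object#toString will call hashCode().
}

final class DescribedLargeData extends LargeData {
    DescribedLargeData(int[] contents) {
        super(contents);
    }

    @Override
    public String toString() {
        // Cheap description that never touches hashCode().
        return "LargeData[" + contents.length + " entries]";
    }
}
```

Calling `toString()` on a `LargeData` increments `hashCodeCalls`; calling it on a `DescribedLargeData` does not.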
2025-01-31 04:09:56 +11:00
Liam Thompson
13441bc9b1
Update recovery.asciidoc (#114889) (#121218)
(cherry picked from commit d8874b6524)

Co-authored-by: Paulo <paulletilly@gmail.com>
2025-01-30 04:45:20 +11:00
Liam Thompson
7e736e0def
[DOCS] Update getting-started.asciidoc (#116151) (#121172)
Update `new_field` to `language` which is the actual new field added in dynamic mapping

Co-authored-by: Ekwinder <ekwindersaini@gmail.com>
2025-01-30 00:51:21 +11:00
Valeriy Khakhutskyy
1538e0d29e
Extend documentation note. (#121146) (#121160) 2025-01-29 23:30:26 +11:00
István Zoltán Szabó
a201f549d2
[8.16] [DOCS] Documents that deployment_id can be used as inference_id in certain cases. (#121055) (#121072)
* [DOCS] Resolves conflict.

* Apply suggestions from code review
2025-01-28 21:56:22 +01:00
István Zoltán Szabó
cfddc26697
[DOCS] Resolves conflict. (#121069) 2025-01-28 21:07:05 +01:00
George Wallace
e7be978b3a
Adjusted alias doc for clarity (#120437) (#121063)
Co-authored-by: Kofi B <kofi.bartlett@elastic.co>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2025-01-29 03:51:38 +11:00
Panagiotis Bailis
c5a57fc690
[8.16] backporting fix for negative scores in text_similarity_ranker retriever (#121056) 2025-01-28 18:30:16 +02:00
Carlos Delgado
97c4bdca28
Fix incorrect use of "updateable" flag in synonyms documentation (#120866) (#121044)
Co-authored-by: Amine GANI <gani.amine@gmail.com>
Co-authored-by: Amine GANI <amine.gani@adelean.com>
2025-01-29 02:07:26 +11:00
Charlotte Hoblik
5d00b7e8cc
Fix typo in tutorial (#120928) (#121040) 2025-01-29 01:36:28 +11:00
Liam Thompson
80039d6d25
Update match-phrase-query.asciidoc (#118828) (#121035)
(cherry picked from commit 8e9cccba6a)

Co-authored-by: Damien RENIER <153135842+damien-renier-elastic@users.noreply.github.com>
2025-01-29 01:10:09 +11:00
Liam Thompson
abad04d97a
Update README.asciidoc (#96455) (#121027)
Co-authored-by: ARPIT SHARMA <93235104+ARPIT2128@users.noreply.github.com>
2025-01-28 15:01:01 +01:00
Pius Fung
e1c635b336
Add warning on scripted metric aggregation's intermediate state memory usage (#119379) (#121003) 2025-01-28 21:39:26 +11:00
Sean Story
46361e4d70
Clarify need to submit for authorization (#119460) (#121002) 2025-01-28 21:34:12 +11:00
Maxim Kholod
7fbe99db8a
Update index-templates.asciidoc (#113461) (#120893)
Adding `security_solution-*-*` to the list of index names to avoid pattern collisions.

(cherry picked from commit 0638d3977a)

Co-authored-by: Smriti <152067238+smriti0321@users.noreply.github.com>
2025-01-27 12:30:07 +01:00
Aurélien FOUCRET
12ea3b2f64
[8.16] LTR - Fix explain failure when index has multiple shards (#120717) (#120794)
* LTR - Fix explain failure when index has multiple shards  (#120717)

* Fix test failing in 8.x branch.
2025-01-24 23:21:43 +01:00
Aurélien FOUCRET
149fbf215f
LTR sometimes throws NullPointerException: Cannot read field "approximation" because "top" is null (#120809) (#120827)
* Add check on the DisiPriorityQueue size.

* Update docs/changelog/120809.yaml

* Add a unit test.
2025-01-25 06:15:42 +11:00
Niels Bauman
8adafb01d7
[8.16] Improve memory aspects of enrich cache (#120256) (#120762)
* Improve memory aspects of enrich cache (#120256)

This commit reduces the occupied heap space of the enrich cache and
corrects inaccuracies in tracking the occupied heap space (for cache
size limitation purposes).

---------

Co-authored-by: Joe Gallo <joegallo@gmail.com>

* Fix compilation

---------

Co-authored-by: Joe Gallo <joegallo@gmail.com>
2025-01-24 16:18:14 +11:00
Liam Thompson
8f58b770c3
Removes outdated admonition (#120556) (#120705)
Resolves https://github.com/elastic/security-docs/issues/6430. Removes an outdated admonition.

(cherry picked from commit 63074d8e70)

Co-authored-by: Benjamin Ironside Goldstein <91905639+benironside@users.noreply.github.com>
2025-01-23 23:42:35 +11:00
Marci W
ce90795b2d
[DOCS] Count API: clarify ways to specify search query (#120564) (#120681)
* Clarify query methods; other sprucing

* Apply suggestions from review
2025-01-23 10:31:10 +11:00
Andrei Stefan
faeeb31822
Update search-across-clusters.asciidoc to reflect the true default value of skip_unavailable setting. (#120592) (#120634) 2025-01-23 01:36:51 +11:00
Felix Barnsteiner
ae7ae7b9e4
Map scope.name as a dimension (#120590) (#120615) 2025-01-23 00:00:12 +11:00
elasticsearchmachine
24ff286e59 Finalize release notes for v8.16.3 2025-01-22 22:30:08 +11:00
elasticsearchmachine
943c61e335 Prune changelogs after 8.16.3 release 2025-01-21 16:32:28 +00:00
István Zoltán Szabó
2f1c2f82d4
[8.16] [DOCS] Rename inference services to inference integrations in docs (#120517)
Co-authored-by: David Kyle <david.kyle@elastic.co>
2025-01-21 12:31:31 +01:00
Liam Thompson
ee18ffe583
[DOCS] Updated wording for clarity for new users (#120257) (#120506)
Co-authored-by: Kofi B <kofi.bartlett@elastic.co>
2025-01-21 20:30:37 +11:00