18066 commits

Author SHA1 Message Date
Jan Kuipers
bd1a638c03
ES|QL random sampling (#125570) 2025-04-24 01:48:07 +10:00
Chris Hegarty
19550a838f
Add dense vector off-heap stats to Node stats and Index stats APIs (#126704)
This change enhances the dense_vector section of the Nodes stats and Index stats APIs so that they report the desired size of off-heap memory for all indexed vectors. The dense_vector section of the Cluster stats API remains unchanged.

The retrieval mechanism and structure of the new stats are the same across all three stats APIs, but more fine-grained information is disclosed when moving from Cluster -> Node -> Index API.

For Node stats, we aggregate the total byte sizes for all vectors, categorised by the data type. For example:

"dense_vector" : {
  "value_count" : 5,
  "off_heap" : {
    "total_size_in_bytes" : 27,
    "total_veb_size_in_bytes" : 3,
    "total_vec_size_in_bytes" : 23,
    "total_veq_size_in_bytes" : 0,
    "total_vex_size_in_bytes" : 1
  }
}
For Index stats, the output is the same as for Node stats, with a per-field breakdown included. For example:

"dense_vector" : {
  "value_count" : 5,
  "off_heap" : {
    "total_size_in_bytes" : 27,
    "total_veb_size_in_bytes" : 3,
    "total_vec_size_in_bytes" : 23,
    "total_veq_size_in_bytes" : 0,
    "total_vex_size_in_bytes" : 1,
    "fielddata" : {
      "bar" : {
        "veb_size_in_bytes" : 3,
        "vec_size_in_bytes" : 14,
        "vex_size_in_bytes" : 1
      },
      "foo" : {
        "vec_size_in_bytes" : 9
      }
    }
  }
}
The implementation accesses the actual statistics through reflection. This will be removed completely once Lucene exposes these statistics, which is expected in Lucene 10.3.
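The relationship between the per-field breakdown and the node-level totals in the examples above can be sketched as a simple sum per file extension. This is only an illustration: `OffHeapTotals` and its methods are hypothetical names, not part of Elasticsearch, and the sketch says nothing about how the sizes are actually retrieved from Lucene.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of Elasticsearch. */
final class OffHeapTotals {
    /**
     * Sums per-field off-heap byte sizes by file extension (veb, vec, veq, vex).
     * Input shape: field name -> (extension -> size in bytes).
     * Extensions that no field reports are simply absent from the result.
     */
    static Map<String, Long> totalsByExtension(Map<String, Map<String, Long>> perField) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> sizes : perField.values()) {
            sizes.forEach((ext, bytes) -> totals.merge(ext, bytes, Long::sum));
        }
        return totals;
    }

    /** Adds up all per-extension totals into a single byte count. */
    static long grandTotal(Map<String, Long> totals) {
        return totals.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

Fed with the `bar` and `foo` entries from the Index stats example, the per-extension sums match the `total_*_size_in_bytes` values shown for Node stats.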
2025-04-23 15:04:44 +01:00
kanoshiou
00654c8046
ESQL: Retain aggregate when grouping (#126598) 2025-04-23 15:40:04 +02:00
Patrick Doyle
76c2ab58c0
Collapse transport versions (#127186)
* Initial TV collapse based on TransportVersions.json

* [CI] Auto commit changes from spotless

* Fix ByteSizeValueTests.

This test had been using the version INITIAL_ELASTICSEARCH_9_0 as an example of
a version that used the older ByteSizeValue transport format. Now that no such
version exists anymore, it doesn't make sense to substitute a version that uses
the new format!

* Tips for collapsing transport versions

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-04-23 09:04:08 -04:00
Nik Everett
ef0a177d3a
ESQL: Disable a bugged commit (#127199)
The PR #126641 has a bug with `!=`.
2025-04-23 07:37:15 -04:00
Liam Thompson
2c2e9a5266
[DOCS][ESQL] Cleanup and cross-reference LOOKUP JOIN reference and landing pages (#127215)
* [DOCS][ESQL] Cleanup and cross-reference LOOKUP JOIN reference and landing pages

**lookup-join.md (syntax reference)**:
- removed tip formatting for simpler direct link to landing page
- improved parameter formatting and descriptions
- fixed template variable from `{esql}` to `{{esql}}`

**esql-lookup-join.md (landing page)**:
- added "compare with enrich" section header
- simplified "how the command works" with clearer parameter explanation
- added code example in how it works section
- improved image alt text for accessibility
- organized example section with better context and SQL comparison
- added dropdown for sample tables to reduce visual clutter
- added "query" subheading for clearer organization
- included reference to additional examples in command reference
- removed excessive whitespace

* Improve example, add setup code

replaced abstract employee/language example with security monitoring use case
added setup instructions for creating test indices
included sample data loading via bulk api
new practical query example joining firewall logs with threat data
simplified results output showing threat detection scenario
added note about left-join behavior
improved code comments and structure
added required index.mode: lookup setting info
2025-04-23 13:22:42 +02:00
István Zoltán Szabó
1e7c6abaf6
[DOCS] Fixes formatting issue on dense vector reference page. (#127214) 2025-04-23 11:24:17 +02:00
Ahmed Khan
98a3719e46
Update elasticsearch-keystore.md with special character handling and echo command to enter the password. (#127135)
* Update elasticsearch-keystore.md

Customer needs document update for handling special characters and how we can use the echo command to enter the password.

* Update docs/reference/elasticsearch/command-line-tools/elasticsearch-keystore.md

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* Update docs/reference/elasticsearch/command-line-tools/elasticsearch-keystore.md

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* Update elasticsearch-keystore.md

Moving the section out of Examples as advised.

* Update docs/reference/elasticsearch/command-line-tools/elasticsearch-keystore.md

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* Update docs/reference/elasticsearch/command-line-tools/elasticsearch-keystore.md

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2025-04-23 09:40:38 +02:00
Nhat Nguyen
874a69727e
Emit ordinal output block for values aggregate (#127201)
Time-series aggregations rely heavily on the `values` aggregation for 
collecting grouping values. For example:

```
TS k8s | STATS max(rate(request)) BY host
```

is translated to:

```
TS k8s
| STATS rate(request), VALUES(host) BY _tsid
| STATS max(`rate(request)`) BY host=`VALUES(host)`
```

We might change how these are executed later, but for now, we need to 
optimize the `values` aggregation for `BytesRef`, especially in cases
with low cardinality. This change emits ordinal blocks as the output of
the `values` aggregation, allowing the second aggregation to execute
more efficiently. I will also open a PR to handle incoming ordinal
blocks for the `values` aggregation.
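To make the idea of an ordinal block concrete, here is a minimal sketch of ordinal encoding for low-cardinality strings. It is not the ESQL block API: `OrdinalEncoding` and `Encoded` are hypothetical names, and plain `String` stands in for `BytesRef`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of ordinal encoding; not the ESQL block API. */
final class OrdinalEncoding {
    /** A dictionary of distinct values plus one ordinal per input position. */
    record Encoded(List<String> dictionary, int[] ordinals) {}

    static Encoded encode(List<String> values) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> dictionary = new ArrayList<>();
        int[] ordinals = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            String value = values.get(i);
            Integer ord = seen.get(value);
            if (ord == null) {
                // First occurrence: assign the next free ordinal.
                ord = dictionary.size();
                seen.put(value, ord);
                dictionary.add(value);
            }
            ordinals[i] = ord;
        }
        return new Encoded(dictionary, ordinals);
    }
}
```

A downstream grouping step can then compare small integers instead of byte strings, which is the kind of saving the second aggregation benefits from when there are few distinct hosts.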
2025-04-22 22:30:39 -07:00
Nik Everett
b527e4b79e
ESQL: Push more ==s on text fields to lucene (#126641)
If you do:
```
| WHERE text_field == "cat"
```
we can't push to the text field because its search index is for
individual words. But most text fields have a `.keyword` sub field and
we *can* query its index. EXCEPT! It's normal for these fields to have
`ignore_above` in their mapping. In that case we don't push to the
field. Very sad.

With this change we can push down `==`, but only when the right hand
side is shorter than the `ignore_above`.

This has pretty much infinite speed gain. An example using a million
documents:
```
Before:  "took" : 391,
 After:  "took" :   4,
```

But this is going from totally un-indexed linear scans to totally
indexed. You can make the "Before" number as high as you want by loading
more data.
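The push-down rule described above can be sketched as a small predicate. This is an assumption-laden illustration, not the planner code: `KeywordPushdown` and `canPushEquals` are hypothetical names, length is measured in Java chars, and the strict comparison simply mirrors the "shorter than" wording.

```java
/** Hypothetical sketch of the push-down decision; names are illustrative only. */
final class KeywordPushdown {
    /**
     * @param rightHandSide the literal on the right of ==
     * @param ignoreAbove   the sub field's ignore_above, or null when it is not set
     * @return true when the comparison may be answered from the .keyword index
     */
    static boolean canPushEquals(String rightHandSide, Integer ignoreAbove) {
        if (ignoreAbove == null) {
            // Without ignore_above no values are left out of the index.
            return true;
        }
        // Values that exceed ignore_above are not indexed, so a long literal
        // could match documents that the index cannot see.
        return rightHandSide.length() < ignoreAbove;
    }
}
```

For instance, `canPushEquals("cat", 256)` allows the push-down, while a literal that is at least as long as `ignore_above` falls back to the unindexed path.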
2025-04-22 21:24:59 +02:00
Nik Everett
c2fdc06465
ESQL: Fix sneaky bug in single value query (#127146)
Fixes a sneaky bug in single value query that happens when run against
a `keyword` field that:
* Is defined on every document
* Contains the same number of distinct values as documents

The simplest way to reproduce this is to build a single shard index
with two documents:
```
{"a": "foo"}
{"a": ["foo", "bar"]}
```

I don't think this is particularly likely in production, but it's quite
likely in tests. Which is where I hit this - in the serverless tests we
index an index with four documents into three shards and two of the
documents look just like this. So about 1/3 of the time we triggered
this bug.

Mechanically this is triggered by the `SingleValueMatchQuery`
incorrectly rewriting itself to `MatchAll` in the scenario above. This
fixes that.
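A toy model of why the two conditions above are misleading, using the two-document reproduction. This is not the `SingleValueMatchQuery` logic; `SingleValueCheck` is a hypothetical class, and the "shortcut" is an assumption inferred from the trigger conditions listed in this message.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model; not the actual SingleValueMatchQuery rewrite. */
final class SingleValueCheck {
    /** Flawed shortcut: the field has as many distinct values as there are documents. */
    static boolean looksSingleValued(List<List<String>> docs) {
        Set<String> distinct = new HashSet<>();
        docs.forEach(distinct::addAll);
        return distinct.size() == docs.size();
    }

    /** Exact check: every document holds exactly one value. */
    static boolean isSingleValued(List<List<String>> docs) {
        return docs.stream().allMatch(values -> values.size() == 1);
    }
}
```

With the documents `["foo"]` and `["foo", "bar"]`, the shortcut sees two distinct values for two documents and reports single-valued, while the exact check correctly reports that one document is multi-valued.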
2025-04-22 14:23:50 -04:00
David Turner
a5f935a352
Fix shard size of initializing restored shard (#126783)
For shards being restored from a snapshot we use `SnapshotShardSizeInfo`
to track their sizes while they're unassigned, and then use
`ShardRouting#expectedShardSize` when they start to recover. However we
were incorrectly ignoring the `ShardRouting#expectedShardSize` value
when accounting for the movements of shards in the
`ClusterInfoSimulator`, which would sometimes cause us to assign more
shards to a node than its disk space should have allowed.

Closes #105331
2025-04-23 03:08:06 +10:00
Charlotte Hoblik
838bb0bbd7
fix superscript (#127147) 2025-04-22 18:48:15 +02:00
Carlos Delgado
4d4b962fd1
Synonyms API - Add refresh parameter to check synonyms index and reload analyzers (#126935)
* Add timeout to SynonymsManagementAPIService put synonyms

* Remove replicas 0, as that may impact serverless

* Add timeout to put synonyms action, fix tests

* Fix number of replicas

* Remove cluster.health checks for synonyms index

* Revert debugging

* Add integration test for timeouts

* Use TimeValue instead of an int

* Add YAML tests and REST API specs

* Fix a validation bug in put synonym rule

* Spotless

* Update docs/changelog/126314.yaml

* Remove unnecessary checks for null

* Fix equals / HashCode

* Checks that timeout is passed correctly to the check health method

* Use correctly the default timeout

* spotless

* Add monitor cluster privilege to internal synonyms user

* [CI] Auto commit changes from spotless

* Add capabilities to avoid failing on bwc tests

* Replace timeout for refresh param

* Add param to specs

* Add YAML tests

* Fix changelog

* [CI] Auto commit changes from spotless

* Use BWC serialization tests

* Fix bug in test parser

* Spotless

* Delete doesn't need reloading 🤦 removing it

* Revert "Delete doesn't need reloading 🤦 removing it"

This reverts commit 9c8e0b62be.

* [CI] Auto commit changes from spotless

* Fix refresh for delete synonym rule

* Fix tests

* Update docs/changelog/126935.yaml

* Add reload analyzers test

* reload_analyzers is not available on serverless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-04-22 17:23:06 +02:00
Ievgen Degtiarenko
3a6963afd4
Retry shard movements during ESQL query (#126653) 2025-04-22 13:15:15 +02:00
Lorenzo Dematté
bc94cc12c2
Add entitlements known issues (#127061)
Add 2 known issues with workarounds for Entitlements.
2025-04-22 09:57:32 +02:00
Julio
76f6006a42
Update Elasticsearch main with snapshot version of Lucene (#127125) 2025-04-22 00:25:08 +02:00
kanoshiou
65c350cff2
ESQL: Preserve single aggregate when all attributes are pruned (#126397)
* Avoid using `EmptyAttribute`
2025-04-21 14:17:16 -04:00
George Wallace
b98a4fa067
Fixing external link (#127114) 2025-04-21 17:57:48 +02:00
weizijun
d854b1c625
Bugfix: fixed scroll with knn query (#126035)
Although scrolling is not recommended for knn queries, it does work.
But I found a bug: when using scroll in a knn query, knn_score_doc is
lost in the query phase, which means the knn query does not work. In
addition, the behavior differs between directly querying the node where
the shard is located and querying the node over transport. It can be
reproduced on the local node. The cause is that the query phase uses the
previous ShardSearchRequest object stored before the dfs phase. But when
it runs on the local node, it doesn't go through the encode and decode
process, so the operation is correct. I wrote an IT to reproduce it and
fixed it by adding the new source to the LegacyReaderContext.
2025-04-21 01:55:59 +10:00
Craig Taverner
f6a05c6a7c
Support depthOffset in MD docs headings for nesting functions (#126984)
While this change appears subtle at this point, I am using this in a later PR that adds a lot more spatial functions, where nesting them in related groups like this looks much better.

The main impact of this is that the "On this page" navigator in the right panel of the docs will show the nesting.

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2025-04-19 11:28:05 +02:00
Kathleen DeRusso
e280aa5d50
Revert semantic_text model registry changes (#127075) 2025-04-18 18:36:33 -04:00
James Baiera
7b89f4d4a6
Add ability to redirect ingestion failures on data streams to a failure store (#126973)
Removes the feature flags and guards that prevent the new failure store functionality 
from operating in production runtimes.
2025-04-18 16:33:03 -04:00
Dianna Hohensee
72b4ed255b
Add to allocation architecture guide (#125328)
How master and data nodes communicate
about shard allocation
2025-04-18 14:56:27 -04:00
Joe Gallo
b46bee4e47
Correctly handle non-integers in nested paths in the remove processor (#127006) 2025-04-18 11:46:54 -04:00
Lorenzo Dematté
69f6520b0c
[Entitlements] Validation checks on paths (#126852)
With this PR we restrict the paths we allow access to, forbidding plugins from specifying or requesting entitlements for reading or writing to specific protected directories.

I added this validation to EntitlementInitialization, as I wanted to fail fast and this is the earliest occurrence where we have all we need: PathLookup to resolve relative paths, policies (for plugins, server, agents) and the Paths for the specific directories we want to protect.
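The kind of check described above can be sketched with `java.nio.file.Path`. This is a minimal illustration under assumptions: `ProtectedPathValidator` is a hypothetical class, `baseDir` stands in for whatever PathLookup resolves relative paths against, and the real validation in EntitlementInitialization may differ.

```java
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch of a fail-fast path check; not the Entitlements code. */
final class ProtectedPathValidator {
    /**
     * Resolves the requested path against baseDir and rejects it when it
     * ends up inside any of the protected directories.
     */
    static void validate(Path requested, Path baseDir, List<Path> protectedDirs) {
        // normalize() removes ".." segments so they cannot be used to escape baseDir unnoticed.
        Path resolved = baseDir.resolve(requested).toAbsolutePath().normalize();
        for (Path dir : protectedDirs) {
            Path protectedDir = dir.toAbsolutePath().normalize();
            if (resolved.startsWith(protectedDir)) {
                throw new IllegalArgumentException(
                    "path [" + resolved + "] is inside protected directory [" + protectedDir + "]");
            }
        }
    }
}
```

Throwing during initialization, rather than when the path is first used, matches the fail-fast intent stated above.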

Relates to ES-10918
2025-04-18 15:36:07 +02:00
Lorenzo Dematté
b6c9584c28
[Entitlements] Add missing outbound_network entitlement to x-pack-core (#126992)
Add missing outbound_network entitlement to x-pack-core
Closes #127003
2025-04-18 10:19:51 +02:00
elasticsearchmachine
36af046441
Merge patch/serverless-fix into main 2025-04-18 04:30:44 +00:00
Brian Seeders
af6dac5c05
Revert "Forward port release notes for v8.17.5 (#127024)"
This reverts commit 66b504a881.
2025-04-17 16:16:21 -04:00
elasticsearchmachine
66b504a881
Forward port release notes for v8.17.5 (#127024) 2025-04-17 16:15:42 -04:00
Brian Seeders
2a243d8492
Revert #126441 Add flow-control and remove auto-read in netty4 HTTP pipeline (#127030)
* Revert "Release buffers in netty test (#126744)"

This reverts commit f9f3defe92.

* Revert "Add flow-control and remove auto-read in netty4 HTTP pipeline (#126441)"

This reverts commit c8805b85d2.
2025-04-17 12:37:26 -07:00
David Turner
7e62862eab
Clarify queues in thread pool settings (#127027)
The docs about the queue in a `fixed` pool are a little awkwardly
worded, and there is no mention of the queue in a `scaling` pool at all.
This commit cleans this area up.
2025-04-17 19:58:02 +01:00
Liam Thompson
b6c9b9b54d
[DOCS] Update URLs for ESQL Kibana generated docs (#127011) 2025-04-17 18:25:24 +02:00
Samiul Monir
afb83b7551
Updating text_similarity_reranker documentation (#127004)
* updating documentation to remove duplicate and redundant wording from 9.x

* Update links to rerank model landing page

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2025-04-17 11:54:19 -04:00
Luca Cavanna
f274ab7402
Remove empty results before merging (#126770)
We addressed the empty top docs issue with #126385 specifically for scenarios where
empty top docs don't go through the wire. Yet they may be serialized from data node
back to the coord node, in which case they will no longer be equal to Lucene#EMPTY_TOP_DOCS.

This commit expands the existing filtering of empty top docs to include also those that
did go through serialization.

Closes #126742
2025-04-17 17:36:20 +02:00
Kathleen DeRusso
a72883e8e3
Default new semantic_text fields to use BBQ when models are compatible (#126629)
* Default new semantic_text fields to use BBQ when models are compatible

* Update docs/changelog/126629.yaml

* Gate default BBQ by IndexVersion

* Cleanup from PR feedback

* PR feedback

* Fix test

* Fix test

* PR feedback

* Update test to test correct options

* Hack alert: Fix issue where mapper service was always being created with current index version
2025-04-17 08:25:10 -04:00
Nick Tindall
270ca0a80a
Add thread pool utilisation metric (#120363)
There are existing metrics for the active number of threads, but it seems tricky to go from those to a "utilisation" number because all the pools have different sizes.

This commit adds `es.thread_pool.{name}.threads.utilization.current` which will be published by all  `TaskExecutionTimeTrackingEsThreadPoolExecutor` thread pools (where `EsExecutors.TaskTrackingConfig#trackExecutionTime` is true).

The metric is a double gauge indicating what fraction (in [0.0, 1.0]) of the maximum possible execution time was utilised over the polling interval.

It's calculated as actualTaskExecutionTime / maximumTaskExecutionTime, so effectively a "mean" value. The metric interval is 60s so brief spikes won't be apparent in the measure, but the initial goal is to use it to detect hot-spotting so the 60s average will probably suffice.
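As a rough sketch of the calculation, assuming the maximum possible execution time is the maximum pool size multiplied by the polling interval (that definition is an assumption here, and `UtilizationGauge` is a hypothetical name):

```java
/** Hypothetical sketch of the utilisation calculation; not the executor's code. */
final class UtilizationGauge {
    /**
     * @param totalTaskExecutionNanos time spent executing tasks during the interval
     * @param maxThreads              maximum pool size (assumed to bound the capacity)
     * @param intervalNanos           length of the polling interval
     * @return fraction of the available execution time that was used, in [0.0, 1.0]
     */
    static double utilization(long totalTaskExecutionNanos, int maxThreads, long intervalNanos) {
        long maxPossibleNanos = (long) maxThreads * intervalNanos;
        if (maxPossibleNanos <= 0) {
            return 0.0;
        }
        double fraction = (double) totalTaskExecutionNanos / maxPossibleNanos;
        // Clamp to guard against measurement skew at the interval boundaries.
        return Math.min(1.0, Math.max(0.0, fraction));
    }
}
```

Under that assumption, a pool of 4 threads that accumulated 120 seconds of task execution over a 60 second interval reports 0.5, regardless of whether the load was steady or came in short bursts.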

Relates ES-10530
2025-04-17 11:49:30 +10:00
Tim Vernum
e53d3ff64b
Update docs to reflect removal of TLSv1.1 (#126892)
In ES9 and later, we do not enable TLSv1.1 by default,
even if the JDK supports it.

This updates the docs accordingly.

Relates: #121731
2025-04-17 10:15:29 +10:00
Julio
d19b525eb1
Temporarily bypass competitive iteration for filters aggregation (#12… (#126962)
* Temporarily bypass competitive iteration for filters aggregation (#126956)

* Bump versions after 9.0.0 release

* fix merge conflict

* Remove 8.16 from branches.json

* Bring version-bump related changes from main

* [bwc] Add bugfix3 project (#126880)

* Sync version bump changes from main again

---------

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: elasticsearchmachine <58790826+elasticsearchmachine@users.noreply.github.com>
Co-authored-by: Brian Seeders <brian.seeders@elastic.co>
2025-04-16 18:10:01 -06:00
Benjamin Trent
b1f766258b
Temporarily bypass competitive iteration for filters aggregation (#126956) 2025-04-16 23:08:17 +02:00
Samiul Monir
2e1101cf5e
Updating text_similarity_reranker documentation (#126175)
* Updating text_similarity_reranker documentation

* Updating docs to include urls

* remove extra THE from the text

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2025-04-16 17:05:30 -04:00
Ryan Ernst
a813949c34
Fix uniquify to handle multiple successive duplicates (#126889)
CollectionUtils.uniquify is based on C++ std::unique. However, C++
iterators are not quite the same as Java iterators. In particular,
advancing them only allows grabbing the value once. This commit reworks
uniquify to be based on list indices instead of iterators.
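An index-based version of the `std::unique` idea might look like the sketch below. It is an illustration only, not the code in `CollectionUtils`; the class name and signature are assumptions.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical index-based de-duplication; not the CollectionUtils implementation. */
final class Uniquify {
    /** Removes duplicates in place from a list that is already sorted by cmp. */
    static <T> void uniquify(List<T> sorted, Comparator<? super T> cmp) {
        if (sorted.size() <= 1) {
            return;
        }
        int write = 0; // index of the last unique element kept so far
        for (int read = 1; read < sorted.size(); read++) {
            // Elements can be read repeatedly by index, unlike with a Java iterator.
            if (cmp.compare(sorted.get(write), sorted.get(read)) != 0) {
                write++;
                sorted.set(write, sorted.get(read));
            }
        }
        // Drop the tail that now only holds leftovers.
        sorted.subList(write + 1, sorted.size()).clear();
    }
}
```

Because the last kept element is re-read by index on every comparison, runs of three or more equal elements are handled the same way as pairs.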

closes #126883
2025-04-16 21:00:27 +02:00
Jonathan Buttner
7a0f63c1a0
[ML] Refactor inference request executor to leverage scheduled execution (#126858)
* Using threadpool schedule and fixing tests

* Update docs/changelog/126858.yaml

* Clean up

* change log
2025-04-16 14:14:02 -04:00
Jonathan Buttner
e42c118ec6
[ML] Adding missing onFailure call for Inference API start model request (#126930)
* Adding missing onFailure call

* Update docs/changelog/126930.yaml
2025-04-16 14:07:13 -04:00
Nik Everett
128144dd6d
ESQL: Add documents_found and values_loaded (#125631)
This adds `documents_found` and `values_loaded` to the ESQL response:
```json
{
  "took" : 194,
  "is_partial" : false,
  "documents_found" : 100000,
  "values_loaded" : 200000,
  "columns" : [
    { "name" : "a", "type" : "long" },
    { "name" : "b", "type" : "long" }
  ],
  "values" : [[10, 1]]
}
```

These are cheap enough to collect that we can do it for every query and
return it with every response. It's small, but it still gives you a
reasonable sense of how much work Elasticsearch had to go through to
perform the query.

I've also added these two fields to the driver profile and task status:
```json
    "drivers" : [
      {
        "description" : "data",
        "cluster_name" : "runTask",
        "node_name" : "runTask-0",
        "start_millis" : 1742923173077,
        "stop_millis" : 1742923173087,
        "took_nanos" : 9557014,
        "cpu_nanos" : 9091340,
        "documents_found" : 5,   <---- THESE
        "values_loaded" : 15,    <---- THESE
        "iterations" : 6,
...
```

These are at a high level and should be easy to reason about. We'd like to
extract this into a "show me how difficult this running query is" API one
day. But today, just plumbing it into the debugging output is good.

Any `Operator` can claim to "find documents" or "load values" by overriding
a method on its `Operator.Status` implementation:
```java
/**
 * The number of documents found by this operator. Most operators
 * don't find documents and will return {@code 0} here.
 */
default long documentsFound() {
    return 0;
}

/**
 * The number of values loaded by this operator. Most operators
 * don't load values and will return {@code 0} here.
 */
default long valuesLoaded() {
    return 0;
}
```

In this PR all of the `LuceneOperator`s declare that each `position` they
emit is a "document found" and the `ValuesSourceReaderOperator`
says each value it makes is a "value loaded". That's pretty much
true. The `LuceneCountOperator` and `LuceneMinMaxOperator` sort of pretend
that the count/min/max that they emit is a "document" - but that's good
enough to give you a sense of what's going on. It's *like* a document.
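For illustration, overriding these defaults could look like the sketch below. `OperatorStatus`, `SourceStatus` and `FilterStatus` are hypothetical stand-ins so that the example compiles on its own; they are not the real `Operator.Status` hierarchy.

```java
/** Hypothetical stand-in for the status interface with the two defaults shown above. */
interface OperatorStatus {
    default long documentsFound() {
        return 0;
    }

    default long valuesLoaded() {
        return 0;
    }
}

/** Hypothetical status of a source operator: each emitted position counts as a found document. */
record SourceStatus(long positionsEmitted) implements OperatorStatus {
    @Override
    public long documentsFound() {
        return positionsEmitted;
    }
}

/** Hypothetical status of an operator that neither finds documents nor loads values. */
record FilterStatus(long rowsDropped) implements OperatorStatus {}
```

Operators that do not care about these counters, like `FilterStatus` here, inherit the zero defaults and need no changes.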
2025-04-16 17:15:25 +02:00
Lorenzo Dematté
115062c643
Fix vec_caps to test for OS support too (on x64) (#126911)
On x64, we are testing if we support vector capabilities (1 = "basic" = AVX2, 2 = "advanced" = AVX-512) in order to enable and choose a native implementation for some vector functions, using CPUID.

However, under some circumstances, this is not sufficient: the OS on which we are running also needs to support AVX/AVX2 etc; basically, it needs to acknowledge it knows about the additional registers and that it is able to handle them e.g. in context switches. To do that we need to a) test if the CPU has the xsave feature and b) use xgetbv to test if the OS set it (declaring it supports AVX/AVX2/etc).

In most cases this is not needed, as all modern OSes do that, but for some virtualized situations (hypervisors, emulators, etc.) all the components along the chain must support it, and in some cases this is not a given.

This PR introduces a change to the x64 version of vec_caps to check for OS support too, and a warning on the Java side in case the CPU supports vector capabilities but those are not enabled at OS level.

Tested by passing noxsave to my linux box kernel boot options, and ensuring that the avx flags "disappear" from /proc/cpuinfo, and we fall back to the "no native vector" case.
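The combined CPU and OS check can be sketched as pure bit tests on the register values. Java cannot execute CPUID or XGETBV directly, so the sketch takes them as inputs; `VecCaps` here is a hypothetical class, and the exact set of feature bits the native `vec_caps` requires (for example which AVX-512 subsets) is an assumption.

```java
/** Hypothetical sketch of the decision logic; the real check lives in native code. */
final class VecCaps {
    private static final int ECX_OSXSAVE = 1 << 27; // CPUID leaf 1, ECX: OS has enabled XSAVE/XGETBV
    private static final int EBX_AVX2 = 1 << 5;     // CPUID leaf 7 subleaf 0, EBX: AVX2
    private static final int EBX_AVX512F = 1 << 16; // CPUID leaf 7 subleaf 0, EBX: AVX-512 Foundation
    private static final long XCR0_AVX = 0x6;       // XCR0 bits 1-2: SSE and AVX state enabled
    private static final long XCR0_AVX512 = 0xE6;   // plus bits 5-7: opmask and ZMM state enabled

    /** Returns 0 (no native vectors), 1 ("basic", AVX2) or 2 ("advanced", AVX-512). */
    static int vecCaps(int leaf1Ecx, int leaf7Ebx, long xcr0) {
        if ((leaf1Ecx & ECX_OSXSAVE) == 0) {
            return 0; // XGETBV is not usable, so OS support cannot be confirmed
        }
        boolean osAvx = (xcr0 & XCR0_AVX) == XCR0_AVX;
        boolean osAvx512 = (xcr0 & XCR0_AVX512) == XCR0_AVX512;
        if (osAvx512 && (leaf7Ebx & EBX_AVX512F) != 0) {
            return 2;
        }
        if (osAvx && (leaf7Ebx & EBX_AVX2) != 0) {
            return 1;
        }
        return 0;
    }
}
```

A CPU that advertises AVX2 while the OS has not enabled the AVX state in XCR0 falls through to 0, which corresponds to the "no native vector" fallback mentioned above.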

Fixes #126809
2025-04-16 16:06:46 +02:00
Luca Cavanna
df83e881f9
Cancel expired async search task when a remote returns its results (#126583)
A while ago we enabled using ccs_minimize_roundtrips in async search.
This makes it possible for users of async search to send a single search
request per remote cluster, and minimize the impact of network latency.

With non-minimized roundtrips, we have fairly frequent cancellation checks:
as part of the execution, we detect that a task expired whenever each shard comes
back with its results.

In a scenario where the coord node does not hold data, or only remote data is
targeted by an async search, we have much less chance of detecting cancellation
if roundtrips are minimized. The local coordinator would do nothing other than
waiting for the minimized results from each remote cluster.
One scenario where we can check for cancellation is when each cluster comes
back with its full set of results. This commit adds such a check, plus some testing
for async search cancellation with minimized roundtrips.
2025-04-16 14:21:59 +02:00
Niels Bauman
5383f0fcdf
Fix PolicyStepsRegistry cache concurrency issue (#126840)
The following order of events was possible:
- An ILM policy update cleared `cachedSteps`
- ILM retrieves the step definition for an index, this populates `cachedSteps` with the outdated policy
- The updated policy is put in `lifecyclePolicyMap`

Any subsequent cache retrievals will see the old step definition.

By clearing `cachedSteps` _after_ we update `lifecyclePolicyMap`, we
ensure eventual consistency between the policy and the cache.
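The ordering fix can be illustrated with a stripped-down registry. This is a single-threaded sketch of the update order only: `StepsRegistry` is a hypothetical class, the `String` values stand in for real policy and step objects, and the real `PolicyStepsRegistry` has more moving parts.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified registry; not the real PolicyStepsRegistry. */
final class StepsRegistry {
    private final Map<String, String> lifecyclePolicyMap = new ConcurrentHashMap<>();
    private final Map<String, String> cachedSteps = new ConcurrentHashMap<>();

    void updatePolicy(String policyName, String definition) {
        // 1. Publish the new policy first ...
        lifecyclePolicyMap.put(policyName, definition);
        // 2. ... then clear the cache, so later lookups are rebuilt from the new policy.
        cachedSteps.clear();
    }

    String stepFor(String index, String policyName) {
        // Cache miss: derive the step from whatever policy is currently published.
        return cachedSteps.computeIfAbsent(index, i -> "step-of:" + lifecyclePolicyMap.get(policyName));
    }
}
```

With the two statements in `updatePolicy` swapped, a lookup that lands between them would repopulate the cache from the old policy and nothing would evict it afterwards, which is the sequence of events listed above.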

Fixes #118406
2025-04-16 13:58:12 +02:00
Liam Thompson
92148cfde3
[DOCS] Update esql-lookup-join.md to mention index mode requirement (#126901)
* Update esql-lookup-join.md to mention index mode requirement

* fix 8.x page mapping metadata
2025-04-16 12:15:45 +02:00
Carson Ip
5860ccb113
[otel-data] Bump plugin version to release _metric_names_hash changes (#126850)
Bump otel-data plugin version as #120952 missed the bump.
2025-04-16 10:27:19 +01:00