Commit graph

18179 commits

Author SHA1 Message Date
Colleen McGinnis
ab5ff67bce
[docs] Add products to docset.yml (#128274)
* add products to docset.yml

* add page-level painless tags
2025-05-21 13:55:32 -05:00
Kathleen DeRusso
b335c1a8eb
Fix: Add NamedWriteable for RuleQueryRankDoc (#128153)
* Add NamedWriteable for QueryRule rank doc

* Update test

* Update docs/changelog/128153.yaml

* Add multi cluster test for query rules

* Commenting out code - explicitly trying to spur a test failure

* [CI] Auto commit changes from spotless

* Streamline test for multi cluster

* Revert changes to try to break test

* Fix compile error

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-05-21 13:19:05 -04:00
Liam Thompson
960222e0dc
[DOCS] Make ESQL functions/operators/commands overview accordions open by default (#128197) 2025-05-21 12:08:04 +02:00
kanoshiou
88b61e3621
ESQL: Avoid unintended attribute removal (#127563)
---------

Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
2025-05-21 10:19:10 +03:00
Keith Massey
928fe1ee69
System data streams incorrectly show up in the list of template validation problems (#128161) 2025-05-20 16:14:57 +01:00
Fabrizio Ferri-Benedetti
d10ef76ba3
[DOCS] Replace irregular whitespaces in docs (#128199)
* Replace irregular whitespaces

* More chars
2025-05-20 16:20:22 +02:00
Pat Whelan
c0f5e00378
[Transform] Check alias during update (#124825)
When the Transform System Index has been reindexed and aliased, we
should check the Transform Update index against the alias when updating
the Transform Config.
2025-05-20 09:16:52 -04:00
Iván Cea Fontenla
07aff0a739
ESQL: Limit Replace function memory usage (#127924)
The Replace string result limit was fixed to 1MB, same as Repeat
2025-05-20 13:59:48 +01:00
kanoshiou
557f1f12b3
ESQL: Fix alias removal in regex extraction with JOIN (#127687)
* Disallow removal of regex extracted fields
---------

Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-05-20 15:06:24 +03:00
Jeremy Dahlgren
4e7b99cc73
Add cancellation support in TransportGetAllocationStatsAction (#127371)
Replaces the use of a SingleResultDeduplicator by refactoring the cache as a
subclass of CancellableSingleObjectCache. Refactored the AllocationStatsService
and NodeAllocationStatsAndWeightsCalculator to accept the Runnable used to test
for cancellation.

Closes #123248
2025-05-20 07:25:19 -04:00
David Turner
18c60791c3
Make S3 custom query parameter optional (#128043)
Today Elasticsearch will record the purpose for each request to S3 using
a custom query parameter[^1]. This isn't believed to be necessary
outside of the ECH/ECE/ECK/... managed services, and it adds rather a
lot to the request logs, so with this commit we make the feature
optional and disabled by default.

[^1]:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/LogFormat.html#LogFormatCustom
2025-05-20 17:14:39 +10:00
Nhat Nguyen
c2561b5cba
Fix union types in CCS (#128111)
Currently, union types in CCS are broken. For example, FROM
*:remote-indices | EVAL port = TO_INT(port) returns all nulls if the
types of the port field conflict. This happens because the converters
map is keyed by the fully qualified cluster:index name (defined in
MultiTypeEsField), but we are looking up the converter using only the
index name, which leads to a wrong or missing converter on remote
clusters. Our tests didn't catch this because MultiClusterSpecIT
generates the same index for both clusters, allowing the local converter
to be used for remote indices.
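
The key mismatch described above can be sketched in a few lines. This is an illustration only, not the Elasticsearch code: `ConverterLookupSketch`, `qualifiedName`, and the string-valued "converters" are hypothetical stand-ins for the map kept in MultiTypeEsField.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; none of these names come from the Elasticsearch code base.
final class ConverterLookupSketch {
    // Builds the qualified key; a null or empty cluster alias is assumed to mean the local cluster.
    static String qualifiedName(String clusterAlias, String indexName) {
        return (clusterAlias == null || clusterAlias.isEmpty()) ? indexName : clusterAlias + ":" + indexName;
    }

    // Converters keyed by qualified index name, as the commit message describes.
    static Map<String, String> converters() {
        Map<String, String> converters = new HashMap<>();
        converters.put("logs", "local-converter");
        converters.put("remote:logs", "remote-converter");
        return converters;
    }

    // Buggy lookup: the cluster alias is ignored, so a remote index resolves to the local entry or to nothing.
    static String lookupByIndexOnly(Map<String, String> converters, String clusterAlias, String indexName) {
        return converters.get(indexName);
    }

    // Fixed lookup: uses the same qualified key the map was built with.
    static String lookupQualified(Map<String, String> converters, String clusterAlias, String indexName) {
        return converters.get(qualifiedName(clusterAlias, indexName));
    }
}
```

When both clusters hold an index with the same name, the index-only lookup still finds a converter, which is consistent with the test gap the message mentions.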
2025-05-19 22:47:30 -07:00
David Turner
a84dff876e
More efficient sort in tryRelocateShard (#128063)
No need to do this via an allocation-heavy `Stream`, we can just put the
objects straight into an array, sort them in-place, and keep hold of the
array to avoid having to allocate anything on the next iteration.

Also slims down `BY_DESCENDING_SHARD_ID`: it's always sorting the same
index so we don't need to look at `ShardId#index` in the comparison, nor
do we really need multiple layers of vtable lookups, we can just compare
the shard IDs directly.
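
A minimal sketch of the pattern described above, assuming a simplified `Shard` type with an integer id. `ReusableSortBuffer` and its members are hypothetical names, not the ones used in `tryRelocateShard`.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: a retained array replaces a per-call Stream pipeline.
final class ReusableSortBuffer {
    static final class Shard {
        final int id;

        Shard(int id) {
            this.id = id;
        }
    }

    // Compares the ids directly; all shards are assumed to belong to the same index.
    private static final Comparator<Shard> BY_DESCENDING_ID = (a, b) -> Integer.compare(b.id, a.id);

    // Kept between calls so that later iterations do not allocate a new array.
    private Shard[] buffer = new Shard[0];

    // Copies the shards into the retained buffer and sorts the used prefix.
    // Only the first shards.size() entries of the returned array are meaningful.
    Shard[] sortDescending(List<Shard> shards) {
        int n = shards.size();
        if (buffer.length < n) {
            buffer = new Shard[n]; // grow only when the input outgrows the buffer
        }
        for (int i = 0; i < n; i++) {
            buffer[i] = shards.get(i);
        }
        Arrays.sort(buffer, 0, n, BY_DESCENDING_ID);
        return buffer;
    }
}
```

The caller has to remember the element count itself, since entries past that count may be left over from an earlier, larger input.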

Relates #128021
2025-05-20 05:45:31 +10:00
Jan-Kazlouski-elastic
d1ad917855
Add Hugging Face Chat Completion support to Inference Plugin (#127254)
* Add Hugging Face Chat Completion support to Inference Plugin

* Add support for streaming chat completion task for HuggingFace

* [CI] Auto commit changes from spotless

* Add support for non-streaming completion task for HuggingFace

* Remove RequestManager for HF Chat Completion Task

* Refactored Hugging Face Completion Service Settings, removed Request Manager, added Unit Tests

* Refactored Hugging Face Action Creator, added Unit Tests

* Add Hugging Face Server Test

* [CI] Auto commit changes from spotless

* Removed parameters from media type for Chat Completion Request and unit tests

* Removed OpenAI default URL in HuggingFaceService's configuration, fixed formatting in InferenceGetServicesIT

* Refactor error message handling in HuggingFaceActionCreator and HuggingFaceService

* Update minimal supported version and add Hugging Face transport version constants

* Made modelId field optional in HuggingFaceChatCompletionModel, updated unit tests

* Removed max input tokens field from HuggingFaceChatCompletionServiceSettings, fixed unit tests

* Removed if statement checking TransportVersion for HuggingFaceChatCompletionServiceSettings constructor with StreamInput param

* Removed getFirst() method calls for backport compatibility

* Made HuggingFaceChatCompletionServiceSettingsTests extend AbstractBWCWireSerializationTestCase for future serialization testing

* Refactored tests to use stripWhitespace method for readability

* Refactored javadoc for HuggingFaceService

* Renamed HF chat completion TransportVersion constant names

* Added random string generation in unit test

* Refactored javadocs for HuggingFace requests

* Refactored tests to reduce duplication

* Added changelog file

* Add HuggingFaceChatCompletionResponseHandler and associated tests

* Refactor error handling in HuggingFaceServiceTests to standardize error response codes and types

* Refactor HuggingFace error handling to improve response structure and add streaming support

* Allowing null function name for hugging face models

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Jonathan Buttner <jonathan.buttner@elastic.co>
2025-05-19 17:37:19 +01:00
kanoshiou
54f26680ea
ESQL: Keep DROP attributes when resolving field names (#127009) 2025-05-19 18:41:13 +03:00
Ryan Ernst
d6ffe01122
Avoid nested docs in painless execute api (#127991)
Painless does not support accessing nested docs (except through
_source). Yet the painless execute api indexes any nested docs that are
found when parsing the sample document. This commit changes the ram
indexing to only index the root document, ignoring any nested docs.

fixes #41004
2025-05-19 08:18:09 -07:00
Johannes Fredén
acc8ae74af
Add Microsoft Graph Delegated Authorization Realm Plugin (#127910)
* Add Microsoft Graph Delegated Authorization Realm Plugin

* Update docs/changelog/127910.yaml
2025-05-19 14:15:28 +02:00
David Turner
20c02f430d
Set connection: close header on shutdown (#128025)
Lets clients using HTTP pipelining know to cease usage of connections to
shutting-down nodes.

Closes #127984
2025-05-19 06:14:46 +10:00
David Turner
cd1fa77990
Add missing entitlement to repository-azure (#128047)
This entitlement is required, but only if validating the metadata
endpoint against `https://login.microsoft.com/` which isn't something we
can do in a test. Kind of an SDK bug, we should be using an existing
event loop rather than spawning threads randomly like this.
2025-05-14 09:28:15 +01:00
Ian Wagner
d4b387c015
Minor subject/verb agreement fix (#127955)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2025-05-13 12:59:42 +01:00
István Zoltán Szabó
989032be19
Updates 9.0 breaking changes with ES|QL item. (#128038) 2025-05-12 13:19:30 -07:00
Nhat Nguyen
1609bb0d5a
Add emit time to hash aggregation status (#127988)
The hash aggregation operator may take time to emit the output pages, 
including keys and aggregated values. This change adds an emit_time
field to the status. While I considered including this in hash_nanos and
aggregation_nanos, having a separate section feels more natural. I am
open to suggestions.
2025-05-11 09:22:15 -07:00
Parker Timmins
c04a9569fe
Do not respect synthetic_source_keep=arrays if type parses arrays (#127796)
Types that parse arrays directly should not need to store values in _ignored_source if synthetic_source_keep=arrays. Since they have custom handling of arrays, it provides no benefit to store in _ignored_source when there are multiple values of the type.
2025-05-09 14:49:15 -05:00
Nik Everett
da553b11e3
Fix a bug in significant_terms (#127975)
Fix a bug in the `significant_terms` agg where the "subsetSize" array is
too small because we never collect the ordinal for the agg "above" it.

This mostly hits when you do a `range` agg containing a
`significant_terms` AND you only collect the first few ranges. `range`
isn't particularly popular, but `date_histogram` is super popular and it
rewrites into a `range` pretty commonly - so that's likely what's really
hitting this - a `date_histogram` followed by a `significant_text` where
the matches are all early in the date range held by the shard.
2025-05-09 13:48:19 -04:00
Nhat Nguyen
2ea23da0af
Ensure ordinal builder emit ordinal blocks (#127949)
Currently, if a field has high cardinality, we may mistakenly disable 
emitting ordinal blocks. For example, with 10,000 `tsid` values, we
never emit ordinal blocks during reads, even though we could emit blocks
for 10 `tsid` values across 1,000 positions. This bug disables
optimizations for value aggregation and block hashing.

This change tracks the minimum and maximum seen ordinals and uses them 
as an estimate for the number of ordinals. However, if a page contains
`ord=1` and `ord=9999`, ordinal blocks still won't be emitted.
Allocating a bitset or an array for `value_count` could track this more
accurately but would require additional memory. I need to think about
this trade off more before opening another PR to fix this issue
completely.
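
The min/max estimate can be sketched as follows. This is a rough illustration under stated assumptions: the class name, the method names, and in particular the threshold in `shouldEmitOrdinals` are hypothetical and are not taken from the change itself.

```java
// Hypothetical sketch of estimating the ordinal count from the seen range.
final class OrdinalRangeEstimate {
    private int minOrd = Integer.MAX_VALUE;
    private int maxOrd = Integer.MIN_VALUE;
    private int positions = 0;

    // Records one position and widens the seen ordinal range.
    void add(int ord) {
        minOrd = Math.min(minOrd, ord);
        maxOrd = Math.max(maxOrd, ord);
        positions++;
    }

    // Upper bound on the distinct ordinals seen: the width of the [min, max] range.
    long estimatedOrdinals() {
        return positions == 0 ? 0 : (long) maxOrd - minOrd + 1;
    }

    // Assumed rule for illustration: emit ordinal blocks only when the
    // estimate is at most half the number of positions.
    boolean shouldEmitOrdinals() {
        return positions > 0 && estimatedOrdinals() * 2 <= positions;
    }
}
```

A page holding only `ord=1` and `ord=9999` gives an estimate of 9999 for two positions, which shows the overestimate that the message says is still unresolved.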

This is a quick, contained fix that significantly speeds up time-series 
aggregation (and other queries too).

The execution time of this query is reduced from 3.4s to 1.9s with 11M documents.

```
POST /_query
{
    "profile": true,
    "query": "TS metrics-hostmetricsreceiver.otel-default
            | STATS cpu = avg(avg_over_time(`metrics.system.cpu.load_average.1m`)) BY host.name, BUCKET(@timestamp, 5 minute)"
}
```

```
"took": 3475,
"is_partial": false,
"documents_found": 11368089,
"values_loaded": 34248167
```

```
"took": 1965,
"is_partial": false,
"documents_found": 11368089,
"values_loaded": 34248167
```
2025-05-09 09:03:47 -07:00
Carlos Delgado
7ff25b5c86
ES|QL - Allow full text functions to be used in STATS ... WHERE (#125479) 2025-05-09 15:57:04 +02:00
Chris Hegarty
1ed02784f6
[9.x] Revert "Enable madvise by default for all builds (#110159)" (#127921)
9.x port of: Revert "Enable madvise by default for all builds (#110159)" #126308

This change did not apply cleanly. In fact this is not strictly a revert, since the change was never actually in 9.x post the Lucene 10 upgrade. However, the semantics of the change still apply - avoid RANDOM everywhere. Even though in 9.x we do set -Dorg.apache.lucene.store.defaultReadAdvice=normal, it is not enough to avoid RANDOM when random is explicitly requested by code.
2025-05-09 10:01:03 +01:00
Ryan Ernst
ab690ba23f
Check hidden frames in entitlements (#127877)
Entitlements do a stack walk to find the calling class. When method
references are used in a lambda, the frame ends up hidden in the stack
walk. In the case of using a method reference with
AccessController.doPrivileged, the call looks like it is the jdk itself,
so the call is trivially allowed. This commit adds hidden frames to the
stack walk so that the lambda frame created for the method reference is
included. Several internal packages are then necessary to filter out of
the stack.
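
As a rough illustration of the approach, the JDK's `StackWalker` can be asked to include hidden frames via `StackWalker.Option.SHOW_HIDDEN_FRAMES`. The class name `HiddenFrameWalk` and the package filter below are assumptions made for this sketch; the actual entitlement code and its list of filtered packages are not shown in this message.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; the filter list is illustrative, not the one used by entitlements.
final class HiddenFrameWalk {
    private static final StackWalker WALKER = StackWalker.getInstance(
        Set.of(StackWalker.Option.RETAIN_CLASS_REFERENCE, StackWalker.Option.SHOW_HIDDEN_FRAMES)
    );

    // Frames from this class and from assumed JDK-internal plumbing packages are skipped.
    static boolean isSkipped(Class<?> clazz) {
        String name = clazz.getName();
        return clazz == HiddenFrameWalk.class
            || name.startsWith("java.lang.invoke.")
            || name.startsWith("jdk.internal.");
    }

    // Returns the first frame's class that is not filtered out, hidden frames included.
    static Optional<Class<?>> findCaller() {
        return WALKER.walk(
            frames -> frames.map(StackWalker.StackFrame::getDeclaringClass)
                .filter(clazz -> isSkipped(clazz) == false)
                .findFirst()
        );
    }
}
```

Once hidden frames are visible, more JDK-internal frames show up in the walk as well, which is why a filter of this kind becomes necessary.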
2025-05-08 16:59:03 -07:00
Nik Everett
3551494b9a
ESQL: text == and text != pushdown (#127355)
Reenables `text ==` pushdown and adds support for `text !=` pushdown.

It does so by making `TranslationAware#translatable` return something
we can turn into a tri-valued function. It has these values:
* `YES`
* `NO`
* `RECHECK`

`YES` means the `Expression` is entirely pushable into Lucene. They will
be pushed into Lucene and removed from the plan.

`NO` means the `Expression` can't be pushed to Lucene at all and will stay
in the plan.

`RECHECK` means the `Expression` can push a query that makes *candidate*
matches but must be rechecked. Documents that don't match the query won't
match the expression, but documents that match the query might not match
the expression. These are pushed to Lucene *and* left in the plan.

This is required because `txt != "b"` can build a *candidate* query
against the `txt.keyword` subfield but it can't be sure of the match
without loading the `_source` - which we do in the compute engine.
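
The three outcomes can be sketched as a small routing step. This is a simplified illustration, not the planner code: `PushdownSketch`, `Split`, and the use of plain strings for expressions are assumptions, and the two input lists are assumed to have equal length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how a tri-valued verdict routes an expression.
final class PushdownSketch {
    enum Translatable { YES, NO, RECHECK }

    static final class Split {
        final List<String> pushedToLucene = new ArrayList<>();
        final List<String> keptInPlan = new ArrayList<>();
    }

    // verdicts.get(i) is the verdict for expressions.get(i).
    static Split split(List<String> expressions, List<Translatable> verdicts) {
        Split result = new Split();
        for (int i = 0; i < expressions.size(); i++) {
            String expression = expressions.get(i);
            switch (verdicts.get(i)) {
                case YES:
                    // Fully handled by Lucene, so it leaves the plan.
                    result.pushedToLucene.add(expression);
                    break;
                case NO:
                    // Cannot be pushed at all, so it stays in the plan.
                    result.keptInPlan.add(expression);
                    break;
                case RECHECK:
                    // Lucene narrows to candidates; the plan re-evaluates them.
                    result.pushedToLucene.add(expression);
                    result.keptInPlan.add(expression);
                    break;
            }
        }
        return result;
    }
}
```

Only `RECHECK` places the same expression on both sides: the pushed query narrows the candidates and the copy kept in the plan confirms each one.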

I haven't plugged rally into this, but here's some basic
performance tests:
```
Before:
not text eq {"took":460,"documents_found":1000000}
    text eq {"took":432,"documents_found":1000000}

After:
    text eq {"took":5,"documents_found":1}
not text eq {"took":351,"documents_found":800000}    
```

This comes from:
```
rm -f /tmp/bulk*
for a in {1..1000}; do
    echo '{"index":{}}' >> /tmp/bulk
    echo '{"text":"text '$(printf $(($a % 5)))'"}' >> /tmp/bulk
done
ls -l /tmp/bulk*

passwd="redacted"
curl -sk -uelastic:$passwd -HContent-Type:application/json -XDELETE https://localhost:9200/test
curl -sk -uelastic:$passwd -HContent-Type:application/json -XPUT https://localhost:9200/test -d'{
    "settings": {
        "index.codec": "best_compression",
        "index.refresh_interval": -1
    },
    "mappings": {
        "properties": {
            "many": {
                "enabled": false
            }
        }
    }
}'
for a in {1..1000}; do
    printf %04d: $a
    curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST https://localhost:9200/test/_bulk?pretty --data-binary @/tmp/bulk | grep errors
done
curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST https://localhost:9200/test/_forcemerge?max_num_segments=1
curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST https://localhost:9200/test/_refresh
echo
curl -sk -uelastic:$passwd https://localhost:9200/_cat/indices?v

text_eq() {
    echo -n "    text eq "
    curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST 'https://localhost:9200/_query?pretty' -d'{
        "query": "FROM test | WHERE text == \"text 1\" | STATS COUNT(*)",
        "pragma": {
            "data_partitioning": "shard"
        }
    }' | jq -c '{took, documents_found}'
}

not_text_eq() {
    echo -n "not text eq "
    curl -sk -uelastic:$passwd -HContent-Type:application/json -XPOST 'https://localhost:9200/_query?pretty' -d'{
        "query": "FROM test | WHERE NOT text == \"text 1\" | STATS COUNT(*)",
        "pragma": {
            "data_partitioning": "shard"
        }
    }' | jq -c '{took, documents_found}'
}


for a in {1..100}; do
    text_eq
    not_text_eq
done
```
2025-05-08 10:00:05 -04:00
Craig Taverner
7d06f815f3
Initial kibana definition files for command, currently only providing License information (#127829)
Initial Kibana definition files for commands, currently only providing License information. We leave the license field out if it works with BASIC, so the only two files that actually have a license line are:

* CHANGE_POINT: PLATINUM
* RRF: ENTERPRISE
2025-05-08 09:39:34 +02:00
David Turner
85d9990d70
Replace auto-read with proper flow-control in HTTP pipeline (#127817)
Re-applying #126441 (cf. #127259) with:

- the extra `FlowControlHandler` needed to ensure one-chunk-per-read
  semantics (also present in #127259).

- no extra `read()` after exhausting a `Netty4HttpRequestBodyStream`
  (the bug behind #127391 and #127391).

See #127111 for related tests.
2025-05-08 17:35:10 +10:00
Mary Gouseti
077b6b949b
Skip the validation when retrieving the index mode during reindexing a time series data stream. (#127824)
During reindexing we retrieve the index mode from the template settings. However, we do not fully resolve the settings as we do when validating a template or when creating a data stream. This results in the error reported in #125607.

I do not see a reason to not fix this as suggested in #125607 (comment).

Fixes: #125607
2025-05-08 10:25:53 +03:00
Mary Gouseti
e97efd264d
Change the handling of passthrough dimensions (#127752)
When downsampling an index that has a mapping with passthrough dimensions, the downsampling process identifies the wrapper object as a dimension and fails when it tries to retrieve the type.

We did some prework to establish a shared framework in the internalClusterTest. For now it only includes setting up time series data stream helpers and a limited assertion helper for dimensions and metrics. This allows us to set up an internalClusterTest that captures this issue during downsampling in #125156.

To fix this we refine the check that determines whether a field is a dimension so that it skips wrapper fields.

Fixes #125156.
2025-05-08 09:04:41 +03:00
Jan Kuipers
9cf2a64067
ES|QL SAMPLE aggregation function (#127629)
* ES|QL SAMPLE aggregation function

* [CI] Auto commit changes from spotless

* ThreadLocalRandom -> SplittableRandom

* Update docs/changelog/127629.yaml

* fix yaml test

* Add SampleTests

* docs + example

* polish code

* mark generated imports

* comment with algorithm description

* use Randomness.get()

* close properly

* type checks

* reuse hash

* regen some files

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-05-08 08:01:53 +02:00
Nhat Nguyen
7b87266d6c
Optimize ordinal inputs in Values aggregation (#127849)
Currently, time-series aggregations use the `values` aggregation to 
collect dimension values. While we might introduce a specialized
aggregation for this in the future, for now, we are using `values`, and
the inputs are likely ordinal blocks. This change speeds up the `values`
aggregation when the inputs are ordinal-based.

Execution time reduced from 461ms to 192ms for 1000 groups.

```
ValuesAggregatorBenchmark.run    BytesRef     10000  avgt    7  461.938 ± 6.089  ms/op
ValuesAggregatorBenchmark.run    BytesRef     10000  avgt    7  192.898 ± 1.781  ms/op
```
2025-05-07 18:24:27 -07:00
Jonathan Buttner
746b1df367
[ML] Fixing Google Vertex AI Rerank task type location field (#127856)
* Fixing rerank location

* Update docs/changelog/127856.yaml

* Refactor changelog
2025-05-07 17:47:38 -04:00
Oleksandr Kolomiiets
5d6dffaa51
Fix more typos in new text docs (#127855) 2025-05-08 06:20:08 +10:00
Charlotte Hoblik
d0e3af7990
[DOCS]: Add connector release notes page for 9.x (#127803)
* Add connector release notes page

* Add 9.0.0 release notes

* Add 9.0.1 Release notes

* Update docs/reference/search-connectors/release-notes.md

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* Align IDs to MINOR_VERSION variable

* Update docs/reference/search-connectors/release-notes.md

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2025-05-07 17:29:11 +02:00
David Turner
4ba94c7dfd
Handle streaming request body in audit log (#127798)
The audit event for a successfully-authenticated REST request occurs
when we start to process the request. For APIs that accept a streaming
request body this means we have received the request headers, but not
its body, at the time of the audit event. Today such requests will fail
with a `ClassCastException` if the `emit_request_body` flag is set. This
change fixes the handling of streaming requests in the audit log to now
report that the request body was not available when writing the audit
entry.
2025-05-08 01:16:44 +10:00
Keith Massey
47909108f0
Adding a known issue for watcher during upgrade (#127834) 2025-05-07 10:09:24 -05:00
Arianna Laudazzi
afbd3319c1
[Reference] Revisit ES and index management landing page (#127571)
* Update landing page

* Fix links

* Update docs/reference/elasticsearch/index.md

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2025-05-07 15:56:57 +02:00
Iván Cea Fontenla
a2fc1caa32
ESQL: Specialize aggs AddInput for each block type (#127582)
* Specialize block parameters on AddInput

(cherry picked from commit a5855c1664)

* Call the specific add() methods for each block type

(cherry picked from commit 5176663f43)

* Implement custom add in HashAggregationOperator

(cherry picked from commit fb670bdbbc)

* Migrated everything to the new add() calls

* Update docs/changelog/127582.yaml

* Spotless format

* Remove unused ClassName for IntVectorBlock

* Fixed tests

* Randomize groupIds block types to check most AddInput cases

* Minor fix and added some docs

* Renamed BlockHashWrapper
2025-05-07 12:21:41 +02:00
Richard Dennehy
736e2e6eb7
add documentation for JWT realm proxy settings (#127605) 2025-05-07 10:31:31 +01:00
David Turner
d934a0c540
Reinstate use of S3 protocol client setting (#127744)
The `s3.client.CLIENT_NAME.protocol` setting became unused in #126843 as
it is inapplicable in the v2 SDK. However, the v2 SDK requires the
`s3.client.CLIENT_NAME.endpoint` setting to be a URL that includes a
scheme, so in #127489 we prepend a `https://` to the endpoint if needed.
This commit generalizes this slightly so that we prepend `http://` if
the endpoint has no scheme and the `.protocol` setting is set to `http`.
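
A minimal sketch of the rule described above. The class and method names are hypothetical, and detecting a scheme by looking for `://` is an assumption of this sketch rather than the check used in the repository-s3 code.

```java
// Hypothetical sketch of choosing a scheme for a scheme-less endpoint.
final class EndpointSchemeSketch {
    // Returns the endpoint unchanged if it already carries a scheme; otherwise
    // prepends http:// when the protocol setting is "http", else https://.
    static String withScheme(String endpoint, String protocol) {
        if (endpoint.contains("://")) {
            return endpoint;
        }
        String scheme = "http".equalsIgnoreCase(protocol) ? "http" : "https";
        return scheme + "://" + endpoint;
    }
}
```

For example, `withScheme("s3.example.com", "http")` returns `http://s3.example.com`, while an endpoint that already has a scheme is returned unchanged whatever the protocol setting says.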
2025-05-07 10:02:22 +01:00
Alexander Spies
9e3ae5b224
ESQL: Document LU JOIN/MV_EXPAND not respecting SORT (#127718) 2025-05-07 10:59:48 +02:00
Craig Taverner
543aeb8c19
Output function signature license requirements to Kibana definitions (#127717)
Output function signature license requirements to Kibana definition files, and also test that this matches the actual licensing behaviour of the functions.

ES|QL functions that enforce license checks do so with the `LicenseAware` interface. This does not expose what that function's license level is, but only whether the current active license will be sufficient for that function and its current signature (data types passed in as fields). Rather than add to this interface, we've made the license level information test-only information. This means if a function implements LicenseAware, it also needs to add a method to its test class to specify the license level for the signature being called. All functions will be tested for compliance, so failing to add this will result in a test failure. Also if the test license level does not match the enforced license, that will also cause a failure.
2025-05-07 10:02:17 +02:00
Arianna Laudazzi
1df4a90943
[Reference] Revisit query language landing page (#127632)
* Update query language landing page

* Update index.md

* Update docs/reference/query-languages/index.md

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2025-05-07 08:44:49 +02:00
Johannes Fredén
bb9d1d6232
Add Support for Providing a custom ServiceAccountTokenStore through SecurityExtensions (#126612)
* Add Project Service Account Auth
2025-05-07 08:13:39 +02:00
Arianna Laudazzi
e9fe219067
[Reference] Revisit scripting language landing page (#127675)
* Update scripting language landing page

* Update index.md
2025-05-07 08:02:12 +02:00
Arianna Laudazzi
d90121f048
Update es plugins landing page (#127682) 2025-05-07 07:51:22 +02:00