Commit graph

17719 commits

Author SHA1 Message Date
Luca Cavanna
905222613a
Disable concurrency when top_hits sorts on anything but _score (#123610)
We already disable inter-segment concurrency in SearchSourceBuilder whenever
the top-level sort provided is not _score. We should apply the same rules
in top_hits. We recently stumbled upon non-deterministic behaviour caused by
script sorting defined within top hits. That is to be expected given that
script sorting does not support search concurrency.

The sort script can be replaced with a runtime field, either defined in the
mapping or in the search request, which does support concurrency and guarantees
predictable behaviour.
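
As a rough illustration of the replacement described above, the sketch below builds two search request bodies as plain Python dictionaries: one with a script sort inside `top_hits`, and one where the same expression is emitted by a runtime field declared under `runtime_mappings` and `top_hits` sorts on that field. The fields `price` and `category`, the runtime field name `discounted_price`, the aggregation names, and the Painless expression are all hypothetical and chosen only for this example.

```python
import json

# Hypothetical Painless expression; `price` is an assumed numeric field.
EXPRESSION = "doc['price'].value * 0.9"


def request_with_top_hits_sort(sort_clause, runtime_mappings=None):
    """Build a search body whose top_hits aggregation uses the given sort."""
    body = {
        "size": 0,
        "aggs": {
            "by_category": {
                "terms": {"field": "category"},
                "aggs": {"cheapest": {"top_hits": {"size": 1, "sort": [sort_clause]}}},
            }
        },
    }
    if runtime_mappings is not None:
        body["runtime_mappings"] = runtime_mappings
    return body


# Before: script sort inside top_hits (no search concurrency support).
script_sorted = request_with_top_hits_sort(
    {"_script": {"type": "number", "script": {"source": EXPRESSION}, "order": "asc"}}
)

# After: the expression lives in a runtime field defined in the search request.
runtime_sorted = request_with_top_hits_sort(
    {"discounted_price": {"order": "asc"}},
    runtime_mappings={
        "discounted_price": {
            "type": "double",
            "script": {"source": f"emit({EXPRESSION})"},
        }
    },
)

print(json.dumps(runtime_sorted, indent=2))
```

The runtime field could equally be defined in the index mapping, in which case the request would only need the sort clause.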
2025-02-27 21:22:17 +01:00
Colleen McGinnis
b7e3a1e14b
[docs] Migrate docs from AsciiDoc to Markdown (#123507)
* delete asciidoc files

* add migrated files

* fix errors

* Disable docs tests

* Clarify release notes page titles

* Revert "Clarify release notes page titles"

This reverts commit 8be688648d.

* Comment out external URI images

* Clean up query languages landing pages, link to conceptual docs

* Add .md to url

* Fixes inference processor nesting.

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
2025-02-27 17:56:14 +01:00
Yang Wang
c7e7dbe904
Abort pending deletion on IndicesService stop (#123569)
When IndicesService is closed, the pending deletion may still be in
progress due to indices removed before IndicesService gets closed. If
the deletion gets stuck for some reason, it can stall the node shutdown.
This PR aborts the pending deletion more promptly by not retrying after
IndicesService is stopped.

Resolves: #121717 Resolves: #121716 Resolves: #122119
2025-02-27 23:43:53 +11:00
Iván Cea Fontenla
ca5d251807
ESQL: Fix function registry concurrency issues on constructor (#123492)
Fixes https://github.com/elastic/elasticsearch/issues/123430

There were 2 problems here:
- We were filling a static field (used to auto-cast string literals) within a constructor, which is also called in multiple places
- The field was only filled with non-snapshot functions, so snapshot function auto-casting wasn't possible

Fixed both bugs by making the field non-static and by using the snapshot registry (if available) in the string casting rule.
2025-02-27 11:05:18 +01:00
Costin Leau
e4604a4432
ESQL: Reduce iteration complexity for plan traversal (#123427) 2025-02-26 08:30:58 -08:00
David Turner
4be53f50f7
Small resiliency status update (#123497) 2025-02-27 01:49:16 +11:00
Joe Gallo
af6014ecb5
Use ordered maps for PipelineConfiguration xcontent deserialization (#123403) 2025-02-25 15:20:01 -05:00
Keith Massey
88cf2487e7
Fixing serialization of ScriptStats cache_evictions_history (#123384) 2025-02-25 16:46:22 +00:00
Kathleen DeRusso
ae6474db63
Deprecate Behavioral Analytics CRUD apis (#122960)
* Deprecate Behavioral Analytics CRUD APIs

* Add allowed warning for REST Compatibility tests

* Update docs/changelog/122960.yaml

* Update changelog

* Update docs to add deprecation flags and fix failing tests

* Update changelog

* Update changelog again

* Update docs formatting

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* Skip asciidoc test

---------

Co-authored-by: Efe Gürkan YALAMAN <efeyalaman@gmail.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Efe Gürkan YALAMAN <efeguerkan.yalaman@elastic.co>
2025-02-25 16:02:50 +01:00
Ying Mao
e8438490ea
Updates to allow using Cohere binary embedding response in semantic search queries. (#121827)
* wip

* wip

* [CI] Auto commit changes from spotless

* updating tests

* [CI] Auto commit changes from spotless

* Update docs/changelog/121827.yaml

* Updates after the refactor

* [CI] Auto commit changes from spotless

* Updating error message

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-02-25 09:14:20 -05:00
Martijn van Groningen
6c55099784
Store arrays offsets for ip fields natively with synthetic source (#122999)
Follow-up to #113757 that adds support for natively storing array offsets for ip fields instead of falling back to ignored source.
2025-02-25 13:42:41 +00:00
David Turner
d0db4cd085
Reduce licence checks in LicensedWriteLoadForecaster (#123346)
Rather than checking the license (updating the usage map) on every
single shard, just do it once at the start of a computation that needs
to forecast write loads.

Closes #123247
2025-02-25 23:50:43 +11:00
Craig Taverner
ec82c24a87
Add support to VALUES aggregation for spatial types (#122886)
The original work at https://github.com/elastic/elasticsearch/pull/106065 did not support geospatial types with this comment:

> I made this work for everything but geo_point and cartesian_point because I'm not 100% sure how to integrate with those. We can grab those in a follow up.

The geospatial types should be possible to collect using the VALUES aggregation with similar behavior to the `ST_COLLECT` OGC function, based on the Elasticsearch convention that treats multi-value geospatial fields as behaving similarly to any geometry collection. So this implementation is a trivial addition to the existing values types support.
2025-02-25 11:38:51 +01:00
Joe Gallo
6315b8a8aa
Register IngestGeoIpMetadata as a NamedXContent (#123079) 2025-02-24 17:25:25 -05:00
Samiul Monir
5664f4f2ba
Improved error message when index field type is unknown (#122860)
* Updating error message when index field type is unknown

* Fix style issue

* Add yaml test for invalid field type error message

* Update docs/changelog/122860.yaml

* Updating error message for runtime and multi field type parser

* add and fix yaml tests

* Fix code styles by running spotlessApply

* Update changelog

* Updating the test in yml

* Updating error message for runtime

* Fix failing yaml tests

* Update error message to Fix unit tests

* fix serverless qa test

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2025-02-24 13:16:22 -05:00
Pat Whelan
4c3ceae986
[ML] Set Connect Timeout to 5s (#123272)
Reduced connection timeout from infinite to a system configurable
setting that defaults to 5s.

Increased EIS auth token timeout from 30s to 1m.
2025-02-24 18:03:15 +00:00
Nhat Nguyen
4d2b8dc4f2
Fix early termination in LuceneSourceOperator (#123197)
The LuceneSourceOperator is supposed to terminate when it reaches the 
limit; unfortunately, we don't have a test to cover this. Due to this
bug, we continue scanning all segments, even though we discard the
results as the limit was reached. This can cause performance issues for
simple queries like FROM .. | LIMIT 10, when Lucene indices are on the
warm or cold tier. I will submit a follow-up PR to ensure we only
collect up to the limit across multiple drivers.
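
The intended behaviour can be pictured with a small sketch. It is not the operator's code: segments are modelled as plain lists of doc ids, and `scan_with_limit` is a hypothetical helper that stops visiting segments once the limit has been reached, instead of scanning them and discarding the results.

```python
def scan_with_limit(segments, limit):
    """Collect up to `limit` docs and report how many segments were visited."""
    collected = []
    visited = 0
    for segment in segments:
        if len(collected) >= limit:
            break  # early termination: remaining segments are never scanned
        visited += 1
        remaining = limit - len(collected)
        collected.extend(segment[:remaining])
    return collected, visited


# Four toy segments; only the first two are needed to satisfy LIMIT 10.
segments = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14], [15]]
docs, visited = scan_with_limit(segments, limit=10)
print(docs, visited)
```

Without the `break`, all four segments would be visited even though nothing from the last two could be kept, which is where the cost on warm or cold tiers would come from.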
2025-02-24 08:49:54 -08:00
Andrei Dan
760b2312ea
Periodically check the available memory when fetching search hits source (#121920)
When fetching documents, sometimes we need to load the entire source of
search hits. Document sources can be large, and with support for up to
10k hits per search request, this creates a significant untracked
memory load on Elasticsearch that can potentially cause out-of-memory
errors.

This PR adds memory checking for hits source in the fetch phase. We
check with the parent (the real memory) circuit breaker every 1MiB of
loaded source and when fetching the last document of every segment. This
gives the real memory breaker a chance to interrupt running operations
when we're running low on memory, and prevent potential OOMs.

The amount of local accounting to buffer is controlled by the 
`search.memory_accounting_buffer_size` dynamic setting and defaults to
`1MiB`.
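
The buffering scheme can be sketched as follows. This is a simplified model, not the fetch-phase code: `FakeParentBreaker` is a hypothetical stand-in that sums reported bytes against a fixed limit (the real breaker looks at actual memory use), and `SourceAccountant` is an assumed name for the local accounting. The only behaviour taken from the description is that bytes are accumulated locally and the breaker is consulted once the buffer size is reached or the last document of a segment is fetched.

```python
class CircuitBreakingError(Exception):
    """Raised by the stand-in breaker when memory is considered exhausted."""


class FakeParentBreaker:
    """Hypothetical stand-in for the parent (real memory) circuit breaker."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self.checks = 0

    def check(self, new_bytes):
        self.checks += 1
        self.used_bytes += new_bytes
        if self.used_bytes > self.limit_bytes:
            raise CircuitBreakingError(f"{self.used_bytes} > {self.limit_bytes} bytes")


class SourceAccountant:
    """Buffers source sizes locally and checks the breaker only occasionally."""

    def __init__(self, breaker, buffer_size=1024 * 1024):  # 1MiB, as in the default
        self.breaker = breaker
        self.buffer_size = buffer_size
        self.buffered = 0

    def on_source_loaded(self, num_bytes, last_in_segment=False):
        self.buffered += num_bytes
        if self.buffered >= self.buffer_size or last_in_segment:
            self.breaker.check(self.buffered)
            self.buffered = 0


breaker = FakeParentBreaker(limit_bytes=10 * 1024 * 1024)
accountant = SourceAccountant(breaker)
sizes = [300 * 1024] * 5  # five hits of 300 KiB each, all in one segment
for i, size in enumerate(sizes):
    accountant.on_source_loaded(size, last_in_segment=(i == len(sizes) - 1))
print(breaker.checks, breaker.used_bytes)
```

Five documents result in only two breaker checks here: one when the buffer crosses 1MiB and one for the last document of the segment.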

Fixes #89656
2025-02-25 03:25:14 +11:00
jeffganmr
22103de150
fix stale data in synthetic source for string stored field (#123105) 2025-02-24 07:22:48 -08:00
Iván Cea Fontenla
c40c5a6c0a
ESQL: Fix functions emitting warnings with no source (#122821)
Fixes https://github.com/elastic/elasticsearch/issues/122588

- Replaced `Source.EMPTY.writeTo(out)` with `source().writeTo(out)` in functions emitting warnings
  - Did the same on all aggs, as Top emits an error on type resolution. This is not a bug, as type resolution errors should only happen in the coordinator. Another option would be changing Top to not generate that error there, and making it implement `PostAnalysisVerificationAware` instead
- In some cases, we don't even serialize an empty source. So I had to add a new `TransportVersion` to do so
  - As a special case, `ToLower` and `ToUpper` weren't serializing a source, but they don't emit warnings. As they were the only remaining functions not serializing the source, I added it there too
2025-02-24 13:52:41 +00:00
David Turner
187b192dfe
Deduplicate allocation stats calls (#123246)
These things can be quite expensive and there's no need to recompute
them in parallel across all management threads as done today. This
commit adds a deduplicator to avoid redundant work.
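
One common way to deduplicate concurrent requests is a "single flight" wrapper: the first caller runs the computation and everyone who arrives while it is running waits for that result. The sketch below shows that general pattern only; `SingleFlight`, `expensive_stats`, and the returned dictionary are hypothetical and do not mirror the deduplicator added by this commit.

```python
import threading
import time


class SingleFlight:
    """Shares one in-flight computation among concurrent callers."""

    def __init__(self, compute):
        self._compute = compute
        self._lock = threading.Lock()
        self._inflight = None  # state of the computation currently running
        self.joined = 0        # callers that reused an in-flight computation

    def get(self):
        with self._lock:
            call = self._inflight
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "result": None, "error": None}
                self._inflight = call
            else:
                self.joined += 1
        if leader:
            try:
                call["result"] = self._compute()
            except Exception as exc:  # hand the failure to every waiter
                call["error"] = exc
            finally:
                with self._lock:
                    self._inflight = None
                call["done"].set()
        else:
            call["done"].wait()
        if call["error"] is not None:
            raise call["error"]
        return call["result"]


# Demo: four threads ask for the stats at the same time.
compute_calls = []
release = threading.Event()


def expensive_stats():
    compute_calls.append(1)
    release.wait(timeout=5)  # simulate slow work until the demo releases it
    return {"computed": len(compute_calls)}


dedup = SingleFlight(expensive_stats)
results = []
threads = [threading.Thread(target=lambda: results.append(dedup.get())) for _ in range(4)]
for t in threads:
    t.start()
deadline = time.time() + 5
while dedup.joined < 3 and time.time() < deadline:
    time.sleep(0.01)  # wait until the other three callers have joined
release.set()
for t in threads:
    t.join()
print(len(compute_calls), results)
```

Results are not cached once the computation finishes, so a later call triggers a fresh computation; only overlapping calls are merged.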
2025-02-25 00:21:10 +11:00
Nik Everett
67293ba8f4
ESQL: Speed up VALUES for many buckets (#123073)
Speeds up the VALUES agg when collecting from many buckets.
Specifically, this speeds up the algorithm used to `finish` the
aggregation. Most specifically, this makes the algorithm more tolerant
of large numbers of groups being collected. The old algorithm was
`O(n^2)` with the number of groups. The new one is `O(n)`.

```
(groups)
      1     219.683 ±    1.069  ->   223.477 ±    1.990 ms/op
   1000     426.323 ±   75.963  ->   463.670 ±    7.275 ms/op
 100000   36690.871 ± 4656.350  ->  7800.332 ± 2775.869 ms/op
 200000   89422.113 ± 2972.606  -> 21920.288 ± 3427.962 ms/op
 400000 timed out at 10 minutes -> 40051.524 ± 2011.706 ms/op
```

The `1` group version was not changed at all. That's just noise in the
measurement. The small bump in the `1000` case is almost certainly worth
it and real. The huge drop in the `100000` case is quite real.
2025-02-23 18:29:55 +00:00
Sam Xiao
4233310846
Add health indicator impact to HealthPeriodicLogger (#122390) 2025-02-21 17:06:25 -05:00
Alexey Ivanov
2bda4c1fa8
Converting an Existing Data Stream to a System DataStream is Broken (#121392)
Adds support for converting an existing data stream to a system data stream as part of the existing system_index_metadata_upgrade_service task
2025-02-21 19:50:57 +00:00
Pat Whelan
bd52363bde
[ML] Add ElasticInferenceServiceCompletionServiceSettings (#123155)
Adding the missing NamedWriteable to the registry.
2025-02-21 12:27:12 -05:00
Costin Leau
21845ad7a1
ESQL: Remove duplicated nested commands (#123085)
The Fork grammar duplicated the nested command declaration, causing additional
lexing to occur and resulting in invalid field name declarations

Relates to #121948
2025-02-21 06:56:09 -08:00
Joe Gallo
a8958755a7
Fix geoip databases index access after system feature migration (again) (#122938) 2025-02-21 08:00:10 -05:00
Martijn van Groningen
8d1f5d3223
Hold store reference in InternalEngine#performActionWithDirectoryReader(...) (#123010)
This method gets called from `InternalEngine#resolveDocVersion(...)`, which gets called during indexing (via `InternalEngine.index(...)`).

When `InternalEngine.index(...)` gets invoked, the InternalEngine only ensures that it holds a ref to the engine via Engine#acquireEnsureOpenRef(), but this doesn't ensure that it holds a reference to the store.

Closes #122974

* Update docs/changelog/123010.yaml
2025-02-21 11:48:21 +01:00
Fang Xing
412e6c2b39
[ES|QL] Implicit numeric casting for CASE/GREATEST/LEAST (#122601)
* implicit numeric casting for conditional functions
2025-02-20 22:20:49 -05:00
kanoshiou
de41d5704b
ESQL: Fix precision of scaled_float field values retrieved from stored source (#122586) 2025-02-20 14:01:34 -08:00
fzowl
521f8554c3
feat: VoyageAI integration (#122134)
* VoyageAI embeddings and rerank:
 - embeddings works, tested
 - initial rerank code

What's missing:
 - unit and integration tests
 - rerank request/response mapping and verification

* VoyageAI embeddings and rerank:
 - embeddings works, tested
 - rerank works, tested (https://www.elastic.co/search-labs/blog/elasticsearch-cohere-rerank)

What's missing:
 - unit and integration tests

* VoyageAI embeddings and rerank:
 - embeddings works, tested
 - rerank works, tested (https://www.elastic.co/search-labs/blog/elasticsearch-cohere-rerank)

What's missing:
 - unit and integration tests

* VoyageAI embeddings and rerank:
 - embeddings works, tested
 - rerank works, tested (https://www.elastic.co/search-labs/blog/elasticsearch-cohere-rerank)

What's missing:
 - unit and integration tests

* Adding initial tests
Moving dimensions to ServiceSettings

* Correcting the TransportVersions.java

* Correcting due to comments

* Adding BIT support

* Initial tests

* More tests

* More tests/corrections

* Removing warnings

* Further tests

* Transport version correction

* Adding changelog and correcting TransportVersions

* Spotless tests

* Changes due to the comments

* Changes due to the comments

* Correcting QA tests

* Correcting QA tests

---------

Co-authored-by: Jonathan Buttner <jonathan.buttner@elastic.co>
Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>
2025-02-20 16:11:58 -05:00
Ruben van Staden
171a3b93f9
apm-data: use representative count as event.success_count if available (#119995) 2025-02-20 14:45:06 -05:00
Dan Rubinstein
99897b1b39
Add enterprise license check to inference action for semantic text fields (#122293)
* Add enterprise license check to inference action for semantic text fields

* Update docs/changelog/122293.yaml

* Set license to trial in ShardBulkInferenceActionFilterIT

* Move license check to only block semantic_text fields that require inference call

* Cleaning up tests

* Add parameterization on useLegacyFormat back in ShardBulkInferenceActionFilterBasicLicenseIT

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2025-02-20 14:06:40 -05:00
Kostas Krikellas
5c129786f1
Use min node version to guard injecting settings in logs provider (#123005)
* Use min node version to guard injecting settings in logs provider

* Update docs/changelog/123005.yaml

* no random in cluster init
2025-02-20 18:31:16 +02:00
Keith Massey
41dae025e7
Updating TransportRolloverAction.checkBlock so that non-write-index blocks do not prevent data stream rollover (#122905) 2025-02-20 17:20:44 +01:00
Nhat Nguyen
091ea9aa1d
Support partial results in CCS in ES|QL (#122708)
A follow-up to #121942 that adds support for partial results in CCS in ES|QL.

Relates #121942
2025-02-20 07:27:32 -08:00
Luke Whiting
e3792d19b5
Allow data stream reindex tasks to be re-run after completion (#122510)
* Allow data stream reindex tasks to be re-run after completion

* Docs update

* Update docs/reference/migration/apis/data-stream-reindex.asciidoc

Co-authored-by: Keith Massey <keith.massey@elastic.co>

---------

Co-authored-by: Keith Massey <keith.massey@elastic.co>
2025-02-20 15:03:51 +00:00
Ioana Tagirta
a26b596cbd
ES|QL: Initial grammar and changes for FORK (snapshot) (#121948)
* Grammar changes

* Generate grammar changes

* Fork planning

* Fix field resolution

* Cleanup

* Add CSV tests

* Update docs/changelog/121948.yaml

* [CI] Auto commit changes from spotless

* fix forbidden apis

* javadoc

* remove serialization of fork and Merge

* fix equality

* fix EsqlNodeSubclassTests

* add statement parser tests

* remove unnecessary serialization

* automatic fork branch ids start at 1

* add analyzer test

* more tests

* more tests

* minor itr

* replace [] with ()

* move fork eval to initial logical plan

* simplify MergeOperator finished state

* enable CSV tests

* rework Fork to use StubRelation and Merge to be Nary

* reverts

* fail hard if not LocalSourceExec

* spotless

* no fork in fork yet

* itr

* itr

* itr

* fix EsqlNodeSubclassTests

* more tests and restrict NESTED_XX to snapshot

* fix method name

* check for fork cap before testing ForkIT

* Move fork id alias logic to parser

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: ChrisHegarty <chegar999@gmail.com>
Co-authored-by: Chris Hegarty <62058229+ChrisHegarty@users.noreply.github.com>
2025-02-20 13:25:08 +01:00
David Turner
cdaa5dd7ad
Clarify breaking change note for #112903 (#122998)
Closes #122994
2025-02-20 12:11:56 +00:00
Larisa Motova
e4ee91a08a
[ES|QL] Render aggregate_metric_double (#122660)
This commit allows users to read aggregate_metric_double fields from
indices in ES|QL, with any subset of metrics.
2025-02-19 22:38:49 -10:00
Martijn van Groningen
43665f0a35
Store arrays offsets for keyword fields natively with synthetic source (#113757)
The keyword doc values field gets an extra sorted doc values field, that encodes the order of how array values were specified at index time. This also captures duplicate values. This is stored in an offset to ordinal array that gets zigzag vint encoded into a sorted doc values field.

For example, in case of the following string array for a keyword field: ["c", "b", "a", "c"].
Sorted set doc values: ["a", "b", "c"] with ordinals: 0, 1 and 2. The offset array will be: [2, 1, 0, 2]

Null values are also supported. For example ["c", "b", null, "c"] results in sorted set doc values: ["b", "c"] with ordinals: 0 and 1. The offset array will be: [1, 0, -1, 1]

Empty arrays are also supported by encoding a zigzag vint array of zero elements.
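
The two examples above can be reproduced with a short sketch. It models only the bookkeeping described in this message (sorted unique values, per-position ordinals, -1 for null); the function names are hypothetical, and `zigzag` is the standard 32-bit zigzag mapping, shown on the assumption that it is what makes the -1 entries cheap to store as a vint.

```python
def encode_offsets(values):
    """Return (sorted unique values, offset array) for one array of keywords."""
    sorted_values = sorted({v for v in values if v is not None})
    ordinal_of = {v: i for i, v in enumerate(sorted_values)}
    # Each position stores the ordinal of its value; null is recorded as -1.
    offsets = [-1 if v is None else ordinal_of[v] for v in values]
    return sorted_values, offsets


def decode_offsets(sorted_values, offsets):
    """Rebuild the original array, including order, duplicates and nulls."""
    return [None if o == -1 else sorted_values[o] for o in offsets]


def zigzag(n):
    """Map a signed 32-bit int to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) ^ (n >> 31)


print(encode_offsets(["c", "b", "a", "c"]))    # (['a', 'b', 'c'], [2, 1, 0, 2])
print(encode_offsets(["c", "b", None, "c"]))   # (['b', 'c'], [1, 0, -1, 1])
print([zigzag(o) for o in [1, 0, -1, 1]])      # [2, 0, 1, 2]
```

Decoding with `decode_offsets` returns the original array, which is what allows synthetic source to restore order and duplicates that sorted set doc values alone would lose.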

Limitations:

- Currently, only doc values based arrays are supported, and only for the keyword field mapper.
- Multi-level leaf arrays are flattened. For example: [[b], [c]] -> [b, c]
- Arrays are always synthesized as one type. In the case of a keyword field, [1, 2] gets synthesized as ["1", "2"].

These limitations can be addressed, but some require more complexity and/or additional storage.

With this PR, keyword field array will no longer be stored in ignored source, but array offsets are kept track of in an adjacent sorted doc value field. This only applies if index.mapping.synthetic_source_keep is set to arrays (default for logsdb).
2025-02-20 09:20:49 +01:00
Keith Massey
463dc4a8a5
Updates the deprecation info API to not warn about system indices and data streams (#122951) 2025-02-19 15:30:17 -06:00
Dan Rubinstein
bea8df3c8e
Adding endpoint creation validation to ElasticInferenceService (#117642)
* Adding endpoint creation validation to ElasticInferenceService

* Fix unit tests

* Update docs/changelog/117642.yaml

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2025-02-19 12:24:21 -05:00
Dianna Hohensee
2bfc700683
DesiredBalanceReconciler always returns AllocationStats (#122458)
Ensures that the DesiredBalanceReconciler always returns a non-empty
AllocationStats object, eliminating edge cases where the stats
available to DesiredBalanceMetrics may not be updated due to some
kind of throttling or the balancer being disabled via cluster
settings.

Adds documentation around
AllocationDecider#canRebalance(RoutingAllocation)

Closes ES-10581
2025-02-19 10:34:24 -05:00
David Turner
cd15d09adf
Fork post-snapshot-delete cleanup off master thread (#122731)
We shouldn't run the post-snapshot-delete cleanup work on the master
thread, since it can be quite expensive and need not block subsequent
cluster state updates. This commit forks it onto a `SNAPSHOT` thread.
2025-02-19 21:02:27 +11:00
Niels Bauman
c65596b62e
Run TransportGetWatcherSettingsAction on local node (#122857)
This action solely needs the cluster state, so it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.

Relates #101805
2025-02-19 08:15:00 +01:00
Lee Hinman
2ae80c799d
Allow setting the type in the reroute processor (#122409)
* Allow setting the `type` in the reroute processor

This allows configuring the `type` from within the ingest `reroute` processor. Similar to `dataset`
and `namespace`, the type defaults to the value extracted from the index name. This means that
documents sent to `logs-mysql.access.default` will have a default value of `logs` for the type.
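
The defaulting rule can be sketched in a few lines. This is an illustration only: `default_type` and `resolve_type` are hypothetical helpers, the type is assumed to be the part of the index name before the first hyphen, and raising on a name without a hyphen is a choice made for the sketch rather than the processor's documented behaviour.

```python
def default_type(index_name):
    """Extract the type prefix from an index name such as 'logs-...'."""
    prefix, sep, _rest = index_name.partition("-")
    if not sep or not prefix:
        # Assumption for this sketch: names without a type prefix are rejected.
        raise ValueError(f"cannot extract a type from {index_name!r}")
    return prefix


def resolve_type(index_name, configured_type=None):
    """Use the configured `type` if present, else fall back to the index name."""
    if configured_type is not None:
        return configured_type
    return default_type(index_name)


print(resolve_type("logs-mysql.access.default"))             # logs
print(resolve_type("logs-mysql.access.default", "metrics"))  # metrics
```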

Resolves #121553

* Update docs/changelog/122409.yaml
2025-02-18 12:38:00 -07:00
Dianna Hohensee
befc6a03e3
Start the allocation architecture guide section (#121940)
This is a high-level overview of the main rebalancing components and
how they interact to move shards around the cluster, and decide where
shards should go.

Relates ES-10423
2025-02-18 13:33:39 -05:00
Felix Barnsteiner
5e8865deac
Add _metric_names_hash field to OTel metric mappings (#120952)
If metrics that have the same timestamp and dimensions aren't grouped into the same document, ES will consider them to be a duplicate.
The _metric_names_hash field will be set by the OTel ES exporter.
As it's mapped as a time_series_dimension, it creates a different _tsid for documents with different sets of metrics.
The tradeoff is that if the composition of the metrics grouping changes over time, a different _tsid will be created.
That has an impact on the rate aggregation for counters.
2025-02-18 18:30:37 +01:00
Oleksandr Kolomiiets
ba8c5764f8
Use FallbackSyntheticSourceBlockLoader for unsigned_long and scaled_float fields (#122637) 2025-02-18 09:28:26 -08:00