Commit graph

18047 commits

Author SHA1 Message Date
Benjamin Trent
b2c1c4e0f0
New vector_rescore parameter as a quantized index type option (#124581)
This adds a new parameter to the quantized index mapping that enables
default oversampling and rescoring.

This doesn't change any of the defaults; it only makes the behavior
configurable. When the user provides `rescore_vector: {oversample: <number>}`
in the query, the query value overrides the mapping setting.

For example, here is how to use it with bbq:

```
PUT rescored_bbq
{
  "mappings": {
    "properties": {
      "vector": {
        "type": "dense_vector",
        "index_options": {
          "type": "bbq_hnsw",
          "rescore_vector": {"oversample": 3.0}
        }
      }
    }
  }
}
```

Then, when querying, it will automatically oversample `k` by 3x and rerank
the candidates with the raw vectors.

```
POST _search
{
  "knn": {
    "query_vector": [...],
    "field": "vector"
  }
}
```
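The oversample-then-rescore flow can be illustrated with a toy Python sketch. Several parts are illustrative assumptions:

- the sign-only quantizer;
- the helper names (`sign_quantize`, `effective_oversample`, `rescored_knn`);
- the brute-force scan.

Real `bbq_hnsw` uses its own quantization scheme and an HNSW graph, not a linear search. The sketch shows only the idea: fetch `ceil(k * oversample)` candidates by approximate score, rescore them with the raw vectors, and let a query-level value override the mapping default.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


def sign_quantize(v: Vector) -> List[float]:
    """Toy 1-bit quantizer that keeps only each component's sign (not the real BBQ scheme)."""
    return [1.0 if x >= 0 else -1.0 for x in v]


def effective_oversample(mapping_default: Optional[float], query_value: Optional[float]) -> float:
    """A query-level oversample overrides the mapping default; 1.0 means no oversampling."""
    if query_value is not None:
        return query_value
    return mapping_default if mapping_default is not None else 1.0


def rescored_knn(query: Vector, vectors: Sequence[Vector], k: int, oversample: float,
                 quantize: Callable[[Vector], List[float]] = sign_quantize) -> List[int]:
    # 1. Approximate pass: rank all vectors by similarity of their quantized forms.
    q = quantize(query)
    approx = sorted(range(len(vectors)), key=lambda i: -dot(q, quantize(vectors[i])))
    # 2. Keep ceil(k * oversample) candidates instead of only k.
    num_candidates = min(len(vectors), math.ceil(k * oversample))
    candidates = approx[:num_candidates]
    # 3. Rescore the candidates with the raw float vectors and keep the top k.
    return sorted(candidates, key=lambda i: -dot(query, vectors[i]))[:k]


query = [1.0, 0.1]
vectors = [[0.1, 0.1], [1.0, -0.1], [-0.5, 0.5]]
print(rescored_knn(query, vectors, k=1, oversample=effective_oversample(None, None)))  # [0]
print(rescored_knn(query, vectors, k=1, oversample=effective_oversample(3.0, None)))  # [1]
```

Without oversampling, the coarse quantized scores pick document 0. With an `oversample` of 3.0 taken from the mapping, exact rescoring recovers document 1, the true nearest neighbor.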
2025-03-14 00:40:08 +11:00
Craig Taverner
d5ddb909a4
ESQL autogenerate docs v3 (#124312)
Building on the work started in https://github.com/elastic/elasticsearch/pull/123904, we now want to auto-generate most of the small subfiles from the ES|QL functions unit tests.

This work also investigates any remaining discrepancies between the original asciidoc version and the new markdown, and tries to minimize differences so the docs do not look too different.

The kibana json and markdown files are moved to a new location, and the operator docs are a little more generated than before (although still largely manual).
2025-03-13 14:16:46 +01:00
Slobodan Adamović
cac356ae64
Disable queryable built-in feature in docs YAML tests (#124684)
The .security index is created asynchronously on a cluster startup. This
affects some of the docs YAML tests in a way that they need to account
for the existence of the .security index or wait for the index to be
created and green. This PR disables the feature for docs YAML tests.
Disabling the feature in docs YAML tests will solve the flakiness
without affecting the coverage.

Resolves https://github.com/elastic/elasticsearch/issues/122343
Resolves https://github.com/elastic/elasticsearch/issues/121748
Resolves https://github.com/elastic/elasticsearch/issues/121611
Resolves https://github.com/elastic/elasticsearch/issues/121345
Resolves https://github.com/elastic/elasticsearch/issues/121338
Resolves https://github.com/elastic/elasticsearch/issues/121337
Resolves https://github.com/elastic/elasticsearch/issues/121288
Resolves https://github.com/elastic/elasticsearch/issues/121287
Resolves https://github.com/elastic/elasticsearch/issues/121867
Resolves https://github.com/elastic/elasticsearch/issues/122335
Resolves https://github.com/elastic/elasticsearch/issues/122681
Resolves https://github.com/elastic/elasticsearch/issues/121976
Resolves https://github.com/elastic/elasticsearch/issues/123094
Resolves https://github.com/elastic/elasticsearch/issues/123192
Resolves https://github.com/elastic/elasticsearch/issues/122983
Resolves https://github.com/elastic/elasticsearch/issues/124671
Resolves https://github.com/elastic/elasticsearch/issues/124103
2025-03-13 23:13:45 +11:00
Jan Kuipers
a503497bce
Add max.chunks to EmbeddingRequestChunker to prevent OOM (#123150)
* add max number of chunks

* wire merge function

* implement sparse merge function

* move tests to correct package/file

* float merge function

* bytes merge function

* more accurate byte average

* spotless

* Fix/improve EmbeddingRequestChunkerTests

* Remove TODO

* remove unnecessary field

* remove Chunk generic

* add TODO

* Remove specialized chunks

* add comment

* Update docs/changelog/123150.yaml

* update changelog
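The squash log above mentions a maximum number of chunks, plus merge functions for sparse, float, and byte embeddings, but it does not describe the merge strategy. As an assumption only, here is one plausible way to cap the chunk count in Python. It splits the per-chunk embeddings into at most `max_chunks` contiguous groups and averages each group; byte averages are rounded rather than truncated. The functions `group_bounds`, `merge_float`, and `merge_bytes` are hypothetical and are not the EmbeddingRequestChunker code.

```python
from typing import List


def group_bounds(n: int, max_chunks: int) -> List[range]:
    """Split n items into at most max_chunks contiguous, near-equal groups."""
    groups = min(n, max_chunks)
    return [range(g * n // groups, (g + 1) * n // groups) for g in range(groups)]


def merge_float(embeddings: List[List[float]], max_chunks: int) -> List[List[float]]:
    """Average each group of float embeddings component-wise."""
    merged = []
    for r in group_bounds(len(embeddings), max_chunks):
        group = [embeddings[i] for i in r]
        merged.append([sum(col) / len(group) for col in zip(*group)])
    return merged


def merge_bytes(embeddings: List[List[int]], max_chunks: int) -> List[List[int]]:
    """Average each group of byte embeddings, rounding to the nearest integer (not truncating)."""
    merged = []
    for r in group_bounds(len(embeddings), max_chunks):
        group = [embeddings[i] for i in r]
        merged.append([round(sum(col) / len(group)) for col in zip(*group)])
    return merged


print(merge_float([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]], 2))  # [[2.0, 1.0], [6.0, 5.0]]
```

Rounding matters for bytes. Averaging `[10]`, `[11]`, `[11]` gives 10.67; truncation would store 10, while rounding stores 11.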
2025-03-13 11:38:12 +01:00
Martijn van Groningen
ce3a778fa1
Improve downsample performance by buffering docids and doing bulk processing. (#124477) 2025-03-13 07:46:08 +01:00
Andrei Stefan
c48f9a9e1c
ESQL: Change the order of the optimization rules (#124335) 2025-03-13 07:45:37 +02:00
Nick Tindall
74d61a4052
Retry when the server can't be resolved (#123852) 2025-03-13 12:38:04 +11:00
Joe Gallo
d565304f4b
Fix geoip databases index access after system feature migration (take 3) (#124604) 2025-03-12 14:03:57 -04:00
Tommaso Teofili
c971d79a95
Let MLTQuery throw IAE when no analyzer is set (#124662)
* Let MLTQuery throw IAE when no analyzer is set
2025-03-12 18:37:31 +01:00
Charlotte Hoblik
9e754ec8f6
[DOCS] Plugin management reference cleanup (#124578)
* add content to plugin management

* add content to Plugin Management

* Update docs/reference/elasticsearch-plugins/plugin-management.md

Co-authored-by: florent-leborgne <florent.leborgne@elastic.co>

* fix applies-to tag

* add ech to docset.yml

---------

Co-authored-by: florent-leborgne <florent.leborgne@elastic.co>
2025-03-12 17:01:10 +01:00
Valeriy Khakhutskyy
44fba7213d
[ML] Provide model_size_stats as soon as an anomaly detection job is opened (#124638)
Fixes #121168
2025-03-12 16:57:58 +01:00
Pat Whelan
9f89a3b318
[ML] Integrate with DeepSeek API (#122218)
Integrating for Chat Completion and Completion task types, both calling
the chat completion API for DeepSeek.
2025-03-12 15:24:39 +01:00
Nik Everett
50aaa1c2a6
ESQL: Pragma to load from stored fields (#122891)
This creates a `pragma` you can use to request that fields load from a
stored field rather than doc values. It implements that pragma for
`keyword` and number fields.

We expect that, for some disk configurations and some numbers of fields,
it's faster to load those fields from _source or stored fields than it is
to use doc values. Our default is doc values, and on my laptop it's
*always* faster to use doc values. But we don't ship my laptop to every
cluster.

This will let us experiment and debug slow queries by trying to load
fields a different way.

You access this pragma with:
```
curl -H 'Content-Type: application/json' -X POST 'localhost:9200/_query?pretty' -d '{
    "query": "FROM foo",
    "pragma": {
        "field_extract_preference": "STORED"
    }
}'
```

On a release build you'll need to add `"accept_pragma_risks": true`.
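If you build the request in code rather than with curl, a minimal Python sketch follows. It only assembles the JSON body from the curl example and does not send it. `stored_fields_query` is a hypothetical helper, and placing `accept_pragma_risks` at the top level of the body is our reading of the sentence above, so check it against the ES|QL query API docs.

```python
import json


def stored_fields_query(esql: str, release_build: bool) -> str:
    """Build an ES|QL _query body that asks for keyword/number fields to load from stored fields."""
    body = {
        "query": esql,
        "pragma": {"field_extract_preference": "STORED"},
    }
    if release_build:
        # Per the commit message, release builds require this explicit opt-in for pragmas.
        body["accept_pragma_risks"] = True
    return json.dumps(body, indent=4)


print(stored_fields_query("FROM foo", release_build=True))
```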
2025-03-12 09:40:42 -04:00
Mridula
f6538e86e2
Prevent Query Rule Creation with Invalid Numeric Match Criteria (#122823)
* SEARCH-802 - bug fixed - Query rules allows for creation of rules with invalid match criteria

* [CI] Auto commit changes from spotless

* Worked on the comments given in the PR

* [CI] Auto commit changes from spotless

* Fixed Integration tests

* [CI] Auto commit changes from spotless

* Made changes from the PR

* Update docs/changelog/122823.yaml

* [CI] Auto commit changes from spotless

* Fixed the duplicate code issue in queryRuleTests

* Refactored code to clean it up based on PR comments

* [CI] Auto commit changes from spotless

* Logger statements were removed

* Cleaned up the QueryRule tests

* [CI] Auto commit changes from spotless

* Update x-pack/plugin/ent-search/src/test/java/org/elasticsearch/xpack/application/EnterpriseSearchModuleTestUtils.java

Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
2025-03-12 13:56:13 +01:00
Tim Grein
0b83425d17
[Inference API] Propagate product use case http header to EIS (#124025) 2025-03-12 12:48:24 +01:00
Mikhail Berezovskiy
053b037a9b
GCS blob store: add OperationPurpose/Operation stats counters (#122991) 2025-03-11 17:57:15 -07:00
kanoshiou
deff3df9f0
ES|QL: Support ::date in inline cast (#123460)
* Inline cast to date

* Update docs/changelog/123460.yaml

* New capability for `::date` casting

* More tests

* Update tests

---------

Co-authored-by: Fang Xing <155562079+fang-xing-esql@users.noreply.github.com>
2025-03-11 17:08:10 -04:00
Iván Cea Fontenla
2fff041077
ESQL: Push down StartsWith and EndsWith functions to Lucene (#123381)
Fixes https://github.com/elastic/elasticsearch/issues/123067

Just like WildcardLike and RLike, some functions can be converted to Lucene queries. This PR handles those two, which are nearly identical to WildcardLike.

This, like some other functions, needs a FoldContext. I'm using the static method for this here, but that is fixed in https://github.com/elastic/elasticsearch/pull/123398, which I kept separate because it changes many files.
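The "nearly identical to WildcardLike" remark can be made concrete. `STARTS_WITH(field, "abc")` matches the same values as the Lucene wildcard pattern `abc*`, and `ENDS_WITH(field, "abc")` matches `*abc`, once the literal's own wildcard characters are escaped. Lucene's wildcard syntax uses `*`, `?`, and a backslash escape. This Python sketch shows only that pattern equivalence. The helper names are hypothetical, and the real pushdown may build a different Lucene query type.

```python
def escape_wildcard(literal: str) -> str:
    """Escape the characters Lucene wildcard patterns treat specially: *, ? and backslash."""
    out = []
    for ch in literal:
        if ch in "*?\\":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def starts_with_pattern(prefix: str) -> str:
    # STARTS_WITH(field, prefix) is equivalent to the wildcard "<prefix>*".
    return escape_wildcard(prefix) + "*"


def ends_with_pattern(suffix: str) -> str:
    # ENDS_WITH(field, suffix) is equivalent to the wildcard "*<suffix>".
    return "*" + escape_wildcard(suffix)


print(starts_with_pattern("log-"), ends_with_pattern(".json"))  # log-* *.json
```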
2025-03-11 19:14:05 +01:00
Simon Cooper
d8e889acb6
Restore TextSimilarityRankBuilder XContent output (#124564) 2025-03-11 16:03:04 +00:00
Mark Tozzi
3e949479d8
ESQL - Include thread names in profile output (#124262)
Resolves #123053

This adds the thread name to the driver sleep profile output.
---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-03-11 15:53:22 +01:00
Carlos Delgado
2b40e73fe9
ES|QL - Add scoring for full text functions disjunctions (#121793) 2025-03-11 15:29:15 +01:00
Johannes Fredén
e11d89d76b
Bump nimbus-jose-jwt to 10.0.2 (#124544)
This bumps nimbus-jose-jwt from 10.0.1 -> 10.0.2
2025-03-12 00:23:33 +11:00
Jan Calanog
435d1db5b9
Remove subs attribute (#124551) 2025-03-11 12:14:58 +01:00
David Kyle
444b8eab75
[ML] Avoid potentially throwing calls to Task#getDescription in model download 2025-03-11 09:48:07 +00:00
Ioana Tagirta
cda82554aa
ES|QL: Add initial grammar and planning for RRF (snapshot) (#123396) 2025-03-11 10:18:11 +01:00
Costin Leau
2761af000b
ESQL: Lazy collection copying during node transform (#124424)
* ESQL: Lazy collection copying during node transform

A set of optimizations for tree traversal:
1. perform lazy copying during children transform
2. use long hashing to avoid object creation
3. perform the type check before the collection check
Relates #124395
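Item 1, lazy copying during a children transform, follows a common copy-on-write pattern. The child list is copied only when the first child actually changes, and an untouched node is returned as-is. This Python sketch uses a hypothetical `Node` class and `transform_children` function, not the ES|QL tree classes, and covers item 1 only.

```python
from typing import Callable, List, Optional


class Node:
    """Minimal tree node, treated as immutable (hypothetical, not the ES|QL Node class)."""

    def __init__(self, name: str, children: Optional[List["Node"]] = None):
        self.name = name
        self.children = children if children is not None else []


def transform_children(node: Node, fn: Callable[[Node], Node]) -> Node:
    new_children: Optional[List[Node]] = None  # stays None until a child actually changes
    for i, child in enumerate(node.children):
        new_child = fn(child)
        if new_child is not child:
            if new_children is None:
                new_children = list(node.children)  # copy lazily, on the first change only
            new_children[i] = new_child
    if new_children is None:
        return node  # nothing changed: reuse the original node and its child list
    return Node(node.name, new_children)


root = Node("root", [Node("a"), Node("b")])
print(transform_children(root, lambda c: c) is root)  # True: no copy was made
```

In the common case where a rule matches nothing, no list is allocated at all.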
2025-03-10 16:11:47 -07:00
Luca Cavanna
def4c890bc
Fix concurrency issue in ScriptSortBuilder (#123757)
Inter-segment concurrency is disabled whenever sort by field, including script sorting, is used in a search request.

The reason why sort by field does not use concurrency is that there are some performance implications: the hit queue in Lucene is built per slice, and the different search threads don't share information about the documents they have already visited.

The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit fixes that concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, it adds missing tests covering sort scripts that rely on _score being available and top_hits aggregations with a scripted sort clause.
2025-03-10 21:10:53 +01:00
Nhat Nguyen
79a1626160
Speed up block serialization (#124394)
Currently, we use NamedWriteable for serializing blocks. While
convenient, it incurs a noticeable performance penalty when pages
contain thousands of blocks. Since there are only a few block types and
they are already centralized in ElementType, we can safely switch from
NamedWriteable to typed code. For example, the NamedWriteable names
alone for a small page with 10K fields would take 180KB, whereas the new
method reduces that to 10KB. Below are the serialization improvements
with `FROM idx | LIMIT 10000`, where the target index has 10K fields:

- write_exchange_response (executed 173 times): 73.2ms -> 26.7ms
- read_exchange_response (executed 173 times): 49.4ms -> 25.8ms
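The size figures work out to roughly 180KB / 10K ≈ 18 bytes of name per block before, versus 10KB / 10K ≈ 1 byte per block after. That is consistent with replacing a string name with a one-byte type code. This Python sketch compares the two encodings. The `ElementType` members, the block names, and the one-byte length prefix are all simplifying assumptions, not the real enum or StreamOutput format.

```python
from enum import Enum


class ElementType(Enum):
    # Hypothetical subset; the real ElementType enum has its own members and codes.
    INT = 0
    LONG = 1
    DOUBLE = 2
    BYTES_REF = 3


def write_named(types) -> bytes:
    """NamedWriteable style: prefix each block with its writeable name as a string."""
    out = bytearray()
    for t in types:
        name = f"{t.name.lower()}_block".encode("utf-8")  # hypothetical name, e.g. b"long_block"
        out.append(len(name))  # simplified one-byte length prefix (names are short)
        out += name
    return bytes(out)


def write_typed(types) -> bytes:
    """Typed-code style: prefix each block with a single byte identifying its element type."""
    return bytes(t.value for t in types)


def read_typed(data: bytes):
    return [ElementType(b) for b in data]


blocks = [ElementType.LONG] * 10_000
print(len(write_named(blocks)), len(write_typed(blocks)))  # 110000 10000
```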
2025-03-10 11:54:38 -07:00
Martijn van Groningen
6afd3ecc58
Avoid reading unnecessary dimension values when downsampling (#124451)
Read dimension values once per tsid/bucket docid range instead of for each document being processed.
The dimension values within a bucket-interval docid range are always the same, so this avoids unnecessary reads.

Latency of downsampling the tsdb track index into a 1-hour-interval downsample index dropped by ~16% (running on my local machine).
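Reading dimensions once per range relies on documents arriving grouped by key. That holds for a time series index sorted by `_tsid` and `@timestamp`. This Python sketch reads dimensions from the first document of each contiguous run of identical tsid/bucket keys and skips the rest. `dimensions_per_range` and the `read_dimensions` callback are hypothetical stand-ins for the downsampler's doc-values access.

```python
from typing import Callable, Dict, Hashable, List, Tuple


def dimensions_per_range(
    doc_keys: List[Hashable],
    read_dimensions: Callable[[int], Dict[str, str]],
) -> List[Tuple[Hashable, Dict[str, str], int]]:
    """Return (key, dimensions, doc_count) for each contiguous run of docs sharing a key."""
    results = []
    i, n = 0, len(doc_keys)
    while i < n:
        key = doc_keys[i]
        dims = read_dimensions(i)  # read once, from the first doc of the range
        j = i
        while j < n and doc_keys[j] == key:
            j += 1  # the remaining docs in the range share the same dimension values
        results.append((key, dims, j - i))
        i = j
    return results


keys = [("a", 0), ("a", 0), ("a", 1), ("b", 0), ("b", 0)]
print(dimensions_per_range(keys, lambda doc: {"host": f"h{doc}"}))
```

Five documents cost three dimension reads here, one per (tsid, bucket) range instead of one per document.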
2025-03-10 12:12:42 +01:00
Mark Hopkin
a5f186bb5d
Give Kibana user 'all' permissions for .entity_analytics.* indices (#123588) 2025-03-10 11:57:12 +01:00
Charlotte Hoblik
e51b50139b
Fix external URI images (#124350) 2025-03-10 11:31:47 +01:00
Niels Bauman
9cecc89fed
Run TransportExplainLifecycleAction on local node (#122885)
This action solely needs the cluster state, it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.

Relates #101805
2025-03-10 09:43:13 +01:00
Samiul Monir
f0d5220178
Handle empty input inference (#123763)
* Added check for blank string to skip generating embeddings with unit test

* Adding yaml tests for skipping embedding generation

* dynamic update not required if model_settings stays null

* Updating node feature for handling empty input name and description

* Update yaml tests with refresh=true

* Update unit test to follow more accurate behavior

* Added yaml tests for multi chunks

* [CI] Auto commit changes from spotless

* Adding highlighter yaml tests for empty input

* Update docs/changelog/123763.yaml

* Update changelog and test reason to have more polished documentation

* adding input value into the response source and fixing unit tests by reformatting

* Adding highlighter test for backward compatibility and refactor existing test

* Added bwc tests for empty input and multi chunks

* Removed reindex for empty input from bwc

* [CI] Auto commit changes from spotless

* Fixing yaml test

* Update unit tests helper function to support both formats

* [CI] Auto commit changes from spotless

* Adding cluster features for bwc

* Centralize logic for assertInference helper

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-03-07 23:38:42 -05:00
Mike Pellegrini
db03788b17
Add bit vector support to semantic text (#123187) 2025-03-07 16:00:48 -05:00
David Kilfoyle
e158cd868b
[Docs] Fix cross-repo links to Beats docs (#124360)
Co-authored-by: Colleen McGinnis <colleen.mcginnis@elastic.co>
2025-03-07 14:38:46 -05:00
Parker Timmins
d83176d32a
Set cause on create index request in create from action (#124363)
In the create-index-from-source action, we should set the cause of the create index request so that it is clear in the logs. Without setting the cause on the request, the default value `api` is used.
2025-03-07 13:14:12 -06:00
Svilen Mihaylov
ee4bcac1db
Added optional parameters to QSTR ES|QL function (#121787)
Adds options to QSTR function.

#118619 added named function parameters. This PR uses this mechanism for allowing query string function parameters, so query string parameters can be used in ES|QL.

Closes #120933
2025-03-07 13:00:22 -05:00
Tommaso Teofili
74bb0f9826
Do not let ShardBulkInferenceActionFilter unwrap / rewrap ESExceptions (#123890)
* do not let ShardBulkInferenceActionFilter unwrap / rewrap ESExceptions
2025-03-07 16:53:19 +01:00
Parker Timmins
10a8dcf0fb
Retry ILM async action after reindexing data stream (#124149)
When reindexing a data stream, the ILM metadata is copied from the index metadata of the source index to the destination index. But the ILM state of the new index can be stuck if the source index was in an AsyncAction at the time of reindexing. To un-stick the new index, we call TransportRetryAction to retry the AsyncAction. In the past this action would only run if the index were in the error phase. This change includes an update to TransportRetryAction, which allows it to be run when the index is not in an error phase, if the parameter requireError is set to false.
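The new `requireError` behavior reduces to a small guard. Retry is allowed when the index is in ILM's error step, or anywhere if the caller passes `requireError=false`, as the reindex path now does. This Python sketch models the error step as the string `"ERROR"` and uses a hypothetical `retry_allowed` function. It is not the TransportRetryAction code.

```python
ERROR_STEP = "ERROR"  # modeled name of ILM's error step


def retry_allowed(current_step: str, require_error: bool = True) -> bool:
    """Allow a retry from the error step, or from any step when require_error is False."""
    return current_step == ERROR_STEP or not require_error


# Default behavior: an index stuck mid async action cannot be retried...
print(retry_allowed("check-rollover-ready"))  # False
# ...but the reindex path can opt out of the error requirement to un-stick it.
print(retry_allowed("check-rollover-ready", require_error=False))  # True
```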
2025-03-06 12:39:45 -06:00
Niels Bauman
ff6465b83b
Avoid hoarding cluster state references during rollover (#124107)
By keeping a list of all the rollover results in a rollover request
batch, we were keeping references to all the intermediate cluster states
that we built. We've seen this list take up ~1.4GB with 600 rollover
requests in one batch.

We only kept the list of results to compute the "reason" for the
allocation reroute, so we can easily drop the cluster state reference
from the list and only keep what we need.

Fixes #123893
2025-03-06 18:34:57 +01:00
Benjamin Trent
a1ee3c9291
Have create index return a bad request on poor formatting (#123761)
closes: https://github.com/elastic/elasticsearch/issues/123661
2025-03-07 04:24:54 +11:00
Kostas Krikellas
296cae8a30
[DOCS] Document source-related restrictions (#124011)
* Document source-related restrictions

* Update mapping-source-field.md

* Update docs/reference/elasticsearch/mapping-reference/mapping-source-field.md

Co-authored-by: Marci W <333176+marciw@users.noreply.github.com>

* Update mapping-source-field.md

---------

Co-authored-by: Marci W <333176+marciw@users.noreply.github.com>
2025-03-06 11:38:09 -05:00
Colleen McGinnis
23be51a04f
[DOCS] fix external links (#124248) 2025-03-06 17:27:03 +01:00
Tim Grein
67af06905a
[Inference API] Fix output stream ordering in InferenceActionProxy (#124225) 2025-03-06 16:33:20 +01:00
Liam Thompson
7cc613b0e4
[DOCS] Update DOCS README.md backporting guidance (#124228) 2025-03-06 15:43:27 +01:00
Andrei Stefan
04c8bf4ba8
ESQL: Revive some more of inlinestats functionality (#123589) 2025-03-06 16:37:58 +02:00
Marci W
bea3af2467
[DOCS] Clarify support for doc_values (#124047)
* Update doc-values.md

* Make the note more visible

* fix link
2025-03-06 09:01:19 -05:00
Martijn van Groningen
ea8283e9c8
Avoid serializing empty _source fields in mappings. (#122606) 2025-03-06 12:20:07 +01:00
Francisco Fernández Castaño
387eef070c
Enhance memory accounting for document expansion and introduce max document size limit (#123543)
This commit improves memory accounting by incorporating document
expansion during shard bulk execution. Additionally, it introduces a new
limit on the maximum document size, which defaults to 5% of the
available heap.

This limit can be configured using the new setting:

`indexing_pressure.memory.max_operation_size`

These changes help prevent excessive memory consumption and
improve indexing stability.

Closes ES-10777
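To put the default in concrete terms, 5% of a 4 GiB heap is 4,294,967,296 × 0.05 ≈ 214,748,364 bytes (about 204.8 MiB). This Python sketch computes that default and checks an operation against it, with `configured_limit` standing in for an explicit `indexing_pressure.memory.max_operation_size`. Treating "available heap" as the maximum heap size is an assumption. The helpers are hypothetical, and Elasticsearch's actual rejection path is not modeled.

```python
from typing import Optional


def default_max_operation_size(heap_bytes: int) -> int:
    """Default limit described in the commit: 5% of the available heap."""
    return heap_bytes * 5 // 100


def accepts(size_bytes: int, heap_bytes: int, configured_limit: Optional[int] = None) -> bool:
    # configured_limit models an explicit indexing_pressure.memory.max_operation_size.
    if configured_limit is not None:
        limit = configured_limit
    else:
        limit = default_max_operation_size(heap_bytes)
    return size_bytes <= limit


heap = 4 * 1024**3  # 4 GiB
print(default_max_operation_size(heap))  # 214748364
```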
2025-03-06 11:26:49 +01:00
Nhat Nguyen
206363664c
Introduce allow_partial_results setting in ES|QL (#122890)
This change introduces a cluster setting 
`esql.query.allow_partial_results` that allows enabling or disabling
allow_partial_results in ES|QL at the cluster-wide level. Initially,
this setting defaults to false, but it will be switched to true soon. 
The reason for not changing the default in this PR is that it requires
adjusting many tests, which would make the PR too large. Instead, we
will adjust the tests incrementally and switch the default when the
tests are ready. This cluster setting is useful for falling back to the
previous behavior (i.e., disabling allow_partial_results) if users
upgrade to the new version and haven't updated their queries.

Also, the default setting can be overridden on a per-request basis via a 
URL parameter (allow_partial_results) (changed from request body to URL
parameter to conform to the proposal).

Relates #122802
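The precedence described above is simple: the `allow_partial_results` URL parameter, when present, wins over the cluster-wide `esql.query.allow_partial_results` setting, which currently defaults to false. This Python sketch accepts only the strings `"true"` and `"false"`, which is an assumption about how Elasticsearch parses the parameter. `allow_partial_results` here is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

CLUSTER_DEFAULT = False  # esql.query.allow_partial_results defaults to false for now


def allow_partial_results(cluster_setting: bool = CLUSTER_DEFAULT,
                          url_param: Optional[str] = None) -> bool:
    """A per-request URL parameter overrides the cluster-wide setting."""
    if url_param is None:
        return cluster_setting
    if url_param == "true":
        return True
    if url_param == "false":
        return False
    raise ValueError(f"expected 'true' or 'false', got {url_param!r}")


# Opting a single request in while the cluster default stays false:
print("/_query?" + urlencode({"allow_partial_results": "true"}))  # /_query?allow_partial_results=true
```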
2025-03-05 13:48:20 -08:00