elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-04-21 05:37:23 -04:00

Author	SHA1	Message	Date
Benjamin Trent	b2c1c4e0f0	New `vector_rescore` parameter as a quantized index type option (#124581 ) This adds a new parameter to the quantized index mapping that allows default oversampling and rescoring to occur. This doesn't adjust any of the defaults; it only allows them to be configured. When the user provides `rescore_vector: {oversample: <number>}` in the query, that value overrides the mapping-level setting. For example, here is how to use it with bbq: ``` PUT rescored_bbq { "mappings": { "properties": { "vector": { "type": "dense_vector", "index_options": { "type": "bbq_hnsw", "rescore_vector": {"oversample": 3.0} } } } } } ``` Then, when querying, it will auto oversample the `k` by `3x` and rerank with the raw vectors. ``` POST _search { "knn": { "query_vector": [...], "field": "vector" } } ```	2025-03-14 00:40:08 +11:00
Craig Taverner	d5ddb909a4	ESQL autogenerate docs v3 (#124312 ) Building on the work started in https://github.com/elastic/elasticsearch/pull/123904, we now want to auto-generate most of the small subfiles from the ES|QL functions unit tests. This work also investigates any remaining discrepancies between the original asciidoc version and the new markdown, and tries to minimize differences so the docs do not look too different. The Kibana json and markdown files are moved to a new location, and the operator docs are a little more generated than before (although still largely manual).	2025-03-13 14:16:46 +01:00
Slobodan Adamović	cac356ae64	Disable queryable built-in feature in docs YAML tests (#124684 ) The .security index is created asynchronously on a cluster startup. This affects some of the docs YAML tests in a way that they need to account for the existence of the .security index or wait for the index to be created and green. This PR disables the feature for docs YAML tests. Disabling the feature in docs YAML tests will solve the flakiness without affecting the coverage. Resolves https://github.com/elastic/elasticsearch/issues/122343 Resolves https://github.com/elastic/elasticsearch/issues/121748 Resolves https://github.com/elastic/elasticsearch/issues/121611 Resolves https://github.com/elastic/elasticsearch/issues/121345 Resolves https://github.com/elastic/elasticsearch/issues/121338 Resolves https://github.com/elastic/elasticsearch/issues/121337 Resolves https://github.com/elastic/elasticsearch/issues/121288 Resolves https://github.com/elastic/elasticsearch/issues/121287 Resolves https://github.com/elastic/elasticsearch/issues/121867 Resolves https://github.com/elastic/elasticsearch/issues/122335 Resolves https://github.com/elastic/elasticsearch/issues/122681 Resolves https://github.com/elastic/elasticsearch/issues/121976 Resolves https://github.com/elastic/elasticsearch/issues/123094 Resolves https://github.com/elastic/elasticsearch/issues/123192 Resolves https://github.com/elastic/elasticsearch/issues/122983 Resolves https://github.com/elastic/elasticsearch/issues/124671 Resolves https://github.com/elastic/elasticsearch/issues/124103	2025-03-13 23:13:45 +11:00
Jan Kuipers	a503497bce	Add max.chunks to EmbeddingRequestChunker to prevent OOM (#123150 ) * add max number of chunks * wire merge function * implement sparse merge function * move tests to correct package/file * float merge function * bytes merge function * more accurate byte average * spotless * Fix/improve EmbeddingRequestChunkerTests * Remove TODO * remove unnecessary field * remove Chunk generic * add TODO * Remove specialized chunks * add comment * Update docs/changelog/123150.yaml * update changelog	2025-03-13 11:38:12 +01:00
Martijn van Groningen	ce3a778fa1	Improve downsample performance by buffering docids and do bulk processing. (#124477 )	2025-03-13 07:46:08 +01:00
Andrei Stefan	c48f9a9e1c	ESQL: Change the order of the optimization rules (#124335 )	2025-03-13 07:45:37 +02:00
Nick Tindall	74d61a4052	Retry when the server can't be resolved (#123852 )	2025-03-13 12:38:04 +11:00
Joe Gallo	d565304f4b	Fix geoip databases index access after system feature migration (take 3) (#124604 )	2025-03-12 14:03:57 -04:00
Tommaso Teofili	c971d79a95	Let MLTQuery throw IAE when no analyzer is set (#124662 ) * Let MLTQuery throw IAE when no analyzer is set	2025-03-12 18:37:31 +01:00
Charlotte Hoblik	9e754ec8f6	[DOCS] Plugin management reference cleanup (#124578 ) * add content to plugin management * add content to Plugin Management * Update docs/reference/elasticsearch-plugins/plugin-management.md Co-authored-by: florent-leborgne <florent.leborgne@elastic.co> * fix applies-to tag * add ech to docset.yml --------- Co-authored-by: florent-leborgne <florent.leborgne@elastic.co>	2025-03-12 17:01:10 +01:00
Valeriy Khakhutskyy	44fba7213d	[ML] Provide model_size_stats as soon as an anomaly detection job is opened (#124638 ) Fixes #121168	2025-03-12 16:57:58 +01:00
Pat Whelan	9f89a3b318	[ML] Integrate with DeepSeek API (#122218 ) Integrating for Chat Completion and Completion task types, both calling the chat completion API for DeepSeek.	2025-03-12 15:24:39 +01:00
Nik Everett	50aaa1c2a6	ESQL: Pragma to load from stored fields (#122891 ) This creates a `pragma` you can use to request that fields load from a stored field rather than doc values. It implements that pragma for `keyword` and number fields. We expect that, for some disk configurations and some number of fields, it's faster to load those fields from _source or stored fields than it is to use doc values. Our default is doc values and on my laptop it's always faster to use doc values. But we don't ship my laptop to every cluster. This will let us experiment and debug slow queries by trying to load fields a different way. You access this pragma with: ``` curl -HContent-Type:application/json -XPOST localhost:9200/_query?pretty -d '{ "query": "FROM foo", "pragma": { "field_extract_preference": "STORED" } }' ``` On a release build you'll need to add `"accept_pragma_risks": true`.	2025-03-12 09:40:42 -04:00
Mridula	f6538e86e2	Prevent Query Rule Creation with Invalid Numeric Match Criteria (#122823 ) * SEARCH-802 - bug fixed - Query rules allows for creation of rules with invalid match criteria * [CI] Auto commit changes from spotless * Worked on the comments given in the PR * [CI] Auto commit changes from spotless * Fixed Integration tests * [CI] Auto commit changes from spotless * Made changes from the PR * Update docs/changelog/122823.yaml * [CI] Auto commit changes from spotless * Fixed the duplicate code issue in queryRuleTests * Refactored code to clean it up based on PR comments * [CI] Auto commit changes from spotless * Logger statements were removed * Cleaned up the QueryRule tests * [CI] Auto commit changes from spotless * Update x-pack/plugin/ent-search/src/test/java/org/elasticsearch/xpack/application/EnterpriseSearchModuleTestUtils.java Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co> * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co> Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co> Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>	2025-03-12 13:56:13 +01:00
Tim Grein	0b83425d17	[Inference API] Propagate product use case http header to EIS (#124025 )	2025-03-12 12:48:24 +01:00
Mikhail Berezovskiy	053b037a9b	GCS blob store: add OperationPurpose/Operation stats counters (#122991 )	2025-03-11 17:57:15 -07:00
kanoshiou	deff3df9f0	ES|QL: Support `::date` in inline cast (#123460 ) * Inline cast to date * Update docs/changelog/123460.yaml * New capability for `::date` casting * More tests * Update tests --------- Co-authored-by: Fang Xing <155562079+fang-xing-esql@users.noreply.github.com>	2025-03-11 17:08:10 -04:00
Iván Cea Fontenla	2fff041077	ESQL: Push down StartsWith and EndsWith functions to Lucene (#123381 ) Fixes https://github.com/elastic/elasticsearch/issues/123067 Just like WildcardLike and RLike, some functions can be converted to Lucene queries. Here it's those two, which are nearly identical to WildcardLike. This, like some other functions, needs a FoldContext. I'm using the static method for this here, but it's fixed in https://github.com/elastic/elasticsearch/pull/123398, which I kept separate as it changes many files.	2025-03-11 19:14:05 +01:00
Simon Cooper	d8e889acb6	Restore TextSimilarityRankBuilder XContent output (#124564 )	2025-03-11 16:03:04 +00:00
Mark Tozzi	3e949479d8	ESQL - Include thread names in profile output (#124262 ) Resolves #123053 This adds the thread name to the driver sleep profile output. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>	2025-03-11 15:53:22 +01:00
Carlos Delgado	2b40e73fe9	ES|QL - Add scoring for full text functions disjunctions (#121793 )	2025-03-11 15:29:15 +01:00
Johannes Fredén	e11d89d76b	Bump nimbus-jose-jwt to 10.0.2 (#124544 ) This bumps nimbus-jose-jwt from 10.0.1 -> 10.0.2	2025-03-12 00:23:33 +11:00
Jan Calanog	435d1db5b9	Remove subs attribute (#124551 )	2025-03-11 12:14:58 +01:00
David Kyle	444b8eab75	[ML] Avoid potentially throwing calls to Task#getDescription in model download	2025-03-11 09:48:07 +00:00
Ioana Tagirta	cda82554aa	ES|QL: Add initial grammar and planning for RRF (snapshot) (#123396 )	2025-03-11 10:18:11 +01:00
Costin Leau	2761af000b	ESQL: Lazy collection copying during node transform (#124424 ) * ESQL: Lazy collection copying during node transform A set of optimizations for tree traversal: 1. perform lazy copying during children transform 2. use long hashing to avoid object creation 3. perform type check first before collection checking Relates #124395	2025-03-10 16:11:47 -07:00
Luca Cavanna	def4c890bc	Fix concurrency issue in ScriptSortBuilder (#123757 ) Inter-segment concurrency is disabled whenever sort by field, including script sorting, is used in a search request. The reason why sort by field does not use concurrency is that there are some performance implications, given that the hit queue in Lucene is built per slice and the different search threads don't share information about the documents they have already visited etc. The reason why script sort has concurrency disabled is that the script sorting implementation is not thread safe. This commit addresses this concurrency issue and re-enables search concurrency for search requests that use script sorting. In addition, missing tests are added to cover sort scripts that rely on _score being available and top_hits aggregation with a scripted sort clause.	2025-03-10 21:10:53 +01:00
Nhat Nguyen	79a1626160	Speed up block serialization (#124394 ) Currently, we use NamedWriteable for serializing blocks. While convenient, it incurs a noticeable performance penalty when pages contain thousands of blocks. Since block types are small and already centered in ElementType, we can safely switch from NamedWriteable to typed code. For example, the NamedWriteable alone of a small page with 10K fields would be 180KB, whereas the new method reduces it to 10KB. Below are the serialization improvements with FROM idx | LIMIT 10000 where the target index has 10K fields: - write_exchange_response executed 173 times took: 73.2ms -> 26.7ms - read_exchange_response executed 173 times took: 49.4ms -> 25.8ms	2025-03-10 11:54:38 -07:00
Martijn van Groningen	6afd3ecc58	Avoid reading unnecessary dimension values when downsampling (#124451 ) Read dimension values once per tsid/bucket docid range instead of for each document being processed. The dimension value within a bucket-interval docid range is always the same and this avoids unnecessary reads. Latency of downsampling the tsdb track index into a 1 hour interval downsample index drops by ~16% (running on my local machine).	2025-03-10 12:12:42 +01:00
Mark Hopkin	a5f186bb5d	Give Kibana user 'all' permissions for .entity_analytics.* indices (#123588 )	2025-03-10 11:57:12 +01:00
Charlotte Hoblik	e51b50139b	Fix external URI images (#124350 )	2025-03-10 11:31:47 +01:00
Niels Bauman	9cecc89fed	Run `TransportExplainLifecycleAction` on local node (#122885 ) This action solely needs the cluster state, it can run on any node. Additionally, it needs to be cancellable to avoid doing unnecessary work after a client failure or timeout. Relates #101805	2025-03-10 09:43:13 +01:00
Samiul Monir	f0d5220178	Handle empty input inference (#123763 ) * Added check for blank string to skip generating embeddings with unit test * Adding yaml tests for skipping embedding generation * dynamic update not required if model_settings stays null * Updating node feature for handling empty input name and description * Update yaml tests with refresh=true * Update unit test to follow more accurate behavior * Added yaml tests for multi chunks * [CI] Auto commit changes from spotless * Adding highlighter yaml tests for empty input * Update docs/changelog/123763.yaml * Update changelog and test reason to have more polished documentation * adding input value into the response source and fixing unit tests by reformatting * Adding highlighter test for backward compatibility and refactor existing test * Added bwc tests for empty input and multi chunks * Removed reindex for empty input from bwc * [CI] Auto commit changes from spotless * Fixing yaml test * Update unit tests helper function to support both format * [CI] Auto commit changes from spotless * Adding cluster features for bwc * Centralize logic for assertInference helper --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>	2025-03-07 23:38:42 -05:00
Mike Pellegrini	db03788b17	Add bit vector support to semantic text (#123187 )	2025-03-07 16:00:48 -05:00
David Kilfoyle	e158cd868b	[Docs] Fix cross-repo links to Beats docs (#124360 ) Co-authored-by: Colleen McGinnis <colleen.mcginnis@elastic.co>	2025-03-07 14:38:46 -05:00
Parker Timmins	d83176d32a	Set cause on create index request in create from action (#124363 ) In the create-index-from-source action, we should set the cause of the create index request so that it is clear in the logs. Without setting the cause on the request, the default value of api is used.	2025-03-07 13:14:12 -06:00
Svilen Mihaylov	ee4bcac1db	Added optional parameters to QSTR ES|QL function (#121787 ) Adds options to QSTR function. #118619 added named function parameters. This PR uses this mechanism for allowing query string function parameters, so query string parameters can be used in ES|QL. Closes #120933	2025-03-07 13:00:22 -05:00
Tommaso Teofili	74bb0f9826	Do not let ShardBulkInferenceActionFilter unwrap / rewrap ESExceptions (#123890 ) * do not let ShardBulkInferenceActionFilter unwrap / rewrap ESExceptions	2025-03-07 16:53:19 +01:00
Parker Timmins	10a8dcf0fb	Retry ILM async action after reindexing data stream (#124149 ) When reindexing a data stream, the ILM metadata is copied from the index metadata of the source index to the destination index. But the ILM state of the new index can be stuck if the source index was in an AsyncAction at the time of reindexing. To un-stick the new index, we call TransportRetryAction to retry the AsyncAction. In the past this action would only run if the index were in the error phase. This change includes an update to TransportRetryAction, which allows it to be run when the index is not in an error phase, if the parameter requireError is set to false.	2025-03-06 12:39:45 -06:00
Niels Bauman	ff6465b83b	Avoid hoarding cluster state references during rollover (#124107 ) By keeping a list of all the rollover results in a rollover request batch, we were keeping references to all the intermediate cluster states that we built. We've seen this list take up ~1.4GB with 600 rollover requests in one batch. We only kept the list of results to compute the "reason" for the allocation reroute, so we can easily drop the cluster state reference from the list and only keep what we need. Fixes #123893	2025-03-06 18:34:57 +01:00
Benjamin Trent	a1ee3c9291	Have create index return a bad request on poor formatting (#123761 ) closes: https://github.com/elastic/elasticsearch/issues/123661	2025-03-07 04:24:54 +11:00
Kostas Krikellas	296cae8a30	[DOCS] Document source-related restrictions (#124011 ) * Document source-related restrictions * Update mapping-source-field.md * Update docs/reference/elasticsearch/mapping-reference/mapping-source-field.md Co-authored-by: Marci W <333176+marciw@users.noreply.github.com> * Update mapping-source-field.md --------- Co-authored-by: Marci W <333176+marciw@users.noreply.github.com>	2025-03-06 11:38:09 -05:00
Colleen McGinnis	23be51a04f	[DOCS] fix external links (#124248 )	2025-03-06 17:27:03 +01:00
Tim Grein	67af06905a	[Inference API] Fix output stream ordering in InferenceActionProxy (#124225 )	2025-03-06 16:33:20 +01:00
Liam Thompson	7cc613b0e4	[DOCS] Update DOCS README.md backporting guidance (#124228 )	2025-03-06 15:43:27 +01:00
Andrei Stefan	04c8bf4ba8	ESQL: Revive some more of inlinestats functionality (#123589 )	2025-03-06 16:37:58 +02:00
Marci W	bea3af2467	[DOCS] Clarify support for doc_values (#124047 ) * Update doc-values.md * Make the note more visible * fix link	2025-03-06 09:01:19 -05:00
Martijn van Groningen	ea8283e9c8	Avoid serializing empty _source fields in mappings. (#122606 )	2025-03-06 12:20:07 +01:00
Francisco Fernández Castaño	387eef070c	Enhance memory accounting for document expansion and introduce max document size limit (#123543 ) This commit improves memory accounting by incorporating document expansion during shard bulk execution. Additionally, it introduces a new limit on the maximum document size, which defaults to 5% of the available heap. This limit can be configured using the new setting: indexing_pressure.memory.max_operation_size These changes help prevent excessive memory consumption and improve indexing stability. Closes ES-10777	2025-03-06 11:26:49 +01:00
Nhat Nguyen	206363664c	Introduce allow_partial_results setting in ES|QL (#122890 ) This change introduces a cluster setting `esql.query.allow_partial_results` that allows enabling or disabling allow_partial_results in ES|QL at the cluster-wide level. Initially, this setting defaults to false, but it will be switched to true soon. The reason for not changing the default in this PR is that it requires adjusting many tests, which would make the PR too large. Instead, we will adjust the tests incrementally and switch the default when the tests are ready. This cluster setting is useful for falling back to the previous behavior (i.e., disabling allow_partial_results) if users upgrade to the new version and haven't updated their queries. Also, the default setting can be overridden on a per-request basis via a URL parameter (allow_partial_results) (changed from request body to URL parameter to conform to the proposal). Relates #122802	2025-03-05 13:48:20 -08:00
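
The precedence described in commit 206363664c (a cluster-wide default for `allow_partial_results` that a per-request URL parameter may override) can be sketched as below. This is only an illustration of the behaviour stated in the commit message, not Elasticsearch code: the function name and its arguments are hypothetical.

```python
from typing import Optional


def effective_allow_partial_results(
    cluster_default: bool = False,
    request_param: Optional[bool] = None,
) -> bool:
    """Hypothetical helper: resolve the value a query would run with.

    cluster_default stands in for the cluster setting
    `esql.query.allow_partial_results` (false initially, per the commit).
    request_param stands in for the `allow_partial_results` URL parameter;
    None means the request did not specify it.
    """
    if request_param is not None:
        # A per-request value takes precedence over the cluster-wide default.
        return request_param
    return cluster_default


if __name__ == "__main__":
    print(effective_allow_partial_results())                     # no override
    print(effective_allow_partial_results(request_param=True))   # request opts in
    print(effective_allow_partial_results(cluster_default=True,
                                          request_param=False))  # request opts out
```

Keeping "not specified" (`None`) distinct from an explicit `False` is what lets a request opt out even after the cluster default is switched to true.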

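As a rough sketch of the oversampling described in commit b2c1c4e0f0: a query-level `oversample` value, when present, replaces the mapping-level one, and `k` is scaled by it before the candidates are reranked with the raw vectors. The function names, the behaviour when neither value is set, and the round-up of fractional counts are assumptions made for illustration; the commit message does not specify them.

```python
import math
from typing import Optional


def resolve_oversample(
    mapping_oversample: Optional[float],
    query_oversample: Optional[float],
) -> Optional[float]:
    """Hypothetical helper: the query-level value wins over the mapping-level one."""
    if query_oversample is not None:
        return query_oversample
    return mapping_oversample


def candidate_count(k: int, oversample: Optional[float]) -> int:
    """Hypothetical helper: how many candidates to gather before reranking.

    Assumption: with no oversample configured, this sketch just keeps k.
    Assumption: fractional results are rounded up.
    """
    if oversample is None:
        return k
    return max(k, math.ceil(k * oversample))


if __name__ == "__main__":
    # Mapping sets oversample to 3.0 and the query does not override it.
    factor = resolve_oversample(mapping_oversample=3.0, query_oversample=None)
    print(candidate_count(10, factor))
```

With the mapping from the commit message (`"oversample": 3.0`) and `k = 10`, this sketch gathers 30 candidates and would then keep the top 10 after reranking.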

18047 commits