The test as set up assumes a single shard. Since the test uses so
few vectors with so few dimensions, the scoring statistics are quite
sensitive. CCS tests seem to allow more than one write shard (via more
than one cluster), so the detected similarity can vary wildly.
However, through empirical testing I found that the desired vector
consistently scores > 0.0034 while all the other vectors score < 0.001.
This commit adjusts the similarity threshold accordingly, which should
eliminate the test flakiness in CCS testing.
closes: https://github.com/elastic/elasticsearch/issues/109881
Wholesale fix of every `TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT` in
`o.e.snapshots` and `o.e.repositories`, just pulling them up to the REST
layer (where they become API params), the test suite (where they become
`TEST_REQUEST_TIMEOUT`), or some other place where an explicit value is
available.
Relates #107984
The cluster-level dense vector stats return the total number of dense vector values globally, including replicas.
This commit fixes the total to only include the value count of the primary shards.
This aligns with the docs stats, which also report the number of primary documents when used in cluster stats.
The indices stats API still reports granular results for replicas and primaries, so no information is lost.
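The primary-only aggregation described above can be sketched as follows. This is an illustrative model, not the actual Elasticsearch classes: `ShardStats` and its fields are hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass
class ShardStats:
    primary: bool
    dense_vector_value_count: int

def cluster_dense_vector_count(shards: list[ShardStats]) -> int:
    # Before the fix, replicas were counted too, inflating the global
    # total; summing only primaries matches the docs stats behaviour.
    return sum(s.dense_vector_value_count for s in shards if s.primary)

shards = [
    ShardStats(primary=True, dense_vector_value_count=100),
    ShardStats(primary=False, dense_vector_value_count=100),  # replica copy
    ShardStats(primary=True, dense_vector_value_count=50),
]
print(cluster_dense_vector_count(shards))  # 150, not 250
```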
This indicator depends on `HealthMetadata` being present in
the cluster state, which we cannot guarantee in this test,
potentially resulting in an `unknown` status.
This change adds a synthetic source mode for nested fields that recursively loads nested objects from stored fields and doc values.
The order of the sub-objects is preserved since they are indexed in separate Lucene documents.
This change also introduces the `store_array_source` option for nested fields. This option is disabled by default when synthetic source is used, but users can opt in to this behaviour.
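A minimal sketch of what such a mapping could look like, expressed as the JSON body one would send when creating an index. The field names (`comments`, `author`) are examples, and the exact placement of `store_array_source` is as described above, not verified against the final API:

```python
import json

# Hypothetical index mapping: a nested field opting in to
# `store_array_source` under synthetic _source.
mapping = {
    "mappings": {
        "_source": {"mode": "synthetic"},
        "properties": {
            "comments": {
                "type": "nested",
                # Disabled by default under synthetic source; opt in to
                # keep the original array source for this nested field.
                "store_array_source": True,
                "properties": {"author": {"type": "keyword"}},
            }
        },
    }
}
print(json.dumps(mapping, indent=2))
```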
* Add SparseVectorStats
* Update to use mappings in engine
* Update to be unique to primary shards
* Fix doc
* Fix null error in test
* Cleanup
* fix yaml
* remove comment
* add version to yaml
* Revert whitespace changes to stats doc
* fix yml test
* Checkstyle
* Fix NPE in test
* Update docs/changelog/108793.yaml
* Add link to sparse_vector field type in docs
* PR feedback
* Flesh out test a bit more
* PR feedback - alphabetize placement in docs
* Fix doc change
This adds a new quantization mechanism for HNSW and flat indices. Here
we add `int4` quantization via the `int4_hnsw` and `int4_flat` index
types. This quantization methodology further reduces the memory required
for fast HNSW: the memory required is 8x smaller than with regular
float32 values.
An 8x reduction means that 1M 1024-dimension vectors go from requiring
3.8GB to 477MB.
Recall stays steady; there is some reduction, but it is recoverable by
slightly oversampling and reranking. For example, over 500k CohereV3
vectors, only 5 extra vectors need to be gathered to achieve over 0.98
recall in a brute-force scenario.
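A quick back-of-the-envelope check of the memory figures above: float32 uses 4 bytes per dimension, while int4 packs two dimensions per byte. This counts raw vector payload only and ignores the per-vector correction terms and graph structure the real index also stores.

```python
n_vectors, dims = 1_000_000, 1024

float32_bytes = n_vectors * dims * 4   # 4 bytes per dimension
int4_bytes = n_vectors * dims // 2     # 4 bits per dimension

print(float32_bytes / 2**30)  # ~3.8 (GiB)
print(int4_bytes / 2**20)     # ~488 (MiB raw payload; close to the quoted 477MB)
print(float32_bytes // int4_bytes)  # 8
```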

Here we introduce a `cluster.logsdb.enabled` setting that controls activation of the new `logs` index mode in `logs@settings`. The setting defaults to `false`, preventing use of the new index mode in `logs@settings` by default. We also change `hostname` to `host.name` as the default field used for sorting (other than `@timestamp`) and include it in `logs@mappings`.
Since we are only indexing 3 docs, we need to ensure it's a single shard for score repeatability.
Additionally, this adds back all the flushes that were removed, to ensure we exercise the merging paths.
* Add priority to the query rule index, and merge rule updates into existing rulesets by priority
* Don't require double specification of rule_id
* Initial addition of get and delete API calls
* Add tests
* Update docs/changelog/109554.yaml
* D'oh! Removed commented out code
* Add test
* Update URI for requests and add test
* Ensure URIs are consistent for individual query rule API calls and update constant names to be more explicit that they are rules within a ruleset
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
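The "merge rule updates into existing rulesets by priority" step above can be sketched like this. The function name and rule shape are hypothetical illustrations, not the actual query rules implementation:

```python
def upsert_rule(ruleset: list[dict], rule: dict) -> list[dict]:
    # Replace any existing rule with the same rule_id (so the rule_id
    # does not need to be specified twice), then keep the ruleset
    # ordered by ascending priority.
    merged = [r for r in ruleset if r["rule_id"] != rule["rule_id"]]
    merged.append(rule)
    merged.sort(key=lambda r: r["priority"])
    return merged

rules = [{"rule_id": "a", "priority": 1}, {"rule_id": "b", "priority": 3}]
rules = upsert_rule(rules, {"rule_id": "c", "priority": 2})
print([r["rule_id"] for r in rules])  # ['a', 'c', 'b']
```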
Introduces new cluster settings that allow only a certain set of scripts in scripted metrics aggregations:
- `search.aggs.only_allowed_metric_scripts`, defaults to `false`
- `search.aggs.allowed_inline_metric_scripts`, defaults to an empty list
- `search.aggs.allowed_stored_metric_scripts`, defaults to an empty list
* Add dry run and force to json spec
* Rewording
Co-authored-by: Tim Grein <tim.grein@elastic.co>
---------
Co-authored-by: Tim Grein <tim.grein@elastic.co>
Updated LuceneDocument to take advantage of looking up feature values on existing features, selecting the max value when parsing multi-value sparse vectors.
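The max-value semantics described above can be sketched as follows: when the same feature name appears more than once while parsing, keep the maximum weight rather than the last one seen. The function name is illustrative:

```python
def merge_sparse_features(pairs: list[tuple[str, float]]) -> dict[str, float]:
    features: dict[str, float] = {}
    for name, value in pairs:
        existing = features.get(name)
        # Select the max for duplicate feature names instead of overwriting.
        features[name] = value if existing is None else max(existing, value)
    return features

print(merge_sparse_features([("impact", 1.5), ("impact", 3.0), ("hype", 0.2)]))
# {'impact': 3.0, 'hype': 0.2}
```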
After running the elastic/logs track with logs index mode enabled, I noticed that _source was still getting stored.
The issue was that index modes other than time_series weren't propagated to the IndexMetadata and IndexSettings classes. Additionally, the synthetic source defaults in SourceFieldMapper were geared towards the time series index mode only. This change addresses both.
This PR introduces a new index mode, `logs`, which enables usage of LogsDB in Elasticsearch.
As a result of adopting the `logs` index mode, default index sorting is applied using the hostname
and @timestamp fields. Users may still override the index sort settings.
By default, it also uses synthetic source and the same codecs used by TSDB.
Note: the logs index mode is a Tech Preview feature.
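For illustration, the index settings an opted-in index could carry, expressed as the JSON body of a create-index request. Setting names follow the description above (`index.mode`, sorting on `host.name` then `@timestamp`) but may differ from the final API, so treat this as a sketch:

```python
import json

# Hypothetical settings for an index using the new logs mode.
settings = {
    "settings": {
        "index": {
            "mode": "logs",
            # Default sort applied by the logs mode, per the description:
            "sort.field": ["host.name", "@timestamp"],
        }
    }
}
print(json.dumps(settings, indent=2))
```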