Commit graph

3650 commits

Author SHA1 Message Date
Quentin Pradet
5a2c7841b9
Fix trailing slash in security.put_privileges specification (#110177) 2024-06-26 12:28:47 +04:00
Benjamin Trent
759f61d90f
Fix vector similarity test failures for half-byte (#110065)
The test that is set up assumes a single shard. Since the test uses so
few vectors and few dimensions, the statistics are pretty sensitive. 

CCS tests seem to allow more than one write shard (via more than one
cluster). Consequently, the similarity detected can vary pretty wildly.
However, through empirical testing, I found that the desired vector
seems to always have a score > 0.0034 and all the other vectors have a
score < 0.001. This commit adjusts this similarity threshold
accordingly. This should make the test flakiness go away in CCS testing.

closes: https://github.com/elastic/elasticsearch/issues/109881
2024-06-26 04:16:55 +10:00
Quentin Pradet
af8c35986f
Fix trailing slash in ml.get_categories specification (#110146) 2024-06-25 22:00:31 +04:00
Oleksandr Kolomiiets
653b99a76b
Opt in keyword field into fallback synthetic source when doc values are disabled (#110016) 2024-06-25 09:34:58 -07:00
Kathleen DeRusso
41a61b069b
Mark Query Rules as GA (#110004)
* Mark query rules APIs as stable

* Remove preview label from docs

* Update docs/changelog/110004.yaml
2024-06-21 15:26:51 -04:00
Carlos Delgado
d332ed7d16
Enforce synonyms limit on APIs (#109981) 2024-06-21 18:16:16 +02:00
Kostas Krikellas
9c603a38f0
Apply FLS to the contents of IgnoredSourceFieldMapper (#109931)
* Apply FLS to the contents of IgnoredSourceFieldMapper

* Update docs/changelog/109931.yaml

* minor refactor

* minor refactor

* spotless fix

* more tests

* check unfiltered map

* add comments

* add comments

* add unittest

* update test
2024-06-21 08:19:25 +03:00
David Turner
5662f988b2
Remove trappy timeouts in snapshot APIs (#109828)
Wholesale fix of every `TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT` in
`o.e.snapshots` and `o.e.repositories`, just pulling them up to the REST
layer (where they become API params), the test suite (where they become
`TEST_REQUEST_TIMEOUT`), or some other place where an explicit value is
available.

Relates #107984
2024-06-21 07:11:12 +10:00
Jim Ferenczi
a6470fb86d
Fix cluster level dense vector stats (#107962)
The cluster-level dense vector stats return the total number of dense vector indices globally, including the replicas.
This commit fixes the total to include only the value count of the primary indices.
This change aligns with the docs stats, which also report the number of primary documents when used in cluster stats.
The indices stats API still reports granular results for replicas and primaries, so the information is not lost.
2024-06-18 17:45:02 +01:00
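To illustrate the counting rule described in #107962, here is a minimal sketch that sums a per-shard value count over primary shards only. The `ShardStats` type, its field names and the sample numbers are hypothetical and are not taken from the Elasticsearch code base.

```python
from dataclasses import dataclass


@dataclass
class ShardStats:
    """Hypothetical per-shard record, used only for this illustration."""
    index: str
    primary: bool
    dense_vector_count: int


def cluster_dense_vector_count(shards):
    """Sum the value count over primary shards, skipping replicas."""
    return sum(s.dense_vector_count for s in shards if s.primary)


# One primary and two replicas that each hold the same 100 vectors.
shards = [
    ShardStats("vectors", True, 100),
    ShardStats("vectors", False, 100),
    ShardStats("vectors", False, 100),
]
total = cluster_dense_vector_count(shards)
print(total)
```

Counting replicas as well would report 300 for this data, which is the inflated total the commit message describes.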
Niels Bauman
55431dd07b
Fix shards_capacity indicator assertion in Health API YAML test (#109808)
This indicator is dependent on `HealthMetadata` being present in
the cluster state, which we can't guarantee in this test,
potentially resulting in an `unknown` status.
2024-06-18 15:29:49 +02:00
Jim Ferenczi
c2ca504c1b
Add an explicit synthetic mode for nested fields (#109809)
This change adds a synthetic mode for nested fields that recursively loads nested objects from stored fields and doc values.
The order of the sub-objects is preserved since they are indexed in separate Lucene documents.
This change also introduces the `store_array_source` mode in the nested field options. This option is disabled by default when synthetic is used, but users can opt in to this behaviour.
2024-06-18 08:31:39 +01:00
Kathleen DeRusso
8529bf71f6
Add SparseVectorStats (#108793)
* Add SparseVectorStats

* Update to use mappings in engine

* Update to be unique to primary shards

* Fix doc

* Fix null error in test

* Cleanup

* fix yaml

* remove comment

* add version to yaml

* Revert whitespace changes to stats doc

* fix yml test

* Checkstyle

* Fix NPE in test

* Update docs/changelog/108793.yaml

* Add link to sparse_vector field type in docs

* PR feedback

* Flesh out test a bit more

* PR feedback - alphabetize placement in docs

* Fix doc change
2024-06-17 11:42:14 -04:00
Benjamin Trent
3aed0afb2b
Add new int4 quantization to dense_vector (#109317)
This adds a new quantization mechanism for HNSW and flat indices. Here
we add `int4` quantization via the `int4_hnsw` and `int4_flat` index
types. This quantization methodology further reduces the memory required
for fast HNSW, meaning that the memory required is 8x smaller than with
regular float32 values. 

An 8x reduction means that 1M 1024-dimension vectors go from requiring
3.8GB to 477MB.

Recall stays steady; the small reduction that does occur is
recoverable by slightly oversampling and reranking. For example, over
500k CohereV3 vectors, only 5 extra vectors need to be gathered
to achieve over 0.98 recall in a brute-force scenario.

![recall](b47a79d0-020d-4baa-8199-41a932df00f7)
2024-06-18 00:15:43 +10:00
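The memory figures quoted in #109317 can be checked with a little arithmetic. The sketch below counts raw vector storage only. It assumes 32 bits per dimension for float32 and 4 bits per dimension for int4, and it ignores HNSW graph overhead and any per-vector metadata. The helper name `vector_bytes` is made up for this illustration.

```python
GIB = 2**30  # bytes per gibibyte


def vector_bytes(num_vectors, dims, bits_per_dim):
    """Raw storage for the vector values alone, in bytes."""
    return num_vectors * dims * bits_per_dim // 8


float32_bytes = vector_bytes(1_000_000, 1024, 32)
int4_bytes = vector_bytes(1_000_000, 1024, 4)

print(f"float32: {float32_bytes / GIB:.2f} GiB")
print(f"int4:    {int4_bytes / GIB:.3f} GiB")
print(f"ratio:   {float32_bytes // int4_bytes}x")
```

Under these assumptions float32 needs about 3.8 GiB and int4 about 0.477 GiB, so the message's "477MB" appears to be 3.8GB divided by 8, expressed in thousandths of a GiB.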
Efe Gürkan YALAMAN
8e878c4cc8
[Connector API] Add claim sync job endpoint (#109480) 2024-06-17 13:16:01 +02:00
Salvatore Campagna
0ebe811a66
Introduce a setting controlling the activation of the logs index mode in logs@settings (#109025)
Here we introduce a `cluster.logsdb.enabled` setting that controls activation of the new `logs` index mode in `logs@settings`. The setting's default value is `false`, which prevents usage of the new index mode by default in `logs@settings`. We also change `hostname` to `host.name` as the default field used for sorting (other than `@timestamp`) and include it in `logs@mappings`.
2024-06-17 12:19:42 +02:00
Nick Tindall
cd8b1f9dc9
Add wait_for_completion parameter to delete snapshot request (#109462)
Closes #101300
2024-06-15 12:27:35 +10:00
Alexander Reelsen
4de67ad7f0
DocsStats: Add human readable bytesize (#109720)
This adds support for the `human` parameter for DocsStats, as it was
missing. Sample:

```
GET _cluster/stats?human&filter_path=indices.docs
```
2024-06-15 08:20:04 +10:00
Benjamin Trent
6095e694c9
Ensure 41_knn_search_byte_quantized is on a single shard for repeatability (#109696)
Since we are only indexing 3 docs, we need to ensure it's a single shard for score repeatability.

Additionally, this adds back all the flushes that were removed, to ensure we exercise the merging paths.
2024-06-13 15:46:22 -04:00
Benjamin Trent
ce78cdee5e
Remove unnecessary flush & refresh for 41_knn_search_byte_quantized (#109642) 2024-06-13 04:18:40 +10:00
Benjamin Trent
8cd65a4a45
Add replicas, we are unnecessarily setting it to 0 (#109640) 2024-06-12 13:54:20 -04:00
Benjamin Trent
fdd183ddbd Merge branch 'lucene_snapshot_9_11' 2024-06-12 10:51:02 -04:00
Kathleen DeRusso
32139e45b1
[Query Rules] Add API calls to get or delete individual query rules within a ruleset (#109554)
* Add priority to the query rule index, and merge rule updates into existing rulesets by priority

* Don't require double specification of rule_id

* Initial addition of get and delete API calls

* Add tests

* Update docs/changelog/109554.yaml

* D'oh! Removed commented out code

* Add test

* Update URI for requests and add test

* Ensure URIs are consistent for individual query rule API calls and update constant names to be more explicit that they are rules within a ruleset

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-06-12 09:32:46 -04:00
Alexander Spies
e540732e39
Aggs: Scripted metric allow list (#109444)
Introduces new cluster settings that allow only a certain set of scripts in scripted metrics aggregations:
- search.aggs.only_allowed_metric_scripts, defaults to false
- search.aggs.allowed_inline_metric_scripts, defaults to empty list
- search.aggs.allowed_stored_metric_scripts, defaults to empty list
2024-06-12 14:23:03 +02:00
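As a rough sketch of how the three settings from #109444 might be applied together, the snippet below builds a request body for the cluster update settings API. The setting names come from the commit message. The use of the `persistent` block, the assumption that inline entries are script source strings, and the sample script values are all illustrative and not confirmed by the commit.

```python
import json


def allow_list_settings(inline_scripts, stored_script_ids):
    """Build a cluster settings body that enables the allow list.

    Assumption: both list settings accept plain lists of strings.
    """
    return {
        "persistent": {
            "search.aggs.only_allowed_metric_scripts": True,
            "search.aggs.allowed_inline_metric_scripts": list(inline_scripts),
            "search.aggs.allowed_stored_metric_scripts": list(stored_script_ids),
        }
    }


# Hypothetical script source and stored script id, for illustration only.
body = allow_list_settings(["state.count = 0"], ["my_stored_script"])
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent with `PUT _cluster/settings`. With the first setting left at its default of `false`, the two lists would presumably have no effect.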
Benjamin Trent
bd550074db
Fixing new flaky test from #109423 (#109622) 2024-06-12 22:09:52 +10:00
Benjamin Trent
08298dcd69 Merge remote-tracking branch 'upstream/main' into lucene_snapshot_9_11 2024-06-12 08:05:36 -04:00
Benjamin Trent
d846223593
Mute all collapse tests for 8.13 (#109594)
related to: https://github.com/elastic/elasticsearch/issues/109476
2024-06-12 21:51:42 +10:00
Benjamin Trent
90ab2558b0
Adjusting bwc version after backport of #109423 (#109469)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-06-11 08:07:40 -04:00
Benjamin Trent
29288d6590 Merge remote-tracking branch 'upstream/main' into lucene_snapshot_9_11 2024-06-11 06:54:23 -04:00
Tommaso Teofili
8e9e9bc6c8
Relaxed resulting docs checks (#109560) 2024-06-11 11:53:18 +02:00
Kathleen DeRusso
ec0b573af6
Add Create or update query rule API call (#109042) 2024-06-10 14:17:25 -04:00
Tommaso Teofili
7e7f8a379a
Make dense vector field type updatable (#106591) 2024-06-10 18:39:02 +02:00
Przemysław Witek
28faacd869
[Transform] Introduce _transform/_node_stats API (#107279) 2024-06-10 14:03:49 +02:00
David Turner
8b759a1a70
Fix trappy timeouts in security settings APIs (#109233)
Relates #107984
2024-06-10 17:48:18 +10:00
Oleksandr Kolomiiets
1080425a65
Enable fallback synthetic source by default (#109370) 2024-06-07 09:21:22 -07:00
Max Hniebergall
f2e218ac44
[ML] Add dry run and force to json spec for Delete Inference endpoint (#109402)
* Add dry run and force to json spec

* Rewording

Co-authored-by: Tim Grein <tim.grein@elastic.co>

---------

Co-authored-by: Tim Grein <tim.grein@elastic.co>
2024-06-07 11:18:42 -04:00
Benjamin Trent
d3561f9cf3 Merge remote-tracking branch 'upstream/main' into lucene_snapshot_9_11 2024-06-06 18:22:08 -04:00
Benjamin Trent
4c17c861d2
Correct how hex strings are handled when dynamically updating vector dims (#109423)
closes: https://github.com/elastic/elasticsearch/issues/109411
2024-06-07 04:20:47 +10:00
Lorenzo Verardo
02a6c831e1
Limit the value in prefix query (#108537)
Reuse the setting index.max_regex_length for the max length in a prefix query.

Closes #108486
2024-06-05 14:51:07 -04:00
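The check described in #108537 can be pictured as a simple length guard. This is only a sketch of the idea, not the Elasticsearch implementation. The default of 1000 for `index.max_regex_length`, the choice to measure length in characters, and the names `PrefixTooLongError` and `check_prefix_length` are assumptions made for illustration.

```python
DEFAULT_MAX_REGEX_LENGTH = 1000  # assumed default of index.max_regex_length


class PrefixTooLongError(ValueError):
    """Raised when a prefix query value exceeds the configured limit."""


def check_prefix_length(value, max_regex_length=DEFAULT_MAX_REGEX_LENGTH):
    """Return the value unchanged, or raise if it is longer than the limit."""
    if len(value) > max_regex_length:
        raise PrefixTooLongError(
            f"prefix length {len(value)} exceeds the limit of {max_regex_length}"
        )
    return value


check_prefix_length("elastic")  # short values pass through unchanged
try:
    check_prefix_length("a" * 1001)
except PrefixTooLongError as err:
    print(err)
```

Reusing one setting for both regex and prefix queries keeps the number of tunables small, at the cost of not being able to limit the two query types independently.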
Benjamin Trent
ac53d6020b Merge remote-tracking branch 'upstream/main' into lucene_snapshot_9_11 2024-06-05 12:38:23 -04:00
Martijn van Groningen
ac6c0eecc1
Ensure synthetic source and dv codec are enabled with logs index mode (attempt 2). (#109382)
This was initially muted via #109365, because of a failing newly introduced assert.

Original PR #109269
2024-06-05 17:32:14 +02:00
Benjamin Trent
013e0c7cc6 Merge branch 'main' into lucene_snapshot_9_11 2024-06-04 18:08:29 -04:00
Oleksandr Kolomiiets
f1153b1f8d
Revert "Ensure synthetic source and dv codec are enabled with logs index mode. (#109269)" (#109365)
This reverts commit 4161e4d2e2.
2024-06-04 12:45:08 -07:00
john-wagster
dd83b5b8d0 Multivalue Sparse Vector Support (#109007)
Updated LuceneDocument to take advantage of looking up feature values on existing features and selecting the max when parsing multi-value sparse vectors
2024-06-04 12:50:58 -04:00
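The max-selection rule mentioned in #109007 can be sketched as a merge over feature maps. This is an illustration in Python rather than the actual `LuceneDocument` logic. The function name and the sample features are hypothetical.

```python
def merge_sparse_vectors(values):
    """Merge several sparse vectors, keeping the max weight per feature."""
    merged = {}
    for vector in values:
        for feature, weight in vector.items():
            # Keep the larger weight when a feature appears more than once.
            if feature not in merged or weight > merged[feature]:
                merged[feature] = weight
    return merged


# A multi-value field holding two sparse vectors that share feature "a".
field_values = [
    {"a": 1.0, "b": 0.5},
    {"a": 0.3, "c": 2.0},
]
print(merge_sparse_vectors(field_values))
```

Here feature `a` keeps the weight 1.0 from the first vector, while `b` and `c` are carried over unchanged.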
Benjamin Trent
cf84416fc5 Merge remote-tracking branch 'upstream/main' into lucene_snapshot_9_11 2024-06-04 12:50:52 -04:00
Martijn van Groningen
4161e4d2e2
Ensure synthetic source and dv codec are enabled with logs index mode. (#109269)
After running the elastic/logs track with logs index mode enabled, I noticed that _source was still getting stored.
The issue was that index modes other than time_series weren't propagated to the IndexMetadata and IndexSettings classes. Additionally, the synthetic source defaults in SourceFieldMapper were geared towards time series index mode only. This change addresses both problems.
2024-06-04 16:06:19 +02:00
Jedr Blaszyk
d543d91f02
[Connector API] Implement _features endpoint (#109248) 2024-06-04 11:58:37 +02:00
Benjamin Trent
9cd123d6cc Merge remote-tracking branch 'upstream/main' into lucene_snapshot_9_11 2024-06-02 16:46:19 -04:00
David Turner
3f23c15180
Fix misc trappy allocation API timeouts (#109241)
Get/delete desired balance and get-allocation-stats APIs

Relates #107984
2024-05-31 08:04:57 -04:00
David Turner
0177a0c8ae
Fix trappy timeout in allocation explain API (#109240)
Relates #107984
2024-05-31 07:49:44 -04:00
Salvatore Campagna
f11ef44084
Introduce logs index mode as Tech Preview (#108896)
This PR introduces a new index mode, `logs`, which enables usage of LogsDB in Elasticsearch.
As a result of adopting the `logs` index mode, default index sorting is applied using the hostname
and @timestamp fields. Users can still override the index sort settings.
By default, it will also use synthetic source and the same codecs used by TSDB.

Note: the logs index mode is a Tech Preview feature.
2024-05-30 14:26:48 +02:00