Commit graph

11682 commits

Author SHA1 Message Date
Oleksandr Kolomiiets
8bc5ecdc31
Support synthetic source together with ignore_malformed in histogram fields (#109882) 2024-06-20 09:09:45 -07:00
Liam Thompson
c6e21a9fd3
Fix Bulk Helpers link of Python (#108694) (#109939)
Co-authored-by: Hasanul Islam <hasanuli10@gmail.com>
2024-06-20 02:19:44 +10:00
Niels Bauman
ba91bfdc94
Lazily create the failure store (#109289)
Rather than initializing the failure store right away when a new
data stream is created, we leave it empty and mark it for lazy
rollover. This results in the failure store only being initialized
(i.e. an index created) when a failure has actually occurred.

The exception to the rule is when a failure occurs while the data
stream is being auto-created. In that case, we do want to initialize
the failure store right away.
2024-06-19 13:18:47 +02:00
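The lazy initialization described in the commit above can be illustrated with a small toy model. This is only a sketch of the idea, not the Elasticsearch implementation: the `DataStream` class, the `rollover_on_write` flag, and the `-failures-000001` index name are all hypothetical names chosen for illustration.

```python
class DataStream:
    """Toy model of a data stream whose failure store is created lazily."""

    def __init__(self, name, failed_during_auto_create=False):
        self.name = name
        self.failure_indices = []      # left empty at creation time
        self.rollover_on_write = True  # hypothetical "marked for lazy rollover" flag
        if failed_during_auto_create:
            # The exception to the rule: a failure during auto-creation
            # initializes the failure store right away.
            self._initialize_failure_store()

    def _initialize_failure_store(self):
        # Hypothetical index name, for illustration only.
        self.failure_indices.append(f"{self.name}-failures-000001")
        self.rollover_on_write = False

    def record_failure(self, doc):
        # The first real failure triggers the deferred initialization.
        if self.rollover_on_write:
            self._initialize_failure_store()
        return self.failure_indices[-1]
```

In this model, a stream that never sees a failure never gets a failure index, and repeated failures reuse the index created by the first one.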
Jim Ferenczi
a6470fb86d
Fix cluster level dense vector stats (#107962)
The cluster level dense vector stats return the total number of dense vector indices globally, including the replicas.
This commit fixes the total to only include the value count of the primary indices.
This change aligns with the docs stats which also reports the number of primary documents when used in cluster stats.
The indices stats API still reports granular results for replicas and primaries so the information is not lost.
2024-06-18 17:45:02 +01:00
Oleksandr Kolomiiets
5440f178aa
Support synthetic source for geo_point when ignore_malformed is used (#109651) 2024-06-18 08:37:27 -07:00
Nik Everett
b35f0ed48d
ESQL: Make a table of all inline casts (#109713)
This adds a test that generates
`docs/reference/esql/functions/kibana/inline_cast.json` which is a json
object whose keys are the names of valid inline casts and whose values
are the resulting data types.

I also moved one of the maps we use to make the inline casts to
`DataType`, which is a place where we want it.
2024-06-18 06:23:11 -04:00
Ed Savage
c214457b39
[ML] Handle the "output memory allocator bytes" field (#109653)
Handle the "output memory allocator bytes" field if and only if it is present in the model size stats, as reported by the C++ backend.

This PR _must_ be merged prior to the corresponding `ml-cpp` one, to keep CI tests happy.
2024-06-18 15:25:05 +12:00
Benjamin Trent
acc99302c6
Adding hamming distance function to painless for dense_vector fields (#109359)
This adds `hamming` distances, the pop-count of `xor` byte vectors, as a
first class citizen in Painless.

For byte vectors, this means that we can compute hamming distances via
script_score (aka, brute-force).

The implementation of `hamming` is the same as the one available in Lucene,
and when Lucene 9.11 is merged, we should update our logic where
applicable to utilize it.

NOTE: this does not yet add hamming distance as a metric for indexed
vectors. This will be a future PR after the Lucene 9.11 upgrade.
2024-06-18 03:41:20 +10:00
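As a rough illustration of what "the pop-count of `xor` byte vectors" means in the commit above, here is a minimal Python sketch. It is not the Painless or Lucene implementation; the function name `hamming` simply mirrors the commit message, and the inputs are plain `bytes` objects chosen for illustration.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    # XOR leaves a 1 bit wherever the inputs differ; summing those bits
    # (the pop-count) over every byte gives the Hamming distance.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

For example, `hamming(bytes([0b1010]), bytes([0b0110]))` XORs to `0b1100`, which has two set bits.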
Kathleen DeRusso
8529bf71f6
Add SparseVectorStats (#108793)
* Add SparseVectorStats

* Update to use mappings in engine

* Update to be unique to primary shards

* Fix doc

* Fix null error in test

* Cleanup

* fix yaml

* remove comment

* add version to yaml

* Revert whitespace changes to stats doc

* fix yml test

* Checkstyle

* Fix NPE in test

* Update docs/changelog/108793.yaml

* Add link to sparse_vector field type in docs

* PR feedback

* Flesh out test a bit more

* PR feedback - alphabetize placement in docs

* Fix doc change
2024-06-17 11:42:14 -04:00
shainaraskas
c97be9cbc7
rm remaining dsl technical preview notice (#109810) 2024-06-17 10:38:19 -04:00
Benjamin Trent
3aed0afb2b
Add new int4 quantization to dense_vector (#109317)
This adds a new quantization mechanism for HNSW and flat indices. Here
we add `int4` quantization via the `int4_hnsw` and `int4_flat` index
types. This quantization methodology further reduces the memory required
for fast HNSW, meaning that the memory required is 8x smaller than with
regular float32 values. 

8x reduction means that 1M 1024 dimension vectors go from requiring
3.8GB to 477MB.

Recall stays steady; there is some reduction, but it is recoverable via
slight oversampling and reranking. For example, over 500k CohereV3
vectors, only 5 extra vectors need to be gathered to achieve over 0.98
recall in a brute-force scenario.

![recall](b47a79d0-020d-4baa-8199-41a932df00f7)
2024-06-18 00:15:43 +10:00
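The 8x figure in the commit above follows from the bit widths alone (32 bits per dimension for float32 versus 4 bits for int4). The sketch below reproduces the arithmetic under two assumptions that are not stated in the commit message: only raw vector storage is counted (no HNSW graph or per-vector quantization overhead), and the quoted sizes are expressed in binary gigabytes.

```python
def vector_bytes(num_vectors: int, dims: int, bits_per_dim: int) -> int:
    """Raw storage for the vector values only, ignoring any index overhead."""
    return num_vectors * dims * bits_per_dim // 8

GIB = 2 ** 30  # assumption: the commit's "GB" is read as a binary gigabyte

float32_bytes = vector_bytes(1_000_000, 1024, 32)  # 4 bytes per dimension
int4_bytes = vector_bytes(1_000_000, 1024, 4)      # half a byte per dimension

reduction = float32_bytes / int4_bytes   # ratio of the two bit widths
float32_gib = float32_bytes / GIB        # roughly 3.8
int4_gib = int4_bytes / GIB              # roughly 0.477
```

Under these assumptions the float32 vectors need about 3.8 GiB and the int4 vectors about 0.477 GiB, which appears to be how the commit arrives at "3.8GB to 477MB".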
David Turner
0131e80624 Revert "(+Doc) link split-brain wiki from quorom decision making (#108915)"
This reverts commit 4d3ca2d029.
2024-06-16 08:54:44 +01:00
Nick Tindall
cd8b1f9dc9
Add wait_for_completion parameter to delete snapshot request (#109462)
Closes #101300
2024-06-15 12:27:35 +10:00
Alexander Reelsen
4de67ad7f0
DocsStats: Add human readable bytesize (#109720)
This adds support for the `human` parameter for DocsStats, as it was
missing. Sample

```
GET _cluster/stats?human&filter_path=indices.docs
```
2024-06-15 08:20:04 +10:00
Nik Everett
2aade9dd66
ESQL: Warn about division (#109716)
When you divide two integers or two longs we round towards 0, like
Postgres or Java or Rust or C. Other systems, like MySQL or SPL or
JavaScript or Python, always produce a floating point number. We should
warn folks about this. It's genuinely unexpected for some folks. OTOH,
converting into a floating point number would be unexpected for other
folks. Oh well, let's document what we've got.
2024-06-14 08:36:27 -04:00
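To make the rounding behaviour in the commit above concrete, here is a small Python sketch of division that rounds towards 0. The helper `trunc_div` is a hypothetical name used only for illustration, and it does not model how ES|QL handles division by zero. Note that Python itself offers two other behaviours: `/` produces a float, and `//` rounds towards negative infinity rather than towards 0.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that rounds towards 0, as the commit describes."""
    # Divide the magnitudes, then restore the sign of the result.
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient

rounded_towards_zero = trunc_div(-7, 2)  # truncates -3.5 towards 0
floored = -7 // 2                        # Python's floor division
as_float = -7 / 2                        # Python's true division
```

The difference only shows up for negative results: for `-7` divided by `2`, rounding towards 0 gives `-3`, while floor division gives `-4`.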
Carlos Delgado
d10dfb4ac5
Add limitations section to semantic_text field type docs (#109666) 2024-06-13 15:19:00 +02:00
Albert Zaharovits
0e4888bdec
Refactor field name translator of query endpoints for security entities (#109559)
This is a refactoring of the internal logic that's used to translate
query-level into index-level field names for query APIs for
security entities (i.e. users, API Keys, and soon, roles).
The objective here is to have and reuse a single class to handle
all the translations for different security query APIs.
2024-06-13 14:12:19 +03:00
elasticsearchmachine
98d2f75564
Forward port release notes for v8.14.1 (#109641) 2024-06-12 16:27:51 -04:00
Oleksandr Kolomiiets
c847235ed0
Support synthetic source for scaled_float and unsigned_long when ignore_malformed is used (#109506) 2024-06-12 11:05:23 -07:00
shainaraskas
900eb82c99
[DOCS] Address local vs. remote storage + shard limits feedback (#109360) 2024-06-12 13:50:23 -04:00
Luigi Dell'Aquila
47edae4fbd
ES|QL: reduce memory footprint for MvAppendTests with shapes (#109517)
Fixing MvAppendTests CB exceptions by generating smaller geometries: the
test generates a lot of documents and the CB is too small for multiple
big shapes.

Fixes https://github.com/elastic/elasticsearch/issues/109409
2024-06-13 02:44:49 +10:00
Benjamin Trent
fdd183ddbd Merge branch 'lucene_snapshot_9_11' 2024-06-12 10:51:02 -04:00
David Turner
366c0b16bf
Add docs on HTTP client config (#109543)
Some notes and recommendations on timeouts and TCP keepalives.

Relates INC-1049
2024-06-12 14:54:54 +01:00
Jonathan Buttner
6a1ece0c06
Adding input type to docs (#109588) 2024-06-12 09:15:08 -04:00
Benjamin Trent
08298dcd69 Merge remote-tracking branch 'upstream/main' into lucene_snapshot_9_11 2024-06-12 08:05:36 -04:00
Liam Thompson
394d2b09a6
Revert "[DOCS] Remove ESQL demo env link from 8.14+ (#109562)" (#109579)
This reverts commit 0480c1acba.
2024-06-11 17:04:37 +02:00
Nik Everett
c888e5f4cd
ESQL: Run LOOKUP docs test only in SNAPSHOT (#109493)
LOOKUP is only registered on SNAPSHOT builds.

closes #109478
2024-06-11 23:27:22 +10:00
Nik Everett
c6fe3c3efe
ESQL: Improve syntax for LOOKUP tables (#109489)
Replace the syntax for `tables` with something a little more natural.

Now it is:

```
$ curl -uelastic:password -HContent-Type:application/json -XPOST \
    'localhost:9200/_query?error_trace&pretty&format=txt' \
-d'{
    "query": "ROW a=1::LONG | LOOKUP t ON a",
    "tables": {
        "t": {
            "a": {"long":     [    1,     4,     2]},
            "v1": {"integer": [   10,    11,    12]},
            "v2": {"keyword": ["cat", "dog", "wow"]}
        }
    }
}'
      v1       |      v2       |       a
---------------+---------------+---------------
10             |cat            |1
```
2024-06-11 23:26:04 +10:00
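The change in the commit above can be read as a mechanical reshaping of the `tables` object: the earlier form (shown in #107987, further down this log) encodes the type in the key as `name:type`, while the new form nests the type under the column name. The following Python sketch converts one into the other. The function `convert_tables` is hypothetical and exists only to illustrate the difference between the two shapes; it is not part of Elasticsearch or any client library.

```python
def convert_tables(old_tables: dict) -> dict:
    """Reshape {"t": {"a:long": [...]}} into {"t": {"a": {"long": [...]}}}."""
    new_tables = {}
    for table_name, columns in old_tables.items():
        new_columns = {}
        for key, values in columns.items():
            # Split on the last colon so the type is whatever follows it.
            name, separator, type_name = key.rpartition(":")
            if not separator:
                raise ValueError(f"expected 'name:type', got {key!r}")
            new_columns[name] = {type_name: values}
        new_tables[table_name] = new_columns
    return new_tables
```

Applied to the older example, `"a:long": [1, 4, 2]` becomes `"a": {"long": [1, 4, 2]}`, matching the request body shown in the commit message.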
Liam Thompson
d6fb5cfbe6
[DOCS] Expand context about xpack.security.enabled setting (#109575) 2024-06-11 14:59:40 +02:00
Benjamin Trent
29288d6590 Merge remote-tracking branch 'upstream/main' into lucene_snapshot_9_11 2024-06-11 06:54:23 -04:00
Liam Thompson
0480c1acba
[DOCS] Remove ESQL demo env link from 8.14+ (#109562) 2024-06-11 11:24:52 +02:00
Carlos Delgado
d975997a3a
Add semantic-text warning about inference endpoints removal (#109561) 2024-06-11 18:33:25 +10:00
Oleksandr Kolomiiets
a9f31bd2aa
Support synthetic source for date fields when ignore_malformed is used (#109410) 2024-06-10 10:26:31 -07:00
Karen Metts
f4d87e0f25
[DOCS] Add note that Logstash sets up data streams (#109502) 2024-06-10 12:24:23 -04:00
Jean-Fabrice Bobo
a9bc30d66e
Fix misleading repository-s3 type (#109347)
In 8.x, the `repository-s3` type has been replaced by the `s3` type. Fixing
remaining reference to `repository-s3` in the documentation.
2024-06-11 01:51:55 +10:00
Alexander Reelsen
7cba6c8c16
Docs: Fix available update by query operations (#109486) 2024-06-10 15:57:56 +02:00
Oleksandr Kolomiiets
eedc2b9354
Fix typo in TSDB documentation (#109504) 2024-06-10 06:24:05 -07:00
Carlos Delgado
4d3f9f2fb9
Fix RRF example for semantic query (#109516)
Follow-up to https://github.com/elastic/elasticsearch/pull/109433, this
time properly fixing the semantic query example with RRF.
2024-06-10 17:59:13 +10:00
David Turner
683245e41e
Detect long-running tasks on network threads (#109204)
This commit introduces a watchdog timer to monitor for long-running
tasks on network threads. If a network thread is active and has not made
progress for two consecutive ticks of the timer then the watchdog logs a
warning and a thread dump.
2024-06-10 17:47:40 +10:00
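The "two consecutive ticks" rule in the commit above can be sketched as a small deterministic model. This is not the Elasticsearch implementation: the `ThreadWatchdog` class, its method names, and the use of a per-thread progress counter are assumptions made for illustration, and the sketch returns the stuck thread names instead of logging a warning and a thread dump.

```python
class ThreadWatchdog:
    """Toy watchdog: flags threads that are active with no progress between ticks."""

    def __init__(self):
        self._active = {}    # thread name -> currently running a task?
        self._progress = {}  # thread name -> counter bumped on task start and end
        self._seen = {}      # thread name -> counter observed at the previous tick

    def start_task(self, thread):
        self._active[thread] = True
        self._progress[thread] = self._progress.get(thread, 0) + 1

    def end_task(self, thread):
        self._active[thread] = False
        self._progress[thread] = self._progress.get(thread, 0) + 1

    def tick(self):
        """Return threads that were active and unchanged since the previous tick."""
        stuck = []
        for thread, active in self._active.items():
            progress = self._progress[thread]
            # Same counter at two consecutive ticks means no task started or
            # finished in between, so the thread has made no progress.
            if active and self._seen.get(thread) == progress:
                stuck.append(thread)
            self._seen[thread] = progress if active else None
        return stuck
```

A thread that starts a task is never reported on the first tick after it starts; it is only reported if a second tick finds it still running the same task.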
Luigi Dell'Aquila
3d0c65d0c5
ES|QL: add tests for COALESCE() function on VERSION type (#109468) 2024-06-07 18:01:42 +02:00
Benjamin Trent
a5fbfe81b2 Merge remote-tracking branch 'upstream/main' into lucene_snapshot_9_11 2024-06-07 07:24:43 -04:00
Panagiotis Bailis
1c3b3d8f11
Adding support for explain in rrf (#108682) 2024-06-07 11:09:06 +03:00
Nik Everett
7916e6a231
ESQL: Implement LOOKUP, an "inline" enrich (#107987)
This adds support for `LOOKUP`, a command that implements a sort of
inline `ENRICH`, using data that is passed in the request:

```
$ curl -uelastic:password -HContent-Type:application/json -XPOST \
    'localhost:9200/_query?error_trace&pretty&format=txt' \
-d'{
    "query": "ROW a=1::LONG | LOOKUP t ON a",
    "tables": {
        "t": {
            "a:long":     [    1,     4,     2],
            "v1:integer": [   10,    11,    12],
            "v2:keyword": ["cat", "dog", "wow"]
        }
    },
    "version": "2024.04.01"
}'
      v1       |      v2       |       a       
---------------+---------------+---------------
10             |cat            |1
```

This required these PRs: * #107624 * #107634 * #107701 * #107762 *
#107923 * #107894 * #107982 * #108012 * #108020 * #108169 * #108191 *
#108334 * #108482 * #108696 * #109040 * #109045

Closes #107306
2024-06-07 11:38:51 +10:00
Benjamin Trent
d3561f9cf3 Merge remote-tracking branch 'upstream/main' into lucene_snapshot_9_11 2024-06-06 18:22:08 -04:00
István Zoltán Szabó
d89dae2a32
[DOCS] Modifies semantic search-related docs to refer to the semantic_text workflow (#109418)
Co-authored-by: Carlos Delgado <6339205+carlosdelest@users.noreply.github.com>
2024-06-06 16:45:46 +02:00
Carlos Delgado
d4d5d9320c
Fix semantic_text retrievers docs example (#109433) 2024-06-06 16:31:12 +02:00
elasticsearchmachine
f8291f8e83
Forward port release notes for v8.14.0 (#109403) 2024-06-05 14:52:38 -04:00
Lorenzo Verardo
02a6c831e1
Limit the value in prefix query (#108537)
Reuse the setting index.max_regex_length for the max length in a prefix query.

Closes #108486
2024-06-05 14:51:07 -04:00
Benjamin Trent
ac53d6020b Merge remote-tracking branch 'upstream/main' into lucene_snapshot_9_11 2024-06-05 12:38:23 -04:00
Mark J. Hoy
80a22ec046
[Inference API] Add Docs for Mistral Embedding Support for the Inference API (#109319)
* Initial docs for put-inference for Mistral

* adds mistral embeddings to tutorial; add changelog

* update mistral text and dimensions

* fix mistral spelling error

* fix azure AI studio; fix Mistral label

* fix auto-formatted items

* change pipeline button back to azure openai

* put proper Azure AI Studio include in

* fix missing azure-openai; fix huggingface hidden

* fix mistral tab for reindex

* re-add Mistral service settings to put inference
2024-06-05 11:23:29 -04:00