913 commits

Author SHA1 Message Date
Jim Ferenczi
dbeb55cb3d
Enable Mapped Field Types to Override Default Highlighter (#121176)
This commit introduces the `MappedFieldType#getDefaultHighlighter`, allowing a specific highlighter to be enforced for a field.
The semantic field mapper utilizes this new functionality to set the `semantic` highlighter as the default.
All other fields will continue to use the `unified` highlighter by default.
2025-01-29 21:55:53 +00:00
Kathleen DeRusso
4b4c59de7f
Fix error in docs code snippet (#121187) 2025-01-29 16:05:05 +01:00
Benjamin Trent
038aab864e
Mark bbq indices as GA and add rolling upgrade integration tests (#121105)
With the introduction of our new backing algorithm and making rescoring
easier with the `rescore_vector` API, let's mark bbq as GA. 

Additionally, this commit adds rolling upgrade tests to ensure
stability.
2025-01-30 01:58:08 +11:00
Oleksandr Kolomiiets
cdff3defde
Fix typo in synthetic source docs (#120685) 2025-01-23 07:51:58 -08:00
Jim Ferenczi
1db194df22
Add Multi-Field Support for Semantic Text Fields (#120128)
Semantic text fields now support multi-fields, either as part of a multi-field structure or containing multi-fields internally.
This enhancement aligns with the semantic text field's current behavior as a standard text field.

Note: Multi-field support is only available for the new index format. Attempting to set a multi-field on an index created with the older format will still result in a failure.
2025-01-21 22:01:11 +01:00
Liam Thompson
18b281ea16
[DOCS] Updated wording for clarity for new users (#120257) (#120507)
Co-authored-by: Kofi B <kofi.bartlett@elastic.co>
2025-01-21 20:32:20 +11:00
Carlos Delgado
aea4853069
[Docs] kNN vector rescoring for quantized vectors (#118425) 2025-01-17 17:02:09 +01:00
Kathleen DeRusso
c7ec808f45
[Docs] Add docs for new semantic text query functionality (#119520)
* Update docs with new semantic text functionality

* PR feedback

* PR feedback

* PR Feedback
2025-01-09 11:11:20 -05:00
Benjamin Trent
a5716c8f99
Add new experimental rank_vectors mapping for late-interaction second order ranking (#118804)
Late-interaction models are powerful rerankers. While their size and
overall cost don't lend themselves to HNSW indexing, using them for
second-order "brute-force" reranking can provide excellent boosts in
relevance, generally at lower inference times than large cross-encoders.


This commit exposes a new experimental `rank_vectors` field that allows
for maxSim operations. This unlocks the initial and most common use of
late-interaction dense models.

For example, this is how you would use it via the API:

```
PUT index
{
  "mappings": {
    "properties": {
      "late_interaction_vectors": {
        "type": "rank_vectors"
      }
    }
  }
}
```

Then to index:

```
POST index/_doc
{
  "late_interaction_vectors": [[0.1, ...],...]
}
```

For querying, scoring can be exposed with scripting:

```
POST index/_search
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "maxSimDotProduct(params.query_vector, 'late_interaction_vectors')",
        "params": {
          "query_vector": [[0.42, ...], ...]
        }
      }
    }
  }
}
```

Of course, an initial ranking should run first: either apply the script
as a second pass via the `rescore` parameter, or simply pass whatever
first-phase retrieval you want as the inner query in `script_score`.
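
As a rough sketch of the `rescore` variant, the request below reranks the top hits of a first-phase query with the maxSim script. The `title` field, the query text, the `window_size` value, and the two-dimensional vector values are hypothetical placeholders, not part of this commit; only the index name and the `late_interaction_vectors` field come from the mapping above.

```
POST index/_search
{
  "query": {
    "match": { "title": "hypothetical first-phase query" }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "script_score": {
          "query": { "match_all": {} },
          "script": {
            "source": "maxSimDotProduct(params.query_vector, 'late_interaction_vectors')",
            "params": {
              "query_vector": [[0.42, 0.11], [0.07, 0.93]]
            }
          }
        }
      },
      "query_weight": 0,
      "rescore_query_weight": 1
    }
  }
}
```

Setting `query_weight` to 0 is one possible choice: it makes the final score of the rescored window depend only on the maxSim score. Other weights would blend the two phases.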
2025-01-07 04:06:59 +11:00
Mayya Sharipova
b460f081c2
[DOCS] _index_prefix for highlight matched_fields (#118569)
Enhance the documentation to explain that the "_index_prefix" subfield
must be added to the `matched_fields` param when highlighting a main field.
Prefix queries on fields that are indexed with prefixes use the
"_index_prefix" subfield, so highlighting the main field alone may not
return any results. Adding the "_index_prefix" subfield to
`matched_fields` instructs ES to use matches from "_index_prefix" to
highlight the main field.
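
A minimal sketch of the situation described, assuming a hypothetical index `my-index` with a `comment` text field; the `index_prefixes` bounds and the query value are illustrative only:

```
PUT my-index
{
  "mappings": {
    "properties": {
      "comment": {
        "type": "text",
        "index_prefixes": { "min_chars": 1, "max_chars": 10 }
      }
    }
  }
}

GET my-index/_search
{
  "query": {
    "prefix": { "comment": { "value": "run" } }
  },
  "highlight": {
    "fields": {
      "comment": {
        "matched_fields": ["comment._index_prefix"]
      }
    }
  }
}
```

Without the `matched_fields` entry, the prefix match would be recorded against `comment._index_prefix` rather than `comment`, which is why the main field may come back without highlights.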
2024-12-12 10:24:55 -05:00
Marci W
ae9bb90fd1
Update and edit logsdb docs for logsdb / synthetic source GA (#118303)
* Update licensing; fix screenshots; edit generally

* Small edit for clarity and style

* Update docs/reference/index-modules.asciidoc

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* Apply changes from review

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* Address review comments

* Match similar change from review

* More changes from review

* Apply suggestions from review

Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>

* Apply suggestions from review

Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>

* Update docs/reference/data-streams/logs.asciidoc

Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>

* Apply suggestions from review

Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>

* Apply suggestions from review

* Change to general subscription note

* Apply suggestions from review

Co-authored-by: Oleksandr Kolomiiets <olkolomiiets@gmail.com>

* Apply suggestions from review

Co-authored-by: Oleksandr Kolomiiets <olkolomiiets@gmail.com>

* Apply suggestions from review; additional edits

* Apply suggestions from review; clarity tweaks

* Restore previous paragraph structure and context

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
Co-authored-by: Oleksandr Kolomiiets <olkolomiiets@gmail.com>
2024-12-11 13:24:24 -05:00
Benjamin Trent
645657cc56
Remove old _knn_search tech preview API in v9 (#118104)
Removes the old `_knn_search` API that was never out of tech preview and
deprecated throughout the v8 cycle. 

To keep using the API, send requests with `compatible-with=8`.
2024-12-11 02:01:25 +11:00
kosabogi
b2b8e3f762
[DOCS] [8.17] Adds new default inference endpoint information (#117985)
* Adds new default inference information

* Update docs/reference/mapping/types/semantic-text.asciidoc

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

* Update docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

* Update docs/reference/mapping/types/semantic-text.asciidoc

Co-authored-by: David Kyle <david.kyle@elastic.co>

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: David Kyle <david.kyle@elastic.co>
2024-12-09 09:05:11 +01:00
Jim Ferenczi
c580024ea9
Add Highlighter for Semantic Text Fields (#118064)
This PR introduces a new highlighter, `semantic`, tailored for semantic text fields.
It extracts the most relevant fragments by scoring nested chunks using the original semantic query.

In this initial version, the highlighter returns only the original chunks computed during ingestion. However, this is an implementation detail, and future enhancements could combine multiple chunks to generate the fragments.
2024-12-06 18:42:50 +00:00
Jim Ferenczi
0901a2734e
Add option to store sparse_vector outside _source (#117917)
This PR introduces an option for `sparse_vector` to store its values separately from `_source` by using term vectors.
This capability is primarily needed by the semantic text field.
2024-12-04 17:29:46 +00:00
Philippus Baalman
fd6e8857bc
Mention bbq_hnsw for m and ef_construction options in docs (#117022) 2024-11-25 14:50:09 +01:00
István Zoltán Szabó
339e431081
[DOCS] Documents that ELSER is the default service for semantic_text (#115769) 2024-11-25 08:07:30 -05:00
shainaraskas
2d2ad00872
fix formatting errors (#116843) 2024-11-14 15:45:16 -05:00
kosabogi
bada2a60ed
Updates chunk settings documentation (#116719) 2024-11-13 14:14:56 +01:00
István Zoltán Szabó
4058daf8b2
Revert "[DOCS] Documents that ELSER is the default service for `semantic_text…" (#115748)
This reverts commit 541bcf30e5.
2024-10-28 14:31:42 +01:00
shainaraskas
97ed0a93bb
Make a minor change to trigger release note process (#113975)
* changelog entry
2024-10-24 13:26:15 -04:00
István Zoltán Szabó
541bcf30e5
[DOCS] Documents that ELSER is the default service for semantic_text (#114615)
Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
2024-10-24 08:53:12 +02:00
Salvatore Campagna
f32051f462
fix: use setting instead of (#115193) 2024-10-22 11:09:19 +02:00
István Zoltán Szabó
1cae3c8361
[DOCS] Documents that dynamic templates are not supported by semantic_text. (#115195) 2024-10-21 12:51:10 +02:00
Salvatore Campagna
ebd363d4af
Update synthetic source documentation (#112363)
* docs: update synthetic source docs

* fix: also doc values false works

* Revert "fix: also doc values false works"

This reverts commit 0895a76758.

* fix: update synthetic source documentation

* fix: all field types support it

* fix: no need to explicitly mention it

* fix: synthetic source sorting

* fix: may instead of might
2024-10-18 13:48:32 +02:00
Salvatore Campagna
f6a1e36d6b
Replace usages of _source.mode in documentation (#114743)
We will deprecate the `_source.mode` mapping level configuration
in favor of the index-level `index.mapping.source.mode` setting.
As a result, we go through the documentation and update it to reflect
the introduction of the setting.
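
For illustration, a sketch of the index-level form on a hypothetical index `my-index`; the `synthetic` value is one example mode, chosen here as an assumption:

```
PUT my-index
{
  "settings": {
    "index": {
      "mapping": {
        "source": {
          "mode": "synthetic"
        }
      }
    }
  }
}
```

This takes the place of a `"_source": { "mode": "synthetic" }` entry under `mappings`.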
2024-10-16 16:17:41 +02:00
Kostas Krikellas
8cf2cb35f6
Fix minor formatting issue (#114815)
The list with two options doesn't get rendered as a list, due to the
snippet in between.

https://www.elastic.co/guide/en/elasticsearch/reference/master/passthrough.html#passthrough-conflicts
2024-10-15 23:39:33 +11:00
Kostas Krikellas
4d775cba4f
Add documentation for passthrough field type (#114720)
* Guard second doc parsing pass with index setting

* add test

* updates

* updates

* merge

* Add documentation for passthrough field type

* Apply suggestions from code review

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* updates

* updates

* Update docs/reference/mapping/types/passthrough.asciidoc

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* address comment

* address comment

* Update docs/reference/mapping/types/passthrough.asciidoc

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* address comment

---------

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
2024-10-15 12:05:02 +02:00
Benjamin Trent
6c752abc23
Adding new bbq index types behind a feature flag (#114439)
Adds the new index types `bbq_hnsw` and `bbq_flat`, which use the better binary quantization formats. They give a 32x reduction in memory, with nice recall properties.
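
A sketch of how such an index type might be selected on a `dense_vector` field, assuming the feature flag is enabled; the index name, field name, and `dims` value are hypothetical:

```
PUT my-bbq-index
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "index_options": {
          "type": "bbq_hnsw"
        }
      }
    }
  }
}
```

Swapping `bbq_hnsw` for `bbq_flat` would select the non-HNSW variant named in the commit.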
2024-10-14 20:13:27 -04:00
Liam Thompson
1292580c03
[DOCS] Lookup runtime fields are now GA (#114221) 2024-10-07 14:52:42 +02:00
Simon Cooper
4ef5ea6d1c
Change default locale of date mappers to ENGLISH (#112799)
ENGLISH does not change between the COMPAT and CLDR locale databases, whereas ROOT does.
2024-10-07 10:22:38 +01:00
Kostas Krikellas
dd2024881d
Add object param for keeping synthetic source (#113690)
* Add object param for keeping synthetic source

* Update docs/changelog/113690.yaml

* fix merging

* add tests

* merge

* fix randomized tests

* add documentation

* dedup id in docs

* update documentation

* update documentation

* fix bwc

* fix bwc

* fix unintended

* Revert "fix bwc"

This reverts commit 18dc913eee.

* Revert "fix bwc"

This reverts commit f4ddb0e5e5.

* add missing test

* fix transform

* fix transform

* fix transform

* fix transform

* fix transform
2024-10-03 21:19:04 +03:00
István Zoltán Szabó
b9adc701fa
[DOCS] Expands param descriptions for semantic_text (#114024)
Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
2024-10-03 19:48:16 +02:00
Simon Cooper
31d50eed0f
Update 9.0 with various locale changes from 8.x (#113787) (#113870)
Forward-port changes from #113787, and update the docs with similar information to #113587
2024-10-02 11:41:33 +01:00
john-wagster
0fbb3bcb45
Updated Date Range to Follow Documentation When Assuming Missing Values (#112258)
* updated rangetype to be more in line with the docs (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html) and added tests to reflect as much
2024-10-01 09:21:47 -05:00
Kostas Krikellas
c9f378da29
Revert "Apply auto-flattening to subobjects: auto (#112092)" (#113692)
* Revert "Apply auto-flattening to `subobjects: auto` (#112092)"

This reverts commit fffe8844

* fix DataGenerationHelper
2024-09-30 10:11:15 +03:00
István Zoltán Szabó
5e019998ef
[DOCS] Improves semantic text documentation. (#113606) 2024-09-26 16:09:28 +02:00
Kostas Krikellas
fffe8844e9
Apply auto-flattening to subobjects: auto (#112092)
* Introduce mode `subobjects=auto` for objects

* Update docs/changelog/110524.yaml

* compilation error

* tests and fixes

* refactor

* spotless

* more tests

* fix nested objects

* fix test

* update fetch test

* add QA coverage

* update tests

* update tests

* update tests

* Apply auto-flattening to `subobjects: auto`

* Update docs/changelog/112092.yaml

* sync

* dont flatten subobjects auto

* refine test

* fix path for nested flattened objects and dynamic

* document `subobjects: auto`

* Apply suggestions from code review

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* comment updates

* restore indentation in comment

* update comment

* update comment

* update comment

* update comment

* rename isFlattenable

* add test for dynamic template

* fix copy_to and noop dynamic updates

* tests

* update comment

* fix tests

* update cluster feature in yaml test

* address comments

---------

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
2024-09-26 11:42:40 +03:00
Salvatore Campagna
208a1fe571
Introduce an ignore_above index-level setting (#113121)
Here we introduce a new index-level setting, `ignore_above`, similar to what we have
for `ignore_malformed`. The setting will apply to all `keyword`, `wildcard` and `flattened`
fields. Each field mapping will still be allowed to override the index-level setting using a
mapping-level `ignore_above` value.
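
A sketch of the override behavior, assuming the full setting key is `index.mapping.ignore_above` (the commit message only gives the short name); the index name, field names, and limits are hypothetical:

```
PUT my-index
{
  "settings": {
    "index.mapping.ignore_above": 256
  },
  "mappings": {
    "properties": {
      "message_id": { "type": "keyword" },
      "trace": { "type": "keyword", "ignore_above": 1024 }
    }
  }
}
```

Under these assumptions, `message_id` would inherit the index-level limit of 256, while `trace` would use its own mapping-level value of 1024.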
2024-09-23 18:05:02 +02:00
Felix Barnsteiner
8d223cbf7a
Add support for multi-value dimensions (#112645)
Closes https://github.com/elastic/elasticsearch/issues/110387

Having this in now affords us not having to introduce version checks in
the ES exporter later. We can simply use the same serialization logic
for metric attributes as we do for other signals. This also enables us
to properly map `*.ip` fields to the ip field type as ip fields
containing a list of IPs are not converted to a comma-separated list.
2024-09-23 17:31:18 +10:00
Stef Nestor
a4dba7db8d
(Doc+) Sparse Vectors NA to mapping analyzers (#112523)
* retry
2024-09-05 09:19:19 -06:00
Simon Cooper
a36d90cf34
Use CLDR locale provider on JDK 23+ (#110222)
JDK 23 removes the COMPAT locale provider, leaving CLDR as the only option. This commit configures Elasticsearch
to use the CLDR provider when on JDK 23, but still use the existing COMPAT provider when on JDK 22 and below.

This causes some differences in locale behaviour; this also adapts various tests to still work whether run on COMPAT or CLDR.
2024-09-04 13:42:40 +01:00
Ignacio Vera
3747765ab8
[DOC] geo_shape field type supports geo_hex aggregation (#112448) 2024-09-04 11:12:11 +02:00
István Zoltán Szabó
2c29a3ae0a
[DOCS] Highlights auto-chunking in intro of semantic text. (#111836) 2024-08-29 12:43:10 +02:00
Liam Thompson
4034615e29
[DOCS] Clarify copy_to behavior with strict dynamic mappings (#111408)
* [DOCS] Clarify copy_to behavior with strict dynamic mappings

* Add id

* De-verbosify

* Delete pesky comma

* More info about root and nest

* Fixes per review, clarify non-recursive explanation

* Skip tests for illustrative example

* Fix example syntax

* Fix typo
2024-08-01 14:37:17 +02:00
Felix Barnsteiner
3090438037
Add support for boolean dimensions (#111457)
Closes #111338
2024-07-31 23:00:32 +10:00
István Zoltán Szabó
1a5b008921
[DOCS] Clarifies semantic query behavior on sparse and dense vector fields (#111339)
* [DOCS] Clarifies semantic query behavior on sparse and dense vector fields.

* [DOCS] Adds a NOTE to the semantic query docs.
2024-07-26 16:53:38 +02:00
Carlos Delgado
ff3a77ca46
Clarify some semantic_text docs (#111329) 2024-07-26 16:45:29 +02:00
István Zoltán Szabó
22ead8d106
[DOCS] Documents automatic text chunking behavior for semantic text. (#111331) 2024-07-26 12:02:47 +02:00
Tommaso Teofili
9b86fd17aa
Document how to update dense vector field type (#111038) 2024-07-23 09:55:31 +02:00