Commit graph

920 commits

Author SHA1 Message Date
Parker Timmins
617ca837f7
[8.18] Make flattened synthetic source concatenate object keys on scalar/object mismatch (#129600) (#129794)
* Make flattened synthetic source concatenate object keys on scalar/object mismatch (#129600)

There is an issue with flattened fields that use synthetic source. If one key has a scalar value and a duplicate key has an object value, one of the values is left out of the produced synthetic source. This fixes the issue by replacing the object with paths to each of its keys. Each path is the concatenation of all keys leading down to a given scalar, joined by periods, for example `foo.bar.baz`. This applies recursively, so every value within the object, however deeply nested, is accessible through a fully specified path.
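A minimal Python sketch of the path expansion described above. It assumes a flattened field where key `foo` holds both a scalar and an object. The helpers `expand_to_paths` and `resolve_mismatch` are hypothetical illustrations. The actual fix lives in Elasticsearch's Java flattened field code and is not reproduced here.

```python
from typing import Any


def expand_to_paths(prefix: str, value: Any) -> list[tuple[str, Any]]:
    """Recursively replace an object with (dotted path, scalar) pairs.

    Hypothetical helper: every scalar inside `value`, however deeply
    nested, becomes reachable through its fully specified path.
    """
    if not isinstance(value, dict):
        return [(prefix, value)]
    pairs: list[tuple[str, Any]] = []
    for key, child in value.items():
        # Join keys with a period on the way down, e.g. foo -> foo.bar -> foo.bar.baz
        pairs.extend(expand_to_paths(f"{prefix}.{key}", child))
    return pairs


def resolve_mismatch(key: str, scalar: Any, obj: dict) -> list[tuple[str, Any]]:
    """Keep the scalar under `key` and expand the conflicting object into paths."""
    return [(key, scalar)] + expand_to_paths(key, obj)


result = resolve_mismatch("foo", "a", {"bar": {"baz": "b"}, "qux": "c"})
print(result)  # [('foo', 'a'), ('foo.bar.baz', 'b'), ('foo.qux', 'c')]
```

Both the scalar and every leaf of the object survive. Nothing is dropped, because the paths no longer collide with the scalar key.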

(cherry picked from commit 245dc0775a)

* remove methods not available in java version

* skip testing console-result in docs
2025-06-21 08:06:49 +10:00
Peter Titov
73c56a5812
Update constant-keyword.asciidoc (#129053)
Change the field name `date` to `@timestamp` so that users can follow along with the documentation. Otherwise, the `date` field is mapped as a keyword, which confuses users.
2025-06-06 15:36:13 +02:00
Marci W
7e2f23b662
Remove stale synthetic source tech preview note (#128982) (#128998)
(cherry picked from commit b88839c548)

# Conflicts:
#	docs/reference/mapping/types/text.asciidoc
2025-06-06 02:19:30 +10:00
Mike Pellegrini
67bd512908
Mark semantic text inference_id param as optional (#127587) (#127602) 2025-05-01 22:53:32 +10:00
Liam Thompson
6b3286fdbb
[DOCS][8.x] Clarify update behavior for indices with semantic_text fields, flag CCS/CCR limitation (#127319) (#127340) 2025-04-25 02:30:24 +10:00
Larisa Motova
0442be3000
Update keyword documentation for logsdb (#126652) (#126661)
This commit adds a note to the documentation that ignore_above has a
different limit for logsdb indices. It also specifies that ignore_above
applies to all types of the keyword family.

Relates https://github.com/elastic/sdh-elasticsearch/issues/8892
2025-04-11 18:17:09 +10:00
Mike Pellegrini
c4fc197180
[8.18] [8.x] Mark semantic text as GA in docs (#124670) (#125302)
* [8.x] Mark semantic text as GA in docs (#124670)

* Update docs/changelog/125302.yaml

* Remove extra changelog
2025-03-20 23:51:21 +11:00
Kostas Krikellas
1ca82c8603
[8.x][DOCS] Document source-related restrictions (#124317) (#124324)
* Update synthetic-source.asciidoc

* Update source-field.asciidoc
2025-03-08 01:43:32 +11:00
Marci W
174195563a
Update doc-values.asciidoc (#123309) 2025-02-25 08:33:24 -05:00
Jim Ferenczi
0db2f0a027
Enable Mapped Field Types to Override Default Highlighter (#121176) (#121237)
This commit introduces the `MappedFieldType#getDefaultHighlighter`, allowing a specific highlighter to be enforced for a field.
The semantic field mapper utilizes this new functionality to set the `semantic` highlighter as the default.
All other fields will continue to use the `unified` highlighter by default.
2025-01-30 20:35:43 +11:00
Benjamin Trent
4d8cdba39f
Mark bbq indices as GA and add rolling upgrade integration tests (#121105) (#121190)
With the introduction of our new backing algorithm, and with rescoring
made easier by the `rescore_vector` API, let's mark bbq as GA.

Additionally, this commit adds rolling upgrade tests to ensure
stability.
2025-01-30 03:05:43 +11:00
Kathleen DeRusso
3bddff3019
Fix error in docs code snippet (#121187) (#121191) 2025-01-30 02:32:17 +11:00
Oleksandr Kolomiiets
f73084cfca
Fix typo in synthetic source docs (#120685) (#120735) 2025-01-24 03:18:48 +11:00
Jim Ferenczi
b02de7e387
Add Multi-Field Support for Semantic Text Fields (#120128) (#120558)
Semantic text fields now support multi-fields, either as part of a multi-field structure or containing multi-fields internally.
This enhancement aligns with the semantic text field's current behavior as a standard text field.

Note: Multi-field support is only available for the new index format. Attempting to set a multi-field on an index created with the older format will still result in a failure.
2025-01-22 09:19:00 +11:00
Liam Thompson
891408381b
[DOCS] Updated wording for clarity for new users (#120257) (#120508)
Co-authored-by: Kofi B <kofi.bartlett@elastic.co>
2025-01-21 20:33:30 +11:00
Carlos Delgado
9362cafcf3
[Docs] kNN vector rescoring for quantized vectors (#118425) (#120407) 2025-01-18 03:30:11 +11:00
Kathleen DeRusso
13c4f5d593
[Docs] Add docs for new semantic text query functionality (#119520) (#119883)
* Update docs with new semantic text functionality

* PR feedback

* PR feedback

* PR Feedback
2025-01-10 03:39:00 +11:00
Benjamin Trent
8555b350cc
[8.x] Add new experimental rank_vectors mapping for late-interaction second order ranking (#118804) (#119601)
* Add new experimental rank_vectors mapping for late-interaction second order ranking (#118804)

Late-interaction models are powerful rerankers. While their size and
overall cost don't lend themselves to HNSW indexing, using them for
second-order "brute-force" reranking can provide excellent boosts in
relevance, generally at lower inference times than large cross-encoders.


This commit exposes a new experimental `rank_vectors` field that allows
for maxSim operations. This unlocks the initial, and most common, use of
late-interaction dense models.

For example, this is how you would use it via the API:

```
PUT index
{
  "mappings": {
    "properties": {
      "late_interaction_vectors": {
        "type": "rank_vectors"
      }
    }
  }
}
```

Then to index:

```
POST index/_doc
{
  "late_interaction_vectors": [[0.1, ...],...]
}
```

For querying, scoring can be exposed with scripting:

```
POST index/_search
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "maxSimDotProduct(params.query_vector, 'late_interaction_vectors')",
        "params": {
          "query_vector": [[0.42, ...], ...]
        }
      }
    }
  }
}
```

Of course, the initial ranking should be done before rescoring, either by
combining via the `rescore` parameter or simply by passing whatever
first-phase retrieval you want as the inner query in `script_score`.
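A minimal Python sketch of the maxSim dot-product score that the script above relies on. It assumes the usual late-interaction definition: for each query vector, take the best dot product against any document vector, then sum those maxima. Whether `maxSimDotProduct` applies any extra normalization is not stated here. The function name `max_sim_dot_product` is hypothetical.

```python
def max_sim_dot_product(query_vectors: list[list[float]],
                        doc_vectors: list[list[float]]) -> float:
    """Sum, over query vectors, of the maximum dot product with any doc vector."""
    total = 0.0
    for q in query_vectors:
        # Best-matching document vector for this query token
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_vectors)
    return total


score = max_sim_dot_product([[1.0, 0.0], [0.0, 1.0]],
                            [[0.5, 0.5], [1.0, 0.0]])
print(score)  # 1.5
```

The scoring cost grows with (query vectors × document vectors) for every candidate. This cost is why the message recommends running it only as a second-phase rerank.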

* Update docs/changelog/119601.yaml
2025-01-07 05:19:38 +11:00
Mayya Sharipova
c341d73f89
[DOCS] _index_prefix for highlight matched_fields (#118569) (#118580)
Enhance documentation to explain that the "_index_prefix" subfield must
be added to the `matched_fields` param when highlighting a main field.
Prefix queries on fields indexed with prefixes run against the
"_index_prefix" subfield, so highlighting the main field alone may not
return any results. Adding "_index_prefix" to `matched_fields` instructs
ES to use matches from "_index_prefix" to highlight the main field.
2024-12-13 03:27:56 +11:00
shainaraskas
9018bcd88b
Update and edit logsdb docs for logsdb / synthetic source GA (#118303) (#118505)
* Update licensing; fix screenshots; edit generally

* Small edit for clarity and style

* Update docs/reference/index-modules.asciidoc

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* Apply changes from review

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* Address review comments

* Match similar change from review

* More changes from review

* Apply suggestions from review

Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>

* Apply suggestions from review

Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>

* Update docs/reference/data-streams/logs.asciidoc

Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>

* Apply suggestions from review

Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>

* Apply suggestions from review

* Change to general subscription note

* Apply suggestions from review

Co-authored-by: Oleksandr Kolomiiets <olkolomiiets@gmail.com>

* Apply suggestions from review

Co-authored-by: Oleksandr Kolomiiets <olkolomiiets@gmail.com>

* Apply suggestions from review; additional edits

* Apply suggestions from review; clarity tweaks

* Restore previous paragraph structure and context

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
Co-authored-by: Oleksandr Kolomiiets <olkolomiiets@gmail.com>
(cherry picked from commit ae9bb90fd1)

Co-authored-by: Marci W <333176+marciw@users.noreply.github.com>
2024-12-11 21:39:35 +01:00
kosabogi
4f21c62ce4
[DOCS] [8.17] Adds new default inference endpoint information (#117985) (#118240)
* Adds new default inference information

* Update docs/reference/mapping/types/semantic-text.asciidoc



* Update docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc



* Update docs/reference/mapping/types/semantic-text.asciidoc



---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: David Kyle <david.kyle@elastic.co>
2024-12-09 19:34:29 +11:00
Jim Ferenczi
fe4f510e04
[8.x] Add Highlighter for Semantic Text Fields (#118064) (#118185)
* Add Highlighter for Semantic Text Fields (#118064)

This PR introduces a new highlighter, `semantic`, tailored for semantic text fields.
It extracts the most relevant fragments by scoring nested chunks using the original semantic query.

In this initial version, the highlighter returns only the original chunks computed during ingestion. However, this is an implementation detail, and future enhancements could combine multiple chunks to generate the fragments.
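A minimal Python sketch of the selection step described above. It assumes each nested chunk already has a relevance score computed from the original semantic query. It returns the original chunk texts as fragments, highest score first. The function `top_fragments` and its parameters are hypothetical and do not reflect the highlighter's actual Java code.

```python
def top_fragments(chunks: list[str], scores: list[float], n: int) -> list[str]:
    """Return the n highest-scoring original chunks as highlight fragments."""
    # Sort by score only; Python's sort is stable, so ties keep ingestion order
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:n]]


print(top_fragments(["intro", "key passage", "related"], [0.1, 0.9, 0.5], 2))
# ['key passage', 'related']
```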

* Update x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighterTests.java
2024-12-07 07:17:38 +11:00
Jim Ferenczi
e1304593b2
Add option to store sparse_vector outside _source (#117917) (#118018)
This PR introduces an option for `sparse_vector` to store its values separately from `_source` by using term vectors.
This capability is primarily needed by the semantic text field.
2024-12-05 09:55:29 +11:00
Carlos Delgado
dbb1bc27bf
Mention bbq_hnsw for m and ef_construction options in docs (#117022) (#117481)
Co-authored-by: Philippus Baalman <philippus@gmail.com>
2024-11-26 01:16:30 +11:00
István Zoltán Szabó
bdab3e65af
[DOCS] Documents that ELSER is the default service for semantic_text (#115769) (#117471) 2024-11-26 00:35:39 +11:00
Liam Thompson
0cb702c539
fix formatting errors (#116843) (#116911)
(cherry picked from commit 2d2ad00872)

Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
2024-11-18 19:41:57 +11:00
kosabogi
e1af8ccc82
Updates chunk settings documentation (#116719) (#116722)
(cherry picked from commit bada2a60ed)
2024-11-14 00:43:35 +11:00
István Zoltán Szabó
e1bb8f89b6
Revert "[DOCS] Documents that ELSER is the default service for `semantic_text…" (#115748) (#115767)
This reverts commit 541bcf30e5.
2024-10-29 00:59:14 +11:00
shainaraskas
9506d46815
Make a minor change to trigger release note process (#113975) (#115592)
* changelog entry

(cherry picked from commit 97ed0a93bb)
2024-10-25 07:42:59 +11:00
István Zoltán Szabó
4f3de8344b
[DOCS] Resolves conflict. (#115503) 2024-10-24 11:08:18 +02:00
István Zoltán Szabó
af19586f8c
[DOCS] Documents that dynamic templates are not supported by semantic_text. (#115195) (#115202) 2024-10-21 22:22:10 +11:00
Salvatore Campagna
cb7f9b7be0
Update synthetic source documentation (#112363) (#115097) 2024-10-18 14:40:46 +02:00
Salvatore Campagna
90e16e7e87
Replace usages of _source.mode in documentation (#114743) (#114917)
We will deprecate the `_source.mode` mapping-level configuration
in favor of the index-level `index.mapping.source.mode` setting.
As a result, this commit updates the documentation to reflect
the introduction of the setting.

(cherry picked from commit f6a1e36d6b)
2024-10-16 16:51:48 +02:00
Kostas Krikellas
9ca549cacc
Fix minor formatting issue (#114815) (#114822)
The list with two options doesn't get rendered as a list, due to the
snippet in between.

https://www.elastic.co/guide/en/elasticsearch/reference/master/passthrough.html#passthrough-conflicts
(cherry picked from commit 8cf2cb35f6)
2024-10-16 00:05:30 +11:00
Kostas Krikellas
79580869a8
Add documentation for passthrough field type (#114720) (#114809)
* Guard second doc parsing pass with index setting

* add test

* updates

* updates

* merge

* Add documentation for passthrough field type

* Apply suggestions from code review

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* updates

* updates

* Update docs/reference/mapping/types/passthrough.asciidoc

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* address comment

* address comment

* Update docs/reference/mapping/types/passthrough.asciidoc

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* address comment

---------

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
(cherry picked from commit 4d775cba4f)
2024-10-15 22:25:49 +11:00
Benjamin Trent
64e8f2ac9c
[8.x] Adding new bbq index types behind a feature flag (#114439) (#114783)
* Adding new bbq index types behind a feature flag (#114439)

New index types, `bbq_hnsw` and `bbq_flat`, which use the better binary quantization (BBQ) formats. They provide a 32x reduction in memory, with good recall properties.
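A back-of-the-envelope Python sketch of where the 32x figure comes from. A float32 vector spends 32 bits per dimension, while a binary code spends 1. The helper `pack_sign_bits` is hypothetical and only illustrates 1-bit-per-dimension storage. BBQ itself uses a more elaborate quantization scheme, and this sketch ignores any per-vector metadata the real format keeps.

```python
def float32_bytes(dims: int) -> int:
    return dims * 4  # 32 bits = 4 bytes per dimension


def one_bit_bytes(dims: int) -> int:
    return (dims + 7) // 8  # 1 bit per dimension, packed into whole bytes


def pack_sign_bits(vector: list[float]) -> bytes:
    """Simplified binarization: set bit i when component i is positive."""
    out = bytearray((len(vector) + 7) // 8)
    for i, x in enumerate(vector):
        if x > 0:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


dims = 1024
print(float32_bytes(dims), one_bit_bytes(dims))        # 4096 128
print(float32_bytes(dims) // one_bit_bytes(dims))      # 32
print(pack_sign_bits([0.3, -1.2, 0.0, 2.5]))           # b'\t' (bits 0 and 3 set)
```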

(cherry picked from commit 6c752abc23)

* spotless
2024-10-15 07:04:19 -04:00
Simon Cooper
1b01548425
[8.16] Change default locale of date mappers to ENGLISH (#112799) (#114210)
Backport #112799 to 8.16, for CLDR locale compatibility
2024-10-07 15:51:36 +01:00
Liam Thompson
6e9ac4d0d7
[DOCS] Lookup runtime fields are now GA (#114221) (#114229)
(cherry picked from commit 1292580c03)
2024-10-08 00:20:27 +11:00
Kostas Krikellas
faaf4ba7fd
[8.x] Add object param for keeping synthetic source (#113690) (#114058)
* Add object param for keeping synthetic source (#113690)

* Add object param for keeping synthetic source

* Update docs/changelog/113690.yaml

* fix merging

* add tests

* merge

* fix randomized tests

* add documentation

* dedup id in docs

* update documentation

* update documentation

* fix bwc

* fix bwc

* fix unintended

* Revert "fix bwc"

This reverts commit 18dc913eee.

* Revert "fix bwc"

This reverts commit f4ddb0e5e5.

* add missing test

* fix transform

* fix transform

* fix transform

* fix transform

* fix transform

(cherry picked from commit dd2024881d)

# Conflicts:
#	rest-api-spec/build.gradle

* Update build.gradle

* Update MapperFeatures.java

* Update 20_synthetic_source.yml

* Update 21_synthetic_source_stored.yml

* Update 21_synthetic_source_stored.yml

* Update 21_synthetic_source_stored.yml

* Update 21_synthetic_source_stored.yml
2024-10-04 08:08:44 +10:00
István Zoltán Szabó
b507537bf0
[DOCS] Expands param descriptions for semantic_text (#114024) (#114055)
Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
2024-10-04 04:13:11 +10:00
john-wagster
8a8ad1b815
updated rangetype to be more in line with the docs (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html) and added tests to reflect as much (#113872) 2024-10-02 01:40:55 +10:00
Simon Cooper
53d9c3cc6a
Add some information on locale database to the ES docs (#113587) 2024-09-30 09:28:13 +01:00
Kostas Krikellas
7b3d726eca
Revert "Apply auto-flattening to subobjects: auto (#112092)" (#113692) (#113760)
* Revert "Apply auto-flattening to `subobjects: auto` (#112092)"

This reverts commit fffe8844

* fix DataGenerationHelper

(cherry picked from commit c9f378da29)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/DocumentParserContext.java
2024-09-30 18:19:26 +10:00
István Zoltán Szabó
cf55728d77
[DOCS] Improves semantic text documentation. (#113606) (#113611) 2024-09-27 00:34:37 +10:00
Kostas Krikellas
8539876663
[8.x] Apply auto-flattening to subobjects: auto (#113584)
* Apply auto-flattening to `subobjects: auto` (#112092)

* Introduce mode `subobjects=auto` for objects

* Update docs/changelog/110524.yaml

* compilation error

* tests and fixes

* refactor

* spotless

* more tests

* fix nested objects

* fix test

* update fetch test

* add QA coverage

* update tests

* update tests

* update tests

* Apply auto-flattening to `subobjects: auto`

* Update docs/changelog/112092.yaml

* sync

* dont flatten subobjects auto

* refine test

* fix path for nested flattened objects and dynamic

* document `subobjects: auto`

* Apply suggestions from code review

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* comment updates

* restore indentation in comment

* update comment

* update comment

* update comment

* update comment

* rename isFlattenable

* add test for dynamic template

* fix copy_to and noop dynamic updates

* tests

* update comment

* fix tests

* update cluster feature in yaml test

* address comments

---------

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
(cherry picked from commit fffe8844e9)

# Conflicts:
#	modules/dot-prefix-validation/build.gradle
#	rest-api-spec/build.gradle

* Update build.gradle
2024-09-26 20:17:11 +10:00
Salvatore Campagna
bac208a154
Introduce an ignore_above index-level setting (#113121) (#113414)
Here we introduce a new index-level setting, `ignore_above`, similar to what we have
for `ignore_malformed`. The setting will apply to all `keyword`, `wildcard` and `flattened`
fields. Each field mapping will still be allowed to override the index-level setting using a
mapping-level `ignore_above` value.
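A minimal Python sketch of the precedence rule described above. A mapping-level `ignore_above` wins when set; otherwise the index-level value applies, and only to `keyword`, `wildcard` and `flattened` fields. The function names are hypothetical. Length is measured with Python's `len` for simplicity, which may not match how Elasticsearch counts characters.

```python
from typing import Optional

# Field types the index-level setting applies to, per the commit message
APPLIES_TO = {"keyword", "wildcard", "flattened"}


def effective_ignore_above(field_type: str,
                           mapping_value: Optional[int],
                           index_value: Optional[int]) -> Optional[int]:
    """Mapping-level value overrides the index-level default."""
    if field_type not in APPLIES_TO:
        return None
    return mapping_value if mapping_value is not None else index_value


def is_indexed(value: str, limit: Optional[int]) -> bool:
    """Values longer than the limit are ignored (not indexed)."""
    return limit is None or len(value) <= limit


print(effective_ignore_above("keyword", None, 256))  # 256 (index-level default)
print(effective_ignore_above("keyword", 64, 256))    # 64 (mapping override)
```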

(cherry picked from commit 208a1fe571)
2024-09-24 06:16:08 +10:00
Felix Barnsteiner
0aebbb53d6
[8.x] Add support for multi-value dimensions (#112645) (#113369)
* Add support for multi-value dimensions (#112645)

Closes https://github.com/elastic/elasticsearch/issues/110387

Having this in now spares us from introducing version checks in
the ES exporter later. We can simply use the same serialization logic
for metric attributes as we do for other signals. This also enables us
to properly map `*.ip` fields to the ip field type, as ip fields
containing a list of IPs are not converted to a comma-separated list.

(cherry picked from commit 8d223cbf7a)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java

* Remove skip test for 8.x

This was just needed for 8.x to 9.0 compatibility tests
2024-09-24 00:05:25 +10:00
Stef Nestor
a4dba7db8d
(Doc+) Sparse Vectors NA to mapping analyzers (#112523)
* retry
2024-09-05 09:19:19 -06:00
Simon Cooper
a36d90cf34
Use CLDR locale provider on JDK 23+ (#110222)
JDK 23 removes the COMPAT locale provider, leaving CLDR as the only option. This commit configures Elasticsearch
to use the CLDR provider when on JDK 23, but still use the existing COMPAT provider when on JDK 22 and below.

This causes some differences in locale behaviour; this also adapts various tests to still work whether run on COMPAT or CLDR.
2024-09-04 13:42:40 +01:00
Ignacio Vera
3747765ab8
[DOC] geo_shape field type supports geo_hex aggregation (#112448) 2024-09-04 11:12:11 +02:00