* Make flattened synthetic source concatenate object keys on scalar/object mismatch (#129600)
Fixes an issue with synthetic source for flattened fields: if a key has a scalar value and a duplicate key has an object value, one of the values is left out of the produced synthetic source. The fix replaces the object with paths to each of its keys. Each path is the concatenation of all keys leading down to a given scalar, joined by periods, for example foo.bar.baz. This applies recursively, so every value within the object, no matter how deeply nested, is accessible through a fully specified path.
(cherry picked from commit 245dc0775a)
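The path construction described above can be sketched as follows. This is a minimal illustration in Python, not the actual mapper code; the function name `flatten_paths` and the sample documents are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def flatten_paths(obj: Dict[str, Any], prefix: str = "") -> List[Tuple[str, Any]]:
    """Return (dotted_path, scalar) pairs for every scalar nested in obj."""
    pairs: List[Tuple[str, Any]] = []
    for key, value in obj.items():
        # Join the keys seen so far with a period, e.g. "foo" + "bar" -> "foo.bar".
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so that arbitrarily nested scalars get a full path.
            pairs.extend(flatten_paths(value, path))
        else:
            pairs.append((path, value))
    return pairs


# Hypothetical input: the key "foo" appears once as a scalar and once as an object.
scalar_doc = {"foo": "scalar"}
object_doc = {"foo": {"bar": {"baz": "nested"}, "qux": 1}}

# Flattening the object keeps both values addressable side by side.
flattened = flatten_paths(scalar_doc) + flatten_paths(object_doc)
print(flattened)
```

With paths instead of a nested object, the scalar under `foo` and the values under `foo.bar.baz` and `foo.qux` no longer compete for the same key.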
* remove methods not available in java version
* skip testing console-result in docs
Change the field name `date` to `@timestamp` so that users can follow along with the documentation. Otherwise, the `date` field is mapped as a keyword, which confuses users.
This commit adds a note to the documentation that `ignore_above` has a
different limit for logsdb indices. It also specifies that `ignore_above`
applies to all types in the keyword family.
Relates https://github.com/elastic/sdh-elasticsearch/issues/8892
This commit introduces the `MappedFieldType#getDefaultHighlighter`, allowing a specific highlighter to be enforced for a field.
The semantic field mapper utilizes this new functionality to set the `semantic` highlighter as the default.
All other fields will continue to use the `unified` highlighter by default.
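The default-highlighter lookup can be pictured with the sketch below. It is a Python analogy rather than the Java implementation: the class names and the snake_case method are hypothetical stand-ins for `MappedFieldType#getDefaultHighlighter`, and only the two highlighter names come from the text above.

```python
class MappedFieldType:
    """Hypothetical stand-in for a field type; most fields keep the default."""

    def get_default_highlighter(self) -> str:
        return "unified"


class KeywordFieldType(MappedFieldType):
    """An ordinary field type that does not override the default."""


class SemanticTextFieldType(MappedFieldType):
    """The semantic field type overrides the default highlighter."""

    def get_default_highlighter(self) -> str:
        return "semantic"


# Each field type reports the highlighter it wants used by default.
defaults = {
    "keyword": KeywordFieldType().get_default_highlighter(),
    "semantic_text": SemanticTextFieldType().get_default_highlighter(),
}
print(defaults)
```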
Now that the new backing algorithm is in place and the `rescore_vector`
API makes rescoring easier, let's mark bbq as GA.
Additionally, this commit adds rolling upgrade tests to ensure
stability.
Semantic text fields now support multi-fields, either as part of a multi-field structure or containing multi-fields internally.
This enhancement aligns with the semantic text field's current behavior as a standard text field.
Note: Multi-field support is only available for the new index format. Attempting to set a multi-field on an index created with the older format will still result in a failure.
* Add new experimental rank_vectors mapping for late-interaction second order ranking (#118804)
Late-interaction models are powerful rerankers. While their size and
overall cost don't lend themselves to HNSW indexing, using them for
second-order "brute-force" reranking can provide excellent boosts in
relevance, generally at lower inference times than large cross-encoders.
This commit exposes a new experimental `rank_vectors` field that allows
for maxSim operations. This unlocks the initial, and most common use of
late-interaction dense-models.
For example, this is how you would use it via the API:
```
PUT index
{
  "mappings": {
    "properties": {
      "late_interaction_vectors": {
        "type": "rank_vectors"
      }
    }
  }
}
```
Then to index:
```
POST index/_doc
{
  "late_interaction_vectors": [[0.1, ...],...]
}
```
For querying, scoring can be exposed with scripting:
```
POST index/_search
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "maxSimDotProduct(params.query_vector, 'late_interaction_vectors')",
        "params": {
          "query_vector": [[0.42, ...], ...]
        }
      }
    }
  }
}
```
Of course, an initial ranking should come first: either re-score its
results via the `rescore` parameter, or simply pass whatever first-phase
retrieval you want as the inner query in `script_score`.
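To make the maxSim operation concrete, here is a small Python sketch. It assumes the usual late-interaction (ColBERT-style) definition, which the text above does not spell out: for each query vector take the maximum dot product over the document's vectors, then sum those maxima. The function names and sample vectors are illustrative only and are not the Elasticsearch implementation.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim_dot_product(
    query_vectors: Sequence[Sequence[float]],
    doc_vectors: Sequence[Sequence[float]],
) -> float:
    """Sum, over query vectors, of the best dot product against any document vector."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


# Hypothetical two-dimensional vectors, chosen so the result is easy to check by hand.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]

# First query vector: best match is [1.0, 0.0] with score 1.0.
# Second query vector: best match is [0.0, 2.0] with score 2.0.
score = max_sim_dot_product(query, doc)
print(score)  # 3.0
```

Because every query vector is compared against every document vector, the cost grows with the product of the two counts, which is why this is applied to a small first-phase result set.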
* Update docs/changelog/119601.yaml
Enhance documentation to explain that the "_index_prefix" subfield must
be added to the `matched_fields` param for highlighting a main field.
When doing prefix queries on fields that are indexed with prefixes,
the "_index_prefix" subfield is used. If we try to highlight the main
field, we may not get any results. The "_index_prefix" subfield must
be added to `matched_fields`, which instructs ES to use matches
from "_index_prefix" to highlight the main field.
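As an illustration, the sketch below builds such a search request body in Python. The field name `body` and the prefix `elas` are made up, and addressing the subfield as `<field>._index_prefix` is an assumption about how the subfield path is written. Whether the main field should also be listed in `matched_fields` is not covered by the text above.

```python
import json


def prefix_highlight_request(field: str, prefix: str) -> dict:
    """Build a search body that highlights `field` using matches from its prefix subfield."""
    return {
        "query": {"prefix": {field: {"value": prefix}}},
        "highlight": {
            "fields": {
                # Assumption: the prefix subfield is addressed as "<field>._index_prefix".
                field: {"matched_fields": [field + "._index_prefix"]}
            }
        },
    }


# Hypothetical field and prefix, for illustration only.
request = prefix_highlight_request("body", "elas")
print(json.dumps(request, indent=2))
```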
* Adds new default inference information
* Update docs/reference/mapping/types/semantic-text.asciidoc
* Update docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
* Update docs/reference/mapping/types/semantic-text.asciidoc
---------
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: David Kyle <david.kyle@elastic.co>
* Add Highlighter for Semantic Text Fields (#118064)
This PR introduces a new highlighter, `semantic`, tailored for semantic text fields.
It extracts the most relevant fragments by scoring nested chunks using the original semantic query.
In this initial version, the highlighter returns only the original chunks computed during ingestion. However, this is an implementation detail, and future enhancements could combine multiple chunks to generate the fragments.
* Update x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighterTests.java
This PR introduces an option for `sparse_vector` to store its values separately from `_source` by using term vectors.
This capability is primarily needed by the semantic text field.
We will deprecate the `_source.mode` mapping level configuration
in favor of the index-level `index.mapping.source.mode` setting.
As a result, we go through the documentation and update it to reflect
the introduction of the setting.
(cherry picked from commit f6a1e36d6b)
* Adding new bbq index types behind a feature flag (#114439)
Adds the new index types `bbq_hnsw` and `bbq_flat`, which utilize the better binary quantization formats. They give a 32x reduction in memory, with nice recall properties.
(cherry picked from commit 6c752abc23)
* spotless
Here we introduce a new index-level setting, `ignore_above`, similar to what we have
for `ignore_malformed`. The setting will apply to all `keyword`, `wildcard` and `flattened`
fields. Each field mapping will still be allowed to override the index-level setting using a
mapping-level `ignore_above` value.
(cherry picked from commit 208a1fe571)
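The override rule for `ignore_above` can be sketched like this. It is a hypothetical Python model of the precedence described above, not the mapper code: a mapping-level value wins when present, otherwise the index-level value applies. The helper names and the limits used are invented for the example.

```python
from typing import Optional


def effective_ignore_above(
    index_level: Optional[int], mapping_level: Optional[int]
) -> Optional[int]:
    """A mapping-level value, when set, overrides the index-level setting."""
    if mapping_level is not None:
        return mapping_level
    return index_level


def is_ignored(value: str, limit: Optional[int]) -> bool:
    """Strings longer than the limit are ignored; no limit means nothing is ignored."""
    return limit is not None and len(value) > limit


# Hypothetical limits: 10 at the index level, 20 on one specific field.
inherited = effective_ignore_above(index_level=10, mapping_level=None)
overridden = effective_ignore_above(index_level=10, mapping_level=20)

sample = "a" * 15  # 15 characters
print(is_ignored(sample, inherited), is_ignored(sample, overridden))
```

A 15-character value is therefore ignored on a field that inherits the index-level limit of 10, but kept on the field that overrides it with 20.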
* Add support for multi-value dimensions (#112645)
Closes https://github.com/elastic/elasticsearch/issues/110387
Having this in now affords us not having to introduce version checks in
the ES exporter later. We can simply use the same serialization logic
for metric attributes as we do for other signals. This also enables us
to properly map `*.ip` fields to the ip field type as ip fields
containing a list of IPs are not converted to a comma-separated list.
(cherry picked from commit 8d223cbf7a)
# Conflicts:
# server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
* Remove skip test for 8.x
This was just needed for 8.x to 9.0 compatibility tests
JDK 23 removes the COMPAT locale provider, leaving CLDR as the only option. This commit configures Elasticsearch
to use the CLDR provider when on JDK 23, but still use the existing COMPAT provider when on JDK 22 and below.
This causes some differences in locale behaviour, so this commit also adapts various tests to work whether they run on COMPAT or CLDR.
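The selection rule reads roughly as the following Python sketch. The function name is hypothetical and the real change lives in Elasticsearch's JVM configuration, which may pass additional provider names; only the JDK 23 cut-off and the two provider names come from the text above.

```python
def locale_provider_for(jdk_major_version: int) -> str:
    """Pick the locale provider: CLDR from JDK 23 onwards, COMPAT before that."""
    # JDK 23 removes COMPAT, so CLDR is the only remaining option there.
    return "CLDR" if jdk_major_version >= 23 else "COMPAT"


# Hypothetical JDK versions on either side of the cut-off.
providers = {version: locale_provider_for(version) for version in (21, 22, 23, 24)}
print(providers)
```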