* Parse the contents of dynamic objects for [subobjects:false]
* Update docs/changelog/117762.yaml
* add tests
* tests
* test dynamic field
* test dynamic field
* fix tests
(cherry picked from commit f2addbc69a)
# Conflicts:
# server/src/main/java/org/elasticsearch/index/mapper/MapperFeatures.java
This reverts #117106. Bwc tests fail, because older nodes are killed with the following error:
```
[2024-11-20T10:54:58,600][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v8.17.0-0] fatal error in thread [elasticsearch[v8.17.0-0
][clusterApplierService#updateTask][T#1]], exiting java.lang.AssertionError: provided source [{"_doc":{"_data_stream_timestamp":{"enabled":true},"_source":{},"properties":{"@timestamp":{"type":"date"},"k8s":{"properties":{"pod":{"properties":{"ip":{"type":"ip"},"name":{"type":"keyword"},"network":{"properties":{"rx":{"type":"long"},"tx":{"type":"long"}}},"uid":{"type":"keyword","time_series_dimension":true}}}}},"metricset":{"type":"keyword","time_series_dimension":true}}}}] differs from mapping [{"_doc":{"_data_stream_timestamp":{"enabled":true},"_source":{"mode":"synthetic"},"properties":{"@timestamp":{"type":"date"},"k8s":{"properties":{"pod":{"properties":{"ip":{"type":"ip"},"name":{"type":"keyword"},"network":{"properties":{"rx":{"type":"long"},"tx":{"type":"long"}}},"uid":{"type":"keyword","time_series_dimension":true}}}}},"metricset":{"type":"keyword","time_series_dimension":true}}}}]
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.index.mapper.DocumentMapper.<init>(DocumentMapper.java:66)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.index.mapper.MapperService.newDocumentMapper(MapperService.java:588)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.index.mapper.MapperService.updateMapping(MapperService.java:346)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.index.IndexService.updateMapping(IndexService.java:840)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.indices.cluster.IndicesClusterStateService.createIndicesAndUpdateShards(IndicesClusterStateService.java:583)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.indices.cluster.IndicesClusterStateService.doApplyClusterState(IndicesClusterStateService.java:306)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:260)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:544)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:530)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:503)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:157)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:956)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:218)
at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:184)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1575)
```
The `mode` parameter no longer gets serialized for new indices. However, older nodes still serialize the `mode` parameter, which causes the assertion shown above to fail. Reverting for now to see how best to address this bwc serialization issue.
We can only stop serializing `mode` when all nodes are on the same version. Unfortunately, we can't invoke `c.clusterTransportVersion().get()` from the parser or builder, because the calling thread isn't allowed to call `clusterService.state()`.
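The mismatch can be illustrated with a toy model. This is only a sketch, not the Elasticsearch code: the function names are hypothetical, and it assumes the older node derives the effective source mode on its own and always writes it back when re-serializing.

```python
import json

def new_node_serialize(mapping):
    """Toy newer node: drops `mode` from `_source` when writing the mapping."""
    out = json.loads(json.dumps(mapping))  # cheap deep copy
    out["_doc"]["_source"].pop("mode", None)
    return json.dumps(out, sort_keys=True)

def old_node_reserialize(source, effective_mode):
    """Toy older node: parses the source, then always writes `mode` back out."""
    parsed = json.loads(source)
    parsed["_doc"]["_source"]["mode"] = effective_mode
    return json.dumps(parsed, sort_keys=True)

def old_node_assertion_holds(source, effective_mode):
    """The older node expects its own serialization to equal the provided source."""
    canonical = json.dumps(json.loads(source), sort_keys=True)
    return canonical == old_node_reserialize(source, effective_mode)

mapping = {"_doc": {"_source": {"mode": "synthetic"},
                    "properties": {"@timestamp": {"type": "date"}}}}

from_old = json.dumps(mapping, sort_keys=True)  # source that still carries `mode`
from_new = new_node_serialize(mapping)          # source with `mode` omitted

print(old_node_assertion_holds(from_old, "synthetic"))  # True
print(old_node_assertion_holds(from_new, "synthetic"))  # False
```

In this model the second check fails for the same reason as in the log: the provided source has an empty `_source` object, while the older node's mapping has `"mode":"synthetic"`.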
The current name doesn't allow skipping it to work around compatibility
test failures:
```
> Task :rest-api-spec:yamlRestCompatTestTransform FAILED
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':rest-api-spec:yamlRestCompatTestTransform'.
> class com.fasterxml.jackson.databind.node.ObjectNode cannot be cast to class com.fasterxml.jackson.databind.node.ArrayNode (com.fasterxml.jackson.databind.node.ObjectNode and com.fasterxml.jackson.databind.node.ArrayNode are in unnamed module of loader org.gradle.internal.classloader.VisitableURLClassLoader$InstrumentingVisitableURLClassLoader @15eaac09)
```
This adds a new `multi_dense_vector` field that focuses on the maxSim
use case provided by Col[BERT|Pali].
Indexing these vectors in HNSW as it stands makes no sense, either for
performance or for cost. However, we should totally support rescoring and
brute-force search over vectors with maxSim.
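For reference, a minimal sketch of maxSim rescoring as it is usually defined for late-interaction models: for each query vector take the best similarity against any of the document's vectors, then sum those maxima. The sketch assumes dot-product similarity and plain Python lists; the function names are illustrative and are not part of this change.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def max_sim(query_vectors, doc_vectors):
    """Sum, over query vectors, of the max dot product with any doc vector."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)

def rescore(query_vectors, candidates):
    """Brute-force rescoring. `candidates` maps doc id -> list of vectors."""
    scored = [(doc_id, max_sim(query_vectors, vecs))
              for doc_id, vecs in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "a": [[1.0, 0.0], [0.0, 0.5]],  # two vectors of the same dimension
    "b": [[0.5, 0.5]],              # any number of vectors per document
}
print(rescore(query, docs))  # [('a', 1.5), ('b', 1.0)]
```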
This is step one of many. Behind a feature flag, this adds support for
indexing any number of vectors of the same dimension.
Supports bit/byte/float.
Scripting support will be a follow up.
Marking as non-issue as it's behind a flag and currently unusable.
(cherry picked from commit 7369c0818d)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Add `docvalue_fields` Support for `dense_vector` Fields (#114484)
Currently, dense_vector fields don't support docvalue_fields.
This adds that support for debugging purposes. Users can inspect
the raw values of their vectors even if the source is disabled.
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
(cherry picked from commit c8a8d4d931)
* fixing for backport
---------
Co-authored-by: Rassyan <yjkhngds@gmail.com>
When ingesting logs, it's important to ensure that documents are not dropped due to mapping issues, including when dealing with dynamically mapped fields. Elasticsearch provides two key settings that help manage the total number of field mappings and handle situations where this limit might be exceeded:
1. **`index.mapping.total_fields.limit`**: This setting defines the maximum number of fields allowed in an index. If this limit is reached, any further mapped fields would cause indexing to fail.
2. **`index.mapping.total_fields.ignore_dynamic_beyond_limit`**: This setting determines whether Elasticsearch should ignore any dynamically mapped fields that exceed the limit defined by `index.mapping.total_fields.limit`. If set to `false`, indexing will fail once the limit is surpassed. However, if set to `true`, Elasticsearch will continue indexing the document but will silently ignore any additional dynamically mapped fields beyond the limit.
To prevent indexing failures due to dynamic mapping issues, especially in logs where the schema might change frequently, we change the default value of **`index.mapping.total_fields.ignore_dynamic_beyond_limit` from `false` to `true` in LogsDB**. This change ensures that even when the number of dynamically mapped fields exceeds the set limit, documents will still be indexed, and additional fields will simply be ignored rather than causing an indexing failure.
This adjustment is important for LogsDB, where dynamically mapped fields may be common and we want to prevent documents from being dropped.
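The interaction of the two settings can be sketched with a toy model. This is a simplified illustration, not Elasticsearch's implementation: `index_document` and its error message are hypothetical, and every field counts as exactly one mapping entry.

```python
def index_document(mapped_fields, doc_fields, limit, ignore_dynamic_beyond_limit):
    """Toy model of dynamic mapping under a total-fields limit.

    Returns (mapped, ignored): the sorted mapped fields after indexing and
    the document fields that were skipped because the limit was reached.
    """
    mapped = set(mapped_fields)
    ignored = []
    for field in doc_fields:
        if field in mapped:
            continue  # already mapped, nothing to add
        if len(mapped) < limit:
            mapped.add(field)  # room left: map the new field dynamically
        elif ignore_dynamic_beyond_limit:
            ignored.append(field)  # keep the document, skip the field
        else:
            raise ValueError(f"total fields limit [{limit}] exceeded by [{field}]")
    return sorted(mapped), ignored

# With the new LogsDB default (`true`), the document is still indexed.
print(index_document(["a", "b"], ["a", "c", "d"], 3, True))
# (['a', 'b', 'c'], ['d'])
```

With `ignore_dynamic_beyond_limit` set to `False`, the same call raises instead, which corresponds to the indexing failure described above.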
Set index stats to be refreshed immediately - cached 0 size may be the
reason why it fails.
Fixes #115600
(cherry picked from commit 5f4e681788)
# Conflicts:
# muted-tests.yml
* Add lookup index mode (#115143)
This change introduces a new index mode, lookup, for indices intended
for lookup operations in ES|QL. Lookup indices must have a single shard
and be replicated to all data nodes by default. Aside from these
requirements, they function as standard indices. Documentation will be
added later when the lookup operator in ES|QL is implemented.
* default shard
* minimal
* compile
In 8.x we need bwc back to before `element_type: byte` existed. To
prevent loss of coverage for past versions & telemetry, here I move the
index creation around so that we only create with `byte` when we have
the more recent telemetry changes (and thus also `byte` elements).
closes: https://github.com/elastic/elasticsearch/issues/114556
* Adding new bbq index types behind a feature flag (#114439)
New index types `bbq_hnsw` and `bbq_flat`, which utilize the better binary quantization formats. A 32x reduction in memory, with nice recall properties.
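The 32x figure follows from going from 32-bit floats to one bit per dimension. A back-of-the-envelope sketch, assuming exactly one bit per dimension and ignoring any per-vector metadata or index structures the real formats may store (the helper names and the example sizes are illustrative):

```python
def float32_bytes(dims, num_vectors):
    """Raw storage for float32 vectors: 4 bytes per dimension."""
    return dims * 4 * num_vectors

def one_bit_bytes(dims, num_vectors):
    """Storage at one bit per dimension, rounded up to whole bytes per vector."""
    return ((dims + 7) // 8) * num_vectors

dims, num_vectors = 1024, 1_000_000
raw = float32_bytes(dims, num_vectors)
quantized = one_bit_bytes(dims, num_vectors)
print(raw, quantized, raw // quantized)  # 4096000000 128000000 32
```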
(cherry picked from commit 6c752abc23)
* spotless
* Guard second doc parsing pass with index setting (#114649)
* Guard second doc parsing pass with index setting
* add test
* updates
* updates
* merge
(cherry picked from commit 98e0a4e953)
* Update 21_synthetic_source_stored.yml
**Description:**
This PR addresses the issue described in [#114402](https://github.com/elastic/elasticsearch/issues/114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause of the issue was that the `bit` type was not properly accounted for, leading to an array 8 times the size of the actual `dims` value of the doc value. This mismatch causes an array out-of-bounds exception when reconstructing the document.
**Changes:**
- Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
- Added a YAML test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.
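
The size mismatch can be reproduced in miniature. The following is a toy analogue, not the actual Java code: it assumes a `bit` vector with `dims` dimensions is stored as `dims / 8` bytes, and the reader functions are hypothetical.

```python
def read_buggy(stored, dims):
    """Reads `dims` elements, ignoring that bit vectors pack 8 dims per byte."""
    return [stored[i] for i in range(dims)]

def read_fixed(stored, dims):
    """Reads `dims // 8` bytes, accounting for the 8x difference."""
    if dims % 8 != 0:
        raise ValueError("bit vector dims must be a multiple of 8")
    return [stored[i] for i in range(dims // 8)]

stored = bytes([0b10101010, 0b00001111])  # 16 bit dimensions in 2 bytes
print(read_fixed(stored, 16))  # [170, 15]

try:
    read_buggy(stored, 16)
except IndexError as e:
    print("out of bounds:", e)
```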
**Related Issues:**
- Closes [#114402](https://github.com/elastic/elasticsearch/issues/114402)
- Introduced in [#110059](https://github.com/elastic/elasticsearch/pull/110059)
Co-authored-by: Rassyan <yjkhngds@gmail.com>
* Add a query rules tester API call
* Update docs/changelog/114168.yaml
* Wrap client call in async with origin
* Remove unused param
* PR feedback
* Remove redundant test
* CI workaround - add ent-search as ml dependency so it can find node features
API for `/_inference/{task_type}/{inference_id}/_stream` and `/_inference/{inference_id}/_stream`
Request is `application/json`
Response is `text/event-stream`
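Because the response is `text/event-stream`, a client reads it incrementally as server-sent events rather than as a single JSON body. A minimal parsing sketch follows; it covers only the `event` and `data` fields of the format, and the payloads in the example are made up, since the shape of the streamed chunks is not described here.

```python
def parse_sse(lines):
    """Parse an iterable of text lines into (event_type, data) tuples."""
    events = []
    event_type, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has any data.
            if data:
                events.append((event_type, "\n".join(data)))
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
    return events

sample = [
    "event: message\n",
    'data: {"x": 1}\n',
    "\n",
    ": keep-alive\n",
    "data: a\n",
    "data: b\n",
    "\n",
]
print(parse_sse(sample))  # [('message', '{"x": 1}'), ('message', 'a\nb')]
```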
This adds some more counts for dense_vector field mapping stats. This
allows for seeing the number of mappings with a given element type,
similarity, or index type.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Add remote cluster stats to _cluster/stats
* Implement remote cluster stats polling
* Add docs for the include_remotes part
(cherry picked from commit b26d81c713)