Requests to the bulk API comprise a sequence of items, each of which
starts with a JSON object describing the item. This object includes the
type of action to perform with the item which should be one of `create`,
`update`, `index`, or `delete`. In earlier versions Elasticsearch would
ignore items with an unrecognized type, skipping the next line in the
request, but this lenient behaviour means that there is no way for the
client to associate the items in the response with the items in the
request, and in some cases it would cause the remainder of the request
to be parsed incorrectly.
With this commit, requests to the bulk API must comprise only items with
recognized types. Elasticsearch will reject requests containing any
items with an unrecognized type with a `400 Bad Request` error response.
This change adds the filter query for a filtered alias to the knn query during the dfs phase on the
shard. This ensures the correct number of k results are returned instead of removing results as a post
filter.
Fixes: #89561
* [Doc] Release notes for v8.4.1 (#89636)
* [Doc] Release notes for v8.4.1
Gradle generated release notes for v8.4.1
* address feedback
* [DOCS] Remove coming tag for 8.4.1 RNs
Co-authored-by: Yang Wang <yang.wang@elastic.co>
This adds support for synthetic _source to the `version` field type. It
works very similarly to `keyword` but with an extra decode step.
I modified the decoder to return a `BytesRef` instead of a `String`
because many of the callers seemed to be converting that string directly
into bytes again. Synthetic source would have wanted to do that. As was
the query infrastructure.
* Create restart-cluster.asciidoc
As per https://github.com/elastic/elasticsearch/issues/49972 and https://github.com/elastic/elasticsearch/issues/56578, if a node is above low disk threshold when being restarted (rolling restart, network disruption or crash), the disk threshold decider prevents reusing the shard content on the restarted node.
The consequence of the event is the node may take a long time to start.
* Update docs/reference/setup/restart-cluster.asciidoc
LGTM! Thanks!
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
When starting a trained model deployment, a queue is created.
If the queue_capacity is too large, it can lead to OOM and a node
crash.
This commit adds validation that the queue_capacity cannot be more
than 1M.
Closes#89555
The documentation mentions the `noop` value ("Otherwise it does nothing (`noop`)"), however, the value used in the example script is `none`. This change corrects the value in the example script to `noop`.
This commit adds support for floating point node.processors setting.
This is useful when the nodes run in an environment where the CPU
time assigned to the ES node process is limited (i.e. using cgroups).
With this change, the system would be able to size the thread pools
accordingly, in this case it would round up the provided setting
to the closest integer.
This commit adds a short note to the 'search your data' docs around kNN search
to explain how approximate kNN works with aggregations:
* Make section on 'hybrid retrieval' more general and include aggregations info
* Remove an example response from the previous section on filtering, since this
page was getting long
Now that we're releasing synthetic _source as a tech preview feature, we
no longer want to remove the docs from the non-release builds. And we
want to mark all of the headings describing synthetic `_source` as a
preview.
The docs for synthetic `_source` incorrectly claimed that synthetic
`_source` deduplicates numbers. It doesn't. The example below the prose
shows it *not* removing duplicates.
We have a check that enforces the total number of fields needs to be below a
certain (configurable) threshold. Before runtime fields did not contribute
to the count. This patch makes all runtime fields contribute to the
count, runtime fields:
- that were explicitly defined in mapping by a user
- as well as runtime fields that were dynamically created by dynamic
mappings
Closes#88265
There are some redundant words so I just removed those words. Please accept this change.
(cherry picked from commit e1e5398051)
Co-authored-by: Adnan Ashraf <adnan.ashraff1@gmail.com>
* [DOCS] Warn only one date format is added to the field date formats
When using multiple options in `dynamic_date_formats`, only one of the formats of the first document having a date matching one of the date formats provided will be used.
E.g.
```
PUT my-index-000001
{
"mappings": {
"dynamic_date_formats": [ "yyyy/MM", "MM/dd/yyyy"]
}
}
PUT my-index-000001/_doc/1
{
"create_date": "09/25/2015"
}
```
The generated mappings will be:
```
"mappings": {
"dynamic_date_formats": [
"yyyy/MM",
"MM/dd/yyyy"
],
"properties": {
"create_date": {
"type": "date",
"format": "MM/dd/yyyy"
}
}
},
```
Indexing a document with `2015/12` would lead to the `format` `"yyyy/MM"` being used for the `create_date`.
This can be misleading especially if the user is using multiple date formats on the same field.
The first document will determine the format of the `date` field being detected.
Maybe we should provide an additional example, such as:
```
PUT my-index-000001
{
"mappings": {
"dynamic_date_formats": [ "yyyy/MM||MM/dd/yyyy"]
}
}
```
My wording is not great, so feel free to amend/edit.
* Update docs/reference/mapping/dynamic/field-mapping.asciidoc
Reword and add code example
* Turned discussion of the two syntaxes into an admonition
* Fix failing tests
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
The docs for `transport.ping_schedule` note that the transport client
defaults to a 5s ping schedule, but this is no longer relevant. This
commit drops this from the docs, and also moves the docs for this
setting further down the page to reflect its relative unimportance.
We encountered a case where a substantial fraction of the heap usage was
due to per-segment-per-field `FieldInfo` objects, particularly
`FieldInfo#name`. This commit adds a note to the sizing docs about this
overhead.
Today we say that voting-only nodes require a "low-latency" network.
This term has a specific meaning in some operating environments which is
different from our intended meaning. To avoid this confusion this commit
removes the absolute term "low-latency" in favour of describing the
requirements relative to the user's own performance goals.