Adds chunked rest serialization infrastructure that tries to serialize
only what can be flushed to the channel right away instead of fully
materializing a response on heap first and then writing it to the channel.
Makes use of the new infrastructure for get-snapshots as an example use case.
* Update openjdk to 18.0.2.1
* Update docs/changelog/89535.yaml
* Keep using download.oracle.com
After clarification: java.net download URLs disappear once newer major versions have been released.
We can't run these actions on the transport threads in their current form.
They fully deserialize the mapping for each index they inspect, which will
take a massive amount of time if either a large number of indices or a large
number of fields (indices * fields_per_index) overall are inspected.
Requests to the bulk API comprise a sequence of items, each of which
starts with a JSON object describing the item. This object includes the
type of action to perform with the item which should be one of `create`,
`update`, `index`, or `delete`. In earlier versions Elasticsearch would
ignore items with an unrecognized type, skipping the next line in the
request, but this lenient behaviour means that there is no way for the
client to associate the items in the response with the items in the
request, and in some cases it would cause the remainder of the request
to be parsed incorrectly.
With this commit, requests to the bulk API must comprise only items with
recognized types. Elasticsearch will reject requests containing any
items with an unrecognized type with a `400 Bad Request` error response.
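
As a rough illustration of the stricter behaviour, the check can be sketched as below. `BulkActionValidator` and its error message are hypothetical names made up for this sketch; this is not Elasticsearch's actual bulk request parser, and it validates a list of already-extracted action types rather than the raw request body.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the validation rule; not the real bulk parser. */
final class BulkActionValidator {
    // The four action types named in the commit message.
    private static final Set<String> RECOGNIZED = Set.of("create", "update", "index", "delete");

    /** Throws IllegalArgumentException at the first item whose action type is not recognized. */
    static void validate(List<String> actionTypes) {
        for (int i = 0; i < actionTypes.size(); i++) {
            String type = actionTypes.get(i);
            if (!RECOGNIZED.contains(type)) {
                // The real API answers with a 400 Bad Request; an exception stands in for that here.
                throw new IllegalArgumentException(
                    "item [" + i + "] has unrecognized action type [" + type + "]");
            }
        }
    }
}
```

Rejecting the whole request, rather than skipping the item, keeps the response items aligned one-to-one with the request items.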
Date rounding logic should take into account the fields that will be
parsed by a parser. If a parser has a DayOfYear field, the rounding logic
should not try to default DayOfMonth, as it will conflict with DayOfYear.
However, the DateTimeFormatter does not have a public method to return
information about the fields that will be parsed. The hacky workaround is
to rely on the toString() implementation, which returns field info when
the formatter was defined with a textual pattern.
This commit introduces conditional logic for DayOfYear, ClockHourOfAMPM and HourOfAmPM.
Closes #89096
Closes #58986
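
A minimal sketch of the `toString()` workaround described above, assuming a formatter built from a textual pattern. `FormatterFieldProbe` is a hypothetical helper name used only for illustration; the actual change may inspect the string differently.

```java
import java.time.format.DateTimeFormatter;

/** Hypothetical helper illustrating the toString()-based field detection. */
final class FormatterFieldProbe {
    /**
     * Returns true if the formatter's string form mentions the DayOfYear field.
     * This relies on DateTimeFormatter.toString() listing its fields, which is
     * an implementation detail rather than a documented contract.
     */
    static boolean parsesDayOfYear(DateTimeFormatter formatter) {
        return formatter.toString().contains("DayOfYear");
    }
}
```

For example, `parsesDayOfYear(DateTimeFormatter.ofPattern("yyyy-DDD"))` is true, so rounding should not default DayOfMonth for that formatter, while a `yyyy-MM-dd` formatter reports false.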
* Autoscaling after clone fix
Autoscaling could start failing after a clone, if the source of
the clone is deleted.
* Update docs/changelog/89768.yaml
This adds support for synthetic `_source` to the `match_only_text` field
type. When synthetic `_source` is enabled `match_only_text` fields
create a hidden stored field to contain their text. This should have
similar or better search performance for this specific field type,
though it will have slightly worse indexing performance because
synthetic `_source` is still writing `_recovery_source`, which means
we're writing the bits for this field twice.
Adds `WriteField` to the ingest context via `field(<path>)`.
`WriteField` implements APIs:
* `String` `getName()`: The path
* `boolean` `exists()`: Does the path exist in the document?
* `WriteField` `set(def)`: Set the value at the path, creates nested path elements if necessary
* `WriteField` `append(def)`: Appends value to the path, creates nested path elements if necessary; the value at the path is always a List after this call
* `boolean` `isEmpty()`: Does the path contain a value?
* `int` `size()`: How many elements does the path contain?
* `Iterator` `iterator()`: Iterate over all elements at the path.
* `def` `get(def)`: Get the value at the path if it exists, otherwise return the given default
* `def` `get(int, def)`: Get the value at the path and index if it exists, otherwise return the given default
* `boolean` `hasValue(Predicate)`: Is there a value at the path that passes the filter?
* `WriteField` `transform(Function)`: Change all values at the path
* `WriteField` `deduplicate()`: Remove duplicates from the path
* `WriteField` `removeValuesIf(Predicate)`: Remove values from the path that pass the filter
* `WriteField` `removeValue(int)`: Remove the index from the path, if it exists
Some APIs remain unimplemented:
* `void` `move(String)`
* `void` `overwrite(String)`
* `void` `remove()`
This change does not handle equivalent paths, which are paths that differ in the source but flatten to the same field in the indexed document.
Path resolution is basic: each path element is assumed to be a key in the current container, starting with the root. If there is no entry in the map at a given level, the algorithm checks whether the remaining path exists as a flat key. This nested-then-flat algorithm handles the common case of some or no nesting followed by flat keys.
Refs: #79155
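
The nested-then-flat lookup can be sketched roughly as follows. `PathResolver` is a hypothetical name, the sketch only reads values, and it treats "no nested map at this level" as the trigger for the flat-key fallback, which is a simplification; it is not the `WriteField` implementation itself.

```java
import java.util.Map;

/** Hypothetical sketch of nested-then-flat path resolution over a document map. */
final class PathResolver {
    /** Returns the value at a dotted path, or null if nothing is found. */
    @SuppressWarnings("unchecked")
    static Object resolve(Map<String, Object> root, String path) {
        Map<String, Object> container = root;
        String remaining = path;
        while (true) {
            int dot = remaining.indexOf('.');
            if (dot < 0) {
                // Last path element: plain key lookup in the current container.
                return container.get(remaining);
            }
            Object next = container.get(remaining.substring(0, dot));
            if (next instanceof Map) {
                // Descend one level and continue with the rest of the path.
                container = (Map<String, Object>) next;
                remaining = remaining.substring(dot + 1);
            } else {
                // No nested container here: try the remaining path as one flat key.
                return container.get(remaining);
            }
        }
    }
}
```

With a document shaped like `{"a": {"b.c": 1}}`, resolving `a.b.c` descends into `a` and then finds `b.c` as a flat key.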
This allows you to use `ignore_above` with `keyword` fields in synthetic
source. Ignored values are stored in a "backup" stored field and added
to the end of the list of results. This makes `ignore_above` work pretty
much the same way as it does when you don't have synthetic source. The
only difference is the order of the results. But synthetic source
changes the order of results anyway. That should be fine.
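
To illustrate the ordering difference, here is a small sketch, assuming `ignore_above` is compared against the string length in characters. `IgnoreAboveOrdering` is a hypothetical name; the sketch models only the "ignored values go to the end" behaviour and none of the actual stored-field or doc-values mechanics.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how ignored values end up at the end of the synthetic result list. */
final class IgnoreAboveOrdering {
    static List<String> syntheticValues(List<String> original, int ignoreAbove) {
        List<String> kept = new ArrayList<>();
        List<String> ignored = new ArrayList<>();
        for (String value : original) {
            // Values longer than ignore_above go to the "backup" list.
            (value.length() > ignoreAbove ? ignored : kept).add(value);
        }
        // Ignored values are appended after the regular ones.
        kept.addAll(ignored);
        return kept;
    }
}
```

Given the values `["toolongvalue", "a", "b"]` and a limit of 3, the sketch yields `["a", "b", "toolongvalue"]`: the same values as the original, in a different order.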
This changes the response status code from `500` to `408` when
the following ML APIs time out:
- open anomaly detection job
- start datafeed
- start data frame analytics
Closes #89585
When an action is denied due to authorization error, the list of
assigned roles is shown in the error message. However, it is possible
that the effective roles are fewer or more than the assigned list:
* Fewer roles can happen when the role is not defined or the license does
  not permit it
* More roles can happen when anonymous access is enabled
This PR changes the error message to show the effective roles instead of
the assigned roles (whenever possible) to help troubleshooting. In
addition, it also reports any missing roles, i.e. roles that are
assigned but cannot be found.
The user profile document is updated on each activate call even when
there is no actual content change because it always updates the
last_synchronized timestamp. This behaviour is intentional to track the
user's last login time (since Kibana calls to the activate API on user
login). Clients must explicitly handle retries for version conflicts.
This is generally desirable. However, on each login there are often
multiple web components trying to call this API concurrently. This
results in more frequent version conflicts. Since these updates occur
in a short period of time, updating last_synchronized for each of them
does not really contribute a lot for tracking user login.
This PR introduces a grace period for the update behaviour (30 seconds,
non-configurable) so that the update (on activate) is only performed
when either of the following is true:
* There are actual content changes
* Or it has been more than 30 seconds since last update
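
The two conditions above amount to a small predicate, sketched here under the assumption that the last update time is available as an `Instant`. `ProfileUpdatePolicy` and `shouldUpdate` are hypothetical names for illustration, not the names used in the change.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the grace-period decision for profile updates on activate. */
final class ProfileUpdatePolicy {
    // Fixed, non-configurable grace period from the PR description.
    static final Duration GRACE_PERIOD = Duration.ofSeconds(30);

    /** Update when content changed, or when more than 30 seconds have passed since the last update. */
    static boolean shouldUpdate(boolean contentChanged, Instant lastSynchronized, Instant now) {
        return contentChanged
            || Duration.between(lastSynchronized, now).compareTo(GRACE_PERIOD) > 0;
    }
}
```

Concurrent activate calls from the same login then collapse into at most one write unless the profile content actually differs.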
This stops us from applying the `min_doc_count` operation on partial
reduction. If we run it on a partial reduction, we filter out results
that might still have more docs arrive.
Closes #89686
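
A toy model of why `min_doc_count` must wait for the final reduction, with buckets represented as a simple key-to-count map. `MinDocCountReduce` and its `isFinal` flag are hypothetical and only mirror the idea; they are not the aggregation framework's reduce API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: merge bucket counts, filtering by min_doc_count only on the final reduce. */
final class MinDocCountReduce {
    static Map<String, Long> reduce(List<Map<String, Long>> results, long minDocCount, boolean isFinal) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> result : results) {
            // Sum the doc counts of buckets that share a key.
            result.forEach((key, count) -> merged.merge(key, count, Long::sum));
        }
        if (isFinal) {
            // Only now is every bucket's count complete, so filtering is safe.
            merged.values().removeIf(count -> count < minDocCount);
        }
        return merged;
    }
}
```

With `min_doc_count` of 2, a bucket that has one doc in an early partial reduction and one more in a later shard result survives, because it is not dropped before its counts are combined.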
The `allow_no_indices` request option, when set to `false` (it is `true`
by default for all APIs), should fail indices requests that contain any
wildcard expression that resolves to no resources. This PR addresses
some cases where a wildcard can expand to no resources, and the request
is nevertheless successful. The fixed cases are when the wildcard
resolves only to hidden and/or system resources and the given request
context prohibits such resources. Another case is when the wildcard
resolves to only open or closed indices and again the request context
prohibits such resources.
Note that the fix only applies when Security is disabled (when enabled
the behavior is already correct).
This is another step towards reusing the Core's
`WildcardExpressionResolver#innerResolve` in Security, following
https://github.com/elastic/elasticsearch/pull/89311 .
This change adds the filter query for a filtered alias to the knn query during the dfs phase on the
shard. This ensures the correct number of k results are returned instead of removing results as a post
filter.
Fixes: #89561
This commit fixes an issue where inside of an `on_failure` block the
`_ingest.on_failure_pipeline` metadata was not being propagated. The
root cause seems to be an off-by-one error where we were checking for at
least two pipelines, when we should be checking for only a minimum of a
single pipeline that has been executed before adding the metadata.
* [Doc] Release notes for v8.4.1 (#89636)
* [Doc] Release notes for v8.4.1
Gradle generated release notes for v8.4.1
* address feedback
* [DOCS] Remove coming tag for 8.4.1 RNs
Co-authored-by: Yang Wang <yang.wang@elastic.co>
This adds support for synthetic _source to the `version` field type. It
works very similarly to `keyword` but with an extra decode step.
I modified the decoder to return a `BytesRef` instead of a `String`
because many of the callers seemed to be converting that string directly
into bytes again. Synthetic source would have wanted to do that, as did
the query infrastructure.
This commit adds a stable analysis plugin API with analysis component interfaces and annotations.
It does not contain any usage of the API yet. Separate changes to introduce example plugins or to refactor existing ones will follow later.
It contains two Gradle modules. The first, plugin-api, has two annotations, Nameable and NamedComponent, which can be reused for plugins other than analysis.
The second, analysis-plugin-api, contains the analysis components (TokenFilterFactory, CharFilterFactory etc.).
NamedComponent - used by plugin developers - indicates that a Nameable component will be registered under a given name.
Nameable - for analysis plugins it is only used by the stable analysis API designers (ES) - indicates that a component has a name and should be declared with NamedComponent.
Additional tasks that will follow: #88980
* Create restart-cluster.asciidoc
As per https://github.com/elastic/elasticsearch/issues/49972 and https://github.com/elastic/elasticsearch/issues/56578, if a node is above low disk threshold when being restarted (rolling restart, network disruption or crash), the disk threshold decider prevents reusing the shard content on the restarted node.
As a consequence, the node may take a long time to start.
* Update docs/reference/setup/restart-cluster.asciidoc
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Upgrade ES to a new Lucene snapshot:
Changes of interest:
- LUCENE-10592: Build HNSW Graph on indexing
- LUCENE-10678: Fix potential overflow when computing the partition point
- LUCENE-10633: Dynamic pruning for sorting on SORTED(_SET) fields
* Source Lookup refactor with error on script synthetic source load
Refactors SourceLookup into a static source lookup used for most cases where we
access the source again after the fetch phase, and a re-loading lookup used by
scripts. The re-loading lookup now also fails with an error when we are using
synthetic source, preventing silent failures or nonsensical behavior from
scripts.
This commit changes the status code returned when the start
trained model deployment API times out from `500` to `408`.
In addition, we add validation that the timeout must be positive.
Relates #89585