This patch builds on the work in #113757, #122999, #124594, and #125529 to
natively store array offsets for unsigned long fields instead of falling
back to ignored source when `synthetic_source_keep: arrays`.
This allows a `rescore_vector: {oversample: 0}` to indicate bypassing
oversampling and rescoring.
This is useful for:
- Updating a quantized mapping to turn off automatic rescoring
- Bypassing oversampling at query time in an ad-hoc manner if it's on by default in the mapping
closes: https://github.com/elastic/elasticsearch/issues/125157
Load field caps from store if they haven't been initialised through a refresh yet.
Keep the plain reads to not mess with performance characteristics too much on the good path but protect against confusing races when loading field infos now (that probably should have been ordered stores in the first place but this was safe due to other locks/volatiles on the refresh path).
Closes #125483
Adds the `original_types` to the description of ESQL's `unsupported`
fields. This looks like:
```
{
"name" : "a",
"type" : "unsupported",
"original_types" : [
"long",
"text"
]
}
```
for union types. And like:
```
{
"name" : "a",
"type" : "unsupported",
"original_types" : [
"date_range"
]
}
```
for truly unsupported types.
This information is useful for the UI. For union types it can suggest
that users append a cast.
The dataset name for the deprecation logs index was previously renamed
from `deprecation.elasticsearch` to `elasticsearch.deprecation` in
order to follow the pattern of `product.group`. The deprecation index
template, however, was not updated. This causes indexing errors once
upgraded to 9.0 due to the dataset name having changed on a
constant_keyword field. In order to avoid that mismatch, this commit
renames the deprecation indexing datastream to match the dataset name.
The old template is kept in place, but marked as deprecated, so that any
deprecation logs written during upgrading to 9.x will continue to be
indexed into the old datastream.
closes #125445
Follow up to #125345. If the query contained both a nanos and a millis comparison, we were formatting the dates incorrectly for the lucene push down. This PR adds a test and a fix for that case.
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This patch builds on the work in #113757, #122999, and #124594 to natively
store array offsets for boolean fields instead of falling back to ignored
source when `synthetic_source_keep: arrays`.
Fixes #125439
We were incorrectly formatting nanosecond dates when building lucene queries. We had missed this in our testing because none of the CSV tests were running against Lucene. This happened because the date nanos test data includes multivalue fields. Our warning behavior for multivalue fields is inconsistent between queries run in Lucene and queries run in pure ES|QL without pushdown. Our warning tests, however, require that the specified warnings be present in all execution paths. When we first built the date nanos CSV tests, we worked around this by always using an MV function to unpack the multivalue fields. But we forgot that using an MV function prevents the entire query from being pushed down to Lucene, and thus that path wasn't being tested.
In this PR, I've duplicated many of the tests to have a version that doesn't use the MV function, and uses warningRegex instead of warning. The regex version does not fail if the header is absent, so it's safe to use in both modes. Rewriting the tests this way revealed several situations in which this bug can manifest, all of which are fixed in this PR. I cannot be confident that there aren't more paths that can trigger this bug and aren't covered by these tests, but I haven't found any yet.
I've left some trace level logging that I found helpful while debugging this.
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Instead of processing cluster state updates on the cluster state applier
thread, we fork to a different thread where ILM's runtime of processing
the cluster state update does not affect the speed at which the cluster
can apply new cluster states. That does not mean we don't need to
optimize ILM's cluster state processing, as the overall amount of
processing is generally unaffected by this fork approach (unless we skip
some cluster states), but it does mean we're saving a significant amount
of processing on the critical cluster state applier thread.
Additionally, by running ILM's state processing asynchronously, we allow
ILM to skip some cluster states if the management thread pool is
saturated or ILM's processing is taking too long.
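The fork-and-skip behavior described above can be sketched roughly as follows. This is not ILM's actual code: `LatestStateForker`, `onNewState`, and `drain` are hypothetical names, and the sketch assumes the supplied executor runs tasks one at a time so that processing stays ordered.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical sketch: hand states to another thread, keeping only the newest unprocessed one. */
final class LatestStateForker<S> {
    private final Executor executor;      // assumed to run tasks sequentially
    private final Consumer<S> processor;  // the (potentially slow) state processing
    private final AtomicReference<S> pending = new AtomicReference<>();

    LatestStateForker(Executor executor, Consumer<S> processor) {
        this.executor = executor;
        this.processor = processor;
    }

    /** Called on the applier thread; returns quickly and never runs the processor itself. */
    void onNewState(S state) {
        // If an older state is still waiting, replace it: that older state is skipped.
        // Only schedule a new task when nothing was waiting.
        if (pending.getAndSet(state) == null) {
            executor.execute(this::drain);
        }
    }

    /** Runs on the forked thread and processes whatever state is newest at that moment. */
    private void drain() {
        S state = pending.getAndSet(null);
        if (state != null) {
            processor.accept(state);
        }
    }
}
```

If the executor is busy while several states arrive, only the most recent one is processed once the task finally runs, which is the "skip some cluster states" effect mentioned above.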
This PR was originally focused on improving support for Kibana docs, in particular the missing operator docs, but it has expanded to cover a bunch of related things:
* Primarily the main work was to improve operators support. ESQL generated docs cover all functions and most operators for which there is a clear operator class and test class. However, some are built-in behaviour and need additional support. This PR adds more generated content for those operators.
* Various specific operators requested by Kibana: Cast & null-predicates, and in particular the addition of examples
* Two functions without examples: mv_append and to_date_nanos
* Many small visual document cleanups (spelling, grammar, capitalization, etc.)
* Initial support for `applies_to` for multi-version differentiation.
This last point requires more work, as it is not yet agreed on just how we want this to look. We'll probably need to do refinements in follow-up PRs. Consider the version in this PR as a first step into how this could look.
This action solely needs the cluster state, it can run on any node.
Additionally, it needs to be cancellable to avoid doing unnecessary work
after a client failure or timeout.
Relates #101805
Since #122905 we were throwing NPEs (i.e. 5xxs) when a rollover request has an unknown/non-existent target. Before that, we returned a 400 - illegal argument exception. We now return a 404 which matches "missing target" better. Additionally, to prevent this from happening again, we add a YAML test that asserts the correct exception behavior.
Adds a new cache and setting
TransportGetAllocationStatsAction.CACHE_TTL_SETTING
"cluster.routing.allocation.stats.cache.ttl" to configure the max age
for cached NodeAllocationStats on the master. The default
value is currently 1 minute per the suggestion in issue 110716.
Closes #110716
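For illustration, a max-age cache of this kind could look roughly like the sketch below. It is not the code in `TransportGetAllocationStatsAction`: `TtlCachedValue` and its constructor parameters are hypothetical, and the clock is injected only so the expiry logic is easy to exercise.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of a single-value cache that reloads once the value is older than a TTL. */
final class TtlCachedValue<T> {
    private final Supplier<T> loader;        // computes a fresh value (e.g. the stats)
    private final LongSupplier clockMillis;  // injected time source, in milliseconds
    private final long ttlMillis;            // maximum age of a cached value

    private T value;
    private long loadedAtMillis;
    private boolean loaded;

    TtlCachedValue(Supplier<T> loader, LongSupplier clockMillis, long ttlMillis) {
        this.loader = loader;
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the cached value, reloading it first if it is missing or has reached the TTL. */
    synchronized T get() {
        long now = clockMillis.getAsLong();
        if (!loaded || now - loadedAtMillis >= ttlMillis) {
            value = loader.get();
            loadedAtMillis = now;
            loaded = true;
        }
        return value;
    }
}
```

With a TTL of one minute, repeated calls within that minute reuse the cached value and the loader runs again only once the value has aged out.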
Make the conversion functions that process `BytesRef`s into `BytesRefs`
keep the `OrdinalBytesRefVector`s when processing. Let's use `TO_LOWER`
as an example. First, the performance numbers:
```
(operation) Mode Score Error -> Score Error Units
to_lower 30.662 ± 6.163 -> 30.048 ± 0.479 ns/op
to_lower_ords 30.773 ± 0.370 -> 0.025 ± 0.001 ns/op
to_upper 33.552 ± 0.529 -> 35.775 ± 1.799 ns/op
to_upper_ords 35.791 ± 0.658 -> 0.027 ± 0.001 ns/op
```
The test has 8192 positions containing alternating `foo` and `bar`.
Running `TO_LOWER` via ordinals is super duper faster. No longer
`O(positions)` and now `O(unique_values)`.
Let's paint some pictures! `OrdinalBytesRefVector` is a lookup table.
Like this:
```
+-------+----------+
| bytes | ordinals |
| ----- | -------- |
| FOO | 0 |
| BAR | 1 |
| BAZ | 2 |
+-------+ 1 |
| 1 |
| 0 |
+----------+
```
That lookup table is one block. When you read it you look up the
`ordinal` and match it to the `bytes`. Previously `TO_LOWER` would
process each value one at a time and make:
```
bytes
-----
foo
bar
baz
bar
bar
foo
```
So it'd run `TO_LOWER` once per position and it'd make a plain block
without the lookup table. With this change `TO_LOWER` will now make:
```
+-------+----------+
| bytes | ordinals |
| ----- | -------- |
| foo | 0 |
| bar | 1 |
| baz | 2 |
+-------+ 1 |
| 1 |
| 0 |
+----------+
```
We don't even have to copy the `ordinals` - we can reuse those from the
input and just bump the reference count. That's why this goes from
`O(positions)` to `O(unique_values)`.
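The idea can be sketched in plain Java as follows. This is a simplified stand-in, not the real `OrdinalBytesRefVector`: `OrdinalStrings` and its `map` method are hypothetical, strings replace `BytesRef`s, and sharing the array stands in for bumping the reference count.

```java
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for an ordinal-encoded vector: a dictionary plus one ordinal per position. */
final class OrdinalStrings {
    final String[] dictionary; // the unique values ("bytes" in the pictures above)
    final int[] ordinals;      // one entry per position, each an index into dictionary

    OrdinalStrings(String[] dictionary, int[] ordinals) {
        this.dictionary = dictionary;
        this.ordinals = ordinals;
    }

    /** Applies fn once per dictionary entry and shares the ordinals array with the input. */
    OrdinalStrings map(UnaryOperator<String> fn) {
        String[] mapped = new String[dictionary.length];
        for (int i = 0; i < dictionary.length; i++) {
            mapped[i] = fn.apply(dictionary[i]);
        }
        // The ordinals are reused as-is rather than copied, so the work is O(unique_values).
        return new OrdinalStrings(mapped, ordinals);
    }

    /** Reads a position by looking up its ordinal in the dictionary. */
    String get(int position) {
        return dictionary[ordinals[position]];
    }
}
```

Mapping the six-position example above with a lower-casing function calls that function three times, once per dictionary entry, regardless of how many positions there are.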
Cohere embeddings are expected to be normalized to unit vectors, but due to floating point precision issues,
our check (`DenseVectorFieldMapper#isNotUnitVector(float)`) often fails.
This change fixes this bug by setting the default similarity for newly created Cohere inference endpoints to cosine.
Closes #122878
Added BWCLucene8*Codecs wrapper classes for the lucene8* equivalents. A BWC wrapper is initialized for archive indices and provides read-only capabilities for an index.
This PR updates the documentation for Creating classic plugins, replacing the instructions related to the Java SecurityManager with information on Entitlements.
Relates to ES-10846