Adds deployment threading options and a new memory section reporting
the memory usage for each of the ml features
# Conflicts:
# server/src/main/java/org/elasticsearch/TransportVersions.java
Allows setting index total_shards_per_node in the SearchableSnapshot action of ILM to remediate hot spots in shard allocation for searchable snapshot indices.
Closes #112261
Here we test reindexing logsdb indices, creating and restoring
snapshots. Note that logsdb uses synthetic source, and restoring
source-only snapshots fails due to the missing _source.
(cherry picked from commit f7880ae85f)
Here we introduce a new index-level setting, `ignore_above`, similar to what we have
for `ignore_malformed`. The setting will apply to all `keyword`, `wildcard` and `flattened`
fields. Each field mapping will still be allowed to override the index-level setting using a
mapping-level `ignore_above` value.
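The override order described above can be sketched as follows. This is only an illustration, not the actual mapper code: the class and method names are hypothetical, and the length check uses `String.length()` as a simplification.

```java
final class IgnoreAboveSketch {
    // Hypothetical helper: fieldLevel is null when the field mapping does not
    // define its own ignore_above, in which case the index-level value applies.
    static int effectiveIgnoreAbove(Integer fieldLevel, int indexLevel) {
        return fieldLevel != null ? fieldLevel : indexLevel;
    }

    // A value longer than the effective limit is skipped instead of indexed.
    // String.length() is a simplification made for this sketch.
    static boolean isIgnored(String value, int limit) {
        return value.length() > limit;
    }

    public static void main(String[] args) {
        int indexLevel = 10;
        // No mapping-level value: the index-level setting is used.
        System.out.println(effectiveIgnoreAbove(null, indexLevel)); // 10
        // A mapping-level value overrides the index-level setting.
        System.out.println(effectiveIgnoreAbove(3, indexLevel)); // 3
        System.out.println(isIgnored("keyword", effectiveIgnoreAbove(3, indexLevel))); // true
    }
}
```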
(cherry picked from commit 208a1fe571)
* Add support for multi-value dimensions (#112645)
Closes https://github.com/elastic/elasticsearch/issues/110387
Having this in now affords us not having to introduce version checks in
the ES exporter later. We can simply use the same serialization logic
for metric attributes as we do for other signals. This also enables us
to properly map `*.ip` fields to the ip field type as ip fields
containing a list of IPs are not converted to a comma-separated list.
(cherry picked from commit 8d223cbf7a)
# Conflicts:
# server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
* Remove skip test for 8.x
This was just needed for 8.x to 9.0 compatibility tests
This will correct/switch "year" unit diffing from the current integer
subtraction to a chrono subtraction. Consequently, two dates are now (at
least) one year apart only if (at least) a full calendar year separates
them. The previous implementation simply subtracted the year part of the
dates.
Note: this departs from ES SQL's implementation of the same function,
which itself is aligned with MS SQL's implementation, which works
equivalently to an integer subtraction.
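The difference between the two approaches can be sketched with plain `java.time`. This is an illustration only and not the code touched by this change; the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

final class YearDiffSketch {
    // Previous behaviour: subtract only the year parts of the two dates.
    static long yearPartDifference(LocalDate start, LocalDate end) {
        return end.getYear() - start.getYear();
    }

    // New behaviour: count the full calendar years separating the two dates.
    static long fullCalendarYears(LocalDate start, LocalDate end) {
        return ChronoUnit.YEARS.between(start, end);
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2023, 12, 31);
        LocalDate end = LocalDate.of(2024, 1, 1);
        // One day apart, but the year parts differ.
        System.out.println(yearPartDifference(start, end)); // 1
        System.out.println(fullCalendarYears(start, end)); // 0
    }
}
```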
Fixes #112482.
(cherry picked from commit f7ff00f645)
The addition of the logger requires several updates to tests to deal with the possible warning, or muting them if there is no way to specify an allowed (but not mandatory) warning.
Apparently some users consider "node is restarting" not to apply to a
full-cluster restart. This commit further clarifies that you must not
set `cluster.initial_master_nodes` in a full cluster restart.
* Remove zstd feature flag for index codec best compression. (#112665)
ZStandard was added via #103374 a few months ago to snapshot builds of Elasticsearch only, and benchmark results have shown that zstd is a better trade-off than deflate when index.codec is set to best_compression.
This change removes the feature flag for ZStandard stored field compression for indices with index.codec set to best_compression.
* Update docs/changelog/112857.yaml
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* (Doc+) Inference Pipeline ignores Mapping Analyzers
From internal Dev feedback (will cross-link after), this documents that inference processors within ingest pipelines run before mapping analyzers, effectively ignoring them. So if users want analyzers to take effect, they need to select the analyzer's ingest pipeline processor equivalent and run it earlier in the flow than the inference processor.
---------
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
👋 howdy, team!
1. Related to https://github.com/elastic/dev/issues/2631, highlights that customers are usually seeking `heap.percent` instead of `ram.percent`
2. Aligns the claimed "(Default)" columns in the doc with what is returned by a v8.15.1 test cluster
Adds an API which scans all the metadata (and optionally the raw data)
in a snapshot repository to look for corruptions or other
inconsistencies.
Closes https://github.com/elastic/elasticsearch/issues/52622
Closes ES-8560
Currently, the data stream lifecycle telemetry has the following
structure:
```
{
  ....
  "data_lifecycle" : {
    "available": true,
    "enabled": true,
    "count": 0,
    "default_rollover_used": true,
    "retention": {
      "minimum_millis": 0,
      "maximum_millis": 0,
      "average_millis": 0.0
    }
  }....
```
In the snippet above you can see that we track:
- The number of data streams managed by the data stream lifecycle, by `count`
- Whether the default rollover has been overridden, by `default_rollover_used`
- The min, max and average of the `data_retention` configured on a data stream level.
In this PR we propose the following extension:
```
....
"data_lifecycle" : {
  "available": true,
  "enabled": true,
  "count": 0,
  "default_rollover_used": true,
  "effective_retention": { #https://github.com/elastic/dev/issues/2537
    "retained_data_streams": 5,
    "minimum_millis": 0, # Only if retained data streams > 1
    "maximum_millis": 0,
    "average_millis": 0.0
  },
  "data_retention": {
    "configured_data_streams": 5,
    "minimum_millis": 0, # Only if configured data streams > 1
    "maximum_millis": 0,
    "average_millis": 0.0
  },
  "global_retention": {
    "default": {
      "defined": true/false,
      "affected_data_streams": 0,
      "millis": 0
    },
    "max": {
      "defined": true/false,
      "affected_data_streams": 0,
      "millis": 0
    }
  }
}
```
With this extension we are tracking:
- The number of data streams managed by the data stream lifecycle, by `count`
- Whether the default rollover has been overridden, by `default_rollover_used`
- The min, max and average of the `data_retention` configured on a data stream level and the number of data streams that have it configured. We add the min, max and avg only if there are data streams with data retention configuration to avoid messing with the stats in a dashboard.
- The min, max and average of the `effective_retention` and the number of data streams that are retained. We add the min, max and avg only if there are retained data streams to avoid messing with the stats in a dashboard.
- Global retention stats: whether they are defined, the number of affected data streams and the actual value.
The above metrics allow us to answer questions like:
- How many data streams are affected by global retention.
- How big the difference is between the longest data retention and the max global retention.
- How much the effective retention diverges from the data retention; this will show the impact of the global retention.
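The "min, max and avg only if there are data streams" rule can be sketched as below. This is not the telemetry code from this PR: the class and method names are hypothetical, and only the JSON field names are taken from the snippet above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;

final class RetentionStatsSketch {
    // countKey would be e.g. "configured_data_streams" or "retained_data_streams".
    static Map<String, Object> summarize(String countKey, List<Long> retentionMillis) {
        Map<String, Object> stats = new LinkedHashMap<>();
        stats.put(countKey, retentionMillis.size());
        // Leave out min/max/avg when nothing was counted, so that placeholder
        // zeros do not skew the stats in a dashboard.
        if (!retentionMillis.isEmpty()) {
            LongSummaryStatistics summary = retentionMillis.stream()
                .mapToLong(Long::longValue)
                .summaryStatistics();
            stats.put("minimum_millis", summary.getMin());
            stats.put("maximum_millis", summary.getMax());
            stats.put("average_millis", summary.getAverage());
        }
        return stats;
    }

    public static void main(String[] args) {
        System.out.println(summarize("configured_data_streams", List.of()));
        System.out.println(summarize("retained_data_streams", List.of(1000L, 3000L)));
    }
}
```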
* #101472 Updates default index.translog.flush_threshold_size value
* Update docs/reference/index-modules/translog.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Updates the description
---------
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
When CASE hits a multivalued field it was previously either crashing on
fold or evaluating it to the first value. Since booleans are loaded in
sorted order from Lucene that *usually* means `false`. This changes the
behavior to line up with the rest of ESQL - now multivalued fields are
treated as `false` with a warning.
You might say "hey wait! multivalued fields usually become `null`, not
`false`!". Yes, dear reader, you are right. Very right. But! `CASE`'s
contract is to immediately convert its values into `true` or `false`
using the standard boolean tri-valued logic. So `null` just becomes
`false` immediately. This is how PostgreSQL, MySQL, and SQLite behave:
```
> SELECT CASE WHEN null THEN 1 ELSE 2 END;
2
```
They turn that `null` into a false. And we're right there with them.
Except, of course, that we're turning `[false, false]` and the like into
`null` first. See!? It's consistent. Consistently confusing, but sane at
least.
The warning message just says "treating multivalued field as false"
rather than explaining all of that.
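A minimal sketch of that contract, for illustration only. It is not the ESQL evaluator: the names are hypothetical, and a condition is modelled as a plain list of booleans that is `null` when the field is missing.

```java
import java.util.List;

final class CaseConditionSketch {
    // Collapses a condition into true/false the way CASE's contract describes it.
    static boolean conditionHolds(List<Boolean> values) {
        if (values == null || values.isEmpty()) {
            return false; // null becomes false immediately
        }
        if (values.size() > 1) {
            // multivalued: treated as null first (with a warning), hence false
            return false;
        }
        return Boolean.TRUE.equals(values.get(0));
    }

    // Mirrors `CASE WHEN condition THEN 1 ELSE 2 END`.
    static int caseWhen(List<Boolean> condition) {
        return conditionHolds(condition) ? 1 : 2;
    }

    public static void main(String[] args) {
        System.out.println(caseWhen(null)); // 2
        System.out.println(caseWhen(List.of(true))); // 1
        System.out.println(caseWhen(List.of(true, true))); // 2
    }
}
```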
This also fixes up a few of CASE's docs which I noticed were kind of
busted while working on CASE. I think the docs generation is having a
lot of trouble with CASE so I've manually hacked the right thing into
place, but we should figure out a better solution eventually.
Closes #112359
- Added mv_median_absolute_deviation function
- Added possibility of having a fixed param in Multivalue "ascending" functions
- Add surrogate to MedianAbsoluteDeviation
### Calculations used to avoid overflows
First, a quick recap of how the MAD is calculated:
1. Sort values, and get the median
2. Calculate the difference between each value with the median (`abs(median - value)`)
3. Sort the differences, and get their median
Calculating a MAD may overflow when calculating the differences (Step 2) if the type is a signed number, as the difference is a positive value that can be as large as `POSITIVE_MAX - NEGATIVE_MIN`.
To solve this, some types are upcast as follows:
- Int: Stored as longs, simple approach
- Long: Stored as longs, but switched to unsigned long representation when calculating the differences
- Unsigned long: No effect; the resulting range is the same
- Doubles: Nothing. If the values overflow to +/-infinity, they're left that way, as we'll just use those outliers to sort
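The int and long cases above can be sketched as follows. This is an illustration and not the code added by this PR: the names are hypothetical, the input is assumed to be non-empty, and the even-count median uses integer division as a simplification.

```java
import java.util.Arrays;

final class MadOverflowSketch {
    // Median of an already sorted, non-empty array. For an even count the two
    // middle values are averaged with integer division (sketch simplification).
    static long medianOfSorted(long[] sorted) {
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    // Int input: widen to long so that abs(median - value) always fits.
    static long madOfInts(int[] values) {
        long[] sorted = Arrays.stream(values).asLongStream().sorted().toArray();
        long median = medianOfSorted(sorted);
        long[] diffs = Arrays.stream(sorted).map(v -> Math.abs(median - v)).sorted().toArray();
        return medianOfSorted(diffs);
    }

    // Long input: the subtraction may wrap, but the resulting bits are the
    // correct difference when read as an unsigned long.
    static long unsignedAbsDiff(long a, long b) {
        return a >= b ? a - b : b - a;
    }

    public static void main(String[] args) {
        // abs(0 - Integer.MIN_VALUE) would overflow in int arithmetic.
        System.out.println(madOfInts(new int[] { Integer.MIN_VALUE, 0, Integer.MAX_VALUE })); // 2147483647
        long diff = unsignedAbsDiff(Long.MAX_VALUE, Long.MIN_VALUE);
        System.out.println(Long.toUnsignedString(diff)); // 18446744073709551615
    }
}
```

Differences produced this way would have to be ordered with `Long.compareUnsigned` rather than the signed comparison before taking their median.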
Closes https://github.com/elastic/elasticsearch/issues/111590