Add release notes for 8.9.0 release (#97061) (#97931)

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: David Kyle <david.kyle@elastic.co>
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Ryan Ernst 2023-07-25 07:00:03 -07:00 committed by GitHub
parent 947ea5893b
commit 97d19a3881
3 changed files with 348 additions and 5 deletions


@@ -16,7 +16,27 @@ coming::[8.9.0]
[[breaking-changes-8.9]]
=== Breaking changes
The following changes in {es} 8.9 might affect your applications
and prevent them from operating normally.
Before upgrading to 8.9, review these changes and take the described steps
to mitigate the impact.
// NOTE: The notable-breaking-changes tagged regions are re-used in the
// Installation and Upgrade Guide
// tag::notable-breaking-changes[]
[discrete]
[[breaking_89_rest_api_changes]]
==== REST API changes
[[switch_tdigeststate_to_use_hybriddigest_by_default]]
.Switch TDigestState to use `HybridDigest` by default
[%collapsible]
====
*Details* +
The default implementation for TDigest in percentile calculations switches to a new internal implementation offering superior performance (2x-10x speedup), at a very small accuracy penalty for very large sample populations.
*Impact* +
This change leads to generating slightly different results in percentile calculations. If the highest possible accuracy is desired, or it's crucial to produce exactly the same results as in previous versions, one can either set `execution_hint` to `high_accuracy` in the `tdigest` spec of a given percentile calculation, or set `search.aggs.tdigest_execution_hint` to `high_accuracy` in cluster settings to apply to all percentile queries.
====
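As a minimal sketch of the two opt-outs described above, the following Python helpers build the request bodies only; they do not call a cluster. The aggregation name `latency_percentiles`, the field `latency`, and the choice of a `persistent` (rather than `transient`) cluster setting are illustrative assumptions, not part of the change itself.

```python
import json


def percentiles_request(field, percents, high_accuracy=False):
    """Build a search body with a percentiles aggregation on `field`."""
    agg = {"field": field, "percents": percents}
    if high_accuracy:
        # Per-request opt-out: `execution_hint` inside the `tdigest` spec.
        agg["tdigest"] = {"execution_hint": "high_accuracy"}
    # "latency_percentiles" is a hypothetical aggregation name.
    return {"size": 0, "aggs": {"latency_percentiles": {"percentiles": agg}}}


def cluster_wide_high_accuracy():
    """Body for a cluster settings update that applies to all percentile queries."""
    # Using "persistent" here is an assumption; "transient" is the alternative.
    return {"persistent": {"search.aggs.tdigest_execution_hint": "high_accuracy"}}


print(json.dumps(percentiles_request("latency", [50, 95, 99], high_accuracy=True), indent=2))
print(json.dumps(cluster_wide_high_accuracy(), indent=2))
```

The first body would be sent to a search endpoint and the second to the cluster settings update API. The per-request hint is the narrower choice when only a few queries need results identical to earlier versions.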
// end::notable-breaking-changes[]


@@ -5,4 +5,287 @@ coming[8.9.0]
Also see <<breaking-changes-8.9,Breaking changes in 8.9>>.
[[known-issues-8.9.0]]
[float]
=== Known issues
* Question Answering fails on long input text. If the context supplied to the
task is longer than the model's `max_sequence_length` and `truncate` is set to
`none`, inference fails with the message `question answering result has
invalid dimension`. (issue: {es-issue}97917[#97917])
[[breaking-8.9.0]]
[float]
=== Breaking changes
Aggregations::
* Switch TDigestState to use `HybridDigest` by default {es-pull}96904[#96904]
[[bug-8.9.0]]
[float]
=== Bug fixes
Allocation::
* Attempt to fix delay allocation {es-pull}95921[#95921]
* Fix NPE in Desired Balance API {es-pull}97775[#97775]
* Fix autoexpand during node replace {es-pull}96281[#96281]
Authorization::
* Resolving wildcard application names without prefix query {es-pull}96479[#96479] (issue: {es-issue}96465[#96465])
CRUD::
* Fix `retry_on_conflict` parameter in update API to not retry indefinitely {es-pull}96262[#96262]
* Handle failure in `TransportUpdateAction#handleUpdateFailureWithRetry` {es-pull}97290[#97290] (issue: {es-issue}97286[#97286])
Cluster Coordination::
* Avoid `getStateForMasterService` where possible {es-pull}97304[#97304]
* Become candidate on publication failure {es-pull}96490[#96490] (issue: {es-issue}96273[#96273])
* Fix cluster settings update task acknowledgment {es-pull}97111[#97111]
* Improve exception handling in Coordinator#publish {es-pull}97840[#97840] (issue: {es-issue}97798[#97798])
Data streams::
* Accept timestamp as object at root level {es-pull}97401[#97401]
Geo::
* Fix bug when creating empty `geo_lines` {es-pull}97509[#97509] (issue: {es-issue}97311[#97311])
* Fix time-series geo_line to include reduce phase in MergedGeoLines {es-pull}96953[#96953] (issue: {es-issue}96983[#96983])
* Support for Byte and Short as vector tiles features {es-pull}97619[#97619] (issue: {es-issue}97612[#97612])
ILM+SLM::
* Limit the details field length we store for each SLM invocation {es-pull}97038[#97038] (issue: {es-issue}96918[#96918])
Infra/CLI::
* Initialise ES logging in CLI {es-pull}97353[#97353] (issue: {es-issue}97350[#97350])
Infra/Core::
* Capture max processors in static init {es-pull}97119[#97119] (issue: {es-issue}97088[#97088])
* Interpret microseconds cpu stats from cgroups2 properly as nanos {es-pull}96924[#96924] (issue: {es-issue}96089[#96089])
Infra/Logging::
* Add slf4j-nop in order to prevent startup warnings {es-pull}95459[#95459]
Infra/REST API::
* Fix tchar pattern in `RestRequest` {es-pull}96406[#96406]
Infra/Scripting::
* Fix Painless method lookup over unknown super interfaces {es-pull}97062[#97062] (issue: {es-issue}97022[#97022])
Infra/Settings::
* Enable validation for `versionSettings` {es-pull}95874[#95874] (issue: {es-issue}95873[#95873])
Ingest Node::
* Fixing `DateProcessor` when the format is `epoch_millis` {es-pull}95996[#95996]
* Fixing `GeoIpDownloaderStatsAction$NodeResponse` serialization by defensively copying inputs {es-pull}96777[#96777] (issue: {es-issue}96438[#96438])
* Trim field references in reroute processor {es-pull}96941[#96941] (issue: {es-issue}96939[#96939])
Machine Learning::
* Catch exceptions thrown during inference and report as errors {ml-pull}2542[#2542]
* Fix `WordPiece` tokenization where stripping accents results in an empty string {es-pull}97354[#97354]
* Improve model downloader robustness {es-pull}97274[#97274]
* Prevent high memory usage by evaluating batch inference singularly {ml-pull}2538[#2538]
Mapping::
* Avoid stack overflow while parsing mapping {es-pull}95705[#95705] (issue: {es-issue}52098[#52098])
* Fix mapping parsing logic to determine synthetic source is active {es-pull}97355[#97355] (issue: {es-issue}97320[#97320])
Ranking::
* Fix `sub_searches` serialization bug {es-pull}97587[#97587]
Recovery::
* Promptly fail recovery from snapshot {es-pull}96421[#96421] (issue: {es-issue}95525[#95525])
Search::
* Prevent instantiation of `top_metrics` when sub-aggregations are present {es-pull}96180[#96180] (issue: {es-issue}95663[#95663])
* Set new providers before building `FetchSubPhaseProcessors` {es-pull}97460[#97460] (issue: {es-issue}96284[#96284])
Snapshot/Restore::
* Fix blob cache races/assertion errors {es-pull}96458[#96458]
* Fix reused/recovered bytes for files that are only partially recovered from cache {es-pull}95987[#95987] (issues: {es-issue}95970[#95970], {es-issue}95994[#95994])
* Fix reused/recovered bytes for files that are recovered from cache {es-pull}97278[#97278] (issue: {es-issue}95994[#95994])
* Refactor `RestoreClusterStateListener` to use `ClusterStateObserver` {es-pull}96662[#96662] (issue: {es-issue}96425[#96425])
TSDB::
* Error message for misconfigured TSDB index {es-pull}96956[#96956] (issue: {es-issue}96445[#96445])
* Min score for time series {es-pull}96878[#96878]
Task Management::
* Improve cancellability in `TransportTasksAction` {es-pull}96279[#96279]
Transform::
* Improve reporting status of the transform that is about to finish {es-pull}95672[#95672]
[[enhancement-8.9.0]]
[float]
=== Enhancements
Aggregations::
* Add cluster setting to `SearchExecutionContext` to configure `TDigestExecutionHint` {es-pull}96943[#96943]
* Add support for dynamic pruning to cardinality aggregations on low-cardinality keyword fields {es-pull}92060[#92060]
* Make TDigestState configurable {es-pull}96794[#96794]
* Skip `SortingDigest` when merging a large digest in `HybridDigest` {es-pull}97099[#97099]
* Support value retrieval in `top_hits` {es-pull}95828[#95828]
Allocation::
* Take into account `expectedShardSize` when initializing shard in simulation {es-pull}95734[#95734]
Analysis::
* Create `.synonyms` system index {es-pull}95548[#95548]
Application::
* Add template parameters to Search Applications {es-pull}95674[#95674]
* Chunk profiling stacktrace response {es-pull}96340[#96340]
* [Profiling] Add status API {es-pull}96272[#96272]
* [Profiling] Allow to upgrade managed ILM policy {es-pull}96550[#96550]
* [Profiling] Introduce ILM for K/V indices {es-pull}96268[#96268]
* [Profiling] Require POST to retrieve stacktraces {es-pull}96790[#96790]
* [Profiling] Tweak default ILM policy {es-pull}96516[#96516]
* [Search Applications] Support arrays in stored mustache templates {es-pull}96197[#96197]
Authentication::
* Header validator with Security {es-pull}95112[#95112]
* Upgrade xmlsec to 2.1.8 {es-pull}97741[#97741]
Authorization::
* Add Search ACL filter index prefix to the enterprise search user {es-pull}96885[#96885]
* Ensure checking application privileges work with nested-limited roles {es-pull}96970[#96970]
Autoscaling::
* Add shard explain info to `ReactiveReason` about unassigned shards {es-pull}88590[#88590] (issue: {es-issue}85243[#85243])
DLM::
* Add auto force merge functionality to DLM {es-pull}95204[#95204]
* Adding `data_lifecycle` to the _xpack/usage API {es-pull}96177[#96177]
* Adding `manage_data_stream_lifecycle` index privilege and expanding `view_index_metadata` for access to data stream lifecycle APIs {es-pull}95512[#95512]
* Allow for the data lifecycle and the retention to be explicitly nullified {es-pull}95979[#95979]
Data streams::
* Add support for `logs@custom` component template for `logs-*-*` data streams {es-pull}95481[#95481] (issue: {es-issue}95469[#95469])
* Adding ECS dynamic mappings component and applying it to logs data streams by default {es-pull}96171[#96171] (issue: {es-issue}95538[#95538])
* Adjust ECS dynamic templates to support `subobjects: false` {es-pull}96712[#96712]
* Automatically parse log events in logs data streams, if their `message` field contains JSON content {es-pull}96083[#96083] (issue: {es-issue}95522[#95522])
* Change default of `ignore_malformed` to `true` in `logs-*-*` data streams {es-pull}95329[#95329] (issue: {es-issue}95224[#95224])
* Set `@timestamp` for documents in logs data streams if missing and add support for custom pipeline {es-pull}95971[#95971] (issues: {es-issue}95537[#95537], {es-issue}95551[#95551])
* Update data streams implicit timestamp `ignore_malformed` settings {es-pull}96051[#96051]
EQL::
* Sequence - add support for missing events {es-pull}92348[#92348]
Engine::
* Cache modification time of translog writer file {es-pull}95107[#95107]
* Trigger refresh when shard becomes search active {es-pull}96321[#96321] (issue: {es-issue}95544[#95544])
Geo::
* Add brute force approach to `GeoHashGridTiler` {es-pull}96863[#96863]
* Asset tracking - geo_line in time-series aggregations {es-pull}94954[#94954]
ILM+SLM::
* Chunk the GET _ilm/policy response {es-pull}97251[#97251] (issue: {es-issue}96569[#96569])
* Move get lifecycle API to Management thread pool and make cancellable {es-pull}97248[#97248] (issue: {es-issue}96568[#96568])
* Reduce WaitForNoFollowersStep requests indices shard stats {es-pull}94510[#94510]
Indices APIs::
* Bootstrap profiling indices at startup {es-pull}95666[#95666]
Infra/Node Lifecycle::
* SIGTERM node shutdown type {es-pull}95430[#95430]
Ingest Node::
* Add mappings for enrich fields {es-pull}96056[#96056]
* Ingest: expose reroute inquiry/reset via Elastic-internal API bridge {es-pull}96958[#96958]
Machine Learning::
* Improved compliance with memory limitations {ml-pull}2469[#2469]
* Improve detection of calendar cyclic components with long bucket lengths {ml-pull}2493[#2493]
* Improve detection of time shifts, for example for daylight saving {ml-pull}2479[#2479]
Mapping::
* Allow unsigned long field to use decay functions {es-pull}96394[#96394] (issue: {es-issue}89603[#89603])
Ranking::
* Add multiple queries for ranking to the search endpoint {es-pull}96224[#96224]
Recovery::
* Implement `StartRecoveryRequest#getDescription` {es-pull}95731[#95731]
Search::
* Add search shards endpoint {es-pull}94534[#94534]
* Don't generate stacktrace in `EarlyTerminationException` and `TimeExceededException` {es-pull}95910[#95910]
* Feature/speed up binary vector decoding {es-pull}96716[#96716]
* Improve brute force vector search speed by using Lucene functions {es-pull}96617[#96617]
* Include search idle info to shard stats {es-pull}95740[#95740] (issue: {es-issue}95727[#95727])
* Integrate CCS with new `search_shards` API {es-pull}95894[#95894] (issue: {es-issue}93730[#93730])
* Introduce a filtered collector manager {es-pull}96824[#96824]
* Introduce minimum score collector manager {es-pull}96834[#96834]
* Skip shards when querying constant keyword fields {es-pull}96161[#96161] (issue: {es-issue}95541[#95541])
* Support CCS minimize round trips in async search {es-pull}96012[#96012]
* Support for pattern_replace filter in keyword normalizer {es-pull}96588[#96588]
* Support null_value for rank_feature field type {es-pull}95811[#95811]
Security::
* Add "_storage" internal user {es-pull}95694[#95694]
Snapshot/Restore::
* Reduce overhead in blob cache service get {es-pull}96399[#96399]
Stats::
* Add `ingest` information to the cluster info endpoint {es-pull}96328[#96328] (issue: {es-issue}95392[#95392])
* Add `script` information to the cluster info endpoint {es-pull}96613[#96613] (issue: {es-issue}95394[#95394])
* Add `thread_pool` information to the cluster info endpoint {es-pull}96407[#96407] (issue: {es-issue}95393[#95393])
TSDB::
* Feature: include unit support for time series rate aggregation {es-pull}96605[#96605] (issue: {es-issue}94630[#94630])
Vector Search::
* Leverage SIMD hardware instructions in Vector Search {es-pull}96453[#96453] (issue: {es-issue}96370[#96370])
[[feature-8.9.0]]
[float]
=== New features
Application::
* Enable analytics geoip in behavioral analytics {es-pull}96624[#96624]
Authorization::
* Support restricting access of API keys to only certain workflows {es-pull}96744[#96744]
Data streams::
* Adding ability to auto-install ingest pipelines and refer to them from index templates {es-pull}95782[#95782]
Geo::
* Geometry simplifier {es-pull}94859[#94859]
ILM+SLM::
* Enhance ILM Health Indicator {es-pull}96092[#96092]
Infra/Node Lifecycle::
* Gracefully shutdown elasticsearch {es-pull}96363[#96363]
Infra/Plugins::
* [Fleet] Add `.fleet-secrets` system index {es-pull}95625[#95625] (issue: {es-issue}95143[#95143])
Machine Learning::
* Add support for `xlm_roberta` tokenized models {es-pull}94089[#94089]
* Removes the technical preview admonition from query_vector_builder docs {es-pull}96735[#96735]
Snapshot/Restore::
* Add repo throttle metrics to node stats api response {es-pull}96678[#96678] (issue: {es-issue}89385[#89385])
Stats::
* New HTTP info endpoint {es-pull}96198[#96198] (issue: {es-issue}95391[#95391])
[[upgrade-8.9.0]]
[float]
=== Upgrades
Infra/Transport API::
* Bump `TransportVersion` to the first non-release version number. Transport protocol is now versioned independently of release version. {es-pull}95286[#95286]
Network::
* Upgrade Netty to 4.1.92 {es-pull}95575[#95575]
* Upgrade Netty to 4.1.94.Final {es-pull}97112[#97112]
Search::
* Upgrade Lucene to a 9.7.0 snapshot {es-pull}96433[#96433]
* Upgrade to new lucene snapshot 9.7.0-snapshot-a8602d6ef88 {es-pull}96741[#96741]


@@ -27,10 +27,50 @@ endif::[]
// The notable-highlights tag marks entries that
// should be featured in the Stack Installation and Upgrade Guide:
// tag::notable-highlights[]
[discrete]
[[better_indexing_search_performance_under_concurrent_indexing_search]]
=== Better indexing and search performance under concurrent indexing and search
When a query such as a match phrase query or a terms query targets a constant keyword field, {es} can skip query execution on shards where the query is rewritten to match no documents. Index mappings record the value of each constant keyword field, so if the queried value differs from the value defined in the index mapping, the query is rewritten to match no documents. The shard-level request then returns immediately, before the query is executed on the data node, and the shard is skipped completely.

Skipping shards whenever possible avoids unnecessary shard refreshes and improves query latency, because no search-related I/O happens on those shards and the search thread no longer waits for their refreshes. Shards that do not match the query criteria remain in a search-idle state, and indexing throughput is not negatively affected by a refresh.

Before this change, a query hitting multiple shards, including shards with no documents matching the search criteria (for example, when using index patterns or data streams with many backing indices), could result in a "shard refresh storm". Query latency increased because the search thread waited for all shard refreshes to complete before it could start the search operation. After this change, the search thread only waits for refreshes on shards that contain relevant data.

Note that execution of the shard pre-filter, and of the corresponding "can match" phase where the rewrite happens, depends on the overall number of shards involved and on whether at least one of them returns a non-empty result. See the `pre_filter_shard_size` setting to understand how to control this behavior.

{es} performs the rewrite on the data node in the so-called "can match" phase, because at that point it can access index mappings and extract information about constant keyword fields and their values. This means search queries still fan out from the coordinator node to the involved data nodes. Rewriting queries based on index mappings is not possible on the coordinator node, because the coordinator node lacks index mapping information.
{es-pull}96161[#96161]
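The shard-skipping logic described in this section can be pictured with a toy model. This is a simplified sketch, not the {es} implementation: the `Shard` class, its fields, and the index names are hypothetical, and a boolean flag stands in for the refresh that a search would trigger on a search-idle shard.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    index: str
    constant_keywords: dict  # field name -> the single value fixed in the mapping
    search_idle: bool = True


def can_match(shard, field, value):
    """Return False when a term query on a constant keyword field cannot match."""
    fixed = shard.constant_keywords.get(field)
    if fixed is None:
        # Not a constant keyword field here: the mapping alone cannot rule it out.
        return True
    return fixed == value


def shards_to_search(shards, field, value):
    """Keep only shards that may match; skipped shards stay search-idle."""
    selected = []
    for shard in shards:
        if can_match(shard, field, value):
            shard.search_idle = False  # stands in for the refresh a search triggers
            selected.append(shard)
    return selected


shards = [
    Shard("logs-nginx", {"data_stream.dataset": "nginx"}),
    Shard("logs-mysql", {"data_stream.dataset": "mysql"}),
    Shard("logs-generic", {}),
]
hit = shards_to_search(shards, "data_stream.dataset", "nginx")
print([s.index for s in hit])  # ['logs-nginx', 'logs-generic']
```

In this model `logs-mysql` is never touched, so it keeps its search-idle state, which corresponds to the refresh that is avoided.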
[discrete]
[[add_multiple_queries_for_ranking_to_search_endpoint]]
=== Add multiple queries for ranking to the search endpoint
The search endpoint adds a new top-level element called `sub_searches`. This top-level element is a list of searches used for ranking where each "sub search" is executed independently. The `sub_searches` element is used instead of `query` to allow a search using ranking to execute multiple queries. The `sub_searches` and `query` elements cannot be used together.
{es-pull}96224[#96224]
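A request body using `sub_searches` might be assembled as in the following sketch. The shape of each entry (a `query` object per sub search) and the `rank` element with `rrf` are assumptions based on how ranking is commonly configured, and the field name `text` is hypothetical; the validation mirrors the rule that `sub_searches` and `query` cannot be combined.

```python
import json


def build_sub_searches(queries):
    """Wrap each query so it runs as an independent sub search."""
    return [{"query": q} for q in queries]


def validate_search_body(body):
    """Reject bodies that combine the two mutually exclusive elements."""
    if "sub_searches" in body and "query" in body:
        raise ValueError("`sub_searches` and `query` cannot be used together")
    return body


body = validate_search_body({
    "sub_searches": build_sub_searches([
        {"term": {"text": "shoes"}},    # hypothetical field and values
        {"term": {"text": "sandals"}},
    ]),
    "rank": {"rrf": {}},  # assumed ranking method, not stated in the note above
})
print(json.dumps(body, indent=2))
```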
[discrete]
[[make_text_embedding_for_knn_search_ga]]
=== Make text embedding for kNN search GA
From 8.9, the `text_embedding` `query_vector_builder` extension to kNN search is generally available. This functionality is required to perform <<semantic-search,semantic search>> by converting text to dense vectors.
{es-pull}96735[#96735]
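For illustration, a kNN search body that lets a deployed model turn query text into a dense vector could be built as follows. This is a sketch under assumptions: the field name, the model ID, and the `k` and `num_candidates` values are placeholders, and the parameter names inside `text_embedding` should be checked against the kNN search reference.

```python
import json


def knn_text_search(field, model_id, text, k=10, num_candidates=100):
    """Build a kNN search body whose query vector is produced from `text`."""
    return {
        "knn": {
            "field": field,
            "k": k,
            "num_candidates": num_candidates,
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": model_id,  # placeholder model identifier
                    "model_text": text,    # the text to convert to a dense vector
                }
            },
        }
    }


print(json.dumps(knn_text_search("content_vector", "my-text-embedding-model", "waterproof hiking boots"), indent=2))
```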
// end::notable-highlights[]
[discrete]
[[asset_tracking_geo_line_in_time_series_aggregations]]
=== Asset tracking - geo_line in time-series aggregations
The <<search-aggregations-metrics-geo-line,`geo_line` aggregation>> builds tracks from `geo_points`.
It has previously needed to use large arrays in memory for collecting points into multiple buckets
and sorting those buckets.
With the advances made in TSDB features and the `time_series` aggregation in particular,
it is now possible to rely on data aggregating in both TSID and timestamp order,
enabling the removal of all sorting, as well as the use of only a single bucket's
worth of memory, a dramatic improvement in memory footprint. In addition, we can use the streaming line
simplifier algorithm introduced in https://github.com/elastic/elasticsearch/pull/94859 to replace the previous
behaviour of truncating very large tracks with the far more preferable approach of simplifying those tracks.
[role="screenshot"]
image:images/spatial/kodiak_geo_line_simplified.png[North shore of Kodiak Island simplified to 100 points]
In this diagram, the grey line is the original geometry, the blue line is the truncated geometry as would be
produced by the original `geo_line` aggregation, and the magenta line is the new simplified geometry.
{es-pull}94954[#94954]
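The difference between truncating and simplifying a track can be sketched with a small streaming routine. This is a toy illustration, not the simplifier from the linked pull request: it assumes planar `(x, y)` points and uses triangle area as the removal criterion, so that it never holds more than `max_points + 1` points in memory.

```python
def triangle_area(a, b, c):
    """Area of the triangle formed by three (x, y) points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def truncate(points, max_points):
    """Earlier behaviour in spirit: keep the first points, drop the rest."""
    return list(points[:max_points])


def simplify_streaming(points, max_points):
    """Consume points one at a time, keeping at most `max_points` of them."""
    if max_points < 2:
        raise ValueError("max_points must be at least 2")
    kept = []
    for p in points:
        kept.append(p)
        if len(kept) > max_points:
            # Drop the interior point whose removal changes the shape least.
            i = min(
                range(1, len(kept) - 1),
                key=lambda j: triangle_area(kept[j - 1], kept[j], kept[j + 1]),
            )
            del kept[i]
    return kept


track = [(0, 0), (1, 0.01), (2, 0), (3, 5), (4, 0), (5, 0.01), (6, 0)]
print(truncate(track, 4))
print(simplify_streaming(track, 4))
```

Truncation loses the end of the track, while the simplified version keeps both endpoints and the most prominent turn, which matches the contrast between the blue and magenta lines in the diagram.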