In this PR we expose the global retention via the `GET
_data_stream/{target}/_lifecycle` API.
Since the global retention is a main feature of the data stream
lifecycle we chose to expose it by default.
```
GET /_data_stream/my-data-stream/_lifecycle
{
"global_retention": {
"default_retention": "7d",
"max_retention": "365d"
},
"data_streams": [...]
}
```
With #111972 we enable users to set up global retention for data streams that are managed by the data stream lifecycle. This will allow users of elasticsearch to have a more control over their data retention, and consequently better resource management of their clusters.
However, there is a small number of data streams that are necessary for the good operation of elasticsearch and should not follow user defined retention to avoid surprises.
For this reason, we put forth the following definition of internal data streams.
A data stream is internal if it's either a system index (system flag is true) or if its name starts with a dot.
This PR adds the `isInternalDataStream` param in the effective retention calculation making explicit that this is also used to determine the effective retention.
* (Doc+) Link API to parent Doc part1
---------
Co-authored-by: shainaraskas <shaina.raskas@elastic.co>
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
Rather than initializing the failure store right away when a new
data stream is created, we leave it empty and mark it for lazy
rollover. This results in the failure store only being initialized
(i.e. an index created) when a failure has actually occurred.
The exception to the rule is when a failure occurs while the data
stream is being auto-created. In that case, we do want to initialize
the failure store right away.
We were seeing more and more common fields between "regular" backing indices and failure store indices (i.e. `indices`, `rolloverOnWrite`, `autoShardingEvent`). To avoid having to duplicate these fields (and possibly any future fields), we extract a class that contains these fields.
* Remove `es-test-dir` book-scoped variable
* Remove `plugins-examples-dir` book-scoped variable
* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables
- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to fuller path
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated idem
* Replace `es-repo-dir` with `es-ref-dir`
* Move `:include-xpack: true` to few files that use it, remove from index.asciidoc
In this PR we introduce the API that will expose the global retention configuration and will allow users to take advantage of it.
These APIs are protected by the dedicated introduced privileges:
`manage_data_stream_global_retention` or higher, which allows all operations on the global retention configuration
`monitor_data_stream_retention` or higher, which allows the retrieval of the global retention configuration.
This PR is the final PR that makes the global retention available for our users.
Adds the ability to configure a data stream to create a new kind of backing index called a failure store which will eventually be used to store error information when ingest pipelines fail to ingest a document or when a document fails to be parsed correctly by the configured mapping on the data stream.
This releases the Data stream lifecycle feature as a
Technical Preview feature.
Data stream lifecycle, albeit in technical preview, will allow data streams
to take advantage of a native simplified and resilient lifecycle implementation.
Currently the `GET target/_lifecycle/explain` API only works for
indices. In this PR we extend this behaviour to allow the target to be a
data stream so we can get the overview lifecycle status for all the
backing indices of a data stream.
This makes the data stream lifecycle generally available. This will allow
data streams to take advantage of a native simplified and resilient
lifecycle implementation.
In this PR we enable all new data streams to be managed by the data
stream lifecycle by default. This is implemented by adding an empty
`lifecycle: {}` upon new data stream creation.
Opting out is represented by a the `enabled` flag:
```
{
"lifecycle": {
"enabled": false
}
}
```
This change has the following implications on when is an index managed
and by which feature:
| Parent data stream lifecycle| ILM| `prefer_ilm`|Managed by|
|----------------------------|----|----------------|-| | default | yes|
true| ILM| | default | yes| false| data stream lifecycle| |default |
no|true/false|data stream lifecycle| |opt-out or
missing|yes|true/false|ILM| |opt-out or missing|no|true/false|unmanaged|
Data streams that have been created before the data stream lifecycle is
enabled will not have the default lifecycle.
Next steps: - We need to document this when the feature will be GA
(https://github.com/elastic/elasticsearch/issues/97973).