[[logs-data-stream]]
== Logs data stream

IMPORTANT: The {es} `logsdb` index mode is generally available in Elastic Cloud Hosted
and self-managed Elasticsearch as of version 8.17, and is enabled by default for
logs in https://www.elastic.co/elasticsearch/serverless[{serverless-full}].

A logs data stream is a data stream type that stores log data more efficiently.

In benchmarks, log data stored in a logs data stream used ~2.5 times less disk space than a regular data
stream. The exact impact varies by data set.

[discrete]
[[how-to-use-logsds]]
=== Create a logs data stream

To create a logs data stream, set your <<index-templates,template>> `index.mode` to `logsdb`:

[source,console]
----
PUT _index_template/my-index-template
{
  "index_patterns": ["logs-*"],
  "data_stream": { },
  "template": {
    "settings": {
      "index.mode": "logsdb" <1>
    }
  },
  "priority": 101 <2>
}
----
// TEST

<1> The index mode setting.
<2> The index template priority. By default, Elasticsearch ships with a `logs-*-*` index template with a priority of 100. To make sure your index template takes priority over the default `logs-*-*` template, set its `priority` to a number higher than 100. For more information, see <<avoid-index-pattern-collisions,Avoid index pattern collisions>>.

After the index template is created, new indices that use the template will be configured as a logs data stream. You can start indexing data and <<use-a-data-stream,using the data stream>>.

You can also set the index mode and adjust other template settings in <<index-mgmt,the Elastic UI>>.

////
[source,console]
----
DELETE _index_template/my-index-template
----
// TEST[continued]
////
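
Once a template like the one above exists, indexing a document into a matching data stream name creates the data stream if it doesn't exist yet. The following request is a minimal sketch: the data stream name `logs-myapp-default` and the field values are hypothetical placeholders, not names defined elsewhere on this page.

[source,console]
----
POST logs-myapp-default/_doc
{
  "@timestamp": "2025-01-15T12:00:00Z",
  "host.name": "web-01",
  "message": "GET /index.html 200"
}
----
// TEST[skip:illustrative sketch with hypothetical names]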

[[logsdb-default-settings]]

[discrete]
[[logsdb-synthetic-source]]
=== Synthetic source

If you have the required https://www.elastic.co/subscriptions[subscription], `logsdb` index mode uses <<synthetic-source,synthetic `_source`>>, which omits storing the original `_source`
field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval.

If you don't have the required https://www.elastic.co/subscriptions[subscription], `logsdb` mode uses the original `_source` field.

Before using synthetic source, make sure to review the <<synthetic-source-restrictions,restrictions>>.

When working with multi-value fields, the `index.mapping.synthetic_source_keep` setting controls how field values
are preserved for <<synthetic-source,synthetic source>> reconstruction. In `logsdb`, the default value is `arrays`,
which retains both duplicate values and the order of entries. However, the exact structure of
array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some
log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events.
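
If a data set doesn't depend on array order or duplicate values, the index-level default can be changed in the template settings. The following sketch assumes a hypothetical template name and index pattern, and sets the value to `none`. With that value, arrays in the synthesized `_source` might be reordered or deduplicated, so check the <<synthetic-source,synthetic `_source`>> documentation for the values supported by your version before using it.

[source,console]
----
PUT _index_template/my-keep-none-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": { },
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.mapping.synthetic_source_keep": "none"
    }
  },
  "priority": 200
}
----
// TEST[skip:illustrative sketch with hypothetical names]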

[discrete]
[[logsdb-sort-settings]]
=== Index sort settings

In `logsdb` index mode, the following sort settings are applied by default:

`index.sort.field`: `["host.name", "@timestamp"]`::
Indices are sorted by `host.name` and `@timestamp` by default. The `@timestamp` field is automatically injected if it is not present.

`index.sort.order`: `["desc", "desc"]`::
Both `host.name` and `@timestamp` are sorted in descending (`desc`) order, prioritizing the latest data.

`index.sort.mode`: `["min", "min"]`::
The `min` mode sorts indices by the minimum value of multi-value fields.

`index.sort.missing`: `["_first", "_first"]`::
Missing values are sorted to appear `_first`.

You can override these default sort settings. For example, to sort on different fields
and change the order, manually configure `index.sort.field` and `index.sort.order`. For more details, see
<<index-modules-index-sorting>>.
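
As a sketch of such an override, the following template sorts only by `@timestamp` in descending order. The template name and index pattern are hypothetical, and the priority is an assumed value chosen to win over the default `logs-*-*` template.

[source,console]
----
PUT _index_template/my-timestamp-sorted-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": { },
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.sort.field": ["@timestamp"],
      "index.sort.order": ["desc"]
    }
  },
  "priority": 200
}
----
// TEST[skip:illustrative sketch with hypothetical names]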

When using the default sort settings, the `host.name` field is automatically injected into the index mappings as a `keyword` field to ensure that sorting can be applied. This guarantees that logs are efficiently sorted and retrieved based on the `host.name` and `@timestamp` fields.

NOTE: If `subobjects` is set to `true` (default), the `host` field is mapped as an object field
named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`,
a single `host.name` field is mapped as a `keyword` field.

To apply different sort settings to an existing data stream, update the data stream's component templates, and then
perform or wait for a <<data-streams-rollover,rollover>>.
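
For example, after the templates are updated, a manual rollover can be requested so that the next backing index picks up the new settings. The data stream name `logs-myapp-default` below is a hypothetical placeholder.

[source,console]
----
POST logs-myapp-default/_rollover
----
// TEST[skip:illustrative sketch with hypothetical names]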

NOTE: In `logsdb` mode, the `@timestamp` field is automatically injected if it's not already present. If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not
automatically added to the list of sort fields.

[discrete]
[[logsdb-host-name]]
==== Existing data streams

If you're enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it's included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied.

To avoid mapping conflicts, consider these options:

* **Adjust mappings:** Check your existing mappings to ensure that `host.name` is mapped as a keyword.

* **Change sorting:** If needed, you can remove `host.name` from the sort settings and use a different set of fields. Sorting by `@timestamp` can be a good fallback.

* **Switch to a different <<index-mode-setting,index mode>>**: If resolving `host.name` mapping conflicts is not feasible, you can choose not to use `logsdb` mode.
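
One way to check the current mapping before switching modes is the get field mapping API. The request below is a sketch that assumes a hypothetical data stream named `logs-myapp-default`; the response shows how `host.name` is mapped in each backing index.

[source,console]
----
GET logs-myapp-default/_mapping/field/host.name
----
// TEST[skip:illustrative sketch with hypothetical names]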

IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-rollover,rollover>> (automatic or manual).

[discrete]
[[logsdb-specialized-codecs]]
=== Specialized codecs

By default, `logsdb` index mode uses the `best_compression` <<index-codec,codec>>, which applies {wikipedia}/Zstd[ZSTD]
compression to stored fields. You can switch to the `default` codec for faster compression with a slightly larger storage footprint.
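
A minimal sketch of switching the codec in an index template follows. The template name, index pattern, and priority are hypothetical values chosen for illustration.

[source,console]
----
PUT _index_template/my-default-codec-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": { },
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.codec": "default"
    }
  },
  "priority": 200
}
----
// TEST[skip:illustrative sketch with hypothetical names]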

The `logsdb` index mode also automatically applies specialized codecs for numeric doc values, in order to optimize storage usage. Numeric fields are
encoded using the following sequence of codecs:

* **Delta encoding**:
Stores the difference between consecutive values instead of the actual values.

* **Offset encoding**:
Stores the difference from a base value rather than between consecutive values.

* **Greatest Common Divisor (GCD) encoding**:
Finds the greatest common divisor of a set of values and stores the differences as multiples of the GCD.

* **Frame Of Reference (FOR) encoding**:
Determines the smallest number of bits required to encode a block of values and uses
bit-packing to fit such values into larger 64-bit blocks.

Each encoding is evaluated according to heuristics determined by the data distribution.
For example, the algorithm checks whether the data is monotonically non-decreasing or
non-increasing. If so, delta encoding is applied; otherwise, the process
continues with the next encoding method (offset).
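
As a simplified illustration of how these steps can compound, consider four hypothetical values in one block. This walk-through only sketches the idea under the assumption that every step applies; the actual heuristics and on-disk layout are more involved, and metadata such as the first value, the base, and the divisor is stored alongside the packed values.

[source,text]
----
values                    1000  1300  1800  2500   (monotonically non-decreasing)
delta  (first value 1000)        300   500   700
offset (base 300)                  0   200   400
GCD    (divisor 200)               0     1     2
FOR    (bit-packing)      largest remaining value is 2, so 2 bits per value
----

In this example, the three remaining values each fit in 2 bits instead of a full 64-bit value.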

Encoding is specific to each Lucene segment and is reapplied when segments are merged. The merged Lucene segment
might use a different encoding than the original segments, depending on the characteristics of the merged data.

For keyword fields, **Run Length Encoding (RLE)** is applied to the ordinals, which represent positions in the Lucene
segment-level keyword dictionary. This compression is used when multiple consecutive documents share the same keyword.

[discrete]
[[logsdb-ignored-settings]]
=== `ignore` settings

The `logsdb` index mode uses the following `ignore` settings. You can override these settings as needed.

[discrete]
[[logsdb-ignore-malformed]]
==== `ignore_malformed`

By default, `logsdb` index mode sets `ignore_malformed` to `true`. With this setting, documents with malformed fields
can be indexed without causing ingestion failures.
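
Fields that were skipped because they were malformed are recorded in the document's `_ignored` metadata field. As a sketch, assuming a hypothetical data stream named `logs-myapp-default`, the following search returns documents that had at least one ignored field:

[source,console]
----
GET logs-myapp-default/_search
{
  "query": {
    "exists": {
      "field": "_ignored"
    }
  }
}
----
// TEST[skip:illustrative sketch with hypothetical names]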

[discrete]
[[logs-db-ignore-above]]
==== `ignore_above`

In `logsdb` index mode, the `index.mapping.ignore_above` setting is applied by default at the index level to ensure
efficient storage and indexing of large keyword fields. The index-level default for `ignore_above` is 8191
_characters_. With UTF-8 encoding, a character takes up to 4 bytes, so this corresponds to a limit of up to 32764 bytes.

The mapping-level `ignore_above` setting takes precedence. If a specific field has an `ignore_above` value
defined in its mapping, that value overrides the index-level `index.mapping.ignore_above` value. This default
behavior helps to optimize indexing performance by preventing excessively large string values from being indexed.

If you need to customize the limit, you can override it at the mapping level or change the index-level default.
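
The following sketch shows both options in one template: a lower index-level default, plus a stricter mapping-level limit on a single field. The template name, index pattern, the `request_id` field, and the numeric limits are hypothetical values chosen for illustration.

[source,console]
----
PUT _index_template/my-ignore-above-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": { },
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.mapping.ignore_above": 1024
    },
    "mappings": {
      "properties": {
        "request_id": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  },
  "priority": 200
}
----
// TEST[skip:illustrative sketch with hypothetical names]

With this template, `request_id` values longer than 256 characters are ignored, while other keyword fields fall back to the index-level limit of 1024.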

[discrete]
[[logs-db-ignore-limit]]
==== `ignore_dynamic_beyond_limit`

In `logsdb` index mode, the setting `index.mapping.total_fields.ignore_dynamic_beyond_limit` is set to `true` by
default. This setting allows dynamically mapped fields to be added on top of statically defined fields, even when the total number of fields exceeds the `index.mapping.total_fields.limit`. Instead of triggering an index failure, additional dynamically mapped fields are ignored so that ingestion can continue.

NOTE: When automatically injected, `host.name` and `@timestamp` count toward the limit of mapped fields. If `host.name` is mapped with `subobjects: true`, it has two fields. When mapped with `subobjects: false`, `host.name` has only one field.
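
If dynamically mapped fields are being ignored more often than expected, one option is to raise the field limit while keeping the default behavior. The sketch below uses a hypothetical template name and index pattern, and the limit of `2000` is an arbitrary example value.

[source,console]
----
PUT _index_template/my-field-limit-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": { },
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.mapping.total_fields.limit": 2000,
      "index.mapping.total_fields.ignore_dynamic_beyond_limit": true
    }
  },
  "priority": 200
}
----
// TEST[skip:illustrative sketch with hypothetical names]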

[discrete]
[[logsdb-nodocvalue-fields]]
=== Fields without `doc_values`

When the `logsdb` index mode uses synthetic `_source` and `doc_values` are disabled for a field in the mapping,
{es} might set the `store` setting to `true` for that field. This ensures that the field's
data remains accessible for reconstructing the document's source when using
<<synthetic-source,synthetic source>>.

For example, this adjustment occurs with text fields when `store` is `false` and no suitable multi-field is available for
reconstructing the original value.

[discrete]
[[logsdb-settings-summary]]
=== Settings reference

The `logsdb` index mode uses the following settings:

* **`index.mode`**: `"logsdb"`

* **`index.mapping.synthetic_source_keep`**: `"arrays"`

* **`index.sort.field`**: `["host.name", "@timestamp"]`

* **`index.sort.order`**: `["desc", "desc"]`

* **`index.sort.mode`**: `["min", "min"]`

* **`index.sort.missing`**: `["_first", "_first"]`

* **`index.codec`**: `"best_compression"`

* **`index.mapping.ignore_malformed`**: `true`

* **`index.mapping.ignore_above`**: `8191`

* **`index.mapping.total_fields.ignore_dynamic_beyond_limit`**: `true`
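
To inspect the settings in effect on a particular data stream, you can request its index settings with defaults included. This is a sketch that assumes a hypothetical data stream named `logs-myapp-default`. Depending on the version, some defaults derived from the index mode might not be listed explicitly in the response.

[source,console]
----
GET logs-myapp-default/_settings?include_defaults=true&flat_settings=true
----
// TEST[skip:illustrative sketch with hypothetical names]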