mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 07:37:19 -04:00
Added a requirement that index.look_ahead_time index setting can't be lower than time_series.poll_interval setting. Additional changes: * Fixed a mistake in the docs that referenced indices.lifecycle.poll_interval instead of time_series.poll_interval. * Moved index.look_ahead_time setting to data stream module.
308 lines
No EOL
12 KiB
Text
308 lines
No EOL
12 KiB
Text
ifeval::["{release-state}"=="unreleased"]
|
|
[[tsds]]
|
|
== Time series data stream (TSDS)
|
|
|
|
preview::[]
|
|
|
|
A time series data stream (TSDS) models timestamped metrics data as one or
|
|
more time series.
|
|
|
|
// TODO: Replace XX% with actual percentage
|
|
You can use a TSDS to store metrics data more efficiently. In our benchmarks,
|
|
metrics data stored in a TSDS used 44% less disk space than a regular data
|
|
stream.
|
|
|
|
[discrete]
|
|
[[when-to-use-tsds]]
|
|
=== When to use a TSDS
|
|
|
|
Both a <<data-streams,regular data stream>> and a TSDS can store timestamped
|
|
metrics data. Only use a TSDS if you typically add metrics data to {es} in near
|
|
real-time and `@timestamp` order.
|
|
|
|
A TSDS is only intended for metrics data. For other timestamped data, such as
|
|
logs or traces, use a regular data stream.
|
|
|
|
[discrete]
|
|
[[differences-from-regular-data-stream]]
|
|
=== Differences from a regular data stream
|
|
|
|
A TSDS works like a regular data stream with some key differences:
|
|
|
|
* The matching index template for a TSDS requires a `data_stream` object with
|
|
the <<time-series-mode,`index_mode: time_series`>> option. This option enables
|
|
most TSDS-related functionality.
|
|
|
|
* In addition to a `@timestamp`, each document in a TSDS must contain one or
|
|
more <<time-series-dimension,dimension fields>>. The matching index template for
|
|
a TSDS must contain mappings for at least one `keyword` dimension.
|
|
+
|
|
TSDS documents also typically
|
|
contain one or more <<time-series-metric,metric fields>>.
|
|
|
|
* {es} generates a hidden <<tsid,`_tsid`>> metadata field for each document in a
|
|
TSDS.
|
|
|
|
* A TSDS uses <<time-bound-indices,time-bound backing indices>> to store data
|
|
from the same time period in the same backing index.
|
|
|
|
* The matching index template for a TSDS must contain the `index.routing_path`
|
|
index setting. A TSDS uses this setting to perform
|
|
<<dimension-based-routing,dimension-based routing>>.
|
|
|
|
* A TSDS uses internal <<index-modules-index-sorting,index sorting>> to order
|
|
shard segments by `_tsid` and `@timestamp`.
|
|
|
|
* TSDS documents only support auto-generated document `_id` values. For TSDS
|
|
documents, the document `_id` is a hash of the document's dimensions and
|
|
`@timestamp`. A TSDS doesn't support custom document `_id` values.
|
|
|
|
[discrete]
|
|
[[time-series]]
|
|
=== What is a time series?
|
|
|
|
A time series is a sequence of observations for a specific entity. Together,
|
|
these observations let you track changes to the entity over time. For example, a
|
|
time series can track:
|
|
|
|
* CPU and disk usage for a computer
|
|
* The price of a stock
|
|
* Temperature and humidity readings from a weather sensor.
|
|
|
|
.Time series of weather sensor readings plotted as a graph
|
|
image::images/data-streams/time-series-chart.svg[align="center"]
|
|
|
|
In a TSDS, each {es} document represents an observation, or data point, in a
|
|
specific time series. Although a TSDS can contain multiple time series, a
|
|
document can only belong to one time series. A time series can't span multiple
|
|
data streams.
|
|
|
|
[discrete]
|
|
[[time-series-dimension]]
|
|
==== Dimensions
|
|
|
|
Dimensions are field names and values that, in combination, identify a
|
|
document's time series. In most cases, a dimension describes some aspect of the
|
|
entity you're measuring. For example, documents related to the same weather
|
|
sensor may always have the same `sensor_id` and `location` values.
|
|
|
|
A TSDS document is uniquely identified by its time series and timestamp, both of
|
|
which are used to generate the document `_id`. So, two documents with the same
|
|
dimensions and the same timestamp are considered to be duplicates. When you use
|
|
the `_bulk` endpoint to add documents to a TSDS, a second document with the same
|
|
timestamp and dimensions overwrites the first. When you use the
|
|
`PUT /<target>/_create/<_id>` format to add an individual document and a document
|
|
with the same `_id` already exists, an error is generated.
|
|
|
|
You mark a field as a dimension using the boolean `time_series_dimension`
|
|
mapping parameter. The following field types support the `time_series_dimension`
|
|
parameter:
|
|
|
|
* <<keyword-field-type,`keyword`>>
|
|
* <<ip,`ip`>>
|
|
* <<number,`byte`>>
|
|
* <<number,`short`>>
|
|
* <<number,`integer`>>
|
|
* <<number,`long`>>
|
|
* <<number,`unsigned_long`>>
|
|
|
|
[[dimension-limits]]
|
|
.Dimension limits
|
|
****
|
|
In a TSDS, {es} uses dimensions to
|
|
generate the document `_id` and <<tsid,`_tsid`>> values. The resulting `_id` is
|
|
always a short encoded hash. To prevent the `_tsid` value from being overly
|
|
large, {es} limits the number of dimensions for an index using the
|
|
<<index-mapping-dimension-fields-limit,`index.mapping.dimension_fields.limit`>>
|
|
index setting. While you can increase this limit, the resulting document `_tsid`
|
|
value can't exceed 32KB.
|
|
****
|
|
|
|
[discrete]
|
|
[[time-series-metric]]
|
|
==== Metrics
|
|
|
|
Metrics are fields that contain numeric measurements, as well as aggregations
|
|
and/or downsampling values based off of those measurements. While not required,
|
|
documents in a TSDS typically contain one or more metric fields.
|
|
|
|
Metrics differ from dimensions in that while dimensions generally remain
|
|
constant, metrics are expected to change over time, even if rarely or slowly.
|
|
|
|
To mark a field as a metric, you must specify a metric type using the
|
|
`time_series_metric` mapping parameter. The following field types support the
|
|
`time_series_metric` parameter:
|
|
|
|
* <<aggregate-metric-double,`aggregate_metric_double`>>
|
|
* <<histogram,`histogram`>>
|
|
* All <<number,numeric field types>>
|
|
|
|
Accepted metric types vary based on the field type:
|
|
|
|
.Valid values for `time_series_metric`
|
|
[%collapsible%open]
|
|
====
|
|
// tag::time-series-metric-counter[]
|
|
`counter`:: A number that only increases or resets to `0` (zero). For
|
|
example, a count of errors or completed tasks.
|
|
// end::time-series-metric-counter[]
|
|
+
|
|
Only numeric and `aggregate_metric_double` fields support the `counter` metric
|
|
type.
|
|
|
|
// tag::time-series-metric-gauge[]
|
|
`gauge`:: A number that can increase or decrease. For example, a temperature or
|
|
available disk space.
|
|
// end::time-series-metric-gauge[]
|
|
+
|
|
Only numeric and `aggregate_metric_double` fields support the `gauge` metric
|
|
type.
|
|
|
|
// tag::time-series-metric-histogram[]
|
|
`histogram`:: A pair of numeric arrays that measure the distribution of values
|
|
across predefined buckets. For example, server response times by percentile.
|
|
// end::time-series-metric-histogram[]
|
|
+
|
|
Only `histogram` fields support the `histogram` metric type.
|
|
|
|
// tag::time-series-metric-summary[]
|
|
`summary`:: An array of aggregated values, such as `sum`, `avg`, `value_count`,
|
|
`min`, and `max`.
|
|
// end::time-series-metric-summary[]
|
|
+
|
|
Only `aggregate_metric_double` fields support the `gauge` metric type.
|
|
|
|
// tag::time-series-metric-null[]
|
|
`null` (Default):: Not a time series metric.
|
|
// end::time-series-metric-null[]
|
|
====
|
|
|
|
[discrete]
|
|
[[time-series-mode]]
|
|
=== Time series mode
|
|
|
|
The matching index template for a TSDS must contain a `data_stream` object with
|
|
the `index_mode: time_series` option. This option ensures the TSDS creates
|
|
backing indices with an <<index-mode,`index.mode`>> setting of `time_series`.
|
|
This setting enables most TSDS-related functionality in the backing indices.
|
|
|
|
If you convert an existing data stream to a TSDS, only backing indices created
|
|
after the conversion have an `index.mode` of `time_series`. You can't
|
|
change the `index.mode` of an existing backing index.
|
|
|
|
[discrete]
|
|
[[tsid]]
|
|
==== `_tsid` metadata field
|
|
|
|
When you add a document to a TSDS, {es} automatically generates a `_tsid`
|
|
metadata field for the document. The `_tsid` is an object containing the
|
|
document's dimensions. Documents in the same TSDS with the same `_tsid` are part
|
|
of the same time series.
|
|
|
|
The `_tsid` field is not queryable or updatable. You also can't retrieve a
|
|
document's `_tsid` using a <<docs-get,get document>> request. However, you can
|
|
use the `_tsid` field in aggregations and retrieve the `_tsid` value in searches
|
|
using the <<search-fields-param,`fields` parameter>>.
|
|
|
|
[discrete]
|
|
[[time-bound-indices]]
|
|
==== Time-bound indices
|
|
|
|
In a TSDS, each backing index, including the most recent backing index, has a
|
|
range of accepted `@timestamp` values. This range is defined by the
|
|
<<index-time-series-start-time,`index.time_series.start_time`>> and
|
|
<<index-time-series-end-time,`index.time_series.end_time`>> index settings.
|
|
|
|
When you add a document to a TSDS, {es} adds the document to the appropriate
|
|
backing index based on its `@timestamp` value. As a result, a TSDS can add
|
|
documents to any TSDS backing index that can receive writes. This applies even
|
|
if the index isn't the most recent backing index.
|
|
|
|
image::images/data-streams/time-bound-indices.svg[align="center"]
|
|
|
|
TIP: Some {ilm-init} actions, such as <<ilm-forcemerge,`forcemerge`>>,
|
|
<<ilm-shrink,`shrink`>>, and <<ilm-searchable-snapshot,`searchable_snapshot`>>,
|
|
make a backing index read-only. You cannot add documents to read-only indices.
|
|
Keep this in mind when defining the index lifecycle policy for your TSDS.
|
|
|
|
If no backing index can accept a document's `@timestamp` value, {es} rejects the
|
|
document.
|
|
|
|
|
|
{es} automatically configures `index.time_series.start_time` and
|
|
`index.time_series.end_time` settings as part of the index creation and rollover
|
|
process.
|
|
|
|
[discrete]
|
|
[[tsds-look-ahead-time]]
|
|
==== Look-ahead time
|
|
|
|
Use the <<index-look-ahead-time,`index.look_ahead_time`>> index setting to
|
|
configure how far into the future you can add documents to an index. When you
|
|
create a new write index for a TSDS, {es} calculates the index's
|
|
`index.time_series.end_time` value as:
|
|
|
|
`now + index.look_ahead_time`
|
|
|
|
At the time series poll interval (controlled via `time_series.poll_interval` setting),
|
|
{es} checks if the write index has met the rollover criteria in its index
|
|
lifecycle policy. If not, {es} refreshes the `now` value and updates the write
|
|
index's `index.time_series.end_time` to:
|
|
|
|
`now + index.look_ahead_time + time_series.poll_interval`
|
|
|
|
This process continues until the write index rolls over. When the index rolls
|
|
over, {es} sets a final `index.time_series.end_time` value for the index. This
|
|
value borders the `index.time_series.start_time` for the new write index. This
|
|
ensures the `@timestamp` ranges for neighboring backing indices always border
|
|
but never overlap.
|
|
|
|
[discrete]
|
|
[[dimension-based-routing]]
|
|
==== Dimension-based routing
|
|
|
|
Within each TSDS backing index, {es} uses the
|
|
<<index-routing-path,`index.routing_path`>> index setting to route documents
|
|
with the same dimensions to the same shards.
|
|
|
|
When you create the matching index template for a TSDS, you must specify one or
|
|
more dimensions in the `index.routing_path` setting. Each document in a TSDS
|
|
must contain one or more dimensions that match the `index.routing_path` setting.
|
|
|
|
Dimensions in the `index.routing_path` setting must be plain `keyword` fields.
|
|
The `index.routing_path` setting accepts wildcard patterns (for example `dim.*`)
|
|
and can dynamically match new fields. However, {es} will reject any mapping
|
|
updates that add scripted, runtime, or non-dimension, non-`keyword` fields that
|
|
match the `index.routing_path` value.
|
|
|
|
TSDS documents don't support a custom `_routing` value. Similarly, you can't
|
|
require a `_routing` value in mappings for a TSDS.
|
|
|
|
[discrete]
|
|
[[tsds-index-sorting]]
|
|
==== Index sorting
|
|
|
|
{es} uses <<index-codec,compression algorithms>> to compress repeated values.
|
|
This compression works best when repeated values are stored near each other — in
|
|
the same index, on the same shard, and side-by-side in the same shard segment.
|
|
|
|
Most time series data contains repeated values. Dimensions are repeated across
|
|
documents in the same time series. The metric values of a time series may also
|
|
change slowly over time.
|
|
|
|
Internally, each TSDS backing index uses <<index-modules-index-sorting,index
|
|
sorting>> to order its shard segments by `_tsid` and `@timestamp`. This makes it
|
|
more likely that these repeated values are stored near each other for better
|
|
compression. A TSDS doesn't support any
|
|
<<index-modules-index-sorting,`index.sort.*`>> index settings.
|
|
|
|
[discrete]
|
|
[[tsds-whats-next]]
|
|
=== What's next?
|
|
|
|
Now that you know the basics, you're ready to <<set-up-tsds,create a TSDS>> or
|
|
<<set-up-tsds,convert an existing data stream to a TSDS>>.
|
|
|
|
include::set-up-tsds.asciidoc[]
|
|
include::tsds-index-settings.asciidoc[]
|
|
endif::[] |