mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 07:37:19 -04:00
179 lines
No EOL
6.6 KiB
Text
179 lines
No EOL
6.6 KiB
Text
ifeval::["{release-state}"=="unreleased"]
|
|
[[downsampling]]
|
|
=== Downsampling a time series data stream
|
|
|
|
preview::[]
|
|
|
|
Downsampling provides a method to reduce the footprint of your <<tsds,time
|
|
series data>> by storing it at reduced granularity.
|
|
|
|
Metrics solutions collect large amounts of time series data that grow over time.
|
|
As that data ages, it becomes less relevant to the current state of the system.
|
|
The downsampling process rolls up documents within a fixed time interval into a
|
|
single summary document. Each summary document includes statistical
|
|
representations of the original data: the `min`, `max`, `sum`, `value_count`,
|
|
and `average` for each metric. Data stream <<time-series-dimension,time series
|
|
dimensions>> are stored unchanged.
|
|
|
|
Downsampling, in effect, lets you to trade data resolution and precision for
|
|
storage size. You can include it in an <<index-lifecycle-management,{ilm}
|
|
({ilm-init})>> policy to automatically manage the volume and associated cost of
|
|
your metrics data at it ages.
|
|
|
|
Check the following sections to learn more:
|
|
|
|
* <<how-downsampling-works>>
|
|
* <<running-downsampling>>
|
|
* <<querying-downsampled-indices>>
|
|
* <<downsampling-restrictions>>
|
|
* <<try-out-downsampling>>
|
|
|
|
[discrete]
|
|
[[how-downsampling-works]]
|
|
=== How it works
|
|
|
|
A <<time-series,time series>> is a sequence of observations taken over time for
|
|
a specific entity. The observed samples can be represented as a continuous
|
|
function, where the time series dimensions remain constant and the time series
|
|
metrics change over time.
|
|
|
|
//.Sampling a continuous function
|
|
image::images/data-streams/time-series-function.png[align="center"]
|
|
|
|
In an Elasticsearch index, a single document is created for each timestamp,
|
|
containing the immutable time series dimensions, together with the metrics names
|
|
and the changing metrics values. For a single timestamp, several time series
|
|
dimensions and metrics may be stored.
|
|
|
|
//.Metric anatomy
|
|
image::images/data-streams/time-series-metric-anatomy.png[align="center"]
|
|
|
|
For your most current and relevant data, the metrics series typically has a low
|
|
sampling time interval, so it's optimized for queries that require a high data
|
|
resolution.
|
|
|
|
.Original metrics series
|
|
image::images/data-streams/time-series-original.png[align="center"]
|
|
|
|
Downsampling works on older, less frequently accessed data by replacing the
|
|
original time series with both a data stream of a higher sampling interval and
|
|
statistical representations of that data. Where the original metrics samples may
|
|
have been taken, for example, every ten seconds, as the data ages you may choose
|
|
to reduce the sample granularity to hourly or daily. You may choose to reduce
|
|
the granularity of `cold` archival data to monthly or less.
|
|
|
|
.Downsampled metrics series
|
|
image::images/data-streams/time-series-downsampled.png[align="center"]
|
|
|
|
[discrete]
|
|
[[running-downsampling]]
|
|
=== Running downsampling on time series data
|
|
|
|
To downsample a time series index, use the
|
|
<<indices-downsample-data-stream,Downsample API>> and set `fixed_interval` to
|
|
the level of granularity that you'd like:
|
|
|
|
```
|
|
POST /<source_index>/_downsample/<new_index>
|
|
{
|
|
"fixed_interval": "1d"
|
|
}
|
|
```
|
|
|
|
To downsample time series data as part of ILM, include a
|
|
<<ilm-downsample,Downsample action>> in your ILM policy and set `fixed_interval`
|
|
to the level of granularity that you'd like:
|
|
|
|
```
|
|
PUT _ilm/policy/my_policy
|
|
{
|
|
"policy": {
|
|
"phases": {
|
|
"warm": {
|
|
"actions": {
|
|
"downsample" : {
|
|
"fixed_interval": "1h"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
[discrete]
|
|
[[querying-downsampled-indices]]
|
|
=== Querying downsampled indices
|
|
|
|
You can use the <<search-search,`_search`>> and <<async-search,`_async_search`>>
|
|
endpoints to query a downsampled index. Multiple raw data and downsampled
|
|
indices can be queried in a single request, and a single request can include
|
|
downsampled indices at different granularities (different bucket timespan). That
|
|
is, you can query data streams that contain downsampled indices with multiple
|
|
downsampling intervals (for example, `15m`, `1h`, `1d`).
|
|
|
|
The result of a time based histogram aggregation is in a uniform bucket size and
|
|
each downsampled index returns data ignoring the downsampling time interval. For
|
|
example, if you run a `date_histogram` aggregation with `"fixed_interval": "1m"`
|
|
on a downsampled index that has been downsampled at an hourly resolution
|
|
(`"fixed_interval": "1h"`), the query returns one bucket with all of the data at
|
|
minute 0, then 59 empty buckets, and then a bucket with data again for the next
|
|
hour.
|
|
|
|
NOTE:
|
|
|
|
There are a few things to note when querying downsampled indices:
|
|
|
|
* When you run queries in {kib} and through Elastic solutions, a normal
|
|
response is returned without notification that some of the queried indices are
|
|
downsampled.
|
|
* For
|
|
<<search-aggregations-bucket-datehistogram-aggregation,date histogram aggregations>>,
|
|
only `fixed_intervals` (and not calendar-aware intervals) are supported.
|
|
* Only Coordinated Universal Time (UTC) date-times are supported.
|
|
|
|
[discrete]
|
|
[[downsampling-restrictions]]
|
|
=== Restrictions and limitations
|
|
|
|
The following restrictions and limitations apply for downsampling:
|
|
|
|
* Only indices in a <<tsds,time series data stream>> are supported.
|
|
|
|
* Data is downsampled based on the time dimension only. All other dimensions are
|
|
copied to the new index without any modification.
|
|
|
|
* Within a data stream, a downsampled index replaces the original index and the
|
|
original index is deleted. Only one index can exist for a given time period.
|
|
|
|
* A source index must be in read-only mode for the downsampling process to
|
|
succeed. Check the <<downsampling-manual,Run downsampling manually>> example for
|
|
details.
|
|
|
|
* Downsampling data for the same period many times (downsampling of a
|
|
downsampled index) is supported. The downsampling interval must be a multiple of
|
|
the interval of the downsampled index.
|
|
|
|
* Downsampling is provided as an ILM action. See <<ilm-downsample,Downsample>>.
|
|
|
|
* The new, downsampled index is created on the data tier of the original index
|
|
and it inherits its settings (for example, the number of shards and replicas).
|
|
|
|
* The numeric `gauge` and `counter` <<mapping-field-meta,metric types>> are
|
|
supported.
|
|
|
|
* The downsampling configuration is extracted from the time series data stream
|
|
<<tsds-create-mappings-component-template,index mapping>>. The only additional
|
|
required setting is the downsampling `fixed_interval`.
|
|
|
|
[discrete]
|
|
[[try-out-downsampling]]
|
|
=== Try it out
|
|
|
|
To take downsampling for a test run, try our example of
|
|
<<downsampling-manual,running downsampling manually>>.
|
|
|
|
Downsampling can easily be added to your ILM policy. To learn how, try our
|
|
<<downsampling-ilm,Run downsampling with ILM>> example.
|
|
|
|
endif::[] |