elasticsearch/docs/reference/esql/functions/examples/bucket.asciidoc
Bogdan Pintea bc3b629d8d
ESQL: Docs: add example of date bucketing with offset (#116680)
Add an example of how to create date histograms with an offset.

Fixes #114167
2024-12-18 17:12:14 +01:00

// This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
*Examples*
`BUCKET` works in two modes: one computes the bucket size from a recommended
bucket count and a range (four parameters), and the other takes the bucket size
directly (two parameters).
Using a target number of buckets, a start of a range, and an end of a range,
`BUCKET` picks an appropriate bucket size to generate the target number of buckets or fewer.
For example, asking for at most 20 buckets over a year results in monthly buckets:
[source.merge.styled,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=docsBucketMonth]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/bucket.csv-spec[tag=docsBucketMonth-result]
|===
The goal isn't to provide *exactly* the target number of buckets;
it's to pick a bucket size that people are comfortable with and that yields at most the target number of buckets.
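As a rough illustration of that selection, the following Python sketch picks the finest size from a short candidate list that keeps the bucket count at or below the target. The candidate list, the fixed-length month and year, and the `pick_bucket_size` name are assumptions made for this illustration; they are not the list or logic {esql} uses internally.

```python
from datetime import datetime, timedelta

# Hypothetical candidate sizes, finest first. Months and years are
# approximated as fixed lengths to keep the sketch short.
CANDIDATES = [
    ("1 hour", timedelta(hours=1)),
    ("1 day", timedelta(days=1)),
    ("1 week", timedelta(weeks=1)),
    ("1 month", timedelta(days=30)),
    ("1 year", timedelta(days=365)),
]


def pick_bucket_size(target, start, end):
    """Return the finest candidate that yields at most `target` buckets."""
    span = end - start
    for name, size in CANDIDATES:
        if span / size <= target:
            return name
    return CANDIDATES[-1][0]  # fall back to the coarsest candidate


start = datetime(1985, 1, 1)
end = datetime(1986, 1, 1)
print(pick_bucket_size(20, start, end))   # 1 month
print(pick_bucket_size(100, start, end))  # 1 week
```

Under these assumptions a one-year range with a target of 20 lands on monthly buckets (about 12 of them), because weekly buckets would already exceed the target.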
Combine `BUCKET` with an <<esql-agg-functions,aggregation>> to create a histogram:
[source.merge.styled,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=docsBucketMonthlyHistogram]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/bucket.csv-spec[tag=docsBucketMonthlyHistogram-result]
|===
NOTE: `BUCKET` does not create buckets that don't match any documents.
That's why this example is missing `1985-03-01` and other dates.
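The same effect appears in any group-by style histogram, which only produces keys for values that actually occur. A minimal Python sketch, using made-up hire dates rather than the data behind the example above:

```python
from collections import Counter
from datetime import date

# Made-up sample data: two hires in February 1985 and one in May 1985.
hire_dates = [date(1985, 2, 18), date(1985, 2, 24), date(1985, 5, 13)]

# Group by the first day of each month, much like a monthly bucket would.
histogram = Counter(d.replace(day=1) for d in hire_dates)

for month, count in sorted(histogram.items()):
    print(month, count)
# 1985-02-01 2
# 1985-05-01 1
```

March and April never show up because no row falls into them; filling such gaps is left to the consumer of the result.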
Asking for more buckets can result in a smaller bucket size.
For example, asking for at most 100 buckets in a year results in weekly buckets:
[source.merge.styled,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=docsBucketWeeklyHistogram]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/bucket.csv-spec[tag=docsBucketWeeklyHistogram-result]
|===
NOTE: `BUCKET` does not filter any rows. It only uses the provided range to pick a good bucket size.
For rows with a value outside of the range, it returns a bucket value that corresponds to a bucket outside the range.
Combine `BUCKET` with <<esql-where>> to filter rows.
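The difference between bucketing and filtering can be sketched in Python. The `week_bucket` helper and its Monday alignment are assumptions made for this illustration, and the rows are made up:

```python
from datetime import datetime, timedelta


def week_bucket(ts):
    """Floor a timestamp to the Monday 00:00 that starts its week (assumed alignment)."""
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return day_start - timedelta(days=day_start.weekday())


# Made-up rows: one inside the 1985 range, one well outside it.
rows = [datetime(1985, 2, 20, 9, 0), datetime(1987, 3, 4, 15, 30)]
range_start, range_end = datetime(1985, 1, 1), datetime(1986, 1, 1)

# Bucketing alone keeps every row; the second one lands in a 1987 bucket.
bucketed = [week_bucket(ts) for ts in rows]

# Filtering is a separate step, comparable to adding a WHERE clause.
filtered = [week_bucket(ts) for ts in rows if range_start <= ts < range_end]

print(bucketed)
print(filtered)
```

Here `bucketed` still has two entries, while `filtered` keeps only the row inside the range.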
If the desired bucket size is known in advance, simply provide it as the second
argument, leaving the range out:
[source.merge.styled,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=docsBucketWeeklyHistogramWithSpan]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/bucket.csv-spec[tag=docsBucketWeeklyHistogramWithSpan-result]
|===
NOTE: When provided as the second parameter, the bucket size must be a time
duration or date period.
`BUCKET` can also operate on numeric fields. For example, to create a salary histogram:
[source.merge.styled,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=docsBucketNumeric]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/bucket.csv-spec[tag=docsBucketNumeric-result]
|===
Unlike the earlier example that intentionally filters on a date range, you rarely want to filter on a numeric range.
You have to find the `min` and `max` separately. {esql} doesn't yet have an easy way to do that automatically.
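The two-step approach can be sketched in Python: compute the minimum and maximum first, then derive a bucket size from them. The rule below, which rounds up to a power of ten or half of one, is only one plausible way to arrive at a "comfortable" size and is an assumption of this sketch, as are the `pick_numeric_size` name and the sample salaries.

```python
import math


def pick_numeric_size(target, low, high):
    """Pick a power of ten, or half of one, that yields at most `target` buckets."""
    precise = (high - low) / target           # exact size for `target` buckets
    next_power = 10 ** math.ceil(math.log10(precise))
    half = next_power / 2
    return half if precise < half else next_power


# Step 1: find the range separately (made-up salaries).
salaries = [31000, 42500, 58000, 74000]
low, high = min(salaries), max(salaries)

# Step 2: derive the bucket size from the target count and that range.
size = pick_numeric_size(20, low, high)
print(size)  # 5000.0
```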
The range can be omitted if the desired bucket size is known in advance. Simply
provide it as the second argument:
[source.merge.styled,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=docsBucketNumericWithSpan]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/bucket.csv-spec[tag=docsBucketNumericWithSpan-result]
|===
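With a fixed size, numeric bucketing amounts to rounding each value down to a multiple of that size. A minimal Python sketch of that idea, assuming buckets are aligned to zero; the `numeric_bucket` name and the sample salaries are made up for the illustration:

```python
import math
from collections import Counter


def numeric_bucket(value, size):
    """Round `value` down to the nearest multiple of `size`."""
    return math.floor(value / size) * size


# Made-up salaries grouped into buckets of 5000.
salaries = [25324, 27500, 48250, 49999, 50000]
histogram = Counter(numeric_bucket(s, 5000) for s in salaries)

for bucket, count in sorted(histogram.items()):
    print(bucket, count)
# 25000 2
# 45000 2
# 50000 1
```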
Create hourly buckets for the last 24 hours, and calculate the number of events per hour:
[source.merge.styled,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=docsBucketLast24hr]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/bucket.csv-spec[tag=docsBucketLast24hr-result]
|===
Create monthly buckets for the year 1985, and calculate the average salary by hiring month:
[source.merge.styled,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=bucket_in_agg]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/bucket.csv-spec[tag=bucket_in_agg-result]
|===
`BUCKET` may be used in both the aggregating and grouping part of the
<<esql-stats-by, STATS ... BY ...>> command provided that in the aggregating
part the function is referenced by an alias defined in the
grouping part, or that it is invoked with the exact same expression:
[source.merge.styled,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=reuseGroupingFunctionWithExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/bucket.csv-spec[tag=reuseGroupingFunctionWithExpression-result]
|===
Sometimes you need to shift the start value of each bucket by a given duration (similar to the date histogram
aggregation's <<search-aggregations-bucket-histogram-aggregation,`offset`>> parameter). To do so, you need to
take into account how the language handles expressions within the `STATS` command: if these contain functions or
arithmetic operators, a virtual `EVAL` is inserted before and/or after the `STATS` command. Consequently, the offset
must be compensated for twice: once on the date value before the aggregation, and once on the bucketed value after it.
For instance, inserting a negative offset of `1 hour` to buckets of `1 year` looks like this:
[source.merge.styled,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=bucketWithOffset]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/bucket.csv-spec[tag=bucketWithOffset-result]
|===
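The double compensation can be sketched in Python: shift the value against the offset before bucketing, then shift the resulting bucket start by the offset. The helper names are hypothetical and the timestamps are made up; the sketch only mirrors the shift-before, shift-back-after pattern described above.

```python
from datetime import datetime, timedelta


def year_bucket(ts):
    """Floor a timestamp to the start of its calendar year."""
    return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)


def year_bucket_with_offset(ts, offset):
    """Yearly buckets whose start is moved by `offset` (negative means earlier)."""
    # First compensation: shift the value against the offset before bucketing.
    # Second compensation: shift the bucket start by the offset afterwards.
    return year_bucket(ts - offset) + offset


offset = timedelta(hours=-1)

# 23:30 on New Year's Eve already belongs to the bucket starting at 23:00 that day.
print(year_bucket_with_offset(datetime(1985, 12, 31, 23, 30), offset))  # 1985-12-31 23:00:00

# 22:30 is still in the previous bucket, which started a year earlier.
print(year_bucket_with_offset(datetime(1985, 12, 31, 22, 30), offset))  # 1984-12-31 23:00:00
```

Applying only one of the two shifts would either leave the bucket boundaries on the calendar year or report bucket starts that do not match the rows they contain.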