elasticsearch/docs/reference/query-languages/esql/_snippets/functions/bucket.md at de448aa51bce51a0a29307b95311d3399171dc2f

## `BUCKET` [esql-bucket]

**Syntax**

:::{image} ../../../../../images/bucket.svg
:alt: Embedded
:class: text-center
:::

**Parameters**

`field`
:   Numeric or date expression from which to derive buckets.

`buckets`
:   Target number of buckets, or desired bucket size if `from` and `to` parameters are omitted.

`from`
:   Start of the range. Can be a number, a date or a date expressed as a string.

`to`
:   End of the range. Can be a number, a date or a date expressed as a string.

**Description**

Creates groups of values - buckets - out of a datetime or numeric input. The size of the buckets can either be provided directly or chosen based on a recommended count and a range of values.

**Supported types**

| field | buckets | from | to | result |
| --- | --- | --- | --- | --- |
| date | date_period |  |  | date |
| date | integer | date | date | date |
| date | integer | date | keyword | date |
| date | integer | date | text | date |
| date | integer | keyword | date | date |
| date | integer | keyword | keyword | date |
| date | integer | keyword | text | date |
| date | integer | text | date | date |
| date | integer | text | keyword | date |
| date | integer | text | text | date |
| date | time_duration |  |  | date |
| date_nanos | date_period |  |  | date_nanos |
| date_nanos | integer | date | date | date_nanos |
| date_nanos | integer | date | keyword | date_nanos |
| date_nanos | integer | date | text | date_nanos |
| date_nanos | integer | keyword | date | date_nanos |
| date_nanos | integer | keyword | keyword | date_nanos |
| date_nanos | integer | keyword | text | date_nanos |
| date_nanos | integer | text | date | date_nanos |
| date_nanos | integer | text | keyword | date_nanos |
| date_nanos | integer | text | text | date_nanos |
| date_nanos | time_duration |  |  | date_nanos |
| double | double |  |  | double |
| double | integer | double | double | double |
| double | integer | double | integer | double |
| double | integer | double | long | double |
| double | integer | integer | double | double |
| double | integer | integer | integer | double |
| double | integer | integer | long | double |
| double | integer | long | double | double |
| double | integer | long | integer | double |
| double | integer | long | long | double |
| double | integer |  |  | double |
| double | long |  |  | double |
| integer | double |  |  | double |
| integer | integer | double | double | double |
| integer | integer | double | integer | double |
| integer | integer | double | long | double |
| integer | integer | integer | double | double |
| integer | integer | integer | integer | double |
| integer | integer | integer | long | double |
| integer | integer | long | double | double |
| integer | integer | long | integer | double |
| integer | integer | long | long | double |
| integer | integer |  |  | double |
| integer | long |  |  | double |
| long | double |  |  | double |
| long | integer | double | double | double |
| long | integer | double | integer | double |
| long | integer | double | long | double |
| long | integer | integer | double | double |
| long | integer | integer | integer | double |
| long | integer | integer | long | double |
| long | integer | long | double | double |
| long | integer | long | integer | double |
| long | integer | long | long | double |
| long | integer |  |  | double |
| long | long |  |  | double |

**Examples**

`BUCKET` can work in two modes: one in which the bucket size is computed from a recommended bucket count and a range (four parameters), and another in which the bucket size is provided directly (two parameters).

Using a target number of buckets, a start of a range, and an end of a range, BUCKET picks an appropriate bucket size to generate the target number of buckets or fewer. For example, asking for at most 20 buckets over a year results in monthly buckets:

```esql
FROM employees
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
| STATS hire_date = MV_SORT(VALUES(hire_date)) BY month = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
| SORT hire_date
```

| hire_date:date | month:date |
| --- | --- |
| [1985-02-18T00:00:00.000Z, 1985-02-24T00:00:00.000Z] | 1985-02-01T00:00:00.000Z |
| 1985-05-13T00:00:00.000Z | 1985-05-01T00:00:00.000Z |
| 1985-07-09T00:00:00.000Z | 1985-07-01T00:00:00.000Z |
| 1985-09-17T00:00:00.000Z | 1985-09-01T00:00:00.000Z |
| [1985-10-14T00:00:00.000Z, 1985-10-20T00:00:00.000Z] | 1985-10-01T00:00:00.000Z |
| [1985-11-19T00:00:00.000Z, 1985-11-20T00:00:00.000Z, 1985-11-21T00:00:00.000Z] | 1985-11-01T00:00:00.000Z |

The goal isn’t to provide exactly the target number of buckets. It is to pick a range that people are comfortable with and that provides at most the target number of buckets.
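The selection idea described above can be modeled with a short Python sketch. It is not the Elasticsearch implementation: the candidate list is an assumption made for illustration (this page does not list the intervals `BUCKET` actually considers), month and year lengths are approximated, and `pick_interval` is a hypothetical helper. The sketch only shows the rule "take the smallest familiar interval that yields at most the target number of buckets".

```python
from datetime import datetime, timedelta
from math import ceil

# Hypothetical candidate intervals, smallest first. The real list used by
# BUCKET is not given in this page; month and year lengths are approximations.
CANDIDATES = [
    ("hour", timedelta(hours=1)),
    ("day", timedelta(days=1)),
    ("week", timedelta(weeks=1)),
    ("month", timedelta(days=30)),
    ("year", timedelta(days=365)),
]

def pick_interval(target, start, end):
    """Return the name of the smallest candidate that yields at most `target` buckets."""
    span = end - start
    for name, size in CANDIDATES:
        if ceil(span / size) <= target:
            return name
    # Fall back to the largest candidate if nothing fits.
    return CANDIDATES[-1][0]

start, end = datetime(1985, 1, 1), datetime(1986, 1, 1)
print(pick_interval(20, start, end))   # month
print(pick_interval(100, start, end))  # week
```

With these assumed candidates, a target of 20 over one year rules out weekly buckets (53 would be needed) and settles on months, which matches the result shown above.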

Combine BUCKET with an aggregation to create a histogram:

```esql
FROM employees
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
| STATS hires_per_month = COUNT(*) BY month = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
| SORT month
```

| hires_per_month:long | month:date |
| --- | --- |
| 2 | 1985-02-01T00:00:00.000Z |
| 1 | 1985-05-01T00:00:00.000Z |
| 1 | 1985-07-01T00:00:00.000Z |
| 1 | 1985-09-01T00:00:00.000Z |
| 2 | 1985-10-01T00:00:00.000Z |
| 4 | 1985-11-01T00:00:00.000Z |

::::{note}
`BUCKET` does not create buckets that don’t match any documents. That’s why this example is missing `1985-03-01` and other dates.
::::
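If a dense histogram is needed, one option is to fill the gaps on the client after the query returns. The following Python sketch is illustrative only: `fill_months` is a hypothetical helper, not part of any Elasticsearch client, and the rows are copied from the monthly histogram result above.

```python
from datetime import date

# Rows as returned by the monthly histogram query: (hires_per_month, month).
rows = [
    (2, date(1985, 2, 1)),
    (1, date(1985, 5, 1)),
    (1, date(1985, 7, 1)),
    (1, date(1985, 9, 1)),
    (2, date(1985, 10, 1)),
    (4, date(1985, 11, 1)),
]

def fill_months(rows, year):
    """Return one (count, month) pair per calendar month, using 0 for gaps."""
    counts = {month: count for count, month in rows}
    months = [date(year, m, 1) for m in range(1, 13)]
    return [(counts.get(month, 0), month) for month in months]

dense = fill_months(rows, 1985)
print(dense[2])  # (0, datetime.date(1985, 3, 1))
```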

Asking for more buckets can result in a smaller bucket size. For example, asking for at most 100 buckets in a year results in weekly buckets:

```esql
FROM employees
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 100, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
| SORT week
```

| hires_per_week:long | week:date |
| --- | --- |
| 2 | 1985-02-18T00:00:00.000Z |
| 1 | 1985-05-13T00:00:00.000Z |
| 1 | 1985-07-08T00:00:00.000Z |
| 1 | 1985-09-16T00:00:00.000Z |
| 2 | 1985-10-14T00:00:00.000Z |
| 4 | 1985-11-18T00:00:00.000Z |

::::{note}
`BUCKET` does not filter any rows. It only uses the provided range to pick a good bucket size. For rows with a value outside of the range, it returns a bucket value that corresponds to a bucket outside the range. Combine `BUCKET` with `WHERE` to filter rows.
::::

If the desired bucket size is known in advance, simply provide it as the second argument, leaving the range out:

```esql
FROM employees
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
| STATS hires_per_week = COUNT(*) BY week = BUCKET(hire_date, 1 week)
| SORT week
```

| hires_per_week:long | week:date |
| --- | --- |
| 2 | 1985-02-18T00:00:00.000Z |
| 1 | 1985-05-13T00:00:00.000Z |
| 1 | 1985-07-08T00:00:00.000Z |
| 1 | 1985-09-16T00:00:00.000Z |
| 2 | 1985-10-14T00:00:00.000Z |
| 4 | 1985-11-18T00:00:00.000Z |

::::{note}
When the bucket size is provided as the second parameter for a date field, it must be a time duration or date period.
::::
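Every `week` value in the results above falls on a Monday. A minimal Python sketch of that alignment follows, assuming weeks start on Monday and all timestamps are in UTC. This is an inference from the output shown, not a statement about the implementation, and `week_start` is a hypothetical helper. The input dates are the distinct hire dates from the first example, so the sketch reproduces the bucket keys but not the counts.

```python
from datetime import date, timedelta

def week_start(d):
    """Truncate a date to the Monday of its week (date.weekday() is 0 for Monday)."""
    return d - timedelta(days=d.weekday())

# Distinct hire dates taken from the first example's output.
hire_dates = [
    date(1985, 2, 18), date(1985, 2, 24), date(1985, 5, 13),
    date(1985, 7, 9), date(1985, 9, 17), date(1985, 10, 14),
    date(1985, 10, 20), date(1985, 11, 19), date(1985, 11, 20),
    date(1985, 11, 21),
]

# Collect the distinct week buckets in ascending order.
weeks = sorted({week_start(d) for d in hire_dates})
for w in weeks:
    print(w.isoformat())
```

The six printed dates are the same six bucket keys that appear in the `week` column.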

BUCKET can also operate on numeric fields. For example, to create a salary histogram:

```esql
FROM employees
| STATS COUNT(*) by bs = BUCKET(salary, 20, 25324, 74999)
| SORT bs
```

| COUNT(*):long | bs:double |
| --- | --- |
| 9 | 25000.0 |
| 9 | 30000.0 |
| 18 | 35000.0 |
| 11 | 40000.0 |
| 11 | 45000.0 |
| 10 | 50000.0 |
| 7 | 55000.0 |
| 9 | 60000.0 |
| 8 | 65000.0 |
| 8 | 70000.0 |

Unlike the earlier example, which intentionally filters on a date range, you rarely want to filter on a numeric range. You have to find the `min` and `max` separately. {{esql}} doesn’t yet have an easy way to do that automatically.
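The salary histogram above uses buckets that are 5000 wide. The Python sketch below shows one rounding rule that reproduces that size from the arguments `20, 25324, 74999`. The rule (round the exact size up to the next power of ten, or to half of it) is an assumption chosen because it matches this example; this page does not specify how `BUCKET` picks numeric sizes, and both helper names are hypothetical.

```python
import math

def pick_bucket_size(target, start, end):
    """Assumed rule: round the exact size up to the next power of ten, or half of it."""
    precise = (end - start) / target
    next_power_of_ten = 10.0 ** math.ceil(math.log10(precise))
    half = next_power_of_ten / 2
    return half if precise < half else next_power_of_ten

def bucket(value, size):
    """Return the lower bound of the bucket that contains `value`."""
    return math.floor(value / size) * size

size = pick_bucket_size(20, 25324, 74999)
print(size)                 # 5000.0
print(bucket(25324, size))  # 25000.0
print(bucket(74999, size))  # 70000.0
```

The lowest and highest bucket keys, 25000.0 and 70000.0, agree with the first and last rows of the result, and the ten resulting buckets stay within the target of 20.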

The range can be omitted if the desired bucket size is known in advance. Simply provide it as the second argument:

```esql
FROM employees
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
| STATS c = COUNT(1) BY b = BUCKET(salary, 5000.)
| SORT b
```

| c:long | b:double |
| --- | --- |
| 1 | 25000.0 |
| 1 | 30000.0 |
| 1 | 40000.0 |
| 2 | 45000.0 |
| 2 | 50000.0 |
| 1 | 55000.0 |
| 1 | 60000.0 |
| 1 | 65000.0 |
| 1 | 70000.0 |

Create hourly buckets for the last 24 hours, and calculate the number of events per hour:

```esql
FROM sample_data
| WHERE @timestamp >= NOW() - 1 day and @timestamp < NOW()
| STATS COUNT(*) BY bucket = BUCKET(@timestamp, 25, NOW() - 1 day, NOW())
```

| COUNT(*):long | bucket:date |
| --- | --- |

Create monthly buckets for the year 1985, and calculate the average salary by hiring month:

```esql
FROM employees
| WHERE hire_date >= "1985-01-01T00:00:00Z" AND hire_date < "1986-01-01T00:00:00Z"
| STATS AVG(salary) BY bucket = BUCKET(hire_date, 20, "1985-01-01T00:00:00Z", "1986-01-01T00:00:00Z")
| SORT bucket
```

| AVG(salary):double | bucket:date |
| --- | --- |
| 46305.0 | 1985-02-01T00:00:00.000Z |
| 44817.0 | 1985-05-01T00:00:00.000Z |
| 62405.0 | 1985-07-01T00:00:00.000Z |
| 49095.0 | 1985-09-01T00:00:00.000Z |
| 51532.0 | 1985-10-01T00:00:00.000Z |
| 54539.75 | 1985-11-01T00:00:00.000Z |

`BUCKET` may be used in both the aggregating and grouping parts of the `STATS ... BY ...` command, provided that in the aggregating part the function is either referenced by an alias defined in the grouping part or invoked with the exact same expression:

```esql
FROM employees
| STATS s1 = b1 + 1, s2 = BUCKET(salary / 1000 + 999, 50.) + 2 BY b1 = BUCKET(salary / 100 + 99, 50.), b2 = BUCKET(salary / 1000 + 999, 50.)
| SORT b1, b2
| KEEP s1, b1, s2, b2
```

| s1:double | b1:double | s2:double | b2:double |
| --- | --- | --- | --- |
| 351.0 | 350.0 | 1002.0 | 1000.0 |
| 401.0 | 400.0 | 1002.0 | 1000.0 |
| 451.0 | 450.0 | 1002.0 | 1000.0 |
| 501.0 | 500.0 | 1002.0 | 1000.0 |
| 551.0 | 550.0 | 1002.0 | 1000.0 |
| 601.0 | 600.0 | 1002.0 | 1000.0 |
| 601.0 | 600.0 | 1052.0 | 1050.0 |
| 651.0 | 650.0 | 1052.0 | 1050.0 |
| 701.0 | 700.0 | 1052.0 | 1050.0 |
| 751.0 | 750.0 | 1052.0 | 1050.0 |
| 801.0 | 800.0 | 1052.0 | 1050.0 |

Sometimes you need to change the start value of each bucket by a given duration (similar to date histogram aggregation’s offset parameter). To do so, you will need to take into account how the language handles expressions within the STATS command: if these contain functions or arithmetic operators, a virtual EVAL is inserted before and/or after the STATS command. Consequently, a double compensation is needed to adjust the bucketed date value before the aggregation and then again after. For instance, inserting a negative offset of 1 hour to buckets of 1 year looks like this:

```esql
FROM employees
| STATS dates = MV_SORT(VALUES(birth_date)) BY b = BUCKET(birth_date + 1 HOUR, 1 YEAR) - 1 HOUR
| EVAL d_count = MV_COUNT(dates)
| SORT d_count, b
| LIMIT 3
```

| dates:date | b:date | d_count:integer |
| --- | --- | --- |
| 1965-01-03T00:00:00.000Z | 1964-12-31T23:00:00.000Z | 1 |
| [1955-01-21T00:00:00.000Z, 1955-08-20T00:00:00.000Z, 1955-08-28T00:00:00.000Z, 1955-10-04T00:00:00.000Z] | 1954-12-31T23:00:00.000Z | 4 |
| [1957-04-04T00:00:00.000Z, 1957-05-23T00:00:00.000Z, 1957-05-25T00:00:00.000Z, 1957-12-03T00:00:00.000Z] | 1956-12-31T23:00:00.000Z | 4 |
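The double compensation in the query above can be traced step by step in Python. This is a sketch of the arithmetic only, using naive datetimes that are assumed to be UTC; `year_bucket` and `offset_year_bucket` are hypothetical helpers that stand in for `BUCKET(..., 1 YEAR)` and the surrounding `+ 1 HOUR` / `- 1 HOUR` adjustments.

```python
from datetime import datetime, timedelta

def year_bucket(ts):
    """Truncate a timestamp to the start of its calendar year."""
    return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)

def offset_year_bucket(ts, offset=timedelta(hours=1)):
    """Shift the value forward, truncate to the year, then shift the bucket start back."""
    return year_bucket(ts + offset) - offset

# A birth date from the result table lands in the bucket starting at 23:00 on 31 December.
print(offset_year_bucket(datetime(1965, 1, 3)))            # 1964-12-31 23:00:00
# A value in the last hour of 1964 already belongs to the same shifted bucket.
print(offset_year_bucket(datetime(1964, 12, 31, 23, 30)))  # 1964-12-31 23:00:00
```

Adding the offset before bucketing moves the boundary one hour earlier, and subtracting it afterwards makes the reported bucket start reflect that shifted boundary.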
