elasticsearch/docs/reference/esql/aggregation-functions/count-distinct.asciidoc
Nik Everett 82d67dc289 Docs for aggregation functions (ESQL-1268)
This adds docs for all of ESQL's aggregation functions. Hopefully from
here on out we can add the docs as we add new functions.

I've created a few tagged regions in the aggs docs themselves so we can
include them into the ESQL docs.

---------

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-06-14 09:23:34 -05:00


[[esql-agg-count-distinct]]
=== `COUNT_DISTINCT`
The approximate number of distinct values.
[source.merge.styled,esql]
----
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-result]
|===
`COUNT_DISTINCT` accepts any field type as input. The result is always a
`long`, no matter the input type.

==== Counts are approximate
Computing exact counts requires loading values into a set and returning its
size. This doesn't scale to high-cardinality sets or large values: the memory
required, together with the need to send those per-shard sets between nodes,
would consume too many cluster resources.

The `COUNT_DISTINCT` function is based on the
https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
algorithm, which estimates the count from hashes of the values and has some
interesting properties:

include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]
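
To make the hash-based idea more concrete, here is a minimal Python sketch of
plain HyperLogLog. It is an illustration only: it is not the Elasticsearch
implementation and it omits the refinements that HyperLogLog++ adds. The class
name `HyperLogLogSketch`, the parameter `p` (number of index bits), and the use
of SHA-1 as the hash function are all assumptions made for this example.

```python
import hashlib
import math


class HyperLogLogSketch:
    """Plain HyperLogLog (not the ++ variant), for illustration only."""

    def __init__(self, p=12):
        # The alpha formula below is only valid for m >= 128, i.e. p >= 7.
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p                     # number of hash bits used as register index
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m  # memory use is fixed by p, not by the data
        # Bias-correction constant from the original HyperLogLog paper.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        # Hash the value and keep 64 bits (SHA-1 is an assumption of this sketch).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                   # top p bits select a register
        rest = h & ((1 << width) - 1)      # remaining bits
        # 1-based position of the leftmost 1-bit in `rest`
        # (width + 1 when `rest` is zero).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        # Harmonic-mean estimator over all registers.
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


if __name__ == "__main__":
    values = [f"user-{i}" for i in range(100_000)]
    sketch = HyperLogLogSketch(p=12)
    for v in values:
        sketch.add(v)
    print("exact (set-based):", len(set(values)))
    print("approximate (sketch):", sketch.estimate())
```

The exact count needs a set that grows with the number of distinct values,
while the sketch keeps a fixed array of `2^p` small registers. Adding the same
value again never changes a register, which is why duplicates don't affect the
estimate.
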

==== Precision is configurable
The `COUNT_DISTINCT` function takes an optional second parameter to configure the
precision discussed previously.
[source.merge.styled,esql]
----
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision-result]
|===