[discrete] [[esql-agg-count-distinct]] === `COUNT_DISTINCT` *Syntax* [source,esql] ---- COUNT_DISTINCT(expression[, precision_threshold]) ---- *Parameters* `expression`:: Expression that outputs the values on which to perform a distinct count. `precision_threshold`:: Precision threshold. Refer to <>. The maximum supported value is 40000. Thresholds above this number will have the same effect as a threshold of 40000. The default value is 3000. *Description* Returns the approximate number of distinct values. *Supported types* Can take any field type as input. *Examples* [source.merge.styled,esql] ---- include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct] ---- [%header.monospaced.styled,format=dsv,separator=|] |=== include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-result] |=== With the optional second parameter to configure the precision threshold: [source.merge.styled,esql] ---- include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision] ---- [%header.monospaced.styled,format=dsv,separator=|] |=== include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision-result] |=== The expression can use inline functions. This example splits a string into multiple values using the `SPLIT` function and counts the unique values: [source.merge.styled,esql] ---- include::{esql-specs}/stats_count_distinct.csv-spec[tag=docsCountDistinctWithExpression] ---- [%header.monospaced.styled,format=dsv,separator=|] |=== include::{esql-specs}/stats_count_distinct.csv-spec[tag=docsCountDistinctWithExpression-result] |=== [discrete] [[esql-agg-count-distinct-approximate]] ==== Counts are approximate Computing exact counts requires loading values into a set and returning its size. This doesn't scale when working on high-cardinality sets and/or large values as the required memory usage and the need to communicate those per-shard sets between nodes would utilize too many resources of the cluster. This `COUNT_DISTINCT` function is based on the https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++] algorithm, which counts based on the hashes of the values with some interesting properties: include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation] The `COUNT_DISTINCT` function takes an optional second parameter to configure the precision threshold. The precision_threshold options allows to trade memory for accuracy, and defines a unique count below which counts are expected to be close to accurate. Above this value, counts might become a bit more fuzzy. The maximum supported value is 40000, thresholds above this number will have the same effect as a threshold of 40000. The default value is `3000`.