mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-29 09:54:06 -04:00
* Break out 'Limitations' into separate page * Add REST API docs * Restructure commands, functions, and operators refs * Add placeholder for getting started guide * Group 'Syntax', 'Metafields', and 'MV fields' under 'Language' * Add placeholder for Kibana page * Add link from landing page * Apply uniform formatting to ACOS, CASE, and DATE_PARSE function refs * Reword default LIMIT * Add support for COUNT(*) * Move 'Commands' and 'Functions and operators' to individual pages --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
46 lines
1.5 KiB
Text
46 lines
1.5 KiB
Text
[discrete]
|
|
[[esql-agg-count-distinct]]
|
|
=== `COUNT_DISTINCT`
|
|
The approximate number of distinct values.
|
|
|
|
[source.merge.styled,esql]
|
|
----
|
|
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct]
|
|
----
|
|
[%header.monospaced.styled,format=dsv,separator=|]
|
|
|===
|
|
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-result]
|
|
|===
|
|
|
|
Can take any field type as input and the result is always a `long` not matter
|
|
the input type.
|
|
|
|
[discrete]
|
|
==== Counts are approximate
|
|
|
|
Computing exact counts requires loading values into a set and returning its
|
|
size. This doesn't scale when working on high-cardinality sets and/or large
|
|
values as the required memory usage and the need to communicate those
|
|
per-shard sets between nodes would utilize too many resources of the cluster.
|
|
|
|
This `COUNT_DISTINCT` function is based on the
|
|
https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
|
|
algorithm, which counts based on the hashes of the values with some interesting
|
|
properties:
|
|
|
|
include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]
|
|
|
|
[discrete]
|
|
==== Precision is configurable
|
|
|
|
The `COUNT_DISTINCT` function takes an optional second parameter to configure the
|
|
precision discussed previously.
|
|
|
|
[source.merge.styled,esql]
|
|
----
|
|
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision]
|
|
----
|
|
[%header.monospaced.styled,format=dsv,separator=|]
|
|
|===
|
|
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision-result]
|
|
|===
|