mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 15:47:23 -04:00
* Document nested expressions for stats * More docs * Apply suggestions from review - count-distinct.asciidoc - Content restructured, moving the section about approximate counts to end of doc. - count.asciidoc - Clarified that omitting the `expression` parameter in `COUNT` is equivalent to `COUNT(*)`, which counts the number of rows. - percentile.asciidoc - Moved the note about `PERCENTILE` being approximate and non-deterministic to end of doc. - stats.asciidoc - Clarified the `STATS` command - Added a note indicating that individual `null` values are skipped during aggregation * Comment out mentioning a buggy behavior * Update sum with inline function example, update test file * Fix typo * Delete line * Simplify wording * Fix conflict fix typo --------- Co-authored-by: Liam Thompson <leemthompo@gmail.com> Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
[discrete]
[[esql-agg-median-absolute-deviation]]
=== `MEDIAN_ABSOLUTE_DEVIATION`

*Syntax*

[source,esql]
----
MEDIAN_ABSOLUTE_DEVIATION(expression)
----

*Parameters*

`expression`::
Expression from which to return the median absolute deviation.

*Description*

Returns the median absolute deviation, a measure of variability. It is a robust
statistic, meaning that it is useful for describing data that may have outliers,
or may not be normally distributed. For such data it can be more descriptive
than standard deviation.

It is calculated as the median of each data point's deviation from the median of
the entire sample. That is, for a random variable `X`, the median absolute
deviation is `median(|median(X) - X|)`.

NOTE: Like <<esql-agg-percentile>>, `MEDIAN_ABSOLUTE_DEVIATION` is
<<esql-agg-percentile-approximate,usually approximate>>.

[WARNING]
====
`MEDIAN_ABSOLUTE_DEVIATION` is also {wikipedia}/Nondeterministic_algorithm[non-deterministic].
This means you can get slightly different results using the same data.
====
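
As a rough illustration of the definition `median(|median(X) - X|)`, the
following Python sketch computes the value exactly on a small in-memory sample.
It is not how ES|QL evaluates the aggregation, which is usually approximate.
The function name `median_absolute_deviation`, the handling of an empty sample,
and the sample values are assumptions made for this example.

```python
from statistics import median


def median_absolute_deviation(values):
    """Exact MAD: the median of absolute deviations from the sample median."""
    data = list(values)
    if not data:
        # Assumption for this sketch: an empty sample has no MAD.
        return None
    center = median(data)
    # Absolute distance of every data point from the sample median.
    return median(abs(center - x) for x in data)


# Hypothetical sample with one large outlier (100).
sample = [1, 2, 3, 4, 100]
mad = median_absolute_deviation(sample)  # the outlier barely affects the result
```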

*Example*

[source.merge.styled,esql]
----
include::{esql-specs}/stats_percentile.csv-spec[tag=median-absolute-deviation]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_percentile.csv-spec[tag=median-absolute-deviation-result]
|===

The expression can use inline functions. For example, to calculate the
median absolute deviation of the maximum values of a multivalued column, first
use `MV_MAX` to get the maximum value per row, and then use the result with the
`MEDIAN_ABSOLUTE_DEVIATION` function:

[source.merge.styled,esql]
----
include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsMADNestedExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsMADNestedExpression-result]
|===
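
The nested expression can be mimicked in plain Python to show the order of
operations. This is a sketch under assumptions, not the ES|QL implementation:
each inner list stands in for one row of a multivalued column, `max(row)` plays
the role of `MV_MAX`, every row is assumed to be non-empty, and the helper name
`mad_of_row_maxima` and the data are hypothetical.

```python
from statistics import median


def mad_of_row_maxima(rows):
    """Reduce each multivalued row to its maximum, then take the exact MAD."""
    # Step 1: one value per row, analogous to applying MV_MAX first.
    maxima = [max(row) for row in rows]
    # Step 2: median absolute deviation of those per-row maxima.
    center = median(maxima)
    return median(abs(center - m) for m in maxima)


# Hypothetical multivalued column: four rows with one or more values each.
rows = [[1, 5], [2], [3, 9, 4], [7, 8]]
result = mad_of_row_maxima(rows)  # per-row maxima are 5, 2, 9, 8
```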