elasticsearch/docs/reference/esql/functions/median-absolute-deviation.asciidoc
Abdon Pijpelink 980bc500b0
[DOCS] Support for nested functions in ES|QL STATS...BY (#104788)
* Document nested expressions for stats

* More docs

* Apply suggestions from review

- count-distinct.asciidoc
  - Content restructured, moving the section about approximate counts to end of doc.

- count.asciidoc
  - Clarified that omitting the `expression` parameter in `COUNT` is equivalent to `COUNT(*)`, which counts the number of rows.

- percentile.asciidoc
  - Moved the note about `PERCENTILE` being approximate and non-deterministic to end of doc.

- stats.asciidoc
  - Clarified the `STATS` command
  -  Added a note indicating that individual `null` values are skipped during aggregation

* Comment out mentioning a buggy behavior

* Update sum with inline function example, update test file

* Fix typo

* Delete line

* Simplify wording

* Fix conflict fix typo

---------

Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2024-01-30 19:29:12 +01:00


[discrete]
[[esql-agg-median-absolute-deviation]]
=== `MEDIAN_ABSOLUTE_DEVIATION`

*Syntax*

[source,esql]
----
MEDIAN_ABSOLUTE_DEVIATION(expression)
----
*Parameters*

`expression`::
Expression from which to return the median absolute deviation.

*Description*

Returns the median absolute deviation, a measure of variability. It is a robust
statistic, meaning that it is useful for describing data that may have outliers,
or may not be normally distributed. For such data, it can be more descriptive
than standard deviation.

It is calculated as the median of each data point's absolute deviation from the
median of the entire sample. That is, for a random variable `X`, the median
absolute deviation is `median(|median(X) - X|)`.

NOTE: Like <<esql-agg-percentile>>, `MEDIAN_ABSOLUTE_DEVIATION` is
<<esql-agg-percentile-approximate,usually approximate>>.
[WARNING]
====
`MEDIAN_ABSOLUTE_DEVIATION` is also {wikipedia}/Nondeterministic_algorithm[non-deterministic].
This means you can get slightly different results using the same data.
====
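To make the definition concrete, the following Python sketch computes an exact
median absolute deviation with the standard library. It illustrates the formula
`median(|median(X) - X|)` only and is not how Elasticsearch computes the value,
which is usually approximate. The function name and the sample values are
assumptions made up for this illustration.

```python
from statistics import median, pstdev


def median_absolute_deviation(values):
    """Exact MAD of a non-empty sequence: median(|median(X) - X|)."""
    center = median(values)
    return median(abs(center - v) for v in values)


# Hypothetical samples that differ only in their largest value.
without_outlier = [1, 2, 4, 7, 9]
with_outlier = [1, 2, 4, 7, 100]

# Both samples have median 4 and the same middle deviation, so the
# single extreme value does not change the MAD.
mad_without = median_absolute_deviation(without_outlier)
mad_with = median_absolute_deviation(with_outlier)

# The population standard deviation, by contrast, grows sharply
# once the outlier is present.
std_without = pstdev(without_outlier)
std_with = pstdev(with_outlier)

print(mad_without, mad_with)
print(round(std_without, 2), round(std_with, 2))
```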
*Example*

[source.merge.styled,esql]
----
include::{esql-specs}/stats_percentile.csv-spec[tag=median-absolute-deviation]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_percentile.csv-spec[tag=median-absolute-deviation-result]
|===
The expression can use inline functions. For example, to calculate the
median absolute deviation of the maximum values of a multivalued column, first
use `MV_MAX` to get the maximum value per row, and then use the result with the
`MEDIAN_ABSOLUTE_DEVIATION` function:
[source.merge.styled,esql]
----
include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsMADNestedExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_percentile.csv-spec[tag=docsStatsMADNestedExpression-result]
|===
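The two steps of the nested expression can be mimicked outside ES|QL. The
Python sketch below first reduces each row of a multivalued column to its
maximum, as `MV_MAX` does, and then takes an exact median absolute deviation of
those maxima. The function name and the row data are hypothetical, and the
sketch ignores `null` handling and the approximation used by the real
aggregation.

```python
from statistics import median


def mad_of_row_maxima(rows):
    """Take the maximum of each row, then the exact MAD of those maxima."""
    # Step 1: one value per row, analogous to applying MV_MAX to the column.
    maxima = [max(row) for row in rows]
    # Step 2: median(|median(X) - X|) over the per-row maxima.
    center = median(maxima)
    return median(abs(center - m) for m in maxima)


# Hypothetical multivalued column: each inner list is one row's values.
rows = [[1, 5], [3], [2, 9, 4], [8, 6], [7]]

# Per-row maxima are 5, 3, 9, 8, 7 with median 7; the absolute
# deviations from 7 are 2, 4, 2, 1, 0, whose median is 2.
result = mad_of_row_maxima(rows)
print(result)
```

Reducing each row to a single value before aggregating means every row
contributes exactly one data point to the statistic, regardless of how many
values the row holds.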