mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-29 09:54:06 -04:00
This replaces the `script` docs for bucket aggregations with runtime
fields. We expect runtime fields to be nicer to work with because you
can also fetch them or filter on them. We expect them to be faster
because they don't need this sort of `instanceof` tree:
a92a647b9f/server/src/main/java/org/elasticsearch/search/aggregations/support/values/ScriptDoubleValues.java (L42)
Relates to #69291
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
142 lines
4.4 KiB
Text
[[search-aggregations-matrix-stats-aggregation]]
=== Matrix stats aggregation
++++
<titleabbrev>Matrix stats</titleabbrev>
++++

The `matrix_stats` aggregation is a numeric aggregation that computes the following statistics over a set of document fields:

[horizontal]
`count`:: Number of per-field samples included in the calculation.
`mean`:: The average value for each field.
`variance`:: Per-field measurement of how spread out the samples are from the mean.
`skewness`:: Per-field measurement quantifying the asymmetric distribution around the mean.
`kurtosis`:: Per-field measurement quantifying the shape of the distribution.
`covariance`:: A matrix that quantitatively describes how changes in one field are associated with another.
`correlation`:: The covariance matrix scaled to a range of -1 to 1, inclusive. Describes the relationship between field
distributions.

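As a rough illustration of what these statistics measure, the following Python sketch computes them for two fields using the three sample `poverty` and `income` values from this page's test setup. The helper name `describe` is hypothetical, and the choice of estimators (sample variance and covariance with an `n - 1` divisor, moment-based skewness, and non-excess kurtosis) is an assumption made for illustration; this page does not state which estimators Elasticsearch uses.

```python
import math


def describe(xs, ys):
    """Summarize xs, plus its covariance and correlation with ys.

    Hypothetical helper. The estimators are assumptions: sample
    variance/covariance (n - 1 divisor) and moment-based skewness
    and non-excess kurtosis. Requires at least two distinct values
    in each list, otherwise the divisions below fail.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of powers of deviations from the mean.
    m2 = sum((x - mean_x) ** 2 for x in xs)
    m3 = sum((x - mean_x) ** 3 for x in xs)
    m4 = sum((x - mean_x) ** 4 for x in xs)
    m2_y = sum((y - mean_y) ** 2 for y in ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    return {
        "count": n,
        "mean": mean_x,
        "variance": m2 / (n - 1),
        "skewness": math.sqrt(n) * m3 / m2 ** 1.5,
        "kurtosis": n * m4 / m2 ** 2,
        "covariance": cross / (n - 1),
        # Covariance scaled by both spreads, so it falls in [-1, 1].
        "correlation": cross / math.sqrt(m2 * m2_y),
    }


poverty = [24.0, 13.0, 69.0]
income = [50000.0, 95687.0, 7890.0]
stats = describe(poverty, income)
```

With these inputs the correlation is negative, because higher `poverty` values pair with lower `income` values. The covariance of a field with itself equals that field's variance, which is why the diagonal of the covariance matrix repeats the `variance` values.
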
IMPORTANT: Unlike other metric aggregations, the `matrix_stats` aggregation does
not support scripting.

//////////////////////////

[source,js]
--------------------------------------------------
PUT /statistics/_doc/0
{"poverty": 24.0, "income": 50000.0}

PUT /statistics/_doc/1
{"poverty": 13.0, "income": 95687.0}

PUT /statistics/_doc/2
{"poverty": 69.0, "income": 7890.0}

POST /_refresh
--------------------------------------------------
// NOTCONSOLE
// TESTSETUP

//////////////////////////

The following example demonstrates the use of matrix stats to describe the relationship between income and poverty.

[source,console,id=stats-aggregation-example]
--------------------------------------------------
GET /_search
{
  "aggs": {
    "statistics": {
      "matrix_stats": {
        "fields": [ "poverty", "income" ]
      }
    }
  }
}
--------------------------------------------------
// TEST[s/_search/_search\?filter_path=aggregations/]

The aggregation type is `matrix_stats` and the `fields` setting defines the set of fields (as an array) for computing
the statistics. The above request returns the following response:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "statistics": {
      "doc_count": 50,
      "fields": [ {
          "name": "income",
          "count": 50,
          "mean": 51985.1,
          "variance": 7.383377037755103E7,
          "skewness": 0.5595114003506483,
          "kurtosis": 2.5692365287787124,
          "covariance": {
            "income": 7.383377037755103E7,
            "poverty": -21093.65836734694
          },
          "correlation": {
            "income": 1.0,
            "poverty": -0.8352655256272504
          }
        }, {
          "name": "poverty",
          "count": 50,
          "mean": 12.732000000000001,
          "variance": 8.637730612244896,
          "skewness": 0.4516049811903419,
          "kurtosis": 2.8615929677997767,
          "covariance": {
            "income": -21093.65836734694,
            "poverty": 8.637730612244896
          },
          "correlation": {
            "income": -0.8352655256272504,
            "poverty": 1.0
          }
        } ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\.//]
// TESTRESPONSE[s/: (\-)?[0-9\.E]+/: $body.$_path/]

The `doc_count` field indicates the number of documents involved in the computation of the statistics.

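Because each entry in `fields` carries its own row of the `covariance` and `correlation` matrices, a client typically reassembles the full matrix from the response. The following Python sketch shows one way to do that for a response that has already been parsed into a dictionary. The helper name `correlation_matrix` is hypothetical, and the `response` literal is a trimmed copy of the example response with only the keys the helper reads.

```python
def correlation_matrix(response, agg_name):
    """Build {field: {other_field: correlation}} from a parsed response.

    Hypothetical helper. It assumes the response layout from the example:
    aggregations -> <agg_name> -> fields -> [{name, correlation, ...}].
    """
    fields = response["aggregations"][agg_name]["fields"]
    return {field["name"]: dict(field["correlation"]) for field in fields}


# Trimmed copy of the example response.
response = {
    "aggregations": {
        "statistics": {
            "doc_count": 50,
            "fields": [
                {
                    "name": "income",
                    "correlation": {
                        "income": 1.0,
                        "poverty": -0.8352655256272504,
                    },
                },
                {
                    "name": "poverty",
                    "correlation": {
                        "income": -0.8352655256272504,
                        "poverty": 1.0,
                    },
                },
            ],
        }
    }
}

matrix = correlation_matrix(response, "statistics")
```

Here `matrix["income"]["poverty"]` and `matrix["poverty"]["income"]` hold the same value, since the correlation matrix is symmetric.
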
==== Multi Value Fields

The `matrix_stats` aggregation treats each document field as an independent sample. The `mode` parameter controls which
value the aggregation uses for array or multi-valued fields. This parameter can take one of the following:

[horizontal]
`avg`:: (default) Use the average of all values.
`min`:: Pick the lowest value.
`max`:: Pick the highest value.
`sum`:: Use the sum of all values.
`median`:: Use the median of all values.

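The effect of each `mode` can be pictured as collapsing a multi-valued field into a single sample before the statistics are computed. The following Python sketch mimics that reduction. The helper name `reduce_values` is hypothetical and this is not Elasticsearch's implementation. The example uses an odd number of values because the handling of an even-sized median is not specified on this page.

```python
import statistics

# Map each documented mode name to a plain Python reduction.
MODES = {
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
    "median": statistics.median,
}


def reduce_values(values, mode="avg"):
    """Collapse a non-empty list of field values into one sample.

    Hypothetical helper that illustrates the documented modes.
    """
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode}")
    return MODES[mode](values)


values = [1.0, 2.0, 6.0]
samples = {mode: reduce_values(values, mode) for mode in MODES}
```

For the values `[1.0, 2.0, 6.0]`, `avg` yields `3.0`, `min` yields `1.0`, `max` yields `6.0`, `sum` yields `9.0`, and `median` yields `2.0`.
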
==== Missing Values

The `missing` parameter defines how documents that are missing a value should be treated.
By default, they are ignored, but you can also treat them as if they had a value.
To do so, add a set of `fieldname : value` mappings that specify a default value for each field.

[source,console,id=stats-aggregation-missing-example]
--------------------------------------------------
GET /_search
{
  "aggs": {
    "matrixstats": {
      "matrix_stats": {
        "fields": [ "poverty", "income" ],
        "missing": { "income": 50000 }      <1>
      }
    }
  }
}
--------------------------------------------------

<1> Documents without a value in the `income` field will have the default value `50000`.

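The following Python sketch illustrates the behavior described above on plain dictionaries: a document that lacks a field receives that field's default if one is configured and is skipped otherwise. The helper name `apply_missing` and the sample documents are hypothetical. Skipping the whole document when any requested field has neither a value nor a default is an assumption about how "ignored" applies.

```python
def apply_missing(docs, fields, missing):
    """Return one sample row per usable document.

    Hypothetical helper. A field absent from a document takes its
    default from `missing`. Documents that still lack a value for any
    requested field are skipped (assumed meaning of "ignored").
    """
    samples = []
    for doc in docs:
        row = {}
        for field in fields:
            if field in doc:
                row[field] = doc[field]
            elif field in missing:
                row[field] = missing[field]
            else:
                row = None  # no value and no default: skip the document
                break
        if row is not None:
            samples.append(row)
    return samples


docs = [
    {"poverty": 24.0, "income": 50000.0},
    {"poverty": 13.0},    # no income: receives the default
    {"income": 7890.0},   # no poverty and no default: skipped
]
samples = apply_missing(docs, ["poverty", "income"], {"income": 50000})
```

In this sketch the second document contributes an `income` sample of `50000`, while the third document contributes nothing because `poverty` has no default.
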