elasticsearch/docs/reference/aggregations/bucket/adjacency-matrix-aggregation.asciidoc

152 lines
4.5 KiB
Text

[[search-aggregations-bucket-adjacency-matrix-aggregation]]
=== Adjacency matrix aggregation
++++
<titleabbrev>Adjacency matrix</titleabbrev>
++++
A bucket aggregation returning a form of {wikipedia}/Adjacency_matrix[adjacency matrix].
The request provides a collection of named filter expressions, similar to the `filters` aggregation
request.
Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.
Given filters named `A`, `B` and `C` the response would return buckets with the following names:
[options="header"]
|=======================
| h|A h|B h|C
h|A |A |A&B |A&C
h|B | |B |B&C
h|C | | |C
|=======================
The intersecting buckets e.g `A&C` are labelled using a combination of the two filter names with a default separator
of `&`. Note that the response does not also include a `C&A` bucket as this would be the
same set of documents as `A&C`. The matrix is said to be _symmetric_ so we only return half of it. To do this we sort
the filter name strings and always use the lowest of a pair as the value to the left of the separator.
[[adjacency-matrix-agg-ex]]
==== Example
The following `interactions` aggregation uses `adjacency_matrix` to determine
which groups of individuals exchanged emails.
[source,console,id=adjacency-matrix-aggregation-example]
--------------------------------------------------
PUT emails/_bulk?refresh
{ "index" : { "_id" : 1 } }
{ "accounts" : ["hillary", "sidney"]}
{ "index" : { "_id" : 2 } }
{ "accounts" : ["hillary", "donald"]}
{ "index" : { "_id" : 3 } }
{ "accounts" : ["vladimir", "donald"]}
GET emails/_search
{
"size": 0,
"aggs" : {
"interactions" : {
"adjacency_matrix" : {
"filters" : {
"grpA" : { "terms" : { "accounts" : ["hillary", "sidney"] }},
"grpB" : { "terms" : { "accounts" : ["donald", "mitt"] }},
"grpC" : { "terms" : { "accounts" : ["vladimir", "nigel"] }}
}
}
}
}
}
--------------------------------------------------
The response contains buckets with document counts for each filter and
combination of filters. Buckets with no matching documents are excluded from the
response.
[source,console-result]
--------------------------------------------------
{
"took": 9,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"interactions": {
"buckets": [
{
"key":"grpA",
"doc_count": 2
},
{
"key":"grpA&grpB",
"doc_count": 1
},
{
"key":"grpB",
"doc_count": 2
},
{
"key":"grpB&grpC",
"doc_count": 1
},
{
"key":"grpC",
"doc_count": 1
}
]
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 9/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
[role="child_attributes"]
[[adjacency-matrix-agg-params]]
==== Parameters
`filters`::
(Required, object)
Filters used to create buckets.
+
.Properties of `filters`
[%collapsible%open]
====
`<filter>`::
(Required, <<query-dsl,Query DSL object>>)
Query used to filter documents. The key is the filter name.
+
At least one filter is required. The total number of filters cannot exceed the
<<indices-query-bool-max-clause-count,`indices.query.bool.max_clause_count`>>
setting. See <<adjacency-matrix-agg-filter-limits>>.
====
`separator`::
(Optional, string)
Separator used to concatenate filter names. Defaults to `&`.
[[adjacency-matrix-agg-response]]
==== Response body
`key`::
(string)
Filters for the bucket. If the bucket uses multiple filters, filter names are
concatenated using a `separator`.
`document_count`::
(integer)
Number of documents matching the bucket's filters.
[[adjacency-matrix-agg-usage]]
==== Usage
On its own this aggregation can provide all of the data required to create an undirected weighted graph.
However, when used with child aggregations such as a `date_histogram` the results can provide the
additional levels of data required to perform {wikipedia}/Dynamic_network_analysis[dynamic network analysis]
where examining interactions _over time_ becomes important.
[[adjacency-matrix-agg-filter-limits]]
==== Filter limits
For N filters the matrix of buckets produced can be N²/2 which can be costly.
The circuit breaker settings prevent results producing too many buckets and to avoid excessive disk seeks
the `indices.query.bool.max_clause_count` setting is used to limit the number of filters.