elasticsearch/docs/reference/images/aggregations
Benjamin Trent b592d2bf01
New random_sampler aggregation for sampling documents in aggregations (#84363)
This adds a new sampling aggregation that performs a background sampling over all documents in an index. 

The syntax is as follows:
```
{
  "aggregations": {
    "sampling": {
      "random_sampler": {
        "probability": 0.1
      },
      "aggs": {
        "price_percentiles": {
          "percentiles": {
            "field": "taxful_total_price"
          }
        }
      }
    }
  }
}
```

This aggregation provides fast random sampling over the entire document set in order to speed up costly aggregations.

Testing this over a variety of aggregations and data sets, the median speed up when sampling at `0.001` over millions of documents is around 70X speed improvement.

Relative error rate does rely on the size of the data and the aggregation kind. Here are some typically expected numbers when sampling over 10s of millions of documents. `p` is the configured probability and `n` is the number of documents matched by your provided filter query.
2022-03-02 14:32:30 -05:00
..
random-sampler-agg-graph.png New random_sampler aggregation for sampling documents in aggregations (#84363) 2022-03-02 14:32:30 -05:00
relative-error-vs-doc-count.png New random_sampler aggregation for sampling documents in aggregations (#84363) 2022-03-02 14:32:30 -05:00