This replaces the `script` docs for bucket aggregations with runtime
fields. We expect runtime fields to be nicer to work with because you
can also fetch them or filter on them. We expect them to be faster
because their don't need this sort of `instanceof` tree:
a92a647b9f/server/src/main/java/org/elasticsearch/search/aggregations/support/values/ScriptDoubleValues.java (L42)
Relates to #69291
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
This commit allows for composite aggregations in datafeeds.
Composite aggs provide a much better solution for having influencers, partitions, etc. on high volume data. Instead of worrying about long scrolls in the datafeed, the calculation is distributed across cluster via the aggregations.
The restrictions for this support are as follows:
- The composite aggregation must have EXACTLY one `date_histogram` source
- The sub-aggs of the composite aggregation must have a `max` aggregation on the SAME timefield as the aforementioned `date_histogram` source
- The composite agg must be the ONLY top level agg and it cannot have a `composite` or `date_histogram` sub-agg
- If using a `date_histogram` to bucket time, it cannot have a `composite` sub-agg.
- The top-level `composite` agg cannot have a sibling pipeline agg. Pipeline aggregations are supported as a sub-agg (thus a pipeline agg INSIDE the bucket).
Some key user interaction differences:
- Speed + resources used by the cluster should be controlled by the `size` parameter in the `composite` aggregation. Previously, we said if you are using aggs, use a specific `chunking_config`. But, with composite, that is not necessary.
- Users really shouldn't use nested `terms` aggs anylonger. While this is still a "valid" configuration and MAY be desirable for some users (only wanting the top 10 of certain terms), typically when users want influencers, partition fields, etc. they want the ENTIRE population. Previously, this really wasn't possible with aggs, with `composite` it is.
- I cannot really think of a typical usecase that SHOULD ever use a multi-bucket aggregation that is NOT supported by composite.
This adds a heading for `shard_min_doc_count` and merges the paragraphs
for them. I wanted to link to this section earlier today and it wasn't a
"real" section so I couldn't.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
We expect runtime fields to perform a little better than our "native"
aggregation script so we should point folks to them instead of the
"native" aggregation script.
Adds a multi_terms aggregation support. The multi terms aggregation works
very similarly to the terms aggregation but supports multiple terms. The goal
of this PR is to add the basic functionality so it is not optimized at the
moment. It will be done in follow up PRs.
Closes#65623
Its been several months and we haven't bumped into any good reason to
rework the variable width histogram. So let's drop experimental from it!
Closes#58573
In some cases when the rate aggregation is not a child of a date histogram
aggregation, it is not possible to determine the actual size of the date
histogram bucket. In this case the rate aggregation now throws an exception.
Closes#63703
Previously, geo_shape support was only mentioned in a dedicated x-pack
section. This may be misleading, as the introductory paragraph only
mentions geo_point.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
A metric aggregation that aggregates a set of points as
a GeoJSON LineString ordered by some sort parameter.
#### specifics
A `geo_line` aggregation request would specify a `geo_point` field, as well
as a `sort` field. `geo_point` represents the values used in the LineString,
while the `sort` values will be used as the total ordering of the points.
the `sort` field would support any numeric field, including date.
#### sample usage
```
{
"query": {
"bool": {
"must": [
{ "term": { "person": "004" } },
{ "term": { "trajectory": "20090131002206.plt" } }
]
}
},
"aggs": {
"make_line": {
"geo_line": {
"point": {"field": "location"},
"sort": { "field": "timestamp" },
"include_sort": true,
"sort_order": "desc",
"size": 15
}
}
}
}
```
#### sample response
```
{
"took": 21,
"timed_out": false,
"_shards": {...},
"hits": {...},
"aggregations": {
"make_line": {
"type": "LineString",
"coordinates": [
[
121.52926194481552,
38.92878997139633
],
[
121.52922699227929,
38.92876998055726
],
]
}
}
}
```
#### visual response
<img width="540" alt="Screen Shot 2019-04-26 at 9 40 07 AM" src="https://user-images.githubusercontent.com/388837/56834977-cf278e00-6827-11e9-9c93-005ed48433cc.png">
#### limitations
Due to the cardinality of points, an initial max of 10k points
will be used. This should support many use-cases.
One solution to overcome this limitation is to keep a PriorityQueue of
points, and simplifying the line once it hits this max. If simplifying
makes sense, it may be a nice option, in general. The ability to use a parameter
to specify how aggressive one wants to simplify. This parameter could be
the number of points. Example algorithm one could use with a PriorityQueue:
https://bost.ocks.org/mike/simplify/. This would still require O(m) space, where m
is the number of points returned. And would also require heapifying triangles
sorted by their areas, which would be O(log(m)) operations. Since sorting is done,
anyways, simplifying would still be a O(n log(m)) operation, where n is the total number
of points to filter........... something to explore
closes#41649
* Clarify that field data cache includes global ordinals
* Describe that the cache should be cleared once the limit is reached
* Clarify that the `_id` field does not supported aggregations anymore
* Fold the `fielddata` mapping parameter page into the `text field docs
* Improve cross-linking
- Replaces more abstract docs about object structure and values source with task-based examples.
- Relocates several sections from the current `misc.asciidoc` file.
- Alphabetically sorts agg categories in the nav.
- Removes the matrix agg family. Moves the stats matrix agg under the metric agg family
Co-authored-by: debadair <debadair@elastic.co>
* Allow mixing set-based and regexp-based include and exclude
* Coding style
* Disallow having both set and regexp include (resp. exclude)
* Test correctness of every combination of include/exclude
This PR adds support for the 'fields' option in the following places:
* Anytime `inner_hits` is used, for both fetching nested/ child docs and field collapsing
* The `top_hits` aggregation
Addresses #61949.