Commit graph

14 commits

Author SHA1 Message Date
James Rodewig
67f113314d
[DOCS] Fix acasting for agg types (#67469) 2021-01-13 14:44:54 -05:00
Tal Levy
b514d9bf2e
Add geo_line aggregation (#41612)
A metric aggregation that aggregates a set of points as 
a GeoJSON LineString ordered by some sort parameter.

#### specifics

A `geo_line` aggregation request would specify a `geo_point` field, as well
as a `sort` field. `geo_point` represents the values used in the LineString, 
while the `sort` values will be used as the total ordering of the points.

the `sort` field would support any numeric field, including date.

#### sample usage

```
{
	"query": {
		"bool": {
			"must": [
				{ "term": { "person": "004" } },
				{ "term": { "trajectory": "20090131002206.plt" } }
			]
		}
	},
	"aggs": {
		"make_line": {
			"geo_line": {
				"point": {"field": "location"},
				"sort": { "field": "timestamp" },
                                "include_sort": true,
                                "sort_order": "desc",
                                "size": 15
			}
		}
	}
}
```

#### sample response

```
{
    "took": 21,
    "timed_out": false,
    "_shards": {...},
    "hits": {...},
    "aggregations": {
        "make_line": {
            "type": "LineString",
            "coordinates": [
                [
                    121.52926194481552,
                    38.92878997139633
                ],
                [
                    121.52922699227929,
                    38.92876998055726
                ],
             ]
        }
    }
}
```

#### visual response

<img width="540" alt="Screen Shot 2019-04-26 at 9 40 07 AM" src="https://user-images.githubusercontent.com/388837/56834977-cf278e00-6827-11e9-9c93-005ed48433cc.png">

#### limitations

Due to the cardinality of points, an initial max of 10k points 
will be used. This should support many use-cases.

One solution to overcome this limitation is to keep a PriorityQueue of
points, and simplifying the line once it hits this max. If simplifying
makes sense, it may be a nice option, in general. The ability to use a parameter
to specify how aggressive one wants to simplify. This parameter could be 
the number of points. Example algorithm one could use with a PriorityQueue:
https://bost.ocks.org/mike/simplify/. This would still require O(m) space, where m
is the number of points returned. And would also require heapifying triangles
sorted by their areas, which would be O(log(m)) operations. Since sorting is done, 
anyways, simplifying would still be a O(n log(m)) operation, where n is the total number 
of points to filter........... something to explore


closes #41649
2020-11-23 10:26:27 -08:00
James Rodewig
2e9f95aa73
[DOCS] Change agg titles to sentence case (#64425) 2020-10-30 13:25:21 -04:00
James Rodewig
37b6adaf91
[DOCS] Rewrite aggs overview (#64318)
- Replaces more abstract docs about object structure and values source with task-based examples.
- Relocates several sections from the current `misc.asciidoc` file.
- Alphabetically sorts agg categories in the nav.
- Removes the matrix agg family. Moves the stats matrix agg under the metric agg family

Co-authored-by: debadair <debadair@elastic.co>
2020-10-30 08:39:38 -04:00
Igor Motov
f107dba741
Add rate aggregation (#61369)
Adds a new rate aggregation that can calculate a document rate for buckets
of a date_histogram.

Closes #60674
2020-08-25 11:32:20 -04:00
Gil Raphaelli
f29c9ff652 [DOCS] Sort metric and pipeline agg docs (#56613) 2020-05-15 16:34:47 -04:00
Igor Motov
5fc9fc528d
Add Student's t-test aggregation support (#54469)
Adds t_test metric aggregation that can perform paired and unpaired two-sample
t-tests. In this PR support for filters in unpaired is still missing. It will
be added in a follow-up PR.

Relates to #53692
2020-04-03 11:31:13 -04:00
Nik Everett
5b2266601b
Implement top_metrics agg (#51155)
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.

At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.

Co-Authored by: Zachary Tong <zach@elastic.co>
2020-02-14 07:13:52 -05:00
Igor Motov
0898df4aac
Add histogram field type support to boxplot aggs (#52265)
Add support for the histogram field type to boxplot aggs.

Closes #52233
Relates to #33112
2020-02-13 08:59:44 -05:00
Christos Soulios
b0e12c936b
Implement stats aggregation for string terms (#47468)
This PR adds a new metric aggregation called string_stats that operates on string terms of a document and returns the following:

    min_length: The length of the shortest term
    max_length: The length of the longest term
    avg_length: The average length of all terms
    distribution: The probability distribution of all characters appearing in all terms
    entropy: The total Shannon entropy value calculated for all terms

This aggregation has been implemented as an analytics plugin.
2019-11-14 16:07:54 +02:00
Andy Bristol
b8280ea7cc
median absolute deviation agg (#34482)
This commit adds a new single value metric aggregation that calculates
the statistic called median absolute deviation, which is a measure of
variability that works on more types of data than standard deviation

Our calculation of MAD is approximated using t-digests. In the collect
phase, we collect each value visited into a t-digest. In the reduce
phase, we merge all value t-digests, then create a t-digest of
deviations using the first t-digest's median and centroids
2018-10-30 07:22:52 -07:00
Zachary Tong
6ba144ae31
Add WeightedAvg metric aggregation (#31037)
Adds a new single-value metrics aggregation that computes the weighted 
average of numeric values that are extracted from the aggregated 
documents. These values can be extracted from specific numeric
fields in the documents.

When calculating a regular average, each datapoint has an equal "weight"; it
contributes equally to the final value.  In contrast, weighted averages
scale each datapoint differently.  The amount that each datapoint contributes 
to the final value is extracted from the document, or provided by a script.

As a formula, a weighted average is the `∑(value * weight) / ∑(weight)`

A regular average can be thought of as a weighted average where every value has
an implicit weight of `1`.

Closes #15731
2018-07-23 18:33:15 -04:00
Nicholas Knize
b31d3ddd3e Adds geo_centroid metric aggregator
This commit adds a new metric aggregator for computing the geo_centroid over a set of geo_point fields. This can be combined with other aggregators (e.g., geohash_grid, significant_terms) for computing the geospatial centroid based on the document sets from other aggregation results.
2015-10-14 16:19:09 -05:00
Zachary Tong
e3ae1df6f0 [DOCS] Restructure Aggs documentation 2015-05-01 16:04:55 -04:00