elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-30 10:23:41 -04:00

Author	SHA1	Message	Date
Benjamin Trent	7f4df1632f	[7.x] Add support for range aggregations on histogram mapped fields (#74146 ) (#74682 ) * Add support for range aggregations on histogram mapped fields (#74146) This adds support for the range aggregation over `histogram` mapped fields. Decisions made for implementation: - Sub-aggregations are not allowed. This is to simplify implementation and follows the prior art set by the `histogram` aggregation - Nothing fancy is done with the ranges. No filter translations as we cannot easily do a `range` filter query against histogram fields. This may be an optimization in the future. - Ranges check the histogram value ONLY. No interpolation of values is done. If we have better statistics around the histogram this MAY be possible.	2021-06-29 08:45:51 -04:00
James Rodewig	0a5d4e740c	[DOCS] Deduplicate docs for `search.max_buckets`	2021-06-29 08:42:42 -04:00
Nik Everett	fc52651f0d	Document types `terms` agg can consume (#73272 ) (#74258 ) Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>	2021-06-17 15:12:40 -04:00
Igor Motov	d1e6e93544	Add keep_values gap policy (#73297 ) (#73927 ) Adds a new keep_values gap policy that works like skip, except if the metric calculated on an empty bucket provides a non-null non-NaN value, this value is used for the bucket. Fixes #27377 Co-authored-by: Mark Tozzi <mark.tozzi@gmail.com>	2021-06-08 13:37:34 -10:00
James Rodewig	9ec8d4c5aa	[DOCS] Clarify supported fields for `top_metrics` agg (#73907 ) (#73916 ) Changes: * Notes `metrics.field` supports `boolean` fields and runtime fields. * Notes `metrics.field` doesn't support array values. Closes #72889	2021-06-08 13:30:41 -04:00
James Rodewig	51bb214bfa	[DOCS] Make doc_count error docs more searchable (#73870 ) (#73902 ) Changes: * Combines the `Document counts are approximate` and `Calculating document count error` sections. * Rewrites the section to include `sum_other_doc_count` and `doc_count_error_upper_bound` for easier on-page (ctrl+f) searching. Closes #73200	2021-06-08 09:50:10 -04:00
Mark Tozzi	47d3d6a6d4	Docvalueformat errors (#73121 ) (#73863 ) Improve the error message when inconsistent mappings cause doc value formatting errors. For example, trying to format a binary encoded IP address as a UTF8 string often fails with something unexpected, like `ArrayIndexOutOfBounds`. This change catches that and wraps it with a message suggesting the user check their mappings. Also gets rid of anonymous instances for doc value formatters, which made it hard to see what format was failing to be applied.	2021-06-07 16:11:38 -04:00
Benjamin Trent	3392358f71	[ML] adding new KS test pipeline aggregation (#73334 ) (#73782 ) This adds a new pipeline aggregation for calculating the Kolmogorov–Smirnov test for a given sample and buckets path. For now, the buckets path resolution needs to be `_count`, but this may be relaxed in the future. It accepts a parameter `fractions` that indicates the distribution of documents from some other pre-calculated sample. This particular version of the K-S test is two-sample, meaning that it calculates whether the `fractions` and the distribution of `_count` values in the buckets_path are taken from the same distribution. This, in combination with the hypothesis alternatives (`less`, `greater`, `two_sided`) and sampling logic (`upper_tail`, `lower_tail`, `uniform`), allows for flexibility and usefulness when comparing two samples and determining the likelihood of them being from the same overall distribution. Usage: ``` POST correlate_latency/_search?size=0&filter_path=aggregations { "aggs": { "buckets": { "terms": { <1> "field": "version", "size": 2 }, "aggs": { "latency_ranges": { "range": { <2> "field": "latency", "ranges": [ { "to": 0.0 }, { "from": 0, "to": 105 }, { "from": 105, "to": 225 }, { "from": 225, "to": 445 }, { "from": 445, "to": 665 }, { "from": 665, "to": 885 }, { "from": 885, "to": 1115 }, { "from": 1115, "to": 1335 }, { "from": 1335, "to": 1555 }, { "from": 1555, "to": 1775 }, { "from": 1775 } ] } }, "ks_test": { <3> "bucket_count_ks_test": { "buckets_path": "latency_ranges>_count", "alternative": ["less", "greater", "two_sided"] } } } } } } ``` Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2021-06-07 11:42:20 -04:00
Nik Everett	70e7946e7e	More debugging info for significant_text (backport of #72727 ) (#72895 ) Adds some extra debugging information to make it clear that you are running `significant_text`. Also adds some timing information around the `_source` fetch and the `terms` accumulation. This lets you calculate a third useful timing number: the analysis time. It is `collect_ns - fetch_ns - accumulation_ns`. This also adds a half dozen extra REST tests to get a fairly comprehensive set of the operations this supports. It doesn't cover all of the significance heuristic parsing, but it's certainly much better than what we had.	2021-05-11 08:20:25 -04:00
Benjamin Trent	374f995e4e	[7.x] [ML] add new bucket_correlation aggregation with initial count_correlation function (#72133 ) (#72896 ) * [ML] add new bucket_correlation aggregation with initial count_correlation function (#72133) This commit adds a new pipeline aggregation that allows correlation within the aggregation framework in bucketed values. The initial function is a `count_correlation` function, whose purpose is to correlate the count in a consistent number of buckets with a pre-calculated indicator. The indicator and the aggregated buckets should relate to the same metrics within documents. Example for correlating terms within a `service.version.keyword` with latency percentiles. The percentiles and provided correlation indicator both refer to the same source data where the indicator was previously calculated: ``` GET apm-7.12.0-transaction-generated/_search { "size": 0, "aggs": { "field_terms": { "terms": { "field": "service.version.keyword", "size": 20 }, "aggs": { "latency_range": { "range": { "field": "transaction.duration.us", "ranges": [<snip>], "keyed": true } }, "correlation": { "bucket_correlation": { "buckets_path": "latency_range>_count", "count_correlation": { "indicator": { "expectations": [<snip>], "doc_count": 20000 } } } } } } } } ```	2021-05-10 14:34:21 -04:00
Nik Everett	9a9950e9f2	Update docs for `filter` agg (backport of #72508 ) (#72828 ) The docs for the `filter` agg seemed to suggest that it was the preferred way to filter results for aggs, but it's really mostly for when you need to filter things under another bucketing agg. Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>	2021-05-06 15:07:41 -04:00
Ignacio Vera	c6aab5ffcc	[GeoPoint] Grid aggregations with bounds should exclude touching tiles (#72493 ) (#72520 )	2021-04-30 09:51:33 +02:00
James Rodewig	d84cac0590	[DOCS] Fix typos (#72227 ) (#72256 ) Co-authored-by: Pierre Grimaud <grimaud.pierre@gmail.com>	2021-04-26 14:18:27 -04:00
Nik Everett	121ecb959d	Convert metric aggs docs runtime fields (backport of #71260 ) (#71298 ) This replaces the `script` docs for bucket aggregations with runtime fields. We expect runtime fields to be nicer to work with because you can also fetch them or filter on them. We expect them to be faster because they don't need this sort of `instanceof` tree: `a92a647b9f/server/src/main/java/org/elasticsearch/search/aggregations/support/values/ScriptDoubleValues.java (L42)` Relates to #69291 Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com> Co-authored-by: Adam Locke <adam.locke@elastic.co>	2021-04-05 13:24:19 -04:00
Nik Everett	1b35100ab0	Convert bucket aggs docs to runtime fields (backport #71202 ) (#71248 ) This replaces the `script` docs for bucket aggregations with runtime fields. We expect runtime fields to be nicer to work with because you can also fetch them or filter on them. We expect them to be faster because they don't need this sort of `instanceof` tree: `a92a647b9f/server/src/main/java/org/elasticsearch/search/aggregations/support/values/ScriptDoubleValues.java (L42)` Relates to #69291 Co-authored-by: Adam Locke <adam.locke@elastic.co>	2021-04-02 12:40:19 -04:00
James Rodewig	c757f9e4e7	[DOCS] Fix double spaces (#71082 ) (#71120 )	2021-03-31 11:43:34 -04:00
Benjamin Trent	abb182d95c	[7.x] [ML] adding support for composite aggs in anomaly detection (#69970 ) (#71052 ) * [ML] adding support for composite aggs in anomaly detection (#69970) This commit allows for composite aggregations in datafeeds. Composite aggs provide a much better solution for having influencers, partitions, etc. on high volume data. Instead of worrying about long scrolls in the datafeed, the calculation is distributed across the cluster via the aggregations. The restrictions for this support are as follows: - The composite aggregation must have EXACTLY one `date_histogram` source - The sub-aggs of the composite aggregation must have a `max` aggregation on the SAME timefield as the aforementioned `date_histogram` source - The composite agg must be the ONLY top level agg and it cannot have a `composite` or `date_histogram` sub-agg - If using a `date_histogram` to bucket time, it cannot have a `composite` sub-agg. - The top-level `composite` agg cannot have a sibling pipeline agg. Pipeline aggregations are supported as a sub-agg (thus a pipeline agg INSIDE the bucket). Some key user interaction differences: - Speed + resources used by the cluster should be controlled by the `size` parameter in the `composite` aggregation. Previously, we said if you are using aggs, use a specific `chunking_config`. But, with composite, that is not necessary. - Users really shouldn't use nested `terms` aggs any longer. While this is still a "valid" configuration and MAY be desirable for some users (only wanting the top 10 of certain terms), typically when users want influencers, partition fields, etc. they want the ENTIRE population. Previously, this really wasn't possible with aggs; with `composite` it is. - I cannot really think of a typical use case that SHOULD ever use a multi-bucket aggregation that is NOT supported by composite.	2021-03-30 12:04:54 -04:00
István Zoltán Szabó	591e93397a	[DOCS] Removes beta labels from DFA related docs. (#70808 ) (#70902 )	2021-03-26 10:25:36 +01:00
Nik Everett	05c5ec00f1	Docs: Clean doc for agg parameter (backport of #70675 ) (#70841 ) This adds a heading for `shard_min_doc_count` and merges the paragraphs for them. I wanted to link to this section earlier today and it wasn't a "real" section so I couldn't. Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>	2021-03-24 16:38:44 -04:00
Ignacio Vera	81da10f2e6	Increase search.max_bucket by one (#70645 ) (#70706 )	2021-03-23 10:20:25 +01:00
James Rodewig	896d4f0d13	[DOCS] Reformat adjacency matrix agg reference (#70034 ) (#70101 )	2021-03-08 13:15:13 -05:00
James Rodewig	c7d2cfb920	[DOCS] Fix gap policy xref	2021-03-03 09:31:26 -05:00
James Rodewig	d495b49cc3	[DOCS] Reformat avg bucket agg reference (#69751 ) (#69830 )	2021-03-02 15:16:10 -05:00
Nik Everett	aef2567496	Docs: Switch terms agg scripting to runtime fields (backport of #69628 ) (#69821 ) We expect runtime fields to perform a little better than our "native" aggregation script so we should point folks to them instead of the "native" aggregation script.	2021-03-02 11:54:23 -05:00
James Rodewig	8313701dc0	[DOCS] Update example for `serial_diff` agg (#69635 ) (#69694 ) Co-authored-by: RomainGeffraye <romain.geffraye@elastic.co>	2021-03-01 08:53:29 -05:00
Lisa Cawley	1430e52669	[DOCS] Adds model alias to inference processor and agg (#69576 ) (#69577 )	2021-02-24 16:49:45 -08:00
Igor Motov	a140161f53	Clarify the intended use case for multi_terms aggs (#69397 ) (#69484 ) This PR clarifies when multi_terms aggs should be used instead of composite aggs or nested term aggs. Relates to #65623	2021-02-23 16:00:30 -05:00
Nik Everett	ec9c9a884b	Docs: Add example fetching keyword in top_metrics (backport of #69135 ) (#69141 ) Adds an example of fetching a keyword field.	2021-02-17 14:30:08 -05:00
James Rodewig	b55249507e	[DOCS] Fix typos for duplicate words (#69125 ) (#69132 )	2021-02-17 11:16:58 -05:00
James Rodewig	59f9f41cf2	[DOCS] Add missing newline for bulleted list in top_metrics docs (#68481 ) (#68551 ) Co-authored-by: Nathan L Smith <nathan.smith@elastic.co>	2021-02-04 14:49:09 -05:00
Igor Motov	a0604825c6	[7.x] Add multi_terms aggs (#67597 ) (#68490 ) Adds multi_terms aggregation support. The multi terms aggregation works very similarly to the terms aggregation but supports multiple terms. The goal of this PR is to add the basic functionality, so it is not optimized at the moment. Optimization will be done in follow-up PRs. Closes #65623	2021-02-04 11:19:25 -05:00
James Rodewig	f4f5c7c227	[DOCS] Fix casing for agg type titles (#67469 ) (#67470 )	2021-01-13 15:04:08 -05:00
Adam Locke	0324892ed5	[DOCS] Adding headers in TOC for aggregation docs. (#66604 ) (#66607 )	2020-12-18 12:00:11 -05:00
James Rodewig	e4bf2afd58	[DOCS] Fix `search.max_buckets` default (#66311 ) (#66312 )	2020-12-15 08:17:50 -05:00
Nik Everett	d13c4b3f4b	Drop experimental from variable width histogram (backport of #66055 ) (#66060 ) It's been several months and we haven't bumped into any good reason to rework the variable width histogram. So let's drop experimental from it! Closes #58573	2020-12-08 14:38:00 -05:00
James Rodewig	24cc2139c7	[DOCS] Fix typo in histogram agg docs (#65822 ) (#65827 )	2020-12-03 10:53:09 -05:00
Igor Motov	de3ee05b33	Return an error when a rate aggregation cannot calculate bucket sizes (#65429 ) (#65502 ) In some cases when the rate aggregation is not a child of a date histogram aggregation, it is not possible to determine the actual size of the date histogram bucket. In this case the rate aggregation now throws an exception. Closes #63703	2020-11-25 12:27:08 -05:00
Tal Levy	0e6280ae3e	Add mention of geo_shape support in geotile and geohash grid agg docs (#61129 ) Previously, geo_shape support was only mentioned in a dedicated x-pack section. This may be misleading, as the introductory paragraph only mentions geo_point. Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>	2020-11-24 13:58:29 -08:00
Tal Levy	cd7d1c9183	Add geo_line aggregation (#41612 ) (#65442 ) A metric aggregation that aggregates a set of points as a GeoJSON LineString ordered by some sort parameter. A `geo_line` aggregation request would specify a `geo_point` field, as well as a `sort` field. `geo_point` represents the values used in the LineString, while the `sort` values will be used as the total ordering of the points. The `sort` field would support any numeric field, including date. ``` { "query": { "bool": { "must": [ { "term": { "person": "004" } }, { "term": { "trajectory": "20090131002206.plt" } } ] } }, "aggs": { "make_line": { "geo_line": { "point": {"field": "location"}, "sort": { "field": "timestamp" }, "include_sort": true, "sort_order": "desc", "size": 15 } } } } ``` ``` { "took": 21, "timed_out": false, "_shards": {...}, "hits": {...}, "aggregations": { "make_line": { "type": "LineString", "coordinates": [ [ 121.52926194481552, 38.92878997139633 ], [ 121.52922699227929, 38.92876998055726 ], ] } } } ``` Due to the cardinality of points, an initial max of 10k points will be used. This should support many use-cases. One solution to overcome this limitation is to keep a PriorityQueue of points, and simplify the line once it hits this max. If simplifying makes sense, it may be a nice option in general, with a parameter to specify how aggressively one wants to simplify. This parameter could be the number of points. Example algorithm one could use with a PriorityQueue: https://bost.ocks.org/mike/simplify/. This would still require O(m) space, where m is the number of points returned. It would also require heapifying triangles sorted by their areas, which would be O(log(m)) operations. Since sorting is done anyway, simplifying would still be an O(n log(m)) operation, where n is the total number of points to filter. Something to explore. Closes #41649	2020-11-24 09:30:05 -08:00
Wylie Conlon	4d9f5b1867	Clarify field data cache behavior in docs (#64375 ) * Clarify that field data cache includes global ordinals * Describe that the cache should be cleared once the limit is reached * Clarify that the `_id` field does not support aggregations anymore * Fold the `fielddata` mapping parameter page into the `text` field docs * Improve cross-linking	2020-11-20 13:56:02 -08:00
Adam Locke	8530eaaf98	Explicitly defining types for sources parameter (#65006 ) (#65021 )	2020-11-12 17:08:33 -05:00
Mark Tozzi	491a5a08f3	[7.x] Add support for upper and lower values on boxplot based on the IQR value (#63617 ) (#64611 ) * Add support for upper and lower values on boxplot based on the IQR value (#63617) * fix List.of usage	2020-11-05 09:18:27 -05:00
James Rodewig	354602e798	[DOCS] Change agg titles to sentence case (#64425 ) (#64430 )	2020-10-30 13:45:54 -04:00
James Rodewig	5b1700b660	[DOCS] Rewrite aggs overview (#64318 ) (#64409 ) - Replaces more abstract docs about object structure and values source with task-based examples. - Relocates several sections from the current `misc.asciidoc` file. - Alphabetically sorts agg categories in the nav. - Removes the matrix agg family. Moves the stats matrix agg under the metric agg family Co-authored-by: debadair <debadair@elastic.co>	2020-10-30 09:29:26 -04:00
Mark Tozzi	51916aa677	[7.x] Allow mixing set-based and regexp-based include and exclude (#63325 ) (#64014 ) * Allow mixing set-based and regexp-based include and exclude (#63325) Co-authored-by: Hugo Chargois <hugo.chargois@free.fr>	2020-10-27 10:11:24 -04:00
István Zoltán Szabó	b822e582c3	[DOCS] Changes experimental flag to beta in DFA related docs (#63992 ) (#64176 )	2020-10-26 18:04:21 +01:00
Igor Motov	5ebe90daa0	Add value_count mode to rate agg (#63687 ) (#63847 ) Adds a new value count mode to the rate aggregation. Closes #63575	2020-10-19 13:04:38 -04:00
Aref Razavi	6f7d0d7018	Remove useless parentheses in bucket_key formula (#63868 )	2020-10-19 11:53:09 +02:00
Igor Motov	3bfb11a32a	Add support for histogram fields to rate aggregation (#63289 ) (#63511 ) The rate aggregation now supports histogram fields. At the moment only sum is supported. Closes #62939	2020-10-08 19:16:34 -04:00
Benjamin Trent	cfcf973259	[7.x] [ML] renames /inference apis to /trained_models (#63097 ) (#63136 ) * [ML] renames /inference apis to /trained_models (#63097) This commit renames all `inference` CRUD APIs to `trained_models`. This aligns with internal terminology, documentation, and use-cases.	2020-10-02 07:34:28 -04:00
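
The "histogram value ONLY, no interpolation" rule from the range-aggregation commit (7f4df1632f) can be illustrated with a small Python sketch. This is not the Elasticsearch implementation: `range_counts` and the sample data are hypothetical, and the sketch assumes that each histogram document carries parallel `values` and `counts` arrays, that a range is `from`-inclusive and `to`-exclusive, and that a bucket's count is the sum of the counts whose value falls in the range.

```python
from typing import List, Optional, Sequence, Tuple

# A histogram document: parallel lists of values and their counts (assumed layout).
Histogram = Tuple[Sequence[float], Sequence[int]]
# A range: (from, to); None means unbounded. Assumed from-inclusive, to-exclusive.
Range = Tuple[Optional[float], Optional[float]]


def range_counts(histograms: Sequence[Histogram], ranges: Sequence[Range]) -> List[int]:
    """Sum histogram counts per range, matching on the stored value only."""
    totals = [0] * len(ranges)
    for values, counts in histograms:
        for value, count in zip(values, counts):
            for i, (lo, hi) in enumerate(ranges):
                # No interpolation: a count is attributed to a range only if
                # the stored value itself lies inside [lo, hi).
                if (lo is None or value >= lo) and (hi is None or value < hi):
                    totals[i] += count
    return totals


# Hypothetical sample data: two histogram documents.
docs = [
    ([0.1, 0.2, 0.3, 0.4, 0.5], [3, 7, 23, 12, 6]),
    ([0.1, 0.25, 0.35, 0.4, 0.45, 0.5], [8, 17, 8, 7, 6, 2]),
]
ranges = [(None, 0.25), (0.25, 0.26), (0.26, 0.29), (0.29, None)]

print(range_counts(docs, ranges))  # [18, 17, 0, 64]
```

The third range falls between two stored values (0.25 and 0.3), so it receives a count of zero even though the neighbouring values carry counts. That is the practical consequence of not interpolating.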


475 commits