elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-30 18:33:26 -04:00

Author	SHA1	Message	Date
Benjamin Trent	7f4df1632f	[7.x] Add support for range aggregations on histogram mapped fields (#74146 ) (#74682 ) * Add support for range aggregations on histogram mapped fields (#74146) This adds support for the range aggregation over `histogram` mapped fields. Decisions made for implementation: - Sub-aggregations are not allowed. This is to simplify implementation and follows the prior art set by the `histogram` aggregation - Nothing fancy is done with the ranges. No filter translations as we cannot easily do a `range` filter query against histogram fields. This may be an optimization in the future. - Ranges check the histogram value ONLY. No interpolation of values is done. If we have better statistics around the histogram this MAY be possible.	2021-06-29 08:45:51 -04:00
Nik Everett	fc52651f0d	Document types `terms` agg can consume (#73272 ) (#74258 ) Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>	2021-06-17 15:12:40 -04:00
James Rodewig	51bb214bfa	[DOCS] Make doc_count error docs more searchable (#73870 ) (#73902 ) Changes: * Combines the `Document counts are approximate` and `Calculating document count error` sections. * Rewrites the section to include `sum_other_doc_count` and `doc_count_error_upper_bound` for easier on-page (ctrl+f) searching. Closes #73200	2021-06-08 09:50:10 -04:00
Mark Tozzi	47d3d6a6d4	Docvalueformat errors (#73121 ) (#73863 ) Improve the error message when inconsistent mappings cause doc value formatting errors. For example, trying to format a binary encoded IP address as a UTF8 string often fails with something unexpected, like `ArrayIndexOutOfBounds`. This change catches that and wraps it with a message suggesting the user check their mappings. Also gets rid of anonymous instances for doc value formatters, which made it hard to see what format was failing to be applied.	2021-06-07 16:11:38 -04:00
Nik Everett	70e7946e7e	More debugging info for significant_text (backport of #72727 ) (#72895 ) Adds some extra debugging information to make it clear that you are running `significant_text`. Also adds some timing information around the `_source` fetch and the `terms` accumulation. This lets you calculate a third useful timing number: the analysis time. It is `collect_ns - fetch_ns - accumulation_ns`. This also adds a half dozen extra REST tests to get a fairly comprehensive set of the operations this supports. It doesn't cover all of the significance heuristic parsing, but it's certainly much better than what we had.	2021-05-11 08:20:25 -04:00
Nik Everett	9a9950e9f2	Update docs for `filter` agg (backport of #72508 ) (#72828 ) The docs for the `filter` agg seemed to suggest that it was the preferred way to filter results for aggs, but it's really mostly for when you need to filter things under another bucketing agg. Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>	2021-05-06 15:07:41 -04:00
Ignacio Vera	c6aab5ffcc	[GeoPoint] Grid aggregations with bounds should exclude touching tiles (#72493 ) (#72520 )	2021-04-30 09:51:33 +02:00
Nik Everett	1b35100ab0	Convert bucket aggs docs to runtime fields (backport #71202 ) (#71248 ) This replaces the `script` docs for bucket aggregations with runtime fields. We expect runtime fields to be nicer to work with because you can also fetch them or filter on them. We expect them to be faster because they don't need this sort of `instanceof` tree: `a92a647b9f/server/src/main/java/org/elasticsearch/search/aggregations/support/values/ScriptDoubleValues.java (L42)` Relates to #69291 Co-authored-by: Adam Locke <adam.locke@elastic.co>	2021-04-02 12:40:19 -04:00
James Rodewig	c757f9e4e7	[DOCS] Fix double spaces (#71082 ) (#71120 )	2021-03-31 11:43:34 -04:00
Benjamin Trent	abb182d95c	[7.x] [ML] adding support for composite aggs in anomaly detection (#69970 ) (#71052 ) * [ML] adding support for composite aggs in anomaly detection (#69970) This commit allows for composite aggregations in datafeeds. Composite aggs provide a much better solution for having influencers, partitions, etc. on high volume data. Instead of worrying about long scrolls in the datafeed, the calculation is distributed across the cluster via the aggregations. The restrictions for this support are as follows: - The composite aggregation must have EXACTLY one `date_histogram` source - The sub-aggs of the composite aggregation must have a `max` aggregation on the SAME timefield as the aforementioned `date_histogram` source - The composite agg must be the ONLY top level agg and it cannot have a `composite` or `date_histogram` sub-agg - If using a `date_histogram` to bucket time, it cannot have a `composite` sub-agg. - The top-level `composite` agg cannot have a sibling pipeline agg. Pipeline aggregations are supported as a sub-agg (thus a pipeline agg INSIDE the bucket). Some key user interaction differences: - Speed + resources used by the cluster should be controlled by the `size` parameter in the `composite` aggregation. Previously, we said if you are using aggs, use a specific `chunking_config`. But, with composite, that is not necessary. - Users really shouldn't use nested `terms` aggs any longer. While this is still a "valid" configuration and MAY be desirable for some users (only wanting the top 10 of certain terms), typically when users want influencers, partition fields, etc. they want the ENTIRE population. Previously, this really wasn't possible with aggs; with `composite` it is. - I cannot really think of a typical use case that SHOULD ever use a multi-bucket aggregation that is NOT supported by composite.	2021-03-30 12:04:54 -04:00
Nik Everett	05c5ec00f1	Docs: Clean doc for agg parameter (backport of #70675 ) (#70841 ) This adds a heading for `shard_min_doc_count` and merges the paragraphs for them. I wanted to link to this section earlier today and it wasn't a "real" section so I couldn't. Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>	2021-03-24 16:38:44 -04:00
James Rodewig	896d4f0d13	[DOCS] Reformat adjacency matrix agg reference (#70034 ) (#70101 )	2021-03-08 13:15:13 -05:00
Nik Everett	aef2567496	Docs: Switch terms agg scripting to runtime fields (backport of #69628 ) (#69821 ) We expect runtime fields to perform a little better than our "native" aggregation script so we should point folks to them instead of the "native" aggregation script.	2021-03-02 11:54:23 -05:00
Igor Motov	a140161f53	Clarify the intended use case for multi_terms aggs (#69397 ) (#69484 ) This PR clarifies when multi_terms aggs should be used instead of composite aggs or nested term aggs. Relates to #65623	2021-02-23 16:00:30 -05:00
James Rodewig	b55249507e	[DOCS] Fix typos for duplicate words (#69125 ) (#69132 )	2021-02-17 11:16:58 -05:00
Igor Motov	a0604825c6	[7.x] Add multi_terms aggs (#67597 ) (#68490 ) Adds a multi_terms aggregation support. The multi terms aggregation works very similarly to the terms aggregation but supports multiple terms. The goal of this PR is to add the basic functionality so it is not optimized at the moment. It will be done in follow up PRs. Closes #65623	2021-02-04 11:19:25 -05:00
Adam Locke	0324892ed5	[DOCS] Adding headers in TOC for aggregation docs. (#66604 ) (#66607 )	2020-12-18 12:00:11 -05:00
Nik Everett	d13c4b3f4b	Drop experimental from variable width histogram (backport of #66055 ) (#66060 ) It's been several months and we haven't bumped into any good reason to rework the variable width histogram. So let's drop experimental from it! Closes #58573	2020-12-08 14:38:00 -05:00
James Rodewig	24cc2139c7	[DOCS] Fix typo in histogram agg docs (#65822 ) (#65827 )	2020-12-03 10:53:09 -05:00
Tal Levy	0e6280ae3e	Add mention of geo_shape support in geotile and geohash grid agg docs (#61129 ) Previously, geo_shape support was only mentioned in a dedicated x-pack section. This may be misleading, as the introductory paragraph only mentions geo_point. Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>	2020-11-24 13:58:29 -08:00
Wylie Conlon	4d9f5b1867	Clarify field data cache behavior in docs (#64375 ) * Clarify that field data cache includes global ordinals * Describe that the cache should be cleared once the limit is reached * Clarify that the `_id` field does not support aggregations anymore * Fold the `fielddata` mapping parameter page into the `text` field docs * Improve cross-linking	2020-11-20 13:56:02 -08:00
Adam Locke	8530eaaf98	Explicitly defining types for sources parameter (#65006 ) (#65021 )	2020-11-12 17:08:33 -05:00
James Rodewig	354602e798	[DOCS] Change agg titles to sentence case (#64425 ) (#64430 )	2020-10-30 13:45:54 -04:00
Mark Tozzi	51916aa677	[7.x] Allow mixing set-based and regexp-based include and exclude (#63325 ) (#64014 ) * Allow mixing set-based and regexp-based include and exclude (#63325) Co-authored-by: Hugo Chargois <hugo.chargois@free.fr>	2020-10-27 10:11:24 -04:00
Aref Razavi	6f7d0d7018	Remove useless parentheses in bucket_key formula (#63868 )	2020-10-19 11:53:09 +02:00
Przemyslaw Gomulka	ee500c10b9	[doc] Rounding range query rules backport (#63109 ) (#63155 ) Documentation explaining the defaulting of missing fields when using the date math parser. Relates to #62268	2020-10-02 09:40:01 +02:00
James Rodewig	42437e4b29	[DOCS] Fix elasticsearch-croneval chunking (#63008 ) (#63009 )	2020-09-29 10:35:23 -04:00
Nik Everett	8a387d6df1	Redo experimental tag on vwh (#61065 ) The docs didn't have the standard experimental text. This adds it.	2020-08-18 10:02:26 -04:00
James Rodewig	60876a0e32	[DOCS] Replace Wikipedia links with attribute (#61171 ) (#61209 )	2020-08-17 11:27:04 -04:00
James Rodewig	cfa67e933f	[DOCS] Fix chunking in query docs (#61053 ) (#61054 ) Changes: * Moves "Notes" sections for the joining queries and percolate query pages to the parent page * Adds related redirects for the moved "Notes" pages * Assigns explicit anchor IDs to other "Notes" headings. This was required for the redirects to work.	2020-08-12 14:01:10 -04:00
Mark Tozzi	ab8518fb5b	[7.x] Extensibility for Composite Agg #59648 (#60842 )	2020-08-11 09:14:33 -04:00
James Rodewig	a21ec410c7	[DOCS] Replace `twitter` dataset in search/agg docs (#60667 ) (#60675 )	2020-08-04 14:16:38 -04:00
James Rodewig	5a2c6f0d4f	[DOCS] http -> https, remove outdated plugin docs (#60380 ) (#60545 ) Plugin discovery documentation contained information about installing Elasticsearch 2.0 and installing an Oracle JDK, both of which are no longer valid. Because the instructions used cleartext HTTP to install packages, this commit uses HTTPS links instead of HTTP where possible. In addition, a few community links have been removed, as they do not seem to exist anymore. Co-authored-by: Alexander Reelsen <alexander@reelsen.net>	2020-07-31 16:16:31 -04:00
James Rodewig	771e9f142a	[DOCS] Move search pagination content to one page (#60515 ) (#60525 )	2020-07-31 12:40:40 -04:00
James Rodewig	aba785cb6e	[DOCS] Update my-index examples (#60132 ) (#60248 ) Changes the following example index names to `my-index-000001` for consistency: * `my-index` * `my_index` * `myindex`	2020-07-27 15:58:26 -04:00
Howard	466e947b0e	[DOCS] Fix missing punctuation in agg docs (#59823 )	2020-07-21 10:19:29 -04:00
James Rodewig	ff8a042580	[DOCS] Reformat agg snippets to use two-space indents (#59912 ) (#59922 )	2020-07-20 15:59:00 -04:00
Igor Motov	96a5284484	Add hard_bounds documentation (#59809 ) (#59883 ) Fixes #59774	2020-07-20 10:51:23 -04:00
Nik Everett	514b2f3414	Clean up a few of vwh's rough edges (#59341 ) (#59807 ) This cleans up a few rough edges in the `variable_width_histogram`, mostly found by @wwang500: 1. Setting its tuning parameters in an unexpected order could cause the request to fail. 2. We checked that the maximum number of buckets was both less than 50000 and MAX_BUCKETS. This drops the 50000. 3. Fixes a divide by 0 that can occur if the `shard_size` is 1. 4. Fixes a divide by 0 that can occur if the `shard_size * 3` overflows a signed int. 5. Requires `shard_size * 3 / 4` to be at least `buckets`. If it is less than `buckets` we will very consistently return fewer buckets than requested. For the most part we expect folks to leave it at the default. If they change it, we expect it to be much bigger than `buckets`. 6. Allocate a smaller `mergeMap` when initially bucketing requests that don't use the entire `shard_size * 3 / 4`. It's just a waste. 7. Default `shard_size` to `10 * buckets` rather than `100`. It looks like that was our intention the whole time. And it feels like it'd keep the algorithm humming along more smoothly. 8. Default the `initial_buffer` to `min(10 * shard_size, 50000)` like we've documented it rather than `5000`. Like the point above, this feels like the right thing to do to keep the algorithm happy. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-17 15:16:09 -04:00
Adam Locke	aa260636e5	Indicating that the size parameter defaults to 10. (#59438 ) (#59461 )	2020-07-13 16:27:20 -04:00
Christos Soulios	3868bcc7b8	[7.x] Histogram integration on Histogram field type (#59431 ) Backports #58930 to 7.x Implements histogram aggregation over histogram fields as requested in #53285.	2020-07-13 19:36:33 +03:00
Nik Everett	eb169ae226	Fix lookup support in adjacency matrix (backport of #59099 ) (#59108 ) This request: ``` POST /_search { "aggs": { "a": { "adjacency_matrix": { "filters": { "1": { "terms": { "t": { "index": "lookup", "id": "1", "path": "t" } } } } } } } } ``` Would fail with a 500 error and a message like: ``` { "error": { "root_cause": [ { "type": "illegal_state_exception", "reason":"async actions are left after rewrite" } ] } } ``` This fixes that by moving the query rewrite phase from a synchronous call on the data nodes into the standard aggregation rewrite phase which can properly handle the asynchronous actions.	2020-07-07 10:28:20 -04:00
Nik Everett	40850a780d	Fail variable_width_histogram that collects from many (#58619 ) (#58780 ) Adds an explicit check to `variable_width_histogram` to stop it from trying to collect from many buckets because it can't. I tried to make it do so but that is more than an afternoon's project, sadly. So for now we just disallow it. Relates to #42035	2020-06-30 18:26:45 -04:00
Nik Everett	d22a242613	Docs: Mark variable_width_histogram experimental (#58574 ) We're tracking this aggregation's experimental-progress in #58573. We'd like a little time to be able to make backwards incompatible changes to the aggregation because we're not 100% sure about the request and response format yet.	2020-06-25 16:54:57 -04:00
Nik Everett	03e6d1b535	Add Variable Width Histogram Aggregation (backport of #42035 ) (#58440 ) Implements a new histogram aggregation called `variable_width_histogram` which dynamically determines bucket intervals based on document groupings. These groups are determined by running a one-pass clustering algorithm on each shard and then reducing each shard's clusters using an agglomerative clustering algorithm. This PR addresses #9572. The shard-level clustering is done in one pass to minimize memory overhead. The algorithm was lightly inspired by [this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches a small number of documents to sample the data and determine initial clusters. Subsequent documents are then placed into one of these clusters, or a new one if they are an outlier. This algorithm is described in more detail in the aggregation's docs. At reduce time, a [hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304) continually merges the closest buckets from all shards (based on their centroids) until the target number of buckets is reached. The final values produced by this aggregation are approximate. Each bucket's min value is used as its key in the histogram. Furthermore, buckets are merged based on their centroids and not their bounds. So it is possible that adjacent buckets will overlap after reduction. Because each bucket's key is its min, this overlap is not shown in the final histogram. However, when such overlap occurs, we set the key of the bucket with the larger centroid to the midpoint between its minimum and the smaller bucket’s maximum: `min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to increase the accuracy of the clustering. Nodes are unable to share centroids during the shard-level clustering phase. In the future, resolving https://github.com/elastic/elasticsearch/issues/50863 would let us solve this issue. It doesn’t make sense for this aggregation to support the `min_doc_count` parameter, since clusters are determined dynamically. The `order` parameter is not supported here to keep this large PR from becoming too complex. Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>	2020-06-25 11:40:47 -04:00
Tal Levy	11086d5c7d	add geo_shape documentation for supported aggregations (#58284 ) (#58354 ) This commit adds documentation for geo_shape fields in aggregations Closes #55495.	2020-06-18 12:36:24 -07:00
Benjamin Trent	1aea9d5f49	Adding transform docs for geotile_grid (#57000 ) (#57474 ) transforms and composite aggs support geotile_grid as a source. This adds documentation explaining that support.	2020-06-01 15:46:37 -04:00
Nik Everett	07c76f2894	Update date_histogram docs (#56922 ) (#57387 ) * Make it more clear that you can use `month` or `1M`. * Explain rounding rules * Consistently use "time zone" instead of "timezone". It looks like both are right but I see "time zone" much more. And the parameter in elasticsearch is `time_zone` so we may as well line up. Closes #56760 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-05-29 17:40:40 -04:00
Gabriel Petrovay	cb4d5f5042	Fixed calendar intervals documentation (#56666 ) - the 1-letter intervals are not parseable (`m`, `h`, `d`, `w`, `M`, `q`, `y`) - fixed formatting broken by new lines	2020-05-15 16:55:57 -04:00
Gabriel Petrovay	ca586f2a8d	[Docs] Correct formatting in datehistogram-aggregation.asciidoc (#56664 )	2020-05-13 12:01:42 +02:00

248 commits