Commit graph

437 commits

Author SHA1 Message Date
James Rodewig
8bc922512c
[DOCS] Redirect moving avg aggregation (#64435) 2020-10-30 14:12:09 -04:00
James Rodewig
2e9f95aa73
[DOCS] Change agg titles to sentence case (#64425) 2020-10-30 13:25:21 -04:00
James Rodewig
37b6adaf91
[DOCS] Rewrite aggs overview (#64318)
- Replaces more abstract docs about object structure and values source with task-based examples.
- Relocates several sections from the current `misc.asciidoc` file.
- Alphabetically sorts agg categories in the nav.
- Removes the matrix agg family. Moves the stats matrix agg under the metric agg family.

Co-authored-by: debadair <debadair@elastic.co>
2020-10-30 08:39:38 -04:00
István Zoltán Szabó
6093518f4a
[DOCS] Changes experimental flag to beta in DFA related docs (#63992) 2020-10-26 17:02:46 +01:00
Hugo Chargois
ff736f078b
Allow mixing set-based and regexp-based include and exclude (#63325)
* Allow mixing set-based and regexp-based include and exclude

* Coding style

* Disallow having both set and regexp include (resp. exclude)

* Test correctness of every combination of include/exclude
2020-10-21 10:26:42 -04:00
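A minimal sketch of the filtering rule described in the commit above, written in plain Python rather than taken from the Elasticsearch source. It assumes a term is kept when it passes the include filter (if one is given) and is not matched by the exclude filter. `filter_terms` and `TermFilter` are hypothetical names used only for illustration, and the regular expressions use Python's `re` syntax, not Lucene's. Each filter is either a set or a regexp, never both, which mirrors the "disallow having both" bullet.

```python
import re
from typing import Iterable, List, Optional, Set, Union

# A filter is either an exact set of terms or a regular expression string.
TermFilter = Union[Set[str], str]


def _matches(term: str, term_filter: TermFilter) -> bool:
    """Return True if the term matches a set-based or regexp-based filter."""
    if isinstance(term_filter, str):
        # Regexp-based filter: the whole term must match (assumption).
        return re.fullmatch(term_filter, term) is not None
    # Set-based filter: exact membership.
    return term in term_filter


def filter_terms(
    terms: Iterable[str],
    include: Optional[TermFilter] = None,
    exclude: Optional[TermFilter] = None,
) -> List[str]:
    """Keep terms that pass `include` (if set) and are not hit by `exclude`."""
    kept = []
    for term in terms:
        if include is not None and not _matches(term, include):
            continue
        if exclude is not None and _matches(term, exclude):
            continue
        kept.append(term)
    return kept


# Set-based include mixed with a regexp-based exclude.
print(filter_terms(
    ["mazda", "honda", "rover", "jensen"],
    include={"mazda", "honda", "rover"},
    exclude="r.*",
))
```

With these inputs, `rover` is dropped by the regexp exclude and `jensen` is dropped because it is not in the include set.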
Aref Razavi
245663e5b7
Remove useless parentheses in bucket_key formula (#63868) 2020-10-19 11:54:21 +02:00
Igor Motov
e6c70f6811
Add value_count mode to rate agg (#63687)
Adds a new value count mode to the rate aggregation.

Closes #63575
2020-10-15 18:00:44 -04:00
Igor Motov
34bff3f776
Add support for histogram fields to rate aggregation (#63289)
The rate aggregation now supports histogram fields. At the moment only sum
is supported. 

Closes #62939
2020-10-08 16:54:25 -04:00
Przemyslaw Gomulka
b38eaae47f
[doc] Rounding range query rules (#63109)
Adds documentation explaining how missing fields are defaulted when using the date math parser.
Relates to #62268
2020-10-02 08:59:27 +02:00
Benjamin Trent
1084aaf18a
[ML] renames */inference* apis to */trained_models* (#63097)
This commit renames all `inference` CRUD APIs to `trained_models`.

This aligns with internal terminology, documentation, and use-cases.
2020-10-01 12:13:49 -04:00
Lisa Cawley
ecf9e929ba
[DOCS] Add experimental tag to inference processor and bucket aggregation (#63023) 2020-09-30 07:20:38 -07:00
James Rodewig
277709004e
[DOCS] Fix elasticsearch-croneval chunking (#63008) 2020-09-29 09:53:20 -04:00
Christos Soulios
b857768bb5
Histogram field type support for min/max aggregations (#62532)
Implement min/max aggregations for histogram fields.

Closes #60951
2020-09-19 23:34:43 +03:00
Julie Tibshirani
f29c743a47
Support the 'fields' option in inner_hits and top_hits. (#62259)
This PR adds support for the 'fields' option in the following places:
* Anytime `inner_hits` is used, for both fetching nested/child docs and field collapsing
* The `top_hits` aggregation

Addresses #61949.
2020-09-14 10:08:58 -07:00
Igor Motov
f107dba741
Add rate aggregation (#61369)
Adds a new rate aggregation that can calculate a document rate for buckets
of a date_histogram.

Closes #60674
2020-08-25 11:32:20 -04:00
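As a rough illustration of what "a document rate for buckets of a date_histogram" can mean, the sketch below divides a bucket's document count by the number of rate units the bucket spans. This is an assumption made for illustration, not the aggregation's actual implementation; `document_rate` and its parameters are hypothetical names, and only fixed-length intervals expressed in seconds are considered.

```python
def document_rate(doc_count: int, bucket_seconds: float, unit_seconds: float) -> float:
    """Documents per rate unit for a single date_histogram bucket.

    Assumes the bucket covers a fixed span of `bucket_seconds` and the
    requested rate unit lasts `unit_seconds`.
    """
    if bucket_seconds <= 0 or unit_seconds <= 0:
        raise ValueError("interval lengths must be positive")
    units_per_bucket = bucket_seconds / unit_seconds
    return doc_count / units_per_bucket


DAY = 86_400  # seconds in a day

# 3000 documents in a 30-day bucket, reported as a daily rate.
print(document_rate(3000, bucket_seconds=30 * DAY, unit_seconds=DAY))
```

Under these assumptions, a 30-day bucket holding 3000 documents yields a rate of 100 documents per day.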
István Zoltán Szabó
8da6bba0fc
[DOCS] Adds example to the inference aggregation description (#61290) 2020-08-19 11:20:42 +02:00
Nik Everett
cebd5d47e2
Redo experimental tag on vwh (#61065)
The docs didn't have the standard experimental text. This adds it.
2020-08-18 10:00:54 -04:00
James Rodewig
456c37b186
[DOCS] Add usage tips to top_hits agg (#61215) 2020-08-17 12:42:04 -04:00
Adam Locke
fdc867e395
[DOCS] Update info about geo_shape bounding boxes (#61214)
* Adding information about geo_shape bounding boxes.

* Fixing cross link and incorporating review feedback.
2020-08-17 11:07:18 -04:00
James Rodewig
a94e5cb7c4
[DOCS] Replace Wikipedia links with attribute (#61171) 2020-08-17 09:44:24 -04:00
Gilad Gal
8534bd5ce7
Update normalize-aggregation.asciidoc
The second method normalizes linearly between 0..100
2020-08-12 22:24:36 +03:00
James Rodewig
a0f4edff66
[DOCS] Fix chunking in query docs (#61053)
Changes:
* Moves "Notes" sections for the joining queries and percolate query
  pages to the parent page
* Adds related redirects for the moved "Notes" pages
* Assigns explicit anchor IDs to other "Notes" headings. This was required for
  the redirects to work.
2020-08-12 13:45:49 -04:00
James Rodewig
6b9b8c5e31
[DOCS] Move script and stored fields content to search fields page (#60826)
Changes:

* Moves `Retrieve selected fields` to its own page and adds a title abbreviation.
* Adds existing script and stored fields content to `Retrieve selected fields`
* Adds a xref for `Retrieve selected fields` to `Search your data`
* Adds related redirects and updates existing xrefs
2020-08-06 12:45:03 -04:00
Mark Tozzi
65caee9163
Extensibility for Composite Agg (#59648)
This PR adds the ability to plug new ValuesSourceType support into Composite aggregations via the ValuesSourceRegistry. This should let plugins which define new field types wire those types into composite.  It also updates composite's use of ValueType to follow the conventions we're using in the rest of aggregations, namely splitting the user supplied value out from the default value.
2020-08-06 12:34:14 -04:00
James Rodewig
929033f9dd
[DOCS] Move named query content to bool query (#60748) 2020-08-05 13:27:10 -04:00
James Rodewig
a4dc336c16
[DOCS] Replace twitter dataset in search/agg docs (#60667) 2020-08-04 13:31:52 -04:00
Alexander Reelsen
c7ac9e7073
[DOCS] http -> https, remove outdated plugin docs (#60380)
Plugin discovery documentation contained information about installing
Elasticsearch 2.0 and installing an Oracle JDK, both of which are no
longer valid.

Because the instructions used cleartext HTTP to install packages, this
commit also replaces HTTP links with HTTPS links where possible.

In addition, a few community links have been removed, as they do not seem
to exist anymore.
2020-07-31 15:58:38 -04:00
James Rodewig
aec26b1a23
[DOCS] Move search pagination content to one page (#60515) 2020-07-31 11:43:06 -04:00
Julie Tibshirani
8a89d95372
Add search fields parameter to support high-level field retrieval. (#60100)
This feature adds a new `fields` parameter to the search request, which
consults both the document `_source` and the mappings to fetch fields in a
consistent way. The PR merges the `field-retrieval` feature branch.

Addresses #49028 and #55363.
2020-07-27 13:25:55 -07:00
James Rodewig
441c3a21b1
[DOCS] Update my-index examples (#60132)
Changes the following example index names to `my-index-000001` for consistency:

* `my-index`
* `my_index`
* `myindex`
2020-07-27 14:46:39 -04:00
James Rodewig
74c9e56735
[DOCS] Fix default gap policy for moving fn, moving avg aggs (#60223) (#60230) 2020-07-27 12:32:35 -04:00
James Rodewig
d5b03f668b
[DOCS] Move search sort docs to separate page (#60123)
Moves the search sort docs from the deprecated 'Request Body Search'
page to a new subpage of 'Run a search'.

No substantive changes were made to the content.
2020-07-23 12:58:57 -04:00
James Rodewig
2774cd6938
[DOCS] Swap [float] for [discrete] (#60124)
Changes instances of `[float]` in our docs for `[discrete]`.

Asciidoctor prefers the `[discrete]` tag for floating headings:
https://asciidoctor.org/docs/asciidoc-asciidoctor-diffs/#blocks
2020-07-23 11:48:22 -04:00
Howard
b8e3ba783a
[DOCS] Fix missing punctuation in agg docs (#59822) 2020-07-21 10:17:59 -04:00
James Rodewig
2c5d6e9c95
[DOCS] Reformat agg snippets to use two-space indents (#59912) 2020-07-20 15:08:04 -04:00
James Rodewig
8a57800f1b
[DOCS] Add performance warning for scripts (#59890) 2020-07-20 14:04:35 -04:00
Igor Motov
6bfde550f9
Add hard_bounds documentation (#59809)
Fixes #59774
2020-07-20 09:54:02 -04:00
Nik Everett
27efb5f3b8
Clean up a few of vwh's rough edges (#59341)
This cleans up a few rough edges in the `variable_width_histogram`,
mostly found by @wwang500:
1. Setting its tuning parameters in an unexpected order could cause the
   request to fail.
2. We checked that the maximum number of buckets was both less than
   50000 and MAX_BUCKETS. This drops the 50000.
3. Fixes a divide by 0 that can occur if the `shard_size` is 1.
4. Fixes a divide by 0 that can occur if the `shard_size * 3` overflows
   a signed int.
5. Requires `shard_size * 3 / 4` to be at least `buckets`. If it is less
   than `buckets` we will very consistently return fewer buckets than
   requested. For the most part we expect folks to leave it at the
   default. If they change it, we expect it to be much bigger than
   `buckets`.
6. Allocate a smaller `mergeMap` when initially bucketing requests
   that don't use the entire `shard_size * 3 / 4`. It's just a waste.
7. Default `shard_size` to `10 * buckets` rather than `100`. It *looks*
   like that was our intention the whole time. And it feels like it'd
   keep the algorithm humming along more smoothly.
8. Default the `initial_buffer` to `min(10 * shard_size, 50000)` like
   we've documented it rather than `5000`. Like the point above, this
   feels like the right thing to do to keep the algorithm happy.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-17 13:39:28 -04:00
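The defaults and the size check from items 5, 7, and 8 above can be sketched as follows. This is a plain-Python illustration, not the Elasticsearch code: `resolve_vwh_params` is a hypothetical helper, integer division stands in for Java's `int` arithmetic, and the overflow case from item 4 is not modeled because Python integers do not overflow.

```python
from typing import Optional, Tuple


def resolve_vwh_params(
    buckets: int,
    shard_size: Optional[int] = None,
    initial_buffer: Optional[int] = None,
) -> Tuple[int, int]:
    """Apply the documented defaults and return (shard_size, initial_buffer)."""
    if buckets < 1:
        raise ValueError("buckets must be at least 1")
    if shard_size is None:
        # Item 7: default shard_size to 10 * buckets.
        shard_size = 10 * buckets
    if initial_buffer is None:
        # Item 8: default initial_buffer to min(10 * shard_size, 50000).
        initial_buffer = min(10 * shard_size, 50000)
    # Item 5: shard_size * 3 / 4 must be at least `buckets`.
    if shard_size * 3 // 4 < buckets:
        raise ValueError("shard_size * 3 / 4 must be at least buckets")
    return shard_size, initial_buffer


print(resolve_vwh_params(buckets=10))    # defaults derived from buckets
print(resolve_vwh_params(buckets=1000))  # initial_buffer capped at 50000
```

Because both defaults are resolved before any validation, the order in which a caller supplies the parameters does not matter in this sketch.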
James Rodewig
aa3ddfeefb
[DOCS] Move highlighting docs to separate page (#59768)
Moves the highlighting docs from the deprecated 'Request Body Search'
chapter to a new subpage of the 'Run a search' chapter.

No substantive changes were made to the content.
2020-07-17 10:15:20 -04:00
István Zoltán Szabó
edccf14478
[DOCS] Adds security privilege info to inference bucket aggregation (#59604) 2020-07-16 18:02:17 +02:00
Adam Locke
4dc5c87211
Indicating that the size parameter defaults to 10. (#59438) 2020-07-13 16:04:48 -04:00
Christos Soulios
2976ba471a
Histogram integration on Histogram field type (#58930)
Implements histogram aggregation over histogram fields as requested in #53285.
2020-07-13 17:07:16 +03:00
David Kyle
b9deb660a8
Include the ml inference aggregation doc (#59219)
Add to the list of pipeline aggregations
2020-07-08 14:22:19 +01:00
Nik Everett
3b3ed4b4a7
Fix lookup support in adjacency matrix (#59099)
This request:
```
POST /_search
{
  "aggs": {
    "a": {
      "adjacency_matrix": {
        "filters": {
          "1": {
            "terms": { "t": { "index": "lookup", "id": "1", "path": "t" } }
          }
        }
      }
    }
  }
}
```

Would fail with a 500 error and a message like:
```
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_state_exception",
        "reason":"async actions are left after rewrite"
      }
    ]
  }
}
```

This fixes that by moving the query rewrite phase from a synchronous
call on the data nodes into the standard aggregation rewrite phase which
can properly handle the asynchronous actions.
2020-07-06 18:53:19 -04:00
David Kyle
7daed3b8af
Pipeline Inference Aggregation (#58193)
Adds a pipeline aggregation that loads a model and performs inference on the 
input aggregation results.
2020-07-02 14:33:02 +01:00
Nik Everett
32bdf8549b
Fail variable_width_histogram that collects from many (#58619)
Adds an explicit check to `variable_width_histogram` to stop it from
trying to collect from many buckets because it can't. I tried to make it
do so but that is more than an afternoon's project, sadly. So for now we
just disallow it.

Relates to #42035
2020-06-30 15:42:46 -04:00
Nik Everett
dda78ff760
Docs: Mark variable_width_histogram experimental (#58574)
We're tracking this aggregation's experimental-progress in #58573. We'd
like a little time to be able to make backwards incompatible changes to
the aggregation because we're not 100% sure about the request and
response format yet.
2020-06-25 16:54:37 -04:00
James Dorfman
e99d287fbb
Add Variable Width Histogram Aggregation (#42035)
Implements a new histogram aggregation called `variable_width_histogram` which
dynamically determines bucket intervals based on document groupings. These
groups are determined by running a one-pass clustering algorithm on each shard
and then reducing each shard's clusters using an agglomerative
clustering algorithm.

This PR addresses #9572.

The shard-level clustering is done in one pass to minimize memory overhead. The
algorithm was lightly inspired by
[this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches
a small number of documents to sample the data and determine initial clusters.
Subsequent documents are then placed into one of these clusters, or a new one
if they are an outlier. This algorithm is described in more detail in the
aggregation's docs.

At reduce time, a
[hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)
algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304)
continually merges the closest buckets from all shards (based on their
centroids) until the target number of buckets is reached.

The final values produced by this aggregation are approximate. Each bucket's
min value is used as its key in the histogram. Furthermore, buckets are merged
based on their centroids and not their bounds. So it is possible that adjacent
buckets will overlap after reduction. Because each bucket's key is its min,
this overlap is not shown in the final histogram. However, when such overlap
occurs, we set the key of the bucket with the larger centroid to the midpoint
between its minimum and the smaller bucket’s maximum:
`min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to
increase the accuracy of the clustering.

Nodes are unable to share centroids during the shard-level clustering phase. In
the future, resolving https://github.com/elastic/elasticsearch/issues/50863
would let us solve this issue. 

It doesn’t make sense for this aggregation to support the `min_doc_count`
parameter, since clusters are determined dynamically. The `order` parameter is
not supported here to keep this large PR from becoming too complex.
2020-06-23 09:26:54 -04:00
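The reduce step and the overlap heuristic described above can be sketched in a few lines of Python. This is an illustrative approximation, not the Elasticsearch implementation: the `Bucket` class, `reduce_buckets`, and `bucket_keys` are hypothetical names, and computing a merged centroid as the count-weighted mean is an assumption. Since the values are one-dimensional, the closest pair of centroids is always adjacent once the buckets are sorted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    min_value: float
    max_value: float
    centroid: float
    doc_count: int


def merge(a: Bucket, b: Bucket) -> Bucket:
    """Combine two buckets; the centroid is the count-weighted mean (assumption)."""
    total = a.doc_count + b.doc_count
    centroid = (a.centroid * a.doc_count + b.centroid * b.doc_count) / total
    return Bucket(min(a.min_value, b.min_value), max(a.max_value, b.max_value), centroid, total)


def reduce_buckets(buckets: List[Bucket], target: int) -> List[Bucket]:
    """Merge the two closest buckets (by centroid) until `target` buckets remain."""
    if target < 1:
        raise ValueError("target must be at least 1")
    merged = sorted(buckets, key=lambda b: b.centroid)
    while len(merged) > target:
        # Find the adjacent pair with the smallest centroid gap.
        i = min(range(len(merged) - 1),
                key=lambda j: merged[j + 1].centroid - merged[j].centroid)
        merged[i:i + 2] = [merge(merged[i], merged[i + 1])]
    return merged


def bucket_keys(buckets: List[Bucket]) -> List[float]:
    """Use each bucket's min as its key, moving it to the midpoint on overlap."""
    keys = []
    for i, bucket in enumerate(buckets):
        key = bucket.min_value
        if i > 0 and buckets[i - 1].max_value > bucket.min_value:
            # min[large] = (min[large] + max[small]) / 2
            key = (bucket.min_value + buckets[i - 1].max_value) / 2
        keys.append(key)
    return keys


shard_buckets = [
    Bucket(0.0, 2.0, 1.0, 2), Bucket(1.0, 3.0, 2.0, 2),
    Bucket(10.0, 12.0, 11.0, 1), Bucket(9.0, 14.0, 12.0, 3),
]
reduced = reduce_buckets(shard_buckets, target=2)
print([(b.centroid, b.doc_count) for b in reduced])
print(bucket_keys([Bucket(0.0, 6.0, 2.0, 1), Bucket(4.0, 10.0, 8.0, 1)]))
```

In the second call the two buckets overlap between 4 and 6, so the larger bucket's key moves to the midpoint, 5.0, instead of staying at its minimum.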
Cris da Rocha
b5de14d3f6
Missing comma between value types (#58383)
This applies to all versions of this document (7.7, 7.8, 7.x, current and master).
2020-06-19 23:01:25 +02:00
Tal Levy
c765993d82
add geo_shape documentation for supported aggregations (#58284)
This commit adds documentation for geo_shape fields in aggregations

Closes #55495.
2020-06-18 10:17:49 -07:00