Currently, the 'fields' option only supports fetching mapped fields. Since
'fields' is meant to be the central place to retrieve document content, it
should allow for loading unmapped values. This change adds implementation and
tests for this feature.
Closes #63690
Currently, if you write a date range query with numeric 'to' or 'from' bounds,
they can be interpreted as years if no format is provided. In this case we use
"strict_date_optional_time||epoch_millis", which can interpret an input
like 1000 as the year 1000, for example.
This PR changes this to always parse numbers with the "epoch_millis"
parser if no other formatter was provided.
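To make the ambiguity concrete, here is a minimal Python sketch of the two
readings of a bare numeric bound such as 1000. It is not Elasticsearch code;
the helper names `as_year` and `as_epoch_millis` are hypothetical and only
mimic the two interpretations described above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def as_year(value: int) -> datetime:
    """Read the number as a calendar year, as a date-style parser might."""
    return datetime(value, 1, 1, tzinfo=timezone.utc)


def as_epoch_millis(value: int) -> datetime:
    """Read the number as milliseconds since the Unix epoch."""
    return EPOCH + timedelta(milliseconds=value)


bound = 1000
print(as_year(bound).isoformat())          # 1000-01-01T00:00:00+00:00
print(as_epoch_millis(bound).isoformat())  # 1970-01-01T00:00:01+00:00
```

The two readings differ by almost a thousand years, which is why a single
default for bare numbers is easier to reason about.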
Closes #63680
Today we describe snapshots as "incremental" but their incrementality is a
rather different beast from e.g. incremental filesystem backups. With
traditional backups you take a large and relatively infrequent "full"
backup and then a sequence of smaller "incremental" ones, and this whole
sequence of backups is required for a restore so it must be kept around
until at least the next full backup. In contrast, Elasticsearch
snapshots are logically independent and each can be deleted without
affecting the integrity of the others.
This distinction frequently causes confusion amongst newer users, so
this commit clarifies what we mean by "incremental" in the docs.
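The following toy model sketches one way to picture "incremental yet
independent". It is an illustration only, not Elasticsearch's repository
implementation: the `ToyRepository` class, its methods, and the file names are
all hypothetical. Each snapshot records the full set of files it needs, only
files missing from the repository are copied, and a delete removes only files
that no remaining snapshot references.

```python
class ToyRepository:
    """Toy model of a snapshot repository (illustrative assumption only)."""

    def __init__(self):
        self.blobs = set()    # files physically stored in the repository
        self.snapshots = {}   # snapshot name -> set of files it references

    def take(self, name, index_files):
        """Record a snapshot, copying only files not already stored."""
        files = frozenset(index_files)
        uploaded = files - self.blobs
        self.blobs |= uploaded
        self.snapshots[name] = files
        return set(uploaded)

    def delete(self, name):
        """Drop a snapshot and any files no other snapshot still needs."""
        del self.snapshots[name]
        still_needed = set().union(*self.snapshots.values())
        removed = self.blobs - still_needed
        self.blobs -= removed
        return removed

    def can_restore(self, name):
        """A snapshot is restorable if every file it references is stored."""
        return self.snapshots[name] <= self.blobs


repo = ToyRepository()
repo.take("snap-1", {"seg_a", "seg_b"})   # copies seg_a and seg_b
repo.take("snap-2", {"seg_b", "seg_c"})   # copies only seg_c
repo.delete("snap-1")                     # removes only seg_a
print(repo.can_restore("snap-2"))         # True
```

In this model there is no "full" backup that later ones depend on: deleting
the first snapshot leaves the second one complete.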
Whether the cold tier can handle years depends a lot on the use case and
for instance our BWC guarantees. This would need to be part of a
specific sizing exercise, so in the spirit of not over-promising, the
description of the cold tier has been changed to not mention years.
Generating a CA on the fly is an attempt at workflow optimisation that was
inherited from certgen. There are potential pitfalls with this approach. Overall
it is recommended to separate the step of CA creation and mandate that a CA be
specified when generating a certificate.
This PR adds a deprecation message if the cert command is used without specifying
a CA. A follow-up PR will throw an error for this usage in 8.0.
For use cases where we explicitly trust a certificate without needing a CA, e.g.
SAML message signing, the PR adds a --self-signed option to the cert sub-command
to generate a self-signed certificate.
Clarify that searchable snapshots only result in cost savings for less
frequently accessed data and that the savings do not apply to the entire
cluster.
* Introduce an additional hasher that is PBKDF2 but pads the input to > 14 chars before hashing to comply with FIPS Approved Only mode
* Addressing the PR feedback
adding doc changes
* Renaming the hash function + rephrasing the doc descriptions
* Removing leftover from the doc
* Return HexCharArray instead of Base64 encoding and avoid intermediate
String
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
During highlighting, we now load all values that were copied into the field
through copy_to. So there's no longer a reason to set 'store: true' to account
for fields not available in _source.
In some cases when the rate aggregation is not a child of a date histogram
aggregation, it is not possible to determine the actual size of the date
histogram bucket. In this case the rate aggregation now throws an exception.
Closes #63703
Previously, geo_shape support was only mentioned in a dedicated x-pack
section. This may be misleading, as the introductory paragraph only
mentions geo_point.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Adds the capability to have functions with two optional arguments
* Adds two new optional arguments to `PERCENTILE()` and
`PERCENTILE_RANK()` functions, namely the method and
method_parameter which can be: 1) `tdigest` and a double `compression`
parameter or 2) `hdr` and an integer representing the
`number_of_digits` parameter.
* Integration tests
* Documentation updates
Closes #63567
This PR adds detail to the explanation of the soft_limit
memory_status in ML job stats. A consequence that was not
mentioned before is that examples are not added to category
definitions.
Relates elastic/ml-cpp#1590
A metric aggregation that aggregates a set of points as
a GeoJSON LineString ordered by some sort parameter.
#### specifics
A `geo_line` aggregation request would specify a `geo_point` field, as well
as a `sort` field. `geo_point` represents the values used in the LineString,
while the `sort` values will be used as the total ordering of the points.
The `sort` field would support any numeric field, including date.
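As a rough illustration of the semantics described above, the following Python
sketch orders points by their sort value and emits a GeoJSON LineString. It is
not the aggregation's implementation: the `make_line` function and the document
layout (`lon`, `lat`, `sort` keys) are hypothetical, and keeping the first
`size` points after sorting is an assumption about how truncation might work.

```python
def make_line(docs, sort_order="asc", size=10_000):
    """Build a GeoJSON LineString from points ordered by their sort value."""
    ordered = sorted(docs, key=lambda d: d["sort"], reverse=(sort_order == "desc"))
    kept = ordered[:size]  # assumption: truncate after sorting
    return {
        "type": "LineString",
        # GeoJSON positions are [longitude, latitude]
        "coordinates": [[d["lon"], d["lat"]] for d in kept],
    }


docs = [
    {"lon": 121.3, "lat": 38.3, "sort": 30},
    {"lon": 121.1, "lat": 38.1, "sort": 10},
    {"lon": 121.2, "lat": 38.2, "sort": 20},
]
line = make_line(docs, sort_order="desc", size=2)
print(line["coordinates"])  # [[121.3, 38.3], [121.2, 38.2]]
```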
#### sample usage
```
{
  "query": {
    "bool": {
      "must": [
        { "term": { "person": "004" } },
        { "term": { "trajectory": "20090131002206.plt" } }
      ]
    }
  },
  "aggs": {
    "make_line": {
      "geo_line": {
        "point": { "field": "location" },
        "sort": { "field": "timestamp" },
        "include_sort": true,
        "sort_order": "desc",
        "size": 15
      }
    }
  }
}
```
#### sample response
```
{
  "took": 21,
  "timed_out": false,
  "_shards": {...},
  "hits": {...},
  "aggregations": {
    "make_line": {
      "type": "LineString",
      "coordinates": [
        [
          121.52926194481552,
          38.92878997139633
        ],
        [
          121.52922699227929,
          38.92876998055726
        ]
      ]
    }
  }
}
```
#### visual response
<img width="540" alt="Screen Shot 2019-04-26 at 9 40 07 AM" src="https://user-images.githubusercontent.com/388837/56834977-cf278e00-6827-11e9-9c93-005ed48433cc.png">
#### limitations
Due to the cardinality of points, an initial max of 10k points
will be used. This should support many use-cases.
One solution to overcome this limitation is to keep a PriorityQueue of
points and simplify the line once it hits this max. If simplifying
makes sense, it may be a nice option in general, with a parameter
to specify how aggressively one wants to simplify. This parameter could be
the number of points. Example algorithm one could use with a PriorityQueue:
https://bost.ocks.org/mike/simplify/. This would still require O(m) space, where m
is the number of points returned. It would also require heapifying triangles
sorted by their areas, which would take O(log(m)) per heap operation. Since sorting is done
anyway, simplifying would still be an O(n log(m)) operation, where n is the total number
of points to filter. Something to explore.
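For reference, here is a minimal Python sketch of area-based line
simplification in the spirit of the linked article (Visvalingam's algorithm),
using a heap of triangle areas. It is a simplified offline variant, not a
proposal for the actual implementation: it keeps all n points in the heap
rather than bounding the heap at m, and the function names are hypothetical.

```python
import heapq


def triangle_area(a, b, c):
    """Area of the triangle formed by three (x, y) points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def simplify(points, max_points):
    """Drop the interior points with the smallest triangles until max_points remain."""
    n = len(points)
    if max_points < 2:
        raise ValueError("max_points must be at least 2 to keep both endpoints")
    if n <= max_points:
        return list(points)

    prev = list(range(-1, n - 1))   # index of each point's live left neighbour
    nxt = list(range(1, n + 1))     # index of each point's live right neighbour
    alive = [True] * n
    area = [float("inf")] * n       # endpoints keep infinite area, never removed
    heap = []
    for i in range(1, n - 1):
        area[i] = triangle_area(points[i - 1], points[i], points[i + 1])
        heapq.heappush(heap, (area[i], i))

    remaining = n
    while remaining > max_points and heap:
        a, i = heapq.heappop(heap)
        if not alive[i] or a != area[i]:
            continue  # stale heap entry left over from an earlier update
        alive[i] = False
        remaining -= 1
        p, q = prev[i], nxt[i]
        nxt[p], prev[q] = q, p
        # The neighbours' triangles changed, so recompute and re-queue them.
        for j in (p, q):
            if 0 < j < n - 1:
                area[j] = triangle_area(points[prev[j]], points[j], points[nxt[j]])
                heapq.heappush(heap, (area[j], j))

    return [pt for pt, keep in zip(points, alive) if keep]


track = [(0, 0), (1, 0.01), (2, 0), (3, 5), (4, 0)]
print(simplify(track, 4))  # [(0, 0), (2, 0), (3, 5), (4, 0)]
```

The nearly collinear point `(1, 0.01)` forms the smallest triangle, so it is
the first to go, while the sharp peak at `(3, 5)` is preserved.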
Closes #41649
The _Important Elasticsearch configuration_ docs list a number of items
that you should consider before moving to production. Today this list
does not include configuring snapshots, even though they're very
important to have in production. This commit addresses that omission,
removes some repetition from the introductory paragraphs, and notes that
this config is handled for you on Cloud.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Clarify that field data cache includes global ordinals
* Describe that the cache should be cleared once the limit is reached
* Clarify that the `_id` field no longer supports aggregations
* Fold the `fielddata` mapping parameter page into the `text` field docs
* Improve cross-linking