This PR uses infrastructure from #107567 to implement a fallback implementation of synthetic source for field mappers that don't support it natively. In that case we will store source of such field as is in a separate stored field.
This PR adds synthetic source support for annotated_text fields. Existing implementation for text is reused including test infrastructure so the majority of the change is moving and making things accessible.
Contributes to #106460, #78744.
* Implement synthetic source support for range fields
This PR adds basic synthetic source support for range fields. There are
following notable properties of synthetic source produced:
* Ranges are always normalized to be inclusive on both ends (this is how
they are stored).
* Original order of ranges is not preserved.
* Date ranges are always expressed in epoch millis, format is not
preserved.
* IP ranges are always expressed as a range of IPs while it could
have been originally provided as a CIDR.
This PR only implements retrieval of data for source reconstruction from
doc values.
Here we add synthetic source support for fields whose type is flattened.
Note that flattened fields and synthetic source have the following limitations,
all arising from the fact that in synthetic source we just see key/value pairs
when reconstructing the original object and have no type information in mappings:
* flattened fields use sorted set doc values of keywords, which means two things:
first we do not allow duplicate values, second we treat all values as keywords
* reconstructing array of objects results in nested objects (no array)
* reconstructing arrays with just one element results in a single-value field since we
have no way to distinguish single-valued from multi-values fields other then looking
at the count of values
Synthetic _source's array flattening activities can remove some arrays
entirely. Specifically:
```
{
"foo": [
{
"bar": 1
},
{
"baz": 2
}
]
}
```
Turns into:
```
{
"foo": {
"bar": 1,
"baz": 2
}
}
```
See, no more array! It's because the values are flattend to the leaf
fields and didn't have multiple values. This is implied by the docs we
had, but sure wasn't obvious. So now it's documented specifically.
I got some new this morning that we're going to have to rework how we
handle ignore-above in synthetic _source which makes me a bit weary of
removing tech-preview in 8.5. I asked a few folks and they felt more
comfortable giving it a little longer in tech preview. I expect until
ignore-above is in.
I've been hacking on synthetic source for a while now and not seen any
need to break backwards compatibility or any major bugs. I think it's
time to remove the `preview` marker from it so folks can use it without
fear.
This adds support for synthetic _source to the `version` field type. It
works very similarly to `keyword` but with an extra decode step.
I modified the decoder to return a `BytesRef` instead of a `String`
because many of the callers seemed to be converting that string directly
into bytes again. Synthetic source would have wanted to do that. As was
the query infrastructure.
Now that we're releasing synthetic _source as a tech preview feature, we
no longer want to remove the docs from the non-release builds. And we
want to mark all of the headings describing synthetic `_source` as a
preview.
Currently we have two parameters that control how the source of a document
is stored, `enabled` and `synthetic`, both booleans. However, there are only
three possible combinations of these, with `enabled:false` and `synthetic:true`
being disallowed. To make this easier to reason about, this commit replaces
the `enabled` parameter with a new `mode` parameter, which can take the values
`stored`, `synthetic` and `disabled`. The `mode` parameter cannot be set
in combination with `enabled`, and we will subsequently move towards
deprecating `enabled` entirely.
The `_routing` metadata field docs currently include formulas for how
Elasticsearch routes documents to shards. However, these formulas were not
updated for #18699. This updates the routing formulas and adds xrefs for
related settings.
Closes#76072
* Clarify that field data cache includes global ordinals
* Describe that the cache should be cleared once the limit is reached
* Clarify that the `_id` field does not supported aggregations anymore
* Fold the `fielddata` mapping parameter page into the `text field docs
* Improve cross-linking
Bucket aggregations compute bucket doc_count values by incrementing the doc_count by 1 for every document collected in the bucket.
When using summary fields (such as aggregate_metric_double) one field may represent more than one document. To provide this functionality we have implemented a new field mapper (named doc_count field mapper). This field is a positive integer representing the number of documents aggregated in a single summary field.
Bucket aggregations will check if a field of type doc_count exists in a document and will take this value into consideration when computing doc counts.
We don't need a special TypeFieldMapper for anything in particular; all access
to the type field can be done via a TypeFieldType that issues appropriate
deprecation warnings.
Relates to #41059
Moves the highlighting docs from the deprecated 'Request Body Search'
chapter to the new subpage of the 'Run a search chapter' section.
No substantive changes were made to the content.
Changes:
* Condenses and relocates the `docvalue_fields` example to the 'Run a search'
page.
* Adds docs for the `docvalue_fields` request body parameter.
* Updates several related xrefs.
Co-authored-by: debadair <debadair@elastic.co>