We will deprecate the mapping-level `_source.mode` configuration
in favor of the index-level `index.mapping.source.mode` setting.
As a result, this change goes through the documentation and updates it to reflect
the introduction of the setting.
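For illustration, here is a rough sketch of the two configuration styles as create-index request bodies, written as Python dicts. The exact body shapes and the `synthetic` value are assumptions chosen for the example, not taken from this change.

```python
import json

# Mapping-level form that is being deprecated (assumed body shape).
deprecated_body = {
    "mappings": {
        "_source": {"mode": "synthetic"}
    }
}

# Index-level form using the setting named above (assumed body shape).
preferred_body = {
    "settings": {
        "index.mapping.source.mode": "synthetic"
    }
}

print(json.dumps(preferred_body, indent=2))
```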
This PR uses infrastructure from #107567 to add a fallback implementation of synthetic source for field mappers that don't support it natively. In that case, the source of such a field is stored as is in a separate stored field.
This PR adds synthetic source support for `annotated_text` fields. The existing implementation for `text` is reused, including the test infrastructure, so the majority of the change is moving code and making things accessible.
Contributes to #106460, #78744.
* Implement synthetic source support for range fields
This PR adds basic synthetic source support for range fields. The
synthetic source produced has the following notable properties:
* Ranges are always normalized to be inclusive on both ends (this is how
they are stored).
* Original order of ranges is not preserved.
* Date ranges are always expressed in epoch millis; the original format is not
preserved.
* IP ranges are always expressed as a range of IPs, even if they were
originally provided as a CIDR.
This PR only implements retrieval of data for source reconstruction from
doc values.
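The normalization described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names are hypothetical, the integer helper assumes both bounds are present, and the CIDR helper assumes the range spans the first to the last address of the network.

```python
import ipaddress


def normalize_int_range(r):
    """Turn gt/lt bounds of an integer range into inclusive gte/lte bounds."""
    lower = r["gte"] if "gte" in r else r["gt"] + 1
    upper = r["lte"] if "lte" in r else r["lt"] - 1
    return {"gte": lower, "lte": upper}


def cidr_to_ip_range(cidr):
    """Express a CIDR block as an inclusive range of IP addresses."""
    net = ipaddress.ip_network(cidr, strict=False)
    return {"gte": str(net[0]), "lte": str(net[-1])}


print(normalize_int_range({"gt": 1, "lt": 5}))
print(cidr_to_ip_range("192.168.0.0/24"))
```

Either way, the reconstructed source shows the inclusive form rather than what the user originally wrote.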
Here we add synthetic source support for fields whose type is flattened.
Note that flattened fields and synthetic source have the following limitations,
all arising from the fact that in synthetic source we just see key/value pairs
when reconstructing the original object and have no type information in mappings:
* flattened fields use sorted set doc values of keywords, which means two things:
first we do not allow duplicate values, second we treat all values as keywords
* reconstructing an array of objects results in nested objects (no array)
* reconstructing arrays with just one element results in a single-value field since we
have no way to distinguish single-valued from multi-valued fields other than looking
at the count of values
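A toy model of these limitations, assuming values behave like a sorted set of strings; the function name is hypothetical and this is not how the mapper is actually implemented:

```python
def roundtrip_flattened_values(values):
    """Simulate storing values as a sorted set of keywords and reading them back."""
    # Deduplicate, treat everything as a keyword (string), and sort.
    stored = sorted({str(v) for v in values})
    # With a single stored value there is no way to tell it came from an array.
    return stored[0] if len(stored) == 1 else stored


print(roundtrip_flattened_values(["b", "a", "b"]))
print(roundtrip_flattened_values([1]))
```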
Synthetic _source's array flattening activities can remove some arrays
entirely. Specifically:
```
{
"foo": [
{
"bar": 1
},
{
"baz": 2
}
]
}
```
Turns into:
```
{
"foo": {
"bar": 1,
"baz": 2
}
}
```
See, no more array! That's because the values are flattened to the leaf
fields, and neither leaf field has multiple values. This is implied by the docs we
had, but sure wasn't obvious. So now it's documented specifically.
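The effect can be reproduced with a small simulation. This is only a sketch of the idea (collect values per leaf path, then rebuild), with hypothetical function names; it is not the actual synthetic `_source` implementation.

```python
def collect_leaves(node, path, leaves):
    """Record every leaf value under its full field path, ignoring array structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            collect_leaves(value, path + (key,), leaves)
    elif isinstance(node, list):
        for item in node:
            collect_leaves(item, path, leaves)
    else:
        leaves.setdefault(path, []).append(node)


def rebuild(leaves):
    """Rebuild a document from leaf paths; a single value is emitted without an array."""
    doc = {}
    for path, values in leaves.items():
        target = doc
        for key in path[:-1]:
            target = target.setdefault(key, {})
        target[path[-1]] = values[0] if len(values) == 1 else values
    return doc


def synthesize(doc):
    leaves = {}
    collect_leaves(doc, (), leaves)
    return rebuild(leaves)


print(synthesize({"foo": [{"bar": 1}, {"baz": 2}]}))
```

If both objects had used the same key, for example `bar`, that leaf would have two values and would come back as an array.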
I got some news this morning that we're going to have to rework how we
handle ignore-above in synthetic _source, which makes me a bit wary of
removing tech-preview in 8.5. I asked a few folks and they felt more
comfortable giving it a little longer in tech preview. I expect it to stay
there until ignore-above is in.
I've been hacking on synthetic source for a while now and not seen any
need to break backwards compatibility or any major bugs. I think it's
time to remove the `preview` marker from it so folks can use it without
fear.
This adds support for synthetic _source to the `version` field type. It
works very similarly to `keyword` but with an extra decode step.
I modified the decoder to return a `BytesRef` instead of a `String`
because many of the callers seemed to be converting that string directly
into bytes again. Synthetic source would have wanted to do that too, as
the query infrastructure already did.
Now that we're releasing synthetic _source as a tech preview feature, we
no longer want to remove the docs from the non-release builds. And we
want to mark all of the headings describing synthetic `_source` as a
preview.
Currently we have two parameters that control how the source of a document
is stored, `enabled` and `synthetic`, both booleans. However, there are only
three possible combinations of these, with `enabled:false` and `synthetic:true`
being disallowed. To make this easier to reason about, this commit replaces
the `enabled` parameter with a new `mode` parameter, which can take the values
`stored`, `synthetic` and `disabled`. The `mode` parameter cannot be set
in combination with `enabled`, and we will subsequently move towards
deprecating `enabled` entirely.
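The correspondence between the old booleans and the new `mode` values can be sketched as below. The function name is hypothetical, and the defaults (`enabled=True`, `synthetic=False`) are an assumption made for the example.

```python
def source_mode(enabled=True, synthetic=False):
    """Map the old enabled/synthetic booleans onto the new mode values."""
    if not enabled and synthetic:
        # The one combination of the two booleans that is disallowed.
        raise ValueError("enabled:false cannot be combined with synthetic:true")
    if not enabled:
        return "disabled"
    return "synthetic" if synthetic else "stored"


print(source_mode())
print(source_mode(synthetic=True))
print(source_mode(enabled=False))
```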