mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 15:47:23 -04:00
This PR adds synthetic source support for annotated_text fields. Existing implementation for text is reused including test infrastructure so the majority of the change is moving and making things accessible. Contributes to #106460, #78744.
178 lines
4.8 KiB
Text
178 lines
4.8 KiB
Text
[[synthetic-source]]
|
|
==== Synthetic `_source`
|
|
|
|
IMPORTANT: Synthetic `_source` is Generally Available only for TSDB indices
|
|
(indices that have `index.mode` set to `time_series`). For other indices
|
|
synthetic `_source` is in technical preview. Features in technical preview may
|
|
be changed or removed in a future release. Elastic will work to fix
|
|
any issues, but features in technical preview are not subject to the support SLA
|
|
of official GA features.
|
|
|
|
Though very handy to have around, the source field takes up a significant amount
|
|
of space on disk. Instead of storing source documents on disk exactly as you
|
|
send them, Elasticsearch can reconstruct source content on the fly upon retrieval.
|
|
Enable this by setting `mode: synthetic` in `_source`:
|
|
|
|
[source,console,id=enable-synthetic-source-example]
|
|
----
|
|
PUT idx
|
|
{
|
|
"mappings": {
|
|
"_source": {
|
|
"mode": "synthetic"
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TESTSETUP
|
|
|
|
While this on the fly reconstruction is *generally* slower than saving the source
|
|
documents verbatim and loading them at query time, it saves a lot of storage
|
|
space.
|
|
|
|
[[synthetic-source-restrictions]]
|
|
===== Synthetic `_source` restrictions
|
|
|
|
There are a couple of restrictions to be aware of:
|
|
|
|
* When you retrieve synthetic `_source` content it undergoes minor
|
|
<<synthetic-source-modifications,modifications>> compared to the original JSON.
|
|
* Synthetic `_source` can be used with indices that contain only these field
|
|
types:
|
|
|
|
** <<aggregate-metric-double-synthetic-source, `aggregate_metric_double`>>
|
|
** {plugins}/mapper-annotated-text-usage.html#annotated-text-synthetic-source[`annotated-text`]
|
|
** <<binary-synthetic-source,`binary`>>
|
|
** <<boolean-synthetic-source,`boolean`>>
|
|
** <<numeric-synthetic-source,`byte`>>
|
|
** <<date-synthetic-source,`date`>>
|
|
** <<date-nanos-synthetic-source,`date_nanos`>>
|
|
** <<dense-vector-synthetic-source,`dense_vector`>>
|
|
** <<numeric-synthetic-source,`double`>>
|
|
** <<flattened-synthetic-source, `flattened`>>
|
|
** <<numeric-synthetic-source,`float`>>
|
|
** <<geo-point-synthetic-source,`geo_point`>>
|
|
** <<numeric-synthetic-source,`half_float`>>
|
|
** <<histogram-synthetic-source,`histogram`>>
|
|
** <<numeric-synthetic-source,`integer`>>
|
|
** <<ip-synthetic-source,`ip`>>
|
|
** <<keyword-synthetic-source,`keyword`>>
|
|
** <<numeric-synthetic-source,`long`>>
|
|
** <<range-synthetic-source,`range` types>>
|
|
** <<numeric-synthetic-source,`scaled_float`>>
|
|
** <<numeric-synthetic-source,`short`>>
|
|
** <<text-synthetic-source,`text`>>
|
|
** <<version-synthetic-source,`version`>>
|
|
** <<wildcard-synthetic-source,`wildcard`>>
|
|
|
|
[[synthetic-source-modifications]]
|
|
===== Synthetic `_source` modifications
|
|
|
|
When synthetic `_source` is enabled, retrieved documents undergo some
|
|
modifications compared to the original JSON.
|
|
|
|
[[synthetic-source-modifications-leaf-arrays]]
|
|
====== Arrays moved to leaf fields
|
|
Synthetic `_source` arrays are moved to leaves. For example:
|
|
|
|
[source,console,id=synthetic-source-leaf-arrays-example]
|
|
----
|
|
PUT idx/_doc/1
|
|
{
|
|
"foo": [
|
|
{
|
|
"bar": 1
|
|
},
|
|
{
|
|
"bar": 2
|
|
}
|
|
]
|
|
}
|
|
----
|
|
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
|
|
|
|
Will become:
|
|
|
|
[source,console-result]
|
|
----
|
|
{
|
|
"foo": {
|
|
"bar": [1, 2]
|
|
}
|
|
}
|
|
----
|
|
// TEST[s/^/{"_source":/ s/\n$/}/]
|
|
|
|
This can cause some arrays to vanish:
|
|
|
|
[source,console,id=synthetic-source-leaf-arrays-example-sneaky]
|
|
----
|
|
PUT idx/_doc/1
|
|
{
|
|
"foo": [
|
|
{
|
|
"bar": 1
|
|
},
|
|
{
|
|
"baz": 2
|
|
}
|
|
]
|
|
}
|
|
----
|
|
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
|
|
|
|
Will become:
|
|
|
|
[source,console-result]
|
|
----
|
|
{
|
|
"foo": {
|
|
"bar": 1,
|
|
"baz": 2
|
|
}
|
|
}
|
|
----
|
|
// TEST[s/^/{"_source":/ s/\n$/}/]
|
|
|
|
[[synthetic-source-modifications-field-names]]
|
|
====== Fields named as they are mapped
|
|
Synthetic source names fields as they are named in the mapping. When used
|
|
with <<dynamic,dynamic mapping>>, fields with dots (`.`) in their names are, by
|
|
default, interpreted as multiple objects, while dots in field names are
|
|
preserved within objects that have <<subobjects>> disabled. For example:
|
|
|
|
[source,console,id=synthetic-source-objecty-example]
|
|
----
|
|
PUT idx/_doc/1
|
|
{
|
|
"foo.bar.baz": 1
|
|
}
|
|
----
|
|
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
|
|
|
|
Will become:
|
|
|
|
[source,console-result]
|
|
----
|
|
{
|
|
"foo": {
|
|
"bar": {
|
|
"baz": 1
|
|
}
|
|
}
|
|
}
|
|
----
|
|
// TEST[s/^/{"_source":/ s/\n$/}/]
|
|
|
|
[[synthetic-source-modifications-alphabetical]]
|
|
====== Alphabetical sorting
|
|
Synthetic `_source` fields are sorted alphabetically. The
|
|
https://www.rfc-editor.org/rfc/rfc7159.html[JSON RFC] defines objects as
|
|
"an unordered collection of zero or more name/value pairs" so applications
|
|
shouldn't care but without synthetic `_source` the original ordering is
|
|
preserved and some applications may, counter to the spec, do something with
|
|
that ordering.
|
|
|
|
[[synthetic-source-modifications-ranges]]
|
|
====== Representation of ranges
|
|
Range field vales (e.g. `long_range`) are always represented as inclusive on both sides with bounds adjusted accordingly. See <<range-synthetic-source-inclusive, examples>>.
|