mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 15:47:23 -04:00
Here we add synthetic source support for fields whose type is flattened. Note that flattened fields and synthetic source have the following limitations, all arising from the fact that in synthetic source we just see key/value pairs when reconstructing the original object and have no type information in mappings:
* flattened fields use sorted set doc values of keywords, which means two things: first, we do not allow duplicate values; second, we treat all values as keywords
* reconstructing an array of objects results in nested objects (no array)
* reconstructing arrays with just one element results in a single-valued field, since we have no way to distinguish single-valued from multi-valued fields other than looking at the count of values
170 lines
4.6 KiB
Text
[[synthetic-source]]
==== Synthetic `_source`

IMPORTANT: Synthetic `_source` is Generally Available only for TSDB indices
(indices that have `index.mode` set to `time_series`). For other indices
synthetic `_source` is in technical preview. Features in technical preview may
be changed or removed in a future release. Elastic will apply best effort to fix
any issues, but features in technical preview are not subject to the support SLA
of official GA features.

Though very handy to have around, the source field takes up a significant amount
of space on disk. Instead of storing source documents on disk exactly as you
send them, Elasticsearch can reconstruct source content on the fly upon retrieval.
Enable this by setting `mode: synthetic` in `_source`:

[source,console,id=enable-synthetic-source-example]
----
PUT idx
{
  "mappings": {
    "_source": {
      "mode": "synthetic"
    }
  }
}
----
// TESTSETUP

While this on-the-fly reconstruction is *generally* slower than saving the source
documents verbatim and loading them at query time, it saves a lot of storage
space. There are a few restrictions to be aware of:

* When you retrieve synthetic `_source` content, it undergoes minor
<<synthetic-source-modifications,modifications>> compared to the original JSON.
* `params._source` is unavailable in scripts. Instead, use the
{painless}/painless-field-context.html[`doc`] API or the <<script-fields-api, `field`>> API.
* Synthetic `_source` can be used with indices that contain only these field
types:

** <<aggregate-metric-double-synthetic-source, `aggregate_metric_double`>>
** <<boolean-synthetic-source,`boolean`>>
** <<numeric-synthetic-source,`byte`>>
** <<date-synthetic-source,`date`>>
** <<date-nanos-synthetic-source,`date_nanos`>>
** <<dense-vector-synthetic-source,`dense_vector`>>
** <<numeric-synthetic-source,`double`>>
** <<flattened-synthetic-source, `flattened`>>
** <<numeric-synthetic-source,`float`>>
** <<geo-point-synthetic-source,`geo_point`>>
** <<numeric-synthetic-source,`half_float`>>
** <<histogram-synthetic-source,`histogram`>>
** <<numeric-synthetic-source,`integer`>>
** <<ip-synthetic-source,`ip`>>
** <<keyword-synthetic-source,`keyword`>>
** <<numeric-synthetic-source,`long`>>
** <<numeric-synthetic-source,`scaled_float`>>
** <<numeric-synthetic-source,`short`>>
** <<text-synthetic-source,`text`>>
** <<version-synthetic-source,`version`>>
** <<wildcard-synthetic-source,`wildcard`>>

Runtime fields cannot, at this stage, use synthetic `_source`.

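
As a rough illustration of the scripting restriction above, the following Python sketch builds a search request body whose script field reads a value through `doc` instead of `params._source`. The field name `response_bytes` and the script field name `response_kb` are hypothetical, and the sketch only constructs and prints the JSON body. Sending it to `idx/_search` would need an HTTP or Elasticsearch client, which is not shown here.

```python
import json

# Hypothetical numeric field, assumed to be mapped with doc values enabled.
FIELD = "response_bytes"

# Request body for a search that computes a script field from doc values.
# Nothing in the script touches params._source.
search_body = {
    "query": {"match_all": {}},
    "script_fields": {
        "response_kb": {  # hypothetical name for the computed field
            "script": {
                "lang": "painless",
                "source": f"doc['{FIELD}'].value / 1024.0",
            }
        }
    },
}

request_json = json.dumps(search_body, indent=2)
print(request_json)
```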
[[synthetic-source-modifications]]
===== Synthetic `_source` modifications

When synthetic `_source` is enabled, retrieved documents undergo some
modifications compared to the original JSON.

[[synthetic-source-modifications-leaf-arrays]]
====== Arrays moved to leaf fields
Synthetic `_source` arrays are moved to leaves. For example:

[source,console,id=synthetic-source-leaf-arrays-example]
----
PUT idx/_doc/1
{
  "foo": [
    {
      "bar": 1
    },
    {
      "bar": 2
    }
  ]
}
----
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]

Will become:

[source,console-result]
----
{
  "foo": {
    "bar": [1, 2]
  }
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]

This can cause some arrays to vanish:

[source,console,id=synthetic-source-leaf-arrays-example-sneaky]
----
PUT idx/_doc/1
{
  "foo": [
    {
      "bar": 1
    },
    {
      "baz": 2
    }
  ]
}
----
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]

Will become:

[source,console-result]
----
{
  "foo": {
    "bar": 1,
    "baz": 2
  }
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]

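
To see why arrays end up on leaf fields, it can help to model a document as a set of leaf paths, each with a list of values, which is roughly the information available when `_source` is rebuilt. The Python sketch below only illustrates that idea and is not Elasticsearch's implementation. The helper names `collect_leaves` and `rebuild` are made up for this example, and emitting a lone value without an array is a simplifying assumption of the sketch. Applied to the two documents above, it yields the same shapes as the two results shown.

```python
def collect_leaves(value, path=(), leaves=None):
    """Gather every leaf value under its full field path, ignoring array positions."""
    if leaves is None:
        leaves = {}
    if isinstance(value, dict):
        for key, child in value.items():
            collect_leaves(child, path + (key,), leaves)
    elif isinstance(value, list):
        # The array itself is not recorded; only the values found inside it are.
        for item in value:
            collect_leaves(item, path, leaves)
    else:
        leaves.setdefault(path, []).append(value)
    return leaves


def rebuild(leaves):
    """Rebuild nested objects from leaf paths; arrays appear only on leaves."""
    doc = {}
    for path, values in leaves.items():
        node = doc
        for key in path[:-1]:
            node = node.setdefault(key, {})
        # Assumption of this sketch: a single value is emitted without an array.
        node[path[-1]] = values[0] if len(values) == 1 else values
    return doc


same_leaf = rebuild(collect_leaves({"foo": [{"bar": 1}, {"bar": 2}]}))
different_leaves = rebuild(collect_leaves({"foo": [{"bar": 1}, {"baz": 2}]}))
print(same_leaf)
print(different_leaves)
```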
[[synthetic-source-modifications-field-names]]
====== Fields named as they are mapped
Synthetic source names fields as they are named in the mapping. When used
with <<dynamic,dynamic mapping>>, fields with dots (`.`) in their names are, by
default, interpreted as multiple objects, while dots in field names are
preserved within objects that have <<subobjects>> disabled. For example:

[source,console,id=synthetic-source-objecty-example]
----
PUT idx/_doc/1
{
  "foo.bar.baz": 1
}
----
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]

Will become:

[source,console-result]
----
{
  "foo": {
    "bar": {
      "baz": 1
    }
  }
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]

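
The default behavior, where dotted names are interpreted as multiple objects, can be pictured as expanding each dotted name into nested objects. The following Python sketch is a minimal illustration under that assumption and is not how Elasticsearch implements mapping. `expand_dots` is a hypothetical helper, and it assumes the document has no conflicting paths (for example, both a `foo` value and a `foo.bar` value).

```python
def expand_dots(doc):
    """Expand dotted field names into nested objects (the subobjects-enabled view)."""
    result = {}
    for name, value in doc.items():
        if isinstance(value, dict):
            # Dotted names can also occur inside nested objects.
            value = expand_dots(value)
        node = result
        *parents, leaf = name.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


expanded = expand_dots({"foo.bar.baz": 1})
print(expanded)
```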
[[synthetic-source-modifications-alphabetical]]
====== Alphabetical sorting
Synthetic `_source` fields are sorted alphabetically. The
https://www.rfc-editor.org/rfc/rfc7159.html[JSON RFC] defines objects as
"an unordered collection of zero or more name/value pairs", so applications
shouldn't depend on field order. Without synthetic `_source`, however, the
original ordering is preserved, and some applications may, counter to the spec,
rely on that ordering.

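
If an application needs to compare a retrieved `_source` with the document it originally sent, comparing parsed objects instead of raw JSON text sidesteps the ordering difference. The Python sketch below uses a made-up sample document, and `json.dumps(..., sort_keys=True)` only approximates what a key-sorted rendering might look like. It is not a claim about the exact bytes Elasticsearch returns.

```python
import json

# Made-up document, written with keys in non-alphabetical order.
original = '{"zeta": 1, "alpha": {"b": 2, "a": 3}}'

# A key-sorted rendering of the same document (an approximation for illustration).
reordered = json.dumps(json.loads(original), sort_keys=True)
print(reordered)

# The raw text differs once keys are reordered...
text_matches = original == reordered
# ...but the parsed objects are equal, because JSON objects are unordered.
objects_match = json.loads(original) == json.loads(reordered)
print(text_matches, objects_match)
```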