[[synthetic-source]]
==== Synthetic `_source`

IMPORTANT: Synthetic `_source` is Generally Available only for TSDB indices (indices that have `index.mode` set to `time_series`). For other indices, synthetic `_source` is in technical preview. Features in technical preview may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

Though very handy to have around, the `_source` field takes up a significant amount of space on disk. Instead of storing source documents on disk exactly as you send them, Elasticsearch can reconstruct source content on the fly upon retrieval. Enable this by setting `mode: synthetic` in `_source`:

[source,console,id=enable-synthetic-source-example]
----
PUT idx
{
  "mappings": {
    "_source": {
      "mode": "synthetic"
    }
  }
}
----
// TESTSETUP

While this on-the-fly reconstruction is *generally* slower than saving the source documents verbatim and loading them at query time, it saves a lot of storage space. You can avoid the additional latency by not loading the `_source` field in queries that do not need it.

[[synthetic-source-fields]]
===== Supported fields

Synthetic `_source` is supported by all field types. Depending on implementation details, field types have different properties when used with synthetic `_source`.

<> construct synthetic `_source` using existing data, most commonly <> and <>. For these field types, no additional space is needed to store the contents of the `_source` field. Due to the storage layout of <>, the generated `_source` field undergoes <> compared to the original document.

For all other field types, the original value of the field is stored as is, in the same way as the `_source` field in non-synthetic mode. In this case there are no modifications, and the field data in `_source` is the same as in the original document. Similarly, malformed values of fields that use <> or <> need to be stored as is.
This approach is less storage efficient because the data needed for `_source` reconstruction is stored in addition to the other data required to index the field (like `doc_values`).

[[synthetic-source-restrictions]]
===== Synthetic `_source` restrictions

Synthetic `_source` cannot be used together with field mappings that use <>.

Some field types have additional restrictions. These restrictions are documented in the **synthetic `_source`** section of the field type's <>.

[[synthetic-source-modifications]]
===== Synthetic `_source` modifications

When synthetic `_source` is enabled, retrieved documents undergo some modifications compared to the original JSON.

[[synthetic-source-modifications-leaf-arrays]]
====== Arrays moved to leaf fields

Synthetic `_source` moves arrays to leaf fields. For example:

[source,console,id=synthetic-source-leaf-arrays-example]
----
PUT idx/_doc/1
{
  "foo": [
    {
      "bar": 1
    },
    {
      "bar": 2
    }
  ]
}
----
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]

Will become:

[source,console-result]
----
{
  "foo": {
    "bar": [1, 2]
  }
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]

This can cause some arrays to vanish:

[source,console,id=synthetic-source-leaf-arrays-example-sneaky]
----
PUT idx/_doc/1
{
  "foo": [
    {
      "bar": 1
    },
    {
      "baz": 2
    }
  ]
}
----
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]

Will become:

[source,console-result]
----
{
  "foo": {
    "bar": 1,
    "baz": 2
  }
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]

[[synthetic-source-modifications-field-names]]
====== Fields named as they are mapped

Synthetic `_source` names fields as they are named in the mapping. When used with <>, fields with dots (`.`) in their names are, by default, interpreted as multiple objects, while dots in field names are preserved within objects that have <> disabled.
For example:

[source,console,id=synthetic-source-objecty-example]
----
PUT idx/_doc/1
{
  "foo.bar.baz": 1
}
----
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]

Will become:

[source,console-result]
----
{
  "foo": {
    "bar": {
      "baz": 1
    }
  }
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]

[[synthetic-source-modifications-alphabetical]]
====== Alphabetical sorting

Synthetic `_source` fields are sorted alphabetically. The https://www.rfc-editor.org/rfc/rfc7159.html[JSON RFC] defines objects as "an unordered collection of zero or more name/value pairs", so applications shouldn't rely on field order. However, without synthetic `_source` the original ordering is preserved, and some applications may, counter to the spec, depend on that ordering.

[[synthetic-source-modifications-ranges]]
====== Representation of ranges

Range field values (e.g. `long_range`) are always represented as inclusive on both sides, with bounds adjusted accordingly. See <>.

[[synthetic-source-precision-loss-for-point-types]]
====== Reduced precision of `geo_point` values

Values of `geo_point` fields are represented in synthetic `_source` with reduced precision. See <>.

[[synthetic-source-fields-native-list]]
===== Field types that support synthetic source with no storage overhead

The following field types support synthetic source using data from <> or <>, and require no additional storage space to construct the `_source` field.

NOTE: If you enable the <> or <> settings, then additional storage is required to store ignored field values for these types.

** <>
** {plugins}/mapper-annotated-text-usage.html#annotated-text-synthetic-source[`annotated-text`]
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
** <>
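
As noted in the introduction to this section, the cost of reconstructing `_source` is paid only when the field is actually loaded. The request below is a minimal sketch of a search that skips it, assuming the `idx` index from the earlier examples. It relies on the standard `_source` search option, and the `match_all` query is only a placeholder:

[source,console]
----
GET idx/_search
{
  "_source": false,
  "query": {
    "match_all": {}
  }
}
----

Hits returned by this request omit `_source`, so no reconstruction is needed for them. If individual values are still required, the `fields` or `docvalue_fields` search options may be an alternative to loading the whole document.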