[DOCS] Focus retrieving selected fields on fields parameter (#71506)

* [DOCS] Focus retrieving selected fields on fields parameter

* Incorporating changes from reviews

* Adding clarifications from review feedback

* Slight wording revisions.

* Clarify language around format parameter and move text out of callout.
This commit is contained in:
Adam Locke 2021-04-20 15:11:35 -04:00 committed by GitHub
parent bfb85bcecb
commit 6dfd92c46f
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 253 additions and 252 deletions

View file

@ -441,15 +441,22 @@ GET /my-data-stream/_eql/search?filter_path=-hits.events._source
"event.type", "event.type",
"process.*", <1> "process.*", <1>
{ {
"field": "@timestamp", <2> "field": "@timestamp",
"format": "epoch_millis" "format": "epoch_millis" <2>
} }
] ]
} }
---- ----
// TEST[setup:sec_logs] // TEST[setup:sec_logs]
include::{es-repo-dir}/search/search-your-data/retrieve-selected-fields.asciidoc[tag=fields-param-callouts] <1> Both full field names and wildcard patterns are accepted.
<2> Use the `format` parameter to apply a custom format for the field's values.
<<date,`date`>> and <<date_nanos, `date_nanos`>> fields accept a
<<mapping-date-format,date format>>. <<spatial_datatypes, Spatial fields>>
accept either `geojson` for http://www.geojson.org[GeoJSON] (the default)
or `wkt` for
{wikipedia}/Well-known_text_representation_of_geometry[Well Known Text].
Other field types do not support the `format` parameter.
The values are returned as a flat list in the `fields` section of each hit: The values are returned as a flat list in the `fields` section of each hit:

View file

@ -6,72 +6,57 @@
By default, each hit in the search response includes the document By default, each hit in the search response includes the document
<<mapping-source-field,`_source`>>, which is the entire JSON object that was <<mapping-source-field,`_source`>>, which is the entire JSON object that was
provided when indexing the document. To retrieve specific fields in the search provided when indexing the document. There are two recommended methods to
response, you can use the `fields` parameter: retrieve selected fields from a search query:
[source,console] * Use the <<search-fields-param,`fields` option>> to extract the values of
---- fields present in the index mapping
POST my-index-000001/_search * Use the <<source-filtering,`_source` option>> if you need to access the original data that was passed at index time
{
"query": {
"match": {
"message": "foo"
}
},
"fields": ["user.id", "@timestamp"],
"_source": false
}
----
// TEST[setup:my_index]
The `fields` parameter consults both a document's `_source` and the index You can use both of these methods, though the `fields` option is preferred
mappings to load and return values. Because it makes use of the mappings, because it consults both the document data and index mappings. In some
`fields` has some advantages over referencing the `_source` directly: it instances, you might want to use <<field-retrieval-methods,other methods>> of
accepts <<multi-fields, multi-fields>> and <<alias, field aliases>>, and retrieving data.
also formats field values like dates in a consistent way.
A document's `_source` is stored as a single field in Lucene. So the whole
`_source` object must be loaded and parsed even if only a small number of
fields are requested. To avoid this limitation, you can try another option for
loading fields:
* Use the <<docvalue-fields, `docvalue_fields`>>
parameter to get values for selected fields. This can be a good
choice when returning a fairly small number of fields that support doc values,
such as keywords and dates.
* Use the <<request-body-search-stored-fields, `stored_fields`>> parameter to
get the values for specific stored fields (fields that use the
<<mapping-store,`store`>> mapping option).
You can also use the <<script-fields,`script_field`>> parameter to transform
field values in the response using a script.
You can find more detailed information on each of these methods in the
following sections:
* <<search-fields-param>>
* <<docvalue-fields>>
* <<stored-fields>>
* <<source-filtering>>
* <<script-fields>>
[discrete] [discrete]
[[search-fields-param]] [[search-fields-param]]
=== Fields === The `fields` option
// tag::fields-param-desc[] // tag::fields-param-desc[]
The `fields` parameter allows for retrieving a list of document fields in To retrieve specific fields in the search response, use the `fields` parameter.
the search response. It consults both the document `_source` and the index Because it consults the index mappings, the `fields` parameter provides several
mappings to return each value in a standardized way that matches its mapping advantages over referencing the `_source` directly. Specifically, the `fields`
type. By default, date fields are formatted according to the parameter:
<<mapping-date-format,date format>> parameter in their mappings. You can also
use the `fields` parameter to retrieve <<runtime-retrieving-fields,runtime field * Returns each value in a standardized way that matches its mapping type
values>>. * Accepts <<multi-fields,multi-fields>> and <<alias,field aliases>>
* Formats dates and spatial data types
* Retrieves <<runtime-retrieving-fields,runtime field values>>
* Returns fields calculated by a script at index time
// end::fields-param-desc[] // end::fields-param-desc[]
Other mapping options are also respected, including
<<ignore-above,`ignore_above`>>, <<ignore-malformed,`ignore_malformed`>>, and
<<null-value,`null_value`>>.
The `fields` option returns values in the way that matches how {es} indexes
them. For standard fields, this means that the `fields` option looks in
`_source` to find the values, then parses and formats them using the mappings.
[discrete]
[[search-fields-request]]
==== Search for specific fields
The following search request uses the `fields` parameter to retrieve values The following search request uses the `fields` parameter to retrieve values
for the `user.id` field, all fields starting with `http.response.`, and the for the `user.id` field, all fields starting with `http.response.`, and the
`@timestamp` field: `@timestamp` field.
Using object notation, you can pass a `format` parameter for certain fields to
apply a custom format for the field's values:
* <<date,`date`>> and <<date_nanos,`date_nanos`>> fields accept a <<mapping-date-format,date format>>
* <<spatial_datatypes, Spatial fields>> accept either `geojson` for http://www.geojson.org[GeoJSON] (the default) or `wkt` for
{wikipedia}/Well-known_text_representation_of_geometry[Well Known Text]
Other field types do not support the `format` parameter.
[source,console] [source,console]
---- ----
@ -94,32 +79,28 @@ POST my-index-000001/_search
} }
---- ----
// TEST[setup:my_index] // TEST[setup:my_index]
// TEST[s/_search/_search\?filter_path=hits/]
// tag::fields-param-callouts[]
<1> Both full field names and wildcard patterns are accepted. <1> Both full field names and wildcard patterns are accepted.
<2> Using object notation, you can pass a `format` parameter to apply a custom <2> Use the `format` parameter to apply a custom format for the field's values.
format for the field's values.
<<date,`date`>> and <<date_nanos, `date_nanos`>> fields accept a
<<mapping-date-format,date format>>. <<spatial_datatypes, Spatial fields>>
accept either `geojson` for http://www.geojson.org[GeoJSON] (the default)
or `wkt` for
{wikipedia}/Well-known_text_representation_of_geometry[Well Known Text].
Other field types do not support the `format` parameter.
// end::fields-param-callouts[]
The values are returned as a flat list in the `fields` section in each hit: [discrete]
[[search-fields-response]]
==== Response always returns an array
The `fields` response always returns an array of values for each field,
even when there is a single value in the `_source`. This is because {es} has
no dedicated array type, and any field could contain multiple values. The
`fields` parameter also does not guarantee that array values are returned in
a specific order. See the mapping documentation on <<array,arrays>> for more
background.
The response includes values as a flat list in the `fields` section for each
hit. Because the `fields` parameter doesn't fetch entire objects, only leaf
fields are returned.
[source,console-result] [source,console-result]
---- ----
{ {
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : { "hits" : {
"total" : { "total" : {
"value" : 1, "value" : 1,
@ -150,29 +131,12 @@ The values are returned as a flat list in the `fields` section in each hit:
} }
} }
---- ----
// TESTRESPONSE[s/"took" : 2/"took": $body.took/]
// TESTRESPONSE[s/"max_score" : 1.0/"max_score" : $body.hits.max_score/] // TESTRESPONSE[s/"max_score" : 1.0/"max_score" : $body.hits.max_score/]
// TESTRESPONSE[s/"_score" : 1.0/"_score" : $body.hits.hits.0._score/] // TESTRESPONSE[s/"_score" : 1.0/"_score" : $body.hits.hits.0._score/]
Only leaf fields are returned -- `fields` does not allow for fetching entire
objects.
The `fields` parameter handles field types like <<alias, field aliases>> and
<<constant-keyword-field-type, `constant_keyword`>> whose values aren't always present in
the `_source`. Other mapping options are also respected, including
<<ignore-above, `ignore_above`>>, <<ignore-malformed, `ignore_malformed`>> and
<<null-value, `null_value`>>.
NOTE: The `fields` response always returns an array of values for each field,
even when there is a single value in the `_source`. This is because {es} has
no dedicated array type, and any field could contain multiple values. The
`fields` parameter also does not guarantee that array values are returned in
a specific order. See the mapping documentation on <<array, arrays>> for more
background.
[discrete] [discrete]
[[search-fields-nested]] [[search-fields-nested]]
==== Handling of nested fields ==== Retrieve nested fields
The `fields` response for <<nested,`nested` fields>> is slightly different from that The `fields` response for <<nested,`nested` fields>> is slightly different from that
of regular object fields. While leaf values inside regular `object` fields are of regular object fields. While leaf values inside regular `object` fields are
@ -225,7 +189,7 @@ POST my-index-000001/_search
} }
-------------------------------------------------- --------------------------------------------------
the response will group `first` and `last` name instead of The response will group `first` and `last` name instead of
returning them as a flat list. returning them as a flat list.
[source,console-result] [source,console-result]
@ -269,8 +233,9 @@ returning them as a flat list.
// TESTRESPONSE[s/"max_score" : 1.0/"max_score" : $body.hits.max_score/] // TESTRESPONSE[s/"max_score" : 1.0/"max_score" : $body.hits.max_score/]
// TESTRESPONSE[s/"_score" : 1.0/"_score" : $body.hits.hits.0._score/] // TESTRESPONSE[s/"_score" : 1.0/"_score" : $body.hits.hits.0._score/]
Nested fields will be grouped by their nested paths, no matter the pattern used to retrieve them. Nested fields will be grouped by their nested paths, no matter the pattern used
For example, querying only for the `user.first` field in the example above: to retrieve them. For example, if you query only for the `user.first` field from
the previous example:
[source,console] [source,console]
-------------------------------------------------- --------------------------------------------------
@ -282,7 +247,8 @@ POST my-index-000001/_search
-------------------------------------------------- --------------------------------------------------
// TEST[continued] // TEST[continued]
will return only the users first name but still maintain the structure of the nested `user` array: The response returns only the user's first name, but still maintains the
structure of the nested `user` array:
[source,console-result] [source,console-result]
---- ----
@ -323,19 +289,19 @@ will return only the users first name but still maintain the structure of the ne
// TESTRESPONSE[s/"_score" : 1.0/"_score" : $body.hits.hits.0._score/] // TESTRESPONSE[s/"_score" : 1.0/"_score" : $body.hits.hits.0._score/]
However, when the `fields` pattern targets the nested `user` field directly, no However, when the `fields` pattern targets the nested `user` field directly, no
values will be returned since the pattern doesn't match any leaf fields. values will be returned because the pattern doesn't match any leaf fields.
[discrete] [discrete]
[[retrieve-unmapped-fields]] [[retrieve-unmapped-fields]]
==== Retrieving unmapped fields ==== Retrieve unmapped fields
By default, the `fields` parameter returns only values of mapped fields.
However, {es} allows storing fields in `_source` that are unmapped, such as
setting <<dynamic-field-mapping,dynamic field mapping>> to `false` or by using
an object field with `enabled: false`. These options disable parsing and
indexing of the object content.
By default, the `fields` parameter returns only values of mapped fields. However, To retrieve unmapped fields in an object from `_source`, use the
Elasticsearch allows storing fields in `_source` that are unmapped, for example by `include_unmapped` option in the `fields` section:
setting <<dynamic-field-mapping,Dynamic field mapping>> to `false` or by using an
object field with `enabled: false`, thereby disabling parsing and indexing of its content.
Fields in such an object can be retrieved from `_source` using the `include_unmapped` option
in the `fields` section:
[source,console] [source,console]
---- ----
@ -372,9 +338,10 @@ POST my-index-000001/_search
<1> Disable all mappings. <1> Disable all mappings.
<2> Include unmapped fields matching this field pattern. <2> Include unmapped fields matching this field pattern.
The response will contain fields results under the `session_data.object.*` path even if the The response will contain field results under the `session_data.object.*` path,
fields are unmapped, but will not contain `user_id` since it is unmapped but the `include_unmapped` even if the fields are unmapped. The `user_id` field is also unmapped, but it
flag hasn't been set to `true` for that field pattern. won't be included in the response because `include_unmapped` isn't set to
`true` for that field pattern.
[source,console-result] [source,console-result]
---- ----
@ -412,137 +379,9 @@ flag hasn't been set to `true` for that field pattern.
// TESTRESPONSE[s/"max_score" : 1.0/"max_score" : $body.hits.max_score/] // TESTRESPONSE[s/"max_score" : 1.0/"max_score" : $body.hits.max_score/]
// TESTRESPONSE[s/"_score" : 1.0/"_score" : $body.hits.hits.0._score/] // TESTRESPONSE[s/"_score" : 1.0/"_score" : $body.hits.hits.0._score/]
[discrete]
[[docvalue-fields]]
=== Doc value fields
You can use the <<docvalue-fields,`docvalue_fields`>> parameter to return
<<doc-values,doc values>> for one or more fields in the search response.
Doc values store the same values as the `_source` but in an on-disk,
column-based structure that's optimized for sorting and aggregations. Since each
field is stored separately, {es} only reads the field values that were requested
and can avoid loading the whole document `_source`.
Doc values are stored for supported fields by default. However, doc values are
not supported for <<text,`text`>> or
{plugins}/mapper-annotated-text-usage.html[`text_annotated`] fields.
The following search request uses the `docvalue_fields` parameter to retrieve
doc values for the `user.id` field, all fields starting with `http.response.`, and the
`@timestamp` field:
[source,console]
----
GET my-index-000001/_search
{
"query": {
"match": {
"user.id": "kimchy"
}
},
"docvalue_fields": [
"user.id",
"http.response.*", <1>
{
"field": "date",
"format": "epoch_millis" <2>
}
]
}
----
// TEST[setup:my_index]
<1> Both full field names and wildcard patterns are accepted.
<2> Using object notation, you can pass a `format` parameter to apply a custom
format for the field's doc values. <<date,Date fields>> support a
<<mapping-date-format,date `format`>>. <<number,Numeric fields>> support a
https://docs.oracle.com/javase/8/docs/api/java/text/DecimalFormat.html[DecimalFormat
pattern]. Other field datatypes do not support the `format` parameter.
TIP: You cannot use the `docvalue_fields` parameter to retrieve doc values for
nested objects. If you specify a nested object, the search returns an empty
array (`[ ]`) for the field. To access nested fields, use the
<<inner-hits, `inner_hits`>> parameter's `docvalue_fields`
property.
[discrete]
[[stored-fields]]
=== Stored fields
It's also possible to store an individual field's values by using the
<<mapping-store,`store`>> mapping option. You can use the
`stored_fields` parameter to include these stored values in the search response.
WARNING: The `stored_fields` parameter is for fields that are explicitly marked as
stored in the mapping, which is off by default and generally not recommended.
Use <<source-filtering,source filtering>> instead to select
subsets of the original source document to be returned.
Allows to selectively load specific stored fields for each document represented
by a search hit.
[source,console]
--------------------------------------------------
GET /_search
{
"stored_fields" : ["user", "postDate"],
"query" : {
"term" : { "user" : "kimchy" }
}
}
--------------------------------------------------
`*` can be used to load all stored fields from the document.
An empty array will cause only the `_id` and `_type` for each hit to be
returned, for example:
[source,console]
--------------------------------------------------
GET /_search
{
"stored_fields" : [],
"query" : {
"term" : { "user" : "kimchy" }
}
}
--------------------------------------------------
If the requested fields are not stored (`store` mapping set to `false`), they will be ignored.
Stored field values fetched from the document itself are always returned as an array. On the contrary, metadata fields like `_routing` are never returned as an array.
Also only leaf fields can be returned via the `stored_fields` option. If an object field is specified, it will be ignored.
NOTE: On its own, `stored_fields` cannot be used to load fields in nested
objects -- if a field contains a nested object in its path, then no data will
be returned for that stored field. To access nested fields, `stored_fields`
must be used within an <<inner-hits, `inner_hits`>> block.
[discrete]
[[disable-stored-fields]]
==== Disable stored fields
To disable the stored fields (and metadata fields) entirely use: `_none_`:
[source,console]
--------------------------------------------------
GET /_search
{
"stored_fields": "_none_",
"query" : {
"term" : { "user" : "kimchy" }
}
}
--------------------------------------------------
NOTE: <<source-filtering,`_source`>> and <<request-body-search-version, `version`>> parameters cannot be activated if `_none_` is used.
[discrete] [discrete]
[[source-filtering]] [[source-filtering]]
=== Source filtering === The `_source` option
You can use the `_source` parameter to select what fields of the source are You can use the `_source` parameter to select what fields of the source are
returned. This is called _source filtering_. returned. This is called _source filtering_.
@ -625,9 +464,164 @@ GET /_search
} }
---- ----
[discrete]
[[field-retrieval-methods]]
=== Other methods of retrieving data
.Using `fields` is typically better
****
These options are usually not required. Using the `fields` option is typically
the better choice, unless you absolutely need to force loading a stored or
`docvalue_fields`.
****
A document's `_source` is stored as a single field in Lucene. This structure
means that the whole `_source` object must be loaded and parsed even if you're
only requesting part of it. To avoid this limitation, you can try other options
for loading fields:
* Use the <<docvalue-fields,`docvalue_fields`>>
parameter to get values for selected fields. This can be a good
choice when returning a fairly small number of fields that support doc values,
such as keywords and dates.
* Use the <<request-body-search-stored-fields, `stored_fields`>> parameter to
get the values for specific stored fields (fields that use the
<<mapping-store,`store`>> mapping option).
{es} always attempts to load values from `_source`. This behavior has the same
implications of source filtering where {es} needs to load and parse the entire
`_source` to retrieve just one field.
[discrete]
[[docvalue-fields]]
==== Doc value fields
You can use the <<docvalue-fields,`docvalue_fields`>> parameter to return
<<doc-values,doc values>> for one or more fields in the search response.
Doc values store the same values as the `_source` but in an on-disk,
column-based structure that's optimized for sorting and aggregations. Since each
field is stored separately, {es} only reads the field values that were requested
and can avoid loading the whole document `_source`.
Doc values are stored for supported fields by default. However, doc values are
not supported for <<text,`text`>> or
{plugins}/mapper-annotated-text-usage.html[`text_annotated`] fields.
The following search request uses the `docvalue_fields` parameter to retrieve
doc values for the `user.id` field, all fields starting with `http.response.`, and the
`@timestamp` field:
[source,console]
----
GET my-index-000001/_search
{
"query": {
"match": {
"user.id": "kimchy"
}
},
"docvalue_fields": [
"user.id",
"http.response.*", <1>
{
"field": "date",
"format": "epoch_millis" <2>
}
]
}
----
// TEST[setup:my_index]
<1> Both full field names and wildcard patterns are accepted.
<2> Using object notation, you can pass a `format` parameter to apply a custom
format for the field's doc values. <<date,Date fields>> support a
<<mapping-date-format,date `format`>>. <<number,Numeric fields>> support a
https://docs.oracle.com/javase/8/docs/api/java/text/DecimalFormat.html[DecimalFormat
pattern]. Other field datatypes do not support the `format` parameter.
TIP: You cannot use the `docvalue_fields` parameter to retrieve doc values for
nested objects. If you specify a nested object, the search returns an empty
array (`[ ]`) for the field. To access nested fields, use the
<<inner-hits, `inner_hits`>> parameter's `docvalue_fields`
property.
[discrete]
[[stored-fields]]
==== Stored fields
It's also possible to store an individual field's values by using the
<<mapping-store,`store`>> mapping option. You can use the
`stored_fields` parameter to include these stored values in the search response.
WARNING: The `stored_fields` parameter is for fields that are explicitly marked as
stored in the mapping, which is off by default and generally not recommended.
Use <<source-filtering,source filtering>> instead to select
subsets of the original source document to be returned.
Allows to selectively load specific stored fields for each document represented
by a search hit.
[source,console]
--------------------------------------------------
GET /_search
{
"stored_fields" : ["user", "postDate"],
"query" : {
"term" : { "user" : "kimchy" }
}
}
--------------------------------------------------
`*` can be used to load all stored fields from the document.
An empty array will cause only the `_id` and `_type` for each hit to be
returned, for example:
[source,console]
--------------------------------------------------
GET /_search
{
"stored_fields" : [],
"query" : {
"term" : { "user" : "kimchy" }
}
}
--------------------------------------------------
If the requested fields are not stored (`store` mapping set to `false`), they will be ignored.
Stored field values fetched from the document itself are always returned as an array. On the contrary, metadata fields like `_routing` are never returned as an array.
Also only leaf fields can be returned via the `stored_fields` option. If an object field is specified, it will be ignored.
NOTE: On its own, `stored_fields` cannot be used to load fields in nested
objects -- if a field contains a nested object in its path, then no data will
be returned for that stored field. To access nested fields, `stored_fields`
must be used within an <<inner-hits, `inner_hits`>> block.
[discrete]
[[disable-stored-fields]]
===== Disable stored fields
To disable the stored fields (and metadata fields) entirely use: `_none_`:
[source,console]
--------------------------------------------------
GET /_search
{
"stored_fields": "_none_",
"query" : {
"term" : { "user" : "kimchy" }
}
}
--------------------------------------------------
NOTE: <<source-filtering,`_source`>> and <<request-body-search-version, `version`>> parameters cannot be activated if `_none_` is used.
[discrete] [discrete]
[[script-fields]] [[script-fields]]
=== Script fields ==== Script fields
You can use the `script_fields` parameter to retrieve a <<modules-scripting,script You can use the `script_fields` parameter to retrieve a <<modules-scripting,script
evaluation>> (based on different fields) for each hit. For example: evaluation>> (based on different fields) for each hit. For example:
@ -671,16 +665,16 @@ Here is an example:
[source,console] [source,console]
-------------------------------------------------- --------------------------------------------------
GET /_search GET /_search
{ {
"query" : { "query": {
"match_all": {} "match_all": {}
}, },
"script_fields" : { "script_fields": {
"test1" : { "test1": {
"script" : "params['_source']['message']" "script": "params['_source']['message']"
}
} }
} }
}
-------------------------------------------------- --------------------------------------------------
// TEST[setup:my_index] // TEST[setup:my_index]