[[runtime]]
== Runtime fields

A _runtime field_ is a field that is evaluated at query time. Runtime fields
enable you to:

* Add fields to existing documents without reindexing your data
* Start working with your data without understanding how it’s structured
* Override the value returned from an indexed field at query time
* Define fields for a specific use without modifying the underlying schema

You access runtime fields from the search API like any other field, and {es}
treats them no differently. You can define runtime fields in the
<<runtime-mapping-fields,index mapping>> or in the
<<runtime-search-request,search request>>. The choice is yours, which is part
of the inherent flexibility of runtime fields.

Runtime fields are useful when working with log data
(see <<runtime-examples,examples>>), especially when you're unsure about the
data structure. Search speed decreases, but the index is much smaller and you
can process logs more quickly without having to index them.
|
||
|
||
[discrete]
[[runtime-benefits]]
=== Benefits
Because runtime fields aren't indexed, adding a runtime field doesn't increase
the index size. You define runtime fields directly in the index mapping, saving
storage costs and increasing ingestion speed. You can more quickly ingest
data into the Elastic Stack and access it right away. When you define a runtime
field, you can immediately use it in search requests, aggregations, filtering,
and sorting.

If you make a runtime field an indexed field, you don't need to modify any
queries that refer to the runtime field. Better yet, you can refer to some
indices where the field is a runtime field, and other indices where the field
is an indexed field. You have the flexibility to choose which fields to index
and which ones to keep as runtime fields.

At its core, the most important benefit of runtime fields is the ability to
add fields to documents after you've ingested them. This capability simplifies
mapping decisions because you don't have to decide how to parse your data up
front, and can use runtime fields to amend the mapping at any time. Using
runtime fields allows for a smaller index and faster ingest time, which
together use fewer resources and reduce your operating costs.
|
||
|
||
[discrete]
[[runtime-compromises]]
=== Compromises
Runtime fields use less disk space and provide flexibility in how you access
your data, but can impact search performance based on the computation defined in
the runtime script.

To balance search performance and flexibility, index fields that you'll
commonly search for and filter on, such as a timestamp. {es} automatically uses
these indexed fields first when running a query, resulting in a fast response
time. You can then use runtime fields to limit the number of fields that {es}
needs to calculate values for. Using indexed fields in tandem with runtime
fields provides flexibility in the data that you index and how you define
queries for other fields.

Use the <<async-search,asynchronous search API>> to run searches that include
runtime fields. This method of search helps to offset the performance impacts
of computing values for runtime fields in each document containing that field.
If the query can't return the result set synchronously, you'll get results
asynchronously as they become available.
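
As a minimal sketch of this approach, the following request submits an
asynchronous search that aggregates on a `day_of_week` runtime field defined in
the request. It assumes an index named `my-index` with a `@timestamp` field of
type `date`. The `wait_for_completion_timeout` value of `2s` is an arbitrary
choice for illustration.

[source,console]
----
POST my-index/_async_search?wait_for_completion_timeout=2s <1>
{
  "runtime_mappings": {
    "day_of_week": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
      }
    }
  },
  "size": 0,
  "aggs": {
    "day_of_week": {
      "terms": {
        "field": "day_of_week"
      }
    }
  }
}
----
// TEST[skip:illustrative sketch that assumes my-index already exists]
<1> Assumed index name and timeout, chosen only for this sketch.

If the search doesn't finish within the timeout, the response includes an `id`
that you can use to retrieve the results later.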
|
||
|
||
IMPORTANT: Queries against runtime fields are considered expensive. If
<<query-dsl-allow-expensive-queries,`search.allow_expensive_queries`>> is set
to `false`, expensive queries are not allowed and {es} rejects any queries
against runtime fields.
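
To check how this setting is currently configured, you can read it back from
the cluster settings. The following sketch assumes that you have permission to
call the cluster settings API, and uses `filter_path` only to trim the response
to the one setting:

[source,console]
----
GET _cluster/settings?include_defaults=true&filter_path=*.search.allow_expensive_queries
----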
|
||
|
||
[[runtime-mapping-fields]]
=== Map a runtime field
You map runtime fields by adding a `runtime` section under the mapping
definition and defining
<<modules-scripting-using,a Painless script>>. This script has access to the
entire context of a document, including the original `_source` and any mapped
fields plus their values. At query time, the script runs and generates values
for each runtime field that the query requires.

.Emitting runtime field values
****
When defining a Painless script to use with runtime fields, you must include
the {painless}/painless-runtime-fields-context.html[`emit` method] to emit
calculated values.
****

For example, the script in the following request calculates the day of the week
from the `@timestamp` field, which is defined as a `date` type, and uses
`emit` to return the calculated value.
|
||
|
||
[source,console]
----
PUT my-index/
{
  "mappings": {
    "runtime": {
      "day_of_week": {
        "type": "keyword",
        "script": {
          "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
        }
      }
    },
    "properties": {
      "@timestamp": {"type": "date"}
    }
  }
}
----
|
||
|
||
Runtime fields defined in the `runtime` section can be any of these data types:

// tag::runtime-data-types[]
* `boolean`
* `date`
* `double`
* `geo_point`
* `ip`
* `keyword`
* `long`
// end::runtime-data-types[]

Runtime fields with a `type` of `date` can accept the
<<mapping-date-format,`format`>> parameter exactly as the `date` field type
does.
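
As a minimal sketch of the `format` parameter on a `date` runtime field, the
following request adds a hypothetical field named `timestamp_day`, which is not
part of the original example. It assumes that `my-index` already maps
`@timestamp` as a `date`, and that a `date` runtime script emits its value as
milliseconds since the epoch:

[source,console]
----
PUT my-index/_mapping
{
  "runtime": {
    "timestamp_day": { <1>
      "type": "date",
      "format": "yyyy-MM-dd", <2>
      "script": {
        "source": "emit(doc['@timestamp'].value.toInstant().toEpochMilli())"
      }
    }
  }
}
----
// TEST[continued]
<1> Hypothetical field name used only for illustration.
<2> The format is applied when the value is returned, for example through the
`fields` parameter of a search request.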
|
||
|
||
If <<dynamic-field-mapping,dynamic field mapping>> is enabled where the
|
||
`dynamic` parameter is set to `runtime`, new fields are automatically added to
|
||
the index mapping as runtime fields:
|
||
|
||
[source,console]
|
||
----
|
||
PUT my-index
|
||
{
|
||
"mappings": {
|
||
"dynamic": "runtime",
|
||
"properties": {
|
||
"@timestamp": {
|
||
"type": "date"
|
||
}
|
||
}
|
||
}
|
||
}
|
||
----
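
To see the effect, you can index a document that contains a field the mapping
doesn't define, and then inspect the mapping. This is a sketch under the
assumption that the request above has been applied; the `status_code` field is
a hypothetical name used only for illustration.

[source,console]
----
POST my-index/_doc?refresh
{
  "@timestamp": "2020-04-30T14:30:17-05:00",
  "status_code": 200 <1>
}

GET my-index/_mapping
----
// TEST[continued]
<1> Hypothetical field that isn't defined in the mapping.

With `dynamic` set to `runtime`, you would expect `status_code` to appear in
the `runtime` section of the mapping rather than under `properties`.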
|
||
|
||
[[runtime-fields-scriptless]]
|
||
==== Define runtime fields without a script
|
||
You can define a runtime field in the mapping definition without a
|
||
script. At query time, {es} looks in `_source` for a field with the same name
|
||
and returns a value if one exists. If a field with the same name doesn’t
|
||
exist, the response doesn't include any values for that runtime field.
|
||
|
||
[source,console]
|
||
----
|
||
PUT my-index/
|
||
{
|
||
"mappings": {
|
||
"runtime": {
|
||
"day_of_week": {
|
||
"type": "keyword"
|
||
}
|
||
}
|
||
}
|
||
}
|
||
----
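
To read the value back, request the field with the `fields` parameter. As a
sketch, assuming `my-index` contains documents whose `_source` includes a
`day_of_week` value:

[source,console]
----
GET my-index/_search
{
  "fields": ["day_of_week"]
}
----
// TEST[continued]

Documents without a `day_of_week` value in `_source` return no value for the
field.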
|
||
|
||
[[runtime-updating-scripts]]
|
||
==== Updating and removing runtime fields
|
||
|
||
You can update or remove runtime fields at any time. To replace an existing
|
||
runtime field, add a new runtime field to the mappings with the same name. To
|
||
remove a runtime field from the mappings, set the value of the runtime field to
|
||
`null`:
|
||
|
||
[source,console]
|
||
----
|
||
PUT my-index/_mapping
|
||
{
|
||
"runtime": {
|
||
"day_of_week": null
|
||
}
|
||
}
|
||
----
|
||
//TEST[continued]
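
Replacing a runtime field uses the same API. As a sketch, the following request
defines `day_of_week` again under the same name, this time with a script that
emits the abbreviated day name. It assumes that documents in `my-index` have a
`@timestamp` field of type `date`; the switch to `TextStyle.SHORT` is an
illustrative choice, not part of the original example.

[source,console]
----
PUT my-index/_mapping
{
  "runtime": {
    "day_of_week": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.SHORT, Locale.ROOT))" <1>
      }
    }
  }
}
----
// TEST[continued]
<1> Illustrative change: `TextStyle.SHORT` instead of `TextStyle.FULL`.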
|
||
|
||
.Downstream impacts
|
||
****
|
||
Updating or removing a runtime field while a dependent query is running can return
|
||
inconsistent results. Each shard might have access to different versions of the
|
||
script, depending on when the mapping change takes effect.
|
||
|
||
WARNING: Existing queries or visualizations in {kib} that rely on runtime fields can
|
||
fail if you remove or update the field. For example, a bar chart visualization
|
||
that uses a runtime field of type `ip` will fail if the type is changed
|
||
to `boolean`, or if the runtime field is removed.
|
||
****
|
||
|
||
[[runtime-search-request]]
|
||
=== Define runtime fields in a search request
|
||
You can specify a `runtime_mappings` section in a search request to create
runtime fields that exist only as part of the query. You specify a script
as part of the `runtime_mappings` section, just as you would if
<<runtime-mapping-fields,adding a runtime field to the mappings>>.

Defining a runtime field in a search request uses the same format as defining
a runtime field in the index mapping. To make a field from a search request
available to every query, copy its definition from the `runtime_mappings`
section of the search request to the `runtime` section of the index mapping.

The following search request adds a `day_of_week` field to the
`runtime_mappings` section. The field values are calculated dynamically,
and only within the context of this search request:
|
||
|
||
[source,console]
|
||
----
|
||
GET my-index/_search
|
||
{
|
||
"runtime_mappings": {
|
||
"day_of_week": {
|
||
"type": "keyword",
|
||
"script": {
|
||
"source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
|
||
}
|
||
}
|
||
},
|
||
"aggs": {
|
||
"day_of_week": {
|
||
"terms": {
|
||
"field": "day_of_week"
|
||
}
|
||
}
|
||
}
|
||
}
|
||
----
|
||
//TEST[continued]
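
A runtime field defined in the request can also be used for filtering. As a
sketch under the same assumptions as the previous request, the following search
returns only documents whose calculated `day_of_week` is `Sunday`, sorted by
`@timestamp`. The day name and sort order are arbitrary choices for
illustration.

[source,console]
----
GET my-index/_search
{
  "runtime_mappings": {
    "day_of_week": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
      }
    }
  },
  "query": {
    "term": {
      "day_of_week": "Sunday" <1>
    }
  },
  "sort": [
    { "@timestamp": "desc" }
  ],
  "fields": ["day_of_week"]
}
----
// TEST[continued]
<1> Arbitrary value chosen for illustration.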
|
||
|
||
[[runtime-search-request-examples]]
|
||
[discrete]
|
||
=== Create runtime fields that use other runtime fields
|
||
You can even define runtime fields in a search request that return values from
|
||
other runtime fields. For example, let's say you bulk index some sensor data:
|
||
|
||
[source,console]
|
||
----
|
||
POST my-index/_bulk?refresh=true
|
||
{"index":{}}
|
||
{"@timestamp":1516729294000,"model_number":"QVKC92Q","measures":{"voltage":"5.2","start": "300","end":"8675309"}}
|
||
{"index":{}}
|
||
{"@timestamp":1516642894000,"model_number":"QVKC92Q","measures":{"voltage":"5.8","start": "300","end":"8675309"}}
|
||
{"index":{}}
|
||
{"@timestamp":1516556494000,"model_number":"QVKC92Q","measures":{"voltage":"5.1","start": "300","end":"8675309"}}
|
||
{"index":{}}
|
||
{"@timestamp":1516470094000,"model_number":"QVKC92Q","measures":{"voltage":"5.6","start": "300","end":"8675309"}}
|
||
{"index":{}}
|
||
{"@timestamp":1516383694000,"model_number":"HG537PU","measures":{"voltage":"4.2","start": "400","end":"8625309"}}
|
||
{"index":{}}
|
||
{"@timestamp":1516297294000,"model_number":"HG537PU","measures":{"voltage":"4.0","start": "400","end":"8625309"}}
|
||
----
|
||
|
||
You realize after indexing that your numeric data was mapped as type `text`.
|
||
You want to aggregate on the `measures.start` and `measures.end` fields, but
|
||
the aggregation fails because you can't aggregate on fields of type `text`.
|
||
Runtime fields to the rescue! You can add runtime fields with the same name as
|
||
your indexed fields and modify the data type:
|
||
|
||
[source,console]
|
||
----
|
||
PUT my-index/_mapping
|
||
{
|
||
"runtime": {
|
||
"measures.start": {
|
||
"type": "long"
|
||
},
|
||
"measures.end": {
|
||
"type": "long"
|
||
}
|
||
}
|
||
}
|
||
----
|
||
// TEST[continued]
|
||
|
||
Runtime fields take precedence over fields defined with the same name in the
|
||
index mappings. This flexibility allows you to shadow existing fields and
|
||
calculate a different value, without modifying the field itself. If you made a
|
||
mistake in your index mapping, you can use runtime fields to calculate values
|
||
that <<runtime-override-values,override values>> in the mapping during the
|
||
search request.
|
||
|
||
Now, you can easily run an
|
||
<<search-aggregations-metrics-avg-aggregation,average aggregation>> on the
|
||
`measures.start` and `measures.end` fields:
|
||
|
||
[source,console]
|
||
----
|
||
GET my-index/_search
|
||
{
|
||
"aggs": {
|
||
"avg_start": {
|
||
"avg": {
|
||
"field": "measures.start"
|
||
}
|
||
},
|
||
"avg_end": {
|
||
"avg": {
|
||
"field": "measures.end"
|
||
}
|
||
}
|
||
}
|
||
}
|
||
----
|
||
// TEST[continued]
|
||
// TEST[s/_search/_search\?filter_path=aggregations/]
|
||
|
||
The response includes the aggregation results without changing the values for
|
||
the underlying data:
|
||
|
||
[source,console-result]
|
||
----
|
||
{
|
||
"aggregations" : {
|
||
"avg_start" : {
|
||
"value" : 333.3333333333333
|
||
},
|
||
"avg_end" : {
|
||
"value" : 8658642.333333334
|
||
}
|
||
}
|
||
}
|
||
----
|
||
|
||
Further, you can define a runtime field as part of a search query that
|
||
calculates a value, and then run a
|
||
<<search-aggregations-metrics-stats-aggregation,stats aggregation>> on that
|
||
field _in the same query_.
|
||
|
||
The `duration` runtime field doesn't exist in the index mapping, but we can
|
||
still search and aggregate on that field. The following query returns the
|
||
calculated value for the `duration` field and runs a stats aggregation to
|
||
compute statistics over numeric values extracted from the aggregated documents.
|
||
|
||
[source,console]
|
||
----
|
||
GET my-index/_search
|
||
{
|
||
"runtime_mappings": {
|
||
"duration": {
|
||
"type": "long",
|
||
"script": {
|
||
"source": """
|
||
emit(doc['measures.end'].value - doc['measures.start'].value);
|
||
"""
|
||
}
|
||
}
|
||
},
|
||
"aggs": {
|
||
"duration_stats": {
|
||
"stats": {
|
||
"field": "duration"
|
||
}
|
||
}
|
||
}
|
||
}
|
||
----
|
||
// TEST[continued]
|
||
// TEST[s/_search/_search\?filter_path=aggregations/]
|
||
|
||
Even though the `duration` runtime field only exists in the context of a search
|
||
query, you can search and aggregate on that field. This flexibility is
|
||
incredibly powerful, enabling you to rectify mistakes in your index mappings
|
||
and dynamically complete calculations all within a single search request.
|
||
|
||
[source,console-result]
|
||
----
|
||
{
|
||
"aggregations" : {
|
||
"duration_stats" : {
|
||
"count" : 6,
|
||
"min" : 8624909.0,
|
||
"max" : 8675009.0,
|
||
"avg" : 8658309.0,
|
||
"sum" : 5.1949854E7
|
||
}
|
||
}
|
||
}
|
||
----
|
||
|
||
[[runtime-override-values]]
|
||
=== Override field values at query time
|
||
If you create a runtime field with the same name as a field that
|
||
already exists in the mapping, the runtime field shadows the mapped field. At
|
||
query time, {es} evaluates the runtime field, calculates a value based on the
|
||
script, and returns the value as part of the query. Because the runtime field
|
||
shadows the mapped field, you can override the value returned in search without
|
||
modifying the mapped field.
|
||
|
||
For example, let's say you indexed the following documents into `my-index`:
|
||
|
||
[source,console]
|
||
----
|
||
POST my-index/_bulk?refresh=true
|
||
{"index":{}}
|
||
{"@timestamp":1516729294000,"model_number":"QVKC92Q","measures":{"voltage":5.2}}
|
||
{"index":{}}
|
||
{"@timestamp":1516642894000,"model_number":"QVKC92Q","measures":{"voltage":5.8}}
|
||
{"index":{}}
|
||
{"@timestamp":1516556494000,"model_number":"QVKC92Q","measures":{"voltage":5.1}}
|
||
{"index":{}}
|
||
{"@timestamp":1516470094000,"model_number":"QVKC92Q","measures":{"voltage":5.6}}
|
||
{"index":{}}
|
||
{"@timestamp":1516383694000,"model_number":"HG537PU","measures":{"voltage":4.2}}
|
||
{"index":{}}
|
||
{"@timestamp":1516297294000,"model_number":"HG537PU","measures":{"voltage":4.0}}
|
||
----
|
||
|
||
You later realize that the `HG537PU` sensors aren't reporting their true
|
||
voltage. The indexed values are supposed to be 1.7 times higher than
|
||
the reported values! Instead of reindexing your data, you can define a script in
|
||
the `runtime_mappings` section of the `_search` request to shadow the `voltage`
|
||
field and calculate a new value at query time.
|
||
|
||
If you search for documents where the model number matches `HG537PU`:
|
||
|
||
[source,console]
|
||
----
|
||
GET my-index/_search
|
||
{
|
||
"query": {
|
||
"match": {
|
||
"model_number": "HG537PU"
|
||
}
|
||
}
|
||
}
|
||
----
|
||
//TEST[continued]
|
||
|
||
The response includes indexed values for documents matching model number
|
||
`HG537PU`:
|
||
|
||
[source,console-result]
|
||
----
|
||
{
|
||
...
|
||
"hits" : {
|
||
"total" : {
|
||
"value" : 2,
|
||
"relation" : "eq"
|
||
},
|
||
"max_score" : 1.0296195,
|
||
"hits" : [
|
||
{
|
||
"_index" : "my-index",
|
||
"_id" : "F1BeSXYBg_szTodcYCmk",
|
||
"_score" : 1.0296195,
|
||
"_source" : {
|
||
"@timestamp" : 1516383694000,
|
||
"model_number" : "HG537PU",
|
||
"measures" : {
|
||
"voltage" : 4.2
|
||
}
|
||
}
|
||
},
|
||
{
|
||
"_index" : "my-index",
|
||
"_id" : "l02aSXYBkpNf6QRDO62Q",
|
||
"_score" : 1.0296195,
|
||
"_source" : {
|
||
"@timestamp" : 1516297294000,
|
||
"model_number" : "HG537PU",
|
||
"measures" : {
|
||
"voltage" : 4.0
|
||
}
|
||
}
|
||
}
|
||
]
|
||
}
|
||
}
|
||
----
|
||
// TESTRESPONSE[s/\.\.\./"took" : $body.took,"timed_out" : $body.timed_out,"_shards" : $body._shards,/]
|
||
// TESTRESPONSE[s/"_id" : "F1BeSXYBg_szTodcYCmk"/"_id": $body.hits.hits.0._id/]
|
||
// TESTRESPONSE[s/"_id" : "l02aSXYBkpNf6QRDO62Q"/"_id": $body.hits.hits.1._id/]
|
||
|
||
The following request defines a runtime field whose script checks whether the
`model_number` field has the value `HG537PU`. For each matching document, the
script multiplies the value of the `voltage` field by `1.7`; for every other
document, it emits the value unchanged.

Using the <<search-fields,`fields`>> parameter on the `_search` API, you can
retrieve the value that the script calculates for the `measures.voltage` field
for documents matching the search request:
|
||
|
||
[source,console]
|
||
----
|
||
POST my-index/_search
|
||
{
|
||
"runtime_mappings": {
|
||
"measures.voltage": {
|
||
"type": "double",
|
||
"script": {
|
||
"source":
|
||
"""if (doc['model_number.keyword'].value.equals('HG537PU'))
|
||
{emit(1.7 * params._source['measures']['voltage']);}
|
||
else{emit(params._source['measures']['voltage']);}"""
|
||
}
|
||
}
|
||
},
|
||
"query": {
|
||
"match": {
|
||
"model_number": "HG537PU"
|
||
}
|
||
},
|
||
"fields": ["measures.voltage"]
|
||
}
|
||
----
|
||
//TEST[continued]
|
||
|
||
Looking at the response, the calculated values for `measures.voltage` on each
result are `7.14` and `6.8`. That's more like it! The runtime field calculated
these values as part of the search request without modifying the mapped value,
which is still returned in `_source`:
|
||
|
||
[source,console-result]
|
||
----
|
||
{
|
||
...
|
||
"hits" : {
|
||
"total" : {
|
||
"value" : 2,
|
||
"relation" : "eq"
|
||
},
|
||
"max_score" : 1.0296195,
|
||
"hits" : [
|
||
{
|
||
"_index" : "my-index",
|
||
"_id" : "F1BeSXYBg_szTodcYCmk",
|
||
"_score" : 1.0296195,
|
||
"_source" : {
|
||
"@timestamp" : 1516383694000,
|
||
"model_number" : "HG537PU",
|
||
"measures" : {
|
||
"voltage" : 4.2
|
||
}
|
||
},
|
||
"fields" : {
|
||
"measures.voltage" : [
|
||
7.14
|
||
]
|
||
}
|
||
},
|
||
{
|
||
"_index" : "my-index",
|
||
"_id" : "l02aSXYBkpNf6QRDO62Q",
|
||
"_score" : 1.0296195,
|
||
"_source" : {
|
||
"@timestamp" : 1516297294000,
|
||
"model_number" : "HG537PU",
|
||
"measures" : {
|
||
"voltage" : 4.0
|
||
}
|
||
},
|
||
"fields" : {
|
||
"measures.voltage" : [
|
||
6.8
|
||
]
|
||
}
|
||
}
|
||
]
|
||
}
|
||
}
|
||
----
|
||
// TESTRESPONSE[s/\.\.\./"took" : $body.took,"timed_out" : $body.timed_out,"_shards" : $body._shards,/]
|
||
// TESTRESPONSE[s/"_id" : "F1BeSXYBg_szTodcYCmk"/"_id": $body.hits.hits.0._id/]
|
||
// TESTRESPONSE[s/"_id" : "l02aSXYBkpNf6QRDO62Q"/"_id": $body.hits.hits.1._id/]
|
||
|
||
[[runtime-retrieving-fields]]
|
||
=== Retrieve a runtime field
|
||
Use the <<search-fields,`fields`>> parameter on the `_search` API to retrieve
the values of runtime fields. Runtime fields won't display in `_source`, but
the `fields` parameter works for all fields, even those that were not sent as
part of the original `_source`.
|
||
|
||
[discrete]
|
||
[[runtime-define-field-dayofweek]]
|
||
==== Define a runtime field to calculate the day of week
|
||
For example, the following request adds a runtime field called `day_of_week`.
|
||
The runtime field includes a script that calculates the day of the week based
|
||
on the value of the `@timestamp` field. We'll include `"dynamic":"runtime"` in
|
||
the request so that new fields are added to the mapping as runtime fields.
|
||
|
||
[source,console]
|
||
----
|
||
PUT my-index/
|
||
{
|
||
"mappings": {
|
||
"dynamic": "runtime",
|
||
"runtime": {
|
||
"day_of_week": {
|
||
"type": "keyword",
|
||
"script": {
|
||
"source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
|
||
}
|
||
}
|
||
},
|
||
"properties": {
|
||
"@timestamp": {"type": "date"}
|
||
}
|
||
}
|
||
}
|
||
----
|
||
|
||
[discrete]
|
||
[[runtime-ingest-data]]
|
||
==== Ingest some data
|
||
Let's ingest some sample data, which will result in two fields: `@timestamp`,
which is indexed, and `message`, which is added as a runtime field because
`dynamic` is set to `runtime`.
|
||
|
||
[source,console]
|
||
----
|
||
POST /my-index/_bulk?refresh
|
||
{ "index": {}}
|
||
{ "@timestamp": "2020-06-21T15:00:01-05:00", "message" : "211.11.9.0 - - [2020-06-21T15:00:01-05:00] \"GET /english/index.html HTTP/1.0\" 304 0"}
|
||
{ "index": {}}
|
||
{ "@timestamp": "2020-06-21T15:00:01-05:00", "message" : "211.11.9.0 - - [2020-06-21T15:00:01-05:00] \"GET /english/index.html HTTP/1.0\" 304 0"}
|
||
{ "index": {}}
|
||
{ "@timestamp": "2020-04-30T14:30:17-05:00", "message" : "40.135.0.0 - - [2020-04-30T14:30:17-05:00] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
|
||
{ "index": {}}
|
||
{ "@timestamp": "2020-04-30T14:30:53-05:00", "message" : "232.0.0.0 - - [2020-04-30T14:30:53-05:00] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
|
||
{ "index": {}}
|
||
{ "@timestamp": "2020-04-30T14:31:12-05:00", "message" : "26.1.0.0 - - [2020-04-30T14:31:12-05:00] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
|
||
{ "index": {}}
|
||
{ "@timestamp": "2020-04-30T14:31:19-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:19-05:00] \"GET /french/splash_inet.html HTTP/1.0\" 200 3781"}
|
||
{ "index": {}}
|
||
{ "@timestamp": "2020-04-30T14:31:27-05:00", "message" : "252.0.0.0 - - [2020-04-30T14:31:27-05:00] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
|
||
{ "index": {}}
|
||
{ "@timestamp": "2020-04-30T14:31:29-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:29-05:00] \"GET /images/hm_brdl.gif HTTP/1.0\" 304 0"}
|
||
{ "index": {}}
|
||
{ "@timestamp": "2020-04-30T14:31:29-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:29-05:00] \"GET /images/hm_arw.gif HTTP/1.0\" 304 0"}
|
||
{ "index": {}}
|
||
{ "@timestamp": "2020-04-30T14:31:32-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:32-05:00] \"GET /images/nav_bg_top.gif HTTP/1.0\" 200 929"}
|
||
{ "index": {}}
|
||
{ "@timestamp": "2020-04-30T14:31:43-05:00", "message" : "247.37.0.0 - - [2020-04-30T14:31:43-05:00] \"GET /french/images/nav_venue_off.gif HTTP/1.0\" 304 0"}
|
||
----
|
||
//TEST[continued]
|
||
|
||
[discrete]
|
||
[[runtime-search-dayofweek]]
|
||
==== Search for the calculated day of week
|
||
The following request uses the search API to retrieve the `day_of_week` field
|
||
that the original request defined as a runtime field in the mapping. The value
|
||
for this field is calculated dynamically at query time without reindexing
|
||
documents or indexing the `day_of_week` field. This flexibility allows you to
|
||
modify the mapping without changing any field values.
|
||
|
||
[source,console]
|
||
----
|
||
GET my-index/_search
|
||
{
|
||
"fields": [
|
||
"@timestamp",
|
||
"day_of_week"
|
||
],
|
||
"_source": false
|
||
}
|
||
----
|
||
// TEST[continued]
|
||
|
||
The previous request returns the `day_of_week` field for all matching documents.
We can define another runtime field called `client_ip` that operates on
the `message` field and will further refine the query:
|
||
|
||
[source,console]
|
||
----
|
||
PUT /my-index/_mapping
|
||
{
|
||
"runtime": {
|
||
"client_ip": {
|
||
"type": "ip",
|
||
"script" : {
|
||
"source" : "String m = doc[\"message\"].value; int end = m.indexOf(\" \"); emit(m.substring(0, end));"
|
||
}
|
||
}
|
||
}
|
||
}
|
||
----
|
||
//TEST[continued]
|
||
|
||
Run another query, but search for a specific IP address using the `client_ip`
|
||
runtime field:
|
||
|
||
[source,console]
|
||
----
|
||
GET my-index/_search
|
||
{
|
||
"size": 1,
|
||
"query": {
|
||
"match": {
|
||
"client_ip": "211.11.9.0"
|
||
}
|
||
},
|
||
"fields" : ["*"]
|
||
}
|
||
----
|
||
//TEST[continued]
|
||
|
||
This time, the query matches only two documents, and the response returns one
of them because `size` is set to `1`. The value for `day_of_week`
(`Sunday`) was calculated at query time using the runtime script defined in the
mapping, and the result includes only documents matching the `211.11.9.0` IP
address.
|
||
|
||
[source,console-result]
|
||
----
|
||
{
|
||
...
|
||
"hits" : {
|
||
"total" : {
|
||
"value" : 2,
|
||
"relation" : "eq"
|
||
},
|
||
"max_score" : 1.0,
|
||
"hits" : [
|
||
{
|
||
"_index" : "my-index",
|
||
"_id" : "oWs5KXYB-XyJbifr9mrz",
|
||
"_score" : 1.0,
|
||
"_source" : {
|
||
"@timestamp" : "2020-06-21T15:00:01-05:00",
|
||
"message" : "211.11.9.0 - - [2020-06-21T15:00:01-05:00] \"GET /english/index.html HTTP/1.0\" 304 0"
|
||
},
|
||
"fields" : {
|
||
"@timestamp" : [
|
||
"2020-06-21T20:00:01.000Z"
|
||
],
|
||
"client_ip" : [
|
||
"211.11.9.0"
|
||
],
|
||
"message" : [
|
||
"211.11.9.0 - - [2020-06-21T15:00:01-05:00] \"GET /english/index.html HTTP/1.0\" 304 0"
|
||
],
|
||
"day_of_week" : [
|
||
"Sunday"
|
||
]
|
||
}
|
||
}
|
||
]
|
||
}
|
||
}
|
||
----
|
||
// TESTRESPONSE[s/\.\.\./"took" : $body.took,"timed_out" : $body.timed_out,"_shards" : $body._shards,/]
|
||
// TESTRESPONSE[s/"_id" : "oWs5KXYB-XyJbifr9mrz"/"_id": $body.hits.hits.0._id/]
|
||
// TESTRESPONSE[s/"day_of_week" : \[\n\s+"Sunday"\n\s\]/"day_of_week": $body.hits.hits.0.fields.day_of_week/]
|
||
|
||
|
||
[[runtime-examples]]
|
||
=== Explore your data with runtime fields
|
||
Consider a large set of log data that you want to extract fields from.
|
||
Indexing the data is time consuming and uses a lot of disk space, and you just
|
||
want to explore the data structure without committing to a schema up front.
|
||
|
||
You know that your log data contains specific fields that you want to extract.
|
||
In this case, we want to focus on the `@timestamp` and `message` fields. By
|
||
using runtime fields, you can define scripts to calculate values at search
|
||
time for these fields.
|
||
|
||
[[runtime-examples-define-fields]]
|
||
==== Define indexed fields as a starting point
|
||
|
||
You can start with a simple example by adding the `@timestamp` and `message`
|
||
fields to the `my-index` mapping as indexed fields. To remain flexible, use
|
||
`wildcard` as the field type for `message`:
|
||
|
||
[source,console]
|
||
----
|
||
PUT /my-index/
|
||
{
|
||
"mappings": {
|
||
"properties": {
|
||
"@timestamp": {
|
||
"format": "strict_date_optional_time||epoch_second",
|
||
"type": "date"
|
||
},
|
||
"message": {
|
||
"type": "wildcard"
|
||
}
|
||
}
|
||
}
|
||
}
|
||
----
|
||
|
||
[[runtime-examples-ingest-data]]
|
||
==== Ingest some data
|
||
After mapping the fields you want to retrieve, index a few records from
|
||
your log data into {es}. The following request uses the <<docs-bulk,bulk API>>
|
||
to index raw log data into `my-index`. Instead of indexing all of your log
|
||
data, you can use a small sample to experiment with runtime fields.
|
||
|
||
The final document is not a valid Apache log format, but we can account for
|
||
that scenario in our script.
|
||
|
||
[source,console]
|
||
----
|
||
POST /my-index/_bulk?refresh
|
||
{"index":{}}
|
||
{"timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
|
||
{"index":{}}
|
||
{"timestamp":"2020-04-30T14:30:53-05:00","message":"232.0.0.0 - - [30/Apr/2020:14:30:53 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
|
||
{"index":{}}
|
||
{"timestamp":"2020-04-30T14:31:12-05:00","message":"26.1.0.0 - - [30/Apr/2020:14:31:12 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
|
||
{"index":{}}
|
||
{"timestamp":"2020-04-30T14:31:19-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:19 -0500] \"GET /french/splash_inet.html HTTP/1.0\" 200 3781"}
|
||
{"index":{}}
|
||
{"timestamp":"2020-04-30T14:31:22-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"}
|
||
{"index":{}}
|
||
{"timestamp":"2020-04-30T14:31:27-05:00","message":"252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
|
||
{"index":{}}
|
||
{"timestamp":"2020-04-30T14:31:28-05:00","message":"not a valid apache log"}
|
||
----
|
||
// TEST[continued]
|
||
|
||
At this point, you can view how {es} stores your raw data.
|
||
|
||
[source,console]
|
||
----
|
||
GET /my-index
|
||
----
|
||
// TEST[continued]
|
||
|
||
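The mapping contains the two fields that you defined, `@timestamp` and
`message`, plus a `timestamp` field that {es} mapped dynamically because the
sample documents use `timestamp` as the field name.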
|
||
|
||
[source,console-result]
|
||
----
|
||
{
|
||
"my-index" : {
|
||
"aliases" : { },
|
||
"mappings" : {
|
||
"properties" : {
|
||
"@timestamp" : {
|
||
"type" : "date",
|
||
"format" : "strict_date_optional_time||epoch_second"
|
||
},
|
||
"message" : {
|
||
"type" : "wildcard"
|
||
},
|
||
"timestamp" : {
|
||
"type" : "date"
|
||
}
|
||
}
|
||
},
|
||
...
|
||
}
|
||
}
|
||
----
|
||
// TESTRESPONSE[s/\.\.\./"settings": $body.my-index.settings/]
|
||
|
||
[[runtime-examples-grok]]
|
||
==== Define a runtime field with a grok pattern
|
||
If you want to retrieve results that include `clientip`, you can add that
field as a runtime field in the mapping. The following runtime script defines a
grok pattern that extracts structured fields out of a single text
field within a document. A grok pattern is like a regular expression that
supports aliased expressions that you can reuse. See
<<grok-basics,Grok basics>> to learn more about grok syntax.

The script matches on the `%{COMMONAPACHELOG}` log pattern, which understands
the structure of Apache logs. If the pattern matches (`clientip != null`), the
script emits the value of the matching IP address. If the pattern doesn't
match, the script emits nothing and exits without crashing.
|
||
|
||
[source,console]
----
PUT my-index/_mappings
{
  "runtime": {
    "http.clientip": {
      "type": "ip",
      "script": """
        String clientip=grok('%{COMMONAPACHELOG}').extract(doc["message"].value)?.clientip;
        if (clientip != null) emit(clientip); <1>
      """
    }
  }
}
----
|
||
// TEST[continued]
|
||
<1> This condition ensures that the script doesn't crash even if the pattern of
|
||
the message doesn't match.
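
One way to see which documents the script skips is to search for documents that
have no value for the runtime field. The following sketch assumes that runtime
fields support the `exists` query in the same way as indexed fields. With the
sample data above, it would be expected to match the document whose message is
`not a valid apache log`:

[source,console]
----
GET my-index/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "http.clientip"
        }
      }
    }
  },
  "fields": ["message"]
}
----
// TEST[continued]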
|
||
|
||
[[runtime-examples-grok-ip]]
|
||
===== Search for a specific IP address
|
||
Using the `http.clientip` runtime field, you can define a simple query to run a
|
||
search for a specific IP address and return all related fields.
|
||
|
||
[source,console]
|
||
----
|
||
GET my-index/_search
|
||
{
|
||
"query": {
|
||
"match": {
|
||
"http.clientip": "40.135.0.0"
|
||
}
|
||
},
|
||
"fields" : ["*"]
|
||
}
|
||
----
|
||
// TEST[continued]
|
||
|
||
The API returns the following result. Without building your data structure in
|
||
advance, you can search and explore your data in meaningful ways to experiment
|
||
and determine which fields to index.
|
||
|
||
Also, remember that `if` statement in the script?
|
||
|
||
[source,painless]
|
||
----
|
||
if (clientip != null) emit(clientip);
|
||
----
|
||
|
||
If the script didn't include this condition, the query would fail on any shard
that contains a document that doesn't match the pattern. By including this
condition, the query skips data that doesn't match the grok pattern.
|
||
|
||
[source,console-result]
|
||
----
|
||
{
|
||
...
|
||
"hits" : {
|
||
"total" : {
|
||
"value" : 1,
|
||
"relation" : "eq"
|
||
},
|
||
"max_score" : 1.0,
|
||
"hits" : [
|
||
{
|
||
"_index" : "my-index",
|
||
"_id" : "FdLqu3cBhqheMnFKd0gK",
|
||
"_score" : 1.0,
|
||
"_source" : {
|
||
"timestamp" : "2020-04-30T14:30:17-05:00",
|
||
"message" : "40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
|
||
},
|
||
"fields" : {
|
||
"http.clientip" : [
|
||
"40.135.0.0"
|
||
],
|
||
"message" : [
|
||
"40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
|
||
],
|
||
"timestamp" : [
|
||
"2020-04-30T19:30:17.000Z"
|
||
]
|
||
}
|
||
}
|
||
]
|
||
}
|
||
}
|
||
----
|
||
// TESTRESPONSE[s/\.\.\./"took" : $body.took,"timed_out" : $body.timed_out,"_shards" : $body._shards,/]
|
||
// TESTRESPONSE[s/"_id" : "FdLqu3cBhqheMnFKd0gK"/"_id": $body.hits.hits.0._id/]
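
In this response, `_source` keeps the timestamp with its original `-05:00`
offset, while `fields` shows a value ending in `Z`. The two strings appear to
denote the same instant. As a quick check outside {es}, the following Python
sketch converts the `_source` value to UTC. The helper name `to_utc_millis` is
made up for this illustration and is not part of any {es} API.

[source,python]
----
from datetime import datetime, timezone


def to_utc_millis(value: str) -> str:
    """Parse an ISO 8601 timestamp that carries a UTC offset and render it in UTC."""
    utc = datetime.fromisoformat(value).astimezone(timezone.utc)
    # Render with millisecond precision and a trailing "Z", like the `fields` value.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


source_value = "2020-04-30T14:30:17-05:00"  # value stored in _source
print(to_utc_millis(source_value))          # 2020-04-30T19:30:17.000Z
----

The printed value matches the `timestamp` entry under `fields`, so both
representations can be compared safely as long as the offset is kept.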

[[runtime-examples-grok-range]]
===== Search for documents in a specific range
You can also run a <<query-dsl-range-query,range query>> that operates on the
`timestamp` field. The following query returns any documents where the
`timestamp` is greater than or equal to `2020-04-30T14:31:27-05:00`:

[source,console]
----
GET my-index/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "2020-04-30T14:31:27-05:00"
      }
    }
  }
}
----
// TEST[continued]

The response includes the document where the log format doesn't match, but the
timestamp falls within the defined range.

[source,console-result]
----
{
  ...
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index",
        "_id" : "hdEhyncBRSB6iD-PoBqe",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : "2020-04-30T14:31:27-05:00",
          "message" : "252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
        }
      },
      {
        "_index" : "my-index",
        "_id" : "htEhyncBRSB6iD-PoBqe",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : "2020-04-30T14:31:28-05:00",
          "message" : "not a valid apache log"
        }
      }
    ]
  }
}
----
// TESTRESPONSE[s/\.\.\./"took" : $body.took,"timed_out" : $body.timed_out,"_shards" : $body._shards,/]
// TESTRESPONSE[s/"_id" : "hdEhyncBRSB6iD-PoBqe"/"_id": $body.hits.hits.0._id/]
// TESTRESPONSE[s/"_id" : "htEhyncBRSB6iD-PoBqe"/"_id": $body.hits.hits.1._id/]

[[runtime-examples-dissect]]
==== Define a runtime field with a dissect pattern
If you don't need the power of regular expressions, you can use
<<dissect-processor,dissect patterns>> instead of grok patterns. Dissect
patterns match on fixed delimiters but are typically faster than grok.

You can use dissect to achieve the same results as parsing the Apache logs with
a <<runtime-examples-grok,grok pattern>>. Instead of matching on a log
pattern, you include the parts of the string that you want to discard. Paying
special attention to the parts of the string you want to discard will help you
build successful dissect patterns.

[source,console]
----
PUT my-index/_mappings
{
  "runtime": {
    "http.client.ip": {
      "type": "ip",
      "script": """
        String clientip=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{status} %{size}').extract(doc["message"].value)?.clientip;
        if (clientip != null) emit(clientip);
      """
    }
  }
}
----
// TEST[continued]
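
As a rough illustration of what matching on fixed delimiters means, the
following Python sketch walks the same pattern and a log line side by side,
cutting the line at each literal piece of the pattern. It is not how {es}
implements dissect. The names `dissect`, `PATTERN`, and `KEY` are made up for
this example, and the sketch assumes plain `%{key}` placeholders with no
dissect modifiers and no adjacent keys.

[source,python]
----
import re

# Same pattern as the `http.client.ip` runtime field in the preceding request.
PATTERN = ('%{clientip} %{ident} %{auth} [%{@timestamp}] '
           '"%{verb} %{request} HTTP/%{httpversion}" %{status} %{size}')

KEY = re.compile(r"%\{([^}]*)\}")


def dissect(pattern, text):
    """Return a dict of captured keys, or None if a literal delimiter is missing."""
    tokens = KEY.split(pattern)  # [literal, key, literal, key, ..., literal]
    literals, keys = tokens[0::2], tokens[1::2]
    if not text.startswith(literals[0]):
        return None
    pos = len(literals[0])
    fields = {}
    for key, delimiter in zip(keys, literals[1:]):
        if delimiter == "":
            end = len(text)  # no literal follows: take the rest of the line
        else:
            end = text.find(delimiter, pos)
            if end == -1:
                return None  # literal not found: the line doesn't match
        fields[key] = text[pos:end]
        pos = end + len(delimiter)
    return fields if pos == len(text) else None


for line in (
    '247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0',
    "not a valid apache log",
):
    fields = dissect(PATTERN, line)
    # Mirror the runtime field script: only "emit" when the pattern matched.
    if fields is not None:
        print(fields["clientip"], fields["status"])
----

Only the first line produces output (`247.37.0.0 304`). The second line has no
` [` delimiter, so the sketch returns `None` and nothing is emitted, which is
the situation the `if (clientip != null)` check in the script guards against.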

Similarly, you can define a dissect pattern to extract the https://developer.mozilla.org/en-US/docs/Web/HTTP/Status[HTTP response code]:

[source,console]
----
PUT my-index/_mappings
{
  "runtime": {
    "http.response": {
      "type": "long",
      "script": """
        String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}').extract(doc["message"].value)?.response;
        if (response != null) emit(Integer.parseInt(response));
      """
    }
  }
}
----
// TEST[continued]

You can then run a query to retrieve a specific HTTP response using the
`http.response` runtime field:

[source,console]
----
GET my-index/_search
{
  "query": {
    "match": {
      "http.response": "304"
    }
  },
  "fields" : ["*"]
}
----
// TEST[continued]

The response includes a single document where the HTTP response is `304`:

[source,console-result]
----
{
  ...
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index",
        "_id" : "A2qDy3cBWRMvVAuI7F8M",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : "2020-04-30T14:31:22-05:00",
          "message" : "247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"
        },
        "fields" : {
          "http.clientip" : [
            "247.37.0.0"
          ],
          "http.response" : [
            304
          ],
          "message" : [
            "247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"
          ],
          "http.client.ip" : [
            "247.37.0.0"
          ],
          "timestamp" : [
            "2020-04-30T19:31:22.000Z"
          ]
        }
      }
    ]
  }
}
----
// TESTRESPONSE[s/\.\.\./"took" : $body.took,"timed_out" : $body.timed_out,"_shards" : $body._shards,/]
// TESTRESPONSE[s/"_id" : "A2qDy3cBWRMvVAuI7F8M"/"_id": $body.hits.hits.0._id/]