elasticsearch/docs/reference/ingest/processors/user-agent.asciidoc
edh-oss 62a471aefe
Update JSON parser and snippets (#77983)
Related to issue  #77823

This does the following:

- Updates several asciidoc files that contained code snippets with
  invalid JSON, most involving unnecessary trailing commas.

- Makes the switch from the Groovy JSON parser to the Jackson parser,
  pursuant to the general goal of eliminating Groovy dependence.

- Makes testing of JSON validity at build time more strict.

Note that this update still allows backslash escaping for any
character. Currently that matters because of the file
"docs/reference/ml/anomaly-detection/apis/get-datafeed-stats.asciidoc",
specifically this part:

    "attributes" : {
      "ml.machine_memory" :
        "$body.datafeeds.0.node.attributes.ml\.machine_memory",
      "ml.max_open_jobs" : "512"
    }

It's not clear to me what change, if any, is appropriate there. So,
I've left in the escaped period and configured the parser to ignore
it for the time being.
2021-09-20 11:08:26 +01:00

100 lines
4.7 KiB
Text

[[user-agent-processor]]
=== User agent processor
++++
<titleabbrev>User agent</titleabbrev>
++++
The `user_agent` processor extracts details from the user agent string a browser sends with its web requests.
This processor adds this information by default under the `user_agent` field.
The ingest-user-agent module ships by default with the regexes.yaml made available by uap-java with an Apache 2.0 license. For more details see https://github.com/ua-parser/uap-core.
[[using-ingest-user-agent]]
==== Using the user_agent Processor in a Pipeline
[[ingest-user-agent-options]]
.User-agent options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field containing the user agent string.
| `target_field` | no | user_agent | The field that will be filled with the user agent details.
| `regex_file` | no | - | The name of the file in the `config/ingest-user-agent` directory containing the regular expressions for parsing the user agent string. Both the directory and the file have to be created before starting Elasticsearch. If not specified, ingest-user-agent will use the regexes.yaml from uap-core it ships with (see below).
| `properties` | no | [`name`, `major`, `minor`, `patch`, `build`, `os`, `os_name`, `os_major`, `os_minor`, `device`] | Controls what properties are added to `target_field`.
| `extract_device_type` | no | `false` | beta:[] Extracts device type from the user agent string on a best-effort basis.
| `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document
|======
Here is an example that adds the user agent details to the `user_agent` field based on the `agent` field:
[source,console]
--------------------------------------------------
PUT _ingest/pipeline/user_agent
{
"description" : "Add user agent information",
"processors" : [
{
"user_agent" : {
"field" : "agent"
}
}
]
}
PUT my-index-000001/_doc/my_id?pipeline=user_agent
{
"agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
}
GET my-index-000001/_doc/my_id
--------------------------------------------------
Which returns
[source,console-result]
--------------------------------------------------
{
"found": true,
"_index": "my-index-000001",
"_id": "my_id",
"_version": 1,
"_seq_no": 22,
"_primary_term": 1,
"_source": {
"agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
"user_agent": {
"name": "Chrome",
"original": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
"version": "51.0.2704.103",
"os": {
"name": "Mac OS X",
"version": "10.10.5",
"full": "Mac OS X 10.10.5"
},
"device" : {
"name" : "Mac"
}
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term": 1/"_primary_term" : $body._primary_term/]
===== Using a custom regex file
To use a custom regex file for parsing the user agents, that file has to be put into the `config/ingest-user-agent` directory and
has to have a `.yml` filename extension. The file has to be present at node startup, any changes to it or any new files added
while the node is running will not have any effect.
In practice, it will make most sense for any custom regex file to be a variant of the default file, either a more recent version
or a customised version.
The default file included in `ingest-user-agent` is the `regexes.yaml` from uap-core: https://github.com/ua-parser/uap-core/blob/master/regexes.yaml
[[ingest-user-agent-settings]]
===== Node Settings
The `user_agent` processor supports the following setting:
`ingest.user_agent.cache_size`::
The maximum number of results that should be cached. Defaults to `1000`.
Note that these settings are node settings and apply to all `user_agent` processors, i.e. there is one cache for all defined `user_agent` processors.