elasticsearch/docs/reference/ingest/processors/json.asciidoc
Keith Massey f327352601
Making JsonProcessor stricter so that it does not silently drop data (#93179)
This PR makes JsonProcessor's JSON parsing a little bit stricter so that
we are not silently dropping data when given bad inputs. Previously if
the input string began with something that could be parsed as a valid
json field, then the processor would grab that and ignore the rest. For
example, `123 "foo"` would be parsed as `123`, dropping the `"foo"`. Now
by default it will throw an IllegalArgumentException on a string like
this. A user can now set the `strict_json_parsing` parameter to false to
get the old behavior. For example:

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "",
    "processors" : [
      {
        "json" : {
          "field" : "message",
          "strict_json_parsing": false
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "123 \"foo\""
      }
    }
  ]
}
```

Closes #92898
2023-01-24 18:43:35 -05:00


[[json-processor]]
=== JSON processor
++++
<titleabbrev>JSON</titleabbrev>
++++
Converts a JSON string into a structured JSON object.
[[json-options]]
.Json Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to be parsed.
| `target_field` | no | `field` | The field that the converted structured object will be written into. Any existing content in this field will be overwritten.
| `add_to_root` | no | false | Flag that forces the parsed JSON to be added at the top level of the document. `target_field` must not be set when this option is chosen.
| `add_to_root_conflict_strategy` | no | `replace` | When set to `replace`, root fields that conflict with fields from the parsed JSON will be overridden. When set to `merge`, conflicting fields will be merged. Only applicable if `add_to_root` is set to `true`.
| `allow_duplicate_keys` | no | false | When set to `true`, the JSON parser will not fail if the JSON contains duplicate keys. Instead, the last encountered value for any duplicate key wins.
| `strict_json_parsing` | no | true | When set to `true`, the JSON parser will strictly parse the field value. When set to `false`, the JSON parser will be more lenient but also more likely to drop parts of the field value. For example, if `strict_json_parsing` is set to `true` and the field value is `123 "foo"`, the processor will throw an `IllegalArgumentException`. If `strict_json_parsing` is set to `false`, the same field value will be parsed as `123` and the trailing `"foo"` will be dropped.
include::common-options.asciidoc[]
|======
All JSON-supported types will be parsed (null, boolean, number, array, object, string).
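As a sketch of the `strict_json_parsing` option described in the table above, assume a hypothetical `message` field and a processor configured for lenient parsing:
[source,js]
--------------------------------------------------
{
  "json" : {
    "field" : "message",
    "strict_json_parsing" : false
  }
}
--------------------------------------------------
// NOTCONSOLE
Given this document:
[source,js]
--------------------------------------------------
{
  "message": "123 \"foo\""
}
--------------------------------------------------
// NOTCONSOLE
the lenient parser should keep only the leading value, so the result would be expected to look like:
[source,js]
--------------------------------------------------
{
  "message": 123
}
--------------------------------------------------
// NOTCONSOLE
With the default setting (`strict_json_parsing` set to `true`), the same document causes the processor to throw an `IllegalArgumentException` instead.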
Suppose you provide this configuration of the `json` processor:
[source,js]
--------------------------------------------------
{
  "json" : {
    "field" : "string_source",
    "target_field" : "json_target"
  }
}
--------------------------------------------------
// NOTCONSOLE
If the following document is processed:
[source,js]
--------------------------------------------------
{
  "string_source": "{\"foo\": 2000}"
}
--------------------------------------------------
// NOTCONSOLE
after the `json` processor operates on it, it will look like:
[source,js]
--------------------------------------------------
{
  "string_source": "{\"foo\": 2000}",
  "json_target": {
    "foo": 2000
  }
}
--------------------------------------------------
// NOTCONSOLE
If the following configuration is provided, omitting the optional `target_field` setting:
[source,js]
--------------------------------------------------
{
  "json" : {
    "field" : "source_and_target"
  }
}
--------------------------------------------------
// NOTCONSOLE
then after the `json` processor operates on this document:
[source,js]
--------------------------------------------------
{
  "source_and_target": "{\"foo\": 2000}"
}
--------------------------------------------------
// NOTCONSOLE
it will look like:
[source,js]
--------------------------------------------------
{
  "source_and_target": {
    "foo": 2000
  }
}
--------------------------------------------------
// NOTCONSOLE
This illustrates that, unless it is explicitly named in the processor configuration, the `target_field`
is the same field provided in the required `field` configuration.
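To place the parsed keys at the top level of the document instead of under a target field, the `add_to_root` option from the table above can be used. The following is a minimal sketch based on that option's description; the field name `string_source` is reused here only for illustration:
[source,js]
--------------------------------------------------
{
  "json" : {
    "field" : "string_source",
    "add_to_root" : true
  }
}
--------------------------------------------------
// NOTCONSOLE
Given this document:
[source,js]
--------------------------------------------------
{
  "string_source": "{\"foo\": 2000}"
}
--------------------------------------------------
// NOTCONSOLE
the result would be expected to look like:
[source,js]
--------------------------------------------------
{
  "string_source": "{\"foo\": 2000}",
  "foo": 2000
}
--------------------------------------------------
// NOTCONSOLE
Note that `target_field` is not set in this configuration, because it must not be combined with `add_to_root`.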