mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-25 07:37:19 -04:00
This PR makes JsonProcessor's JSON parsing stricter so that data is no longer silently dropped on bad input. Previously, if the input string began with something that could be parsed as a valid JSON value, the processor took that value and ignored the rest. For example, `123 "foo"` was parsed as `123`, dropping the `"foo"`. By default, the processor now throws an `IllegalArgumentException` on such a string. Setting the `strict_json_parsing` parameter to `false` restores the old behavior. For example:

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "",
    "processors": [
      {
        "json": {
          "field": "message",
          "strict_json_parsing": false
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "123 \"foo\""
      }
    }
  ]
}
```

Closes #92898
[[json-processor]]
=== JSON processor
++++
<titleabbrev>JSON</titleabbrev>
++++

Converts a JSON string into a structured JSON object.

[[json-options]]
.Json Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to be parsed.
| `target_field` | no | `field` | The field that the converted structured object will be written into. Any existing content in this field will be overwritten.
| `add_to_root` | no | false | Flag that forces the parsed JSON to be added at the top level of the document. `target_field` must not be set when this option is chosen.
| `add_to_root_conflict_strategy` | no | `replace` | When set to `replace`, root fields that conflict with fields from the parsed JSON will be overridden. When set to `merge`, conflicting fields will be merged. Only applicable if `add_to_root` is set to `true`.
| `allow_duplicate_keys` | no | false | When set to `true`, the JSON parser will not fail if the JSON contains duplicate keys. Instead, the last encountered value for any duplicate key wins.
| `strict_json_parsing` | no | true | When set to `true`, the JSON parser strictly parses the field value. When set to `false`, the parser is more lenient but may silently drop parts of the field value. For example, if the field value is `123 "foo"`, the processor throws an `IllegalArgumentException` when `strict_json_parsing` is `true`, but parses the value as `123` when it is `false`.
include::common-options.asciidoc[]
|======

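The difference between strict and lenient parsing described for `strict_json_parsing` can be illustrated outside Elasticsearch. The following is a minimal Python sketch, not the processor's implementation: `parse_strict` and `parse_lenient` are hypothetical helper names, and Python's standard `json` module stands in for the processor's JSON parser.

```python
import json


def parse_strict(text):
    """Parse text as exactly one JSON value; trailing content is an error."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) when
    # non-whitespace content follows the first value.
    return json.loads(text)


def parse_lenient(text):
    """Parse only the first JSON value and ignore whatever follows it."""
    # raw_decode stops at the end of the first value and reports where it
    # stopped; the remainder of the string is discarded here on purpose.
    value, _end = json.JSONDecoder().raw_decode(text.lstrip())
    return value


if __name__ == "__main__":
    sample = '123 "foo"'
    print("lenient:", parse_lenient(sample))
    try:
        parse_strict(sample)
    except ValueError as err:
        print("strict rejected the input:", err)
```

The lenient variant mirrors the old behavior, where the trailing `"foo"` is lost without any error. The strict variant surfaces the malformed input instead.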
All JSON-supported types will be parsed (null, boolean, number, array, object, string).

Suppose you provide this configuration of the `json` processor:

[source,js]
--------------------------------------------------
{
  "json" : {
    "field" : "string_source",
    "target_field" : "json_target"
  }
}
--------------------------------------------------
// NOTCONSOLE

If the following document is processed:

[source,js]
--------------------------------------------------
{
  "string_source": "{\"foo\": 2000}"
}
--------------------------------------------------
// NOTCONSOLE

after the `json` processor operates on it, it will look like:

[source,js]
--------------------------------------------------
{
  "string_source": "{\"foo\": 2000}",
  "json_target": {
    "foo": 2000
  }
}
--------------------------------------------------
// NOTCONSOLE

If the following configuration is provided, omitting the optional `target_field` setting:

[source,js]
--------------------------------------------------
{
  "json" : {
    "field" : "source_and_target"
  }
}
--------------------------------------------------
// NOTCONSOLE

then after the `json` processor operates on this document:

[source,js]
--------------------------------------------------
{
  "source_and_target": "{\"foo\": 2000}"
}
--------------------------------------------------
// NOTCONSOLE

it will look like:

[source,js]
--------------------------------------------------
{
  "source_and_target": {
    "foo": 2000
  }
}
--------------------------------------------------
// NOTCONSOLE

This illustrates that, unless `target_field` is explicitly set in the processor configuration, it defaults to the field named in the required `field` setting.
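The two worked examples, together with the `add_to_root` options from the table, can be emulated with a short Python sketch. This is an illustration under stated assumptions, not the processor's implementation: `apply_json_processor` and `recursive_merge` are hypothetical names, only flat top-level field names are handled, and `merge` is assumed to mean that nested objects are merged key by key while any other conflict is resolved in favor of the parsed value.

```python
import copy
import json


def recursive_merge(target, source):
    """Merge source into target in place, descending into nested objects."""
    for key, value in source.items():
        if isinstance(target.get(key), dict) and isinstance(value, dict):
            recursive_merge(target[key], value)
        else:
            # Assumption: for non-object conflicts the parsed value wins.
            target[key] = value


def apply_json_processor(doc, field, target_field=None, add_to_root=False,
                         conflict_strategy="replace"):
    """Return a copy of doc with the JSON string in doc[field] parsed."""
    if add_to_root and target_field is not None:
        raise ValueError("target_field must not be set with add_to_root")
    result = copy.deepcopy(doc)
    parsed = json.loads(result[field])
    if add_to_root:
        # Assumption: only a JSON object can be spread over the root.
        if not isinstance(parsed, dict):
            raise ValueError("only a JSON object can be added to the root")
        if conflict_strategy == "merge":
            recursive_merge(result, parsed)
        else:
            result.update(parsed)  # "replace": parsed fields override
    else:
        # target_field defaults to field, overwriting the source string.
        destination = field if target_field is None else target_field
        result[destination] = parsed
    return result


if __name__ == "__main__":
    first = {"string_source": '{"foo": 2000}'}
    print(apply_json_processor(first, "string_source", "json_target"))
    second = {"source_and_target": '{"foo": 2000}'}
    print(apply_json_processor(second, "source_and_target"))
```

Calling the function with an explicit `target_field` keeps the original string alongside the parsed object, while omitting it replaces the string in place, matching the two documents shown above.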