mirror of
https://github.com/elastic/logstash.git
synced 2025-04-24 14:47:19 -04:00
* add failing tests for Event.new with fields that look like field references

* fix: correctly handle FieldReference-special characters in field names.

  Keys passed to most methods of `ConvertedMap`, based on `IdentityHashMap`, depend on identity and not equivalence, and therefore rely on the keys being _interned_ strings. In order to avoid hitting the JVM's global String intern pool (which can have performance problems), operations to normalize a string to its interned counterpart have traditionally relied on the behaviour of `FieldReference#from` returning a likely-cached `FieldReference` that had an interned `key` and an empty `path`.

  This is problematic on two points.

  First, when `ConvertedMap` was given data with keys that _were_ valid string field references representing a nested field (such as `[host][geo][location]`), the implementation of `ConvertedMap#put` silently discarded the path components because it assumed them to be empty, and only the key was kept (`location`).

  Second, when `ConvertedMap` was given a map whose keys contained what the field reference parser considered special characters but _were NOT_ valid field references, the resulting `FieldReference.IllegalSyntaxException` caused the operation to abort.

  Instead of using the `FieldReference` cache, which sits on top of objects whose `key` and `path`-components are known to have been interned, we introduce an internment helper on our `ConvertedMap` that is also backed by the global string intern pool, and ensure that our field references are primed through this pool.

  In addition to fixing the `ConvertedMap#newFromMap` functionality, this has three net effects:

  - Our ConvertedMap operations still use strings from the global intern pool
  - We have a new, smaller cache of individual field names, improving lookup performance
  - Our FieldReference cache is no longer flooded with fragments and therefore is more likely to remain performant

  NOTE: this does NOT create isolated intern pools, as doing so would require a careful audit of the possible code-paths to `ConvertedMap#putInterned`. The new cache is limited to 10k strings, and when more are used only the FIRST 10k strings will be primed into the cache, leaving the remainder to always hit the global String intern pool.

  NOTE: by fixing this bug, we allow events to be created whose fields _CANNOT_ be referenced with the existing FieldReference implementation.

  Resolves: https://github.com/elastic/logstash/issues/13606
  Resolves: https://github.com/elastic/logstash/issues/11608

* field_reference: support escape sequences

  Adds a `config.field_reference.escape_style` option and a companion command-line flag `--field-reference-escape-style` allowing a user to opt into one of two proposed escape-sequence implementations for field reference parsing:

  - `PERCENT`: URI-style `%`+`HH` hexadecimal encoding of UTF-8 bytes
  - `AMPERSAND`: HTML-style `&#`+`DD`+`;` encoding of decimal Unicode code-points

  The default is `NONE`, which does _not_ process escape sequences. With this setting a user effectively cannot reference a field whose name contains FieldReference-reserved characters.

  | ESCAPE STYLE | `[`     | `]`     |
  | ------------ | ------- | ------- |
  | `NONE`       | _N/A_   | _N/A_   |
  | `PERCENT`    | `%5B`   | `%5D`   |
  | `AMPERSAND`  | `&#91;` | `&#93;` |

* fixup: no need to double-escape HTML-ish escape sequences in docs

* Apply suggestions from code review

  Co-authored-by: Karol Bucek <kares@users.noreply.github.com>

* field-reference: load escape style in runner
* docs: sentences over semicolons
* field-reference: faster shortcut for PERCENT escape mode
* field-reference: escape mode control downcase
* field_reference: more s/experimental/technical preview/
* field_reference: still more s/experimental/technical preview/

Co-authored-by: Karol Bucek <kares@users.noreply.github.com>
147 lines
5.7 KiB
Text
[role="exclude",id="field-references-deepdive"]
== Field References Deep Dive

It is often useful to be able to refer to a field or collection of fields by name. To do this,
you can use the Logstash field reference syntax.

The syntax to access a field specifies the entire path to the field, with each fragment wrapped in square brackets.
When a field name contains square brackets, they must be properly <<formal-grammar-escape-sequences, _escaped_>>.

_Field References_ can be expressed literally within <<conditionals,_Conditional_>> statements in your pipeline configurations,
as string arguments to your pipeline plugins, or within sprintf statements that will be used by your pipeline plugins:

[source,pipelineconf]
filter {
  #  +----literal----+     +----literal----+
  #  |               |     |               |
  if [@metadata][date] and [@metadata][time] {
    mutate {
      add_field => {
        "[@metadata][timestamp]" => "%{[@metadata][date]} %{[@metadata][time]}"
  #     |                      |    |  |               |    |               | |
  #     +----string-argument---+    |  +--field-ref----+    +--field-ref----+ |
  #                                 +-------- sprintf format string ----------+
      }
    }
  }
}
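
To make the sprintf part of the example concrete, here is a minimal Ruby sketch of how a format string could be resolved against nested data. It is an illustration only: the event is a plain nested `Hash` rather than a Logstash event, the helper names `lookup` and `sprintf_like` are hypothetical, and only bracketed field references inside `%{...}` are handled.

```ruby
# Hypothetical stand-in for an event: a plain nested Hash, not the Logstash Event API.
event = {
  "@metadata" => { "date" => "2025-04-24", "time" => "14:47:19" }
}

# Split a reference such as "[@metadata][date]" into its field names,
# then walk the nested Hash one path fragment at a time.
def lookup(data, reference)
  names = reference.scan(/\[([^\[\]]+)\]/).flatten
  names.reduce(data) { |node, name| node.is_a?(Hash) ? node[name] : nil }
end

# Replace each %{[a][b]} placeholder with the value of the referenced field.
def sprintf_like(data, format)
  format.gsub(/%\{((?:\[[^\[\]]+\])+)\}/) { lookup(data, Regexp.last_match(1)).to_s }
end

timestamp = sprintf_like(event, "%{[@metadata][date]} %{[@metadata][time]}")
puts timestamp # prints "2025-04-24 14:47:19"
```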

[float]
[[formal-grammar]]
=== Formal Grammar

Below is the formal grammar of the Field Reference, with notes and examples.

[float]
[[formal-grammar-field-reference-literal]]
==== Field Reference Literal

A _Field Reference Literal_ is a sequence of one or more _Path Fragments_ that can be used directly in Logstash pipeline <<conditionals,conditionals>> without any additional quoting (e.g., `[request]`, `[response][status]`).

[source,antlr]
fieldReferenceLiteral
  : ( pathFragment )+
  ;

NOTE: In Logstash 7.x and earlier, a quoted value (such as `["foo"]`) is
considered a field reference and isn't treated as a single element array. This
behavior might cause confusion in conditionals, such as `[message] in ["foo",
"bar"]` compared to `[message] in ["foo"]`. We discourage using names with
quotes, such as `"\"foo\""`, as this behavior might change in the future.

[float]
[[formal-grammar-field-reference]]
==== Field Reference (Event APIs)

The Event API's methods for manipulating the fields of an event or using the sprintf syntax are more flexible than the pipeline grammar in what they accept as a Field Reference.
Top-level fields can be referenced directly by their _Field Name_ without the square brackets, and there is some support for _Composite Field References_, simplifying use of programmatically generated Field References.

A _Field Reference_ for use with the Event API is therefore one of:

- a single _Field Reference Literal_; OR
- a single _Field Name_ (referencing a top-level field); OR
- a single _Composite Field Reference_.

[source,antlr]
eventApiFieldReference
  : fieldReferenceLiteral
  | fieldName
  | compositeFieldReference
  ;

[float]
[[formal-grammar-path-fragment]]
==== Path Fragment

A _Path Fragment_ is a _Field Name_ wrapped in square brackets (e.g., `[request]`).

[source,antlr]
pathFragment
  : '[' fieldName ']'
  ;

[float]
[[formal-grammar-field-name]]
==== Field Name

A _Field Name_ is a sequence of one or more characters, none of which is a square bracket (`[` or `]`).

[source,antlr]
fieldName
  : ( ~( '[' | ']' ) )+
  ;
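
The _Field Name_, _Path Fragment_, and _Field Reference Literal_ rules translate fairly directly into regular expressions. The Ruby sketch below mirrors them for illustration. The constant and method names are hypothetical and are not part of Logstash, and no escape-sequence processing is attempted.

```ruby
# Illustrative regular expressions mirroring the grammar rules
# (hypothetical names, not part of Logstash).
FIELD_NAME    = /[^\[\]]+/                  # fieldName: one or more non-bracket characters
PATH_FRAGMENT = /\[#{FIELD_NAME.source}\]/  # pathFragment: '[' fieldName ']'
FIELD_REFERENCE_LITERAL = /\A(?:#{PATH_FRAGMENT.source})+\z/ # one or more path fragments

# True when the text is a well-formed Field Reference Literal.
def field_reference_literal?(text)
  FIELD_REFERENCE_LITERAL.match?(text)
end

# Returns the field names of a literal, outermost first.
def path_of(literal)
  unless field_reference_literal?(literal)
    raise ArgumentError, "not a field reference literal: #{literal.inspect}"
  end
  literal.scan(/\[(#{FIELD_NAME.source})\]/).flatten
end

p path_of("[response][status]")
```

Under these definitions a bare name such as `request` is not a literal, which matches the pipeline grammar. The Event API accepts it separately as a _Field Name_.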

[float]
[[formal-grammar-event-api-composite-field-reference]]
==== Composite Field Reference

In some cases, it may be necessary to programmatically _compose_ a Field Reference from one or more Field References,
such as when manipulating fields in a plugin or while using the Ruby Filter plugin and the Event API.

[source,ruby]
fieldReference = "[path][to][deep nested field]"
compositeFieldReference = "[@metadata][#{fieldReference}][size]"
# => "[@metadata][[path][to][deep nested field]][size]"

// NOTE: table below uses "plus for passthrough" quoting to prevent double square-brackets
// from being interpreted as asciidoc anchors when converted to HTML.
[float]
===== Canonical Representations of Composite Field References
|===
| Acceptable _Composite Field Reference_ | Canonical _Field Reference_ Representation

| `+[[deep][nesting]][field]+`           | `+[deep][nesting][field]+`
| `+[foo][[bar]][bingo]+`                | `+[foo][bar][bingo]+`
| `+[[ok]]+`                             | `+[ok]+`
|===

A _Composite Field Reference_ is a sequence of one or more _Path Fragments_ or _Embedded Field References_.

[source,antlr]
compositeFieldReference
  : ( pathFragment | embeddedFieldReference )+
  ;

_Composite Field References_ are supported by the Event API, but are _not_ supported as literals in the Pipeline Configuration.
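
One way to picture the canonical representation is as the flat list of field names left once the extra nesting brackets are dropped. The following Ruby sketch works under that assumption. `field_names` and `canonical` are hypothetical helpers rather than Logstash's parser, and escape sequences are not handled.

```ruby
# Hypothetical helper (not Logstash's parser): list the field names that an
# Event API field reference points through, outermost first.
def field_names(reference)
  # A bare Field Name, with no brackets at all, refers to a top-level field.
  return [reference] unless reference.match?(/[\[\]]/)

  names = []
  current = String.new
  depth = 0
  after_close = false # true right after a ']' until the next '['

  reference.each_char do |char|
    case char
    when "["
      raise ArgumentError, "unexpected '[' in #{reference.inspect}" unless current.empty?
      depth += 1
      after_close = false
    when "]"
      raise ArgumentError, "unbalanced ']' in #{reference.inspect}" if depth.zero?
      if current.empty?
        # Only valid when this ']' closes an embedded reference such as [[ok]].
        raise ArgumentError, "empty path fragment in #{reference.inspect}" unless after_close
      else
        names << current
        current = String.new
      end
      depth -= 1
      after_close = true
    else
      if depth.zero? || after_close
        raise ArgumentError, "text outside a path fragment in #{reference.inspect}"
      end
      current << char
    end
  end

  raise ArgumentError, "unbalanced '[' in #{reference.inspect}" unless depth.zero?
  names
end

# Rebuild the canonical form: one path fragment per field name.
def canonical(reference)
  field_names(reference).map { |name| "[#{name}]" }.join
end

puts canonical("[[deep][nesting]][field]")
```

Applied to the three composite references in the table above, this sketch yields the canonical forms listed there, and a bare name such as `message` becomes `[message]`.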

[float]
[[formal-grammar-event-api-embedded-field-reference]]
==== Embedded Field Reference

[source,antlr]
embeddedFieldReference
  : '[' eventApiFieldReference ']'
  ;

An _Embedded Field Reference_ is a _Field Reference_ that is itself wrapped in square brackets (`[` and `]`), and can be a component of a _Composite Field Reference_.

[float]
[[formal-grammar-escape-sequences]]
=== Escape Sequences

For {ls} to reference a field whose name contains a character that has special meaning in the field reference grammar, the character must be escaped.
Logstash can be globally configured, using the `config.field_reference.escape_style` setting or the `--field-reference-escape-style` command-line flag, to use one of three field reference escape modes:

- `none` (default): no escape sequence processing is done. Fields containing literal square brackets cannot be referenced by the Event API.
- `percent`: URI-style percent encoding of UTF-8 bytes. The left square bracket (`[`) is expressed as `%5B`, and the right square bracket (`]`) is expressed as `%5D`.
- `ampersand`: HTML-style ampersand encoding (`&#` + decimal Unicode code point + `;`). The left square bracket (`[`) is expressed as `+&#91;+`, and the right square bracket (`]`) is expressed as `+&#93;+`.
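
The two encodings can be sketched in a few lines of Ruby. The helpers below are hypothetical illustrations, not Logstash's implementation: they escape only the two square brackets, and they decode every well-formed sequence they find.

```ruby
# Hypothetical helpers illustrating the two escape styles; not Logstash's own code.

# percent style: each square bracket becomes '%' plus two hex digits per UTF-8 byte.
def escape_percent(field_name)
  field_name.gsub(/[\[\]]/) { |char| char.bytes.map { |byte| format("%%%02X", byte) }.join }
end

# Decode %HH sequences byte by byte, then reinterpret the bytes as UTF-8.
def unescape_percent(text)
  bytes = text.b.gsub(/%(\h\h)/) { Regexp.last_match(1).hex.chr }
  bytes.force_encoding(Encoding::UTF_8)
end

# ampersand style: each square bracket becomes '&#' + decimal code point + ';'.
def escape_ampersand(field_name)
  field_name.gsub(/[\[\]]/) { |char| format("&#%d;", char.ord) }
end

# Decode &#DD; sequences into the character with that code point.
def unescape_ampersand(text)
  text.gsub(/&#(\d+);/) { [Regexp.last_match(1).to_i].pack("U") }
end

# A top-level field literally named "a[0]" needs an escaped path fragment.
puts "[#{escape_percent('a[0]')}]"
puts "[#{escape_ampersand('a[0]')}]"
```

With the `percent` mode the path fragment for a field named `a[0]` would be written `[a%5B0%5D]`. With the `ampersand` mode each bracket inside the name is replaced by its decimal sequence instead.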