elasticsearch/docs/reference/mapping/params/ignore-above.asciidoc
Salvatore Campagna 208a1fe571
Introduce an ignore_above index-level setting (#113121)
Here we introduce a new index-level setting, `ignore_above`, similar to what we have
for `ignore_malformed`. The setting will apply to all `keyword`, `wildcard` and `flattened`
fields. Each field mapping will still be allowed to override the index-level setting using a
mapping-level `ignore_above` value.
2024-09-23 18:05:02 +02:00

89 lines
3.1 KiB
Text
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

[[ignore-above]]
=== `ignore_above`
Strings longer than the `ignore_above` setting will not be indexed or stored.
For arrays of strings, `ignore_above` will be applied for each array element separately and string elements longer than `ignore_above` will not be indexed or stored.
NOTE: All strings/array elements will still be present in the `_source` field, if the latter is enabled which is the default in Elasticsearch.
[source,console]
--------------------------------------------------
PUT my-index-000001
{
"mappings": {
"properties": {
"message": {
"type": "keyword",
"ignore_above": 20 <1>
}
}
}
}
PUT my-index-000001/_doc/1 <2>
{
"message": "Syntax error"
}
PUT my-index-000001/_doc/2 <3>
{
"message": "Syntax error with some long stacktrace"
}
GET my-index-000001/_search <4>
{
"aggs": {
"messages": {
"terms": {
"field": "message"
}
}
}
}
--------------------------------------------------
<1> This field will ignore any string longer than 20 characters.
<2> This document is indexed successfully.
<3> This document will be indexed, but without indexing the `message` field.
<4> Search returns both documents, but only the first is present in the terms aggregation.
TIP: The `ignore_above` setting can be updated on
existing fields using the <<indices-put-mapping,update mapping API>>.
This option is also useful for protecting against Lucene's term byte-length
limit of `32766`.
NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
set the limit to `32766 / 4 = 8191` since UTF-8 characters may occupy at most
4 bytes.
[[index-mapping-ignore-above]]
=== `index.mapping.ignore_above`
The `ignore_above` setting, typically used at the field level, can also be applied at the index level using
`index.mapping.ignore_above`. This setting lets you define a maximum string length for all applicable fields across
the index, including `keyword`, `wildcard`, and keyword values in `flattened` fields. Any values that exceed this
limit will be ignored during indexing and wont be stored.
This index-wide setting ensures a consistent approach to managing excessively long values. It works the same as the
field-level setting—if a strings length goes over the specified limit, that string wont be indexed or stored.
When dealing with arrays, each element is evaluated separately, and only the elements that exceed the limit are ignored.
[source,console]
--------------------------------------------------
PUT my-index-000001
{
"settings": {
"index.mapping.ignore_above": 256
}
}
--------------------------------------------------
In this example, all applicable fields in `my-index-000001` will ignore any strings longer than 256 characters.
TIP: You can override this index-wide setting for specific fields by specifying a custom `ignore_above` value in the
field mapping.
NOTE: Just like the field-level `ignore_above`, this setting only affects indexing and storage. The original values
are still available in the `_source` field if `_source` is enabled, which is the default behavior in Elasticsearch.