mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-30 02:13:33 -04:00
Here we introduce a new index-level setting, `ignore_above`, similar to what we have for `ignore_malformed`. The setting will apply to all `keyword`, `wildcard` and `flattened` fields. Each field mapping will still be allowed to override the index-level setting using a mapping-level `ignore_above` value.
89 lines
3.1 KiB
Text
89 lines
3.1 KiB
Text
[[ignore-above]]
|
||
=== `ignore_above`
|
||
|
||
Strings longer than the `ignore_above` setting will not be indexed or stored.
|
||
For arrays of strings, `ignore_above` will be applied for each array element separately and string elements longer than `ignore_above` will not be indexed or stored.
|
||
|
||
NOTE: All strings/array elements will still be present in the `_source` field, if the latter is enabled which is the default in Elasticsearch.
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
PUT my-index-000001
|
||
{
|
||
"mappings": {
|
||
"properties": {
|
||
"message": {
|
||
"type": "keyword",
|
||
"ignore_above": 20 <1>
|
||
}
|
||
}
|
||
}
|
||
}
|
||
|
||
PUT my-index-000001/_doc/1 <2>
|
||
{
|
||
"message": "Syntax error"
|
||
}
|
||
|
||
PUT my-index-000001/_doc/2 <3>
|
||
{
|
||
"message": "Syntax error with some long stacktrace"
|
||
}
|
||
|
||
GET my-index-000001/_search <4>
|
||
{
|
||
"aggs": {
|
||
"messages": {
|
||
"terms": {
|
||
"field": "message"
|
||
}
|
||
}
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
|
||
<1> This field will ignore any string longer than 20 characters.
|
||
<2> This document is indexed successfully.
|
||
<3> This document will be indexed, but without indexing the `message` field.
|
||
<4> Search returns both documents, but only the first is present in the terms aggregation.
|
||
|
||
TIP: The `ignore_above` setting can be updated on
|
||
existing fields using the <<indices-put-mapping,update mapping API>>.
|
||
|
||
This option is also useful for protecting against Lucene's term byte-length
|
||
limit of `32766`.
|
||
|
||
NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
|
||
bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
|
||
set the limit to `32766 / 4 = 8191` since UTF-8 characters may occupy at most
|
||
4 bytes.
|
||
|
||
[[index-mapping-ignore-above]]
|
||
=== `index.mapping.ignore_above`
|
||
|
||
The `ignore_above` setting, typically used at the field level, can also be applied at the index level using
|
||
`index.mapping.ignore_above`. This setting lets you define a maximum string length for all applicable fields across
|
||
the index, including `keyword`, `wildcard`, and keyword values in `flattened` fields. Any values that exceed this
|
||
limit will be ignored during indexing and won’t be stored.
|
||
|
||
This index-wide setting ensures a consistent approach to managing excessively long values. It works the same as the
|
||
field-level setting—if a string’s length goes over the specified limit, that string won’t be indexed or stored.
|
||
When dealing with arrays, each element is evaluated separately, and only the elements that exceed the limit are ignored.
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
PUT my-index-000001
|
||
{
|
||
"settings": {
|
||
"index.mapping.ignore_above": 256
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
|
||
In this example, all applicable fields in `my-index-000001` will ignore any strings longer than 256 characters.
|
||
|
||
TIP: You can override this index-wide setting for specific fields by specifying a custom `ignore_above` value in the
|
||
field mapping.
|
||
|
||
NOTE: Just like the field-level `ignore_above`, this setting only affects indexing and storage. The original values
|
||
are still available in the `_source` field if `_source` is enabled, which is the default behavior in Elasticsearch.
|