Keyword tokenizer [analysis-keyword-tokenizer]
The keyword tokenizer is a noop tokenizer that accepts whatever text it is given and outputs the exact same text as a single term. It can be combined with token filters to normalise output, e.g. lower-casing email addresses.
Example output [_example_output_10]
POST _analyze
{
"tokenizer": "keyword",
"text": "New York"
}
The above request produces the following term:
[ New York ]
Combine with token filters [analysis-keyword-tokenizer-token-filters]
You can combine the keyword tokenizer with token filters to normalise structured data, such as product IDs or email addresses.
For example, the following analyze API request uses the keyword tokenizer and lowercase filter to convert an email address to lowercase.
POST _analyze
{
"tokenizer": "keyword",
"filter": [ "lowercase" ],
"text": "john.SMITH@example.COM"
}
The request produces the following token:
[ john.smith@example.com ]
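To reuse the same combination for a field or for repeated analysis, the tokenizer and filter can be wrapped in a custom analyzer in the index settings. The following is a minimal sketch rather than part of the original reference: the index name my-index-000001 and the analyzer name email_lowercase are hypothetical placeholders.

```console
# Create an index with a custom analyzer (index and analyzer names are illustrative)
PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "email_lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}

# Run the custom analyzer against the same sample text
POST my-index-000001/_analyze
{
  "analyzer": "email_lowercase",
  "text": "john.SMITH@example.COM"
}
```

Under these assumptions, the second request should return the same single lowercased token as the ad hoc request above.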
Configuration [_configuration_11]
The keyword tokenizer accepts the following parameters:
buffer_size
- The number of characters read into the term buffer in a single pass. Defaults to 256. The term buffer will grow by this size until all the text has been consumed. It is advisable not to change this setting.
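For completeness, a configured variant of the tokenizer could be declared in the index settings roughly as follows. This is a sketch under stated assumptions: the index name my-index-000002, the tokenizer name my_keyword_tokenizer, the analyzer name my_keyword_analyzer, and the value 512 are all illustrative, and the default of 256 is normally the better choice.

```console
# Define a keyword tokenizer with a non-default buffer_size (all names and the value are illustrative)
PUT my-index-000002
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_keyword_tokenizer": {
          "type": "keyword",
          "buffer_size": 512
        }
      },
      "analyzer": {
        "my_keyword_analyzer": {
          "type": "custom",
          "tokenizer": "my_keyword_tokenizer"
        }
      }
    }
  }
}
```

Changing buffer_size only affects how the text is read into the buffer. The tokenizer still emits the whole input as a single term.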