elasticsearch/docs/reference/analysis/tokenizers
Christoph Büscher ed86750fa4
Allow custom characters in token_chars of ngram tokenizers (#49250)
Currently the `token_chars` setting in both the `edgeNGram` and `ngram` tokenizers
accepts only a list of predefined character classes, which might not fit
every use case. For example, including the underscore "_" in a token currently
requires the `punctuation` class, which also brings in many other characters.
This change adds a "custom" option to the `token_chars` setting. It
requires an additional `custom_token_chars` setting to be present, whose value
is interpreted as the set of characters to include in a token.
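
The effect of the new option can be sketched in Python. This is a rough illustration, not the Lucene implementation: the names `my_tokenizer`, `is_token_char` and `ngrams` are hypothetical, the gram sizes are arbitrary, and the character-class checks only approximate the real ones. The settings body shows how `token_chars` and `custom_token_chars` might be combined; the helper functions mimic how text is split on characters outside the allowed classes before n-grams are emitted.

```python
import json

# Illustrative index settings; "my_tokenizer" is a made-up name.
index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "my_tokenizer": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 3,
                    "token_chars": ["letter", "digit", "custom"],
                    "custom_token_chars": "_",
                }
            }
        }
    }
}


def is_token_char(ch, classes, custom_chars=""):
    """Approximate check of whether ch belongs to one of the allowed classes."""
    if "letter" in classes and ch.isalpha():
        return True
    if "digit" in classes and ch.isdigit():
        return True
    if "whitespace" in classes and ch.isspace():
        return True
    if "custom" in classes and ch in custom_chars:
        return True
    return False


def ngrams(text, min_gram, max_gram, classes, custom_chars=""):
    """Split text on non-token characters, then emit n-grams for each run."""
    # Mirror the requirement from the commit message: "custom" needs
    # a non-empty set of custom characters.
    if "custom" in classes and not custom_chars:
        raise ValueError("'custom' requires custom_token_chars to be set")

    # Collect maximal runs of allowed characters.
    runs, current = [], []
    for ch in text:
        if is_token_char(ch, classes, custom_chars):
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))

    # Emit every n-gram of length min_gram..max_gram from each run.
    grams = []
    for run in runs:
        for start in range(len(run)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(run):
                    grams.append(run[start:start + size])
    return grams


if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
    print(ngrams("a_b", 2, 3, ["letter", "digit", "custom"], "_"))
    print(ngrams("a_b", 2, 3, ["letter", "digit"]))
```

With "_" listed as a custom character, the sketch keeps `a_b` as one run and returns `['a_', 'a_b', '_b']`. Without the custom class, the underscore splits the text into `a` and `b`, which are too short for a 2-gram, so the result is empty.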

Closes #25894
2019-11-20 10:36:39 +01:00
chargroup-tokenizer.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
classic-tokenizer.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
edgengram-tokenizer.asciidoc Allow custom characters in token_chars of ngram tokenizers (#49250) 2019-11-20 10:36:39 +01:00
keyword-tokenizer.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
letter-tokenizer.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
lowercase-tokenizer.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
ngram-tokenizer.asciidoc Allow custom characters in token_chars of ngram tokenizers (#49250) 2019-11-20 10:36:39 +01:00
pathhierarchy-tokenizer-examples.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
pathhierarchy-tokenizer.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
pattern-tokenizer.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
simplepattern-tokenizer.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
simplepatternsplit-tokenizer.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
standard-tokenizer.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
thai-tokenizer.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
uaxurlemail-tokenizer.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
whitespace-tokenizer.asciidoc [DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00