---
navigation_title: "Keyword"
---

# Keyword tokenizer [analysis-keyword-tokenizer]
The `keyword` tokenizer is a "noop" tokenizer that accepts whatever text it is given and outputs the exact same text as a single term. It can be combined with token filters to normalise output, e.g. lower-casing email addresses.
## Example output [_example_output_10]
```console
POST _analyze
{
  "tokenizer": "keyword",
  "text": "New York"
}
```
The above request would produce the following term:

```text
[ New York ]
```
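For reference, the analyze API returns each term together with offset and position metadata. An abbreviated response for the request above would look something like this (a sketch, not captured output):

```console-result
{
  "tokens": [
    {
      "token": "New York",
      "start_offset": 0,
      "end_offset": 8,
      "type": "word",
      "position": 0
    }
  ]
}
```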
## Combine with token filters [analysis-keyword-tokenizer-token-filters]
You can combine the `keyword` tokenizer with token filters to normalise structured data, such as product IDs or email addresses.

For example, the following analyze API request uses the `keyword` tokenizer and the `lowercase` filter to convert an email address to lowercase.
```console
POST _analyze
{
  "tokenizer": "keyword",
  "filter": [ "lowercase" ],
  "text": "john.SMITH@example.COM"
}
```
The request produces the following token:

```text
[ john.smith@example.com ]
```
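The same combination can be registered as a custom analyzer in index settings so it is applied at index and search time, not just in ad hoc analyze requests. The following is a minimal sketch; the names `my-index`, `lowercase_keyword`, and `email` are illustrative, not part of the original example:

```console
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "email": {
        "type": "text",
        "analyzer": "lowercase_keyword"
      }
    }
  }
}
```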
## Configuration [_configuration_11]
The `keyword` tokenizer accepts the following parameters:

`buffer_size`
:   The number of characters read into the term buffer in a single pass. Defaults to `256`. The term buffer will grow by this size until all the text has been consumed. It is advisable not to change this setting.
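If you do need to set `buffer_size`, it goes on a custom tokenizer definition rather than on the built-in `keyword` tokenizer directly. A hypothetical sketch; the names `my-index`, `my_analyzer`, and `my_keyword_tokenizer` are placeholders, and `256` simply restates the default:

```console
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_keyword_tokenizer"
        }
      },
      "tokenizer": {
        "my_keyword_tokenizer": {
          "type": "keyword",
          "buffer_size": 256
        }
      }
    }
  }
}
```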