Mirror of https://github.com/elastic/elasticsearch.git (synced 2025-06-28 09:28:55 -04:00)
[docs] Prepare for docs-assembler (#125118)
* reorg files for docs-assembler and create toc.yml files
* fix build error, add redirects
* only toc
* move images
This commit is contained in:
parent
52bc96240c
commit
9bcd59596d
396 changed files with 1905 additions and 2214 deletions
````diff
@@ -5,7 +5,7 @@ mapped_pages:
 
 # ICU folding token filter [analysis-icu-folding]
 
-Case folding of Unicode characters based on `UTR#30`, like the [ASCII-folding token filter](/reference/data-analysis/text-analysis/analysis-asciifolding-tokenfilter.md) on steroids. It registers itself as the `icu_folding` token filter and is available to all indices:
+Case folding of Unicode characters based on `UTR#30`, like the [ASCII-folding token filter](/reference/text-analysis/analysis-asciifolding-tokenfilter.md) on steroids. It registers itself as the `icu_folding` token filter and is available to all indices:
 
 ```console
 PUT icu_sample
````
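The hunk above truncates the page's example at `PUT icu_sample`. For context, a minimal sketch of an `icu_folding` setup, assuming the `analysis-icu` plugin is installed (the analyzer name `folded` is illustrative, not from the diff):

```console
PUT icu_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "folded": {
            "tokenizer": "icu_tokenizer",
            "filter": [ "icu_folding" ]
          }
        }
      }
    }
  }
}
```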
````diff
@@ -5,7 +5,7 @@ mapped_pages:
 
 # ICU tokenizer [analysis-icu-tokenizer]
 
-Tokenizes text into words on word boundaries, as defined in [UAX #29: Unicode Text Segmentation](https://www.unicode.org/reports/tr29/). It behaves much like the [`standard` tokenizer](/reference/data-analysis/text-analysis/analysis-standard-tokenizer.md), but adds better support for some Asian languages by using a dictionary-based approach to identify words in Thai, Lao, Chinese, Japanese, and Korean, and using custom rules to break Myanmar and Khmer text into syllables.
+Tokenizes text into words on word boundaries, as defined in [UAX #29: Unicode Text Segmentation](https://www.unicode.org/reports/tr29/). It behaves much like the [`standard` tokenizer](/reference/text-analysis/analysis-standard-tokenizer.md), but adds better support for some Asian languages by using a dictionary-based approach to identify words in Thai, Lao, Chinese, Japanese, and Korean, and using custom rules to break Myanmar and Khmer text into syllables.
 
 ```console
 PUT icu_sample
````
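This hunk is likewise cut off at `PUT icu_sample`. A minimal sketch of wiring up the `icu_tokenizer` in a custom analyzer, again assuming the `analysis-icu` plugin (the analyzer name `my_icu_analyzer` is illustrative):

```console
PUT icu_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_icu_analyzer": {
            "tokenizer": "icu_tokenizer"
          }
        }
      }
    }
  }
}
```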
````diff
@@ -13,7 +13,7 @@ The `kuromoji` analyzer uses the following analysis chain:
 * [`kuromoji_part_of_speech`](/reference/elasticsearch-plugins/analysis-kuromoji-speech.md) token filter
 * [`ja_stop`](/reference/elasticsearch-plugins/analysis-kuromoji-stop.md) token filter
 * [`kuromoji_stemmer`](/reference/elasticsearch-plugins/analysis-kuromoji-stemmer.md) token filter
-* [`lowercase`](/reference/data-analysis/text-analysis/analysis-lowercase-tokenfilter.md) token filter
+* [`lowercase`](/reference/text-analysis/analysis-lowercase-tokenfilter.md) token filter
 
 It supports the `mode` and `user_dictionary` settings from [`kuromoji_tokenizer`](/reference/elasticsearch-plugins/analysis-kuromoji-tokenizer.md).
 
````
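As context for the analysis chain listed above: with the `analysis-kuromoji` plugin installed, the `kuromoji` analyzer can be exercised directly through the `_analyze` API (the sample text here is illustrative):

```console
GET _analyze
{
  "analyzer": "kuromoji",
  "text": "東京スカイツリー"
}
```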
````diff
@@ -5,7 +5,7 @@ mapped_pages:
 
 # ja_stop token filter [analysis-kuromoji-stop]
 
-The `ja_stop` token filter filters out Japanese stopwords (`_japanese_`), and any other custom stopwords specified by the user. This filter only supports the predefined `_japanese_` stopwords list. If you want to use a different predefined list, then use the [`stop` token filter](/reference/data-analysis/text-analysis/analysis-stop-tokenfilter.md) instead.
+The `ja_stop` token filter filters out Japanese stopwords (`_japanese_`), and any other custom stopwords specified by the user. This filter only supports the predefined `_japanese_` stopwords list. If you want to use a different predefined list, then use the [`stop` token filter](/reference/text-analysis/analysis-stop-tokenfilter.md) instead.
 
 ```console
 PUT kuromoji_sample
````
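The example is again truncated at `PUT kuromoji_sample`. A hedged sketch of a `ja_stop` filter combining the predefined `_japanese_` list with a custom stopword, assuming the `analysis-kuromoji` plugin (filter and analyzer names are illustrative):

```console
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_with_ja_stop": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [ "my_ja_stop" ]
          }
        },
        "filter": {
          "my_ja_stop": {
            "type": "ja_stop",
            "stopwords": [ "_japanese_", "ストップ" ]
          }
        }
      }
    }
  }
}
```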
````diff
@@ -10,7 +10,7 @@ The `nori` analyzer consists of the following tokenizer and token filters:
 * [`nori_tokenizer`](/reference/elasticsearch-plugins/analysis-nori-tokenizer.md)
 * [`nori_part_of_speech`](/reference/elasticsearch-plugins/analysis-nori-speech.md) token filter
 * [`nori_readingform`](/reference/elasticsearch-plugins/analysis-nori-readingform.md) token filter
-* [`lowercase`](/reference/data-analysis/text-analysis/analysis-lowercase-tokenfilter.md) token filter
+* [`lowercase`](/reference/text-analysis/analysis-lowercase-tokenfilter.md) token filter
 
 It supports the `decompound_mode` and `user_dictionary` settings from [`nori_tokenizer`](/reference/elasticsearch-plugins/analysis-nori-tokenizer.md) and the `stoptags` setting from [`nori_part_of_speech`](/reference/elasticsearch-plugins/analysis-nori-speech.md).
 
````
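Since the hunk states that the `nori` analyzer accepts the `decompound_mode` setting from `nori_tokenizer`, a minimal sketch of passing it through, assuming the `analysis-nori` plugin (index and analyzer names are illustrative):

```console
PUT nori_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_nori": {
            "type": "nori",
            "decompound_mode": "mixed"
          }
        }
      }
    }
  }
}
```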
````diff
@@ -5,7 +5,7 @@ mapped_pages:
 
 # polish_stop token filter [analysis-polish-stop]
 
-The `polish_stop` token filter filters out Polish stopwords (`_polish_`), and any other custom stopwords specified by the user. This filter only supports the predefined `_polish_` stopwords list. If you want to use a different predefined list, then use the [`stop` token filter](/reference/data-analysis/text-analysis/analysis-stop-tokenfilter.md) instead.
+The `polish_stop` token filter filters out Polish stopwords (`_polish_`), and any other custom stopwords specified by the user. This filter only supports the predefined `_polish_` stopwords list. If you want to use a different predefined list, then use the [`stop` token filter](/reference/text-analysis/analysis-stop-tokenfilter.md) instead.
 
 ```console
 PUT /polish_stop_example
````
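The request body after `PUT /polish_stop_example` is cut off in the diff. A hedged sketch of a `polish_stop` filter mixing the predefined `_polish_` list with a custom stopword, assuming the `analysis-stempel` plugin (filter and analyzer names are illustrative):

```console
PUT /polish_stop_example
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_with_stop": {
            "tokenizer": "standard",
            "filter": [ "my_polish_stop" ]
          }
        },
        "filter": {
          "my_polish_stop": {
            "type": "polish_stop",
            "stopwords": [ "_polish_", "jeść" ]
          }
        }
      }
    }
  }
}
```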
````diff
@@ -5,7 +5,7 @@ mapped_pages:
 
 # smartcn_stop token filter [analysis-smartcn_stop]
 
-The `smartcn_stop` token filter filters out stopwords defined by `smartcn` analyzer (`_smartcn_`), and any other custom stopwords specified by the user. This filter only supports the predefined `_smartcn_` stopwords list. If you want to use a different predefined list, then use the [`stop` token filter](/reference/data-analysis/text-analysis/analysis-stop-tokenfilter.md) instead.
+The `smartcn_stop` token filter filters out stopwords defined by `smartcn` analyzer (`_smartcn_`), and any other custom stopwords specified by the user. This filter only supports the predefined `_smartcn_` stopwords list. If you want to use a different predefined list, then use the [`stop` token filter](/reference/text-analysis/analysis-stop-tokenfilter.md) instead.
 
 ```console
 PUT smartcn_example
````
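As with the other hunks, the example stops at `PUT smartcn_example`. A hedged sketch of a `smartcn_stop` filter using the predefined `_smartcn_` list, assuming the `analysis-smartcn` plugin (filter and analyzer names are illustrative):

```console
PUT smartcn_example
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "smartcn_with_stop": {
            "tokenizer": "smartcn_tokenizer",
            "filter": [ "my_smartcn_stop" ]
          }
        },
        "filter": {
          "my_smartcn_stop": {
            "type": "smartcn_stop",
            "stopwords": [ "_smartcn_" ]
          }
        }
      }
    }
  }
}
```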
````diff
@@ -84,7 +84,7 @@ Bundles
 
 The dictionary `synonyms.txt` can be used as `synonyms.txt` or using the full path `/app/config/synonyms.txt` in the `synonyms_path` of the `synonym-filter`.
 
-To learn more about analyzing with synonyms, check [Synonym token filter](/reference/data-analysis/text-analysis/analysis-synonym-tokenfilter.md) and [Formatting Synonyms](https://www.elastic.co/guide/en/elasticsearch/guide/2.x/synonym-formats.html).
+To learn more about analyzing with synonyms, check [Synonym token filter](/reference/text-analysis/analysis-synonym-tokenfilter.md) and [Formatting Synonyms](https://www.elastic.co/guide/en/elasticsearch/guide/2.x/synonym-formats.html).
 
 **GeoIP database bundle**
 
````
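To illustrate the `synonyms_path` usage the bundle hunk describes, a minimal sketch of a `synonym` token filter referencing the bundled `synonyms.txt` (the index, analyzer, and filter names are illustrative):

```console
PUT /synonyms_test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "synonym_analyzer": {
          "tokenizer": "standard",
          "filter": [ "my_synonyms" ]
        }
      },
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms_path": "synonyms.txt"
        }
      }
    }
  }
}
```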