[docs] Prepare for docs-assembler (#125118)

* reorg files for docs-assembler and create toc.yml files

* fix build error, add redirects

* only toc

* move images
This commit is contained in:
Colleen McGinnis 2025-03-20 12:09:12 -05:00 committed by GitHub
parent 52bc96240c
commit 9bcd59596d
396 changed files with 1905 additions and 2214 deletions

@@ -5,7 +5,7 @@ mapped_pages:
 # ICU folding token filter [analysis-icu-folding]
-Case folding of Unicode characters based on `UTR#30`, like the [ASCII-folding token filter](/reference/data-analysis/text-analysis/analysis-asciifolding-tokenfilter.md) on steroids. It registers itself as the `icu_folding` token filter and is available to all indices:
+Case folding of Unicode characters based on `UTR#30`, like the [ASCII-folding token filter](/reference/text-analysis/analysis-asciifolding-tokenfilter.md) on steroids. It registers itself as the `icu_folding` token filter and is available to all indices:
 ```console
 PUT icu_sample

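The `PUT icu_sample` request above is cut off by the diff context. As a rough sketch of how such an index definition typically continues (the analyzer name `folded` is illustrative, not taken from the source file):

```console
PUT icu_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "folded": {
            "tokenizer": "icu_tokenizer",
            "filter": ["icu_folding"]
          }
        }
      }
    }
  }
}
```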
@@ -5,7 +5,7 @@ mapped_pages:
 # ICU tokenizer [analysis-icu-tokenizer]
-Tokenizes text into words on word boundaries, as defined in [UAX #29: Unicode Text Segmentation](https://www.unicode.org/reports/tr29/). It behaves much like the [`standard` tokenizer](/reference/data-analysis/text-analysis/analysis-standard-tokenizer.md), but adds better support for some Asian languages by using a dictionary-based approach to identify words in Thai, Lao, Chinese, Japanese, and Korean, and using custom rules to break Myanmar and Khmer text into syllables.
+Tokenizes text into words on word boundaries, as defined in [UAX #29: Unicode Text Segmentation](https://www.unicode.org/reports/tr29/). It behaves much like the [`standard` tokenizer](/reference/text-analysis/analysis-standard-tokenizer.md), but adds better support for some Asian languages by using a dictionary-based approach to identify words in Thai, Lao, Chinese, Japanese, and Korean, and using custom rules to break Myanmar and Khmer text into syllables.
 ```console
 PUT icu_sample

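The truncated snippet above would typically continue by wiring the `icu_tokenizer` into a custom analyzer; a minimal sketch (analyzer name is illustrative, not the file's actual contents):

```console
PUT icu_sample
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_icu_analyzer": {
          "tokenizer": "icu_tokenizer"
        }
      }
    }
  }
}
```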
@@ -13,7 +13,7 @@ The `kuromoji` analyzer uses the following analysis chain:
 * [`kuromoji_part_of_speech`](/reference/elasticsearch-plugins/analysis-kuromoji-speech.md) token filter
 * [`ja_stop`](/reference/elasticsearch-plugins/analysis-kuromoji-stop.md) token filter
 * [`kuromoji_stemmer`](/reference/elasticsearch-plugins/analysis-kuromoji-stemmer.md) token filter
-* [`lowercase`](/reference/data-analysis/text-analysis/analysis-lowercase-tokenfilter.md) token filter
+* [`lowercase`](/reference/text-analysis/analysis-lowercase-tokenfilter.md) token filter
 It supports the `mode` and `user_dictionary` settings from [`kuromoji_tokenizer`](/reference/elasticsearch-plugins/analysis-kuromoji-tokenizer.md).

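The analysis chain above can be tried directly through the built-in `kuromoji` analyzer; a hedged example (the sample text is arbitrary, not from the source file):

```console
GET _analyze
{
  "analyzer": "kuromoji",
  "text": "東京スカイツリー"
}
```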
@@ -5,7 +5,7 @@ mapped_pages:
 # ja_stop token filter [analysis-kuromoji-stop]
-The `ja_stop` token filter filters out Japanese stopwords (`_japanese_`), and any other custom stopwords specified by the user. This filter only supports the predefined `_japanese_` stopwords list. If you want to use a different predefined list, then use the [`stop` token filter](/reference/data-analysis/text-analysis/analysis-stop-tokenfilter.md) instead.
+The `ja_stop` token filter filters out Japanese stopwords (`_japanese_`), and any other custom stopwords specified by the user. This filter only supports the predefined `_japanese_` stopwords list. If you want to use a different predefined list, then use the [`stop` token filter](/reference/text-analysis/analysis-stop-tokenfilter.md) instead.
 ```console
 PUT kuromoji_sample

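The `PUT kuromoji_sample` body is truncated by the diff context. A sketch of a `ja_stop` filter combining the predefined list with custom stopwords (filter and analyzer names are illustrative):

```console
PUT kuromoji_sample
{
  "settings": {
    "analysis": {
      "filter": {
        "my_ja_stop": {
          "type": "ja_stop",
          "stopwords": ["_japanese_", "ストップ"]
        }
      },
      "analyzer": {
        "analyzer_with_ja_stop": {
          "tokenizer": "kuromoji_tokenizer",
          "filter": ["my_ja_stop"]
        }
      }
    }
  }
}
```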
@@ -10,7 +10,7 @@ The `nori` analyzer consists of the following tokenizer and token filters:
 * [`nori_tokenizer`](/reference/elasticsearch-plugins/analysis-nori-tokenizer.md)
 * [`nori_part_of_speech`](/reference/elasticsearch-plugins/analysis-nori-speech.md) token filter
 * [`nori_readingform`](/reference/elasticsearch-plugins/analysis-nori-readingform.md) token filter
-* [`lowercase`](/reference/data-analysis/text-analysis/analysis-lowercase-tokenfilter.md) token filter
+* [`lowercase`](/reference/text-analysis/analysis-lowercase-tokenfilter.md) token filter
 It supports the `decompound_mode` and `user_dictionary` settings from [`nori_tokenizer`](/reference/elasticsearch-plugins/analysis-nori-tokenizer.md) and the `stoptags` setting from [`nori_part_of_speech`](/reference/elasticsearch-plugins/analysis-nori-speech.md).

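As with `kuromoji`, the built-in `nori` analyzer exercises this whole chain; a hedged example (sample text is arbitrary):

```console
GET _analyze
{
  "analyzer": "nori",
  "text": "안녕하세요"
}
```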
@@ -5,7 +5,7 @@ mapped_pages:
 # polish_stop token filter [analysis-polish-stop]
-The `polish_stop` token filter filters out Polish stopwords (`_polish_`), and any other custom stopwords specified by the user. This filter only supports the predefined `_polish_` stopwords list. If you want to use a different predefined list, then use the [`stop` token filter](/reference/data-analysis/text-analysis/analysis-stop-tokenfilter.md) instead.
+The `polish_stop` token filter filters out Polish stopwords (`_polish_`), and any other custom stopwords specified by the user. This filter only supports the predefined `_polish_` stopwords list. If you want to use a different predefined list, then use the [`stop` token filter](/reference/text-analysis/analysis-stop-tokenfilter.md) instead.
 ```console
 PUT /polish_stop_example

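The `PUT /polish_stop_example` request is cut off; a sketch of how a `polish_stop` filter is typically configured (names and the extra stopword are illustrative, not from the source file):

```console
PUT /polish_stop_example
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "polish_stop_filter": {
            "type": "polish_stop",
            "stopwords": ["_polish_", "jeść"]
          }
        },
        "analyzer": {
          "analyzer_with_stop": {
            "tokenizer": "standard",
            "filter": ["polish_stop_filter"]
          }
        }
      }
    }
  }
}
```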
@@ -5,7 +5,7 @@ mapped_pages:
 # smartcn_stop token filter [analysis-smartcn_stop]
-The `smartcn_stop` token filter filters out stopwords defined by `smartcn` analyzer (`_smartcn_`), and any other custom stopwords specified by the user. This filter only supports the predefined `_smartcn_` stopwords list. If you want to use a different predefined list, then use the [`stop` token filter](/reference/data-analysis/text-analysis/analysis-stop-tokenfilter.md) instead.
+The `smartcn_stop` token filter filters out stopwords defined by `smartcn` analyzer (`_smartcn_`), and any other custom stopwords specified by the user. This filter only supports the predefined `_smartcn_` stopwords list. If you want to use a different predefined list, then use the [`stop` token filter](/reference/text-analysis/analysis-stop-tokenfilter.md) instead.
 ```console
 PUT smartcn_example

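The `PUT smartcn_example` body is likewise truncated; a sketch following the same pattern as the other stop filters (filter, analyzer, and tokenizer wiring are assumptions, not the file's actual contents):

```console
PUT smartcn_example
{
  "settings": {
    "analysis": {
      "filter": {
        "my_smartcn_stop": {
          "type": "smartcn_stop",
          "stopwords": ["_smartcn_"]
        }
      },
      "analyzer": {
        "smartcn_with_stop": {
          "tokenizer": "smartcn_tokenizer",
          "filter": ["my_smartcn_stop"]
        }
      }
    }
  }
}
```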
@@ -84,7 +84,7 @@ Bundles
 The dictionary `synonyms.txt` can be used as `synonyms.txt` or using the full path `/app/config/synonyms.txt` in the `synonyms_path` of the `synonym-filter`.
-To learn more about analyzing with synonyms, check [Synonym token filter](/reference/data-analysis/text-analysis/analysis-synonym-tokenfilter.md) and [Formatting Synonyms](https://www.elastic.co/guide/en/elasticsearch/guide/2.x/synonym-formats.html).
+To learn more about analyzing with synonyms, check [Synonym token filter](/reference/text-analysis/analysis-synonym-tokenfilter.md) and [Formatting Synonyms](https://www.elastic.co/guide/en/elasticsearch/guide/2.x/synonym-formats.html).
 **GeoIP database bundle**
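The `synonyms_path` usage described in the bundle hunk above can be sketched as an index setting (index, filter, and analyzer names are illustrative, not from the source file):

```console
PUT /synonyms_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms_path": "synonyms.txt"
        }
      },
      "analyzer": {
        "synonym_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonyms"]
        }
      }
    }
  }
}
```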