mirror of https://github.com/elastic/elasticsearch.git
synced 2025-06-28 09:28:55 -04:00

[docs] Fix various syntax and rendering errors (#127062)

* fix syntax and rendering errors
* clean up
* fix versions
* more clean up
* more fixes
* more fixes
* more fixes

parent 41eefb62d3
commit 08552f1c2e

95 changed files with 1439 additions and 1195 deletions

@@ -82,9 +82,11 @@ See [Limitations of the `max_gram` parameter](#analysis-edgengram-tokenfilter-ma
: (Optional, Boolean) Emits original token when set to `true`. Defaults to `false`.

`side`
: (Optional, string) [8.16.0]. Indicates whether to truncate tokens from the `front` or `back`. Defaults to `front`.

: :::{admonition} Deprecated in 8.16.0
This setting was deprecated in 8.16.0.
:::

(Optional, string) Indicates whether to truncate tokens from the `front` or `back`. Defaults to `front`.

## Customize [analysis-edgengram-tokenfilter-customize]

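The edge n-gram hunk above describes `preserve_original` and the now-deprecated `side` setting. As a rough, non-authoritative illustration, this Python sketch produces edge n-grams from the front or back of a single token. The helper name `edge_ngrams` is hypothetical, the defaults `min_gram=1` and `max_gram=2` are assumed, and the rule that the original token is appended only when it is shorter than `min_gram` or longer than `max_gram` is an assumption modeled on Lucene's `EdgeNGramTokenFilter`, not a statement about Elasticsearch internals.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 2,
                side: str = "front", preserve_original: bool = False) -> list:
    """Hypothetical helper: edge n-grams of one token (not an Elasticsearch API)."""
    if side not in ("front", "back"):
        raise ValueError("side must be 'front' or 'back'")
    if min_gram < 1 or max_gram < min_gram:
        raise ValueError("require 1 <= min_gram <= max_gram")
    grams = []
    # Grams longer than the token itself cannot be produced.
    for n in range(min_gram, min(max_gram, len(token)) + 1):
        # "front" keeps the first n characters; "back" keeps the last n.
        grams.append(token[:n] if side == "front" else token[-n:])
    # Assumption modeled on Lucene: keep the original only when it is shorter than
    # min_gram or longer than max_gram (otherwise one gram already equals it).
    if preserve_original and not (min_gram <= len(token) <= max_gram):
        grams.append(token)
    return grams

print(edge_ngrams("quick"))                          # ['q', 'qu']
print(edge_ngrams("quick", side="back"))             # ['k', 'ck']
print(edge_ngrams("quick", preserve_original=True))  # ['q', 'qu', 'quick']
```

Taking suffixes for `side="back"` is equivalent to reversing the token, taking prefixes, and reversing each gram again, which is why a `reverse` filter placed before and after `edge_ngram` can stand in for `side: back`.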
@@ -49,7 +49,7 @@ GET /_analyze

The API returns the following response. Note that one version of each token has a `keyword` attribute of `true`.

::::{dropdown} **Response**
::::{dropdown} Response
```console-result
{
"detail": {

@@ -155,7 +155,7 @@ The API returns the following response. Note the following changes:
* The non-keyword version of `running` was stemmed to `run`.
* The non-keyword version of `jumping` was stemmed to `jump`.

::::{dropdown} **Response**
::::{dropdown} Response
```console-result
{
"detail": {

@@ -265,7 +265,7 @@ GET /_analyze

The API returns the following response. Note that the duplicate tokens for `fox` and `and` have been removed.

::::{dropdown} **Response**
::::{dropdown} Response
```console-result
{
"detail": {

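The three `_analyze` responses summarized above follow tokens through `keyword_repeat`, a stemmer, and `remove_duplicates`. This toy Python model mirrors that description under stated assumptions: `TOY_STEMS` is a hypothetical two-entry lookup standing in for a real stemming algorithm, the input sentence is illustrative, the keyword-marked copy is assumed to be emitted first, and a token counts as a duplicate only when the same term already appeared at the same position.

```python
# Hypothetical stand-in for a real stemmer: maps a few inflected forms to stems.
TOY_STEMS = {"running": "run", "jumping": "jump"}

def toy_stem(term):
    return TOY_STEMS.get(term, term)

def keyword_repeat(terms):
    """Emit each term twice at the same position: keyword=True first, then keyword=False."""
    return [(pos, term, kw) for pos, term in enumerate(terms) for kw in (True, False)]

def stem(tokens):
    """Stem only non-keyword tokens; keyword-marked copies pass through unchanged."""
    return [(pos, term if kw else toy_stem(term), kw) for pos, term, kw in tokens]

def remove_duplicates(tokens):
    """Drop a token when the same term was already emitted at the same position."""
    seen, out = set(), []
    for pos, term, kw in tokens:
        if (pos, term) not in seen:
            seen.add((pos, term))
            out.append((pos, term, kw))
    return out

tokens = remove_duplicates(stem(keyword_repeat("fox running and jumping".split())))
print(tokens)
# [(0, 'fox', True), (1, 'running', True), (1, 'run', False),
#  (2, 'and', True), (3, 'jumping', True), (3, 'jump', False)]
```

Unchanged terms such as `fox` and `and` collapse to a single token, while stemmed terms keep both the original and the stemmed form at the same position.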
@@ -70,7 +70,7 @@ PUT length_example
: (Optional, integer) Minimum character length of a token. Shorter tokens are excluded from the output. Defaults to `0`.

`max`
: (Optional, integer) Maximum character length of a token. Longer tokens are excluded from the output. Defaults to `Integer.MAX_VALUE`, which is `2`^`31`^`-1` or `2147483647`.
: (Optional, integer) Maximum character length of a token. Longer tokens are excluded from the output. Defaults to `Integer.MAX_VALUE`, which is `2^31 - 1` or `2147483647`.

## Customize [analysis-length-tokenfilter-customize]

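To make the `min`/`max` semantics in the length hunk concrete, here is a minimal Python sketch that treats both bounds as inclusive, which is how "shorter tokens are excluded" and "longer tokens are excluded" read. The function `length_filter` is hypothetical. One caveat: Python's `len` counts Unicode code points, whereas a Java token length counts UTF-16 code units, so the two can disagree for characters outside the Basic Multilingual Plane.

```python
# Java's Integer.MAX_VALUE, the documented default for `max`.
INTEGER_MAX_VALUE = 2**31 - 1

def length_filter(tokens, min_len=0, max_len=INTEGER_MAX_VALUE):
    """Hypothetical helper: keep tokens whose length lies in [min_len, max_len]."""
    return [t for t in tokens if min_len <= len(t) <= max_len]

print(INTEGER_MAX_VALUE)                                                  # 2147483647
print(length_filter(["the", "quick", "brown", "fox"], min_len=0, max_len=4))  # ['the', 'fox']
```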
@@ -39,7 +39,7 @@ The filter produces the following tokens.

The API response contains the position and offsets of each output token. Note the `predicate_token_filter` filter does not change the tokens' original positions or offsets.

::::{dropdown} **Response**
::::{dropdown} Response
```console-result
{
"tokens" : [

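The predicate hunk notes that `predicate_token_filter` keeps the original positions and offsets of surviving tokens. This Python sketch shows what that means: a whitespace split records each token's position and character offsets, and filtering merely drops entries without renumbering. The real filter evaluates a Painless script; the Python lambda, the sample sentence, and the helper names here are illustrative assumptions.

```python
import re

def whitespace_tokens(text):
    """Hypothetical helper: split on whitespace, recording position and offsets."""
    return [
        {"token": m.group(), "position": pos, "start_offset": m.start(), "end_offset": m.end()}
        for pos, m in enumerate(re.finditer(r"\S+", text))
    ]

def predicate_filter(tokens, predicate):
    """Keep tokens whose term satisfies predicate; positions and offsets are not renumbered."""
    return [t for t in tokens if predicate(t["token"])]

kept = predicate_filter(whitespace_tokens("the fox jumps the lazy dog"), lambda term: len(term) > 3)
print([(t["token"], t["position"], t["start_offset"], t["end_offset"]) for t in kept])
# [('jumps', 2, 8, 13), ('lazy', 4, 18, 22)]
```

The gaps at positions 0, 1, 3, and 5 remain visible in the output, which is what lets later position-aware filters or phrase queries see where tokens were removed.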
@@ -146,7 +146,7 @@ PUT /my-index-000001

Some token filters, such as the `stop` filter, create empty positions when removing stop words with a position increment greater than one.

::::{dropdown} **Example**
::::{dropdown} Example
In the following [analyze API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-analyze) request, the `stop` filter removes the stop word `a` from `fox jumps a lazy dog`, creating an empty position. The subsequent `shingle` filter replaces this empty position with a plus sign (`+`) in shingles.

```console

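The shingle hunk describes a `stop` filter leaving an empty position that the `shingle` filter later fills with `+`. This toy Python model assumes two-word shingles with unigrams also emitted and `+` as the filler; the `shingle` filter's default filler token is `_`, so the example presumably configures `+` explicitly. The function name is hypothetical, and the sketch does not try to match the filter's handling of gaps at the end of the input or larger shingle sizes.

```python
def stop_then_shingle(text, stopwords, filler="+"):
    """Hypothetical toy model: stop-word gaps are filled with `filler` inside bigram shingles."""
    # Stop filter: a removed word leaves an empty slot (None) instead of closing the gap.
    slots = [None if word in stopwords else word for word in text.split()]
    out = []
    for i, word in enumerate(slots):
        if word is not None:
            out.append(word)  # unigram; filler slots never produce a unigram
        if i + 1 < len(slots):
            nxt = slots[i + 1]
            if word is None and nxt is None:
                continue  # skip shingles made only of filler
            left = word if word is not None else filler
            right = nxt if nxt is not None else filler
            out.append(f"{left} {right}")
    return out

print(stop_then_shingle("fox jumps a lazy dog", {"a"}))
# ['fox', 'fox jumps', 'jumps', 'jumps +', '+ lazy', 'lazy', 'lazy dog', 'dog']
```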
@@ -56,49 +56,96 @@ PUT /my-index-000001

$$$analysis-stemmer-tokenfilter-language-parm$$$

`language`: (Optional, string) Language-dependent stemming algorithm used to stem tokens. If both this and the `name` parameter are specified, the `language` parameter argument is used.
`language`
: (Optional, string) Language-dependent stemming algorithm used to stem tokens. If both this and the `name` parameter are specified, the `language` parameter argument is used.

:::{dropdown} Valid values for language

:::{dropdown} Valid values for `language`

Valid values are sorted by language. Defaults to [**`english`**](https://snowballstem.org/algorithms/porter/stemmer.html). Recommended algorithms are **bolded**.
Arabic: [**`arabic`**](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/ar/ArabicStemmer.md)
Armenian: [**`armenian`**](https://snowballstem.org/algorithms/armenian/stemmer.md)
Basque: [**`basque`**](https://snowballstem.org/algorithms/basque/stemmer.md)
Bengali:[**`bengali`**](https://www.tandfonline.com/doi/abs/10.1080/02564602.1993.11437284)
Brazilian Portuguese:[**`brazilian`**](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/br/BrazilianStemmer.md)
Bulgarian:[**`bulgarian`**](http://members.unine.ch/jacques.savoy/Papers/BUIR.pdf)
Catalan:[**`catalan`**](https://snowballstem.org/algorithms/catalan/stemmer.md)
Czech:[**`czech`**](https://dl.acm.org/doi/10.1016/j.ipm.2009.06.001)
Danish:[**`danish`**](https://snowballstem.org/algorithms/danish/stemmer.md)
Dutch:[**`dutch`**](https://snowballstem.org/algorithms/dutch/stemmer.md), [`dutch_kp`](https://snowballstem.org/algorithms/kraaij_pohlmann/stemmer.md) [8.16.0]
English:[**`english`**](https://snowballstem.org/algorithms/porter/stemmer.html), [`light_english`](https://ciir.cs.umass.edu/pubfiles/ir-35.pdf), [`lovins`](https://snowballstem.org/algorithms/lovins/stemmer.md) [8.16.0], [`minimal_english`](https://www.researchgate.net/publication/220433848_How_effective_is_suffixing), [`porter2`](https://snowballstem.org/algorithms/english/stemmer.html), [`possessive_english`](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/en/EnglishPossessiveFilter.html)
Estonian:[**`estonian`**](https://lucene.apache.org/core/10_0_0/analyzers-common/org/tartarus/snowball/ext/EstonianStemmer.md)
Finnish:[**`finnish`**](https://snowballstem.org/algorithms/finnish/stemmer.html), [`light_finnish`](http://clef.isti.cnr.it/2003/WN_web/22.pdf)
French:[**`light_french`**](https://dl.acm.org/citation.cfm?id=1141523), [`french`](https://snowballstem.org/algorithms/french/stemmer.html), [`minimal_french`](https://dl.acm.org/citation.cfm?id=318984)
Galician:[**`galician`**](http://bvg.udc.es/recursos_lingua/stemming.jsp), [`minimal_galician`](http://bvg.udc.es/recursos_lingua/stemming.jsp) (Plural step only)
German:[**`light_german`**](https://dl.acm.org/citation.cfm?id=1141523), [`german`](https://snowballstem.org/algorithms/german/stemmer.html), [`minimal_german`](http://members.unine.ch/jacques.savoy/clef/morpho.pdf)
Greek:[**`greek`**](https://sais.se/mthprize/2007/ntais2007.pdf)
Hindi:[**`hindi`**](http://computing.open.ac.uk/Sites/EACLSouthAsia/Papers/p6-Ramanathan.pdf)
Hungarian:[**`hungarian`**](https://snowballstem.org/algorithms/hungarian/stemmer.html), [`light_hungarian`](https://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181)
Indonesian:[**`indonesian`**](http://www.illc.uva.nl/Publications/ResearchReports/MoL-2003-02.text.pdf)
Irish:[**`irish`**](https://snowballstem.org/otherapps/oregan/)
Italian:[**`light_italian`**](https://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf), [`italian`](https://snowballstem.org/algorithms/italian/stemmer.html)
Kurdish (Sorani):[**`sorani`**](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/ckb/SoraniStemmer.md)
Latvian:[**`latvian`**](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/lv/LatvianStemmer.md)
Lithuanian:[**`lithuanian`**](https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_5_3/lucene/analysis/common/src/java/org/apache/lucene/analysis/lt/stem_ISO_8859_1.sbl?view=markup)
Norwegian (Bokmål):[**`norwegian`**](https://snowballstem.org/algorithms/norwegian/stemmer.html), [**`light_norwegian`**](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/no/NorwegianLightStemmer.md), [`minimal_norwegian`](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/no/NorwegianMinimalStemmer.md)
Norwegian:(Nynorsk)[**`light_nynorsk`**](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/no/NorwegianLightStemmer.md), [`minimal_nynorsk`](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/no/NorwegianMinimalStemmer.md)
Persian:[**`persian`**](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/fa/PersianStemmer.md)
Portuguese:[**`light_portuguese`**](https://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181), [`minimal_portuguese`](http://www.inf.ufrgs.br/~buriol/papers/Orengo_CLEF07.pdf), [`portuguese`](https://snowballstem.org/algorithms/portuguese/stemmer.html), [`portuguese_rslp`](https://www.inf.ufrgs.br/\~viviane/rslp/index.htm)
Romanian:[**`romanian`**](https://snowballstem.org/algorithms/romanian/stemmer.html)
Russian:[**`russian`**](https://snowballstem.org/algorithms/russian/stemmer.html), [`light_russian`](https://doc.rero.ch/lm.php?url=1000%2C43%2C4%2C20091209094227-CA%2FDolamic_Ljiljana_-_Indexing_and_Searching_Strategies_for_the_Russian_20091209.pdf)
Serbian:[**`serbian`**](https://snowballstem.org/algorithms/serbian/stemmer.html)
Spanish:[**`light_spanish`**](https://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf), [`spanish`](https://snowballstem.org/algorithms/spanish/stemmer.html) [`spanish_plural`](https://www.wikilengua.org/index.php/Plural_(formaci%C3%B3n))
Swedish:[**`swedish`**](https://snowballstem.org/algorithms/swedish/stemmer.html), [`light_swedish`](http://clef.isti.cnr.it/2003/WN_web/22.pdf)
Turkish:[**`turkish`**](https://snowballstem.org/algorithms/turkish/stemmer.html)

* Arabic: [**`arabic`**](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/ar/ArabicStemmer.md)
* Armenian: [**`armenian`**](https://snowballstem.org/algorithms/armenian/stemmer.md)
* Basque: [**`basque`**](https://snowballstem.org/algorithms/basque/stemmer.md)
* Bengali:[**`bengali`**](https://www.tandfonline.com/doi/abs/10.1080/02564602.1993.11437284)
* Brazilian Portuguese:[**`brazilian`**](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/br/BrazilianStemmer.md)
* Bulgarian:[**`bulgarian`**](http://members.unine.ch/jacques.savoy/Papers/BUIR.pdf)
* Catalan:[**`catalan`**](https://snowballstem.org/algorithms/catalan/stemmer.md)
* Czech:[**`czech`**](https://dl.acm.org/doi/10.1016/j.ipm.2009.06.001)
* Danish:[**`danish`**](https://snowballstem.org/algorithms/danish/stemmer.md)
* Dutch
  * [**`dutch`**](https://snowballstem.org/algorithms/dutch/stemmer.md)
  * [`dutch_kp`](https://snowballstem.org/algorithms/kraaij_pohlmann/stemmer.md)
    :::{admonition} Deprecated in 8.16.0
    This language was deprecated in 8.16.0.
    :::
* English:
  * [**`english`**](https://snowballstem.org/algorithms/porter/stemmer.html)
  * [`light_english`](https://ciir.cs.umass.edu/pubfiles/ir-35.pdf)
  * [`lovins`](https://snowballstem.org/algorithms/lovins/stemmer.md)
    :::{admonition} Deprecated in 8.16.0
    This language was deprecated in 8.16.0.
    :::
  * [`minimal_english`](https://www.researchgate.net/publication/220433848_How_effective_is_suffixing)
  * [`porter2`](https://snowballstem.org/algorithms/english/stemmer.html)
  * [`possessive_english`](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/en/EnglishPossessiveFilter.html)
* Estonian:[**`estonian`**](https://lucene.apache.org/core/10_0_0/analyzers-common/org/tartarus/snowball/ext/EstonianStemmer.md)
* Finnish:
  * [**`finnish`**](https://snowballstem.org/algorithms/finnish/stemmer.html)
  * [`light_finnish`](http://clef.isti.cnr.it/2003/WN_web/22.pdf)
* French:
  * [**`light_french`**](https://dl.acm.org/citation.cfm?id=1141523)
  * [`french`](https://snowballstem.org/algorithms/french/stemmer.html)
  * [`minimal_french`](https://dl.acm.org/citation.cfm?id=318984)
* Galician:
  * [**`galician`**](http://bvg.udc.es/recursos_lingua/stemming.jsp)
  * [`minimal_galician`](http://bvg.udc.es/recursos_lingua/stemming.jsp) (Plural step only)
* German:
  * [**`light_german`**](https://dl.acm.org/citation.cfm?id=1141523),
  * [`german`](https://snowballstem.org/algorithms/german/stemmer.html)
  * [`minimal_german`](http://members.unine.ch/jacques.savoy/clef/morpho.pdf)
* Greek:[**`greek`**](https://sais.se/mthprize/2007/ntais2007.pdf)
* Hindi:[**`hindi`**](http://computing.open.ac.uk/Sites/EACLSouthAsia/Papers/p6-Ramanathan.pdf)
* Hungarian:
  * [**`hungarian`**](https://snowballstem.org/algorithms/hungarian/stemmer.html)
  * [`light_hungarian`](https://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181)
* Indonesian:[**`indonesian`**](http://www.illc.uva.nl/Publications/ResearchReports/MoL-2003-02.text.pdf)
* Irish:[**`irish`**](https://snowballstem.org/otherapps/oregan/)
* Italian:
  * [**`light_italian`**](https://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf)
  * [`italian`](https://snowballstem.org/algorithms/italian/stemmer.html)
* Kurdish (Sorani):[**`sorani`**](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/ckb/SoraniStemmer.md)
* Latvian:[**`latvian`**](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/lv/LatvianStemmer.md)
* Lithuanian:[**`lithuanian`**](https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_5_3/lucene/analysis/common/src/java/org/apache/lucene/analysis/lt/stem_ISO_8859_1.sbl?view=markup)
* Norwegian (Bokmål):
  * [**`norwegian`**](https://snowballstem.org/algorithms/norwegian/stemmer.html)
  * [**`light_norwegian`**](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/no/NorwegianLightStemmer.md)
  * [`minimal_norwegian`](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/no/NorwegianMinimalStemmer.md)
* Norwegian (Nynorsk):
  * [**`light_nynorsk`**](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/no/NorwegianLightStemmer.md)
  * [`minimal_nynorsk`](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/no/NorwegianMinimalStemmer.md)
* Persian:[**`persian`**](https://lucene.apache.org/core/10_0_0/analysis/common/org/apache/lucene/analysis/fa/PersianStemmer.md)
* Portuguese:
  * [**`light_portuguese`**](https://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181)
  * [`minimal_portuguese`](http://www.inf.ufrgs.br/~buriol/papers/Orengo_CLEF07.pdf)
  * [`portuguese`](https://snowballstem.org/algorithms/portuguese/stemmer.html)
  * [`portuguese_rslp`](https://www.inf.ufrgs.br/\~viviane/rslp/index.htm)
* Romanian:[**`romanian`**](https://snowballstem.org/algorithms/romanian/stemmer.html)
* Russian:
  * [**`russian`**](https://snowballstem.org/algorithms/russian/stemmer.html)
  * [`light_russian`](https://doc.rero.ch/lm.php?url=1000%2C43%2C4%2C20091209094227-CA%2FDolamic_Ljiljana_-_Indexing_and_Searching_Strategies_for_the_Russian_20091209.pdf)
* Serbian:[**`serbian`**](https://snowballstem.org/algorithms/serbian/stemmer.html)
* Spanish:
  * [**`light_spanish`**](https://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf)
  * [`spanish`](https://snowballstem.org/algorithms/spanish/stemmer.html)
  * [`spanish_plural`](https://www.wikilengua.org/index.php/Plural_(formaci%C3%B3n))
* Swedish:
  * [**`swedish`**](https://snowballstem.org/algorithms/swedish/stemmer.html)
  * [`light_swedish`](http://clef.isti.cnr.it/2003/WN_web/22.pdf)
* Turkish:[**`turkish`**](https://snowballstem.org/algorithms/turkish/stemmer.html)
:::

`name`: An alias for the [`language`](#analysis-stemmer-tokenfilter-language-parm) parameter. If both this and the `language` parameter are specified, the `language` parameter argument is used.
`name`
: An alias for the [`language`](#analysis-stemmer-tokenfilter-language-parm) parameter. If both this and the `language` parameter are specified, the `language` parameter argument is used.

## Customize [analysis-stemmer-tokenfilter-customize]

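The `language` and `name` definitions above state a precedence rule, and the list marks `dutch_kp` and `lovins` as deprecated in 8.16.0. Here is a minimal Python sketch of that resolution logic, assuming a filter definition shaped like its JSON form. `resolve_stemmer` and `DEPRECATED_ALGORITHMS` are hypothetical names, the `english` fallback follows the documented default, and the sketch does not validate values against the full list.

```python
# Algorithms marked "Deprecated in 8.16.0" in the hunk above.
DEPRECATED_ALGORITHMS = {"dutch_kp", "lovins"}

def resolve_stemmer(settings):
    """Hypothetical helper: return (algorithm, is_deprecated) for a stemmer filter definition.

    `language` takes precedence over its alias `name`; with neither set, `english` is used.
    """
    algorithm = settings.get("language", settings.get("name", "english"))
    return algorithm, algorithm in DEPRECATED_ALGORITHMS

print(resolve_stemmer({"type": "stemmer"}))                          # ('english', False)
print(resolve_stemmer({"type": "stemmer", "name": "light_german"}))  # ('light_german', False)
print(resolve_stemmer({"type": "stemmer", "name": "lovins", "language": "dutch_kp"}))  # ('dutch_kp', True)
```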
@@ -279,7 +279,7 @@ While indexing does not support token graphs containing multi-position tokens, q

To see how token graphs produced by the `word_delimiter` and `word_delimiter_graph` filters differ, check out the following example.

:::::{dropdown} **Example**
:::::{dropdown} Example
$$$analysis-word-delimiter-graph-basic-token-graph$$$
**Basic token graph**