[[analysis-phonetic]]
=== Phonetic analysis plugin

The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.

:plugin_name: analysis-phonetic
include::install_remove.asciidoc[]

[[analysis-phonetic-token-filter]]
==== `phonetic` token filter

The `phonetic` token filter takes the following settings:

`encoder`::

Which phonetic encoder to use. Accepts `metaphone` (default),
`double_metaphone`, `soundex`, `refined_soundex`, `caverphone1`,
`caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
`beider_morse`, `daitch_mokotoff`.

`replace`::

Whether or not the original token should be replaced by the phonetic
token. Accepts `true` (default) and `false`. Not supported by
`beider_morse` encoding.

[source,console]
--------------------------------------------------
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

GET phonetic_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Joe Bloggs" <1>
}
--------------------------------------------------

<1> Returns: `J`, `joe`, `BLKS`, `bloggs`

Setting `"replace": false` can lead to unexpected behavior because the original
token and its phonetic version are both kept at the same token position.
Some queries handle these stacked tokens in special ways. For example, the fuzzy `match`
query does not apply {ref}/common-options.html#fuzziness[fuzziness] to stacked synonym tokens.
This can lead to issues that are difficult to diagnose and reason about. For this reason, it
is often beneficial to use separate fields for analysis with and without phonetic filtering.
Searches can then be run against both fields with differing boosts and trade-offs, for
example by running a fuzzy `match` query only on the original text field and not on the
phonetic version.

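As a rough sketch of the separate-fields approach, the following request maps a
text field with a phonetic sub-field and then searches both. The index name
`phonetic_multifield_sample`, the field `name`, the sub-field `name.phonetic`,
the analyzer name `my_phonetic_analyzer`, and the boost value are all
hypothetical and chosen only for illustration. The filter uses
`"replace": true`, so the sub-field holds only phonetic tokens.

[source,console]
--------------------------------------------------
PUT phonetic_multifield_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_phonetic_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": true
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "phonetic": {
            "type": "text",
            "analyzer": "my_phonetic_analyzer"
          }
        }
      }
    }
  }
}

GET phonetic_multifield_sample/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "Joe Bloggs",
              "fuzziness": "AUTO",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "name.phonetic": {
              "query": "Joe Bloggs"
            }
          }
        }
      ]
    }
  }
}
--------------------------------------------------

In this sketch, fuzziness is applied only to the original `name` field, while
`name.phonetic` is matched without it. The relative boost is an assumption and
would need tuning for a real data set.
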
[discrete]
===== Double metaphone settings

If the `double_metaphone` encoder is used, then this additional setting is
supported:

`max_code_len`::

The maximum length of the emitted metaphone token. Defaults to `4`.

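A minimal sketch of a filter definition that uses this setting might look like
the following. The index name `double_metaphone_sample`, the filter name
`my_double_metaphone`, and the value `6` are illustrative assumptions, not
recommended values.

[source,console]
--------------------------------------------------
PUT double_metaphone_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_double_metaphone"
            ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6
          }
        }
      }
    }
  }
}
--------------------------------------------------
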
[discrete]
===== Beider Morse settings

If the `beider_morse` encoder is used, then these additional settings are
supported:

`rule_type`::

Whether matching should be `exact` or `approx` (default).

`name_type`::

Whether names are `ashkenazi`, `sephardic`, or `generic` (default).

`languageset`::

An array of languages to check. If not specified, then the language will
be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
`german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
`spanish`.
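
The settings above could be combined as in the following sketch. The index
name `beider_morse_sample`, the filter name `my_beider_morse`, and the chosen
values are hypothetical. The `replace` setting is left out because it is not
supported by the `beider_morse` encoding.

[source,console]
--------------------------------------------------
PUT beider_morse_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_beider_morse"
            ]
          }
        },
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "exact",
            "name_type": "generic",
            "languageset": [
              "english",
              "german"
            ]
          }
        }
      }
    }
  }
}
--------------------------------------------------

Restricting `languageset` to the languages expected in the data, as sketched
here, avoids relying on language guessing.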