elasticsearch/docs/plugins/analysis-phonetic.asciidoc
Abdon Pijpelink f93a94009f
[DOCS] Documentation update for creating plugins (#93413)
* [DOCS] Documentation for the stable plugin API

* Removed references to rivers

* Add link to Cloud docs for managing plugins

* Add caveat about needing to update plugins

* Remove reference to site plugins

* Wording and clarifications

* Fix test

* Add link to text analysis docs

* Text analysis API dependencies

* Remove reference to REST endpoints and fix list

* Move plugin descriptor file to its own page

* Typos

* Review feedback

* Delete unused properties file

* Changed  into

* Changed 'elasticsearchVersion' into 'pluginApiVersion'

* Swap 'The analysis plugin API' and 'Plugin file structure' sections

* Update docs/plugins/authors.asciidoc

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>

* Update docs/plugins/development/creating-non-text-analysis-plugins.asciidoc

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>

* Update docs/plugins/development/creating-non-text-analysis-plugins.asciidoc

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>

* Update docs/plugins/development/creating-text-analysis-plugins.asciidoc

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>

* Update docs/plugins/development/creating-text-analysis-plugins.asciidoc

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>

* Update docs/plugins/development/creating-non-text-analysis-plugins.asciidoc

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>

* Update docs/plugins/development/creating-text-analysis-plugins.asciidoc

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>

* Update docs/plugins/development/creating-text-analysis-plugins.asciidoc

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>

* Update docs/plugins/development/example-text-analysis-plugin.asciidoc

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>

* Update docs/plugins/development/plugin-descriptor-file.asciidoc

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>

* Update docs/plugins/plugin-script.asciidoc

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>

* Update docs/plugins/development/creating-non-text-analysis-plugins.asciidoc

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>

* Update docs/plugins/development/creating-non-text-analysis-plugins.asciidoc

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>

* Rewording

* Add modulename and extended.plugins descriptions for descriptor file

* Add link to existing plugins in Github

* Review feedback

* Use 'stable' and 'classic' plugin naming

* Fix capitalization

* Review feedback

---------

Co-authored-by: Arianna Laudazzi <46651782+alaudazzi@users.noreply.github.com>
Co-authored-by: William Brafford <william.brafford@elastic.co>
2023-02-13 14:15:12 +01:00

105 lines
3 KiB
Text

[[analysis-phonetic]]
=== Phonetic analysis plugin
The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.
:plugin_name: analysis-phonetic
include::install_remove.asciidoc[]
[[analysis-phonetic-token-filter]]
==== `phonetic` token filter
The `phonetic` token filter takes the following settings:
`encoder`::
Which phonetic encoder to use. Accepts `metaphone` (default),
`double_metaphone`, `soundex`, `refined_soundex`, `caverphone1`,
`caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
`beider_morse`, `daitch_mokotoff`.
`replace`::
Whether or not the original token should be replaced by the phonetic
token. Accepts `true` (default) and `false`. Not supported by
`beider_morse` encoding.
[source,console]
--------------------------------------------------
PUT phonetic_sample
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_metaphone"
]
}
},
"filter": {
"my_metaphone": {
"type": "phonetic",
"encoder": "metaphone",
"replace": false
}
}
}
}
}
}
GET phonetic_sample/_analyze
{
"analyzer": "my_analyzer",
"text": "Joe Bloggs" <1>
}
--------------------------------------------------
<1> Returns: `J`, `joe`, `BLKS`, `bloggs`
It is important to note that `"replace": false` can lead to unexpected behavior since
the original and the phonetically analyzed version are both kept at the same token position.
Some queries handle these stacked tokens in special ways. For example, the fuzzy `match`
query does not apply {ref}/common-options.html#fuzziness[fuzziness] to stacked synonym tokens.
This can lead to issues that are difficult to diagnose and reason about. For this reason, it
is often beneficial to use separate fields for analysis with and without phonetic filtering.
That way searches can be run against both fields with differing boosts and trade-offs (e.g.
only run a fuzzy `match` query on the original text field, but not on the phonetic version).
[discrete]
===== Double metaphone settings
If the `double_metaphone` encoder is used, then this additional setting is
supported:
`max_code_len`::
The maximum length of the emitted metaphone token. Defaults to `4`.
[discrete]
===== Beider Morse settings
If the `beider_morse` encoder is used, then these additional settings are
supported:
`rule_type`::
Whether matching should be `exact` or `approx` (default).
`name_type`::
Whether names are `ashkenazi`, `sephardic`, or `generic` (default).
`languageset`::
An array of languages to check. If not specified, then the language will
be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
`german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
`spanish`.