[[analysis-stempel]]
=== Stempel Polish analysis plugin

The Stempel analysis plugin integrates Lucene's Stempel analysis
module for Polish into Elasticsearch.

:plugin_name: analysis-stempel
include::install_remove.asciidoc[]

[[analysis-stempel-tokenizer]]
[discrete]
==== `polish` analyzer and token filters

The plugin provides the `polish` analyzer and the `polish_stem` and
`polish_stop` token filters, which are not configurable.
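
For example, you can inspect the tokens the `polish` analyzer produces with
the {ref}/indices-analyze.html[analyze API]. The sample text here is
illustrative, and the exact tokens returned depend on the Stempel stemming
tables, so the response is omitted:

[source,console]
----------------------------------------------------
GET /_analyze
{
  "analyzer": "polish",
  "text": "studenci polskich uczelni"
}
----------------------------------------------------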
[discrete]
==== Reimplementing and extending the analyzers

The `polish` analyzer can be reimplemented as a `custom` analyzer, which can
then be extended and configured as follows:

[source,console]
----------------------------------------------------
PUT /stempel_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_stempel": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "polish_stop",
            "polish_stem"
          ]
        }
      }
    }
  }
}
----------------------------------------------------
// TEST[s/\n$/\nstartyaml\n - compare_analyzers: {index: stempel_example, first: polish, second: rebuilt_stempel}\nendyaml\n/]
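
With the `stempel_example` index in place, a quick way to check that the
rebuilt analyzer behaves like the built-in `polish` analyzer is to run the
same text through both and compare the tokens. The sample text is
illustrative:

[source,console]
----------------------------------------------------
GET /stempel_example/_analyze
{
  "analyzer": "rebuilt_stempel",
  "text": "studenci polskich uczelni"
}
----------------------------------------------------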

[[analysis-polish-stop]]
[discrete]
==== `polish_stop` token filter

The `polish_stop` token filter filters out Polish stopwords (`_polish_`) and
any other custom stopwords specified by the user. This filter only supports
the predefined `_polish_` stopwords list. If you want to use a different
predefined list, use the
{ref}/analysis-stop-tokenfilter.html[`stop` token filter] instead.

[source,console]
--------------------------------------------------
PUT /polish_stop_example
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_with_stop": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "polish_stop"
            ]
          }
        },
        "filter": {
          "polish_stop": {
            "type": "polish_stop",
            "stopwords": [
              "_polish_",
              "jeść"
            ]
          }
        }
      }
    }
  }
}

GET polish_stop_example/_analyze
{
  "analyzer": "analyzer_with_stop",
  "text": "Gdzie kucharek sześć, tam nie ma co jeść."
}
--------------------------------------------------

The above request returns:

[source,console-result]
--------------------------------------------------
{
  "tokens" : [
    {
      "token" : "kucharek",
      "start_offset" : 6,
      "end_offset" : 14,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "sześć",
      "start_offset" : 15,
      "end_offset" : 20,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
--------------------------------------------------