mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-28 17:34:17 -04:00
Co-authored-by: Colleen McGinnis <colleen.mcginnis@elastic.co>
nori_part_of_speech token filter [analysis-nori-speech]
The nori_part_of_speech token filter removes tokens that match a set of part-of-speech tags. For the list of supported tags and their meanings, see Part of speech tags.
It accepts the following setting:
stoptags
- An array of part-of-speech tags. Tokens tagged with any of them are removed.
The setting defaults to:
"stoptags": [
"E",
"IC",
"J",
"MAG", "MAJ", "MM",
"SP", "SSC", "SSO", "SC", "SE",
"XPN", "XSA", "XSN", "XSV",
"UNA", "NA", "VSV"
]
For example:
PUT nori_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "nori_tokenizer",
            "filter": [
              "my_posfilter"
            ]
          }
        },
        "filter": {
          "my_posfilter": {
            "type": "nori_part_of_speech",
            "stoptags": [
              "NR" <1>
            ]
          }
        }
      }
    }
  }
}

GET nori_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "여섯 용이" <2>
}
1. Korean numerals (NR) should be removed
2. Six dragons
Which responds with:
{
  "tokens" : [ {
    "token" : "용",
    "start_offset" : 3,
    "end_offset" : 4,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "이",
    "start_offset" : 4,
    "end_offset" : 5,
    "type" : "word",
    "position" : 2
  } ]
}
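The numeral 여섯 ("six") is missing from the response because it is tagged NR. To check which tag each token carries before choosing stoptags, one option is to run the tokenizer on its own through the analyze API with explain enabled. The request below is a minimal sketch, assuming the analysis-nori plugin is installed. The leftPOS attribute name is taken from the nori_tokenizer documentation, not from this page.

```console
GET _analyze
{
  "tokenizer": "nori_tokenizer",
  "text": "여섯 용이",
  "attributes": ["leftPOS"],
  "explain": true
}
```

No filter is applied here, so the response should list every token the tokenizer emits, including 여섯, along with its part-of-speech tag. Any tag shown there can be added to stoptags to drop the matching tokens.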