mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-28 17:34:17 -04:00
* delete asciidoc files
* add migrated files
* fix errors
* Disable docs tests
* Clarify release notes page titles
* Revert "Clarify release notes page titles"
This reverts commit 8be688648d.
* Comment out external URI images
* Clean up query languages landing pages, link to conceptual docs
* Add .md to url
* Fixes inference processor nesting.
---------
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
hiragana_uppercase token filter [analysis-kuromoji-hiragana-uppercase]
The hiragana_uppercase token filter normalizes small hiragana letters (捨て仮名) into their standard, full-size forms. This filter is useful if you want to search old-style Japanese text, such as patents, legal documents, and contract policies, which often write these letters at full size.
For example:
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "hiragana_uppercase"
            ]
          }
        }
      }
    }
  }
}
GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "ちょっとまって"
}
Which results in:
{
  "tokens": [
    {
      "token": "ちよつと",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "まつ",
      "start_offset": 4,
      "end_offset": 6,
      "type": "word",
      "position": 1
    },
    {
      "token": "て",
      "start_offset": 6,
      "end_offset": 7,
      "type": "word",
      "position": 2
    }
  ]
}
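The character-level effect seen in the response above can be approximated outside Elasticsearch. The following Python sketch is illustrative only: the mapping table and the function name hiragana_uppercase are assumptions made for this example, not code taken from the plugin, and the input tokens are hard-coded to mimic what the tokenizer would emit before the filter runs.

```python
# Illustrative sketch only: this mapping is an assumption about which small
# hiragana letters are normalized; it is not copied from the plugin source.
SMALL_TO_STANDARD = str.maketrans({
    "ぁ": "あ", "ぃ": "い", "ぅ": "う", "ぇ": "え", "ぉ": "お",
    "っ": "つ",
    "ゃ": "や", "ゅ": "ゆ", "ょ": "よ",
    "ゎ": "わ",
    "ゕ": "か", "ゖ": "け",
})


def hiragana_uppercase(token: str) -> str:
    """Replace each small hiragana letter with its full-size counterpart."""
    return token.translate(SMALL_TO_STANDARD)


# Hypothetical pre-filter tokens, chosen to match the example response.
tokens = ["ちょっと", "まっ", "て"]
normalized = [hiragana_uppercase(t) for t in tokens]
print(normalized)  # ['ちよつと', 'まつ', 'て']
```

Because the replacement is one character for one character, token lengths and offsets stay the same, which is consistent with the start_offset and end_offset values in the response.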