elasticsearch/docs/reference/elasticsearch-plugins/analysis-icu-transform.md
Liam Thompson b98606e712
2025-03-06 07:53:46 +01:00


mapped_pages
https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu-transform.html

ICU transform token filter [analysis-icu-transform]

Transforms are used to process Unicode text in many different ways, such as case mapping, normalization, transliteration and bidirectional text handling.

Use the id parameter (defaults to Null) to choose the transformation to apply, and the dir parameter to specify text direction: forward (the default) for LTR or reverse for RTL. Custom rulesets are not yet supported.
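
As a minimal sketch of how these two parameters appear in a filter definition, the request below sets both explicitly. The index name icu_dir_sample and the filter name myExplicitTransform are hypothetical, chosen only for illustration. Because dir is set to its default value here, omitting it should behave the same way.

```console
# Hypothetical index and filter names, for illustration only.
PUT icu_dir_sample
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "myExplicitTransform": {
            "type": "icu_transform",
            "id": "Any-Latin",
            "dir": "forward"
          }
        }
      }
    }
  }
}
```

A filter defined this way has no effect until an analyzer lists it in its filter array.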

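For example, the following requests create an analyzer that uses an icu_transform filter and then run it against text in several scripts: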

PUT icu_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "latin": {
            "tokenizer": "keyword",
            "filter": [
              "myLatinTransform"
            ]
          }
        },
        "filter": {
          "myLatinTransform": {
            "type": "icu_transform",
            "id": "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC" <1>
          }
        }
      }
    }
  }
}

GET icu_sample/_analyze
{
  "analyzer": "latin",
  "text": "你好" <2>
}

GET icu_sample/_analyze
{
  "analyzer": "latin",
  "text": "здравствуйте" <3>
}

GET icu_sample/_analyze
{
  "analyzer": "latin",
  "text": "こんにちは" <4>
}
  1. This transform transliterates characters to Latin, separates accents from their base characters (NFD), removes the accents, and then recomposes the remaining text (NFC) into an unaccented form.
  2. Returns ni hao.
  3. Returns zdravstvujte.
  4. Returns kon'nichiha.
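
To see what the later steps of the chain contribute, one option is to run a shorter transform through the _analyze API with an inline filter definition. This is a sketch that assumes the analysis-icu plugin is installed on the node. It applies only Any-Latin, so any accents or tone marks produced by the transliteration should be left in place instead of being stripped.

```console
# Inline filter definition: no index or named analyzer is needed.
GET _analyze
{
  "tokenizer": "keyword",
  "filter": [
    {
      "type": "icu_transform",
      "id": "Any-Latin"
    }
  ],
  "text": "你好"
}
```

Comparing this response with the output of the latin analyzer above shows the effect of the NFD; [:Nonspacing Mark:] Remove; NFC steps.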

For more information, see the ICU Transforms user guide.