
---
navigation_title: "Language"
mapped_pages:
  - https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html
---

Language analyzers [analysis-lang-analyzer]

A set of analyzers for analyzing text in a specific language. The following types are supported: arabic, armenian, basque, bengali, brazilian, bulgarian, catalan, cjk, czech, danish, dutch, english, estonian, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, persian, portuguese, romanian, russian, serbian, sorani, spanish, swedish, turkish, thai.

Configuring language analyzers [_configuring_language_analyzers]

Stopwords [_stopwords]

All analyzers support custom stopwords. You can set them inline in the analyzer configuration with the `stopwords` parameter, or load them from an external file with the `stopwords_path` parameter. See the stop analyzer for more details.
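The Python below sketches both options. It builds the index-settings body you might send with `PUT /<index>` to configure the built-in `english` analyzer: once with an inline `stopwords` list, and once with `stopwords_path`. The helper `language_analyzer_settings`, the analyzer name `my_english`, and the file path are hypothetical and exist only for illustration.

```python
import json


def language_analyzer_settings(name, language, stopwords=None, stopwords_path=None):
    """Hypothetical helper: index settings for a built-in language analyzer.

    `stopwords` may be a predefined set such as "_english_" or an explicit list;
    `stopwords_path` names a stopwords file instead.
    """
    analyzer = {"type": language}
    if stopwords is not None:
        analyzer["stopwords"] = stopwords
    if stopwords_path is not None:
        analyzer["stopwords_path"] = stopwords_path
    return {"settings": {"analysis": {"analyzer": {name: analyzer}}}}


# Option 1: stopwords listed inline in the config.
inline = language_analyzer_settings("my_english", "english", stopwords=["a", "an", "the"])

# Option 2: stopwords read from a file (hypothetical path).
from_file = language_analyzer_settings(
    "my_english", "english", stopwords_path="analysis/my_stopwords.txt"
)

print(json.dumps(inline, indent=2))
print(json.dumps(from_file, indent=2))
```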

Excluding words from stemming [_excluding_words_from_stemming]

The `stem_exclusion` parameter allows you to specify an array of lowercase words that should not be stemmed. Internally, this functionality is implemented by adding the `keyword_marker` token filter with the `keywords` set to the value of the `stem_exclusion` parameter.
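The toy model below is written in Python and is not taken from Elasticsearch or Lucene. It shows why the excluded words must be lowercase. In the chains on this page, `keyword_marker` runs after `lowercase`, so it compares already-lowercased tokens against its keyword list. It does not change a matching token; it only flags it, and the stemmer then leaves flagged tokens untouched. The `toy_stem` function is a deliberately crude stand-in for a real stemmer.

```python
def toy_stem(token):
    """Crude stand-in for a real stemmer: strips a trailing 's'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def analyze(tokens, stem_exclusion):
    """Toy model of the lowercase -> keyword_marker -> stemmer chain."""
    keywords = set(stem_exclusion)
    out = []
    for token in tokens:
        token = token.lower()            # lowercase filter
        is_keyword = token in keywords   # keyword_marker: flags the token, text unchanged
        # The stemmer skips tokens flagged as keywords.
        out.append(token if is_keyword else toy_stem(token))
    return out


print(analyze(["Organizations", "Cats"], ["organizations"]))  # ['organizations', 'cat']
print(analyze(["Organizations"], ["Organizations"]))          # ['organization']
```

The second call shows the failure mode. A capitalized exclusion never matches the lowercased token, so the word is stemmed anyway.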

The following analyzers support setting a custom `stem_exclusion` list: arabic, armenian, basque, bengali, bulgarian, catalan, czech, dutch, english, finnish, french, galician, german, hindi, hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, portuguese, romanian, russian, serbian, sorani, spanish, swedish, turkish.

Reimplementing language analyzers [_reimplementing_language_analyzers]

The built-in language analyzers can be reimplemented as custom analyzers, as shown in the sections below, to customize their behaviour.

::::{note}
If you do not intend to exclude words from being stemmed (the equivalent of the `stem_exclusion` parameter above), remove the `keyword_marker` token filter from the custom analyzer configuration.
::::
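Here is a minimal sketch of this advice. It assumes the common chain of `lowercase`, a stop filter, `keyword_marker`, and a stemmer. The hypothetical Python helper below builds a rebuilt-analyzer settings body and adds the `keyword_marker` filter only when there is something to exclude. The helper does not add the extra filters some languages need, such as elision, normalization, or `decimal_digit`. It also assumes the predefined stopword set and the stemmer share the language name, which is not true for every language (for example, french uses `light_french`).

```python
def rebuilt_analyzer(language, stem_exclusion=()):
    """Hypothetical helper: settings for a rebuilt language analyzer.

    The keyword_marker filter is only included when stem_exclusion is non-empty.
    """
    filters = {
        f"{language}_stop": {"type": "stop", "stopwords": f"_{language}_"},
        f"{language}_stemmer": {"type": "stemmer", "language": language},
    }
    chain = ["lowercase", f"{language}_stop"]
    if stem_exclusion:
        filters[f"{language}_keywords"] = {
            "type": "keyword_marker",
            # Lowercased because keyword_marker runs after the lowercase filter.
            "keywords": [word.lower() for word in stem_exclusion],
        }
        chain.append(f"{language}_keywords")
    chain.append(f"{language}_stemmer")
    return {
        "settings": {
            "analysis": {
                "filter": filters,
                "analyzer": {
                    f"rebuilt_{language}": {"tokenizer": "standard", "filter": chain}
                },
            }
        }
    }


without_exclusions = rebuilt_analyzer("danish")
with_exclusions = rebuilt_analyzer("danish", ["Eksempel"])
```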

`arabic` analyzer [arabic-analyzer]

The arabic analyzer could be reimplemented as a custom analyzer as follows:

PUT /arabic_example
{
  "settings": {
    "analysis": {
      "filter": {
        "arabic_stop": {
          "type":       "stop",
          "stopwords":  "_arabic_" <1>
        },
        "arabic_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["مثال"] <2>
        },
        "arabic_stemmer": {
          "type":       "stemmer",
          "language":   "arabic"
        }
      },
      "analyzer": {
        "rebuilt_arabic": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "decimal_digit",
            "arabic_stop",
            "arabic_normalization",
            "arabic_keywords",
            "arabic_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.
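The arabic chain above runs `decimal_digit`, as do several later ones (bengali, hindi, persian). This filter converts digits in the Unicode `Decimal_Number` category to ASCII 0-9. Below is a toy Python equivalent built on the standard `unicodedata` module.

```python
import unicodedata


def decimal_digit(token):
    """Toy decimal_digit filter: map Unicode decimal digits (category Nd) to 0-9."""
    out = []
    for ch in token:
        value = unicodedata.decimal(ch, None)  # None when ch is not a decimal digit
        out.append(str(value) if value is not None else ch)
    return "".join(out)


print(decimal_digit("\u0662\u0660\u0662\u0665"))  # Arabic-Indic digits -> 2025
print(decimal_digit("\u0967\u0968\u0969"))        # Devanagari digits -> 123
```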

`armenian` analyzer [armenian-analyzer]

The armenian analyzer could be reimplemented as a custom analyzer as follows:

PUT /armenian_example
{
  "settings": {
    "analysis": {
      "filter": {
        "armenian_stop": {
          "type":       "stop",
          "stopwords":  "_armenian_" <1>
        },
        "armenian_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["օրինակ"] <2>
        },
        "armenian_stemmer": {
          "type":       "stemmer",
          "language":   "armenian"
        }
      },
      "analyzer": {
        "rebuilt_armenian": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "armenian_stop",
            "armenian_keywords",
            "armenian_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`basque` analyzer [basque-analyzer]

The basque analyzer could be reimplemented as a custom analyzer as follows:

PUT /basque_example
{
  "settings": {
    "analysis": {
      "filter": {
        "basque_stop": {
          "type":       "stop",
          "stopwords":  "_basque_" <1>
        },
        "basque_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["adibidez"] <2>
        },
        "basque_stemmer": {
          "type":       "stemmer",
          "language":   "basque"
        }
      },
      "analyzer": {
        "rebuilt_basque": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "basque_stop",
            "basque_keywords",
            "basque_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`bengali` analyzer [bengali-analyzer]

The bengali analyzer could be reimplemented as a custom analyzer as follows:

PUT /bengali_example
{
  "settings": {
    "analysis": {
      "filter": {
        "bengali_stop": {
          "type":       "stop",
          "stopwords":  "_bengali_" <1>
        },
        "bengali_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["উদাহরণ"] <2>
        },
        "bengali_stemmer": {
          "type":       "stemmer",
          "language":   "bengali"
        }
      },
      "analyzer": {
        "rebuilt_bengali": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "decimal_digit",
            "bengali_keywords",
            "indic_normalization",
            "bengali_normalization",
            "bengali_stop",
            "bengali_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`brazilian` analyzer [brazilian-analyzer]

The brazilian analyzer could be reimplemented as a custom analyzer as follows:

PUT /brazilian_example
{
  "settings": {
    "analysis": {
      "filter": {
        "brazilian_stop": {
          "type":       "stop",
          "stopwords":  "_brazilian_" <1>
        },
        "brazilian_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["exemplo"] <2>
        },
        "brazilian_stemmer": {
          "type":       "stemmer",
          "language":   "brazilian"
        }
      },
      "analyzer": {
        "rebuilt_brazilian": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "brazilian_stop",
            "brazilian_keywords",
            "brazilian_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`bulgarian` analyzer [bulgarian-analyzer]

The bulgarian analyzer could be reimplemented as a custom analyzer as follows:

PUT /bulgarian_example
{
  "settings": {
    "analysis": {
      "filter": {
        "bulgarian_stop": {
          "type":       "stop",
          "stopwords":  "_bulgarian_" <1>
        },
        "bulgarian_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["пример"] <2>
        },
        "bulgarian_stemmer": {
          "type":       "stemmer",
          "language":   "bulgarian"
        }
      },
      "analyzer": {
        "rebuilt_bulgarian": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "bulgarian_stop",
            "bulgarian_keywords",
            "bulgarian_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`catalan` analyzer [catalan-analyzer]

The catalan analyzer could be reimplemented as a custom analyzer as follows:

PUT /catalan_example
{
  "settings": {
    "analysis": {
      "filter": {
        "catalan_elision": {
          "type":       "elision",
          "articles":   [ "d", "l", "m", "n", "s", "t"],
          "articles_case": true
        },
        "catalan_stop": {
          "type":       "stop",
          "stopwords":  "_catalan_" <1>
        },
        "catalan_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["example"] <2>
        },
        "catalan_stemmer": {
          "type":       "stemmer",
          "language":   "catalan"
        }
      },
      "analyzer": {
        "rebuilt_catalan": {
          "tokenizer":  "standard",
          "filter": [
            "catalan_elision",
            "lowercase",
            "catalan_stop",
            "catalan_keywords",
            "catalan_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`cjk` analyzer [cjk-analyzer]

::::{note}
You may find that `icu_analyzer` in the ICU analysis plugin works better for CJK text than the `cjk` analyzer. Experiment with your text and queries.
::::

The cjk analyzer could be reimplemented as a custom analyzer as follows:

PUT /cjk_example
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type":       "stop",
          "stopwords":  [ <1>
            "a", "and", "are", "as", "at", "be", "but", "by", "for",
            "if", "in", "into", "is", "it", "no", "not", "of", "on",
            "or", "s", "such", "t", "that", "the", "their", "then",
            "there", "these", "they", "this", "to", "was", "will",
            "with", "www"
          ]
        }
      },
      "analyzer": {
        "rebuilt_cjk": {
          "tokenizer":  "standard",
          "filter": [
            "cjk_width",
            "lowercase",
            "cjk_bigram",
            "english_stop"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters. The default stopwords are almost the same as the `_english_` set, but not identical.
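The `cjk_bigram` filter lets the `cjk` analyzer index Han text without a dictionary. The standard tokenizer emits each Han character as its own token, and `cjk_bigram` then combines adjacent characters into overlapping two-character tokens. The Python below is a simplified model of that step, operating on a single run of characters. The real filter also deals with mixed scripts, and it can optionally output unigrams.

```python
def cjk_bigrams(run):
    """Toy cjk_bigram: overlapping bigrams over one run of CJK characters."""
    if len(run) <= 1:
        return [run] if run else []  # a lone character stays a unigram
    return [run[i:i + 2] for i in range(len(run) - 1)]


print(cjk_bigrams("東京都"))  # ['東京', '京都']
print(cjk_bigrams("東"))      # ['東']
```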

`czech` analyzer [czech-analyzer]

The czech analyzer could be reimplemented as a custom analyzer as follows:

PUT /czech_example
{
  "settings": {
    "analysis": {
      "filter": {
        "czech_stop": {
          "type":       "stop",
          "stopwords":  "_czech_" <1>
        },
        "czech_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["příklad"] <2>
        },
        "czech_stemmer": {
          "type":       "stemmer",
          "language":   "czech"
        }
      },
      "analyzer": {
        "rebuilt_czech": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "czech_stop",
            "czech_keywords",
            "czech_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`danish` analyzer [danish-analyzer]

The danish analyzer could be reimplemented as a custom analyzer as follows:

PUT /danish_example
{
  "settings": {
    "analysis": {
      "filter": {
        "danish_stop": {
          "type":       "stop",
          "stopwords":  "_danish_" <1>
        },
        "danish_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["eksempel"] <2>
        },
        "danish_stemmer": {
          "type":       "stemmer",
          "language":   "danish"
        }
      },
      "analyzer": {
        "rebuilt_danish": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "danish_stop",
            "danish_keywords",
            "danish_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`dutch` analyzer [dutch-analyzer]

The dutch analyzer could be reimplemented as a custom analyzer as follows:

PUT /dutch_example
{
  "settings": {
    "analysis": {
      "filter": {
        "dutch_stop": {
          "type":       "stop",
          "stopwords":  "_dutch_" <1>
        },
        "dutch_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["voorbeeld"] <2>
        },
        "dutch_stemmer": {
          "type":       "stemmer",
          "language":   "dutch"
        },
        "dutch_override": {
          "type":       "stemmer_override",
          "rules": [
            "fiets=>fiets",
            "bromfiets=>bromfiets",
            "ei=>eier",
            "kind=>kinder"
          ]
        }
      },
      "analyzer": {
        "rebuilt_dutch": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "dutch_stop",
            "dutch_keywords",
            "dutch_override",
            "dutch_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.
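The `dutch_override` filter uses `stemmer_override` to map specific words to fixed stems and to protect those words from later stemmers. That is why it sits before `dutch_stemmer`. The toy Python model below parses rules in the `input=>output` form, including comma-separated inputs. It uses a crude suffix-stripping stand-in for the Dutch stemmer, so its outputs illustrate the mechanism rather than real Dutch stemming.

```python
def parse_override_rules(rules):
    """Parse stemmer_override rules like 'a, b => c' into a word -> stem dict."""
    overrides = {}
    for rule in rules:
        left, right = rule.split("=>")
        for word in left.split(","):
            overrides[word.strip()] = right.strip()
    return overrides


def toy_dutch_stem(token):
    """Crude stand-in for the Dutch stemmer: strip a trailing 'en' or 's'."""
    for suffix in ("en", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(token, overrides):
    """Toy dutch_override -> dutch_stemmer: overridden words skip the stemmer."""
    if token in overrides:
        return overrides[token]
    return toy_dutch_stem(token)


OVERRIDES = parse_override_rules(
    ["fiets=>fiets", "bromfiets=>bromfiets", "ei=>eier", "kind=>kinder"]
)

print(analyze("fiets", OVERRIDES))   # fiets (protected by the override)
print(toy_dutch_stem("fiets"))       # fiet (the toy stemmer on its own)
print(analyze("boeken", OVERRIDES))  # boek
```

The `fiets=>fiets` rule looks like a no-op, but its effect is to keep the word intact where the stemmer would otherwise truncate it.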

`english` analyzer [english-analyzer]

The english analyzer could be reimplemented as a custom analyzer as follows:

PUT /english_example
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type":       "stop",
          "stopwords":  "_english_" <1>
        },
        "english_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["example"] <2>
        },
        "english_stemmer": {
          "type":       "stemmer",
          "language":   "english"
        },
        "english_possessive_stemmer": {
          "type":       "stemmer",
          "language":   "possessive_english"
        }
      },
      "analyzer": {
        "rebuilt_english": {
          "tokenizer":  "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.
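The `english_possessive_stemmer` removes a possessive `'s` before the other filters run. Below is a toy Python version. Because it runs before `lowercase` in the chain above, it handles either case of `s`, and it accepts both the straight apostrophe and U+2019.

```python
APOSTROPHES = ("'", "\u2019")


def possessive_english(token):
    """Toy possessive_english stemmer: drop a trailing 's or 'S."""
    for apostrophe in APOSTROPHES:
        for s in ("s", "S"):
            suffix = apostrophe + s
            if token.endswith(suffix):
                return token[: -len(suffix)]
    return token


print(possessive_english("John's"))        # John
print(possessive_english("JOHN\u2019S"))   # JOHN
print(possessive_english("boss"))          # boss
```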

`estonian` analyzer [estonian-analyzer]

The estonian analyzer could be reimplemented as a custom analyzer as follows:

PUT /estonian_example
{
  "settings": {
    "analysis": {
      "filter": {
        "estonian_stop": {
          "type":       "stop",
          "stopwords":  "_estonian_" <1>
        },
        "estonian_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["näide"] <2>
        },
        "estonian_stemmer": {
          "type":       "stemmer",
          "language":   "estonian"
        }
      },
      "analyzer": {
        "rebuilt_estonian": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "estonian_stop",
            "estonian_keywords",
            "estonian_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`finnish` analyzer [finnish-analyzer]

The finnish analyzer could be reimplemented as a custom analyzer as follows:

PUT /finnish_example
{
  "settings": {
    "analysis": {
      "filter": {
        "finnish_stop": {
          "type":       "stop",
          "stopwords":  "_finnish_" <1>
        },
        "finnish_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["esimerkki"] <2>
        },
        "finnish_stemmer": {
          "type":       "stemmer",
          "language":   "finnish"
        }
      },
      "analyzer": {
        "rebuilt_finnish": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "finnish_stop",
            "finnish_keywords",
            "finnish_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`french` analyzer [french-analyzer]

The french analyzer could be reimplemented as a custom analyzer as follows:

PUT /french_example
{
  "settings": {
    "analysis": {
      "filter": {
        "french_elision": {
          "type":         "elision",
          "articles_case": true,
          "articles": [
            "l", "m", "t", "qu", "n", "s",
            "j", "d", "c", "jusqu", "quoiqu",
            "lorsqu", "puisqu"
          ]
        },
        "french_stop": {
          "type":       "stop",
          "stopwords":  "_french_" <1>
        },
        "french_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["example"] <2>
        },
        "french_stemmer": {
          "type":       "stemmer",
          "language":   "light_french"
        }
      },
      "analyzer": {
        "rebuilt_french": {
          "tokenizer":  "standard",
          "filter": [
            "french_elision",
            "lowercase",
            "french_stop",
            "french_keywords",
            "french_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.
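The `elision` filters in this page's catalan, french, irish, and italian chains strip a listed article and its apostrophe from the front of a token; for example, `l'avion` becomes `avion`. Below is a toy Python model. It assumes that `articles_case: true`, as set in these configurations, makes the article match case-insensitive.

```python
FRENCH_ARTICLES = [
    "l", "m", "t", "qu", "n", "s", "j", "d", "c",
    "jusqu", "quoiqu", "lorsqu", "puisqu",
]


def elision(token, articles, articles_case=False):
    """Toy elision filter: drop a listed article before the first apostrophe.

    articles_case=True makes the article match case-insensitive.
    """
    for apostrophe in ("'", "\u2019"):
        prefix, sep, rest = token.partition(apostrophe)
        if not sep:
            continue
        if articles_case:
            matched = prefix.lower() in {a.lower() for a in articles}
        else:
            matched = prefix in articles
        if matched:
            return rest
    return token


print(elision("L'avion", FRENCH_ARTICLES, articles_case=True))      # avion
print(elision("L'avion", FRENCH_ARTICLES, articles_case=False))     # L'avion
print(elision("aujourd'hui", FRENCH_ARTICLES, articles_case=True))  # aujourd'hui
```

Because `french_elision` runs before `lowercase`, case-insensitive matching is what lets it catch a capitalized `L'` at the start of a sentence.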

`galician` analyzer [galician-analyzer]

The galician analyzer could be reimplemented as a custom analyzer as follows:

PUT /galician_example
{
  "settings": {
    "analysis": {
      "filter": {
        "galician_stop": {
          "type":       "stop",
          "stopwords":  "_galician_" <1>
        },
        "galician_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["exemplo"] <2>
        },
        "galician_stemmer": {
          "type":       "stemmer",
          "language":   "galician"
        }
      },
      "analyzer": {
        "rebuilt_galician": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "galician_stop",
            "galician_keywords",
            "galician_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`german` analyzer [german-analyzer]

The german analyzer could be reimplemented as a custom analyzer as follows:

PUT /german_example
{
  "settings": {
    "analysis": {
      "filter": {
        "german_stop": {
          "type":       "stop",
          "stopwords":  "_german_" <1>
        },
        "german_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["beispiel"] <2>
        },
        "german_stemmer": {
          "type":       "stemmer",
          "language":   "light_german"
        }
      },
      "analyzer": {
        "rebuilt_german": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "german_stop",
            "german_keywords",
            "german_normalization",
            "german_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`greek` analyzer [greek-analyzer]

The greek analyzer could be reimplemented as a custom analyzer as follows:

PUT /greek_example
{
  "settings": {
    "analysis": {
      "filter": {
        "greek_stop": {
          "type":       "stop",
          "stopwords":  "_greek_" <1>
        },
        "greek_lowercase": {
          "type":       "lowercase",
          "language":   "greek"
        },
        "greek_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["παραδειγμα"] <2>
        },
        "greek_stemmer": {
          "type":       "stemmer",
          "language":   "greek"
        }
      },
      "analyzer": {
        "rebuilt_greek": {
          "tokenizer":  "standard",
          "filter": [
            "greek_lowercase",
            "greek_stop",
            "greek_keywords",
            "greek_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.
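The greek chain uses `greek_lowercase` instead of plain `lowercase`. Besides lowercasing, this filter removes Greek diacritics such as the tonos and standardizes final sigma (`ς`) to `σ`. As a result, the stop filter, `keyword_marker`, and the stemmer all see the normalized form. The Python below approximates it with Unicode decomposition. The real filter maps a fixed set of Greek code points rather than stripping every combining mark.

```python
import unicodedata


def greek_lowercase(token):
    """Approximate greek_lowercase: lowercase, strip diacritics, unify sigma."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    # Drop combining marks (such as the tonos) left over after decomposition.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Standardize final sigma (U+03C2) to the ordinary small sigma (U+03C3).
    return base.replace("\u03c2", "\u03c3")


print(greek_lowercase("Παράδειγμα"))  # παραδειγμα
print(greek_lowercase("λόγος"))       # λογοσ
```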

`hindi` analyzer [hindi-analyzer]

The hindi analyzer could be reimplemented as a custom analyzer as follows:

PUT /hindi_example
{
  "settings": {
    "analysis": {
      "filter": {
        "hindi_stop": {
          "type":       "stop",
          "stopwords":  "_hindi_" <1>
        },
        "hindi_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["उदाहरण"] <2>
        },
        "hindi_stemmer": {
          "type":       "stemmer",
          "language":   "hindi"
        }
      },
      "analyzer": {
        "rebuilt_hindi": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "decimal_digit",
            "hindi_keywords",
            "indic_normalization",
            "hindi_normalization",
            "hindi_stop",
            "hindi_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`hungarian` analyzer [hungarian-analyzer]

The hungarian analyzer could be reimplemented as a custom analyzer as follows:

PUT /hungarian_example
{
  "settings": {
    "analysis": {
      "filter": {
        "hungarian_stop": {
          "type":       "stop",
          "stopwords":  "_hungarian_" <1>
        },
        "hungarian_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["példa"] <2>
        },
        "hungarian_stemmer": {
          "type":       "stemmer",
          "language":   "hungarian"
        }
      },
      "analyzer": {
        "rebuilt_hungarian": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "hungarian_stop",
            "hungarian_keywords",
            "hungarian_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`indonesian` analyzer [indonesian-analyzer]

The indonesian analyzer could be reimplemented as a custom analyzer as follows:

PUT /indonesian_example
{
  "settings": {
    "analysis": {
      "filter": {
        "indonesian_stop": {
          "type":       "stop",
          "stopwords":  "_indonesian_" <1>
        },
        "indonesian_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["contoh"] <2>
        },
        "indonesian_stemmer": {
          "type":       "stemmer",
          "language":   "indonesian"
        }
      },
      "analyzer": {
        "rebuilt_indonesian": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "indonesian_stop",
            "indonesian_keywords",
            "indonesian_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`irish` analyzer [irish-analyzer]

The irish analyzer could be reimplemented as a custom analyzer as follows:

PUT /irish_example
{
  "settings": {
    "analysis": {
      "filter": {
        "irish_hyphenation": {
          "type":       "stop",
          "stopwords":  [ "h", "n", "t" ],
          "ignore_case": true
        },
        "irish_elision": {
          "type":       "elision",
          "articles":   [ "d", "m", "b" ],
          "articles_case": true
        },
        "irish_stop": {
          "type":       "stop",
          "stopwords":  "_irish_" <1>
        },
        "irish_lowercase": {
          "type":       "lowercase",
          "language":   "irish"
        },
        "irish_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["sampla"] <2>
        },
        "irish_stemmer": {
          "type":       "stemmer",
          "language":   "irish"
        }
      },
      "analyzer": {
        "rebuilt_irish": {
          "tokenizer":  "standard",
          "filter": [
            "irish_hyphenation",
            "irish_elision",
            "irish_lowercase",
            "irish_stop",
            "irish_keywords",
            "irish_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.
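The `irish_hyphenation` filter is an ordinary `stop` filter with `ignore_case: true`. It removes the prefix letters `h`, `n`, and `t` when they end up as standalone tokens. This happens, for example, when the standard tokenizer splits `t-athair` at the hyphen. Below is a toy Python version.

```python
def irish_hyphenation(tokens, stopwords=("h", "n", "t")):
    """Toy irish_hyphenation: a stop filter with ignore_case: true."""
    drop = {word.lower() for word in stopwords}
    return [token for token in tokens if token.lower() not in drop]


# Tokens for "an t-athair" after the tokenizer has split at the hyphen.
print(irish_hyphenation(["an", "t", "athair"]))  # ['an', 'athair']
print(irish_hyphenation(["T", "athair"]))        # ['athair']
```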

`italian` analyzer [italian-analyzer]

The italian analyzer could be reimplemented as a custom analyzer as follows:

PUT /italian_example
{
  "settings": {
    "analysis": {
      "filter": {
        "italian_elision": {
          "type": "elision",
          "articles": [
            "c", "l", "all", "dall", "dell",
            "nell", "sull", "coll", "pell",
            "gl", "agl", "dagl", "degl", "negl",
            "sugl", "un", "m", "t", "s", "v", "d"
          ],
          "articles_case": true
        },
        "italian_stop": {
          "type":       "stop",
          "stopwords":  "_italian_" <1>
        },
        "italian_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["esempio"] <2>
        },
        "italian_stemmer": {
          "type":       "stemmer",
          "language":   "light_italian"
        }
      },
      "analyzer": {
        "rebuilt_italian": {
          "tokenizer":  "standard",
          "filter": [
            "italian_elision",
            "lowercase",
            "italian_stop",
            "italian_keywords",
            "italian_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`latvian` analyzer [latvian-analyzer]

The latvian analyzer could be reimplemented as a custom analyzer as follows:

PUT /latvian_example
{
  "settings": {
    "analysis": {
      "filter": {
        "latvian_stop": {
          "type":       "stop",
          "stopwords":  "_latvian_" <1>
        },
        "latvian_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["piemērs"] <2>
        },
        "latvian_stemmer": {
          "type":       "stemmer",
          "language":   "latvian"
        }
      },
      "analyzer": {
        "rebuilt_latvian": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "latvian_stop",
            "latvian_keywords",
            "latvian_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`lithuanian` analyzer [lithuanian-analyzer]

The lithuanian analyzer could be reimplemented as a custom analyzer as follows:

PUT /lithuanian_example
{
  "settings": {
    "analysis": {
      "filter": {
        "lithuanian_stop": {
          "type":       "stop",
          "stopwords":  "_lithuanian_" <1>
        },
        "lithuanian_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["pavyzdys"] <2>
        },
        "lithuanian_stemmer": {
          "type":       "stemmer",
          "language":   "lithuanian"
        }
      },
      "analyzer": {
        "rebuilt_lithuanian": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "lithuanian_stop",
            "lithuanian_keywords",
            "lithuanian_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`norwegian` analyzer [norwegian-analyzer]

The norwegian analyzer could be reimplemented as a custom analyzer as follows:

PUT /norwegian_example
{
  "settings": {
    "analysis": {
      "filter": {
        "norwegian_stop": {
          "type":       "stop",
          "stopwords":  "_norwegian_" <1>
        },
        "norwegian_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["eksempel"] <2>
        },
        "norwegian_stemmer": {
          "type":       "stemmer",
          "language":   "norwegian"
        }
      },
      "analyzer": {
        "rebuilt_norwegian": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "norwegian_stop",
            "norwegian_keywords",
            "norwegian_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.

`persian` analyzer [persian-analyzer]

The persian analyzer could be reimplemented as a custom analyzer as follows:

PUT /persian_example
{
  "settings": {
    "analysis": {
      "char_filter": {
        "zero_width_spaces": {
            "type":       "mapping",
            "mappings": [ "\\u200C=>\\u0020"] <1>
        }
      },
      "filter": {
        "persian_stop": {
          "type":       "stop",
          "stopwords":  "_persian_" <2>
        }
      },
      "analyzer": {
        "rebuilt_persian": {
          "tokenizer":     "standard",
          "char_filter": [ "zero_width_spaces" ],
          "filter": [
            "lowercase",
            "decimal_digit",
            "arabic_normalization",
            "persian_normalization",
            "persian_stop",
            "persian_stem"
          ]
        }
      }
    }
  }
}

1. Replaces zero-width non-joiners with an ASCII space.
2. The default stopwords can be overridden with the `stopwords` or `stopwords_path` parameters.
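The `zero_width_spaces` char filter rewrites U+200C (zero-width non-joiner) as U+0020 (space) before tokenization. A word written with a non-joiner is therefore split into two tokens. In the JSON request, the backslash is escaped (`\\u200C`), so the mapping rule arrives as the text `\u200C=>\u0020`, which the mapping char filter then decodes. Below is a toy Python model of parsing and applying that rule. The helper names are hypothetical.

```python
def _decode_escapes(text):
    """Turn escape sequences such as backslash-u-200C into the characters they name."""
    return text.strip().encode("ascii").decode("unicode_escape")


def parse_mapping_rule(rule):
    """Split a 'source=>target' mapping rule and decode its escapes."""
    source, target = rule.split("=>")
    return _decode_escapes(source), _decode_escapes(target)


def apply_mapping(text, rule):
    """Apply one mapping rule to the text, as a char filter would before tokenizing."""
    source, target = parse_mapping_rule(rule)
    return text.replace(source, target)


# The rule string as the char filter receives it after JSON decoding.
RULE = r"\u200C=>\u0020"

# Persian "mi" + ZWNJ + "khaham" ("I want"), written with escapes.
word = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
print(len(apply_mapping(word, RULE).split()))  # 2
```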

`portuguese` analyzer [portuguese-analyzer]

The portuguese analyzer could be reimplemented as a custom analyzer as follows:

PUT /portuguese_example
{
  "settings": {
    "analysis": {
      "filter": {
        "portuguese_stop": {
          "type":       "stop",
          "stopwords":  "_portuguese_" <1>
        },
        "portuguese_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["exemplo"] <2>
        },
        "portuguese_stemmer": {
          "type":       "stemmer",
          "language":   "light_portuguese"
        }
      },
      "analyzer": {
        "rebuilt_portuguese": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "portuguese_stop",
            "portuguese_keywords",
            "portuguese_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the stopwords or stopwords_path parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.
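Defining the analyzer in the index settings does not apply it to any field on its own. The following sketch shows a mapping that uses it for a hypothetical title field:

```console
PUT /portuguese_example/_mapping
{
  "properties": {
    "title": {
      "type":     "text",
      "analyzer": "rebuilt_portuguese"
    }
  }
}
```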

`romanian` analyzer [romanian-analyzer]

The romanian analyzer could be reimplemented as a custom analyzer as follows:

PUT /romanian_example
{
  "settings": {
    "analysis": {
      "filter": {
        "romanian_stop": {
          "type":       "stop",
          "stopwords":  "_romanian_" <1>
        },
        "romanian_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["exemplu"] <2>
        },
        "romanian_stemmer": {
          "type":       "stemmer",
          "language":   "romanian"
        }
      },
      "analyzer": {
        "rebuilt_romanian": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "romanian_stop",
            "romanian_keywords",
            "romanian_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the stopwords or stopwords_path parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.
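To load stopwords from a file instead of the predefined _romanian_ list, the romanian_stop filter can use stopwords_path.

In this sketch, the index name and file name are hypothetical, and only the filter definition is shown. The analyzer section would stay as in the example above.

The path must be absolute or relative to the Elasticsearch config directory. The file must be UTF-8 encoded, with one stopword per line, and it has to be present on each node:

```console
PUT /romanian_stopwords_file_example
{
  "settings": {
    "analysis": {
      "filter": {
        "romanian_stop": {
          "type":           "stop",
          "stopwords_path": "analysis/romanian_stopwords.txt"
        }
      }
    }
  }
}
```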

`russian` analyzer [russian-analyzer]

The russian analyzer could be reimplemented as a custom analyzer as follows:

PUT /russian_example
{
  "settings": {
    "analysis": {
      "filter": {
        "russian_stop": {
          "type":       "stop",
          "stopwords":  "_russian_" <1>
        },
        "russian_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["пример"] <2>
        },
        "russian_stemmer": {
          "type":       "stemmer",
          "language":   "russian"
        }
      },
      "analyzer": {
        "rebuilt_russian": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "russian_stop",
            "russian_keywords",
            "russian_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the stopwords or stopwords_path parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.
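You can observe the effect of keyword_marker without creating an index by passing inline filter definitions to the _analyze API. The two words below are only an illustration:

- The first word matches the keywords list, so the stemmer should leave it unchanged.
- The second word does not match, so it is stemmed normally.

Removing the keyword_marker entry from the request is a quick way to compare the two results.

```console
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "keyword_marker", "keywords": [ "пример" ] },
    { "type": "stemmer", "language": "russian" }
  ],
  "text": "пример примеры"
}
```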

`serbian` analyzer [serbian-analyzer]

The serbian analyzer could be reimplemented as a custom analyzer as follows:

PUT /serbian_example
{
  "settings": {
    "analysis": {
      "filter": {
        "serbian_stop": {
          "type":       "stop",
          "stopwords":  "_serbian_" <1>
        },
        "serbian_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["пример"] <2>
        },
        "serbian_stemmer": {
          "type":       "stemmer",
          "language":   "serbian"
        }
      },
      "analyzer": {
        "rebuilt_serbian": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "serbian_stop",
            "serbian_keywords",
            "serbian_stemmer",
            "serbian_normalization"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the stopwords or stopwords_path parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.
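In this chain, serbian_normalization runs last, after stemming. Lucene documents this filter as normalizing Serbian Cyrillic and Latin characters to a common "bald" Latin form.

Because serbian_keywords runs before normalization, it matches the token as written. A keyword listed in Cyrillic therefore protects only the Cyrillic spelling.

The following _analyze request uses illustrative text in both scripts, so you can inspect how each spelling is handled:

```console
GET /serbian_example/_analyze
{
  "analyzer": "rebuilt_serbian",
  "text":     "пример primer"
}
```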

`sorani` analyzer [sorani-analyzer]

The sorani analyzer could be reimplemented as a custom analyzer as follows:

PUT /sorani_example
{
  "settings": {
    "analysis": {
      "filter": {
        "sorani_stop": {
          "type":       "stop",
          "stopwords":  "_sorani_" <1>
        },
        "sorani_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["mînak"] <2>
        },
        "sorani_stemmer": {
          "type":       "stemmer",
          "language":   "sorani"
        }
      },
      "analyzer": {
        "rebuilt_sorani": {
          "tokenizer":  "standard",
          "filter": [
            "sorani_normalization",
            "lowercase",
            "decimal_digit",
            "sorani_stop",
            "sorani_keywords",
            "sorani_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the stopwords or stopwords_path parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.
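If you set explain to true in an _analyze request, the response shows the token stream after each filter. This can help confirm that sorani_keywords marks the excluded word before sorani_stemmer runs.

Limiting attributes to keyword keeps the output short. The text reuses the keyword from the example above:

```console
GET /sorani_example/_analyze
{
  "analyzer":   "rebuilt_sorani",
  "text":       "mînak",
  "explain":    true,
  "attributes": [ "keyword" ]
}
```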

`spanish` analyzer [spanish-analyzer]

The spanish analyzer could be reimplemented as a custom analyzer as follows:

PUT /spanish_example
{
  "settings": {
    "analysis": {
      "filter": {
        "spanish_stop": {
          "type":       "stop",
          "stopwords":  "_spanish_" <1>
        },
        "spanish_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["ejemplo"] <2>
        },
        "spanish_stemmer": {
          "type":       "stemmer",
          "language":   "light_spanish"
        }
      },
      "analyzer": {
        "rebuilt_spanish": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "spanish_stop",
            "spanish_keywords",
            "spanish_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the stopwords or stopwords_path parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.
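The rebuilt analyzer is meant to reproduce the built-in one, so comparing their output on the same text is a simple sanity check. Apart from words listed in spanish_keywords, the two token streams would be expected to match. The sample text is arbitrary:

```console
GET /spanish_example/_analyze
{
  "analyzer": "rebuilt_spanish",
  "text":     "Los ejemplos son útiles"
}

GET /spanish_example/_analyze
{
  "analyzer": "spanish",
  "text":     "Los ejemplos son útiles"
}
```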

`swedish` analyzer [swedish-analyzer]

The swedish analyzer could be reimplemented as a custom analyzer as follows:

PUT /swedish_example
{
  "settings": {
    "analysis": {
      "filter": {
        "swedish_stop": {
          "type":       "stop",
          "stopwords":  "_swedish_" <1>
        },
        "swedish_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["exempel"] <2>
        },
        "swedish_stemmer": {
          "type":       "stemmer",
          "language":   "swedish"
        }
      },
      "analyzer": {
        "rebuilt_swedish": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "swedish_stop",
            "swedish_keywords",
            "swedish_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the stopwords or stopwords_path parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.
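The stopwords parameter also accepts other values:

- An explicit array, which replaces the predefined _swedish_ list entirely.
- _none_, which disables stopword removal.

The following sketch shows the first option. The index name and word list are illustrative only:

```console
PUT /swedish_custom_stopwords_example
{
  "settings": {
    "analysis": {
      "filter": {
        "swedish_stop": {
          "type":      "stop",
          "stopwords": [ "och", "att", "det" ]
        }
      }
    }
  }
}
```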

`turkish` analyzer [turkish-analyzer]

The turkish analyzer could be reimplemented as a custom analyzer as follows:

PUT /turkish_example
{
  "settings": {
    "analysis": {
      "filter": {
        "turkish_stop": {
          "type":       "stop",
          "stopwords":  "_turkish_" <1>
        },
        "turkish_lowercase": {
          "type":       "lowercase",
          "language":   "turkish"
        },
        "turkish_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["örnek"] <2>
        },
        "turkish_stemmer": {
          "type":       "stemmer",
          "language":   "turkish"
        }
      },
      "analyzer": {
        "rebuilt_turkish": {
          "tokenizer":  "standard",
          "filter": [
            "apostrophe",
            "turkish_lowercase",
            "turkish_stop",
            "turkish_keywords",
            "turkish_stemmer"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the stopwords or stopwords_path parameters.
2. This filter should be removed unless there are words which should be excluded from stemming.
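The first two filters handle Turkish-specific details:

- The apostrophe filter strips an apostrophe and everything after it, such as the suffix in İstanbul'da.
- turkish_lowercase applies Turkish casing rules. For example, it lowercases the capital I to the dotless ı, not to i.

An _analyze request against the rebuilt analyzer lets you inspect both effects. The sample text is illustrative:

```console
GET /turkish_example/_analyze
{
  "analyzer": "rebuilt_turkish",
  "text":     "İstanbul'da Irmak"
}
```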

`thai` analyzer [thai-analyzer]

The thai analyzer could be reimplemented as a custom analyzer as follows:

PUT /thai_example
{
  "settings": {
    "analysis": {
      "filter": {
        "thai_stop": {
          "type":       "stop",
          "stopwords":  "_thai_" <1>
        }
      },
      "analyzer": {
        "rebuilt_thai": {
          "tokenizer":  "thai",
          "filter": [
            "lowercase",
            "decimal_digit",
            "thai_stop"
          ]
        }
      }
    }
  }
}

1. The default stopwords can be overridden with the stopwords or stopwords_path parameters.
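The thai tokenizer segments running Thai text into words. The decimal_digit filter converts Unicode decimal digits, including Thai digits such as ๑๒๓, to ASCII 0-9. An _analyze request with illustrative text exercises both:

```console
GET /thai_example/_analyze
{
  "analyzer": "rebuilt_thai",
  "text":     "ภาษาไทย ๑๒๓"
}
```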