Provide access to new settings for HyphenationCompoundWordTokenFilter (#115585)

Expose the new flags that Lucene added to the HyphenationCompoundWordTokenFilter.

Adds access to the two new flags no_sub_matches and no_overlapping_matches.

Lucene issue: https://github.com/apache/lucene/issues/9231
Peter Straßer 2024-11-18 17:38:49 +01:00 committed by GitHub
parent 99689281e0
commit c804953105
7 changed files with 1295 additions and 11 deletions


@ -111,6 +111,18 @@ output. Defaults to `5`.
(Optional, Boolean)
If `true`, only include the longest matching subword. Defaults to `false`.
`no_sub_matches`::
(Optional, Boolean)
If `true`, do not match sub tokens in tokens that are in the word list.
Defaults to `false`.
`no_overlapping_matches`::
(Optional, Boolean)
If `true`, do not allow overlapping tokens.
Defaults to `false`.
Typically, only one of the three flags needs to be set: `no_overlapping_matches` is the most restrictive, and `no_sub_matches` is more restrictive than `only_longest_match`. When a more restrictive option is enabled, the state of the less restrictive options has no effect.
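
The precedence described above can be sketched in a few lines of Python. This is an illustration only, not the filter's implementation: the helper `governing_flag`, the filter name `my_decompounder`, the patterns path, and the word list are all hypothetical placeholders. The sketch assumes the filter is registered under the `hyphenation_decompounder` type with its usual `hyphenation_patterns_path` and `word_list` parameters.

```python
import json

# Flags ordered from most to least restrictive, following the paragraph above.
PRECEDENCE = ("no_overlapping_matches", "no_sub_matches", "only_longest_match")


def governing_flag(filter_settings):
    """Return the most restrictive enabled flag, or None if none is enabled.

    Hypothetical helper: it only models the documented precedence and does
    not call into Elasticsearch or Lucene.
    """
    for name in PRECEDENCE:
        if filter_settings.get(name, False):  # every flag defaults to false
            return name
    return None


# Hypothetical filter definition; the path and word list are placeholders.
decompounder = {
    "type": "hyphenation_decompounder",
    "hyphenation_patterns_path": "analysis/hyphenation_patterns.xml",
    "word_list": ["kaffee", "zucker", "tasse"],
    "only_longest_match": True,
    "no_sub_matches": True,
}

# Request body shape for creating an index with the custom filter.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {"my_decompounder": decompounder}
        }
    }
}

print(governing_flag(decompounder))
print(json.dumps(index_settings, indent=2))
```

In this sketch both `only_longest_match` and `no_sub_matches` are set, so `no_sub_matches` governs and the value of `only_longest_match` is irrelevant, which is why setting a single flag is usually enough.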
[[analysis-hyp-decomp-tokenfilter-customize]]
==== Customize and add to an analyzer