mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-28 17:34:17 -04:00
[DOCS] http -> https, remove outdated plugin docs (#60380)
Plugin discovery documentation contained instructions for installing Elasticsearch 2.0 and an Oracle JDK, both of which are no longer valid. Since those instructions also fetched packages over cleartext HTTP, this commit replaces HTTP links with HTTPS where possible. In addition, a few community links have been removed, as they do not seem to exist anymore.
This commit is contained in:
parent: 947186719e
commit: c7ac9e7073
80 changed files with 188 additions and 242 deletions
@@ -13,12 +13,12 @@ themselves. The regular expression defaults to `\W+` (or all non-word characters
 ========================================
 
 The pattern analyzer uses
-http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
+https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
 
 A badly written regular expression could run very slowly or even throw a
 StackOverflowError and cause the node it is running on to exit suddenly.
 
-Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
+Read more about https://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
 
 ========================================

@@ -146,11 +146,11 @@ The `pattern` analyzer accepts the following parameters:
 [horizontal]
 `pattern`::
 
-A http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java regular expression], defaults to `\W+`.
+A https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java regular expression], defaults to `\W+`.
 
 `flags`::
 
-Java regular expression http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#field.summary[flags].
+Java regular expression https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#field.summary[flags].
 Flags should be pipe-separated, eg `"CASE_INSENSITIVE|COMMENTS"`.
 
 `lowercase`::
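As context for the `pattern` analyzer hunks above: a minimal Java sketch of what the docs describe, i.e. splitting text on the default `\W+` pattern and turning a pipe-separated `flags` string such as `"CASE_INSENSITIVE|COMMENTS"` into a `java.util.regex` bit mask. The class and helper names here are hypothetical, not Elasticsearch code.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternAnalyzerSketch {
    // Mirrors the documented default: lowercase, then split on runs of
    // non-word characters (`\W+`).
    static String[] tokenize(String text) {
        return Pattern.compile("\\W+").split(text.toLowerCase());
    }

    // Parses a pipe-separated flags string into a java.util.regex bit mask,
    // the way the docs say flags are specified. Only a few flags shown.
    static int parseFlags(String flags) {
        int result = 0;
        for (String flag : flags.split("\\|")) {
            switch (flag) {
                case "CASE_INSENSITIVE": result |= Pattern.CASE_INSENSITIVE; break;
                case "COMMENTS":         result |= Pattern.COMMENTS;         break;
                case "DOTALL":           result |= Pattern.DOTALL;           break;
                case "MULTILINE":        result |= Pattern.MULTILINE;        break;
                default: throw new IllegalArgumentException("unknown flag: " + flag);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("The QUICK Brown-Foxes!")));
        // -> [the, quick, brown, foxes]
        System.out.println(parseFlags("CASE_INSENSITIVE|COMMENTS")
                == (Pattern.CASE_INSENSITIVE | Pattern.COMMENTS)); // -> true
    }
}
```

The real analyzer does this inside Lucene; this sketch only illustrates the parameter semantics.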
@@ -7,7 +7,7 @@
 The `standard` analyzer is the default analyzer which is used if none is
 specified. It provides grammar based tokenization (based on the Unicode Text
 Segmentation algorithm, as specified in
-http://unicode.org/reports/tr29/[Unicode Standard Annex #29]) and works well
+https://unicode.org/reports/tr29/[Unicode Standard Annex #29]) and works well
 for most languages.
 
 [discrete]
@@ -13,12 +13,12 @@ The replacement string can refer to capture groups in the regular expression.
 ========================================
 
 The pattern replace character filter uses
-http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
+https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
 
 A badly written regular expression could run very slowly or even throw a
 StackOverflowError and cause the node it is running on to exit suddenly.
 
-Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
+Read more about https://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
 
 ========================================

@@ -30,17 +30,17 @@ The `pattern_replace` character filter accepts the following parameters:
 [horizontal]
 `pattern`::
 
-A http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java regular expression]. Required.
+A https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java regular expression]. Required.
 
 `replacement`::
 
 The replacement string, which can reference capture groups using the
 `$1`..`$9` syntax, as explained
-http://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#appendReplacement-java.lang.StringBuffer-java.lang.String-[here].
+https://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#appendReplacement-java.lang.StringBuffer-java.lang.String-[here].
 
 `flags`::
 
-Java regular expression http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#field.summary[flags].
+Java regular expression https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#field.summary[flags].
 Flags should be pipe-separated, eg `"CASE_INSENSITIVE|COMMENTS"`.
 
 [discrete]
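The `$1`..`$9` replacement syntax referenced in the `pattern_replace` hunks above is exactly the expansion that `java.util.regex.Matcher#appendReplacement` performs. A small self-contained Java sketch (hypothetical helper, not the filter's actual implementation):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternReplaceSketch {
    // Replaces every match of `pattern` in `input` with `replacement`,
    // where `replacement` may reference capture groups as $1..$9.
    static String replaceAll(String input, String pattern, String replacement) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // appendReplacement is where $1..$9 get expanded.
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Rewrite digit groups joined by "-" into "_"-joined groups,
        // keeping each digit run via capture group $1.
        System.out.println(replaceAll("My credit card is 123-456-789",
                "(\\d+)-(?=\\d)", "$1_"));
        // -> My credit card is 123_456_789
    }
}
```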
@@ -5,7 +5,7 @@
 ++++
 
 Provides <<dictionary-stemmers,dictionary stemming>> based on a provided
-http://en.wikipedia.org/wiki/Hunspell[Hunspell dictionary]. The `hunspell`
+{wikipedia}/Hunspell[Hunspell dictionary]. The `hunspell`
 filter requires
 <<analysis-hunspell-tokenfilter-dictionary-config,configuration>> of one or more
 language-specific Hunspell dictionaries.

@@ -244,4 +244,4 @@ Path to a Hunspell dictionary directory. This path must be absolute or
 relative to the `config` location.
 +
 By default, the `<path.config>/hunspell` directory is used, as described in
-<<analysis-hunspell-tokenfilter-dictionary-config>>.
+<<analysis-hunspell-tokenfilter-dictionary-config>>.
@@ -332,7 +332,7 @@ You cannot specify this parameter and `keywords_pattern`.
 +
 --
 (Required*, string)
-http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java
+https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java
 regular expression] used to match tokens. Tokens that match this expression are
 marked as keywords and not stemmed.

@@ -386,4 +386,4 @@ PUT /my-index-000001
 }
 }
 }
-----
+----
@@ -4,7 +4,7 @@
 <titleabbrev>KStem</titleabbrev>
 ++++
 
-Provides http://ciir.cs.umass.edu/pubfiles/ir-35.pdf[KStem]-based stemming for
+Provides https://ciir.cs.umass.edu/pubfiles/ir-35.pdf[KStem]-based stemming for
 the English language. The `kstem` filter combines
 <<algorithmic-stemmers,algorithmic stemming>> with a built-in
 <<dictionary-stemmers,dictionary>>.
@@ -108,7 +108,7 @@ Language-specific lowercase token filter to use. Valid values include:
 {lucene-analysis-docs}/el/GreekLowerCaseFilter.html[GreekLowerCaseFilter]
 
 `irish`::: Uses Lucene's
-http://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis/ga/IrishLowerCaseFilter.html[IrishLowerCaseFilter]
+{lucene-analysis-docs}/ga/IrishLowerCaseFilter.html[IrishLowerCaseFilter]
 
 `turkish`::: Uses Lucene's
 {lucene-analysis-docs}/tr/TurkishLowerCaseFilter.html[TurkishLowerCaseFilter]
@@ -10,34 +10,34 @@ characters of a certain language.
 [horizontal]
 Arabic::
 
-http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/ar/ArabicNormalizer.html[`arabic_normalization`]
+{lucene-analysis-docs}/ar/ArabicNormalizer.html[`arabic_normalization`]
 
 German::
 
-http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html[`german_normalization`]
+{lucene-analysis-docs}/de/GermanNormalizationFilter.html[`german_normalization`]
 
 Hindi::
 
-http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/hi/HindiNormalizer.html[`hindi_normalization`]
+{lucene-analysis-docs}/hi/HindiNormalizer.html[`hindi_normalization`]
 
 Indic::
 
-http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/in/IndicNormalizer.html[`indic_normalization`]
+{lucene-analysis-docs}/in/IndicNormalizer.html[`indic_normalization`]
 
 Kurdish (Sorani)::
 
-http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/ckb/SoraniNormalizer.html[`sorani_normalization`]
+{lucene-analysis-docs}/ckb/SoraniNormalizer.html[`sorani_normalization`]
 
 Persian::
 
-http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/fa/PersianNormalizer.html[`persian_normalization`]
+{lucene-analysis-docs}/fa/PersianNormalizer.html[`persian_normalization`]
 
 Scandinavian::
 
-http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.html[`scandinavian_normalization`],
-http://lucene.apache.org/core/4_9_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianFoldingFilter.html[`scandinavian_folding`]
+{lucene-analysis-docs}/miscellaneous/ScandinavianNormalizationFilter.html[`scandinavian_normalization`],
+{lucene-analysis-docs}/miscellaneous/ScandinavianFoldingFilter.html[`scandinavian_folding`]
 
 Serbian::
 
-http://lucene.apache.org/core/7_1_0/analyzers-common/org/apache/lucene/analysis/sr/SerbianNormalizationFilter.html[`serbian_normalization`]
+{lucene-analysis-docs}/sr/SerbianNormalizationFilter.html[`serbian_normalization`]
@@ -15,12 +15,12 @@ overlap.
 ========================================
 
 The pattern capture token filter uses
-http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
+https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
 
 A badly written regular expression could run very slowly or even throw a
 StackOverflowError and cause the node it is running on to exit suddenly.
 
-Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
+Read more about https://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
 
 ========================================
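The pattern capture filter touched above emits a token for every capture group of every match, optionally keeping the original token as well. A hypothetical Java sketch of that behavior (names are mine; the real filter lives in Lucene):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternCaptureSketch {
    // For each pattern, emit every capture group of every match as a token.
    // With preserveOriginal (like the filter's preserve_original option),
    // the input token itself is emitted first.
    static List<String> capture(String token, boolean preserveOriginal, String... patterns) {
        List<String> out = new ArrayList<>();
        if (preserveOriginal) {
            out.add(token);
        }
        for (String p : patterns) {
            Matcher m = Pattern.compile(p).matcher(token);
            while (m.find()) {
                for (int g = 1; g <= m.groupCount(); g++) {
                    out.add(m.group(g));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Break a compound token into letter runs and digit runs.
        System.out.println(capture("john-smith_123", false, "([a-z]+)", "(\\d+)"));
        // -> [john, smith, 123]
    }
}
```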
@@ -7,7 +7,7 @@
 Uses a regular expression to match and replace token substrings.
 
 The `pattern_replace` filter uses
-http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java's
+https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java's
 regular expression syntax]. By default, the filter replaces matching
 substrings with an empty substring (`""`).
 

@@ -22,7 +22,7 @@ A poorly-written regular expression may run slowly or return a
 StackOverflowError, causing the node running the expression to exit suddenly.
 
 Read more about
-http://www.regular-expressions.info/catastrophic.html[pathological regular
+https://www.regular-expressions.info/catastrophic.html[pathological regular
 expressions and how to avoid them].
 ====
 

@@ -108,7 +108,7 @@ in each token. Defaults to `true`.
 `pattern`::
 (Required, string)
 Regular expression, written in
-http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java's
+https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java's
 regular expression syntax]. The filter replaces token substrings matching this
 pattern with the substring in the `replacement` parameter.
 

@@ -157,4 +157,4 @@ PUT /my-index-000001
 }
 }
 }
-----
+----
@@ -125,7 +125,7 @@ Basque::
 http://snowball.tartarus.org/algorithms/basque/stemmer.html[*`basque`*]
 
 Bengali::
-http://www.tandfonline.com/doi/abs/10.1080/02564602.1993.11437284[*`bengali`*]
+https://www.tandfonline.com/doi/abs/10.1080/02564602.1993.11437284[*`bengali`*]
 
 Brazilian Portuguese::
 {lucene-analysis-docs}/br/BrazilianStemmer.html[*`brazilian`*]

@@ -137,7 +137,7 @@ Catalan::
 http://snowball.tartarus.org/algorithms/catalan/stemmer.html[*`catalan`*]
 
 Czech::
-http://portal.acm.org/citation.cfm?id=1598600[*`czech`*]
+https://dl.acm.org/doi/10.1016/j.ipm.2009.06.001[*`czech`*]
 
 Danish::
 http://snowball.tartarus.org/algorithms/danish/stemmer.html[*`danish`*]

@@ -148,9 +148,9 @@ http://snowball.tartarus.org/algorithms/kraaij_pohlmann/stemmer.html[`dutch_kp`]
 
 English::
 http://snowball.tartarus.org/algorithms/porter/stemmer.html[*`english`*],
-http://ciir.cs.umass.edu/pubfiles/ir-35.pdf[`light_english`],
+https://ciir.cs.umass.edu/pubfiles/ir-35.pdf[`light_english`],
 http://snowball.tartarus.org/algorithms/lovins/stemmer.html[`lovins`],
-http://www.researchgate.net/publication/220433848_How_effective_is_suffixing[`minimal_english`],
+https://www.researchgate.net/publication/220433848_How_effective_is_suffixing[`minimal_english`],
 http://snowball.tartarus.org/algorithms/english/stemmer.html[`porter2`],
 {lucene-analysis-docs}/en/EnglishPossessiveFilter.html[`possessive_english`]
 

@@ -162,29 +162,29 @@ http://snowball.tartarus.org/algorithms/finnish/stemmer.html[*`finnish`*],
 http://clef.isti.cnr.it/2003/WN_web/22.pdf[`light_finnish`]
 
 French::
-http://dl.acm.org/citation.cfm?id=1141523[*`light_french`*],
+https://dl.acm.org/citation.cfm?id=1141523[*`light_french`*],
 http://snowball.tartarus.org/algorithms/french/stemmer.html[`french`],
-http://dl.acm.org/citation.cfm?id=318984[`minimal_french`]
+https://dl.acm.org/citation.cfm?id=318984[`minimal_french`]
 
 Galician::
 http://bvg.udc.es/recursos_lingua/stemming.jsp[*`galician`*],
 http://bvg.udc.es/recursos_lingua/stemming.jsp[`minimal_galician`] (Plural step only)
 
 German::
-http://dl.acm.org/citation.cfm?id=1141523[*`light_german`*],
+https://dl.acm.org/citation.cfm?id=1141523[*`light_german`*],
 http://snowball.tartarus.org/algorithms/german/stemmer.html[`german`],
 http://snowball.tartarus.org/algorithms/german2/stemmer.html[`german2`],
 http://members.unine.ch/jacques.savoy/clef/morpho.pdf[`minimal_german`]
 
 Greek::
-http://sais.se/mthprize/2007/ntais2007.pdf[*`greek`*]
+https://sais.se/mthprize/2007/ntais2007.pdf[*`greek`*]
 
 Hindi::
 http://computing.open.ac.uk/Sites/EACLSouthAsia/Papers/p6-Ramanathan.pdf[*`hindi`*]
 
 Hungarian::
 http://snowball.tartarus.org/algorithms/hungarian/stemmer.html[*`hungarian`*],
-http://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181[`light_hungarian`]
+https://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181[`light_hungarian`]
 
 Indonesian::
 http://www.illc.uva.nl/Publications/ResearchReports/MoL-2003-02.text.pdf[*`indonesian`*]

@@ -193,7 +193,7 @@ Irish::
 http://snowball.tartarus.org/otherapps/oregan/intro.html[*`irish`*]
 
 Italian::
-http://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf[*`light_italian`*],
+https://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf[*`light_italian`*],
 http://snowball.tartarus.org/algorithms/italian/stemmer.html[`italian`]
 
 Kurdish (Sorani)::

@@ -203,7 +203,7 @@ Latvian::
 {lucene-analysis-docs}/lv/LatvianStemmer.html[*`latvian`*]
 
 Lithuanian::
-http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_5_3/lucene/analysis/common/src/java/org/apache/lucene/analysis/lt/stem_ISO_8859_1.sbl?view=markup[*`lithuanian`*]
+https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_5_3/lucene/analysis/common/src/java/org/apache/lucene/analysis/lt/stem_ISO_8859_1.sbl?view=markup[*`lithuanian`*]
 
 Norwegian (Bokmål)::
 http://snowball.tartarus.org/algorithms/norwegian/stemmer.html[*`norwegian`*],

@@ -215,20 +215,20 @@ Norwegian (Nynorsk)::
 {lucene-analysis-docs}/no/NorwegianMinimalStemmer.html[`minimal_nynorsk`]
 
 Portuguese::
-http://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181[*`light_portuguese`*],
+https://dl.acm.org/citation.cfm?id=1141523&dl=ACM&coll=DL&CFID=179095584&CFTOKEN=80067181[*`light_portuguese`*],
 pass:macros[http://www.inf.ufrgs.br/~buriol/papers/Orengo_CLEF07.pdf[`minimal_portuguese`\]],
 http://snowball.tartarus.org/algorithms/portuguese/stemmer.html[`portuguese`],
-http://www.inf.ufrgs.br/\~viviane/rslp/index.htm[`portuguese_rslp`]
+https://www.inf.ufrgs.br/\~viviane/rslp/index.htm[`portuguese_rslp`]
 
 Romanian::
 http://snowball.tartarus.org/algorithms/romanian/stemmer.html[*`romanian`*]
 
 Russian::
 http://snowball.tartarus.org/algorithms/russian/stemmer.html[*`russian`*],
-http://doc.rero.ch/lm.php?url=1000%2C43%2C4%2C20091209094227-CA%2FDolamic_Ljiljana_-_Indexing_and_Searching_Strategies_for_the_Russian_20091209.pdf[`light_russian`]
+https://doc.rero.ch/lm.php?url=1000%2C43%2C4%2C20091209094227-CA%2FDolamic_Ljiljana_-_Indexing_and_Searching_Strategies_for_the_Russian_20091209.pdf[`light_russian`]
 
 Spanish::
-http://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf[*`light_spanish`*],
+https://www.ercim.eu/publication/ws-proceedings/CLEF2/savoy.pdf[*`light_spanish`*],
 http://snowball.tartarus.org/algorithms/spanish/stemmer.html[`spanish`]
 
 Swedish::
@@ -145,7 +145,7 @@ However, it is recommended to define large synonyms set in a file using
 [discrete]
 ==== WordNet synonyms
 
-Synonyms based on http://wordnet.princeton.edu/[WordNet] format can be
+Synonyms based on https://wordnet.princeton.edu/[WordNet] format can be
 declared using `format`:
 
 [source,console]
@@ -136,7 +136,7 @@ However, it is recommended to define large synonyms set in a file using
 [discrete]
 ==== WordNet synonyms
 
-Synonyms based on http://wordnet.princeton.edu/[WordNet] format can be
+Synonyms based on https://wordnet.princeton.edu/[WordNet] format can be
 declared using `format`:
 
 [source,console]
@@ -371,7 +371,7 @@ $ => DIGIT
 
 # in some cases you might not want to split on ZWJ
 # this also tests the case where we need a bigger byte[]
-# see http://en.wikipedia.org/wiki/Zero-width_joiner
+# see https://en.wikipedia.org/wiki/Zero-width_joiner
 \\u200D => ALPHANUM
 ----
 
@@ -320,7 +320,7 @@ $ => DIGIT
 
 # in some cases you might not want to split on ZWJ
 # this also tests the case where we need a bigger byte[]
-# see http://en.wikipedia.org/wiki/Zero-width_joiner
+# see https://en.wikipedia.org/wiki/Zero-width_joiner
 \\u200D => ALPHANUM
 ----
 

@@ -379,4 +379,4 @@ PUT /my-index-000001
 }
 }
 }
-----
+----
@@ -16,12 +16,12 @@ non-word characters.
 ========================================
 
 The pattern tokenizer uses
-http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
+https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
 
 A badly written regular expression could run very slowly or even throw a
 StackOverflowError and cause the node it is running on to exit suddenly.
 
-Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
+Read more about https://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
 
 ========================================

@@ -107,11 +107,11 @@ The `pattern` tokenizer accepts the following parameters:
 [horizontal]
 `pattern`::
 
-A http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java regular expression], defaults to `\W+`.
+A https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java regular expression], defaults to `\W+`.
 
 `flags`::
 
-Java regular expression http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#field.summary[flags].
+Java regular expression https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#field.summary[flags].
 Flags should be pipe-separated, eg `"CASE_INSENSITIVE|COMMENTS"`.
 
 `group`::
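The pattern tokenizer hunks above mention a `group` parameter alongside `pattern` and `flags`. A hypothetical Java sketch of the two modes I understand it to support, assuming the documented default of splitting on the pattern when no group is selected (group `-1`) versus emitting a capture group of each match:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTokenizerSketch {
    // group == -1: the pattern marks token separators, as with String#split.
    // group >= 0: each match's given capture group becomes a token.
    static List<String> tokenize(String input, String pattern, int group) {
        if (group == -1) {
            return Arrays.asList(Pattern.compile(pattern).split(input));
        }
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(input);
        while (m.find()) {
            tokens.add(m.group(group));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("comma,separated,values", ",", -1));
        // -> [comma, separated, values]
        System.out.println(tokenize("key1=a; key2=b", "(\\w+)=", 1));
        // -> [key1, key2]
    }
}
```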
@@ -6,7 +6,7 @@
 
 The `standard` tokenizer provides grammar based tokenization (based on the
 Unicode Text Segmentation algorithm, as specified in
-http://unicode.org/reports/tr29/[Unicode Standard Annex #29]) and works well
+https://unicode.org/reports/tr29/[Unicode Standard Annex #29]) and works well
 for most languages.
 
 [discrete]