mirror of https://github.com/elastic/elasticsearch.git
synced 2025-06-28 17:34:17 -04:00

[DOCS] Replace Wikipedia links with attribute (#61171)

parent 3b44274373
commit a94e5cb7c4

78 changed files with 164 additions and 164 deletions
@@ -8,7 +8,7 @@ tokens, it also records the following:
 * The `positionLength`, the number of positions that a token spans
 
 Using these, you can create a
-https://en.wikipedia.org/wiki/Directed_acyclic_graph[directed acyclic graph],
+{wikipedia}/Directed_acyclic_graph[directed acyclic graph],
 called a _token graph_, for a stream. In a token graph, each position represents
 a node. Each token represents an edge or arc, pointing to the next position.
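The token-graph idea in the hunk above can be sketched outside Elasticsearch: treat each position as a node and each token as an edge spanning `positionLength` positions. A minimal sketch; the multi-position `dns` synonym input is a hypothetical example, not taken from this diff:

```python
def token_graph(tokens):
    """Build token-graph edges from (term, position, position_length) triples.

    Each position is a node; each token is an edge from its start position
    to the position position_length steps later.
    """
    edges = []
    for term, pos, plen in tokens:
        edges.append((pos, pos + plen, term))
    return edges

# Hypothetical stream for "domain name system" where the synonym "dns"
# spans all three positions (positionLength=3).
tokens = [("dns", 0, 3), ("domain", 0, 1), ("name", 1, 1), ("system", 2, 1)]
print(token_graph(tokens))
# [(0, 3, 'dns'), (0, 1, 'domain'), (1, 2, 'name'), (2, 3, 'system')]
```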
@@ -4,7 +4,7 @@
 <titleabbrev>CJK bigram</titleabbrev>
 ++++
 
-Forms https://en.wikipedia.org/wiki/Bigram[bigrams] out of CJK (Chinese,
+Forms {wikipedia}/Bigram[bigrams] out of CJK (Chinese,
 Japanese, and Korean) tokens.
 
 This filter is included in {es}'s built-in <<cjk-analyzer,CJK language
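The bigram behavior described above amounts to overlapping two-character windows over a run of CJK characters; a minimal sketch, not the filter's actual implementation:

```python
def cjk_bigrams(text):
    # Overlapping two-character windows over a run of CJK characters.
    return [text[i:i + 2] for i in range(len(text) - 1)]

print(cjk_bigrams("東京都"))  # ['東京', '京都']
```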
@@ -161,7 +161,7 @@ All non-CJK input is passed through unmodified.
 `output_unigrams`
 (Optional, boolean)
 If `true`, emit tokens in both bigram and
-https://en.wikipedia.org/wiki/N-gram[unigram] form. If `false`, a CJK character
+{wikipedia}/N-gram[unigram] form. If `false`, a CJK character
 is output in unigram form when it has no adjacent characters. Defaults to
 `false`.
@@ -4,7 +4,7 @@
 <titleabbrev>Common grams</titleabbrev>
 ++++
 
-Generates https://en.wikipedia.org/wiki/Bigram[bigrams] for a specified set of
+Generates {wikipedia}/Bigram[bigrams] for a specified set of
 common words.
 
 For example, you can specify `is` and `the` as common words. This filter then
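The common-words behavior can be sketched as emitting each token plus a joined bigram whenever the token or its neighbor is in the common set. A sketch; the `_`-joined output format is an assumption about the filter's output convention:

```python
def common_grams(tokens, common):
    # Emit every unigram, plus a joined bigram whenever either word is common.
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens) and (tok in common or tokens[i + 1] in common):
            out.append(tok + "_" + tokens[i + 1])
    return out

print(common_grams(["the", "quick", "fox", "is", "brown"], {"the", "is"}))
# ['the', 'the_quick', 'quick', 'fox', 'fox_is', 'is', 'is_brown', 'brown']
```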
@@ -4,7 +4,7 @@
 <titleabbrev>Edge n-gram</titleabbrev>
 ++++
 
-Forms an https://en.wikipedia.org/wiki/N-gram[n-gram] of a specified length from
+Forms an {wikipedia}/N-gram[n-gram] of a specified length from
 the beginning of a token.
 
 For example, you can use the `edge_ngram` token filter to change `quick` to
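The edge n-gram operation reduces to taking prefixes of a token; a minimal sketch, with gram sizes of 1 and 2 assumed as defaults:

```python
def edge_ngrams(token, min_gram=1, max_gram=2):
    # Prefixes of the token, from min_gram up to max_gram characters.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

print(edge_ngrams("quick"))  # ['q', 'qu']
```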
@@ -4,7 +4,7 @@
 <titleabbrev>Elision</titleabbrev>
 ++++
 
-Removes specified https://en.wikipedia.org/wiki/Elision[elisions] from
+Removes specified {wikipedia}/Elision[elisions] from
 the beginning of tokens. For example, you can use this filter to change
 `l'avion` to `avion`.
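Conceptually, the filter strips a leading article before an apostrophe; a sketch using a small assumed French-style article set (the filter's real default list may differ):

```python
ARTICLES = {"l", "m", "t", "qu", "n", "s", "j"}  # assumed article set, illustrative only

def elide(token):
    # Drop the part before the first apostrophe if it is a known article.
    head, sep, rest = token.partition("'")
    if sep and head.lower() in ARTICLES and rest:
        return rest
    return token

print(elide("l'avion"))  # 'avion'
```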
@@ -4,7 +4,7 @@
 <titleabbrev>MinHash</titleabbrev>
 ++++
 
-Uses the https://en.wikipedia.org/wiki/MinHash[MinHash] technique to produce a
+Uses the {wikipedia}/MinHash[MinHash] technique to produce a
 signature for a token stream. You can use MinHash signatures to estimate the
 similarity of documents. See <<analysis-minhash-tokenfilter-similarity-search>>.
@@ -95,8 +95,8 @@ locality sensitive hashing (LSH).
 
 Depending on what constitutes the similarity between documents,
 various LSH functions https://arxiv.org/abs/1408.2927[have been proposed].
-For https://en.wikipedia.org/wiki/Jaccard_index[Jaccard similarity], a popular
-LSH function is https://en.wikipedia.org/wiki/MinHash[MinHash].
+For {wikipedia}/Jaccard_index[Jaccard similarity], a popular
+LSH function is {wikipedia}/MinHash[MinHash].
 A general idea of the way MinHash produces a signature for a document
 is by applying a random permutation over the whole index vocabulary (random
 numbering for the vocabulary), and recording the minimum value for this permutation
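The random-permutation idea in the hunk above can be approximated with one seeded hash function per "permutation": the probability that two sets share the same minimum hash equals their Jaccard similarity. A minimal sketch; the hash construction and shingle sets are illustrative, not the filter's implementation:

```python
import hashlib

def h(seed, item):
    # Deterministic 64-bit hash standing in for one random permutation.
    return int.from_bytes(hashlib.sha1(f"{seed}:{item}".encode()).digest()[:8], "big")

def minhash_signature(shingles, num_hashes=256):
    # For each simulated permutation, record the minimum hash over the set.
    return [min(h(seed, s) for s in shingles) for seed in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    # Fraction of positions where the two signatures agree.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = {"the", "quick", "brown", "fox"}
b = {"the", "quick", "brown", "dog"}
est = estimate_jaccard(minhash_signature(a), minhash_signature(b))
print(round(est, 2))  # close to the true Jaccard similarity 3/5 = 0.6
```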
@@ -4,7 +4,7 @@
 <titleabbrev>N-gram</titleabbrev>
 ++++
 
-Forms https://en.wikipedia.org/wiki/N-gram[n-grams] of specified lengths from
+Forms {wikipedia}/N-gram[n-grams] of specified lengths from
 a token.
 
 For example, you can use the `ngram` token filter to change `fox` to
@@ -4,7 +4,7 @@
 <titleabbrev>Shingle</titleabbrev>
 ++++
 
-Add shingles, or word https://en.wikipedia.org/wiki/N-gram[n-grams], to a token
+Add shingles, or word {wikipedia}/N-gram[n-grams], to a token
 stream by concatenating adjacent tokens. By default, the `shingle` token filter
 outputs two-word shingles and unigrams.
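With the default of two-word shingles plus unigrams, the output can be sketched as every window of one or two adjacent tokens. A sketch; the space-joined shingle format is an assumption:

```python
def shingles(tokens, max_size=2):
    # Emit every run of 1..max_size adjacent tokens, joined with spaces.
    out = []
    for i in range(len(tokens)):
        for n in range(1, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out

print(shingles(["quick", "brown", "fox"]))
# ['quick', 'quick brown', 'brown', 'brown fox', 'fox']
```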
@@ -4,7 +4,7 @@
 <titleabbrev>Stop</titleabbrev>
 ++++
 
-Removes https://en.wikipedia.org/wiki/Stop_words[stop words] from a token
+Removes {wikipedia}/Stop_words[stop words] from a token
 stream.
 
 When not customized, the filter removes the following English stop words by
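Removing stop words is a simple membership filter; a sketch using a small assumed stop set rather than the filter's full default English list:

```python
STOP = {"a", "an", "and", "is", "it", "of", "the", "to"}  # assumed subset, not the real default list

def remove_stop(tokens):
    # Keep only tokens that are not in the stop set (case-insensitive).
    return [t for t in tokens if t.lower() not in STOP]

print(remove_stop(["the", "quick", "fox", "is", "brown"]))  # ['quick', 'fox', 'brown']
```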
@@ -6,7 +6,7 @@
 
 The `edge_ngram` tokenizer first breaks text down into words whenever it
 encounters one of a list of specified characters, then it emits
-https://en.wikipedia.org/wiki/N-gram[N-grams] of each word where the start of
+{wikipedia}/N-gram[N-grams] of each word where the start of
 the N-gram is anchored to the beginning of the word.
 
 Edge N-Grams are useful for _search-as-you-type_ queries.
@@ -6,7 +6,7 @@
 
 The `ngram` tokenizer first breaks text down into words whenever it encounters
 one of a list of specified characters, then it emits
-https://en.wikipedia.org/wiki/N-gram[N-grams] of each word of the specified
+{wikipedia}/N-gram[N-grams] of each word of the specified
 length.
 
 N-grams are like a sliding window that moves across the word - a continuous
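The sliding-window description above can be sketched as emitting, at each character offset, every window between `min_gram` and `max_gram` characters long (defaults of 1 and 2 assumed):

```python
def ngrams(word, min_gram=1, max_gram=2):
    out = []
    for i in range(len(word)):                   # slide the window across the word
        for n in range(min_gram, max_gram + 1):  # one gram per allowed length
            if i + n <= len(word):
                out.append(word[i:i + n])
    return out

print(ngrams("fox"))  # ['f', 'fo', 'o', 'ox', 'x']
```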