[DOCS] Replace Wikipedia links with attribute (#61171)

James Rodewig 2020-08-17 09:44:24 -04:00 committed by GitHub
parent 3b44274373
commit a94e5cb7c4
GPG key ID: 4AEE18F83AFDEB23
78 changed files with 164 additions and 164 deletions

@@ -8,7 +8,7 @@ tokens, it also records the following:
 * The `positionLength`, the number of positions that a token spans
 Using these, you can create a
-https://en.wikipedia.org/wiki/Directed_acyclic_graph[directed acyclic graph],
+{wikipedia}/Directed_acyclic_graph[directed acyclic graph],
 called a _token graph_, for a stream. In a token graph, each position represents
 a node. Each token represents an edge or arc, pointing to the next position.
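As an aside, the token-graph structure this hunk describes can be sketched in a few lines of Python. This is an illustrative model only (the token values and the multi-position synonym `dns` are made-up example data, not output of any real analyzer): each token is an edge from `position` to `position + position_length`.

```python
# Positions are nodes; each token is an edge spanning position_length positions.
tokens = [
    {"token": "domain", "position": 0, "position_length": 1},
    {"token": "name", "position": 1, "position_length": 1},
    {"token": "system", "position": 2, "position_length": 1},
    # A multi-position synonym spans several positions in one edge.
    {"token": "dns", "position": 0, "position_length": 3},
]

graph = {}  # node -> list of (token, target node) edges
for t in tokens:
    src, dst = t["position"], t["position"] + t["position_length"]
    graph.setdefault(src, []).append((t["token"], dst))
```

Walking every path from node 0 to the final node then enumerates the distinct readings of the stream.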

@@ -4,7 +4,7 @@
 <titleabbrev>CJK bigram</titleabbrev>
 ++++
-Forms https://en.wikipedia.org/wiki/Bigram[bigrams] out of CJK (Chinese,
+Forms {wikipedia}/Bigram[bigrams] out of CJK (Chinese,
 Japanese, and Korean) tokens.
 This filter is included in {es}'s built-in <<cjk-analyzer,CJK language
@@ -161,7 +161,7 @@ All non-CJK input is passed through unmodified.
 `output_unigrams`
 (Optional, boolean)
 If `true`, emit tokens in both bigram and
-https://en.wikipedia.org/wiki/N-gram[unigram] form. If `false`, a CJK character
+{wikipedia}/N-gram[unigram] form. If `false`, a CJK character
 is output in unigram form when it has no adjacent characters. Defaults to
 `false`.

@@ -4,7 +4,7 @@
 <titleabbrev>Common grams</titleabbrev>
 ++++
-Generates https://en.wikipedia.org/wiki/Bigram[bigrams] for a specified set of
+Generates {wikipedia}/Bigram[bigrams] for a specified set of
 common words.
 For example, you can specify `is` and `the` as common words. This filter then

@@ -4,7 +4,7 @@
 <titleabbrev>Edge n-gram</titleabbrev>
 ++++
-Forms an https://en.wikipedia.org/wiki/N-gram[n-gram] of a specified length from
+Forms an {wikipedia}/N-gram[n-gram] of a specified length from
 the beginning of a token.
 For example, you can use the `edge_ngram` token filter to change `quick` to
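The edge n-gram behavior this hunk documents amounts to emitting token prefixes between a minimum and maximum length. A minimal Python sketch (not the actual Lucene implementation; the function name and signature are made up for illustration):

```python
def edge_ngrams(token, min_gram, max_gram):
    # Emit prefixes of the token, from min_gram up to max_gram characters,
    # anchored at the beginning of the token.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]
```

With `min_gram=1` and `max_gram=2`, `quick` yields `q` and `qu`, which is why this filter is commonly paired with search-as-you-type use cases.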

@@ -4,7 +4,7 @@
 <titleabbrev>Elision</titleabbrev>
 ++++
-Removes specified https://en.wikipedia.org/wiki/Elision[elisions] from
+Removes specified {wikipedia}/Elision[elisions] from
 the beginning of tokens. For example, you can use this filter to change
 `l'avion` to `avion`.
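The elision rule described in this hunk can be sketched as stripping an "article + apostrophe" prefix. The article set below is illustrative only; the real filter's article list is configurable and language-specific:

```python
# Illustrative set of French elided articles; not the filter's actual default.
ARTICLES = frozenset({"l", "m", "t", "qu", "n", "s", "j", "d", "c"})

def elide(token):
    # Strip a leading "article'" prefix: "l'avion" -> "avion".
    head, sep, rest = token.partition("'")
    if sep and head.lower() in ARTICLES:
        return rest
    return token
```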

@@ -4,7 +4,7 @@
 <titleabbrev>MinHash</titleabbrev>
 ++++
-Uses the https://en.wikipedia.org/wiki/MinHash[MinHash] technique to produce a
+Uses the {wikipedia}/MinHash[MinHash] technique to produce a
 signature for a token stream. You can use MinHash signatures to estimate the
 similarity of documents. See <<analysis-minhash-tokenfilter-similarity-search>>.
@@ -95,8 +95,8 @@ locality sensitive hashing (LSH).
 Depending on what constitutes the similarity between documents,
 various LSH functions https://arxiv.org/abs/1408.2927[have been proposed].
-For https://en.wikipedia.org/wiki/Jaccard_index[Jaccard similarity], a popular
-LSH function is https://en.wikipedia.org/wiki/MinHash[MinHash].
+For {wikipedia}/Jaccard_index[Jaccard similarity], a popular
+LSH function is {wikipedia}/MinHash[MinHash].
 A general idea of the way MinHash produces a signature for a document
 is by applying a random permutation over the whole index vocabulary (random
 numbering for the vocabulary), and recording the minimum value for this permutation
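The "minimum value under a random permutation" idea in this hunk can be sketched with salted hashes standing in for permutations. This is a toy illustration of the MinHash technique, not the token filter's implementation, and all names below are made up:

```python
import random

def minhash_signature(tokens, num_hashes=128, seed=42):
    # Each salted hash simulates one random permutation of the vocabulary;
    # the signature records the minimum hash value per permutation.
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    return [min(hash((salt, t)) for t in tokens) for salt in salts]

def estimated_jaccard(sig_a, sig_b):
    # The fraction of matching signature slots approximates the
    # Jaccard similarity of the underlying token sets.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Two token sets share a minimum under a given permutation with probability equal to their Jaccard similarity, which is why averaging matches over many permutations gives the estimate.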

@@ -4,7 +4,7 @@
 <titleabbrev>N-gram</titleabbrev>
 ++++
-Forms https://en.wikipedia.org/wiki/N-gram[n-grams] of specified lengths from
+Forms {wikipedia}/N-gram[n-grams] of specified lengths from
 a token.
 For example, you can use the `ngram` token filter to change `fox` to
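The n-gram generation this hunk documents can be sketched as emitting every substring whose length falls in a configured range (an illustrative sketch, not the Lucene implementation):

```python
def ngrams(token, min_gram=1, max_gram=2):
    # Emit every substring of the token whose length is between
    # min_gram and max_gram, ordered by start offset.
    return [token[i:i + n]
            for i in range(len(token))
            for n in range(min_gram, max_gram + 1)
            if i + n <= len(token)]
```

With the defaults sketched here, `fox` yields `f`, `fo`, `o`, `ox`, `x`.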

@@ -4,7 +4,7 @@
 <titleabbrev>Shingle</titleabbrev>
 ++++
-Add shingles, or word https://en.wikipedia.org/wiki/N-gram[n-grams], to a token
+Add shingles, or word {wikipedia}/N-gram[n-grams], to a token
 stream by concatenating adjacent tokens. By default, the `shingle` token filter
 outputs two-word shingles and unigrams.
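The default behavior this hunk describes, unigrams plus two-word shingles, can be sketched as follows (an illustrative model; the real filter also handles separators, position increments, and other options not shown here):

```python
def shingles(tokens, max_size=2):
    # Emit each unigram, followed by the shingles that start at it,
    # concatenating adjacent tokens with a space.
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        for n in range(2, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out
```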

@@ -4,7 +4,7 @@
 <titleabbrev>Stop</titleabbrev>
 ++++
-Removes https://en.wikipedia.org/wiki/Stop_words[stop words] from a token
+Removes {wikipedia}/Stop_words[stop words] from a token
 stream.
 When not customized, the filter removes the following English stop words by
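Stop-word removal as described in this hunk is a simple set-membership filter. The word set below is only a small illustrative sample, not the filter's full default English list:

```python
# A few common English stop words, for illustration only.
STOP_WORDS = frozenset({"a", "an", "and", "are", "as", "at", "be", "but",
                        "by", "for", "if", "in", "is", "it", "of", "on",
                        "the", "to"})

def remove_stop_words(tokens):
    # Drop any token whose lowercased form is in the stop-word set.
    return [t for t in tokens if t.lower() not in STOP_WORDS]
```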

@@ -6,7 +6,7 @@
 The `edge_ngram` tokenizer first breaks text down into words whenever it
 encounters one of a list of specified characters, then it emits
-https://en.wikipedia.org/wiki/N-gram[N-grams] of each word where the start of
+{wikipedia}/N-gram[N-grams] of each word where the start of
 the N-gram is anchored to the beginning of the word.
 Edge N-Grams are useful for _search-as-you-type_ queries.

@@ -6,7 +6,7 @@
 The `ngram` tokenizer first breaks text down into words whenever it encounters
 one of a list of specified characters, then it emits
-https://en.wikipedia.org/wiki/N-gram[N-grams] of each word of the specified
+{wikipedia}/N-gram[N-grams] of each word of the specified
 length.
 N-grams are like a sliding window that moves across the word - a continuous