301 commits

Author SHA1 Message Date
James Rodewig
6dbdf879b2 [DOCS] Correct Lucene link in kstem token filter docs 2020-04-29 09:28:05 -04:00
James Rodewig
77a35c641d
[DOCS] Reformat kstem token filter (#55823)
Makes the following changes to the `kstem` token filter docs:

* Rewrites description and adds a Lucene link
* Adds detailed analyze example
* Adds an analyzer example
2020-04-29 08:27:30 -04:00
Amit Khandelwal
9e41feda86
Expose preserve_original in edge_ngram token filter (#55766)
The Lucene `preserve_original` setting is currently not supported in the `edge_ngram`
token filter. This change adds it with a default value of `false`.

Closes #55767
2020-04-28 10:22:59 +02:00
James Rodewig
d67a1b47e4 [DOCS] Correct stemmer token filters anchor 2020-04-27 14:56:25 -04:00
James Rodewig
f08b3c93cb [DOCS] Correct stemmer token filter anchor 2020-04-27 14:49:19 -04:00
James Rodewig
bb9dbcb4c8
[DOCS] Reformat stemmer token filter (#55693)
Makes the following changes to the `stemmer` token filter docs:

* Adds detailed analyze example
* Rewrites parameter definitions
* Adds custom analyzer example
* Adds a `language` value for the `estonian` stemmer
* Reorders the `language` values to show recommended algorithms first,
  followed by other values alphabetically
2020-04-24 11:08:55 -04:00
James Rodewig
1c4e60e86d
[DOCS] Add stemming concept docs (#55156)
Adds conceptual documentation for stemming, including:

* An overview of why stemming is helpful in search
* Algorithmic vs. dictionary stemming
* Token filters used to control stemming, such as `stemmer_override`, `keyword_marker`, and `conditional`
2020-04-24 10:41:50 -04:00
James Rodewig
24160366b8
[DOCS] Reformat flatten_graph token filter (#54268)
* [DOCS] Reformat `flatten_graph` token filter

Makes the following changes to the `flatten_graph` token filter docs:

* Rewrites description and adds Lucene link
* Adds detailed analyze example
* Adds analyzer example
2020-04-16 08:34:15 -04:00
James Rodewig
e867dfabff
[DOCS] Add token filter reference docs template (#52290)
Creates a reusable template for token filter reference documentation.

Contributors can make a copy of this template and customize it when
documenting new token filters.
2020-04-10 08:44:17 -04:00
markharwood
d83798f237
Add pre-configured “lowercase” normalizer (#53882)
Includes tests verifying that a user-defined "lowercase" normalizer overrides the default one.

Closes #53872
2020-04-03 10:12:06 +01:00
James Rodewig
28cfb8ca69
[DOCS] Reformat keyword_repeat token filter (#54428) 2020-04-01 11:37:25 -04:00
James Rodewig
ba89f7096c [DOCS] Add missing word to keyword marker token filter docs 2020-03-30 10:45:55 -04:00
James Rodewig
40067d04dd [DOCS] Add missing "the" to keyword tokenizer docs 2020-03-30 08:53:55 -04:00
jureaky
4fe8ad357c
[DOCS] Add a lowercase email example to keyword tokenizer docs (#53257) 2020-03-30 08:35:55 -04:00
James Rodewig
4f503bf9df
[DOCS] Reformat keyword_marker token filter (#54076)
Makes the following changes to the `keyword_marker` token filter docs:

* Rewrites description and adds Lucene link
* Adds detailed analyze example
* Rewrites parameter definitions
* Adds custom analyzer and filter example
2020-03-25 09:01:30 -04:00
James Rodewig
0a35f3900d [DOCS] Remove double space in WDG docs 2020-03-23 17:15:37 -04:00
James Rodewig
747a164fae [DOCS] Fix "letter case" typo
Changes "lettercase" to "letter case" in the `uppercase` token filter
docs.
2020-03-23 17:11:39 -04:00
lgypro
7a1502db6c [Docs] Fix typo in _analyze api docs (#53837) 2020-03-20 11:45:31 +01:00
James Rodewig
8d5478f56c
[DOCS] Add token graph concept docs (#53339)
Adds conceptual docs for token graphs.
These docs cover:

* How a token graph is constructed from a token stream
* How synonyms and multi-position tokens impact token graphs
* How token graphs are used during search
* Why some token filters produce invalid token graphs

Also makes the following supporting changes:
* Adds anchors to the 'Anatomy of an Analyzer' docs for cross-linking
* Adds several SVGs for token graph diagrams
2020-03-19 07:42:26 -04:00
James Rodewig
3a39ed0055
[DOCS] Remove light_bengali stemmer (#53697)
Only the `bengali` stemmer is available in Lucene and surfaced through
Elasticsearch. This removes the incorrect `light_bengali` link in our
docs.
2020-03-18 08:33:20 -04:00
James Rodewig
e8ed337b2a
[DOCS] Reformat remove_duplicates token filter (#53608)
Makes the following changes to the `remove_duplicates` token filter
docs:

* Rewrites description and adds Lucene link
* Adds detailed analyze example
* Adds custom analyzer example
2020-03-16 11:21:20 -04:00
Jim Ferenczi
9ad0597617
Removes old Lucene experimental flag from analyzer documentation (#53217)
This change removes the Lucene experimental flag from the documentation of the following
tokenizers and filters:
  * Simple Pattern Split Tokenizer
  * Simple Pattern Tokenizer
  * Flatten Graph Token Filter
  * Word Delimiter Graph Token Filter

The flag is still present in the Lucene codebase, but we have fully supported these tokenizers
and filters in ES for a long time now, so the docs flag is misleading.

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-03-12 21:17:11 +01:00
James Rodewig
d16fe48312
[DOCS] Reformat word_delimiter token filter (#53387)
Makes the following changes to the `word_delimiter` token filter docs:

* Adds a warning admonition recommending the `word_delimiter_graph`
  filter instead. This warning includes a link to the deprecated Lucene
  `WordDelimiterFilter`.
* Updates the description
* Adds detailed analyze snippet
* Adds custom analyzer and custom filter snippets
* Reorganizes and updates parameter documentation
2020-03-11 08:44:44 -04:00
James Rodewig
377539e055
[DOCS] Use keyword tokenizer in word delimiter graph examples (#53384)
In a tip admonition, we recommend using the `keyword` tokenizer with the
`word_delimiter_graph` token filter. However, we only use the
`whitespace` tokenizer in the example snippets. This updates those
snippets to use the `keyword` tokenizer instead.

Also corrects several spacing issues for arrays in these docs.
2020-03-11 04:45:26 -04:00
James Rodewig
0089805b68 [DOCS] Correct anchor in word delimiter graph token filter docs 2020-03-10 10:32:00 -04:00
James Rodewig
1c8ab01ee6
[DOCS] Reformat word_delimiter_graph token filter (#53170)
Makes the following changes to the `word_delimiter_graph` token filter
docs:

* Updates the Lucene experimental admonition.
* Updates description
* Adds analyze snippet
* Adds custom analyzer and custom filter snippets
* Reorganizes and updates parameter list
* Expands and updates section re: differences between `word_delimiter`
  and `word_delimiter_graph`
2020-03-09 06:27:41 -04:00
James Rodewig
10f9a8fd64
[DOCS] Note that trim filter doesn't change offsets (#53220)
The [word delimiter graph token filter docs][0] note that the `trim`
filter changes the length of tokens without changing their offsets.

This explicitly mentions that in the `trim` filter docs.

[0]: https://www.elastic.co/guide/en/elasticsearch/reference/master/analysis-word-delimiter-graph-tokenfilter.html
2020-03-06 07:27:14 -05:00
James Rodewig
9f641dc07d
[DOCS] Fix several Asciidoctor double arrow replacements (#52827)
Per the [Asciidoctor docs][0], Asciidoctor replaces the following
syntax with double arrows in the rendered HTML:

* => renders as ⇒
* <= renders as ⇐

This escapes several unintended replacements, such as in the Painless
docs.

Where appropriate, it also replaces some double arrow instances with
single arrows for consistency.

[0]: https://asciidoctor.org/docs/user-manual/#replacements
2020-03-04 08:42:37 -05:00
James Rodewig
e016864b7d
[DOCS] Reformat stop token filter (#53059)
Makes the following changes to the `stop` token filter docs:

* Updates description
* Adds a link to the related Lucene filter
* Adds detailed analyze snippet
* Updates custom analyzer and custom filter snippets
* Adds a list of predefined stop words by language

Co-authored-by: ScottieL <36999642+ScottieL@users.noreply.github.com>
2020-03-03 13:05:12 -05:00
James Rodewig
996ec0def7
[DOCS] Reformat trim token filter docs (#51649)
Makes the following changes to the `trim` token filter docs:

* Updates description
* Adds a link to the related Lucene filter
* Adds tip about removing whitespace using tokenizers
* Adds detailed analyze snippets
* Adds custom analyzer snippet
2020-03-02 07:47:38 -05:00
rhymes
74b9878f69 [DOCS] Fix typo in index and search analysis docs (#52988) 2020-03-02 07:22:50 -05:00
debadair
e1c6ced949
[DOCS] Fixed typo in jump link. (#52302) 2020-02-12 17:52:11 -08:00
James Rodewig
a7ebddd2f2
[DOCS] Add attribute for Lucene analysis links (#51687)
Adds a `lucene-analysis-docs` attribute for the Lucene `/analysis/`
javadocs directory. This should prevent typos and keep the docs DRY.
2020-01-30 11:22:30 -05:00
James Rodewig
3c28a10b85
[DOCS] Rewrite analysis intro (#51184)
* [DOCS] Rewrite analysis intro. Move index/search analysis content.

* Rewrites 'Text analysis' page intro as high-level definition.
  Adds guidance on when users should configure text analysis
* Rewrites and splits index/search analysis content:
  * Conceptual content -> 'Index and search analysis' under 'Concepts'
  * Task-based content -> 'Specify an analyzer' under 'Configure...'
* Adds detailed examples for when to use the same index/search analyzer
  and when not.
* Adds new example snippets for specifying search analyzers

* clarifications

* Add toc. Decrement headings.

* Reword 'When to configure' section

* Remove sentence from tip
2020-01-30 09:19:53 -05:00
James Rodewig
c99a0e9a5e
[DOCS] Reformat unique token filter docs (#50748)
* Updates the description
* Adds analyze, custom analyzer, and custom filter snippets
* Adds parameter documentation
2020-01-28 10:33:45 -05:00
James Rodewig
0189d29c53
[DOCS] Add response snippets to 'Testing analyzers' page (#51427)
Adds response snippets to the `POST _analyze` snippets in the 'Testing
analyzers' page.

Co-authored-by: Emmanuel DEMEY <demey.emmanuel@gmail.com>
2020-01-27 08:41:05 -05:00
James Rodewig
0fa6ac0fb9
[DOCS] Add tutorials section to analysis topic (#50809)
Adds a 'Configure text analysis' page to house tutorial content for the
analysis topic.

Also relocates the following pages as children of this new page:

* 'Test an analyzer'
* 'Configuring built-in analyzers'
* 'Create a custom analyzer'

I plan to add a tutorial for specifying index-time and search-time
analyzers to this section as part of a future PR.
2020-01-16 13:11:42 -05:00
James Rodewig
0605eb2078
[DOCS] Add concepts section to analysis topic (#50801)
This helps the topic better match the structure of
our machine learning docs, e.g.
https://www.elastic.co/guide/en/machine-learning/7.5/ml-concepts.html

This PR only includes the 'Anatomy of an analyzer' page as a 'Concepts'
child page, but I plan to add other concepts, such as 'Index time vs.
search time', with later PRs.
2020-01-16 13:00:04 -05:00
James Rodewig
8f06f94d9b
[DOCS] Retitle analysis reference pages (#51071)
* Changes titles to sentence case.

* Appends 'reference' to page titles to differentiate their content from
  conceptual overviews.

* Moves the 'Normalizers' page to end of the Analysis topic pages.
2020-01-16 12:27:54 -05:00
PND
e16d1e5725 [Docs] Fix example output of edge n-gram token filter. (#51085) 2020-01-16 11:34:23 +01:00
James Rodewig
14185fbf79 [DOCS] Add section ID to analysis overview page 2020-01-08 14:43:05 -06:00
James Rodewig
495ce1add0
[DOCS] Add overview page to analysis topic (#50515)
Adds a 'text analysis overview' page to the analysis topic docs.

The goals of this page are:

* Concisely summarize the analysis process while avoiding in-depth concepts, tutorials, or API examples
* Explain why analysis is important, largely through highlighting problems with full-text searches missing analysis
* Highlight how analysis can be used to improve search results
2020-01-08 12:53:08 -06:00
James Rodewig
b0ffc60b80
[DOCS] Reformat reverse token filter docs (#50672)
* Updates the description and adds a Lucene link
* Adds analyze and custom analyzer snippets
2020-01-07 10:54:16 -06:00
James Rodewig
2bc37ea4e9
[DOCS] Reformat truncate token filter docs (#50687)
* Updates the description and adds a Lucene link
* Adds analyze, custom analyzer, and custom filter snippets
* Adds parameter documentation
2020-01-07 10:32:54 -06:00
James Rodewig
90e139e252
[DOCS] Reformat uppercase token filter docs (#50555)
* Updates the description and adds a Lucene link
* Adds analyze and custom analyzer snippets
2020-01-03 08:34:11 -05:00
James Rodewig
18ee52a5b2
[DOCS] Abbreviate token filter titles (#50511) 2019-12-27 11:00:51 -05:00
Xiang Dai
432bd0e92c Fix docs typos (#50365)
Fixes a few typos in the docs.

Signed-off-by: Xiang Dai 764524258@qq.com
2019-12-23 10:35:14 -05:00
James Rodewig
9907b0aab8
[DOCS] Reformat token count limit filter docs (#49835) 2019-12-13 08:43:35 -05:00
James Rodewig
4dfc07c922
[DOCS] Reformat lowercase token filter docs (#49935) 2019-12-12 09:39:06 -05:00
James Rodewig
e964a97005
[DOCS] Reformat length token filter docs (#49805)
* Adds a title abbreviation
* Updates the description and adds a Lucene link
* Reformats the parameters section
* Adds analyze, custom analyzer, and custom filter snippets

Relates to #44726.
2019-12-04 09:58:19 -05:00