Commit graph

235 commits

Author SHA1 Message Date
James Rodewig
31fc615381
[DOCS] Reformat ASCII folding token filter docs (#48143) 2019-10-23 15:06:18 -05:00
James Rodewig
a0795163a9
[DOCS] Reformat classic token filter docs (#48314) 2019-10-23 09:38:22 -05:00
James Rodewig
bb635e5a9e
[DOCS] Reformat CJK bigram and CJK width token filter docs (#48210) 2019-10-21 09:43:59 -04:00
James Rodewig
c367c5cf75
[DOCS] Reformat apostrophe token filter docs (#48076) 2019-10-16 08:50:12 -04:00
Wilder Pereira
630bfa1001
[DOCS] Remove unneeded spaces from custom analyzer snippet (#47332) 2019-10-15 15:52:52 -04:00
James Rodewig
59933abb0e
[DOCS] Sort analyzers, tokenizers, and token filters alphabetically (#48068) 2019-10-15 15:46:50 -04:00
Alan Woodward
c1f99e2d75
Remove _type from SearchHit (#46942)
This commit removes the `_type` field from all search hit responses.

Relates to #41059
2019-09-23 19:14:54 +01:00
James Rodewig
de2c8f7231
Fixed sample code for minhash (#46385)
The sample code is wrong: a field type is required for the sample field.
I guess the intention was to give the sample field the name `fingerprint`, mapping it as `text` using the custom analyzer `my_analyzer`.
2019-09-12 13:29:07 -04:00
Abhilash Bolla
b4c18b9c44
Fixed grammar in pattern replace char filter docs. (#46546)
Minor grammar fix in the pattern replace char filter docs.
2019-09-10 09:46:06 -07:00
James Rodewig
5772c1c7dd
[DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) 2019-09-09 13:13:41 -04:00
James Rodewig
e43be90e6c
[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449) 2019-09-06 14:05:36 -04:00
James Rodewig
466c59a4a7
[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result]" (#46295) 2019-09-05 16:47:18 -04:00
James Rodewig
be7b873a43
[DOCS] Correct custom analyzer callouts (#46030) 2019-08-29 10:07:52 -04:00
MK Swanson
f47886e44a
[DOCS] Modified section headings, edited text for clarity. (#44988)
* [DOCS] Modified section headings, edited text for clarity.
2019-07-30 16:03:05 -04:00
James Rodewig
ea1adb61c2
[DOCS] Update anchors and links for Elasticsearch API relocation (#44500) 2019-07-19 09:16:35 -04:00
Christoph Büscher
56ee1a5e00
Allow reloading of search time analyzers (#43313)
Currently, changing the resources (like dictionaries, synonym files, etc.) of search-time
analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.

This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
marked as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.

Relates to #29051
2019-06-27 18:27:11 +02:00
Alan Woodward
d2c696d54b
Require [articles] setting in elision filter (#43083)
We should throw an exception at construction time if a list of
articles is not provided, otherwise we can get random NPEs during
indexing.

Relates to #43002
2019-06-27 08:56:26 +01:00
Sachin Frayne
31a37fbb00
Correct the description of generate_word_parts (#43026) 2019-06-10 11:37:34 +01:00
James Rodewig
8685a7b8d2
[DOCS] Add explicit articles_case parameter to Elision Token Filter example (#42987) 2019-06-07 11:22:32 -04:00
Mayya Sharipova
6f12eb168f
Fix error with mapping in docs 2019-05-30 10:06:38 -04:00
Peter Dyson
588228816a
[DOCS] path_hierarchy tokenizer examples (#39630)
Closes #17138
2019-05-30 09:19:56 -04:00
Alan Woodward
72c7910299
Improvements to docs around multiplexer and synonyms (#41645)
This commit fixes a multiplexer doc error concerning synonyms, and adds
suggestions on how to combine the two filters.
2019-05-07 09:09:28 +01:00
James Rodewig
b33b5fc122
[DOCS] Add attribute to escape minimal pt token link in Asciidoctor (#41613) 2019-04-30 14:11:24 -04:00
James Rodewig
adf67053f4
[DOCS] Add anchors for Asciidoctor migration (#41648) 2019-04-30 10:19:09 -04:00
Guilherme Ferreira
378d74be00
[Docs] Correct default stop list constant (#41342) 2019-04-23 19:14:31 +02:00
Guilherme Ferreira
17463d2be4
[Docs] Correct spelling of "_none_" (#41192) 2019-04-15 15:12:55 +02:00
Guilherme Ferreira
9f74a932eb
[Docs] Correct spelling of the "_none_" stopwords element (#41191) 2019-04-15 14:17:53 +02:00
Christoph Büscher
5be4827a78
Correct indentation in synonym docs (#40711)
The stopword filter should be on the same level as the synonym filter in the
example request. Correcting this for better readability.
2019-04-02 01:43:02 +02:00
Mayya Sharipova
aad93977f5
Correct errors in min_hash filter documentation
Related to #39671
2019-03-08 16:16:03 -05:00
Mayya Sharipova
5b852fa184
Add documentation for min_hash filter (#39671)
* Add documentation for min_hash filter

Closes #20757
2019-03-07 08:47:32 -05:00
jimczi
89b80c64ee
fix typo in synonym graph filter docs 2019-03-05 18:18:45 +01:00
Jim Ferenczi
f3e8d66ffb
Remove beta marker from the synonym_graph docs (#38185) 2019-02-19 10:47:59 +01:00
Christoph Büscher
7bb2da197d
Remove nGram and edgeNGram token filter names (#38911)
In #30209 we deprecated the camel case `nGram` filter name in favour of `ngram` and
did the same for `edgeNGram` in favour of `edge_ngram`. Using these names has been deprecated
since 6.4 and has been issuing deprecation warnings since then.
I think we can remove these filters in 8.0. In a backport of this PR I would change what was a
deprecation warning from 6.4 to an error starting with new indices created in 7.0.
2019-02-15 20:15:05 +01:00
Mayya Sharipova
da63ee5252
Correct rebuilt persian analyzer (#38724)
Make substitution of \u200C with a space explicit

The character `\u200C` (ZWNJ) in a test string **SHOULD** be substituted with a space
by the rebuilt Persian analyzer, but it is not.

Changing the line `"mappings": [ "\\u200C=> "] <1>` to
`"mappings": [ "\\u200C=>\\u0020"] <1>` solves the problem.
This change explicitly says to substitute ZWNJ with a space.

Closes #38188
2019-02-11 10:46:18 -05:00
Christoph Büscher
34f2d2ec91
Remove remaining occurrences of "include_type_name=true" in docs (#37646) 2019-01-22 15:13:52 +01:00
Christoph Büscher
3a96608b3f
Remove more include_type_name and types from docs (#37601) 2019-01-18 14:11:18 +01:00
Christoph Büscher
25aac4f77f
Remove include_type_name in asciidoc where possible (#37568)
The "include_type_name" parameter was temporarily introduced in #37285 to facilitate
moving the default parameter setting to "false" in many places in the documentation
code snippets. Most of the places can simply be reverted without causing errors.
In this change I looked for asciidoc files that contained the
"include_type_name=true" addition when creating new indices but didn't look
likely to make use of the "_doc" type for mappings. This is mostly the case,
e.g. in the analysis docs, where index creation often only contains settings. I
manually corrected the use of types in some places where the docs still used an
explicit type name and not the dummy "_doc" type.
2019-01-18 09:34:11 +01:00
Julie Tibshirani
36a3b84fc9
Update the default for include_type_name to false. (#37285)
* Default include_type_name to false for get and put mappings.

* Default include_type_name to false for get field mappings.

* Add a constant for the default include_type_name value.

* Default include_type_name to false for get and put index templates.

* Default include_type_name to false for create index.

* Update create index calls in REST documentation to use include_type_name=true.

* Some minor clean-ups around the get index API.

* In REST tests, use include_type_name=true by default for index creation.

* Make sure to use 'expression == false'.

* Clarify the different IndexTemplateMetaData toXContent methods.

* Fix FullClusterRestartIT#testSnapshotRestore.

* Fix the ml_anomalies_default_mappings test.

* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.

We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.

This commit also refactors GetMappingsResponse to follow the same approach
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.

* Fix more REST tests.

* Improve some wording in the create index documentation.

* Add a note about types removal in the create index docs.

* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.

* Make sure to mention include_type_name in the REST docs for affected APIs.

* Make sure to use 'expression == false' in FullClusterRestartIT.

* Mention include_type_name in the REST templates docs.
2019-01-14 13:08:01 -08:00
Josh Soref
edb48321ba
[DOCS] Various spelling corrections (#37046) 2019-01-07 14:44:12 +01:00
Christoph Büscher
132ccbec2f
[Docs] Extend common-grams-tokenfilter doctest example (#36807)
Adding an example output using the "_analyze" API and expected response.
2018-12-19 09:49:23 +01:00
Christoph Büscher
41feaf137c
[Docs] Fix error in Common Grams Token Filter (#36774)
The first example given is missing the two single-token cases for "is" and "a".
The later usage example is slightly wrong in that custom analyzers should
go under `settings.analysis.analyzer`.
2018-12-18 16:54:06 +01:00
Alan Woodward
af57575838
Allow word_delimiter_graph_filter to not adjust internal offsets (#36699)
This commit adds an adjust_offsets parameter to the word_delimiter_graph token filter, defaulting
to true. Most of the time you'd want sub-tokens emitted by this filter to have offsets that are
adjusted to their real position in the token stream; however, some token filters can change the
length or starting position of a token (e.g. trim) without changing their offset attributes, and this
can lead to word_delimiter_graph emitting illegal offsets. Setting adjust_offsets to false in these
cases will allow indexing again.

Fixes #34741, #33710
2018-12-18 13:20:51 +00:00
Jim Ferenczi
18866c4c0b
Make hits.total an object in the search response (#35849)
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case `relation` is `eq`)
or a lower bound of the total (in which case it is `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out of this change (retrieving the total hits as a number in the REST response).
Note that currently all search responses are either accurate (`track_total_hits: true`) or don't contain
`hits.total` (`track_total_hits: false`). We'll add a way to get a lower bound of the total hits in a
follow-up (to allow numbers to be passed to `track_total_hits`).

Relates #33028
2018-12-05 19:49:06 +01:00
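As an illustration of the `hits.total` change above, here is a minimal Python sketch of a client-side helper that reads the total from either response shape. It assumes the search response has already been decoded from JSON into a dict. The function name `total_hits` is hypothetical and not part of any Elasticsearch client. The plain-number branch covers responses requested with `rest_total_hits_as_int`; per the note above, totals are currently exact, so that branch reports `eq`.

```python
from typing import Any, Mapping, Tuple


def total_hits(response: Mapping[str, Any]) -> Tuple[int, str]:
    """Return (value, relation) for hits.total in either response format.

    Hypothetical helper, not part of an official client library.
    """
    total = response["hits"]["total"]
    if isinstance(total, int):
        # Legacy shape (e.g. with rest_total_hits_as_int): a plain number.
        # Assumption: treated as an exact count, so relation is "eq".
        return total, "eq"
    # Object shape: {"value": N, "relation": "eq" | "gte"}.
    return int(total["value"]), str(total["relation"])


# Usage
print(total_hits({"hits": {"total": 5}}))
print(total_hits({"hits": {"total": {"value": 10000, "relation": "gte"}}}))
```

Returning the relation alongside the count lets callers decide whether to show the number as exact or as a lower bound, without branching on the response shape themselves.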
Alan Woodward
a646f85a99
Ensure TokenFilters only produce single tokens when parsing synonyms (#34331)
A number of token filters can produce multiple tokens at the same position. This
is a problem when using token chains to parse synonym files, as the SynonymMap
requires that there are no stacked tokens in its input.

This commit ensures that, when used to parse synonyms, these token filters either produce
a single version of their input token or throw an error when mappings are
generated. In indices created in Elasticsearch 6.x, deprecation warnings are emitted in place
of the error.

* asciifolding and cjk_bigram produce only the folded or bigrammed token
* decompounders, synonyms and keyword_repeat are skipped
* n-grams, word-delimiter-filter, multiplexer, fingerprint and phonetic throw errors

Fixes #34298
2018-11-29 10:35:38 +00:00
Alan Woodward
26cc8ff8c3
Add pointer to the index-phrases option in shingle filter docs (#35771)
We should be discouraging the use of shingle filters and instead pointing users to the
index-phrases parameter on text fields.
2018-11-21 15:27:11 +00:00
Alan Woodward
f6a43b5939
Add a prebuilt ICU Analyzer (#34958)
The ICU plugin provides the building blocks of an analysis chain, but doesn't actually have a prebuilt analyzer. It would be better for users if there was a simple analyzer that they could use out of the box, and also something we can point to from the CJK Analyzer docs as a superior alternative.

Relates to #34285
2018-11-21 09:00:48 +00:00
Julie Tibshirani
f854330e06
Make sure to use the type _doc in the REST documentation. (#34662)
* Replace custom type names with _doc in REST examples.
* Avoid using two mapping types in the percolator docs.
* Rename doc -> _doc in the main repository README.
* Also replace some custom type names in the HLRC docs.
2018-10-22 11:54:04 -07:00
Christoph Büscher
e869f9a78c
[Docs] Update synonym-tokenfilter.asciidoc (#34706)
Remove ugly double-dot.
2018-10-22 17:18:29 +02:00
Jim Ferenczi
a9daa5cb90
[DOCS] Remove beta label from normalizers (#34326) 2018-10-05 15:42:00 +02:00
Nikolay Vasiliev
16956a1a05
[DOCS] Clarify 'type' parameter meaning for custom analyzer (#34012)
This pull request improves the docs on the meaning of the `type` parameter on the custom
analyzer doc page.

Closes #33456
2018-09-25 15:32:27 +02:00