elasticsearch/docs/java-rest/high-level
David Roberts 0059c59e25
[ML] Make ml_standard tokenizer the default for new categorization jobs (#72805)
Categorization jobs created once the entire cluster is upgraded to
version 7.14 or higher will default to using the new ml_standard
tokenizer rather than the previous default of the ml_classic
tokenizer, and will incorporate the new first_non_blank_line char
filter so that categorization is based purely on the first non-blank
line of each message.

The difference between the ml_classic and ml_standard tokenizers
is that ml_classic splits on slashes and colons, so it creates multiple
tokens from URLs and filesystem paths, whereas ml_standard attempts
to keep URLs, email addresses and filesystem paths as single tokens.

It is still possible to configure the ml_classic tokenizer if you
prefer: just provide a categorization_analyzer within your
analysis_config and whichever tokenizer you choose (which could be
ml_classic or any other Elasticsearch tokenizer) will be used.

To opt out of using first_non_blank_line as a default char filter,
you must explicitly specify a categorization_analyzer that does not
include it.
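
As a rough illustration of the two paragraphs above, the sketch below builds
an analysis_config whose categorization_analyzer names the ml_classic
tokenizer and lists no char filters, so first_non_blank_line is not applied.
The helper name classic_analysis_config, the field name "message", the
15 minute bucket span and the single count detector are assumptions chosen
for the example, not part of this change.

```python
import json


def classic_analysis_config(field_name="message"):
    """Build an analysis_config that explicitly selects ml_classic.

    Hypothetical helper for illustration. Because a categorization_analyzer
    is given explicitly and has no char_filter entry, first_non_blank_line
    is not included.
    """
    return {
        "bucket_span": "15m",  # assumed value for the example
        "categorization_field_name": field_name,
        "categorization_analyzer": {
            # Any other Elasticsearch tokenizer could be named here instead.
            "tokenizer": "ml_classic",
        },
        "detectors": [
            {"function": "count", "by_field_name": "mlcategory"},
        ],
    }


config = classic_analysis_config()
print(json.dumps(config, indent=2))
```

The printed JSON is the shape that would go under analysis_config in a job
creation request; the rest of the job body is left out here.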

If no categorization_analyzer is specified but categorization_filters
are specified, then the categorization filters are converted to char
filters that are applied after first_non_blank_line.
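
The conversion described above happens on the server. The sketch below only
models the resulting char filter order. It assumes each categorization
filter becomes a pattern_replace char filter and that the tokenizer is
ml_standard; the function name effective_analyzer and the sample pattern are
hypothetical.

```python
def effective_analyzer(categorization_filters):
    """Model the analyzer used when only categorization_filters are given.

    Assumption: each filter regex maps to a pattern_replace char filter.
    first_non_blank_line always comes first, followed by the converted
    filters in their original order.
    """
    char_filters = ["first_non_blank_line"]
    for pattern in categorization_filters:
        char_filters.append({"type": "pattern_replace", "pattern": pattern})
    return {
        "char_filter": char_filters,
        "tokenizer": "ml_standard",
    }


# Hypothetical filter that strips a bracketed SQL statement from messages.
analyzer = effective_analyzer([r"\[statement:.*\]"])
print(analyzer["char_filter"])
```

Keeping first_non_blank_line at the head of the list reflects the ordering
stated above: the user's filters only ever see the first non-blank line.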

Closes elastic/ml-cpp#1724
2021-06-01 15:11:32 +01:00
asyncsearch [Docs] Add HLRC Async Search API documentation (#54353) 2020-03-30 15:32:45 +02:00
ccr [DOCS] Replace put with create or update in API names (#70330) 2021-03-15 14:49:44 -04:00
cluster Enroll node API (#72129) 2021-05-12 08:45:02 +03:00
document [DOCS] Fix double spaces (#71082) 2021-03-31 09:57:47 -04:00
enrich [DOCS] Fix create enrich policy API title (#71494) 2021-04-08 15:35:53 -04:00
graph [DOCS] Updating heading for consistency. (#47619) 2019-10-14 15:59:42 -07:00
ilm [DOCS] Replace put with create or update in API names (#70330) 2021-03-15 14:49:44 -04:00
indices [DOCS] Fix double spaces (#71082) 2021-03-31 09:57:47 -04:00
ingest [DOCS] Replace put with create or update in API names (#70330) 2021-03-15 14:49:44 -04:00
licensing [DOCS] Fix double spaces (#71082) 2021-03-31 09:57:47 -04:00
migration Remove Migration Upgrade and Assistance APIs (#40075) 2019-03-15 15:34:50 -06:00
miscellaneous HLRC: Convert xpack methods to client side objects (#40705) 2019-04-04 09:49:12 -05:00
ml [ML] Make ml_standard tokenizer the default for new categorization jobs (#72805) 2021-06-01 15:11:32 +01:00
rollup [DOCS] Fix double spaces (#71082) 2021-03-31 09:57:47 -04:00
script [DOCS] Fix typos in HLRC delete stored script API (#70897) 2021-03-26 12:08:27 -04:00
search Add point in time to HLRC (#72167) 2021-05-12 17:59:25 -04:00
searchable_snapshots [DOCS] Rename mount types for searchable snapshots (#72699) 2021-05-05 16:35:33 -04:00
security [DOCS] Fix double spaces (#71082) 2021-03-31 09:57:47 -04:00
snapshot [DOCS] Fix double spaces (#71082) 2021-03-31 09:57:47 -04:00
tasks Broadcast cancellation to only nodes have outstanding child tasks (#54312) 2020-04-01 11:22:13 -04:00
textstructure [DOCS] Remove experimental tag from find structure API (#68153) 2021-01-28 12:50:14 -08:00
transform [DOCS] Replace put with create or update in API names (#70330) 2021-03-15 14:49:44 -04:00
watcher [DOCS] Replace put with create or update in API names (#70330) 2021-03-15 14:49:44 -04:00
aggs-builders.asciidoc Fix broken links to aggregation javadoc (#59083) 2020-07-09 16:52:29 +01:00
execution-no-req.asciidoc [DOCS] Fixes terms in HLRC data frame transform APIs (#44838) 2019-07-25 09:13:26 -07:00
execution.asciidoc [DOCS] Fixes terms in HLRC data frame transform APIs (#44838) 2019-07-25 09:13:26 -07:00
getting-started.asciidoc [DOCS] Fix "the the" typos (#64344) 2020-10-29 10:11:58 -04:00
index.asciidoc [DOCS] Remove added admons (#69452) 2021-02-23 10:35:21 -05:00
java-builders.asciidoc Docs: Cut down on high level rest client copy-and-paste-ness (#34125) 2018-09-28 14:48:11 -04:00
migration.asciidoc Docs: Pin two IDs in the rest client (#40785) 2019-04-04 12:02:55 -04:00
query-builders.asciidoc Search - added HLRC support for PinnedQueryBuilder (#45779) 2019-08-22 16:32:42 +01:00
supported-apis.asciidoc Add Searchable Snapshots Cache Stats API to HLRC (#71858) 2021-04-20 13:35:21 +02:00