elasticsearch/docs/java-rest/high-level/ml
David Roberts 8cf1fdcd05
[ML] Make ml_standard tokenizer the default for new categorization jobs (#73605)
Categorization jobs created once the entire cluster is upgraded to
version 7.14 or higher will default to using the new ml_standard
tokenizer rather than the previous default of the ml_classic
tokenizer, and will incorporate the new first_non_blank_line char
filter so that categorization is based purely on the first non-blank
line of each message.
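To illustrate what "based purely on the first non-blank line" means, here is a minimal Java 11+ sketch of the idea. It is not the Elasticsearch `first_non_blank_line` char filter implementation. The class name `FirstNonBlankLineSketch` is hypothetical, and so is returning an empty string when every line is blank.

```java
import java.util.Optional;

// Hypothetical illustration of the idea behind the first_non_blank_line
// char filter; not the actual Elasticsearch implementation.
public class FirstNonBlankLineSketch {

    // Returns the first line of the message that contains a non-whitespace
    // character. Assumption for this sketch: an all-blank message yields "".
    static String firstNonBlankLine(String message) {
        Optional<String> first = message.lines()   // splits on \n, \r or \r\n
                .filter(line -> !line.isBlank())   // skip empty/whitespace-only lines
                .findFirst();
        return first.orElse("");
    }

    public static void main(String[] args) {
        String message = "\n   \nERROR Connection refused\n  at com.example.Client.connect";
        // Only the first non-blank line would take part in categorization.
        System.out.println(firstNonBlankLine(message)); // prints "ERROR Connection refused"
    }
}
```

Because everything after that line is discarded, multi-line messages such as stack traces are categorized by their headline line only.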

The difference between the ml_classic and ml_standard tokenizers
is that ml_classic splits on slashes and colons, and so creates multiple
tokens from URLs and filesystem paths, whereas ml_standard attempts
to keep URLs, email addresses and filesystem paths as single tokens.

It is still possible to configure the ml_classic tokenizer if you
prefer: just provide a categorization_analyzer within your
analysis_config and whichever tokenizer you choose (which could be
ml_classic or any other Elasticsearch tokenizer) will be used.

To opt out of using first_non_blank_line as a default char filter,
you must explicitly specify a categorization_analyzer that does not
include it.

If no categorization_analyzer is specified but categorization_filters
are specified, then the categorization filters are converted to char
filters that are applied after first_non_blank_line.
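The ordering described above can be sketched in Java 11+. This is a hypothetical illustration, not Elasticsearch code. It assumes each categorization filter is a regular expression whose matches are removed, like a pattern-replace char filter with an empty replacement. It also assumes the filters run only on the text left after the first-non-blank-line step. The class and method names are made up for this sketch.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the preprocessing order: first_non_blank_line first,
// then each categorization filter (assumed: regex matches removed).
public class CategorizationPreprocessSketch {

    // Keep only the first line containing a non-whitespace character.
    static String firstNonBlankLine(String message) {
        return message.lines()
                .filter(line -> !line.isBlank())
                .findFirst()
                .orElse("");
    }

    // Apply the filters, in order, to the already-truncated text.
    static String preprocess(String message, List<String> categorizationFilters) {
        String text = firstNonBlankLine(message);
        for (String regex : categorizationFilters) {
            text = Pattern.compile(regex).matcher(text).replaceAll("");
        }
        return text;
    }

    public static void main(String[] args) {
        String message = "\nUser [alice] logged in\nsession details follow";
        // The bracketed user name is stripped; the second line never reaches the filters.
        System.out.println(preprocess(message, List.of("\\[.*?\\]"))); // prints "User  logged in"
    }
}
```

Since the filters run second, a filter cannot affect text that first_non_blank_line has already discarded.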

Backport of #72805
2021-06-02 07:04:16 +01:00
close-job.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
delete-calendar-event.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
delete-calendar-job.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
delete-calendar.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
delete-data-frame-analytics.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
delete-datafeed.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
delete-expired-data.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
delete-filter.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
delete-forecast.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
delete-job.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
delete-model-snapshot.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
delete-trained-model-alias.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
delete-trained-models.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
estimate-model-memory.asciidoc [ML] Add a model memory estimation endpoint for anomaly detection (#54129) 2020-03-24 22:55:11 +00:00
evaluate-data-frame.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
explain-data-frame-analytics.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
flush-job.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
forecast-job.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
get-buckets.asciidoc [DOCS] Adds missing icons to ML HLRC APIs (#46515) 2019-09-10 08:28:02 -07:00
get-calendar-events.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
get-calendars.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
get-categories.asciidoc [DOCS] Adds missing icons to ML HLRC APIs (#46515) 2019-09-10 08:28:02 -07:00
get-data-frame-analytics-stats.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
get-data-frame-analytics.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
get-datafeed-stats.asciidoc [DOCS] Adds missing icons to ML HLRC APIs (#46515) 2019-09-10 08:28:02 -07:00
get-datafeed.asciidoc [7.x] [ML] adding new flag exclude_generated that removes generated fields in GET config APIs (#63899)(#63092) (#63177) 2020-10-20 12:42:52 -04:00
get-filters.asciidoc [DOCS] Adds missing icons to ML HLRC APIs (#46515) 2019-09-10 08:28:02 -07:00
get-influencers.asciidoc [DOCS] Adds missing icons to ML HLRC APIs (#46515) 2019-09-10 08:28:02 -07:00
get-info.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
get-job-stats.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
get-job.asciidoc [7.x] [ML] adding new flag exclude_generated that removes generated fields in GET config APIs (#63899)(#63092) (#63177) 2020-10-20 12:42:52 -04:00
get-model-snapshots.asciidoc [DOCS] Adds missing icons to ML HLRC APIs (#46515) 2019-09-10 08:28:02 -07:00
get-overall-buckets.asciidoc [DOCS] Adds missing icons to ML HLRC APIs (#46515) 2019-09-10 08:28:02 -07:00
get-records.asciidoc [DOCS] Adds missing icons to ML HLRC APIs (#46515) 2019-09-10 08:28:02 -07:00
get-trained-models-stats.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
get-trained-models.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
open-job.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
post-calendar-event.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
post-data.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
preview-datafeed.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
put-calendar-job.asciidoc [DOCS] Replace put with create or update in API names (#70330) (#70421) 2021-03-15 17:16:13 -04:00
put-calendar.asciidoc [DOCS] Replace put with create or update in API names (#70330) (#70421) 2021-03-15 17:16:13 -04:00
put-data-frame-analytics.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
put-datafeed.asciidoc [DOCS] Add runtime_mappings to update datafeed API in HLRC (#71772) (#72110) 2021-04-22 09:52:31 -07:00
put-filter.asciidoc [DOCS] Replace put with create or update in API names (#70330) (#70421) 2021-03-15 17:16:13 -04:00
put-job.asciidoc [DOCS] Replace put with create or update in API names (#70330) (#70421) 2021-03-15 17:16:13 -04:00
put-trained-model-alias.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
put-trained-model.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
revert-model-snapshot.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
set-upgrade-mode.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
start-data-frame-analytics.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
start-datafeed.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
stop-data-frame-analytics.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
stop-datafeed.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
update-data-frame-analytics.asciidoc [DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
update-datafeed.asciidoc [DOCS] Add runtime_mappings to update datafeed API in HLRC (#71772) (#72110) 2021-04-22 09:52:31 -07:00
update-filter.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
update-job.asciidoc [ML] Make ml_standard tokenizer the default for new categorization jobs (#73605) 2021-06-02 07:04:16 +01:00
update-model-snapshot.asciidoc [DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
upgrade-job-model-snapshot.asciidoc [7.x] [ML] add new snapshot upgrader API for upgrading older snapshots (#64665) (#65010) 2020-11-17 11:30:47 -05:00