elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-28 09:28:55 -04:00

David Roberts 0059c59e25 [ML] Make ml_standard tokenizer the default for new categorization jobs (#72805) Categorization jobs created once the entire cluster is upgraded to version 7.14 or higher will default to using the new ml_standard tokenizer rather than the previous default of the ml_classic tokenizer, and will incorporate the new first_non_blank_line char filter so that categorization is based purely on the first non-blank line of each message. The difference between the ml_classic and ml_standard tokenizers is that ml_classic splits on slashes and colons, so it creates multiple tokens from URLs and filesystem paths, whereas ml_standard attempts to keep URLs, email addresses and filesystem paths as single tokens. It is still possible to configure the ml_classic tokenizer if you prefer: just provide a categorization_analyzer within your analysis_config and whichever tokenizer you choose (which could be ml_classic or any other Elasticsearch tokenizer) will be used. To opt out of using first_non_blank_line as a default char filter, you must explicitly specify a categorization_analyzer that does not include it. If no categorization_analyzer is specified but categorization_filters are specified, then the categorization filters are converted to char filters that are applied after first_non_blank_line. Closes elastic/ml-cpp#1724		2021-06-01 15:11:32 +01:00
high-level	[ML] Make ml_standard tokenizer the default for new categorization jobs (#72805)	2021-06-01 15:11:32 +01:00
low-level	[DOCS] Fix double spaces (#71082)	2021-03-31 09:57:47 -04:00
index.asciidoc	[DOCS] Update data frame transform URLs (#46940)	2019-09-20 13:26:57 -07:00
license.asciidoc	[DOCS] Bump copyright to 2019 for Java HLRC license (#50206)	2019-12-30 15:38:59 -05:00
overview.asciidoc	[DOCS] restructure java clients docs pages (#25517)	2017-07-04 10:58:57 +02:00
redirects.asciidoc	[DOCS] Update data frame transform URLs (#46940)	2019-09-20 13:26:57 -07:00
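
The opt-out described in the latest commit message can be sketched roughly as follows. This is a minimal illustration, not the project's own code: it only builds the analysis_config portion of a categorization job body as a Python dict. The helper name build_analysis_config, the field name "message", and the 15m bucket span are assumptions made for the example; the tokenizer and char filter names come from the commit message.

```python
import json


def build_analysis_config(tokenizer="ml_classic", use_first_non_blank_line=False):
    """Build an analysis_config with an explicit categorization_analyzer.

    Hypothetical helper for illustration only. Supplying a
    categorization_analyzer means the chosen tokenizer is used, and
    first_non_blank_line applies only if it is listed explicitly.
    """
    analyzer = {"tokenizer": tokenizer}
    if use_first_non_blank_line:
        # Opt back in to the char filter by naming it explicitly.
        analyzer["char_filter"] = ["first_non_blank_line"]
    return {
        "bucket_span": "15m",                    # assumed value
        "categorization_field_name": "message",  # assumed field name
        "categorization_analyzer": analyzer,
        "detectors": [
            {"function": "count", "by_field_name": "mlcategory"}
        ],
    }


if __name__ == "__main__":
    # Classic tokenizer, no first_non_blank_line char filter.
    print(json.dumps(build_analysis_config(), indent=2))
```

Calling build_analysis_config("ml_standard", use_first_non_blank_line=True) would instead spell out a combination close to the new default, which may be useful when the configuration should be visible in the job definition rather than implied.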