Commit graph

120 commits

David Roberts
8cf1fdcd05
[ML] Make ml_standard tokenizer the default for new categorization jobs (#73605)
Categorization jobs created once the entire cluster is upgraded to
version 7.14 or higher will default to using the new ml_standard
tokenizer rather than the previous default of the ml_classic
tokenizer, and will incorporate the new first_non_blank_line char
filter so that categorization is based purely on the first non-blank
line of each message.

The difference between the ml_classic and ml_standard tokenizers
is that ml_classic splits on slashes and colons, so creates multiple
tokens from URLs and filesystem paths, whereas ml_standard attempts
to keep URLs, email addresses and filesystem paths as single tokens.

It is still possible to configure the ml_classic tokenizer if you
prefer: just provide a categorization_analyzer within your
analysis_config and whichever tokenizer you choose (which could be
ml_classic or any other Elasticsearch tokenizer) will be used.

To opt out of using first_non_blank_line as a default char filter,
you must explicitly specify a categorization_analyzer that does not
include it.

If no categorization_analyzer is specified but categorization_filters
are specified then the categorization filters are converted to char
filters that are applied after first_non_blank_line.

Backport of #72805
2021-06-02 07:04:16 +01:00
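A minimal sketch of the categorization_analyzer override described in the commit above, using Python's requests library; the local cluster URL, job name, and field names are illustrative assumptions, not part of the commit:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Opt back into the ml_classic tokenizer while keeping the new
# first_non_blank_line char filter (job/field names are illustrative).
job = {
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [{"function": "count", "by_field_name": "mlcategory"}],
        "categorization_field_name": "message",
        "categorization_analyzer": {
            "char_filter": ["first_non_blank_line"],
            "tokenizer": "ml_classic"
        }
    },
    "data_description": {"time_field": "@timestamp"}
}

resp = requests.put(f"{ES}/_ml/anomaly_detectors/log-categorization", json=job)
resp.raise_for_status()
```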
Lisa Cawley
58e9bb6ca6
[DOCS] Add runtime_mappings to update datafeed API in HLRC (#71772) (#72110)
Co-authored-by: David Kyle <david.kyle@elastic.co>
2021-04-22 09:52:31 -07:00
István Zoltán Szabó
591e93397a
[DOCS] Removes beta labels from DFA related docs. (#70808) (#70902) 2021-03-26 10:25:36 +01:00
James Rodewig
302341a526
[DOCS] Replace put with create or update in API names (#70330) (#70421)
Co-authored-by: debadair <debadair@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-03-15 17:16:13 -04:00
Benjamin Trent
12e2cc8176
[7.x] [ML][HLRC] adds put and delete trained model alias APIs to rest high-level client (#69214) (#69297)
* [ML][HLRC] adds put and delete trained model alias APIs to rest high-level client (#69214)

adds put (and reassign) and delete trained model alias APIs to the rest high-level client.

This adds some serialization objects and request wrappers.
2021-02-22 07:36:34 -05:00
Dimitris Athanasiou
98c69cedce
[7.x][ML] Add runtime mappings to data frame analytics source config … (#69284)
Users can now specify runtime mappings as part of the source config
of a data frame analytics job. Those runtime mappings become part of
the mapping of the destination index. This ensures the fields are
accessible in the destination index even if the relevant data frame
analytics job gets deleted.

Closes #65056

Backport of #69183
2021-02-19 20:17:06 +02:00
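A sketch of the source runtime mappings described in the commit above, assuming a local cluster and illustrative index and field names:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# The runtime field defined under source.runtime_mappings becomes part of
# the destination index mapping (index/field names are illustrative).
config = {
    "source": {
        "index": "my-source-index",
        "runtime_mappings": {
            "total_bytes": {
                "type": "long",
                "script": {
                    "source": "emit(doc['bytes_in'].value + doc['bytes_out'].value)"
                }
            }
        }
    },
    "dest": {"index": "my-dest-index"},
    "analysis": {"outlier_detection": {}}
}

requests.put(f"{ES}/_ml/data_frame/analytics/bytes-outliers", json=config)
```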
Valeriy Khakhutskyy
4bbd31a268
[7.x][ML] Add early stopping DFA configuration parameter (#68271)
The PR adds the optional early_stopping_enabled data frame analysis configuration parameter. The enhancement was already described in elastic/ml-cpp#1676, so I mark it here as a non-issue.

Backport of #68099.
2021-02-01 14:11:06 +01:00
Dimitris Athanasiou
9e55623c29
[7.x][ML] Expand regression/classification hyperparameters (#67950) (#67983)
Expands data frame analytics regression and classification
analyses with the following hyperparameters:

- alpha
- downsample_factor
- eta_growth_rate_per_tree
- max_optimization_rounds_per_hyperparameter
- soft_tree_depth_limit
- soft_tree_depth_tolerance

Backport of #67950
2021-01-26 15:48:13 +02:00
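A sketch showing the expanded hyperparameters set explicitly on a regression analysis; everything other than the hyperparameter names themselves (cluster URL, indices, values) is an illustrative assumption:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Regression analysis with the newly exposed hyperparameters pinned
# (values are illustrative, not tuning recommendations).
config = {
    "source": {"index": "house-prices"},
    "dest": {"index": "house-prices-predictions"},
    "analysis": {
        "regression": {
            "dependent_variable": "price",
            "alpha": 1.0,
            "downsample_factor": 0.5,
            "eta_growth_rate_per_tree": 1.05,
            "max_optimization_rounds_per_hyperparameter": 2,
            "soft_tree_depth_limit": 5.0,
            "soft_tree_depth_tolerance": 0.15
        }
    }
}

requests.put(f"{ES}/_ml/data_frame/analytics/house-price-regression", json=config)
```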
Benjamin Trent
a324055310
[7.x] [ML] move find file structure finder in Rest high Level client to its new endpoint and plugin (#67290) (#67510)
* [ML] move find file structure finder in Rest high Level client to its new endpoint and plugin (#67290)

The file structure finder is now its own plugin, separate from the ml plugin.

This commit updates the rest high level client to reflect this.

Additionally, this adjusts the internal and client object names from `FileStructure` to the more general `TextStructure`
2021-01-14 09:59:34 -05:00
David Kyle
5fec2538ca
[ML] Docs and HRLC for datafeed runtime mappings (#65810) (#66007)
For the changes in #65606
2020-12-08 11:04:21 +00:00
Benjamin Trent
39f5f39dc2
[7.x] [ML] add new snapshot upgrader API for upgrading older snapshots (#64665) (#65010)
* [ML] add new snapshot upgrader API for upgrading older snapshots (#64665)

This new API provides a way for users to upgrade their own anomaly job
model snapshots.

To upgrade a snapshot the following is done:
- Open a native process given the job id and the desired snapshot id
- Load the snapshot into the process
- Write the snapshot again from the native task (now updated via the
  native process)

relates #64154
2020-11-17 11:30:47 -05:00
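A sketch of invoking the snapshot upgrader from Python; the endpoint path is an assumption based on the API described above, and the cluster URL and ids are illustrative:

```python
import requests

ES = "http://localhost:9200"       # assumed local cluster
job_id = "my-job"                  # illustrative
snapshot_id = "1573067581"         # illustrative

# Kick off an upgrade of an older model snapshot (path assumed).
resp = requests.post(
    f"{ES}/_ml/anomaly_detectors/{job_id}/model_snapshots/{snapshot_id}/_upgrade"
)
print(resp.json())
```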
István Zoltán Szabó
b822e582c3
[DOCS] Changes experimental flag to beta in DFA related docs (#63992) (#64176) 2020-10-26 18:04:21 +01:00
Benjamin Trent
b9dc522cb4
[7.x] [ML] adding new flag exclude_generated that removes generated fields in GET config APIs (#63899)(#63092) (#63177)
* [ML] adding for_export flag for ml plugin GET resource APIs (#63092)

This adds the new `for_export` flag to the following APIs:

- GET _ml/anomaly_detectors/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>

The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster.

The following fields are not returned in the objects:

- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that would effectively require changing to be of use (e.g. datafeed job_id)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)

closes https://github.com/elastic/elasticsearch/issues/63055

* [ML] adding new flag exclude_generated that removes generated fields in GET config APIs (#63899)

When exporting and cloning ml configurations in a cluster it can be
frustrating to remove all the fields that were generated by
the plugin, especially as the number of these fields changes
from version to version.

This flag, exclude_generated, allows the GET config APIs to return
configurations with these generated fields removed.

APIs supporting this flag:
- GET _ml/anomaly_detectors/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>

The following fields are not returned in the objects:

- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)

relates to #63055
2020-10-20 12:42:52 -04:00
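A sketch of using exclude_generated to pull a portable config, assuming a local cluster; the job id is illustrative and the response field name is an assumption:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Fetch a config with generated fields stripped, ready to PUT elsewhere.
resp = requests.get(
    f"{ES}/_ml/anomaly_detectors/my-job",          # job id is illustrative
    params={"exclude_generated": "true"},
)
portable_config = resp.json()["jobs"][0]           # response shape assumed
```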
Przemysław Witek
bb7df2eb5f
[ML] Allow setting num_top_classes to a special value -1 (#63587) (#63601) 2020-10-13 14:00:12 +02:00
Przemysław Witek
a97bd5b787
[7.x] [ML] Validate that AucRoc has the data necessary to be calculated (#63302) (#63453) 2020-10-08 09:31:45 +02:00
Lisa Cawley
8f76c89cd3
[7.x][DOCS] Add feature_importance_baseline to get trained model API (#63279) (#63336)
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2020-10-06 10:08:34 -07:00
Lisa Cawley
4de6104dae
[DOCS] Fix titles for ML APIs (#63152) (#63207) 2020-10-02 14:01:01 -07:00
Lisa Cawley
57ea5d27ae [DOCS] Add experimental tag to data frame analytics APIs (#63153) 2020-10-02 09:44:40 -07:00
Benjamin Trent
cfcf973259
[7.x] [ML] renames */inference* apis to */trained_models* (#63097) (#63136)
* [ML] renames */inference* apis to */trained_models* (#63097)

This commit renames all `inference` CRUD APIs to `trained_models`.

This aligns with internal terminology, documentation, and use-cases.
2020-10-02 07:34:28 -04:00
Przemysław Witek
d677a2b8ee
[7.x] [ML] Implement AucRoc metric for classification - HLRC (#62304) (#63058) 2020-09-30 14:04:10 +02:00
Benjamin Trent
e163559e4c
[7.x] [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922) (#62620)
* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)

Adds a new flag, include, to the get trained models API.
The flag initially has two valid values: definition and total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.

* fixing test

* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
2020-09-18 10:07:35 -04:00
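A sketch of the new include flag, assuming a local cluster; the model id and response field name are illustrative/assumed, and the endpoint is the pre-rename _ml/inference path in use at the time:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Request total feature importance metadata alongside the model config.
resp = requests.get(
    f"{ES}/_ml/inference/my-model",                 # model id is illustrative
    params={"include": "total_feature_importance"},
)
model = resp.json()["trained_model_configs"][0]     # response shape assumed
```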
Lisa Cawley
bc5eec8205
[DOCS] Fix capitalization in HLRC ML APIs (#62010) (#62012) 2020-09-04 16:57:15 -07:00
Benjamin Trent
1ae2923632
[7.x] [ML] adding docs + hlrc for data frame analysis feature_processors (#61149) (#61493)
* [ML] adding docs + hlrc for data frame analysis feature_processors (#61149)

Adds HLRC and some docs for the new feature_processors field in Data frame analytics.

Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-08-24 12:56:21 -04:00
James Rodewig
60876a0e32
[DOCS] Replace Wikipedia links with attribute (#61171) (#61209) 2020-08-17 11:27:04 -04:00
Przemysław Witek
283a1f605c
Rename binary_soft_classification evaluation to outlier_detection (#59951) (#59970) 2020-07-21 15:15:04 +02:00
Dimitris Athanasiou
b2243337d8
[7.x][ML] Data frame analytics max_num_threads setting (#59254) (#59308)
This adds a setting to data frame analytics jobs called
`max_num_threads`. The setting expects a positive integer and
specifies the maximum number of threads that may be used by the
analysis. Note that the actual number of threads used is limited
by the number of processors on the node where the job is assigned.
Also, the process may use a few more threads for operational
functionality that is not the analysis itself.

This setting may also be updated for a stopped job.

More threads may reduce the time it takes to complete the job at the cost
of using more CPU.

Backport of #59254 and #57274
2020-07-09 19:15:46 +03:00
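A sketch of setting and later raising max_num_threads; the cluster URL, ids, indices, and the _update path used for a stopped job are assumptions:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Create the job with an explicit thread cap (ids/indices are illustrative).
requests.put(
    f"{ES}/_ml/data_frame/analytics/my-analysis",
    json={
        "source": {"index": "my-source"},
        "dest": {"index": "my-dest"},
        "analysis": {"outlier_detection": {}},
        "max_num_threads": 4,
    },
)

# Raise the cap later while the job is stopped (update path assumed).
requests.post(
    f"{ES}/_ml/data_frame/analytics/my-analysis/_update",
    json={"max_num_threads": 8},
)
```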
Przemysław Witek
909649dd15
[7.x] Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734) (#58825) 2020-07-01 14:52:06 +02:00
Przemysław Witek
9ea9b7bd3b
[7.x] Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684) (#58731) 2020-06-30 14:09:11 +02:00
Przemysław Witek
3f7c45472e
[7.x] Introduce DataFrameAnalyticsConfig update API (#58302) (#58648) 2020-06-29 10:56:11 +02:00
David Kyle
39020f3900
HLRC for delete expired data by job Id (#57722) (#57975)
High level rest client changes for #57337
2020-06-12 09:44:17 +01:00
Dimitris Athanasiou
f49a14ce6f
[7.x][ML] Fix race condition when force stopping DF analytics job (#57680) (#57717)
When we force delete a DF analytics job, we currently first force
stop it and then we proceed with deleting the job config.
This may result in logging errors if the job config is deleted
before it is retrieved while the job is starting.

Instead of force-stopping the job, it makes more sense to try to
stop it gracefully first, so we now do that. If the normal stop
fails, we resort to force-stopping the job to ensure we can go
through with the delete.

In addition, this commit introduces `timeout` for the delete action
and makes use of it in the child requests.

Backport of #57680
2020-06-05 17:50:01 +03:00
Benjamin Trent
35d5126cea
[7.x] [ML] adds new for_export flag to GET _ml/inference API (#57351) (#57368)
* [ML] adds new for_export flag to GET _ml/inference API (#57351)

Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API.

This flag is useful for moving models between clusters.
2020-05-29 14:01:08 -04:00
Benjamin Trent
c8374dc9f3
[ML] add max_model_memory parameter to forecast request (#57254) (#57355)
This adds a max_model_memory setting to forecast requests.
This setting can take a string value that is formatted according to byte sizes (e.g. "50mb", "150mb").

The default value is `20mb`.

There is a hard limit of `500mb`; requesting a larger value results in an error.

If the limit is larger than 40% of the anomaly job's configured model memory limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.

related native change: https://github.com/elastic/ml-cpp/pull/1238

closes: https://github.com/elastic/elasticsearch/issues/56420
2020-05-29 11:16:08 -04:00
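A sketch of a forecast request that sets max_model_memory, staying under the hard limit noted above; the cluster URL, job id, and duration are illustrative:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Forecast three days ahead with the forecast model capped at 150 MB.
resp = requests.post(
    f"{ES}/_ml/anomaly_detectors/my-job/_forecast",   # job id is illustrative
    json={"duration": "3d", "max_model_memory": "150mb"},
)
print(resp.json())  # expected to include a forecast_id on success
```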
Benjamin Trent
297f864884
[ML] relax throttling on expired data cleanup (#56711) (#56895)
Throttling nightly cleanup as much as we do has been overly cautious.

Nightly cleanup should be more lenient in its throttling. We still
keep the same batch size, but now the requests per second scale
with the number of data nodes. If we have more than 5 data nodes,
we don't throttle at all.

Additionally, the API now has `requests_per_second` and `timeout` set.
So users calling the API directly can set the throttling.

This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`.
This will allow users to adjust throttling of the nightly maintenance.
2020-05-18 08:46:42 -04:00
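A sketch of calling the expired-data cleanup directly with caller-controlled throttling; the endpoint path and parameter values are assumptions based on the commit above:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Clean up expired ML data at roughly 100 requests per second.
resp = requests.delete(
    f"{ES}/_ml/_delete_expired_data",                 # path assumed
    json={"requests_per_second": 100.0, "timeout": "1h"},
)
print(resp.json())
```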
Dimitris Athanasiou
75dadb7a6d
[7.x][ML] Add loss_function to regression (#56118) (#56187)
Adds parameters `loss_function` and `loss_function_parameter`
to regression.

Backport of #56118
2020-05-05 14:59:51 +03:00
David Roberts
da5aeb8be7
[ML] Return assigned node in start/open job/datafeed response (#55570)
Adds a "node" field to the response from the following endpoints:

1. Open anomaly detection job
2. Start datafeed
3. Start data frame analytics job

If the job or datafeed is assigned to a node immediately then
this field will return the ID of that node.

In the case where a job or datafeed is opened or started lazily
the node field will contain an empty string.  Clients that want
to test whether a job or datafeed was opened or started lazily
can therefore check for this.

Backport of #55473
2020-04-22 12:06:53 +01:00
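A sketch of checking the new node field after opening a job, assuming a local cluster and an illustrative job id:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Open the job and see whether it was assigned immediately or lazily.
resp = requests.post(f"{ES}/_ml/anomaly_detectors/my-job/_open")
node = resp.json().get("node", "")
if node:
    print(f"job opened on node {node}")
else:
    print("job was opened lazily; no node assigned yet")
```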
Benjamin Trent
4a1610265f
[7.x] [ML] add new inference_config field to trained model config (#54421) (#54647)
* [ML] add new inference_config field to trained model config (#54421)

A new field called `inference_config` is now added to the trained model config object. This new field allows for default inference settings from analytics or some external model builder.

The inference processor can still override whatever is set as the default in the trained model config.

* fixing for backport
2020-04-02 12:25:10 -04:00
David Roberts
7667004b20
[ML] Add a model memory estimation endpoint for anomaly detection (#54129)
A new endpoint for estimating anomaly detection job
model memory requirements:

POST _ml/anomaly_detectors/estimate_model_memory

Backport of #53507
2020-03-24 22:55:11 +00:00
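A sketch of the estimation call, assuming a local cluster; the detector, field names, and cardinality values are illustrative, and the request body shape is an assumption:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Estimate model memory for a proposed analysis config before creating the job.
body = {
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [
            {"function": "mean", "field_name": "bytes", "by_field_name": "host"}
        ],
    },
    "overall_cardinality": {"host": 5000},
}
resp = requests.post(f"{ES}/_ml/anomaly_detectors/_estimate_model_memory", json=body)
print(resp.json())  # expected to contain a model_memory_estimate value
```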
Tom Veasey
690099553c
[7.x][ML] Adds the class_assignment_objective parameter to classification (#53552)
Adds a new parameter for classification that enables choosing whether to assign labels to
maximise accuracy or to maximise the minimum class recall.

Fixes #52427.
2020-03-13 17:35:51 +00:00
Benjamin Trent
2a5c181dda
[ML][Inference] don't return inflated definition when storing trained models (#52573) (#52580)
When `PUT` is called to store a trained model, it is useful to return the newly created model config. But it is NOT useful to return the inflated definition.

These definitions can be large, and returning the inflated definition causes undue work on the server and client side.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-02-20 19:47:29 -05:00
Benjamin Trent
76660a5a4f
[7.x] [ML][Inference] add tags url param to GET (#51330) (#51404)
* [ML][Inference] add tags url param to GET (#51330)

Adds a new URL parameter, `tags` to the GET _ml/inference/<model_id> endpoint.

This parameter allows the list of models to be further reduced to those that contain all the provided tags.
2020-01-24 08:26:58 -05:00
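A sketch of filtering models by tag, assuming a local cluster; the tags and response field name are illustrative/assumed, and the endpoint is the pre-rename _ml/inference path:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# List only the trained models that carry both tags.
resp = requests.get(f"{ES}/_ml/inference", params={"tags": "prod,regression"})
for cfg in resp.json().get("trained_model_configs", []):   # response shape assumed
    print(cfg["model_id"], cfg.get("tags"))
```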
Dimitris Athanasiou
1d8cb3c741
[7.x][ML] Add num_top_feature_importance_values param to regression and classi… (#50914) (#50976)
Adds a new parameter to regression and classification that enables computation
of importance for the top most important features. The computation of the importance
is based on the SHAP (SHapley Additive exPlanations) method.

Backport of #50914
2020-01-14 16:46:09 +02:00
Benjamin Trent
fa116a6d26
[7.x] [ML][Inference] PUT API (#50852) (#50887)
* [ML][Inference] PUT API (#50852)

This adds the `PUT` API for creating trained models that support our format.

This includes

* HLRC change for the API
* API creation
* Validations of model format and call

* fixing backport
2020-01-12 10:59:11 -05:00
Dimitris Athanasiou
ca0828ba07
[7.x][ML] Implement force deleting a data frame analytics job (#50553) (#50589)
Adds a `force` parameter to the delete data frame analytics
request. When `force` is `true`, the action force-stops the
jobs and then proceeds to the deletion. This can be used in
order to delete a non-stopped job with a single request.

Closes #48124

Backport of #50553
2020-01-03 13:46:02 +02:00
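A sketch of the force delete described above, assuming a local cluster and an illustrative job id:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Delete a possibly still-running analytics job in a single request.
resp = requests.delete(
    f"{ES}/_ml/data_frame/analytics/my-analysis",
    params={"force": "true"},
)
print(resp.json())  # expected: {"acknowledged": true}
```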
Przemysław Witek
cc4bc797f9
[7.x] Implement precision and recall metrics for classification evaluation (#49671) (#50378) 2019-12-19 18:55:05 +01:00
Dimitris Athanasiou
8891f4db88
[7.x][ML] Introduce randomize_seed setting for regression and classification (#49990) (#50023)
This adds a new `randomize_seed` for regression and classification.
When not explicitly set, the seed is randomly generated. One can
reuse the seed in a similar job in order to ensure the same docs
are picked for training.

Backport of #49990
2019-12-10 15:29:19 +02:00
Dimitris Athanasiou
4edb2e7bb6
[7.x][ML] Add optional source filtering during data frame reindexing (#49690) (#49718)
This adds a `_source` setting under the `source` setting of a data
frame analytics config. The new `_source` is reusing the structure
of a `FetchSourceContext` like `analyzed_fields` does. Specifying
includes and excludes for source allows selecting which fields
will get reindexed and will be available in the destination index.

Closes #49531

Backport of #49690
2019-11-29 16:10:44 +02:00
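A sketch of the _source filtering in an analytics config, with the cluster URL, index, and field names as illustrative assumptions:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Only the included fields (minus the excluded ones) are reindexed into dest.
config = {
    "source": {
        "index": "my-source-index",
        "_source": {
            "includes": ["features.*", "label"],
            "excludes": ["features.raw_text"],
        },
    },
    "dest": {"index": "my-dest-index"},
    "analysis": {"classification": {"dependent_variable": "label"}},
}
requests.put(f"{ES}/_ml/data_frame/analytics/filtered-source-job", json=config)
```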
Benjamin Trent
b5d7c939f8
[7.x] [ML][Inference][HLRC] add GET _stats (#49562) (#49600)
* [ML][Inference][HLRC] add GET _stats (#49562)

* fixing for backport
2019-11-26 11:28:26 -05:00
Benjamin Trent
26a8ca00db
[7.x] [ML][Inference][HLRC] Delete trained model API (#49567) (#49585)
* [ML][Inference][HLRC] Delete trained model API (#49567)

* fixing for backport
2019-11-26 08:27:08 -05:00
Dimitris Athanasiou
8eaee7cbdc
[7.x][ML] Explain data frame analytics API (#49455) (#49504)
This commit replaces the _estimate_memory_usage API with
a new API, the _explain API.

The API consolidates information that is useful before
creating a data frame analytics job.

It includes:

- memory estimation
- field selection explanation

Memory estimation is moved here from what was previously
calculated in the _estimate_memory_usage API.

Field selection is a new feature that explains to the user
whether each available field was selected to be included in
the analysis. If a field was not included, it also explains
the reason why.

Backport of #49455
2019-11-22 22:06:10 +02:00
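A sketch of calling _explain with a config that has not been created yet, assuming a local cluster; the index name is illustrative and the response field names are assumptions:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Get the memory estimate and field-selection explanation up front.
body = {
    "source": {"index": "my-source-index"},
    "analysis": {"outlier_detection": {}},
}
resp = requests.post(f"{ES}/_ml/data_frame/analytics/_explain", json=body)
explanation = resp.json()
print(explanation.get("memory_estimation"))
print(explanation.get("field_selection", [])[:3])
```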