elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-29 01:44:36 -04:00

Author	SHA1	Message	Date
David Kyle	94adaa55c0	[ML] Merge the pytorch-inference feature branch (#73660 ) The feature branch contains changes to configure PyTorch models with a TrainedModelConfig and defines a format to store the binary models. The _start and _stop deployment actions control the model lifecycle and the model can be directly evaluated with the _infer endpoint. Two types of NLP tasks are supported: Named Entity Recognition and Fill Mask. The feature branch consists of these PRs: #73523, #72218, #71679, #71323, #71035, #71177, #70713	2021-06-03 12:43:06 +01:00
David Roberts	0059c59e25	[ML] Make ml_standard tokenizer the default for new categorization jobs (#72805 ) Categorization jobs created once the entire cluster is upgraded to version 7.14 or higher will default to using the new ml_standard tokenizer rather than the previous default of the ml_classic tokenizer, and will incorporate the new first_non_blank_line char filter so that categorization is based purely on the first non-blank line of each message. The difference between the ml_classic and ml_standard tokenizers is that ml_classic splits on slashes and colons, so creates multiple tokens from URLs and filesystem paths, whereas ml_standard attempts to keep URLs, email addresses and filesystem paths as single tokens. It is still possible to configure the ml_classic tokenizer if you prefer: just provide a categorization_analyzer within your analysis_config and whichever tokenizer you choose (which could be ml_classic or any other Elasticsearch tokenizer) will be used. To opt out of using first_non_blank_line as a default char filter, you must explicitly specify a categorization_analyzer that does not include it. If no categorization_analyzer is specified but categorization_filters are specified then the categorization filters are converted to char filters that are applied after first_non_blank_line. Closes elastic/ml-cpp#1724	2021-06-01 15:11:32 +01:00
Lisa Cawley	52c88a763e	[DOCS] Add runtime_mappings to update datafeed API in HLRC (#71772 ) Co-authored-by: David Kyle <david.kyle@elastic.co>	2021-04-22 08:22:13 -07:00
István Zoltán Szabó	9a8c6fb66f	[DOCS] Removes beta labels from DFA related docs. (#70808 )	2021-03-26 09:46:41 +01:00
James Rodewig	5c75d004fa	[DOCS] Replace `put` with `create or update` in API names (#70330 ) Co-authored-by: debadair <debadair@elastic.co> Co-authored-by: Lisa Cawley <lcawley@elastic.co> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2021-03-15 14:49:44 -04:00
Benjamin Trent	2ee6dc37b6	[ML][HLRC] adds put and delete trained model alias APIs to rest high-level client (#69214 ) adds put (and reassign) and delete trained model alias APIs to the rest high-level client. This adds some serialization objects and request wrappers.	2021-02-19 14:18:26 -05:00
Dimitris Athanasiou	7fb98c0d3c	[ML] Add runtime mappings to data frame analytics source config (#69183 ) Users can now specify runtime mappings as part of the source config of a data frame analytics job. Those runtime mappings become part of the mapping of the destination index. This ensures the fields are accessible in the destination index even if the relevant data frame analytics job gets deleted. Closes #65056	2021-02-19 16:29:19 +02:00
Valeriy Khakhutskyy	78368428b3	[ML] Add early stopping DFA configuration parameter (#68099 ) The PR adds early_stopping_enabled optional data frame analysis configuration parameter. The enhancement was already described in elastic/ml-cpp#1676 and so I mark it here as non-issue.	2021-02-01 11:41:28 +01:00
Dimitris Athanasiou	5c961c1c81	[ML] Expand regression/classification hyperparameters (#67950 ) Expands data frame analytics regression and classification analyses with the following hyperparameters: - alpha - downsample_factor - eta_growth_rate_per_tree - max_optimization_rounds_per_hyperparameter - soft_tree_depth_limit - soft_tree_depth_tolerance	2021-01-26 12:56:41 +02:00
Benjamin Trent	5cf569ffff	[ML] move find file structure finder in Rest high Level client to its new endpoint and plugin (#67290 ) Find file structure finder is now its own plugin, and separated from the ml plugin. This commit updates the rest high level client to reflect this. Additionally, this adjusts the internal and client object names from `FileStructure` to the more general `TextStructure`	2021-01-14 08:16:52 -05:00
David Kyle	22dadfd407	[ML] Docs and HLRC for datafeed runtime mappings (#65810 ) For the changes in #65606	2020-12-08 10:06:58 +00:00
Benjamin Trent	33de89d94c	[ML] add new snapshot upgrader API for upgrading older snapshots (#64665 ) This new API provides a way for users to upgrade their own anomaly job model snapshots. To upgrade a snapshot the following is done: - Open a native process given the job id and the desired snapshot id - load the snapshot to the process - write the snapshot again from the native task (now updated via the native process) relates #64154	2020-11-12 10:45:56 -05:00
István Zoltán Szabó	6093518f4a	[DOCS] Changes experimental flag to beta in DFA related docs (#63992 )	2020-10-26 17:02:46 +01:00
Przemysław Witek	d9e7d88f08	[ML] Allow setting num_top_classes to a special value -1 (#63587 )	2020-10-13 13:14:17 +02:00
Przemysław Witek	b0019bd0a6	[ML] Validate that AucRoc has the data necessary to be calculated (#63302 )	2020-10-08 08:19:43 +02:00
Lisa Cawley	49ab8f8688	[DOCS] Add feature_importance_baseline to get trained model API (#63279 ) Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2020-10-06 07:56:55 -07:00
Lisa Cawley	51f9bf657d	[DOCS] Fix titles for ML APIs (#63152 )	2020-10-02 11:53:49 -07:00
Lisa Cawley	b325772498	[DOCS] Add experimental tag to data frame analytics APIs (#63153 )	2020-10-02 09:42:57 -07:00
Benjamin Trent	7bd6e78dae	[ML] adding for_export flag for ml plugin GET resource APIs (#63092 ) This adds the new `for_export` flag to the following APIs: - GET _ml/anomaly_detection/<job_id> - GET _ml/datafeeds/<datafeed_id> - GET _ml/data_frame/analytics/<analytics_id> The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster. The following fields are not returned in the objects: - any field that is not user settable (e.g. version, create_time) - any field that is a calculated default value (e.g. datafeed chunking_config) - any field that would effectively require changing to be of use (e.g. datafeed job_id) - any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by) closes https://github.com/elastic/elasticsearch/issues/63055	2020-10-02 08:29:19 -04:00
Benjamin Trent	1084aaf18a	[ML] renames /inference apis to /trained_models (#63097 ) This commit renames all `inference` CRUD APIs to `trained_models`. This aligns with internal terminology, documentation, and use-cases.	2020-10-01 12:13:49 -04:00
Przemysław Witek	a9e54a2d9e	[ML] Implement AucRoc metric for classification - HLRC (#62304 )	2020-09-30 10:53:45 +02:00
Benjamin Trent	fdb7b6d3b5	[ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922 ) Adds a new `include` flag to the get trained models API. The flag initially has two valid values: definition, total_feature_importance. Consequently, the old include_model_definition flag is now deprecated. When total_feature_importance is included, the total_feature_importance field is included in the model metadata object. Including definition is the same as previously setting include_model_definition=true.	2020-09-18 07:11:38 -04:00
Lisa Cawley	8290b6216e	[DOCS] Fix capitalization in HLRC ML APIs (#62010 )	2020-09-04 13:40:02 -07:00
Benjamin Trent	1b34c88d56	[ML] adding docs + hlrc for data frame analysis feature_processors (#61149 ) Adds HLRC and some docs for the new feature_processors field in Data frame analytics. Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co> Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-08-24 12:00:44 -04:00
James Rodewig	a94e5cb7c4	[DOCS] Replace Wikipedia links with attribute (#61171 )	2020-08-17 09:44:24 -04:00
Przemysław Witek	2a12dcf2e0	Rename binary_soft_classification evaluation to outlier_detection (#59951 )	2020-07-21 14:27:57 +02:00
Dimitris Athanasiou	da0249f6c2	[ML] Data frame analytics max_num_threads setting (#59254 ) This adds a setting to data frame analytics jobs called `max_num_threads`. The setting expects a positive integer. When used, the user specifies the max number of threads that may be used by the analysis. Note that the actual number of threads used is limited by the number of processors on the node where the job is assigned. Also, the process may use a couple more threads for operational functionality that is not the analysis itself. This setting may also be updated for a stopped job. More threads may reduce the time it takes to complete the job at the cost of using more CPU.	2020-07-09 16:31:26 +03:00
Przemysław Witek	38aa474dec	Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734 )	2020-07-01 13:29:56 +02:00
Przemysław Witek	dfa06240fc	Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684 )	2020-06-30 13:06:15 +02:00
Przemysław Witek	3953de4c98	Introduce DataFrameAnalyticsConfig update API (#58302 )	2020-06-29 09:26:31 +02:00
David Kyle	bc1883b582	HLRC for delete expired data by job Id (#57722 ) High level rest client changes for #57337	2020-06-11 10:07:31 +01:00
Dimitris Athanasiou	e116ac850f	[ML] Fix race condition when force stopping DF analytics job (#57680 ) When we force delete a DF analytics job, we currently first force stop it and then we proceed with deleting the job config. This may result in logging errors if the job config is deleted before it is retrieved while the job is starting. Instead of force stopping the job, it would make more sense to try to stop the job gracefully first. So we now try that out first. If normal stop fails, then we resort to force stopping the job to ensure we can go through with the delete. In addition, this commit introduces `timeout` for the delete action and makes use of it in the child requests.	2020-06-05 12:13:02 +03:00
Benjamin Trent	251b17009a	[ML] adds new for_export flag to GET _ml/inference API (#57351 ) Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API. This flag is useful for moving models between clusters.	2020-05-29 12:29:28 -04:00
Benjamin Trent	ec67787a2e	[ML] add max_model_memory parameter to forecast request (#57254 ) This adds a max_model_memory setting to forecast requests. This setting can take a string value that is formatted according to byte sizes (e.g. "50mb", "150mb"). The default value is `20mb`. There is a HARD limit at `500mb` which will throw an error if used. If the limit is larger than 40% of the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited. related native change: https://github.com/elastic/ml-cpp/pull/1238 closes: https://github.com/elastic/elasticsearch/issues/56420	2020-05-29 08:59:50 -04:00
Benjamin Trent	8fed077b0a	[ML] relax throttling on expired data cleanup (#56711 ) Throttling nightly cleanup as much as we do has been overcautious. Nightly cleanup should be more lenient in its throttling. We still keep the same batch size, but now the requests per second scale with the number of data nodes. If we have more than 5 data nodes, we don't throttle at all. Additionally, the API now has `requests_per_second` and `timeout` set. So users calling the API directly can set the throttling. This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`. This will allow users to adjust throttling of the nightly maintenance.	2020-05-18 07:21:06 -04:00
Dimitris Athanasiou	6bf3834059	[ML] Add loss_function to regression (#56118 ) Adds parameters `loss_function` and `loss_function_parameter` to regression.	2020-05-05 12:36:05 +03:00
David Roberts	8906e76079	[ML] Return assigned node in start/open job/datafeed response (#55473 ) Adds a "node" field to the response from the following endpoints: 1. Open anomaly detection job 2. Start datafeed 3. Start data frame analytics job If the job or datafeed is assigned to a node immediately then this field will return the ID of that node. In the case where a job or datafeed is opened or started lazily the node field will contain an empty string. Clients that want to test whether a job or datafeed was opened or started lazily can therefore check for this. Fixes #54067	2020-04-22 08:44:57 +01:00
Benjamin Trent	4e1ff31c3c	[ML] add new inference_config field to trained model config (#54421 ) A new field called `inference_config` is now added to the trained model config object. This new field allows for default inference settings from analytics or some external model builder. The inference processor can still override whatever is set as the default in the trained model config.	2020-04-02 10:34:17 -04:00
David Roberts	8ee770560a	[ML] Add a model memory estimation endpoint for anomaly detection (#53507 ) A new endpoint for estimating anomaly detection job model memory requirements: POST _ml/anomaly_detectors/estimate_model_memory Closes #53219	2020-03-24 21:38:19 +00:00
Tom Veasey	58340c2dbe	[ML] Adds the class_assignment_objective parameter to classification (#52763 ) Adds a new parameter for classification that enables choosing whether to assign labels to maximise accuracy or to maximise the minimum class recall. Fixes #52427.	2020-03-12 18:39:29 +00:00
Benjamin Trent	1c1d45130c	[ML][Inference] don't return inflated definition when storing trained models (#52573 ) When `PUT` is called to store a trained model, it is useful to return the newly created model config. But, it is NOT useful to return the inflated definition. These definitions can be large and returning the inflated definition causes undue work on the server and client side.	2020-02-20 11:25:34 -05:00
Benjamin Trent	c9e285c1e6	[ML][Inference] add tags url param to GET (#51330 ) Adds a new URL parameter, `tags`, to the GET _ml/inference/<model_id> endpoint. This parameter allows the list of models to be further reduced to those that contain all the provided tags.	2020-01-24 07:30:56 -05:00
Dimitris Athanasiou	4d2be9bd32	[ML] Add num_top_feature_importance_values param to regression and classi… (#50914 ) Adds a new parameter to regression and classification that enables computation of importance for the top most important features. The computation of the importance is based on the SHAP (SHapley Additive exPlanations) method.	2020-01-14 15:01:47 +02:00
Benjamin Trent	4cecb7a5be	[ML][Inference] PUT API (#50852 ) This adds the `PUT` API for creating trained models that support our format. This includes * HLRC change for the API * API creation * Validations of model format and call	2020-01-11 16:02:56 -05:00
Dimitris Athanasiou	af0ce426cc	[ML] Implement force deleting a data frame analytics job (#50553 ) Adds a `force` parameter to the delete data frame analytics request. When `force` is `true`, the action force-stops the job and then proceeds to the deletion. This can be used in order to delete a non-stopped job with a single request. Closes #48124	2020-01-03 12:01:41 +02:00
Przemysław Witek	786ead630a	Implement `precision` and `recall` metrics for classification evaluation (#49671 )	2019-12-19 16:07:09 +01:00
Dimitris Athanasiou	269425b54d	[ML] Introduce randomize_seed setting for regression and classification (#49990 ) This adds a new `randomize_seed` for regression and classification. When not explicitly set, the seed is randomly generated. One can reuse the seed in a similar job in order to ensure the same docs are picked for training.	2019-12-10 10:22:53 +02:00
Dimitris Athanasiou	bad07b76f7	[ML] Add optional source filtering during data frame reindexing (#49690 ) This adds a `_source` setting under the `source` setting of a data frame analytics config. The new `_source` is reusing the structure of a `FetchSourceContext` like `analyzed_fields` does. Specifying includes and excludes for source allows selecting which fields will get reindexed and will be available in the destination index. Closes #49531	2019-11-29 14:20:31 +02:00
Benjamin Trent	ba914453be	[ML][Inference][HLRC] add GET _stats (#49562 )	2019-11-26 09:26:31 -05:00
Benjamin Trent	fc7df300a2	[ML][Inference][HLRC] Delete trained model API (#49567 )	2019-11-26 07:13:02 -05:00

121 commits