elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-30 18:33:26 -04:00

Author	SHA1	Message	Date
David Roberts	6a20678517	[ML] Correct the update datafeed docs (#92227 ) (#92229 ) These docs previously implied that you could update datafeed properties while the datafeed was running, but then would have to stop and restart it for the changes to take effect. In fact datafeed updates can only be made while the datafeed is stopped (and this has been the case for many years, if not forever).	2022-12-08 05:13:24 -05:00
Lisa Cawley	2fb3c46ad4	[DOCS] Add missing HTML anchors to CCR and ML (#80287 ) (#83182 ) Co-authored-by: Ugo Sangiorgi <ugo.sangiorgi@elastic.co>	2022-01-26 11:39:38 -08:00
David Roberts	0d403365bd	[7.16] [ML] Model snapshot upgrade needs a stats endpoint (#81706 ) * [7.16] [ML] Model snapshot upgrade needs a stats endpoint Previously the ML model snapshot upgrade endpoint did not provide a way to reliably monitor progress. This could lead to the upgrade assistant UI thinking that a model snapshot upgrade had finished when it actually hadn't. This change adds a new "stats" API that allows external interested parties to find out the status of each model snapshot upgrade and which node (if any) each is running on. Backport of #81641 * Fixing compilation	2021-12-14 04:50:07 -05:00
Lisa Cawley	376c499f95	[DOCS] Fixes query parameters for get buckets API (#80643 ) (#80917 )	2021-11-22 12:18:10 -08:00
Lisa Cawley	1c855f5ed4	[DOCS] Adds missing query parameters to ML APIs (#80863 ) (#80910 )	2021-11-22 10:09:50 -08:00
Lisa Cawley	ad2b02bcee	[DOCS] Adds missing query parameters in get influencer and get snapshot APIs (#80801 ) (#80844 )	2021-11-18 08:52:42 -08:00
Lisa Cawley	061b6fd0e6	[DOCS] Add query parameters to update datafeed API (#80777 ) (#80798 )	2021-11-17 08:12:15 -08:00
Lisa Cawley	0a97f7440e	[DOCS] Clarify parameters in delete expired data, forecast, and flush job APIs (#80517 ) (#80569 )	2021-11-09 15:20:43 -08:00
Lisa Cawley	416d5bb8e6	[DOCS] Edits stop and start datafeed APIs (#80461 ) (#80567 )	2021-11-09 15:08:52 -08:00
James Rodewig	07ac8818b6	[DOCS] Remove `testenv` annotations from doc snippet tests (#80023 ) (#80458 ) Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible. Relates to #79309, #31619 # Conflicts: # docs/reference/ml/df-analytics/apis/get-trained-model-deployment-stats.asciidoc # docs/reference/ml/df-analytics/apis/infer-trained-model-deployment.asciidoc # docs/reference/ml/df-analytics/apis/put-trained-model-definition-part.asciidoc # docs/reference/ml/df-analytics/apis/put-trained-model-vocabulary.asciidoc # docs/reference/ml/df-analytics/apis/start-trained-model-deployment.asciidoc # docs/reference/ml/df-analytics/apis/stop-trained-model-deployment.asciidoc # docs/reference/slm/apis/slm-delete.asciidoc # docs/reference/slm/apis/slm-execute-retention.asciidoc # docs/reference/slm/apis/slm-execute.asciidoc # docs/reference/slm/apis/slm-get-status.asciidoc # docs/reference/slm/apis/slm-get.asciidoc # docs/reference/slm/apis/slm-start.asciidoc # docs/reference/slm/apis/slm-stats.asciidoc # docs/reference/slm/apis/slm-stop.asciidoc # docs/reference/sql/endpoints/client-apps/tableau-desktop.asciidoc # docs/reference/sql/endpoints/client-apps/tableau-server.asciidoc	2021-11-05 19:41:54 -04:00
István Zoltán Szabó	f124a986a3	[7.16] [DOCS] Adds missing query params to GET category and GET influencer APIs (#79448 ) (#80430 )	2021-11-05 17:27:31 +01:00
Lisa Cawley	8b7f8ee5e2	[DOCS] Adds deprecated allow_no_jobs and allow_no_datafeeds ML API properties (#80163 )	2021-11-02 08:30:24 -07:00
Lisa Cawley	01c557d639	[DOCS] Fixes typo in preview datafeed API (#79863 ) (#79879 )	2021-10-26 18:24:15 -07:00
Lisa Cawley	7beedaf7e1	[DOCS] Fixes typo in calendar API example (#78867 ) (#78868 )	2021-10-07 18:04:46 -07:00
Lisa Cawley	6695c7ceca	[DOCS] Fixes ML get calendars API (#78808 ) (#78854 )	2021-10-07 14:17:00 -07:00
Lisa Cawley	2dc4ee3413	[DOCS] Fixes ML get scheduled events API (#78809 ) (#78843 )	2021-10-07 10:39:03 -07:00
Benjamin Trent	d3b68b32dc	[ML] add new default char filter `first_line_with_letters` for machine learning categorization (#77457 ) (#77503 ) The char filter replaces the previous default of `first_non_blank_line`. `first_non_blank_line` worked well to figure out what line had characters at all, but log lines like the following were handled poorly: ``` -------------------------------------------------------------------------------- Alias 'foo' already exists and this prevents setting up ILM for logs -------------------------------------------------------------------------------- ``` When combined with the `ml_standard` tokenizer, the first line was used: ``` -------------------------------------------------------------------------------- ``` This has no valid tokens for our standard tokenizer. Consequently, no tokens were found by `ml_standard` tokenizer. The new filter, `first_line_with_letters`, returns the first line with any letter character (e.g. `Character#isLetter` returns true). Given the previously poorly handled log, when combining with our `ml_standard` tokenizer, we get the following, more appropriate, tokens: ``` "tokens" : ["Alias", "foo", "already", "exists", "and", "this", "prevents", "setting", "up", "ILM", "for", "logs"] ```	2021-09-09 11:36:55 -04:00
Lisa Cawley	40f72fd75c	[DOCS] Update datafeed details in ML docs (#76854 ) (#76948 )	2021-08-25 15:15:40 -07:00
David Roberts	c70ba3c768	[ML] Use results retention time for deleting system annotations (#76113 ) * [ML] Use results retention time for deleting system annotations In #75617 a new setting, system_annotations_retention_days, was added to control how long system annotations are retained for. We now feel that this setting is redundant and that system annotations should be retained for the same period as results. This is intuitive and defensible, as system annotations can be considered a type of result. Backport of #76096 * Fix one more merge clash	2021-08-04 13:53:06 -04:00
David Roberts	17581d1232	[ML] Deleting a job now deletes the datafeed if necessary (#76064 ) Previously attempting to delete a job that had a datafeed would return an exception. However, this was unnecessarily pedantic - the user would always want to delete both job and datafeed together, and would react by deleting the datafeed and then subsequently deleting the job again. This change makes the delete job API automatically delete a datafeed associated with the job. The same level of force is used for this delete datafeed request as was used on the delete job request. This means that it's possible to force-delete an open job with a started datafeed (since force-delete datafeed will automatically stop a started datafeed). It's still not possible to delete an opened job without using force. Backport of #76010	2021-08-04 05:14:54 -04:00
Ed Savage	582b634117	[7.x][ML] Add 'model_prune_window' field to AD job config (#75741 ) (#75999 ) Add configuration for pruning dead split fields in anomaly detection jobs via the `model_prune_window` field for both the job creation and update APIs. Relates to ml-cpp/#1962 Backports #75741	2021-08-03 11:57:36 +01:00
Przemysław Witek	de732a4432	[7.x] [ML] Delete expired annotations (#75617 ) (#75841 )	2021-07-29 17:03:35 +02:00
Lisa Cawley	6d821421bb	[DOCS] Fixes nesting of datafeed config in APIs (#75502 ) (#75545 )	2021-07-20 12:02:44 -07:00
István Zoltán Szabó	c9299e1f65	[DOCS] Adds peak_model_bytes and assignment_memory_basis to GET model snapshot API docs (#75413 ) (#75425 )	2021-07-18 07:43:18 +02:00
Benjamin Trent	d251874910	[7.x] [ML] Add datafeed_config field to anomaly detection job configs (#75262 ) This is a quality of life improvement for typical users. Almost all anomaly jobs will receive their data through a datafeed. The datafeed config can now be supplied and is available in the datafeed field in the job config for creation and getting jobs.	2021-07-12 14:57:38 -04:00
Lisa Cawley	e99d91df2d	[DOCS] Add memory limit details in update job API (#74517 ) (#74570 ) Co-authored-by: David Roberts <dave.roberts@elastic.co>	2021-06-24 09:08:09 -07:00
David Roberts	59c55d1c63	[ML] Closing an anomaly detection job now automatically stops its datafeed if necessary (#74416 ) Previously it was a requirement of the close job API that if the job had an associated datafeed that that datafeed was stopped before the job could be closed. Experience has shown that this is just a pedantic nuisance. If a user closes the job without first stopping the datafeed then it's just a mistake, and they then have to make two further calls, to stop the datafeed and then attempt to close the job again. This PR changes the behaviour so that if you ask to close a job whose datafeed is running then the datafeed gets stopped first as part of the same call. Datafeeds are stopped with the same level of force as the job close request specified. Backport of #74257	2021-06-22 17:08:36 +01:00
Dimitris Athanasiou	92f7c6250a	[7.x][ML] Reset anomaly detection job API (#73908 ) (#74093 ) Adds a new API that allows a user to reset an anomaly detection job. To use the API do: ``` POST _ml/anomaly_detectors/<job_id>/_reset ``` The API removes all data associated with the job. In particular, it deletes model state, results and stats. However, job notifications and user annotations are not removed. Also, the API can be called asynchronously by setting the parameter `wait_for_completion` to `false` (defaults to `true`). When run that way the API returns the task id for further monitoring. In order to prevent the job from opening while it is resetting, a new job field has been added called `blocked`. It is an object that contains a `reason` and the `task_id`. `reason` can take a value from ["delete", "reset", "revert"] as all these operations should block the job from opening. The `task_id` is also included in order to allow tracking the task if necessary. Finally, this commit also sets the `blocked` field when the revert snapshot API is called as a job should not be opened while it is reverted to a different model snapshot. Backport of #73908	2021-06-15 10:05:40 +03:00
Benjamin Trent	43cd27d339	[ML] adding running_state to datafeed stats object (#73926 ) (#74002 ) It is useful to know the following information when reading datafeed stats: - Is the datafeed a "real-time" datafeed, i.e. a datafeed without a configured `end` time - Has the datafeed processed all past data available at the time of starting. This object is only available if the datafeed task has been created. It has the form: ``` "running_state": { "is_real_time": <boolean>, "look_back_finished": <boolean> } ```	2021-06-10 11:35:27 -04:00
Lisa Cawley	59c37a1cda	[DOCS] Adds defaults to get ML results APIs (#73540 ) (#73735 )	2021-06-03 12:03:30 -07:00
David Roberts	8cf1fdcd05	[ML] Make ml_standard tokenizer the default for new categorization jobs (#73605 ) Categorization jobs created once the entire cluster is upgraded to version 7.14 or higher will default to using the new ml_standard tokenizer rather than the previous default of the ml_classic tokenizer, and will incorporate the new first_non_blank_line char filter so that categorization is based purely on the first non-blank line of each message. The difference between the ml_classic and ml_standard tokenizers is that ml_classic splits on slashes and colons, so creates multiple tokens from URLs and filesystem paths, whereas ml_standard attempts to keep URLs, email addresses and filesystem paths as single tokens. It is still possible to configure the ml_classic tokenizer if you prefer: just provide a categorization_analyzer within your analysis_config and whichever tokenizer you choose (which could be ml_classic or any other Elasticsearch tokenizer) will be used. To opt out of using first_non_blank_line as a default char filter, you must explicitly specify a categorization_analyzer that does not include it. If no categorization_analyzer is specified but categorization_filters are specified then the categorization filters are converted to char filters that are applied after first_non_blank_line. Backport of #72805	2021-06-02 07:04:16 +01:00
István Zoltán Szabó	ca98fbe744	[DOCS] Revises required privileges info in Anomaly Detection API docs (#72483 ) (#72608 )	2021-05-03 11:21:38 +02:00
Benjamin Trent	6ca6dd06f0	[ML] increase the default value of xpack.ml.max_open_jobs from 20 to 512 for autoscaling improvements (#72487 ) (#72549 ) This commit increases the default value of xpack.ml.max_open_jobs from 20 to 512. Additionally, it ignores nodes that cannot provide an accurate view into their native memory. If a node does not have a view into its native memory, we ignore it for assignment. This effectively fixes a bug with autoscaling. Autoscaling relies on jobs with adequate memory to assign jobs to nodes. If that is hampered by xpack.ml.max_open_jobs, scaling decisions are hampered as well.	2021-04-30 09:56:36 -04:00
Benjamin Trent	a41e0e2625	[ML] adding ability to update runtime_mappings via datafeed config update API (#71707 ) (#71748 ) Adds runtime_mappings as an updatable field via datafeed config update. closes: #71702	2021-04-15 11:05:52 -04:00
James Rodewig	c757f9e4e7	[DOCS] Fix double spaces (#71082 ) (#71120 )	2021-03-31 11:43:34 -04:00
Benjamin Trent	2545cc8ec4	[ML] Allow datafeed and job configs for datafeed preview API (#70836 ) (#70927 ) Previously, a datafeed and job must already exist for the `_preview` API to work. With this change, users can get an accurate preview of the data that will be sent to the anomaly detection job without creating either of them. closes https://github.com/elastic/elasticsearch/issues/70264	2021-03-26 13:58:52 -04:00
James Rodewig	302341a526	[DOCS] Replace `put` with `create or update` in API names (#70330 ) (#70421 ) Co-authored-by: debadair <debadair@elastic.co> Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2021-03-15 17:16:13 -04:00
Lisa Cawley	bf8024b9e7	[DOCS] Edits machine learning settings (#69947 ) (#70174 ) Co-authored-by: David Roberts <dave.roberts@elastic.co>	2021-03-09 11:35:29 -08:00
Dan Hermann	b311906009	Unmute more memory-related tests after the fix in #68542	2021-02-19 08:06:50 -06:00
James Rodewig	b55249507e	[DOCS] Fix typos for duplicate words (#69125 ) (#69132 )	2021-02-17 11:16:58 -05:00
Benjamin Trent	88a1bfa516	[ML] [DOCS] update find-structure reference docs (#67586 ) (#67592 ) The text structure finder API documentation had many references to "files". While this is one use of the API, the API now has a more generic name. This commit replaces many references to the word "file" with the more generic word "text".	2021-01-15 13:22:59 -05:00
Lisa Cawley	da5662495b	[DOCS] Move find file structure to a new API endpoint (#67314 ) (#67389 )	2021-01-12 12:41:39 -08:00
Benjamin Trent	65690eef67	[7.x] [ML] move find file structure to a new API endpoint (#67123 ) (#67251 ) * [ML] move find file structure to a new API endpoint (#67123) This introduces a new `text-structure` plugin. This is the new home of the find file structure API. The old REST URL is still available but is deprecated. The new URL is: `_text_structure/find_structure`. All parameters and behavior are unchanged. Changes to the high-level REST client and docs will be in a separate commit. related to: https://github.com/elastic/elasticsearch/issues/67001	2021-01-11 10:39:39 -05:00
Gordon Brown	09f823b1db	Mute tests failing on Debian 8 due to memory reporting (#66648 ) See https://github.com/elastic/elasticsearch/issues/66629 for details	2020-12-18 15:15:43 -07:00
Dimitris Athanasiou	97cfc8fdf8	[7.x][ML] Add log_time to AD data_counts and decide current based on it (#66343 ) (#66384 ) This commit is fixing a potential bug if we support anomaly detection results index rollover in the future. In particular, we determine the current `data_counts` by sorting on the latest record time. However, this is not correct if the job reverts to an older model snapshot. To fix this we add `log_time` to `data_counts` (similarly to `model_size_stats`) and sort on `log_time` to figure out the current counts for the job. Backport of #66343	2020-12-16 12:27:32 +02:00
David Roberts	8284b93dcb	[ML] Deprecate anomaly detection post data endpoint (#66398 ) There is little evidence of this endpoint being used and there is quite a lot of code complexity associated with the various formats that can be used to upload data and the different errors that can occur when direct data upload is open to end users. In a future release we can make this endpoint internal so that only datafeeds can use it, and remove all the options and formats that are not used by datafeeds. End users will have to store their input data for anomaly detection in Elasticsearch indices (which we believe all do today) and use a datafeed to feed it to anomaly detection jobs. Backport of #66347	2020-12-15 21:39:47 +00:00
István Zoltán Szabó	80ffeabbbc	[DOCS] Adds note about data_counts values to Revert snapshot API docs. (#66085 ) (#66089 )	2020-12-09 11:48:16 +01:00
István Zoltán Szabó	b1c11ebdbe	[DOCS] Adds empty snapshot_id description to revert snapshot API docs (#66036 ) (#66084 )	2020-12-09 10:23:42 +01:00
David Kyle	5fec2538ca	[ML] Docs and HRLC for datafeed runtime mappings (#65810 ) (#66007 ) For the changes in #65606	2020-12-08 11:04:21 +00:00
David Roberts	cbb3886bbb	[ML] Adding assignment_memory_basis to model_size_stats (#65875 ) At present the Java code makes a decision on whether to use current model memory or model memory limit to calculate how much memory a job requires to be assigned. The plan is to move this decision to the C++ code, which will report it via a new field in the model size stats. An additional change will be that once we have made the switch from using model memory limit to using current model memory we will never switch back, as this causes large fluctuations up and down in memory requirement which will be much more noticeable when autoscaling is in use. Although the only two options at present are model memory limit and current model memory, the new enum includes a third possibility, peak model memory. To switch to this now would be tricky, as there have been two bugs in the implementation of peak model memory which render its value unreliable in 7.x. However, in 8.x it might make sense to switch to using peak model memory instead of current model memory and it's much easier from a BWC perspective if the enum contains all the values from the start. Backport of #65561	2020-12-04 11:34:36 +00:00

142 commits