elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-28 17:34:17 -04:00

Author	SHA1	Message	Date
Lisa Cawley	64af39b759	[DOCS] Add memory limit details in update job API (#74517 ) Co-authored-by: David Roberts <dave.roberts@elastic.co>	2021-06-24 08:50:19 -07:00
Benjamin Trent	0303e6d733	[ML] add datafeed field to the job config (#74265 ) This is a quality of life improvement for typical users. Almost all anomaly jobs will receive their data through a datafeed. The datafeed config can now be supplied and is available in the datafeed field in the job config for creation and getting jobs.	2021-06-23 08:06:58 -04:00
David Roberts	6e9b959450	[ML] Closing an anomaly detection job now automatically stops its datafeed if necessary (#74257) Previously, the close job API required that any datafeed associated with the job be stopped before the job could be closed. Experience has shown that this is just a pedantic nuisance. If a user closes the job without first stopping the datafeed then it's just a mistake, and they then have to make two further calls: one to stop the datafeed and another to close the job again. This PR changes the behaviour so that if you ask to close a job whose datafeed is running then the datafeed gets stopped first as part of the same call. Datafeeds are stopped with the same level of force as the job close request specified.	2021-06-22 12:56:11 +01:00
István Zoltán Szabó	2e820fcab6	[DOCS] Clarifies terminology in Performing population analysis page. (#74237 )	2021-06-18 09:03:38 +02:00
ymao1	c727b40d0b	[Docs] Update cross-document links to Kibana Alerting docs (#74034 ) * Updating cross-document links * PR fixes	2021-06-14 12:23:47 -04:00
Dimitris Athanasiou	dc61a72c9e	[ML] Reset anomaly detection job API (#73908) Adds a new API that allows a user to reset an anomaly detection job. To use the API, call `POST _ml/anomaly_detectors/<job_id>/_reset`. The API removes all data associated with the job. In particular, it deletes model state, results and stats. However, job notifications and user annotations are not removed. The API can also be called asynchronously by setting the parameter `wait_for_completion` to `false` (defaults to `true`). When run that way, the API returns the task id for further monitoring. To prevent the job from opening while it is resetting, a new job field called `blocked` has been added. It is an object that contains a `reason` and the `task_id`. `reason` can take a value from ["delete", "reset", "revert"], as all these operations should block the job from opening. The `task_id` is included to allow tracking the task if necessary. Finally, this commit also sets the `blocked` field when the revert snapshot API is called, as a job should not be opened while it is being reverted to a different model snapshot.	2021-06-14 18:56:28 +03:00
Benjamin Trent	8d882863d7	[ML] adding running_state to datafeed stats object (#73926) It is useful to know the following information when reading datafeed stats: (1) whether the datafeed is a "real-time" datafeed, i.e. a datafeed without a configured `end` time, and (2) whether the datafeed has processed all past data available at the time of starting. This object is only available if the datafeed task has been created. It has the form: `"running_state": { "is_real_time": <boolean>, "look_back_finished": <boolean> }`	2021-06-10 08:08:49 -04:00
István Zoltán Szabó	20d0dc300f	[DOCS] Updates datafeed related runtime field examples (#73725 )	2021-06-08 11:27:55 +02:00
Lisa Cawley	a6339918ac	[DOCS] Adds defaults to get ML results APIs (#73540 ) Co-authored-by: David Roberts <dave.roberts@elastic.co>	2021-06-03 10:05:47 -07:00
István Zoltán Szabó	44c26c8bdc	[DOCS] Removes Kibana charts-related advice about agg interval and bucket span. (#73673)	2021-06-02 16:47:01 +02:00
David Roberts	0059c59e25	[ML] Make ml_standard tokenizer the default for new categorization jobs (#72805) Categorization jobs created once the entire cluster is upgraded to version 7.14 or higher will default to using the new ml_standard tokenizer rather than the previous default of the ml_classic tokenizer, and will incorporate the new first_non_blank_line char filter so that categorization is based purely on the first non-blank line of each message. The difference between the ml_classic and ml_standard tokenizers is that ml_classic splits on slashes and colons, so creates multiple tokens from URLs and filesystem paths, whereas ml_standard attempts to keep URLs, email addresses and filesystem paths as single tokens. It is still possible to configure the ml_classic tokenizer if you prefer: just provide a categorization_analyzer within your analysis_config and whichever tokenizer you choose (which could be ml_classic or any other Elasticsearch tokenizer) will be used. To opt out of using first_non_blank_line as a default char filter, you must explicitly specify a categorization_analyzer that does not include it. If no categorization_analyzer is specified but categorization_filters are specified then the categorization filters are converted to char filters that are applied after first_non_blank_line. Closes elastic/ml-cpp#1724	2021-06-01 15:11:32 +01:00
István Zoltán Szabó	1ce2308e2a	[DOCS] Adds max_trees hyperparameter to GET TM API docs (#72298 )	2021-05-06 08:18:19 +02:00
István Zoltán Szabó	d07c174aaf	[DOCS] Revises required privileges info in Anomaly Detection API docs (#72483 )	2021-05-03 10:20:14 +02:00
Benjamin Trent	2ce4d175f0	[ML] increase the default value of xpack.ml.max_open_jobs from 20 to 512 for autoscaling improvements (#72487) This commit increases the default value of xpack.ml.max_open_jobs from 20 to 512. Additionally, if a node does not have an accurate view into its native memory, it is now ignored for job assignment. This effectively fixes a bug with autoscaling. Autoscaling relies on jobs with adequate memory to assign jobs to nodes. If that is hampered by xpack.ml.max_open_jobs, scaling decisions are hampered too.	2021-04-30 07:55:57 -04:00
István Zoltán Szabó	ce9dd74cf5	[DOCS] Expands DFA and TM API docs with required privileges info (#71335 )	2021-04-28 08:33:42 +02:00
Pierre Grimaud	3c44dfec60	[DOCS] Fix typos (#72227 )	2021-04-26 12:40:38 -04:00
István Zoltán Szabó	2f122f03b2	[DOCS] Adds anomaly detection rule advanced settings to docs (#72072 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2021-04-26 09:55:02 +02:00
István Zoltán Szabó	aca0a7ffa4	[DOCS] Alters examples in anomaly detection page to use runtime mappings (#71745 )	2021-04-19 13:06:50 +02:00
Benjamin Trent	01fc8ed246	[ML] adding ability to update runtime_mappings via datafeed config update API (#71707 ) Adds runtime_mappings as an updatable field via datafeed config update. closes: #71702	2021-04-15 09:44:34 -04:00
István Zoltán Szabó	ce389dff5d	[DOCS] Clarifies that custom rules are job rules in Kibana (#71678 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2021-04-15 09:33:03 +02:00
James Rodewig	693807a6d3	[DOCS] Fix double spaces (#71082 )	2021-03-31 09:57:47 -04:00
Benjamin Trent	c8415a7924	[ML] adding support for composite aggs in anomaly detection (#69970) This commit allows for composite aggregations in datafeeds. Composite aggs provide a much better solution for having influencers, partitions, etc. on high volume data. Instead of worrying about long scrolls in the datafeed, the calculation is distributed across the cluster via the aggregations. The restrictions for this support are as follows: - The composite aggregation must have EXACTLY one `date_histogram` source - The sub-aggs of the composite aggregation must have a `max` aggregation on the SAME timefield as the aforementioned `date_histogram` source - The composite agg must be the ONLY top level agg and it cannot have a `composite` or `date_histogram` sub-agg - If using a `date_histogram` to bucket time, it cannot have a `composite` sub-agg. - The top-level `composite` agg cannot have a sibling pipeline agg. Pipeline aggregations are supported as a sub-agg (thus a pipeline agg INSIDE the bucket). Some key user interaction differences: - Speed + resources used by the cluster should be controlled by the `size` parameter in the `composite` aggregation. Previously, we said if you are using aggs, use a specific `chunking_config`. But, with composite, that is not necessary. - Users really shouldn't use nested `terms` aggs any longer. While this is still a "valid" configuration and MAY be desirable for some users (only wanting the top 10 of certain terms), typically when users want influencers, partition fields, etc. they want the ENTIRE population. Previously, this really wasn't possible with aggs; with `composite` it is. - I cannot really think of a typical use case that SHOULD ever use a multi-bucket aggregation that is NOT supported by composite.	2021-03-30 08:25:40 -04:00
István Zoltán Szabó	1db2b85e45	[DOCS] Adds source index privileges required for Explain DFA API docs. (#70978 )	2021-03-30 10:42:48 +02:00
Benjamin Trent	b796632582	[ML] Allow datafeed and job configs for datafeed preview API (#70836 ) Previously, a datafeed and job must already exist for the `_preview` API to work. With this change, users can get an accurate preview of the data that will be sent to the anomaly detection job without creating either of them. closes https://github.com/elastic/elasticsearch/issues/70264	2021-03-26 12:52:23 -04:00
István Zoltán Szabó	9a8c6fb66f	[DOCS] Removes beta labels from DFA related docs. (#70808 )	2021-03-26 09:46:41 +01:00
István Zoltán Szabó	165c0ddaeb	[DOCS] Updates anomaly detection alert docs with the new alerting terminology (#70486 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2021-03-18 18:23:19 +01:00
Benjamin Trent	10e637d97c	[ML] allow documents to be out of order within the same time bucket (#70468 ) This commit allows documents seen within the same time bucket to be out of order. This is already supported within the native process. Additionally, when recording the "latest" record timestamp, we were assuming that the latest seen document was truly the "latest". This is not really the case if latency is utilized or if documents come out of order within the same bucket.	2021-03-17 09:34:49 -04:00
James Rodewig	5c75d004fa	[DOCS] Replace `put` with `create or update` in API names (#70330 ) Co-authored-by: debadair <debadair@elastic.co> Co-authored-by: Lisa Cawley <lcawley@elastic.co> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2021-03-15 14:49:44 -04:00
István Zoltán Szabó	59f6280a7b	[DOCS] Changes deprecated syntax to node.role style in datafeed docs. (#70201 )	2021-03-10 15:46:01 +01:00
Lisa Cawley	2caba7b11f	[DOCS] Edits machine learning settings (#69947 ) Co-authored-by: David Roberts <dave.roberts@elastic.co>	2021-03-09 10:59:12 -08:00
István Zoltán Szabó	c226958947	[DOCS] Expands anomaly detection alert type docs (#70026 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co> Co-authored-by: Dima Arnautov <arnautov.dima@gmail.com>	2021-03-09 12:02:16 +01:00
Lisa Cawley	c537e5f38c	[DOCS] Edits delete trained model alias API (#70119 )	2021-03-08 17:08:58 -08:00
István Zoltán Szabó	8a7aced8e8	[DOCS] Adds beta tag to anomaly detection alert docs. (#70013 )	2021-03-08 10:46:24 +01:00
István Zoltán Szabó	2ccc81081f	[DOCS] Adds hyperparameters option to the include setting of GET trained models API. (#69959 )	2021-03-04 16:43:06 +01:00
Joe Gallo	1e8b5fa7c2	Remove the _ml/find-file-structure docs (#69823 )	2021-03-03 09:49:28 -05:00
Benjamin Trent	2279cafb4e	[ML] adding new _preview endpoint for data frame analytics (#69453) This commit adds a new `_preview` endpoint for data frame analytics. This allows users to see the data on which their model will be trained. This is especially useful with the arrival of custom feature processors. The API design is similar to datafeed `_preview` and data frame analytics `_explain`.	2021-03-01 12:25:50 -05:00
Lisa Cawley	138224b398	[DOCS] Edits trained model alias API (#69491 )	2021-02-24 08:17:49 -08:00
István Zoltán Szabó	77d0f56581	[DOCS] Adds anomaly detection alert documentation (#68923 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2021-02-23 10:29:54 +01:00
Dimitris Athanasiou	7fb98c0d3c	[ML] Add runtime mappings to data frame analytics source config (#69183 ) Users can now specify runtime mappings as part of the source config of a data frame analytics job. Those runtime mappings become part of the mapping of the destination index. This ensures the fields are accessible in the destination index even if the relevant data frame analytics job gets deleted. Closes #65056	2021-02-19 16:29:19 +02:00
Benjamin Trent	0af38bba9e	[ML] add new delete trained model aliases API (#69195 ) In addition to creating and re-assigning model aliases, users should be able to delete existing and unused model aliases.	2021-02-18 13:12:07 -05:00
Lisa Cawley	55f0e32fe4	[DOCS] Clarify put data frame analytics API feature processors option (#69158 )	2021-02-18 08:53:46 -08:00
Benjamin Trent	26eef892df	[ML] adds new trained model alias API to simplify trained model updates and deployments (#68922) A `model_alias` allows trained models to be referred to by a user-defined moniker. This not only improves the readability and simplicity of numerous API calls, but it allows for simpler deployment and upgrade procedures for trained models. Previously, if you referenced a model ID directly within an ingest pipeline and a new model performed better than the earlier referenced model, you had to update the pipeline itself. If this model was used in numerous pipelines, ALL those pipelines would have to be updated. When using a `model_alias` in an ingest pipeline, only that `model_alias` needs to be updated. Then, the underlying referenced model will change in place for all ingest pipelines automatically. An additional benefit is that the model referenced is not changed until it is fully loaded into cache, so throughput is not hampered by changing models.	2021-02-18 09:41:50 -05:00
James Rodewig	9b88ae92e6	[DOCS] Fix typos for duplicate words (#69125 )	2021-02-17 10:34:20 -05:00
Lisa Cawley	a1fb2c3606	[DOCS] Fixes n_gram_encoding in data frame analytics APIs (#69084 )	2021-02-16 14:02:00 -08:00
Lisa Cawley	8b6ec07613	[DOCS] Edits ML hyperparameter descriptions (#68880 )	2021-02-11 11:55:28 -08:00
Lisa Cawley	683368cc4d	[DOCS] Clarify soft_tree_depth_limit (#68787 ) Co-authored-by: Tom Veasey <tveasey@users.noreply.github.com>	2021-02-10 12:51:01 -08:00
István Zoltán Szabó	e45d7a942d	[DOCS] Expands feature processors property description and adds a link of conceptual docs (#68213 )	2021-02-02 14:48:43 +01:00
Valeriy Khakhutskyy	78368428b3	[ML] Add early stopping DFA configuration parameter (#68099) This PR adds the optional early_stopping_enabled parameter to the data frame analysis configuration. The enhancement was already described in elastic/ml-cpp#1676 and so I mark it here as non-issue.	2021-02-01 11:41:28 +01:00
Dimitris Athanasiou	5c961c1c81	[ML] Expand regression/classification hyperparameters (#67950) Expands data frame analytics regression and classification analyses with the following hyperparameters: - alpha - downsample_factor - eta_growth_rate_per_tree - max_optimization_rounds_per_hyperparameter - soft_tree_depth_limit - soft_tree_depth_tolerance	2021-01-26 12:56:41 +02:00
István Zoltán Szabó	addb5cbd3a	[DOCS] Adds custom feature processors description to PUT DFA API (#67424 ) Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2021-01-19 09:47:32 +01:00


590 commits
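
The composite aggregation restrictions listed in commit c8415a7924 can be sketched as a small client-side check. This is an illustrative sketch only, not the validation Elasticsearch itself performs: the function name `check_composite_datafeed_aggs` and the field names `time`, `airline` and `responsetime` are hypothetical, and only three of the rules are covered (the composite agg is the only top-level agg, it has exactly one `date_histogram` source, and it has a `max` sub-agg on the same time field with no `composite` or `date_histogram` sub-agg).

```python
from typing import Any, Dict, List


def check_composite_datafeed_aggs(aggs: Dict[str, Any]) -> List[str]:
    """Return the restrictions that `aggs` violates (empty list if none found)."""
    # Rule: the composite aggregation must be the ONLY top-level aggregation.
    if len(aggs) != 1:
        return ["the composite aggregation must be the only top-level aggregation"]

    (name, body), = aggs.items()
    composite = body.get("composite")
    if composite is None:
        return [f"top-level aggregation '{name}' is not a composite aggregation"]

    problems: List[str] = []

    # Rule: exactly one date_histogram source inside the composite sources.
    time_fields = [
        definition["date_histogram"].get("field")
        for source in composite.get("sources", [])
        for definition in source.values()
        if "date_histogram" in definition
    ]
    if len(time_fields) != 1:
        problems.append(
            f"expected exactly one date_histogram source, found {len(time_fields)}"
        )

    # Rule: no composite or date_histogram sub-aggregation.
    sub_aggs = body.get("aggregations") or body.get("aggs") or {}
    for sub_name, sub_body in sub_aggs.items():
        if "composite" in sub_body or "date_histogram" in sub_body:
            problems.append(
                f"sub-aggregation '{sub_name}' must not be a composite or date_histogram"
            )

    # Rule: a max sub-aggregation on the same time field as the date_histogram source.
    if len(time_fields) == 1:
        has_max = any(
            sub_body.get("max", {}).get("field") == time_fields[0]
            for sub_body in sub_aggs.values()
        )
        if not has_max:
            problems.append(
                f"missing a max sub-aggregation on time field '{time_fields[0]}'"
            )

    return problems


# Hypothetical datafeed aggregation config used only to exercise the check.
example = {
    "buckets": {
        "composite": {
            "size": 1000,
            "sources": [
                {"time_bucket": {"date_histogram": {"field": "time", "fixed_interval": "360s"}}},
                {"airline": {"terms": {"field": "airline"}}},
            ],
        },
        "aggregations": {
            "time": {"max": {"field": "time"}},
            "responsetime": {"avg": {"field": "responsetime"}},
        },
    }
}
print(check_composite_datafeed_aggs(example))
```

As the commit message notes, the `size` parameter of the `composite` aggregation is what controls speed and resource use, so a check like this says nothing about performance; it only flags structural problems before a datafeed config is submitted.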