elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-30 18:33:26 -04:00

Author	SHA1	Message	Date
James Rodewig	c757f9e4e7	[DOCS] Fix double spaces (#71082 ) (#71120 )	2021-03-31 11:43:34 -04:00
Benjamin Trent	abb182d95c	[7.x] [ML] adding support for composite aggs in anomaly detection (#69970 ) (#71052 ) * [ML] adding support for composite aggs in anomaly detection (#69970) This commit allows for composite aggregations in datafeeds. Composite aggs provide a much better solution for having influencers, partitions, etc. on high-volume data. Instead of worrying about long scrolls in the datafeed, the calculation is distributed across the cluster via the aggregations. The restrictions for this support are as follows: - The composite aggregation must have EXACTLY one `date_histogram` source - The sub-aggs of the composite aggregation must have a `max` aggregation on the SAME time field as the aforementioned `date_histogram` source - The composite agg must be the ONLY top-level agg and it cannot have a `composite` or `date_histogram` sub-agg - If using a `date_histogram` to bucket time, it cannot have a `composite` sub-agg. - The top-level `composite` agg cannot have a sibling pipeline agg. Pipeline aggregations are supported as a sub-agg (thus a pipeline agg INSIDE the bucket). Some key user interaction differences: - Speed + resources used by the cluster should be controlled by the `size` parameter in the `composite` aggregation. Previously, we said if you are using aggs, use a specific `chunking_config`. But, with composite, that is not necessary. - Users really shouldn't use nested `terms` aggs any longer. While this is still a "valid" configuration and MAY be desirable for some users (only wanting the top 10 of certain terms), typically when users want influencers, partition fields, etc. they want the ENTIRE population. Previously, this really wasn't possible with aggs; with `composite` it is. - I cannot really think of a typical use case that SHOULD ever use a multi-bucket aggregation that is NOT supported by composite.	2021-03-30 12:04:54 -04:00
István Zoltán Szabó	cdde73d5ff	[DOCS] Adds source index privileges required for Explain DFA API docs. (#70978 ) (#71033 )	2021-03-30 12:22:23 +02:00
Benjamin Trent	2545cc8ec4	[ML] Allow datafeed and job configs for datafeed preview API (#70836 ) (#70927 ) Previously, a datafeed and job must already exist for the `_preview` API to work. With this change, users can get an accurate preview of the data that will be sent to the anomaly detection job without creating either of them. closes https://github.com/elastic/elasticsearch/issues/70264	2021-03-26 13:58:52 -04:00
István Zoltán Szabó	591e93397a	[DOCS] Removes beta labels from DFA related docs. (#70808 ) (#70902 )	2021-03-26 10:25:36 +01:00
István Zoltán Szabó	bc035168f2	[DOCS] Updates anomaly detection alert docs with the new alerting terminology (#70486 ) (#70576 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2021-03-22 14:24:23 +01:00
Benjamin Trent	2fa22e6759	[ML] allow documents to be out of order within the same time bucket (#70468 ) (#70494 ) This commit allows documents seen within the same time bucket to be out of order. This is already supported within the native process. Additionally, when recording the "latest" record timestamp, we were assuming that the latest seen document was truly the "latest". This is not really the case if latency is utilized or if documents come out of order within the same bucket.	2021-03-17 10:42:10 -04:00
James Rodewig	302341a526	[DOCS] Replace `put` with `create or update` in API names (#70330 ) (#70421 ) Co-authored-by: debadair <debadair@elastic.co> Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2021-03-15 17:16:13 -04:00
István Zoltán Szabó	8fa2d13b57	[DOCS] Changes deprecated syntax to node.role style in datafeed docs. (#70201 ) (#70239 )	2021-03-10 17:12:03 +01:00
Lisa Cawley	bf8024b9e7	[DOCS] Edits machine learning settings (#69947 ) (#70174 ) Co-authored-by: David Roberts <dave.roberts@elastic.co>	2021-03-09 11:35:29 -08:00
Lisa Cawley	2e7ee9abcc	[DOCS] Edits delete trained model alias API (#70119 ) (#70120 )	2021-03-09 08:03:13 -08:00
István Zoltán Szabó	24bf46dcf9	[DOCS] Expands anomaly detection alert type docs (#70026 ) (#70137 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co> Co-authored-by: Dima Arnautov <arnautov.dima@gmail.com>	2021-03-09 16:39:38 +01:00
István Zoltán Szabó	e4f74c8b5b	[DOCS] Adds beta tag to anomaly detection alert docs. (#70013 ) (#70054 )	2021-03-08 12:11:58 +01:00
István Zoltán Szabó	646534bda0	[DOCS] Adds hyperparameters option to the include setting of GET trained models API. (#69959 ) (#69978 )	2021-03-05 09:54:34 +01:00
Benjamin Trent	3bef185e06	[7.x] [ML] adding new _preview endpoint for data frame analytics (#69453 ) (#69729 ) * [ML] adding new _preview endpoint for data frame analytics (#69453) This commit adds a new `_preview` endpoint for data frame analytics. This allows users to see the data on which their model will be trained. This is especially useful with the arrival of custom feature processors. The API design is similar to datafeed `_preview` and data frame analytics `_explain`.	2021-03-01 14:47:32 -05:00
Lisa Cawley	119af4e14a	[DOCS] Edits trained model alias API (#69491 ) (#69560 )	2021-02-24 09:30:47 -08:00
István Zoltán Szabó	4088f85002	[DOCS] Adds anomaly detection alert documentation (#68923 ) (#69417 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2021-02-23 10:48:24 +01:00
Dimitris Athanasiou	98c69cedce	[7.x][ML] Add runtime mappings to data frame analytics source config … (#69284 ) Users can now specify runtime mappings as part of the source config of a data frame analytics job. Those runtime mappings become part of the mapping of the destination index. This ensures the fields are accessible in the destination index even if the relevant data frame analytics job gets deleted. Closes #65056 Backport of #69183	2021-02-19 20:17:06 +02:00
Dan Hermann	b311906009	Unmute more memory-related tests after the fix in #68542	2021-02-19 08:06:50 -06:00
Benjamin Trent	1e15fe5da6	[ML] add new delete trained model aliases API (#69195 ) (#69221 ) In addition to creating and re-assigning model aliases, users should be able to delete existing and unused model aliases.	2021-02-19 07:39:48 -05:00
Lisa Cawley	8daaf69f2d	[DOCS] Clarify put data frame analytics API feature processors option (#69158 ) (#69210 )	2021-02-18 14:29:56 -08:00
Benjamin Trent	3250dd763a	[7.x] [ML] adds new trained model alias API to simplify trained model updates and deployments (#68922 ) (#69208 ) * [ML] adds new trained model alias API to simplify trained model updates and deployments (#68922) A `model_alias` allows trained models to be referred to by a user-defined moniker. This not only improves the readability and simplicity of numerous API calls, but it allows for simpler deployment and upgrade procedures for trained models. Previously, if you referenced a model ID directly within an ingest pipeline and a new model performed better than the referenced one, you had to update the pipeline itself. If this model was used in numerous pipelines, ALL those pipelines would have to be updated. When using a `model_alias` in an ingest pipeline, only that `model_alias` needs to be updated. Then, the underlying referenced model will change in place for all ingest pipelines automatically. An additional benefit is that the model referenced is not changed until it is fully loaded into cache, so throughput is not hampered by changing models.	2021-02-18 15:24:24 -05:00
James Rodewig	b55249507e	[DOCS] Fix typos for duplicate words (#69125 ) (#69132 )	2021-02-17 11:16:58 -05:00
Lisa Cawley	78cc644ef0	[DOCS] Fixes n_gram_encoding in data frame analytics APIs (#69084 ) (#69089 )	2021-02-16 14:41:38 -08:00
Lisa Cawley	c981a7c905	[DOCS] Edits ML hyperparameter descriptions (#68880 ) (#68934 )	2021-02-11 12:35:58 -08:00
Lisa Cawley	cbbd6a0729	[DOCS] Clarify soft_tree_depth_limit (#68787 ) (#68867 ) Co-authored-by: Tom Veasey <tveasey@users.noreply.github.com>	2021-02-10 13:26:59 -08:00
István Zoltán Szabó	d851ba88c0	[DOCS] Expands feature processors property description and adds a link of conceptual docs (#68213 ) (#68372 )	2021-02-02 15:39:07 +01:00
Valeriy Khakhutskyy	4bbd31a268	[7.x][ML] Add early stopping DFA configuration parameter (#68271 ) The PR adds the optional early_stopping_enabled data frame analysis configuration parameter. The enhancement was already described in elastic/ml-cpp#1676, so it is marked here as a non-issue. Backport of #68099.	2021-02-01 14:11:06 +01:00
Dimitris Athanasiou	9e55623c29	[7.x][ML] Expand regression/classification hyperparameters (#67950 ) (#67983 ) Expands data frame analytics regression and classification analyses with the following hyperparameters: - alpha - downsample_factor - eta_growth_rate_per_tree - max_optimization_rounds_per_hyperparameter - soft_tree_depth_limit - soft_tree_depth_tolerance Backport of #67950	2021-01-26 15:48:13 +02:00
István Zoltán Szabó	9f7ca2649c	[DOCS] Adds custom feature processors description to PUT DFA API (#67424 ) (#67678 ) Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2021-01-19 10:39:46 +01:00
Dimitris Athanasiou	b9fbda1829	[7.x][ML] Remove DFA job states reindexing and analyzing from docs (#67658 ) (#67662 ) These states no longer exist as of #67423 Backport of #67658	2021-01-18 17:56:59 +02:00
Benjamin Trent	05d88f71d3	[ML] [DOCS] adding missing fields to the get trained models API docs (#67590 ) (#67598 ) Adds missing fields description, inference_config, and input to the GET trained models API documentation	2021-01-15 13:38:00 -05:00
Benjamin Trent	88a1bfa516	[ML] [DOCS] update find-structure reference docs (#67586 ) (#67592 ) The text structure finder API documentation had many references to "files". While this is one use of the API, the API now has a more generic name. This commit replaces many references to the word "file" with the more generic word "text".	2021-01-15 13:22:59 -05:00
István Zoltán Szabó	2a66cb60e0	[DOCS] Adds hyperparameter metadata property to GET trained models API docs. (#67412 ) (#67426 )	2021-01-13 14:34:54 +01:00
Lisa Cawley	da5662495b	[DOCS] Move find file structure to a new API endpoint (#67314 ) (#67389 )	2021-01-12 12:41:39 -08:00
Benjamin Trent	65690eef67	[7.x] [ML] move find file structure to a new API endpoint (#67123 ) (#67251 ) * [ML] move find file structure to a new API endpoint (#67123) This introduces a new `text-structure` plugin. This is the new home of the find file structure API. The old REST URL is still available but is deprecated. The new URL is: `_text_structure/find_structure`. All parameters and behavior are unchanged. Changes to the high-level REST client and docs will be in separate commit. related to: https://github.com/elastic/elasticsearch/issues/67001	2021-01-11 10:39:39 -05:00
Lisa Cawley	2cbb6123db	[DOCS] Clarify impact of delayed data in anomaly detection (#66816 ) (#67039 ) Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2021-01-05 13:09:30 -08:00
István Zoltán Szabó	2f7240d271	[DOCS] Improves inference processor linking and docs (#66119 ) (#66983 )	2021-01-05 14:04:51 +01:00
Gordon Brown	09f823b1db	Mute tests failing on Debian 8 due to memory reporting (#66648 ) See https://github.com/elastic/elasticsearch/issues/66629 for details	2020-12-18 15:15:43 -07:00
Dimitris Athanasiou	97cfc8fdf8	[7.x][ML] Add log_time to AD data_counts and decide current based on it (#66343 ) (#66384 ) This commit fixes a potential bug that would arise if we support anomaly detection results index rollover in the future. In particular, we determine the current `data_counts` by sorting on the latest record time. However, this is not correct if the job reverts to an older model snapshot. To fix this we add `log_time` to `data_counts` (similarly to `model_size_stats`) and sort on `log_time` to figure out the current counts for the job. Backport of #66343	2020-12-16 12:27:32 +02:00
David Roberts	8284b93dcb	[ML] Deprecate anomaly detection post data endpoint (#66398 ) There is little evidence of this endpoint being used and there is quite a lot of code complexity associated with the various formats that can be used to upload data and the different errors that can occur when direct data upload is open to end users. In a future release we can make this endpoint internal so that only datafeeds can use it, and remove all the options and formats that are not used by datafeeds. End users will have to store their input data for anomaly detection in Elasticsearch indices (which we believe all do today) and use a datafeed to feed it to anomaly detection jobs. Backport of #66347	2020-12-15 21:39:47 +00:00
István Zoltán Szabó	80ffeabbbc	[DOCS] Adds note about data_counts values to Revert snapshot API docs. (#66085 ) (#66089 )	2020-12-09 11:48:16 +01:00
István Zoltán Szabó	b1c11ebdbe	[DOCS] Adds empty snapshot_id description to revert snapshot API docs (#66036 ) (#66084 )	2020-12-09 10:23:42 +01:00
David Kyle	5fec2538ca	[ML] Docs and HRLC for datafeed runtime mappings (#65810 ) (#66007 ) For the changes in #65606	2020-12-08 11:04:21 +00:00
David Roberts	cbb3886bbb	[ML] Adding assignment_memory_basis to model_size_stats (#65875 ) At present the Java code makes a decision on whether to use current model memory or model memory limit to calculate how much memory a job requires to be assigned. The plan is to move this decision to the C++ code, which will report it via a new field in the model size stats. An additional change will be that once we have made the switch from using model memory limit to using current model memory we will never switch back, as this causes large fluctuations up and down in memory requirement which will be much more noticeable when autoscaling is in use. Although the only two options at present are model memory limit and current model memory, the new enum includes a third possibility, peak model memory. To switch to this now would be tricky, as there have been two bugs in the implementation of peak model memory which render its value unreliable in 7.x. However, in 8.x it might make sense to switch to using peak model memory instead of current model memory and it's much easier from a BWC perspective if the enum contains all the values from the start. Backport of #65561	2020-12-04 11:34:36 +00:00
David Roberts	9a239eef6f	[ML] Adjusting soft_limit description (#65383 ) This PR adds detail to the explanation of the soft_limit memory_status in ML job stats. A consequence that was not mentioned before is that examples are not added to category definitions. Relates elastic/ml-cpp#1590	2020-11-24 09:38:02 +00:00
István Zoltán Szabó	ee114e7c90	[DOCS] Fixes typo in Aggregating data for faster performance. (#65354 ) (#65356 )	2020-11-23 13:03:20 +01:00
István Zoltán Szabó	53c64d594b	[DOCS] Adds UI related limitation to configuring aggs docs (#65184 ) (#65327 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-11-23 09:40:26 +01:00
István Zoltán Szabó	971b6d95c4	[DOCS] Makes the screenshot larger on the custom URLs page. (#65269 ) (#65297 )	2020-11-20 15:25:15 +01:00
David Roberts	d8f549c21f	[ML] Add total ML memory to ML info (#65214 ) This change adds an extra piece of information, limits.total_ml_memory, to the ML info response. This returns the total amount of memory that ML is permitted to use for native processes across all ML nodes in the cluster. Some of this may already be in use; the value returned is total, not available ML memory. Backport of #65195	2020-11-18 18:19:37 +00:00
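
The composite aggregation restrictions listed in commit abb182d95c can be illustrated with a small sketch. It builds an aggregation body of the shape that commit describes and checks three of the stated restrictions locally. This is not code from the repository: the helper names, the aggregation names (`buckets`, `time_bucket`, `partition`, `latest_time`), the field names `timestamp` and `airline`, and the `5m` interval are all assumptions chosen for illustration.

```python
def build_datafeed_aggs(time_field, partition_field, size=1000):
    """Build a composite aggregation body (hypothetical names throughout)."""
    return {
        "buckets": {
            "composite": {
                # Per the commit message, `size` controls speed and resource use.
                "size": size,
                "sources": [
                    {"time_bucket": {"date_histogram": {
                        "field": time_field, "fixed_interval": "5m"}}},
                    {"partition": {"terms": {"field": partition_field}}},
                ],
            },
            "aggregations": {
                # `max` on the same time field as the date_histogram source.
                "latest_time": {"max": {"field": time_field}},
            },
        }
    }


def check_composite_restrictions(aggs):
    """Return a list of violated restrictions; an empty list means none found."""
    if len(aggs) != 1:
        return ["the composite agg must be the only top-level agg"]
    (body,) = aggs.values()
    composite = body.get("composite")
    if composite is None:
        return ["the top-level agg is not a composite agg"]

    problems = []
    histogram_fields = [
        spec["date_histogram"].get("field")
        for source in composite.get("sources", [])
        for spec in source.values()
        if "date_histogram" in spec
    ]
    if len(histogram_fields) != 1:
        problems.append("exactly one date_histogram source is required")

    sub_aggs = body.get("aggregations", body.get("aggs", {}))
    max_fields = {
        spec["max"].get("field") for spec in sub_aggs.values() if "max" in spec
    }
    if len(histogram_fields) == 1 and histogram_fields[0] not in max_fields:
        problems.append("a max sub-agg on the date_histogram time field is required")
    if any("composite" in s or "date_histogram" in s for s in sub_aggs.values()):
        problems.append("composite and date_histogram sub-aggs are not allowed")
    return problems


aggs = build_datafeed_aggs("timestamp", "airline")
print(check_composite_restrictions(aggs))
```

The checker covers only the restrictions that can be read directly off the aggregation body; the pipeline-aggregation rule from the commit message is left out of this sketch.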

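The reasoning in commit 97cfc8fdf8 (sort `data_counts` on `log_time` rather than on the latest record time) can be sketched with two in-memory documents. The `DataCounts` class and both helper functions are hypothetical stand-ins, not the repository's Java types, and the timestamps are made-up values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCounts:
    """Illustrative stand-in for a stored data_counts document."""
    latest_record_timestamp: int  # time of the newest record seen (epoch ms)
    log_time: int                 # time the document was written (epoch ms)
    processed_record_count: int


def current_by_record_time(docs):
    # Old approach described in the commit: newest record time wins.
    return max(docs, key=lambda d: d.latest_record_timestamp)


def current_by_log_time(docs):
    # Approach the commit introduces: most recently written document wins.
    return max(docs, key=lambda d: d.log_time)


# The job processed data up to t=2000, then reverted to an older model
# snapshot, after which counts were written again with an earlier record time.
before_revert = DataCounts(latest_record_timestamp=2000, log_time=10,
                           processed_record_count=500)
after_revert = DataCounts(latest_record_timestamp=1000, log_time=20,
                          processed_record_count=250)
docs = [before_revert, after_revert]

print(current_by_record_time(docs) is before_revert)  # stale document chosen
print(current_by_log_time(docs) is after_revert)      # post-revert document chosen
```

After a revert, the document with the newest record time is no longer the one written last, which is why sorting on write time selects the counts that reflect the job's present state.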

364 commits