Commit graph

522 commits

Author SHA1 Message Date
Benjamin Trent
d588d456f0
[ML] add new trained model deployment cache clear API (#89074)
This adds a new `_ml/trained_models/<model_id>/deployment/cache/_clear` API. This will clear the inference cache on every node where the model is allocated.
2022-08-04 19:45:15 +01:00
Benjamin Trent
9ce59bb7a9
[ML] add text_similarity nlp task documentation (#88994)
Introduced in: #88439

* [ML] add text_similarity nlp task documentation

* Apply suggestions from code review

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

* Update docs/reference/ml/trained-models/apis/infer-trained-model.asciidoc

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

* Apply suggestions from code review

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

* Update docs/reference/ml/ml-shared.asciidoc

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2022-08-02 12:17:14 -04:00
Dimitris Athanasiou
3f9334012f
[ML] Fix version substitution in put DFA docs (#88862)
This fixes the version substitution in a couple of response examples in
the put DFA docs.
2022-07-28 01:37:30 +09:30
David Roberts
15e7b06b79
[ML] Add inference cache hit count to inference node stats (#88807)
The inference node stats for deployed PyTorch inference
models now contain two new fields: `inference_cache_hit_count`
and `inference_cache_hit_count_last_minute`.

These indicate how many inferences on that node were served
from the C++-side response cache that was added in
https://github.com/elastic/ml-cpp/pull/2305. Cache hits
occur when exactly the same inference request is sent to the
same node more than once.

The `average_inference_time_ms` and
`average_inference_time_ms_last_minute` fields now refer to
the time taken to do the cache lookup, plus, if necessary,
the time to do the inference. We would expect average inference
time to be vastly reduced in situations where the cache hit
rate is high.
2022-07-26 17:53:43 +01:00
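The cache semantics described in #88807 can be illustrated with a small sketch: identical requests on the same node are served from a response cache, and the averaged time covers the lookup plus, on a miss, the inference. The class `InferenceCacheStats` is hypothetical. Its counters mirror the stat names from the commit, but the real cache lives in the C++ process, is size-bounded, and is not implemented like this unbounded dict.

```python
import time
from typing import Callable, Dict, Hashable


class InferenceCacheStats:
    """Hypothetical per-node response cache with hit counting (no eviction)."""

    def __init__(self, infer: Callable[[str], str]) -> None:
        self._infer = infer
        self._cache: Dict[Hashable, str] = {}
        self.inference_count = 0
        self.inference_cache_hit_count = 0
        self._total_time_ms = 0.0

    def handle(self, request: str) -> str:
        start = time.perf_counter()
        self.inference_count += 1
        if request in self._cache:
            # Exactly the same request was seen before on this node: cache hit.
            self.inference_cache_hit_count += 1
            result = self._cache[request]
        else:
            # Cache miss: run the (expensive) inference and remember the response.
            result = self._infer(request)
            self._cache[request] = result
        # The timed span covers the lookup plus, on a miss, the inference itself.
        self._total_time_ms += (time.perf_counter() - start) * 1000.0
        return result

    @property
    def average_inference_time_ms(self) -> float:
        if self.inference_count == 0:
            return 0.0
        return self._total_time_ms / self.inference_count
```

With a high hit rate, most timed spans are only dictionary lookups, which is why the average is expected to drop.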
Benjamin Trent
a044b5c01e
[ML] make composite aggs in datafeeds Generally Available (#88589)
This commit makes composite aggs in datafeeds generally available.
2022-07-19 12:41:25 -04:00
Benjamin Trent
afa28d49b4
[ML] add new cache_size parameter to trained_model deployments API (#88450)
With https://github.com/elastic/ml-cpp/pull/2305, we now support caching PyTorch inference responses per node per model.

By default, the cache will be the same size as the model's on-disk size. This is because our current best estimate for memory used (for deploying) is 2*model_size + constant_overhead.

This is due to the model having to be loaded in memory twice when serializing to the native process.

But once the model is in memory and accepting requests, its actual memory usage is lower than what we have "reserved" for it on the node.

Consequently, a cache layer that takes advantage of that unused (but reserved) memory is effectively free. When used in production, especially in search scenarios, caching inference results is critical for decreasing latency.
2022-07-18 09:19:01 -04:00
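The sizing argument in #88450 is simple arithmetic, sketched below. The reservation formula comes from the commit; the assumption that only one copy of the model stays resident after loading, and the overhead value used in the test, are illustrative and not taken from the implementation.

```python
MB = 1024 * 1024


def reserved_memory_bytes(model_size_bytes: int, overhead_bytes: int) -> int:
    # Estimate from the commit: 2 * model_size + constant_overhead, because the
    # model is held in memory twice while it is serialized to the native process.
    return 2 * model_size_bytes + overhead_bytes


def steady_state_memory_bytes(model_size_bytes: int, overhead_bytes: int) -> int:
    # Assumption: once loaded, a single copy of the model remains in memory.
    return model_size_bytes + overhead_bytes


def spare_memory_bytes(model_size_bytes: int, overhead_bytes: int) -> int:
    # Memory that is reserved on the node but not used after loading.
    return (reserved_memory_bytes(model_size_bytes, overhead_bytes)
            - steady_state_memory_bytes(model_size_bytes, overhead_bytes))


def default_cache_size_bytes(model_size_bytes: int) -> int:
    # Default cache size: the model's on-disk size.
    return model_size_bytes
```

Under these assumptions the spare memory equals the model size, so a default cache of that size fits in memory that is already reserved.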
István Zoltán Szabó
cf68d0f13c
[DOCS] Updates infer trained model API docs with inference_config (#88500)
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2022-07-13 17:47:05 +02:00
Lisa Cawley
7e214fc51b
[DOCS] Add authorization info to create, get, and update DFA jobs APIs (#88098)
2022-06-30 08:41:04 -07:00
Lisa Cawley
c9b4499d2e
[DOCS] Add authorization details to update datafeed API (#88099)
2022-06-28 13:43:58 -07:00
Lisa Cawley
aa19690990
[DOCS] Add authorization to anomaly detection job and datafeed API examples (#87937)
2022-06-27 13:05:35 -07:00
Dimitris Athanasiou
f3199e968b
[ML] Adjust docs for distributed model allocation (#87955)

Follow up to #87366
2022-06-23 15:35:58 +03:00
Lisa Cawley
76cd7b63a4
[DOCS] Add authorization info to get anomaly detection jobs API (#87904)
2022-06-22 15:15:33 -07:00
István Zoltán Szabó
78c0ad91fc
[DOCS] Adds note to time_of_week function about how values are calculated (#87871)
Co-authored-by: Tom Veasey <tveasey@users.noreply.github.com>
2022-06-22 10:22:49 +02:00
Dimitris Athanasiou
679351e224
[ML] Require that threads_per_allocation is a power of 2 (#87697)
As the number of cores in CPUs is typically a power of 2,
this commit adds a validation that trained model deployments
start with `threads_per_allocation` set to a power of 2.
When distributing allocations across the cluster, this
prevents situations where many CPU cores are wasted.

In addition, this adds a maximum value of `32`.
2022-06-17 15:12:37 +03:00
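The validation described in #87697 amounts to a power-of-2 check with an upper bound of 32. The function name and error messages below are hypothetical; this is a sketch of the rule, not the server-side code.

```python
MAX_THREADS_PER_ALLOCATION = 32


def validate_threads_per_allocation(value: int) -> int:
    """Hypothetical check: value must be a positive power of 2 and at most 32."""
    if value < 1:
        raise ValueError(f"threads_per_allocation must be positive, got {value}")
    # A positive integer is a power of 2 iff exactly one bit is set.
    if value & (value - 1) != 0:
        raise ValueError(f"threads_per_allocation must be a power of 2, got {value}")
    if value > MAX_THREADS_PER_ALLOCATION:
        raise ValueError(
            f"threads_per_allocation must be <= {MAX_THREADS_PER_ALLOCATION}, got {value}"
        )
    return value
```

The bit trick avoids floating-point logarithms, which can misjudge large values.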
Lisa Cawley
5b6838e6ec
[DOCS] Fix typo in anomaly detection example (#87668)
2022-06-14 14:34:33 -07:00
Lisa Cawley
32f6082b7e
[DOCS] Typo in time functions (#87373)
2022-06-03 08:40:12 -07:00
István Zoltán Szabó
a71ad6e407
[DOCS] Expands AD and Transform alert docs with info on context for recovered alerts (#87118)
2022-06-02 09:52:47 +02:00
Benjamin Trent
115f19ff6d
[ML] adds start and end params to _preview and excludes cold/frozen tiers from unbounded previews (#86989)
In larger clusters with complicated datafeed requirements, being able to preview only a specific window of time is important. Previously, datafeed previews would always start at 0 (or from the beginning of the data). This causes issues if the index pattern contains indices on slower hardware even though, when the datafeed is actually started, its "start" time is set to more recent data (and thus to faster hardware).

Additionally, when _preview is unbounded (as before), it attempts to preview only indices that are NOT frozen or cold. This is done through a query against the _tier field, meaning it only affects newer indices that actually have that field set.
2022-05-20 13:56:53 -04:00
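The two preview modes in #86989 can be sketched as Elasticsearch query DSL built in Python. This is an illustration under assumptions: the helper `preview_query` and the time field name are hypothetical, the real implementation combines these clauses with the datafeed's own query, and `data_cold`/`data_frozen` are the standard tier names.

```python
from typing import Any, Dict, Optional


def preview_query(time_field: str, start_ms: Optional[int],
                  end_ms: Optional[int]) -> Dict[str, Any]:
    """Hypothetical filter for a datafeed preview."""
    if start_ms is not None or end_ms is not None:
        # Bounded preview: restrict the search to the requested time window.
        rng: Dict[str, Any] = {"format": "epoch_millis"}
        if start_ms is not None:
            rng["gte"] = start_ms
        if end_ms is not None:
            rng["lt"] = end_ms
        return {"bool": {"filter": [{"range": {time_field: rng}}]}}
    # Unbounded preview: exclude indices whose _tier is cold or frozen.
    # Indices without a _tier value do not match the terms clause, so they
    # are still searched; this is why only newer indices are affected.
    return {"bool": {"must_not": [{"terms": {"_tier": ["data_cold", "data_frozen"]}}]}}
```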
István Zoltán Szabó
f3e8904b2c
[DOCS] Adds settings of question_answering to inference_config of PUT and infer trained model APIs (#86895)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2022-05-19 11:04:14 +02:00
Lisa Cawley
6b7320790f
[DOCS] Updates example output for start trained model deployment API (#86824)
2022-05-17 07:27:44 -07:00
Lisa Cawley
a9c8c12814
[DOCS] Removes infer trained model deployment API (#86497)
2022-05-10 09:56:36 -07:00
Dimitris Athanasiou
68c51f3ada
[ML] Rename threading params in _start trained model deployment API (#86597)
When starting a trained model deployment, the user can tweak performance
by setting the `model_threads` and `inference_threads` parameters.
These parameters are hard to understand and cause confusion.

This commit renames these as well as the fields where their values are
reported in the stats API.

- `model_threads` => `number_of_allocations`
- `inference_threads` => `threads_per_allocation`

Now the terminology is as follows.

A model deployment starts with a requested `number_of_allocations`.
Each allocation means the model gets another thread for executing
parallel inference requests. Thus, more allocations should increase
throughput. In turn, each allocation may use a number
of threads to parallelize each individual inference request.
This is the `threads_per_allocation` setting and increases inference
speed (which might also result in improved throughput).
2022-05-10 17:41:00 +03:00
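A rough sketch of how the two renamed settings from #86597 combine. It assumes (a reading of the terminology above, not a formula stated in the commit) that a deployment needs `number_of_allocations * threads_per_allocation` threads in total; `DeploymentRequest` and `fits_on` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentRequest:
    number_of_allocations: int   # parallel inference requests (throughput)
    threads_per_allocation: int  # threads within one request (inference speed)

    def total_threads(self) -> int:
        # Assumption: every allocation uses threads_per_allocation threads.
        return self.number_of_allocations * self.threads_per_allocation

    def fits_on(self, cores: int) -> bool:
        # Hypothetical capacity check against a node's available cores.
        return self.total_threads() <= cores
```

For example, 4 allocations with 2 threads each would process up to 4 requests at once and need 8 cores under this assumption.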
Lisa Cawley
89a3e18e10
[DOCS] Add preview admonition to infer API (#86486)
2022-05-05 13:49:02 -07:00
Benjamin Trent
a907f0bb6f
[ML] add new trained_models/{model_id}/_infer endpoint for all supervised models and deprecate deployment infer api (#86361)
This commit adds a new `_ml/trained_models/{model_id}/_infer` API. This API works for both native NLP models and supervised models trained via Data Frame analytics.

The format of the API is the same as the old `_ml/trained_models/{model_id}/deployment/_infer`: it takes `docs` and `inference_config` parameters.

This PR also deprecates the old experimental `_ml/trained_models/{model_id}/deployment/_infer` API.

The biggest difference is that the response now nests all results under an "inference_results" object.

closes: https://github.com/elastic/elasticsearch/issues/86032
2022-05-05 14:58:59 -04:00
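The request and response shapes described in #86361 can be sketched as follows. The helper names, the document field `text_field`, and the result key `predicted_value` are hypothetical; only the path, the `docs`/`inference_config` parameters, and the `inference_results` wrapper come from the commit. The HTTP method is not stated there and is left out.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def build_infer_request(model_id: str, docs: List[Dict[str, Any]],
                        inference_config: Optional[Dict[str, Any]] = None
                        ) -> Tuple[str, str]:
    """Return (path, JSON body) for the new _infer endpoint."""
    path = f"/_ml/trained_models/{model_id}/_infer"
    body: Dict[str, Any] = {"docs": docs}
    if inference_config is not None:
        body["inference_config"] = inference_config
    return path, json.dumps(body)


def extract_results(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    # The new endpoint nests all per-document results under "inference_results".
    return response["inference_results"]
```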
Benjamin Trent
25d1afbe6f
[ML] rename trained model allocations to assignments (#85503)
This renames the internal concept of a trained model allocation into an assignment.

Models are now assigned to a node, and routes are created for inference; they are not "allocated".

This is an internal rename only. The user facing concepts of trained models and deployments are untouched.
2022-04-18 11:35:10 -04:00
István Zoltán Szabó
7f556ece75
[DOCS] Adds size param to evaluate DFA API docs (#85735)
2022-04-07 10:03:09 +02:00
Dimitris Athanasiou
5d670e45ac
Revert "[ML] Only one of inference_threads and model_threads may be great… (#84794)" (#85089)
This reverts commit 4eaedb265d.

On further investigation of how to improve allocation of trained models,
we concluded that being able to set `inference_threads` in combination with
`model_threads` is fundamental for scalability.
2022-03-18 09:41:27 +02:00
Benjamin Trent
258d2b71e2
[ML] add roberta/bart docs (#85001)
Adds a RoBERTa section to the NLP tokenization documentation.
2022-03-17 12:14:57 -04:00
Dimitris Athanasiou
4eaedb265d
[ML] Only one of inference_threads and model_threads may be great… (#84794)
When starting a trained model deployment, the user may set values for `inference_threads`
or `model_threads`. The first improves latency whereas the latter improves throughput.
It is easier to reason about how a model allocation uses resources if we ensure only
one of those two may be greater than one. In addition, it allows us to distribute
the cores of the ML nodes in the cluster across the model allocations in the future.

This commit adds a validation that prevents both `inference_threads` and `model_threads`
from being greater than one.
2022-03-09 16:33:35 +02:00
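The rule added in #84794 (and later reverted by #85089) can be sketched as a single check. The function name and message are hypothetical.

```python
def validate_thread_settings(inference_threads: int, model_threads: int) -> None:
    """Hypothetical check: at most one of the two settings may exceed 1."""
    if inference_threads > 1 and model_threads > 1:
        raise ValueError(
            "only one of inference_threads and model_threads may be greater than 1"
        )
```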
David Kyle
27ae82139a
[ML] Add throughput stats for Trained Model Deployments (#84628)
Throughput is measured as the number of inference requests
processed per minute. The node-level stats peak_throughput_per_minute,
throughput_last_minute and average_inference_time_ms_last_minute are
added, along with a deployment-level stat peak_throughput_per_minute,
which is the summed throughput of all nodes.
2022-03-08 11:06:36 +00:00
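The throughput stats from #84628 can be sketched with a sliding one-minute window per node and a deployment-level sum. The sliding window is an assumption (the commit does not say whether minutes are fixed buckets or rolling), and `NodeThroughput` is a hypothetical class.

```python
from collections import deque
from typing import Deque, Dict


class NodeThroughput:
    """Hypothetical per-node counter of inference requests per minute."""

    def __init__(self) -> None:
        self._window: Deque[float] = deque()  # completion times (s) in the last minute
        self.peak_throughput_per_minute = 0

    def record(self, now_s: float) -> None:
        self._window.append(now_s)
        self._evict(now_s)
        self.peak_throughput_per_minute = max(
            self.peak_throughput_per_minute, len(self._window)
        )

    def throughput_last_minute(self, now_s: float) -> int:
        self._evict(now_s)
        return len(self._window)

    def _evict(self, now_s: float) -> None:
        # Drop requests that completed more than 60 seconds ago.
        while self._window and self._window[0] <= now_s - 60.0:
            self._window.popleft()


def deployment_peak_throughput(nodes: Dict[str, NodeThroughput]) -> int:
    # Deployment-level value: the sum over all nodes.
    return sum(n.peak_throughput_per_minute for n in nodes.values())
```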
Lisa Cawley
cae3a662dc
[DOCS] Refresh automated screenshots (#84543)
2022-03-02 09:30:07 -08:00
Benjamin Trent
45deac4c96
[ML] add windowing support for text_classification (#83989)
This commit adds initial windowing support for text_classification tasks.

Specifically, a user can now set a non-negative `span` that controls the tokenization windowing span when creating
sub-sequences.

The default value, `span: -1`, indicates that no windowing should take place.
2022-03-01 08:29:12 -05:00
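A minimal sketch of the windowing added in #83989. It assumes that `span` is the number of tokens shared by consecutive windows and that `-1` falls back to plain truncation; the commit does not spell out either detail. Special tokens added per window are ignored, and `window_tokens` is a hypothetical helper.

```python
from typing import List, Sequence


def window_tokens(tokens: Sequence[int], max_len: int, span: int) -> List[List[int]]:
    """Split tokens into overlapping sub-sequences of at most max_len tokens."""
    if span == -1:
        # No windowing: keep only the first max_len tokens.
        return [list(tokens[:max_len])]
    if span < 0 or span >= max_len:
        raise ValueError("span must be -1 or in the range [0, max_len)")
    stride = max_len - span  # each window starts this many tokens after the last
    windows: List[List[int]] = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows
```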
David Roberts
bf00ab381e
[ML] Add ML memory stats API (#83802)
Adds an API that can be used to find out how much memory ML
is permitted to use and is currently using on each node, both
within the JVM heap, and natively, outside of the JVM.
2022-02-17 09:19:14 +00:00
Lisa Cawley
458ef91066
[DOCS] Move ML info and upgrade APIs (#84005)
2022-02-16 11:23:00 -08:00
Tobias Stadler
e3deacf547
[DOCS] Fix typos (#83895)
2022-02-15 12:42:17 -05:00
Lisa Cawley
104efd4343
[DOCS] Minor edits to trained model APIs (#81549)
2022-02-09 13:44:13 -08:00
David Kyle
c1fbf87de8
[ML] Add error counts to trained model stats (#82705)
Adds inference_count, timeout_count, rejected_execution_count
and error_count fields to trained model stats.
2022-01-27 16:18:20 +00:00
István Zoltán Szabó
b42ba64019
[DOCS] Fixes geo function field names. (#83198)
2022-01-27 12:03:58 +01:00
Ugo Sangiorgi
305ff20b8f
[DOCS] Add missing HTML anchors to CCR and ML (#80287)
2022-01-26 11:00:40 -08:00
István Zoltán Szabó
a5affc7104
[DOCS] Fixes field names in ML sum functions. (#83048)
2022-01-25 15:28:06 +01:00
Lisa Cawley
91cd38df57
[DOCS] Fix links to anomaly detection docs (#82836)
2022-01-19 17:54:18 -08:00
Lisa Cawley
c98833f9c6
[DOCS] Fix links to anomaly detection docs (#82774)
2022-01-18 17:42:16 -08:00
Dimitris Athanasiou
93777b4e99
[ML] Add latest search interval to datafeed stats (#82620)
This commit adds `search_interval` to the datafeed stats API
`running_state` object. When the datafeed is running, it reports
the last search interval that was searched, which is useful for
understanding which point in time the datafeed is currently
searching.

Closes #82405
2022-01-16 16:04:35 +02:00
David Kyle
1473b09415
[ML] Add NLP inference configs to the inference processor docs (#82320)
2022-01-11 08:50:45 +00:00
Ed Savage
e8a46649c5
[ML] Warn when creating job with an unusual bucket span (#82145)
Emit deprecation warning when creating new jobs with bucket spans that
aren't an integral divisor or multiple of a day.

Relates #81645

Co-authored-by: lcawl <lcawley@elastic.co>
2022-01-10 17:04:18 +00:00
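The "unusual bucket span" condition in #82145 can be expressed as a divisibility check on seconds. The helper name is hypothetical, and working in whole seconds is an assumption about how spans are compared.

```python
DAY_SECONDS = 24 * 60 * 60


def is_unusual_bucket_span(bucket_span_s: int) -> bool:
    """True if the span neither divides a day evenly nor is a whole multiple of a day."""
    if bucket_span_s <= 0:
        raise ValueError("bucket_span must be positive")
    divides_day = DAY_SECONDS % bucket_span_s == 0
    multiple_of_day = bucket_span_s % DAY_SECONDS == 0
    return not (divides_day or multiple_of_day)
```

A 15-minute span (900 s) divides a day and a 2-day span is a multiple of one, so neither would warn; a 7-minute span would.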
Benjamin Trent
9dc8aea1cb
[ML] adds new mpnet tokenization for nlp models (#82234)
This commit adds support for MPNet based models.

MPNet models differ from BERT style models in that:

 - Special tokens are different
 - Input to the model doesn't require token positions.

To configure an MPNet tokenizer for your PyTorch MPNet-based model:

```
"tokenization": {
  "mpnet": {...}
}
```
The options provided to `mpnet` are the same as the previously supported `bert` configuration.
2022-01-05 12:56:47 -05:00
Dimitris Athanasiou
14a63ac115
[ML] Improve reporting of trained model size stats (#82000)
This improves reporting of trained model size in the response of the stats API.

In particular, it removes the `model_size_bytes` from the `deployment_stats` section and
replaces it with a top-level `model_size_stats` object that contains:

- `model_size_bytes`: the actual model size
- `required_native_memory_bytes`: the amount of memory required to load a model

In addition, these are now reported for PyTorch models regardless of their deployment state.
2021-12-22 18:20:47 +02:00
Ed Savage
a646f55c57
[ML] Set default value of 30 days for model prune window (#81377)
For new jobs, when the analysis config field model_prune_window is not set, use a default value of 30 days or 20 times the bucket span, whichever is greater.

Co-authored-by: David Roberts <dave.roberts@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-12-20 11:27:30 +00:00
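The default from #81377 is the greater of two durations, as the sketch below shows. The function name is hypothetical; the rule (30 days or 20 times the bucket span, whichever is greater) is taken from the commit.

```python
from datetime import timedelta


def default_model_prune_window(bucket_span: timedelta) -> timedelta:
    # Default for new jobs without model_prune_window: max(30 days, 20 * bucket_span).
    return max(timedelta(days=30), 20 * bucket_span)
```

With a 15-minute bucket span the default stays at 30 days; with a 2-day bucket span it grows to 40 days.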
David Kyle
d1ee756da8
[ML][DOCS] Add note about max values of thread settings (#81367)
2021-12-14 13:07:34 +00:00
David Roberts
0559dd087b
[ML] Model snapshot upgrade needs a stats endpoint (#81641)
Previously the ML model snapshot upgrade endpoint did not
provide a way to reliably monitor progress. This could lead
to the upgrade assistant UI thinking that a model snapshot
upgrade had finished when it actually hadn't.

This change adds a new "stats" API that allows external
interested parties to find out the status of each model
snapshot upgrade and which node (if any) each is running on.

Fixes #81519
2021-12-14 08:31:49 +00:00