elasticsearch/docs/reference/ml
Benjamin Trent afa28d49b4
[ML] add new cache_size parameter to trained_model deployments API (#88450)
With https://github.com/elastic/ml-cpp/pull/2305, we now support caching PyTorch inference responses per node, per model.

By default, the cache is the same size as the model's size on disk. This is because our current best estimate of the memory used to deploy a model is 2 * model_size + constant_overhead.

This is because the model has to be loaded into memory twice while it is serialized to the native process.

However, once the model is in memory and accepting requests, its actual memory usage drops below what we have "reserved" for it on the node.
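As an illustrative example (the numbers are assumed, not measured): for a model that is 1 GB on disk, roughly 2 GB plus a constant overhead is reserved at deployment time, so on the order of 1 GB of that reservation sits idle once loading completes.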

Consequently, a cache layer that takes advantage of that unused (but reserved) memory is effectively free. In production, especially in search scenarios, caching inference results is critical for reducing latency.
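As a minimal sketch of how the new parameter can be supplied when starting a deployment (the model ID and the 256mb value are illustrative; omitting cache_size falls back to the default described above, the model's on-disk size):

    POST _ml/trained_models/my-pytorch-model/deployment/_start?cache_size=256mb

A smaller cache_size trades cache hit rate for a smaller memory footprint on the node; the per-node, per-model cache serves repeated inference requests without re-running the model.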
2022-07-18 09:19:01 -04:00
anomaly-detection [DOCS] Add authorization details to update datafeed API (#88099) 2022-06-28 13:43:58 -07:00
common/apis [ML] Add ML memory stats API (#83802) 2022-02-17 09:19:14 +00:00
df-analytics/apis [DOCS] Add authorization info to create, get, and update DFA jobs APIs (#88098) 2022-06-30 08:41:04 -07:00
images [DOCS] Refresh automated screenshots (#84543) 2022-03-02 09:30:07 -08:00
trained-models/apis [ML] add new cache_size parameter to trained_model deployments API (#88450) 2022-07-18 09:19:01 -04:00
ml-shared.asciidoc [DOCS] Adds settings of question_answering to inference_config of PUT and infer trained model APIs (#86895) 2022-05-19 11:04:14 +02:00