Mirror of https://github.com/elastic/elasticsearch.git (synced 2025-06-28 17:34:17 -04:00)
This refactor improves memory efficiency by processing inference requests in batches, capped by a max input length.

Changes include:
- A new dynamic operator setting to control the maximum batch size in bytes.
- Dropping input data from inference responses when the legacy semantic text format isn’t used, saving memory.
- Clearing inference results dynamically after each bulk item to free up memory sooner.

This is a step toward enabling circuit breakers to better handle memory usage when dealing with large inputs.
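
The byte-capped batching described above can be illustrated with a minimal Python sketch. It is not the code from this change (the filter itself is part of Elasticsearch, which is written in Java); the function name `batch_by_bytes` and the parameter `max_batch_bytes` are hypothetical, and the handling of an input larger than the cap (emitted alone in its own batch) is an assumption, since the commit message does not say what happens in that case.

```python
from typing import Iterable, Iterator, List


def batch_by_bytes(inputs: Iterable[str], max_batch_bytes: int) -> Iterator[List[str]]:
    """Group inputs into batches whose total UTF-8 size stays within max_batch_bytes.

    Assumption (not from the source): an input that is larger than the cap
    is emitted alone in its own batch instead of being split or rejected.
    """
    if max_batch_bytes <= 0:
        raise ValueError("max_batch_bytes must be positive")

    batch: List[str] = []
    batch_bytes = 0
    for text in inputs:
        size = len(text.encode("utf-8"))
        # Flush the current batch before it would exceed the cap.
        if batch and batch_bytes + size > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(text)
        batch_bytes += size
    # Emit whatever is left once the inputs are exhausted.
    if batch:
        yield batch


if __name__ == "__main__":
    for b in batch_by_bytes(["aaaa", "bb", "cccccc", "d"], max_batch_bytes=6):
        print(b)
```

Because the function is a generator, each batch can be sent and its results released before the next one is built, which is the general idea behind freeing memory sooner.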
5 lines
120 B
YAML
pr: 124313
summary: Optimize memory usage in `ShardBulkInferenceActionFilter`
area: Search
type: enhancement
issues: []