If the xpack.ml.use_auto_machine_memory_percent setting is true,
and xpack.ml.max_model_memory_limit is not set then
xpack.ml.max_model_memory_limit is now considered to be set to
the largest size that could be assigned in the cluster.
This functionality will be crucial for Cloud once the Elasticsearch
startup code is setting the Elasticsearch JVM heap size. Then the
Cloud code will no longer be able to accurately set
xpack.ml.max_model_memory_limit, so will not set it at all.
Instead the Cloud code will just set
xpack.ml.use_auto_machine_memory_percent and the ML code will
calculate the appropriate maximum model_memory_limit that should
be permitted.
This commit increases the xpack.ml.max_open_jobs from 20 to 512. Additionally, it ignores nodes that cannot provide an accurate view into their native memory.
If a node does not have a view into its native memory, we ignore it for assignment.
This effectively fixes a bug with autoscaling. Autoscaling relies on jobs with adequate memory to assign jobs to nodes. If that is hampered by the xpack.ml.max_open_jobs scaling decisions are hampered.
When running ML, sometimes it is best to automatically adjust the
memory allotted for machine learning based on the nodesize
and how much space is given to the JVM
This commit adds a new static setting xpack.ml.use_auto_machine_memory_percent for
allowing this dynamic calculation. The old setting remains as a backup
just in case the limit cannot be automatically determined due to
lack of information.
Closes#63795
* Adding ESS icons to supported ES settings.
* Adding new file for supported ESS settings.
* Adding supported ESS settings for HTTP and disk-based shard allocation.
* Adding more supported settings for ESS.
* Adding descriptions for each Cloud section, plus additional settings.
* Adding new warehouse file for Cloud, plus additional settings.
* Adding node settings for Cloud.
* Adding audit settings for Cloud.
* Resolving merge conflict.
* Adding SAML settings (part 1).
* Adding SAML realm encryption and signing settings.
* Adding SAML SSL settings.
* Adding Kerberos realm settings.
* Adding OpenID Connect Realm settings.
* Adding OpenID Connect SSL settings.
* Resolving leftover Git merge markers.
* Removing Cloud settings page and link to it.
* Add link to mapping source
* Update docs/reference/docs/reindex.asciidoc
* Incorporate edit of HTTP settings
* Remove "cloud" from tag and ID
* Remove "cloud" from tag and update description
* Remove "cloud" from tag and ID
* Change "whitelists" to "specifies"
* Remove "cloud" from end tag
* Removing cloud from IDs and tags.
* Changing link reference to fix build issue.
* Adding index management page for missing settings.
* Removing warehouse file for Cloud and moving settings elsewhere.
* Clarifying true/false usage of http.detailed_errors.enabled.
* Changing underscore to dash in link to fix ci build.
This adds new plugin level circuit breaker for the ML plugin.
`model_inference` is the circuit breaker qualified name.
Right now it simply adds to the breaker when the model is loaded (and possibly breaking) and removing from the breaker when the model is unloaded.
This change introduces a new setting,
xpack.ml.process_connect_timeout, to enable
the timeout for one of the external ML processes
to connect to the ES JVM to be increased.
The timeout may need to be increased if many
processes are being started simultaneously on
the same machine. This is unlikely in clusters
with many ML nodes, as we balance the processes
across the ML nodes, but can happen in clusters
with a single ML node and a high value for
xpack.ml.node_concurrent_job_allocations.
This change does the following:
1. Makes the per-node setting xpack.ml.max_open_jobs
into a cluster-wide dynamic setting
2. Changes the job node selection to continue to use the
per-node attributes storing the maximum number of open
jobs if any node in the cluster is older than 7.1, and
use the dynamic cluster-wide setting if all nodes are on
7.1 or later
3. Changes the docs to reflect this
4. Changes the thread pools for native process communication
from fixed size to scaling, to support the dynamic nature
of xpack.ml.max_open_jobs
5. Renames the autodetect thread pool to the job comms
thread pool to make clear that it will be used for other
types of ML jobs (data frame analytics in particular)
Closes#29809
* Adding new xpack.ml.max_lazy_ml_nodes setting to docs
* Fixing docs, making it clearer what the setting does
* Adding note about external process need