* Add inference counts by NLP model to the machine learning usage stats.
* Update docs/changelog/101915.yaml
* Add inference_counts_by_model to yamlRestTest.
* Strip leading dot from internal model IDs.
* Add last access and task type to the stats by model.
* Change stats_by_model for map to list
* Simplify code.
* Fix style
**Problem:**
For historical reasons, source files for the Elasticsearch Guide's security, watcher, and Logstash API docs are housed in the `x-pack/docs` directory. This can confuse new contributors who expect Elasticsearch Guide docs to be located in `docs/reference`.
**Solution:**
- Move the security, watcher, and Logstash API doc source files to the `docs/reference` directory
- Update doc snippet tests to use security
Rel: https://github.com/elastic/platform-docs-team/issues/208
This adds a `data_lifecycle` section to the _xpack/usage API, giving basic information about data lifecycles in the cluster. The data looks something like:
```
"data_lifecycle": {
"available": true,
"enabled": true,
"lifecycle": {
"count": 1,
"default_rollover_used": true,
"retention": {
"minimum_millis": 360000,
"maximum_millis": 360000,
"average_millis": 360000.0
}
}
}
```
The previous fix (#95565) didn't work since the section was misplaced.
Note that this test runs only on snapshot build so I tested manually and the failure is now related to remote_clusters section missing.
Closes#95603
Every node (post `8.7`) collects stats from every health-api request it receives. We extend the `_xpack/usage` endpoint to expose these stats. When a node receives the request it will fan out to collect data from all other nodes, merge them and expose them. If the cluster is not fully upgraded, it will signal it with the `available` flag set to`false`.
This deprecates estimated_heap_memory_usage_bytes on model put and replaces it with model_size_bytes.
On GET, only model_size_bytes is returned unless v7 rest-api compatibility is requested.
For the ml/info API, only model_size_bytes is returned
A forward-port of: #80545
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.
Relates to #79309, #31619
We have already decided not to have xpack usage for field mappers
(see #53076). As mappings stats of all fields is already tracked
in cluster stats.
Moreover xpack usage for vector field is a quite expensive operation
(see #74974).
This removes xpack actions for vector field.
* Warn users if security is implicitly disabled
Elasticsearch has security features implicitly disabled by default for
Basic and Trial licenses, unless explicitly set in the configuration
file.
This may be good for onboarding, but it also lead to unintended insecure
clusters.
This change introduces clear warnings when security features are
implicitly disabled.
- a warning header in each REST response if security is implicitly
disabled;
- a log message during cluster boot.
Runtime fields usage is currently reported as part of the xpack feature usage API. Now that runtime fields are part of server, their corresponding stats can be moved to be part of the ordinary mapping stats exposed by the cluster stats API.
This adds additional statistics into the usage API for data frame analytics
and trained models.
For data frame analytics the added stats are:
- count of jobs by analysis type
- stats for peak_usage_bytes
For trained models the added stats are:
- counts of: total, prepackaged, other (not created by data frame analytics)
- counts by analysis type based on the inference config
- stats for estimated heap usage
- stats for estimated number of operations
* Fixing Painless tests.
* Update runtime field context to fix test cases.
* Remove watcher logging from usage API and replace test.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This commit adds the `data_frozen` node role as part of the formalization of data tiers. It also
adds the `"frozen"` phase to ILM, currently allowing the same actions as the existing cold phase.
The frozen phase is intended to be used for data even less frequently searched than the cold phase,
and will eventually be loosely tied to data using partial searchable snapshots (as oppposed to full
searchable snapshots in the cold phase).
Relates to #60848
Adds a multi_terms aggregation support. The multi terms aggregation works
very similarly to the terms aggregation but supports multiple terms. The goal
of this PR is to add the basic functionality so it is not optimized at the
moment. It will be done in follow up PRs.
Closes#65623
* Adds datetime as a date, which is necessary in setup.
* Updating field context example.
* Fixing sample data, updating context example, and updating runtime example.
* Updating field context and changing runtime field to use seats data.
* Update filter context to use the seats data.
* Updating min-should-match context to use seats data.
* Replacing last mentions of TEST[skip].
* Update usage with watcher response for build error.
* Updating usage API again for watcher.
* Third time's a charm for fixing test cases.
* Adding specific test replacement for watcher logging total.
* Change actors to keyword based on review feedback.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
In the process of developing a new implementation for the Elasticsearch Rollups functionality we came up with the concept of the aggregate metric field type.
The aggregate_metric_double field type can store the results of aggregations (currently min, max, sum, value_count and avg are supported - more to come).
This field allows us to run (min, max, sum, value_count, avg) aggregations on the container field and the field will return the correct metric depending on the aggregation that is computed.
This commit adds telemetry for our data tier formalization. This telemetry helps determine the
topology of the cluster with regard to the content, hot, warm, & cold tiers/roles.
An example of the telemetry looks like:
```
GET /_xpack/usage?human
{
...
"data_tiers" : {
"available" : true,
"enabled" : true,
"data_warm" : {
...
},
"data_cold" : {
...
},
"data_content" : {
"node_count" : 1,
"index_count" : 6,
"total_shard_count" : 6,
"primary_shard_count" : 6,
"doc_count" : 71,
"total_size" : "59.6kb",
"total_size_bytes" : 61110,
"primary_size" : "59.6kb",
"primary_size_bytes" : 61110,
"primary_shard_size_avg" : "9.9kb",
"primary_shard_size_avg_bytes" : 10185,
"primary_shard_size_median" : "8kb",
"primary_shard_size_median_bytes" : 8254,
"primary_shard_size_mad" : "7.2kb",
"primary_shard_size_mad_bytes" : 7391
},
"data_hot" : {
...
}
}
}
```
The fields are as follows:
- node_count :: number of nodes with this tier/role
- index_count :: number of indices on this tier
- total_shard_count :: total number of shards for all nodes in this tier
- primary_shard_count :: number of primary shards for all nodes in this tier
- doc_count :: number of documents for all nodes in this tier
- total_size_bytes :: total number of bytes for all shards for all nodes in this tier
- primary_size_bytes :: number of bytes for all primary shards on all nodes in this tier
- primary_shard_size_avg_bytes :: average shard size for primary shard in this tier
- primary_shard_size_median_bytes :: median shard size for primary shard in this tier
- primary_shard_size_mad_bytes :: [median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation) of shard size for primary shard in this tier
Relates to #60848
This commit adds data stream info to the `/_xpack` and `/_xpack/usage` APIs. Currently the usage is
pretty minimal, returning only the number of data streams and the number of indices currently
abstracted by a data stream:
```
...
"data_streams" : {
"available" : true,
"enabled" : true,
"data_streams" : 3,
"indices_count" : 17
}
...
```
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.
The aggregations supports the following normalizations
- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization
To specify which normalization is to be used, it can be specified
in the normalize agg's `normalizer` field.
For example:
```
{
"normalize": {
"buckets_path": <>,
"normalizer": "percent"
}
}
```
Closes#51005.
Similar to what the moving function aggregation does, except merging windows of percentiles
sketches together instead of cumulatively merging final metrics
The current consensus is that we don't need info actions for smaller items like
field mappers. We can also remove the usage action since the cluster stats API
now tracks information about mappings, like what field types are defined.