---
navigation_title: "{{infer-cap}} bucket"
mapped_pages:
  - https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-inference-bucket-aggregation.html
---

# {{infer-cap}} bucket aggregation [search-aggregations-pipeline-inference-bucket-aggregation]

A parent pipeline aggregation that loads a pre-trained model and performs {{infer}} on the collated result fields from the parent bucket aggregation.

To use the {{infer}} bucket aggregation, you need the same security privileges that are required to use the [get trained models API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-get-trained-models).

## Syntax [inference-bucket-agg-syntax]

An `inference` aggregation looks like this in isolation:

```js
{
  "inference": {
    "model_id": "a_model_for_inference", <1>
    "inference_config": { <2>
      "regression_config": {
        "num_top_feature_importance_values": 2
      }
    },
    "buckets_path": {
      "avg_cost": "avg_agg", <3>
      "max_cost": "max_agg"
    }
  }
}
```

1. The unique identifier or alias for the trained model.
2. The optional inference config, which overrides the model’s default settings.
3. Maps the value of `avg_agg` to the model’s input field `avg_cost`.

$$$inference-bucket-params$$$

| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `model_id` | The ID or alias for the trained model. | Required | - |
| `inference_config` | Contains the inference type and its options. There are two types: [`regression`](#inference-agg-regression-opt) and [`classification`](#inference-agg-classification-opt). | Optional | - |
| `buckets_path` | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model. See [`buckets_path` Syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details. | Required | - |

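
The parameters in the table can also be assembled programmatically. The following Python sketch builds the body of an `inference` aggregation as a plain dictionary and checks that the two required parameters are present. The helper name `build_inference_agg` is hypothetical and is not part of any Elasticsearch client; the model ID and aggregation names are taken from the syntax snippet.

```python
import json


def build_inference_agg(model_id, buckets_path, inference_config=None):
    """Return the body of an `inference` pipeline aggregation as a dict.

    Hypothetical helper for illustration; not part of an Elasticsearch client.
    """
    if not model_id:
        raise ValueError("model_id is required")
    if not buckets_path:
        raise ValueError("buckets_path is required")

    body = {"model_id": model_id, "buckets_path": dict(buckets_path)}
    # inference_config is optional; leave it out to use the model's defaults.
    if inference_config is not None:
        body["inference_config"] = inference_config
    return {"inference": body}


agg = build_inference_agg(
    model_id="a_model_for_inference",
    # Keys are the model's input fields, values are paths to input aggregations.
    buckets_path={"avg_cost": "avg_agg", "max_cost": "max_agg"},
)
print(json.dumps(agg, indent=2))
```

The resulting dictionary can be placed under an aggregation name next to the input aggregations it references.
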
## Configuration options for {{infer}} models [_configuration_options_for_infer_models]

The `inference_config` setting is optional and usually isn’t required, because pre-trained models come with sensible defaults. In the context of aggregations, some options can be overridden for each of the two model types.

#### Configuration options for {{regression}} models [inference-agg-regression-opt]

`num_top_feature_importance_values`
: (Optional, integer) Specifies the maximum number of [{{feat-imp}}](docs-content://explore-analyze/machine-learning/data-frame-analytics/ml-feature-importance.md) values per document. Defaults to 0, which means no {{feat-imp}} calculation occurs.

#### Configuration options for {{classification}} models [inference-agg-classification-opt]

`num_top_classes`
: (Optional, integer) Specifies the number of top class predictions to return. Defaults to 0.

`num_top_feature_importance_values`
: (Optional, integer) Specifies the maximum number of [{{feat-imp}}](docs-content://explore-analyze/machine-learning/data-frame-analytics/ml-feature-importance.md) values per document. Defaults to 0, which means no {{feat-imp}} calculation occurs.

`prediction_field_type`
: (Optional, string) Specifies the type of the predicted field to write. Valid values are: `string`, `number`, `boolean`. When `boolean` is provided, `1.0` is transformed to `true` and `0.0` to `false`.

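
As a rough illustration of the classification options listed above, the following Python sketch collects them into a dictionary and rejects a `prediction_field_type` outside the documented values. The helper `classification_options` and its client-side checks are assumptions made for this example and do not reflect how {{es}} validates the request. The returned dictionary holds only the option values; nesting it under the appropriate type key inside `inference_config` is left to the caller.

```python
VALID_PREDICTION_FIELD_TYPES = {"string", "number", "boolean"}


def classification_options(num_top_classes=0,
                           num_top_feature_importance_values=0,
                           prediction_field_type=None):
    """Return classification options as a dict after basic checks.

    Hypothetical helper; the checks are client-side assumptions.
    """
    for name, value in (
        ("num_top_classes", num_top_classes),
        ("num_top_feature_importance_values", num_top_feature_importance_values),
    ):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")

    options = {
        "num_top_classes": num_top_classes,
        "num_top_feature_importance_values": num_top_feature_importance_values,
    }
    # prediction_field_type is only included when the caller sets it.
    if prediction_field_type is not None:
        if prediction_field_type not in VALID_PREDICTION_FIELD_TYPES:
            raise ValueError(
                "prediction_field_type must be one of "
                + ", ".join(sorted(VALID_PREDICTION_FIELD_TYPES))
            )
        options["prediction_field_type"] = prediction_field_type
    return options


opts = classification_options(num_top_classes=2, prediction_field_type="boolean")
print(opts)
```
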
## Example [inference-bucket-agg-example]

The following snippet aggregates a web log by `client_ip` and extracts a number of features via metric and bucket sub-aggregations as input to the {{infer}} aggregation configured with a model trained to identify suspicious client IPs:

```console
GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "client_ip": { <1>
      "composite": {
        "sources": [
          {
            "client_ip": {
              "terms": {
                "field": "clientip"
              }
            }
          }
        ]
      },
      "aggs": { <2>
        "url_dc": {
          "cardinality": {
            "field": "url.keyword"
          }
        },
        "bytes_sum": {
          "sum": {
            "field": "bytes"
          }
        },
        "geo_src_dc": {
          "cardinality": {
            "field": "geo.src"
          }
        },
        "geo_dest_dc": {
          "cardinality": {
            "field": "geo.dest"
          }
        },
        "responses_total": {
          "value_count": {
            "field": "timestamp"
          }
        },
        "success": {
          "filter": {
            "term": {
              "response": "200"
            }
          }
        },
        "error404": {
          "filter": {
            "term": {
              "response": "404"
            }
          }
        },
        "error503": {
          "filter": {
            "term": {
              "response": "503"
            }
          }
        },
        "malicious_client_ip": { <3>
          "inference": {
            "model_id": "malicious_clients_model",
            "buckets_path": {
              "response_count": "responses_total",
              "url_dc": "url_dc",
              "bytes_sum": "bytes_sum",
              "geo_src_dc": "geo_src_dc",
              "geo_dest_dc": "geo_dest_dc",
              "success": "success._count",
              "error404": "error404._count",
              "error503": "error503._count"
            }
          }
        }
      }
    }
  }
}
```

1. A composite bucket aggregation that aggregates the data by `client_ip`.
2. A series of metric and bucket sub-aggregations.
3. {{infer-cap}} bucket aggregation that specifies the trained model and maps the aggregation names to the model’s input fields.
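
The `buckets_path` in the example follows a simple pattern: single-value metric sub-aggregations are referenced by name, while `filter` sub-aggregations are referenced through their document count (`._count`). The Python sketch below derives such a mapping from a dictionary of sub-aggregations. The helper `derive_buckets_path` and its `renames` argument are hypothetical and cover only the aggregation types used in the example.

```python
def derive_buckets_path(sub_aggs, renames=None):
    """Map model input fields to bucket paths for the given sub-aggregations.

    Hypothetical helper. `renames` maps an aggregation name to the model's
    input field name when the two differ.
    """
    renames = renames or {}
    paths = {}
    for agg_name, definition in sub_aggs.items():
        if "inference" in definition:
            continue  # skip the inference aggregation itself
        # A filter aggregation yields a bucket, so reference its doc count;
        # single-value metric aggregations are referenced by name.
        path = f"{agg_name}._count" if "filter" in definition else agg_name
        paths[renames.get(agg_name, agg_name)] = path
    return paths


# A subset of the sub-aggregations from the example request.
sub_aggs = {
    "url_dc": {"cardinality": {"field": "url.keyword"}},
    "responses_total": {"value_count": {"field": "timestamp"}},
    "success": {"filter": {"term": {"response": "200"}}},
    "error404": {"filter": {"term": {"response": "404"}}},
}
buckets_path = derive_buckets_path(
    sub_aggs, renames={"responses_total": "response_count"}
)
print(buckets_path)
```
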