elasticsearch/docs/reference/aggregations/search-aggregations-pipeline-inference-bucket-aggregation.md
Colleen McGinnis 9bcd59596d
[docs] Prepare for docs-assembler (#125118)
* reorg files for docs-assembler and create toc.yml files

* fix build error, add redirects

* only toc

* move images
2025-03-20 12:09:12 -05:00


navigation_title: {{infer-cap}} bucket
mapped_pages: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-inference-bucket-aggregation.html

{{infer-cap}} bucket aggregation [search-aggregations-pipeline-inference-bucket-aggregation]

A parent pipeline aggregation which loads a pre-trained model and performs {{infer}} on the collated result fields from the parent bucket aggregation.

To use the {{infer}} bucket aggregation, you need to have the same security privileges that are required for using the get trained models API.

Syntax [inference-bucket-agg-syntax]

An inference aggregation looks like this in isolation:

{
  "inference": {
    "model_id": "a_model_for_inference", <1>
    "inference_config": { <2>
      "regression_config": {
        "num_top_feature_importance_values": 2
      }
    },
    "buckets_path": {
      "avg_cost": "avg_agg", <3>
      "max_cost": "max_agg"
    }
  }
}
  1. The unique identifier or alias for the trained model.
  2. The optional inference config, which overrides the model's default settings.
  3. Maps the value of avg_agg to the model's input field avg_cost.

$$$inference-bucket-params$$$

| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| model_id | The ID or alias for the trained model. | Required | - |
| inference_config | Contains the inference type and its options. There are two types: regression and classification. | Optional | - |
| buckets_path | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model. See buckets_path Syntax for more details. | Required | - |
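
The parameters above can also be assembled programmatically. The following is a minimal Python sketch, not part of any Elasticsearch client: the helper name build_inference_agg is hypothetical, and it only builds the request body as a plain dictionary. It rejects a missing model_id or buckets_path and leaves inference_config out when none is supplied.

```python
from typing import Any, Dict, Optional


def build_inference_agg(
    model_id: str,
    buckets_path: Dict[str, str],
    inference_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the body of an inference bucket aggregation (hypothetical helper)."""
    # model_id and buckets_path are the two required parameters.
    if not model_id:
        raise ValueError("model_id is required")
    if not buckets_path:
        raise ValueError("buckets_path is required")

    body: Dict[str, Any] = {
        "model_id": model_id,
        "buckets_path": dict(buckets_path),
    }
    # inference_config is optional, so it is only added when provided.
    if inference_config is not None:
        body["inference_config"] = inference_config
    return {"inference": body}


# Rebuilds the syntax example shown earlier on this page.
example_agg = build_inference_agg(
    model_id="a_model_for_inference",
    buckets_path={"avg_cost": "avg_agg", "max_cost": "max_agg"},
    inference_config={
        "regression_config": {"num_top_feature_importance_values": 2}
    },
)
```

The resulting dictionary can be serialized with json.dumps and placed under an aggregation name in the aggs section of a search request.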

Configuration options for {{infer}} models [_configuration_options_for_infer_models]

The inference_config setting is optional and usually isn't required, because the pre-trained models come equipped with sensible defaults. In the context of aggregations, some options can be overridden for each of the two types of model.

Configuration options for {{regression}} models [inference-agg-regression-opt]

num_top_feature_importance_values
(Optional, integer) Specifies the maximum number of {{feat-imp}} values per document. By default, it is zero and no {{feat-imp}} calculation occurs.

Configuration options for {{classification}} models [inference-agg-classification-opt]

num_top_classes
(Optional, integer) Specifies the number of top class predictions to return. Defaults to 0.
num_top_feature_importance_values
(Optional, integer) Specifies the maximum number of {{feat-imp}} values per document. Defaults to 0, which means no {{feat-imp}} calculation occurs.
prediction_field_type
(Optional, string) Specifies the type of the predicted field to write. Valid values are: string, number, boolean. When boolean is provided, 1.0 is transformed to true and 0.0 to false.
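
As a client-side illustration of the classification options above, the Python sketch below checks an options dictionary before it is sent and mirrors the boolean transformation described for prediction_field_type. The function names validate_classification_options and to_boolean_prediction are hypothetical, and raising an error for values other than 1.0 and 0.0 is an assumption of this sketch. Only the option names, their types, and the valid prediction_field_type values come from this page.

```python
from typing import Any, Dict

VALID_PREDICTION_FIELD_TYPES = ("string", "number", "boolean")
INTEGER_OPTIONS = ("num_top_classes", "num_top_feature_importance_values")


def validate_classification_options(options: Dict[str, Any]) -> Dict[str, Any]:
    """Check option names and types for a classification model (hypothetical helper)."""
    known = set(INTEGER_OPTIONS) | {"prediction_field_type"}
    unknown = sorted(set(options) - known)
    if unknown:
        raise ValueError(f"unknown classification options: {unknown}")

    for name in INTEGER_OPTIONS:
        value = options.get(name)
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            raise ValueError(f"{name} must be an integer")

    field_type = options.get("prediction_field_type")
    if field_type is not None and field_type not in VALID_PREDICTION_FIELD_TYPES:
        raise ValueError(
            f"prediction_field_type must be one of {VALID_PREDICTION_FIELD_TYPES}"
        )
    return options


def to_boolean_prediction(value: float) -> bool:
    """Mirror the documented rule: 1.0 becomes true and 0.0 becomes false."""
    if value == 1.0:
        return True
    if value == 0.0:
        return False
    # Assumption of this sketch: any other value is treated as an error.
    raise ValueError(f"cannot convert {value!r} to a boolean prediction")


checked = validate_classification_options(
    {"num_top_classes": 2, "prediction_field_type": "boolean"}
)
```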

Example [inference-bucket-agg-example]

The following snippet aggregates a web log by client_ip and extracts a number of features via metric and bucket sub-aggregations as input to the {{infer}} aggregation configured with a model trained to identify suspicious client IPs:

GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "client_ip": { <1>
      "composite": {
        "sources": [
          {
            "client_ip": {
              "terms": {
                "field": "clientip"
              }
            }
          }
        ]
      },
      "aggs": { <2>
        "url_dc": {
          "cardinality": {
            "field": "url.keyword"
          }
        },
        "bytes_sum": {
          "sum": {
            "field": "bytes"
          }
        },
        "geo_src_dc": {
          "cardinality": {
            "field": "geo.src"
          }
        },
        "geo_dest_dc": {
          "cardinality": {
            "field": "geo.dest"
          }
        },
        "responses_total": {
          "value_count": {
            "field": "timestamp"
          }
        },
        "success": {
          "filter": {
            "term": {
              "response": "200"
            }
          }
        },
        "error404": {
          "filter": {
            "term": {
              "response": "404"
            }
          }
        },
        "error503": {
          "filter": {
            "term": {
              "response": "503"
            }
          }
        },
        "malicious_client_ip": { <3>
          "inference": {
            "model_id": "malicious_clients_model",
            "buckets_path": {
              "response_count": "responses_total",
              "url_dc": "url_dc",
              "bytes_sum": "bytes_sum",
              "geo_src_dc": "geo_src_dc",
              "geo_dest_dc": "geo_dest_dc",
              "success": "success._count",
              "error404": "error404._count",
              "error503": "error503._count"
            }
          }
        }
      }
    }
  }
}
  1. A composite bucket aggregation that aggregates the data by client_ip.
  2. A series of metric and bucket sub-aggregations.
  3. {{infer-cap}} bucket aggregation that specifies the trained model and maps the aggregation names to the model's input fields.
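
To make the mapping in the last callout concrete, the Python sketch below shows how each buckets_path entry could be resolved against a single bucket of the composite aggregation to produce the model's input fields. It illustrates the idea only and is not the server's implementation: the function name resolve_model_inputs and the sample bucket values are hypothetical. It assumes that single-value metric aggregations expose their result under value and that the _count suffix refers to a bucket aggregation's doc_count.

```python
from typing import Any, Dict


def resolve_model_inputs(
    bucket: Dict[str, Any], buckets_path: Dict[str, str]
) -> Dict[str, Any]:
    """Map sub-aggregation results in one bucket to model input fields (sketch)."""
    inputs: Dict[str, Any] = {}
    for model_field, path in buckets_path.items():
        # "success._count" splits into the aggregation name and a suffix.
        agg_name, _, suffix = path.partition(".")
        agg_result = bucket[agg_name]
        if suffix == "_count":
            # Document count of a bucket aggregation such as filter.
            inputs[model_field] = agg_result["doc_count"]
        elif suffix == "":
            # Single-value metric such as sum, cardinality, or value_count.
            inputs[model_field] = agg_result["value"]
        else:
            raise ValueError(f"path not handled by this sketch: {path}")
    return inputs


# Hypothetical bucket, shaped like one entry of the client_ip aggregation.
sample_bucket = {
    "key": {"client_ip": "203.0.113.7"},
    "doc_count": 12,
    "url_dc": {"value": 4},
    "bytes_sum": {"value": 51234.0},
    "responses_total": {"value": 12},
    "success": {"doc_count": 9},
    "error404": {"doc_count": 3},
}

model_inputs = resolve_model_inputs(
    sample_bucket,
    {
        "response_count": "responses_total",
        "url_dc": "url_dc",
        "bytes_sum": "bytes_sum",
        "success": "success._count",
        "error404": "error404._count",
    },
)
```

Note that the keys of buckets_path are the model's field names, so responses_total reaches the model as response_count.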