Add azure module doc

Fixes #9510
Karen Metts 2018-05-02 10:16:55 -04:00 committed by karen.metts
parent 6f471fa665
commit 89ffcd14fb
2 changed files with 464 additions and 0 deletions


@@ -156,6 +156,9 @@ include::static/arcsight-module.asciidoc[]
:edit_url: https://github.com/elastic/logstash/edit/{branch}/docs/static/netflow-module.asciidoc
include::static/netflow-module.asciidoc[]
:edit_url: https://github.com/elastic/logstash/edit/{branch}/docs/static/azure-module.asciidoc
include::static/azure-module.asciidoc[]
// Working with Filebeat Modules
:edit_url: https://github.com/elastic/logstash/edit/{branch}/docs/static/filebeat-modules.asciidoc

docs/static/azure-module.asciidoc (new file, 461 lines)

@@ -0,0 +1,461 @@
[role="xpack"]
[[azure-module]]
=== Azure Module [Experimental]
experimental[]
The https://azure.microsoft.com/en-us/overview/what-is-azure/[Microsoft Azure]
module in Logstash helps you easily integrate your Azure activity logs and SQL
diagnostic logs with the Elastic Stack.
You can monitor your Azure cloud environments and SQL DB deployments with
deep operational insights across multiple Azure subscriptions. You can explore
the health of your infrastructure in real-time, accelerating root cause analysis
and decreasing overall time to resolution. The Azure module helps you:
* Analyze infrastructure changes and authorization activity
* Identify suspicious behaviors and potential malicious actors
* Perform root-cause analysis by investigating user activity
* Monitor and optimize your SQL DB deployments
NOTE: The Logstash Azure module is an
https://www.elastic.co/products/x-pack[{xpack}] feature under the Basic License
and is therefore free to use. Please contact
mailto:monitor-azure@elastic.co[monitor-azure@elastic.co] for questions or more
information.
The Azure module uses the
{logstash-ref}/plugins-inputs-azure_event_hubs.html[Logstash Azure Event Hubs
plugin] to consume data from Azure Event Hubs. The module taps directly into the
Azure dashboard, parses and indexes events into Elasticsearch, and installs a
suite of {kib} dashboards to help you start exploring your data immediately.
[[azure-dashboards]]
==== Dashboards
These {kib} dashboards are available and ready for you to use. You can use them as they are, or tailor them to meet your needs.
===== Infrastructure activity monitoring
* *Overview*. Top-level view into your Azure operations, including info about users, resource groups, service health, access, activities, and alerts.
* *Alerts*. Alert info, including activity, alert status, and an alerts heatmap.
* *User Activity*. Info about system users, their activity, and requests.
===== SQL Database monitoring
* *SQL DB Overview*. Top-level view into your SQL Databases, including counts for databases, servers, resource groups, and subscriptions.
* *SQL DB Database View*. Detailed info about each SQL Database, including wait time, errors, DTU and storage utilization, size, and read and write input/output.
* *SQL DB Queries*. Info about SQL Database queries and performance.
==== Azure_event_hubs plugin
The Azure module uses the `azure_event_hubs` plugin. Basic understanding of the
plugin and options is helpful when you set up the Azure module. See
{logstash-ref}/plugins-inputs-azure_event_hubs.html[azure_event_hubs plugin
documentation] for more information about configurations and options.
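The module wires this plugin into a full pipeline for you, but it can help to
see the plugin on its own. Here is a minimal sketch of a standalone pipeline
that uses `azure_event_hubs` directly; the connection strings are placeholders
borrowed from the samples later in this topic, and the pipeline is an
illustration, not part of the module:
["source","shell",subs="attributes"]
-----
input {
  azure_event_hubs {
    # Placeholder connection string; the EntityPath names the Event Hub.
    event_hub_connections => ["Endpoint=sb://example1...EntityPath=insights-logs-errors"]
    threads => 8
    decorate_events => true
    consumer_group => "logstash"
    # Persists offsets between restarts (recommended for production).
    storage_connection => "DefaultEndpointsProtocol=https;AccountName=example...."
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
-----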
[[azure-module-prereqs]]
==== Prerequisites
This module requires Azure Monitor enabled with Azure Event Hubs, and the
Elastic Stack.
[[azure-elk-prereqs]]
===== Elastic prereqs
Logstash, Elasticsearch, and Kibana should be installed and running. The
products are https://www.elastic.co/downloads[available to download] and easy to
install.
The Elastic Stack version 6.4 (or later) is required for this module.
NOTE: Logstash, Elasticsearch, and Kibana can all run locally, or you can run
Elasticsearch, Kibana, and Logstash on separate hosts to consume data from Azure.
[[azure-prereqs]]
===== Azure prereqs
Azure Monitor should be configured to stream logs to one or more Event Hubs.
See <<azure-resources>> at the end of this topic for links to Microsoft Azure documentation.
[[azure-module-setup]]
==== Set up the module
Modify this command for your environment, and run it from the Logstash
directory.
["source","shell",subs="attributes"]
-----
bin/logstash --modules azure --setup \
-M "azure.var.elasticsearch.username={username}" \
-M "azure.var.elasticsearch.password={pwd}" \
-M "azure.var.kibana.username={username}" \
-M "azure.var.kibana.password={pwd}" \
-M "azure.var.elasticsearch.hosts={hostname}" \
-M "azure.var.kibana.host={hostname}"
-----
The `--modules azure` option starts a Logstash pipeline for ingestion from Azure
Event Hubs. The `--setup` option creates an `azure-*` index pattern in
Elasticsearch and imports Kibana dashboards and visualizations.
NOTE: The `--setup` option is intended only for first-time setup. If you include
`--setup` on subsequent runs, your existing Kibana dashboards will be
overwritten.
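On later runs, omit `--setup`. For example, assuming the input options are
already set in `logstash.yml`, a subsequent start might look like this sketch
(the placeholder values are illustrative):
["source","shell",subs="attributes"]
-----
bin/logstash --modules azure \
-M "azure.var.elasticsearch.hosts={hostname}" \
-M "azure.var.kibana.host={hostname}"
-----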
[[configuring-azure]]
==== Configure the module
You can specify <<azure_config_options,options>> for the Logstash Azure module in the
`logstash.yml` configuration file or with overrides through the command line.
* *Command line configuration.* You can pass configuration options and run the module from the command line.
The command line configuration uses the configuration options you provide and supports only one Event Hub.
* *Basic configuration.* You can use the `logstash.yml` file to configure inputs from multiple Event Hubs that share the same configuration.
* *Advanced configuration.* The advanced configuration is available for deployments where different Event Hubs
require different configurations. The `logstash.yml` file holds your settings. Advanced configuration is not necessary or
recommended for most use cases.
See {logstash-ref}/plugins-inputs-azure_event_hubs.html[azure_event_hubs plugin
documentation] for more information about basic and advanced configuration
models.
[[command-line-sample]]
===== Command line configuration sample
You can use the command line to set up the basic configuration for a single
Event Hub. This command starts the Azure module with command line arguments,
bypassing any settings in `logstash.yml`.
["source","shell",subs="attributes"]
-----
bin/logstash --modules azure \
-M "azure.var.elasticsearch.hosts=es.mycloud.com" \
-M "azure.var.input.azure_event_hubs.threads=8" \
-M "azure.var.input.azure_event_hubs.consumer_group=logstash" \
-M "azure.var.input.azure_event_hubs.decorate_events=true" \
-M "azure.var.input.azure_event_hubs.event_hub_connections=Endpoint=sb://example1...EntityPath=insights-logs-errors" \
-M "azure.var.input.azure_event_hubs.storage_connection=DefaultEndpointsProtocol=https;AccountName=example...."
-----
The command line configuration requires the `azure.var.input.azure_event_hubs.` prefix before a configuration option.
Notice the notation for the `threads` option.
[[basic-config-sample]]
===== Basic configuration sample
The configuration in the `logstash.yml` file is shared between Event Hubs.
["source","shell",subs="attributes"]
-----
modules:
- name: azure
var.elasticsearch.hosts: "localhost:9200"
var.kibana.host: "localhost:5601"
var.input.azure_event_hubs.threads: 8
var.input.azure_event_hubs.decorate_events: true
var.input.azure_event_hubs.consumer_group: "logstash"
var.input.azure_event_hubs.storage_connection: "DefaultEndpointsProtocol=https;AccountName=example...."
var.input.azure_event_hubs.event_hub_connections:
- "Endpoint=sb://example1...EntityPath=insights-logs-errors"
- "Endpoint=sb://example2...EntityPath=insights-metrics-pt1m"
-----
The basic configuration requires the `var.input.azure_event_hubs.` prefix
before a configuration option.
Notice the notation for the `threads` option.
[[adv-config-sample]]
===== Advanced configuration sample
Advanced configuration in the `logstash.yml` file supports Event Hub-specific options.
Advanced configuration is not necessary or recommended for most use cases. Use
it only if it is required for your deployment scenario.
You must define the `header` array with `name` in the first position. You can
define other options in any order. The per Event Hub configuration takes
precedence. Any values not defined per Event Hub use the global config value.
In this example, `consumer_group` is applied to each of the configured Event
Hubs. Note that `decorate_events` is defined in both the global and per Event Hub
configuration; the per Event Hub configuration takes precedence, and the
global value is effectively ignored.
["source","shell",subs="attributes"]
-----
modules:
- name: azure
var.elasticsearch.hosts: "localhost:9200"
var.kibana.host: "localhost:5601"
var.input.azure_event_hubs.threads: 8
var.input.azure_event_hubs.decorate_events: true
var.input.azure_event_hubs.consumer_group: logstash
var.input.azure_event_hubs.event_hubs:
- ["name", "event_hub_connection", "storage_connection", "initial_position", "decorate_events"]
- ["insights-operational-logs", "Endpoint=sb://example1...", "DefaultEndpointsProtocol=https;AccountName=example1....", "HEAD", "true"]
- ["insights-metrics-pt1m", "Endpoint=sb://example2...", "DefaultEndpointsProtocol=https;AccountName=example2....", "TAIL", "true"]
- ["insights-logs-errors", "Endpoint=sb://example3...", "DefaultEndpointsProtocol=https;AccountName=example3....", "TAIL", "false"]
- ["insights-operational-logs", "Endpoint=sb://example4...", "DefaultEndpointsProtocol=https;AccountName=example4....", "HEAD", "true"]
-----
The advanced configuration doesn't require a prefix before a per Event Hub
configuration option. Notice the notation for the `decorate_events` option.
[[azure_best_practices]]
===== Best practices
Here are some guidelines to help you avoid data conflicts that can cause lost
events.
* **Create a {ls} consumer group.**
Create a new consumer group specifically for {ls}. Do not use the `$Default` or
any other consumer group that might already be in use. Reusing consumer groups
among unrelated consumers can cause unexpected behavior and possibly lost
events. All {ls} instances should use the same consumer group so that they can
work together to process events.
* **Avoid overwriting offsets with multiple Event Hubs.**
The offsets (position) of the Event Hubs are stored in the configured Azure Blob
store. The Azure Blob store uses paths like a file system to store the offsets.
If the paths between multiple Event Hubs overlap, then the offsets may be stored
incorrectly.
To avoid duplicate file paths, use the advanced configuration model, and make
sure that at least one of these options is different per Event Hub, as shown in
the sketch after this list:
** `storage_connection`
** `storage_container` (defaults to the Event Hub name if not defined)
** `consumer_group`
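As a sketch of that guidance, this hypothetical advanced configuration reads
two Event Hubs that share the name `insights-operational-logs` and keeps their
offsets apart by giving each one its own `storage_container` (the container
names are illustrative):
["source","shell",subs="attributes"]
-----
modules:
  - name: azure
    var.elasticsearch.hosts: "localhost:9200"
    var.kibana.host: "localhost:5601"
    var.input.azure_event_hubs.consumer_group: logstash
    var.input.azure_event_hubs.event_hubs:
      - ["name", "event_hub_connection", "storage_connection", "storage_container"]
      - ["insights-operational-logs", "Endpoint=sb://example1...", "DefaultEndpointsProtocol=https;AccountName=example1....", "ops-logs-sub1"]
      - ["insights-operational-logs", "Endpoint=sb://example2...", "DefaultEndpointsProtocol=https;AccountName=example2....", "ops-logs-sub2"]
-----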
[[azure_config_options]]
===== Configuration options
NOTE: All Event Hubs options are common to both basic and advanced
configurations, with the following exceptions. The basic configuration uses
`event_hub_connections` to support multiple connections. The advanced
configuration uses `event_hubs` and `event_hub_connection` (singular).
[[azure_event_hubs]]
===== `event_hubs`
* Value type is <<array,array>>
* No default value
* Ignored for basic and command line configuration
* Required for advanced configuration
Defines the per Event Hubs configuration for the <<adv-config-sample,advanced configuration>>.
The advanced configuration uses `event_hub_connection` instead of `event_hub_connections`.
The `event_hub_connection` option is defined per Event Hub.
[[azure_event_hub_connections]]
===== `event_hub_connections`
* Value type is <<array,array>>
* No default value
* Required for basic and command line configuration
* Ignored for advanced configuration
List of connection strings that identify the Event Hubs to be read. Connection
strings include the EntityPath for the Event Hub.
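For reference, a complete (hypothetical) connection string carries the
namespace endpoint, a shared access key, and the `EntityPath` that names the
hub; only the shape matters here, and every value is a placeholder:
["source","shell",subs="attributes"]
-----
var.input.azure_event_hubs.event_hub_connections:
  - "Endpoint=sb://example-ns.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<key>;EntityPath=insights-logs-errors"
-----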
[[azure_checkpoint_interval]]
===== `checkpoint_interval`
* Value type is <<number,number>>
* Default value is `5` seconds
* Set to `0` to disable.
Interval in seconds to write checkpoints during batch processing. Checkpoints
tell {ls} where to resume processing after a restart. Checkpoints are
automatically written at the end of each batch, regardless of this setting.
Writing checkpoints too frequently can slow down processing unnecessarily.
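For example, in the basic configuration model you might relax checkpointing
for a high-volume hub (`30` is an illustrative value, not a recommendation):
["source","shell",subs="attributes"]
-----
var.input.azure_event_hubs.checkpoint_interval: 30
-----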
[[azure_consumer_group]]
===== `consumer_group`
* Value type is <<string,string>>
* Default value is `$Default`
Consumer group used to read the Event Hub(s). Create a consumer group
specifically for Logstash. Then ensure that all instances of Logstash use that
consumer group so that they can work together properly.
[[azure_decorate_events]]
===== `decorate_events`
* Value type is <<boolean,boolean>>
* Default value is `false`
Adds metadata about the Event Hub, including `Event Hub name`, `consumer_group`,
`processor_host`, `partition`, `offset`, `sequence`, `timestamp`, and `event_size`.
[[azure_initial_position]]
===== `initial_position`
* Value type is <<string,string>>
* Valid arguments are `beginning`, `end`, `look_back`
* Default value is `beginning`
When first reading from an Event Hub, start from this position:
* `beginning` reads all pre-existing events in the Event Hub
* `end` does not read any pre-existing events in the Event Hub
* `look_back` reads `end` minus a number of seconds' worth of pre-existing events.
You control the number of seconds using the `initial_position_look_back` option.
NOTE: If `storage_connection` is set, the `initial_position` value is used only
the first time Logstash reads from the Event Hub.
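As a sketch, this basic configuration pairs `look_back` with a one-hour
window, so the first read picks up only the previous hour of pre-existing
events (the window is illustrative):
["source","shell",subs="attributes"]
-----
var.input.azure_event_hubs.initial_position: look_back
var.input.azure_event_hubs.initial_position_look_back: 3600
-----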
[[azure_initial_position_look_back]]
===== `initial_position_look_back`
* Value type is <<number,number>>
* Default value is `86400`
* Used only if `initial_position` is set to `look_back`
Number of seconds to look back to find the initial position for pre-existing
events. This option is used only if `initial_position` is set to `look_back`. If
`storage_connection` is set, this configuration applies only the first time {ls}
reads from the Event Hub.
[[azure_max_batch_size]]
===== `max_batch_size`
* Value type is <<number,number>>
* Default value is `125`
Maximum number of events retrieved and processed together. A checkpoint is
created after each batch. Increasing this value may help with performance, but
requires more memory.
[[azure_storage_connection]]
===== `storage_connection`
* Value type is <<string,string>>
* No default value
Connection string for blob account storage. Blob account storage persists the
offsets between restarts, and ensures that multiple instances of Logstash
process different partitions.
When this value is set, restarts resume where processing left off.
When this value is not set, the `initial_position` value is used on every restart.
We strongly recommend that you define this value for production environments.
[[azure_storage_container]]
===== `storage_container`
* Value type is <<string,string>>
* Defaults to the Event Hub name if not defined
Name of the storage container used to persist offsets and allow multiple instances of {ls}
to work together.
To avoid overwriting offsets, you can use different storage containers. This is
particularly important if you are monitoring two Event Hubs with the same name.
You can use the advanced configuration model to configure different storage
containers.
[[azure_threads]]
===== `threads`
* Value type is <<number,number>>
* Minimum value is `2`
* Default value is `4`
Total number of threads used to process events. The value you set here applies
to all Event Hubs. Even with the advanced configuration, this value is a global
setting and can't be set per Event Hub.
include::shared-module-options.asciidoc[]
[[run-azure]]
==== Start the module
. Be sure that the `logstash.yml` file is <<configuring-azure,configured correctly>>.
. Run this command from the Logstash directory:
["source","shell",subs="attributes"]
-----
bin/logstash
-----
[[exploring-data-azure]]
==== Explore your data
When the Logstash Azure module starts receiving events, you can begin using the
packaged Kibana dashboards to explore and visualize your data.
To explore your data with Kibana:
. Open a browser to http://localhost:5601[http://localhost:5601] (username:
"elastic"; password: "{pwd}")
. Click *Dashboard*.
. Select the dashboard you want to see.
==== Azure module schema
This module reads data from the Azure Event Hub and adds some additional structure to the data for Activity Logs and SQL Diagnostics. The original data is always preserved, and any data added or parsed is namespaced under `azure`. For example, `azure.subscription` may have been parsed from a longer, more complex URN.
[cols="<,<,<",options="header",]
|=======================================================================
|Name |Description|Notes
|azure.subscription |Azure subscription from which this data originates. |Some Activity Log events may not be associated with a subscription.
|azure.group |Primary type of data. |Current values are either `activity_log` or `sql_diagnostics`
|azure.category* |Secondary type of data, specific to the group from which the data originated. |
|azure.provider |Azure provider |
|azure.resource_group |Azure resource group |
|azure.resource_type |Azure resource type |
|azure.resource_name |Azure resource name |
|azure.database |Azure database name, for display purposes |SQL Diagnostics only
|azure.db_unique_id |Azure database name that is guaranteed to be unique |SQL Diagnostics only
|azure.server |Azure server for the database |SQL Diagnostics only
|azure.server_and_database |Azure server and database combined |SQL Diagnostics only
|azure.metadata |Any @metadata added by the plugins, for example `var.input.azure_event_hubs.decorate_events: true` |
|=======================================================================
Notes:
* Activity Logs can have the following categories: Administrative, ServiceHealth, Alert, Autoscale, Security
* SQL Diagnostics can have the following categories: Metric, Blocks, Errors, Timeouts, QueryStoreRuntimeStatistics, QueryStoreWaitStatistics, DatabaseWaitStatistics, SQLInsights
Microsoft documents the Activity Log schema
https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-activity-log-schema[here].
The SQL Diagnostics data is documented
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-metrics-diag-logging[here].
Elastic does not own these data models and, as such, cannot make any
assurances that the information is accurate or will remain unchanged.
===== Special note - Properties field
Many of the logs contain a `properties` top level field. This is often where the
most interesting data lives. There is not a fixed schema between log types for
properties fields coming from different sources.
For example, two logs may both have a
`properties.type` field, but one sets it as a String type and the other as an
Integer type. To avoid mapping errors, the original properties field is moved to
`<azure.group>_<azure.category>_properties.<original_key>`.
For example
`properties.type` may end up as `sql_diagnostics_Errors_properties.type` or
`activity_log_Security_properties.type` depending on the group/category where
the event originated.
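As a hypothetical illustration (the field values are invented, not real Azure
output), an SQL Diagnostics error event would be reshaped roughly like this:
["source","shell",subs="attributes"]
-----
# Before: the original event (hypothetical)
{ "azure": { "group": "sql_diagnostics", "category": "Errors" },
  "properties": { "type": "deadlock" } }

# After: the properties field is moved under a group/category-specific key
{ "azure": { "group": "sql_diagnostics", "category": "Errors" },
  "sql_diagnostics_Errors_properties": { "type": "deadlock" } }
-----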
[[azure-production]]
==== Deploying the module in production
Use security best practices to secure your configuration.
See {stack-ov}/xpack-security.html for details and recommendations.
[[azure-resources]]
==== Microsoft Azure resources
Microsoft is the best source for the most up-to-date Azure information.
* https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-overview-azure-monitor[Overview of Azure Monitor]
* https://docs.microsoft.com/en-us/azure/sql-database/sql-database-metrics-diag-logging[Azure SQL Database metrics and diagnostics logging]
* https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-stream-activity-logs-event-hubs[Stream the Azure Activity Log to Event Hubs]