elasticsearch/docs/reference/ingest/processors/geoip.asciidoc

428 lines
14 KiB
Text
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

[[geoip-processor]]
=== GeoIP processor
++++
<titleabbrev>GeoIP</titleabbrev>
++++
The `geoip` processor adds information about the geographical location of an
IPv4 or IPv6 address.
[[geoip-automatic-updates]]
By default, the processor uses the GeoLite2 City, GeoLite2 Country, and GeoLite2
ASN GeoIP2 databases from
http://dev.maxmind.com/geoip/geoip2/geolite2/[MaxMind], shared under the
CC BY-SA 4.0 license. {es} automatically downloads updates for
these databases from the Elastic GeoIP endpoint:
https://geoip.elastic.co/v1/database. To get download statistics for these
updates, use the <<geoip-stats-api,GeoIP stats API>>.
If your cluster can't connect to the Elastic GeoIP endpoint or you want to
manage your own updates, see <<manage-geoip-database-updates>>.
If {es} can't connect to the endpoint for 30 days all updated databases will become
invalid. {es} will stop enriching documents with geoip data and will add `tags: ["_geoip_expired_database"]`
field instead.
[[using-ingest-geoip]]
==== Using the `geoip` Processor in a Pipeline
[[ingest-geoip-options]]
.`geoip` options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to get the ip address from for the geographical lookup.
| `target_field` | no | geoip | The field that will hold the geographical information looked up from the MaxMind database.
| `database_file` | no | GeoLite2-City.mmdb | The database filename referring to a database the module ships with (GeoLite2-City.mmdb, GeoLite2-Country.mmdb, or GeoLite2-ASN.mmdb) or a custom database in the `ingest-geoip` config directory.
| `properties` | no | [`continent_name`, `country_iso_code`, `country_name`, `region_iso_code`, `region_name`, `city_name`, `location`] * | Controls what properties are added to the `target_field` based on the geoip lookup.
| `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document
| `first_only` | no | `true` | If `true` only first found geoip data will be returned, even if `field` contains array
|======
*Depends on what is available in `database_file`:
* If the GeoLite2 City database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`, `latitude`, `longitude`
and `location`. The fields actually added depend on what has been found and which properties were configured in `properties`.
* If the GeoLite2 Country database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name` and `continent_name`. The fields actually added depend on what has been found and which properties
were configured in `properties`.
* If the GeoLite2 ASN database is used, then the following fields may be added under the `target_field`: `ip`,
`asn`, `organization_name` and `network`. The fields actually added depend on what has been found and which properties were configured
in `properties`.
Here is an example that uses the default city database and adds the geographical information to the `geoip` field based on the `ip` field:
[source,console]
--------------------------------------------------
PUT _ingest/pipeline/geoip
{
"description" : "Add geoip info",
"processors" : [
{
"geoip" : {
"field" : "ip"
}
}
]
}
PUT my-index-000001/_doc/my_id?pipeline=geoip
{
"ip": "89.160.20.128"
}
GET my-index-000001/_doc/my_id
--------------------------------------------------
Which returns:
[source,console-result]
--------------------------------------------------
{
"found": true,
"_index": "my-index-000001",
"_id": "my_id",
"_version": 1,
"_seq_no": 55,
"_primary_term": 1,
"_source": {
"ip": "89.160.20.128",
"geoip": {
"continent_name": "Europe",
"country_name": "Sweden",
"country_iso_code": "SE",
"city_name" : "Linköping",
"region_iso_code" : "SE-E",
"region_name" : "Östergötland County",
"location": { "lat": 58.4167, "lon": 15.6167 }
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term":1/"_primary_term" : $body._primary_term/]
Here is an example that uses the default country database and adds the
geographical information to the `geo` field based on the `ip` field. Note that
this database is included in the module. So this:
[source,console]
--------------------------------------------------
PUT _ingest/pipeline/geoip
{
"description" : "Add geoip info",
"processors" : [
{
"geoip" : {
"field" : "ip",
"target_field" : "geo",
"database_file" : "GeoLite2-Country.mmdb"
}
}
]
}
PUT my-index-000001/_doc/my_id?pipeline=geoip
{
"ip": "89.160.20.128"
}
GET my-index-000001/_doc/my_id
--------------------------------------------------
returns this:
[source,console-result]
--------------------------------------------------
{
"found": true,
"_index": "my-index-000001",
"_id": "my_id",
"_version": 1,
"_seq_no": 65,
"_primary_term": 1,
"_source": {
"ip": "89.160.20.128",
"geo": {
"continent_name": "Europe",
"country_name": "Sweden",
"country_iso_code": "SE"
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]
Not all IP addresses find geo information from the database, When this
occurs, no `target_field` is inserted into the document.
Here is an example of what documents will be indexed as when information for "80.231.5.0"
cannot be found:
[source,console]
--------------------------------------------------
PUT _ingest/pipeline/geoip
{
"description" : "Add geoip info",
"processors" : [
{
"geoip" : {
"field" : "ip"
}
}
]
}
PUT my-index-000001/_doc/my_id?pipeline=geoip
{
"ip": "80.231.5.0"
}
GET my-index-000001/_doc/my_id
--------------------------------------------------
Which returns:
[source,console-result]
--------------------------------------------------
{
"_index" : "my-index-000001",
"_id" : "my_id",
"_version" : 1,
"_seq_no" : 71,
"_primary_term": 1,
"found" : true,
"_source" : {
"ip" : "80.231.5.0"
}
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]
[[ingest-geoip-mappings-note]]
===== Recognizing Location as a Geopoint
Although this processor enriches your document with a `location` field containing
the estimated latitude and longitude of the IP address, this field will not be
indexed as a {ref}/geo-point.html[`geo_point`] type in Elasticsearch without explicitly defining it
as such in the mapping.
You can use the following mapping for the example index above:
[source,console]
--------------------------------------------------
PUT my_ip_locations
{
"mappings": {
"properties": {
"geoip": {
"properties": {
"location": { "type": "geo_point" }
}
}
}
}
}
--------------------------------------------------
////
[source,console]
--------------------------------------------------
PUT _ingest/pipeline/geoip
{
"description" : "Add geoip info",
"processors" : [
{
"geoip" : {
"field" : "ip"
}
}
]
}
PUT my_ip_locations/_doc/1?refresh=true&pipeline=geoip
{
"ip": "89.160.20.128"
}
GET /my_ip_locations/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": "1m",
"geoip.location": {
"lon": 15.6167,
"lat": 58.4167
}
}
}
}
}
}
--------------------------------------------------
// TEST[continued]
[source,console-result]
--------------------------------------------------
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value": 1,
"relation": "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_ip_locations",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"geoip" : {
"continent_name" : "Europe",
"country_name" : "Sweden",
"country_iso_code" : "SE",
"city_name" : "Linköping",
"region_iso_code" : "SE-E",
"region_name" : "Östergötland County",
"location" : {
"lon" : 15.6167,
"lat" : 58.4167
}
},
"ip" : "89.160.20.128"
}
}
]
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took" : 3/"took" : $body.took/]
////
[[manage-geoip-database-updates]]
==== Manage your own GeoIP2 database updates
If you can't <<geoip-automatic-updates,automatically update>> your GeoIP2
databases from the Elastic endpoint, you have a few other options:
* <<use-proxy-geoip-endpoint,Use a proxy endpoint>>
* <<use-custom-geoip-endpoint,Use a custom endpoint>>
* <<manually-update-geoip-databases,Manually update your GeoIP2 databases>>
[[use-proxy-geoip-endpoint]]
**Use a proxy endpoint**
If you can't connect directly to the Elastic GeoIP endpoint, consider setting up
a secure proxy. You can then specify the proxy endpoint URL in the
<<ingest-geoip-downloader-endpoint,`ingest.geoip.downloader.endpoint`>> setting
of each nodes `elasticsearch.yml` file.
In a strict setup the following domains may need to be added to the allowed
domains list:
* `geoip.elastic.co`
* `storage.googleapis.com`
[[use-custom-geoip-endpoint]]
**Use a custom endpoint**
You can create a service that mimics the Elastic GeoIP endpoint. You can then
get automatic updates from this service.
. Download your `.mmdb` database files from the
http://dev.maxmind.com/geoip/geoip2/geolite2[MaxMind site].
. Copy your database files to a single directory.
. From your {es} directory, run:
+
[source,sh]
----
./bin/elasticsearch-geoip -s my/source/dir [-t target/directory]
----
. Serve the static database files from your directory. For example, you can use
Docker to serve the files from an nginx server:
+
[source,sh]
----
docker run -v my/source/dir:/usr/share/nginx/html:ro nginx
----
. Specify the service's endpoint URL in the
<<ingest-geoip-downloader-endpoint,`ingest.geoip.downloader.endpoint`>> setting
of each nodes `elasticsearch.yml` file.
+
By default, {es} checks the endpoint for updates every three days. To use
another polling interval, use the <<cluster-update-settings,cluster update
settings API>> to set
<<ingest-geoip-downloader-poll-interval,`ingest.geoip.downloader.poll.interval`>>.
[[manually-update-geoip-databases]]
**Manually update your GeoIP2 databases**
. Use the <<cluster-update-settings,cluster update settings API>> to set
`ingest.geoip.downloader.enabled` to `false`. This disables automatic updates
that may overwrite your database changes. This also deletes all downloaded
databases.
. Download your `.mmdb` database files from the
http://dev.maxmind.com/geoip/geoip2/geolite2[MaxMind site].
+
You can also use custom city, country, and ASN `.mmdb` files. These files must
be uncompressed and use the respective `-City.mmdb`, `-Country.mmdb`, or
`-ASN.mmdb` extensions.
. On {ess} deployments upload database using
a {cloud}/ec-custom-bundles.html[custom bundle].
. On self-managed deployments copy the database files to `$ES_CONFIG/ingest-geoip`.
. In your `geoip` processors, configure the `database_file` parameter to use a
custom database file.
[[ingest-geoip-settings]]
===== Node Settings
The `geoip` processor supports the following setting:
`ingest.geoip.cache_size`::
The maximum number of results that should be cached. Defaults to `1000`.
Note that these settings are node settings and apply to all `geoip` processors, i.e. there is one cache for all defined `geoip` processors.
[[geoip-cluster-settings]]
===== Cluster settings
[[ingest-geoip-downloader-enabled]]
`ingest.geoip.downloader.enabled`::
(<<dynamic-cluster-setting,Dynamic>>, Boolean)
If `true`, {es} automatically downloads and manages updates for GeoIP2 databases
from the `ingest.geoip.downloader.endpoint`. If `false`, {es} does not download
updates and deletes all downloaded databases. Defaults to `true`.
[[ingest-geoip-downloader-endpoint]]
`ingest.geoip.downloader.endpoint`::
(<<static-cluster-setting,Static>>, string)
Endpoint URL used to download updates for GeoIP2 databases. Defaults to
`https://geoip.elastic.co/v1/database`. {es} stores downloaded database files in
each node's <<es-tmpdir,temporary directory>> at
`$ES_TMPDIR/geoip-databases/<node_id>`.
[[ingest-geoip-downloader-poll-interval]]
`ingest.geoip.downloader.poll.interval`::
(<<dynamic-cluster-setting,Dynamic>>, <<time-units,time value>>)
How often {es} checks for GeoIP2 database updates at the
`ingest.geoip.downloader.endpoint`. Must be greater than `1d` (one day). Defaults
to `3d` (three days).