[[geoip-processor]] === GeoIP processor ++++ GeoIP ++++ The `geoip` processor adds information about the geographical location of an IPv4 or IPv6 address. [[geoip-automatic-updates]] By default, the processor uses the GeoLite2 City, GeoLite2 Country, and GeoLite2 ASN IP geolocation databases from http://dev.maxmind.com/geoip/geoip2/geolite2/[MaxMind], shared under the CC BY-SA 4.0 license. It automatically downloads these databases if your nodes can connect to `storage.googleapis.com` domain and either: * `ingest.geoip.downloader.eager.download` is set to true * your cluster has at least one pipeline with a `geoip` processor {es} automatically downloads updates for these databases from the Elastic GeoIP endpoint: https://geoip.elastic.co/v1/database?elastic_geoip_service_tos=agree[https://geoip.elastic.co/v1/database]. To get download statistics for these updates, use the <>. If your cluster can't connect to the Elastic GeoIP endpoint or you want to manage your own updates, see <>. If {es} can't connect to the endpoint for 30 days all updated databases will become invalid. {es} will stop enriching documents with geoip data and will add `tags: ["_geoip_expired_database"]` field instead. [[using-ingest-geoip]] ==== Using the `geoip` Processor in a Pipeline [[ingest-geoip-options]] .`geoip` options [options="header"] |====== | Name | Required | Default | Description | `field` | yes | - | The field to get the ip address from for the geographical lookup. | `target_field` | no | geoip | The field that will hold the geographical information looked up from the MaxMind database. | `database_file` | no | GeoLite2-City.mmdb | The database filename referring to one of the automatically downloaded GeoLite2 databases (GeoLite2-City.mmdb, GeoLite2-Country.mmdb, or GeoLite2-ASN.mmdb) or the name of a supported database file in the `ingest-geoip` config directory. | `properties` | no | [`continent_name`, `country_iso_code`, `country_name`, `region_iso_code`, `region_name`, `city_name`, `location`] * | Controls what properties are added to the `target_field` based on the geoip lookup. | `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document | `first_only` | no | `true` | If `true` only first found geoip data will be returned, even if `field` contains array | `download_database_on_pipeline_creation` | no | `true` | If `true` (and if `ingest.geoip.downloader.eager.download` is `false`), the missing database is downloaded when the pipeline is created. Else, the download is triggered by when the pipeline is used as the `default_pipeline` or `final_pipeline` in an index. |====== *Depends on what is available in `database_file`: * If a GeoLite2 City or GeoIP2 City database is used, then the following fields may be added under the `target_field`: `ip`, `country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`, and `location`. The fields actually added depend on what has been found and which properties were configured in `properties`. * If a GeoLite2 Country or GeoIP2 Country database is used, then the following fields may be added under the `target_field`: `ip`, `country_iso_code`, `country_name` and `continent_name`. The fields actually added depend on what has been found and which properties were configured in `properties`. * If the GeoLite2 ASN database is used, then the following fields may be added under the `target_field`: `ip`, `asn`, `organization_name` and `network`. The fields actually added depend on what has been found and which properties were configured in `properties`. * If the GeoIP2 Anonymous IP database is used, then the following fields may be added under the `target_field`: `ip`, `hosting_provider`, `tor_exit_node`, `anonymous_vpn`, `anonymous`, `public_proxy`, and `residential_proxy`. The fields actually added depend on what has been found and which properties were configured in `properties`. * If the GeoIP2 Domain database is used, then the following fields may be added under the `target_field`: `ip`, and `domain`. * If the GeoIP2 Enterprise database is used, then the following fields may be added under the `target_field`: `ip`, `country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`, `location`, `asn`, `organization_name`, `network`, `hosting_provider`, `tor_exit_node`, `anonymous_vpn`, `anonymous`, `public_proxy`, and `residential_proxy`. The fields actually added depend on what has been found and which properties were configured in `properties`. Here is an example that uses the default city database and adds the geographical information to the `geoip` field based on the `ip` field: [source,console] -------------------------------------------------- PUT _ingest/pipeline/geoip { "description" : "Add geoip info", "processors" : [ { "geoip" : { "field" : "ip" } } ] } PUT my-index-000001/_doc/my_id?pipeline=geoip { "ip": "89.160.20.128" } GET my-index-000001/_doc/my_id -------------------------------------------------- Which returns: [source,console-result] -------------------------------------------------- { "found": true, "_index": "my-index-000001", "_id": "my_id", "_version": 1, "_seq_no": 55, "_primary_term": 1, "_source": { "ip": "89.160.20.128", "geoip": { "continent_name": "Europe", "country_name": "Sweden", "country_iso_code": "SE", "city_name" : "Linköping", "region_iso_code" : "SE-E", "region_name" : "Östergötland County", "location": { "lat": 58.4167, "lon": 15.6167 } } } } -------------------------------------------------- // TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term":1/"_primary_term" : $body._primary_term/] Here is an example that uses the default country database and adds the geographical information to the `geo` field based on the `ip` field. Note that this database is downloaded automatically. So this: [source,console] -------------------------------------------------- PUT _ingest/pipeline/geoip { "description" : "Add geoip info", "processors" : [ { "geoip" : { "field" : "ip", "target_field" : "geo", "database_file" : "GeoLite2-Country.mmdb" } } ] } PUT my-index-000001/_doc/my_id?pipeline=geoip { "ip": "89.160.20.128" } GET my-index-000001/_doc/my_id -------------------------------------------------- returns this: [source,console-result] -------------------------------------------------- { "found": true, "_index": "my-index-000001", "_id": "my_id", "_version": 1, "_seq_no": 65, "_primary_term": 1, "_source": { "ip": "89.160.20.128", "geo": { "continent_name": "Europe", "country_name": "Sweden", "country_iso_code": "SE" } } } -------------------------------------------------- // TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/] Not all IP addresses find geo information from the database, When this occurs, no `target_field` is inserted into the document. Here is an example of what documents will be indexed as when information for "80.231.5.0" cannot be found: [source,console] -------------------------------------------------- PUT _ingest/pipeline/geoip { "description" : "Add geoip info", "processors" : [ { "geoip" : { "field" : "ip" } } ] } PUT my-index-000001/_doc/my_id?pipeline=geoip { "ip": "80.231.5.0" } GET my-index-000001/_doc/my_id -------------------------------------------------- Which returns: [source,console-result] -------------------------------------------------- { "_index" : "my-index-000001", "_id" : "my_id", "_version" : 1, "_seq_no" : 71, "_primary_term": 1, "found" : true, "_source" : { "ip" : "80.231.5.0" } } -------------------------------------------------- // TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/] [[ingest-geoip-mappings-note]] ===== Recognizing Location as a Geopoint Although this processor enriches your document with a `location` field containing the estimated latitude and longitude of the IP address, this field will not be indexed as a {ref}/geo-point.html[`geo_point`] type in Elasticsearch without explicitly defining it as such in the mapping. You can use the following mapping for the example index above: [source,console] -------------------------------------------------- PUT my_ip_locations { "mappings": { "properties": { "geoip": { "properties": { "location": { "type": "geo_point" } } } } } } -------------------------------------------------- //// [source,console] -------------------------------------------------- PUT _ingest/pipeline/geoip { "description" : "Add geoip info", "processors" : [ { "geoip" : { "field" : "ip" } } ] } PUT my_ip_locations/_doc/1?refresh=true&pipeline=geoip { "ip": "89.160.20.128" } GET /my_ip_locations/_search { "query": { "bool": { "must": { "match_all": {} }, "filter": { "geo_distance": { "distance": "1m", "geoip.location": { "lon": 15.6167, "lat": 58.4167 } } } } } } -------------------------------------------------- // TEST[continued] [source,console-result] -------------------------------------------------- { "took" : 3, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value": 1, "relation": "eq" }, "max_score" : 1.0, "hits" : [ { "_index" : "my_ip_locations", "_id" : "1", "_score" : 1.0, "_source" : { "geoip" : { "continent_name" : "Europe", "country_name" : "Sweden", "country_iso_code" : "SE", "city_name" : "Linköping", "region_iso_code" : "SE-E", "region_name" : "Östergötland County", "location" : { "lon" : 15.6167, "lat" : 58.4167 } }, "ip" : "89.160.20.128" } } ] } } -------------------------------------------------- // TESTRESPONSE[s/"took" : 3/"took" : $body.took/] //// [[manage-geoip-database-updates]] ==== Manage your own IP geolocation database updates If you can't <> your IP geolocation databases from the Elastic endpoint, you have a few other options: * <> * <> * <> [[use-proxy-geoip-endpoint]] **Use a proxy endpoint** If you can't connect directly to the Elastic GeoIP endpoint, consider setting up a secure proxy. You can then specify the proxy endpoint URL in the <> setting of each node’s `elasticsearch.yml` file. In a strict setup the following domains may need to be added to the allowed domains list: * `geoip.elastic.co` * `storage.googleapis.com` [[use-custom-geoip-endpoint]] **Use a custom endpoint** You can create a service that mimics the Elastic GeoIP endpoint. You can then get automatic updates from this service. . Download your `.mmdb` database files from the http://dev.maxmind.com/geoip/geoip2/geolite2[MaxMind site]. . Copy your database files to a single directory. . From your {es} directory, run: + [source,sh] ---- ./bin/elasticsearch-geoip -s my/source/dir [-t target/directory] ---- . Serve the static database files from your directory. For example, you can use Docker to serve the files from an nginx server: + [source,sh] ---- docker run -v my/source/dir:/usr/share/nginx/html:ro nginx ---- . Specify the service's endpoint URL in the <> setting of each node’s `elasticsearch.yml` file. + By default, {es} checks the endpoint for updates every three days. To use another polling interval, use the <> to set <>. [[manually-update-geoip-databases]] **Manually update your IP geolocation databases** . Use the <> to set `ingest.geoip.downloader.enabled` to `false`. This disables automatic updates that may overwrite your database changes. This also deletes all downloaded databases. . Download your `.mmdb` database files from the http://dev.maxmind.com/geoip/geoip2/geolite2[MaxMind site]. + You can also use custom city, country, and ASN `.mmdb` files. These files must be uncompressed. The type (city, country, or ASN) will be pulled from the file metadata, so the filename does not matter. . On {ess} deployments upload database using a {cloud}/ec-custom-bundles.html[custom bundle]. . On self-managed deployments copy the database files to `$ES_CONFIG/ingest-geoip`. . In your `geoip` processors, configure the `database_file` parameter to use a custom database file. [[ingest-geoip-settings]] ===== Node Settings The `geoip` processor supports the following setting: `ingest.geoip.cache_size`:: The maximum number of results that should be cached. Defaults to `1000`. Note that these settings are node settings and apply to all `geoip` processors, i.e. there is one cache for all defined `geoip` processors. [[geoip-cluster-settings]] ===== Cluster settings [[ingest-geoip-downloader-enabled]] `ingest.geoip.downloader.enabled`:: (<>, Boolean) If `true`, {es} automatically downloads and manages updates for IP geolocation databases from the `ingest.geoip.downloader.endpoint`. If `false`, {es} does not download updates and deletes all downloaded databases. Defaults to `true`. [[ingest-geoip-downloader-eager-download]] `ingest.geoip.downloader.eager.download`:: (<>, Boolean) If `true`, {es} downloads IP geolocation databases immediately, regardless of whether a pipeline exists with a geoip processor. If `false`, {es} only begins downloading the databases if a pipeline with a geoip processor exists or is added. Defaults to `false`. [[ingest-geoip-downloader-endpoint]] `ingest.geoip.downloader.endpoint`:: (<>, string) Endpoint URL used to download updates for IP geolocation databases. For example, `https://myDomain.com/overview.json`. Defaults to `https://geoip.elastic.co/v1/database`. {es} stores downloaded database files in each node's <> at `$ES_TMPDIR/geoip-databases/`. Note that {es} will make a GET request to `${ingest.geoip.downloader.endpoint}?elastic_geoip_service_tos=agree`, expecting the list of metadata about databases typically found in `overview.json`. The GeoIP downloader uses the JDK's builtin cacerts. If you're using a custom endpoint, add the custom https endpoint cacert(s) to the JDK's truststore. [[ingest-geoip-downloader-poll-interval]] `ingest.geoip.downloader.poll.interval`:: (<>, <>) How often {es} checks for IP geolocation database updates at the `ingest.geoip.downloader.endpoint`. Must be greater than `1d` (one day). Defaults to `3d` (three days).