mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-24 23:27:25 -04:00
453 lines
16 KiB
Text
453 lines
16 KiB
Text
[[geoip-processor]]
|
||
=== GeoIP processor
|
||
++++
|
||
<titleabbrev>GeoIP</titleabbrev>
|
||
++++
|
||
|
||
The `geoip` processor adds information about the geographical location of an
|
||
IPv4 or IPv6 address.
|
||
|
||
[[geoip-automatic-updates]]
|
||
By default, the processor uses the GeoLite2 City, GeoLite2 Country, and GeoLite2
|
||
ASN IP geolocation databases from http://dev.maxmind.com/geoip/geoip2/geolite2/[MaxMind], shared under the
|
||
CC BY-SA 4.0 license. It automatically downloads these databases if your nodes can connect to `storage.googleapis.com` domain and either:
|
||
|
||
* `ingest.geoip.downloader.eager.download` is set to true
|
||
* your cluster has at least one pipeline with a `geoip` processor
|
||
|
||
{es} automatically downloads updates for these databases from the Elastic GeoIP
|
||
endpoint:
|
||
https://geoip.elastic.co/v1/database?elastic_geoip_service_tos=agree[https://geoip.elastic.co/v1/database].
|
||
To get download statistics for these updates, use the <<geoip-stats-api,GeoIP
|
||
stats API>>.
|
||
|
||
If your cluster can't connect to the Elastic GeoIP endpoint or you want to
|
||
manage your own updates, see <<manage-geoip-database-updates>>.
|
||
|
||
If {es} can't connect to the endpoint for 30 days all updated databases will become
|
||
invalid. {es} will stop enriching documents with geoip data and will add `tags: ["_geoip_expired_database"]`
|
||
field instead.
|
||
|
||
[[using-ingest-geoip]]
|
||
==== Using the `geoip` Processor in a Pipeline
|
||
|
||
[[ingest-geoip-options]]
|
||
.`geoip` options
|
||
[options="header"]
|
||
|======
|
||
| Name | Required | Default | Description
|
||
| `field` | yes | - | The field to get the ip address from for the geographical lookup.
|
||
| `target_field` | no | geoip | The field that will hold the geographical information looked up from the MaxMind database.
|
||
| `database_file` | no | GeoLite2-City.mmdb | The database filename referring to one of the automatically downloaded GeoLite2 databases (GeoLite2-City.mmdb, GeoLite2-Country.mmdb, or GeoLite2-ASN.mmdb) or the name of a supported database file in the `ingest-geoip` config directory.
|
||
| `properties` | no | [`continent_name`, `country_iso_code`, `country_name`, `region_iso_code`, `region_name`, `city_name`, `location`] * | Controls what properties are added to the `target_field` based on the geoip lookup.
|
||
| `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document
|
||
| `first_only` | no | `true` | If `true` only first found geoip data will be returned, even if `field` contains array
|
||
| `download_database_on_pipeline_creation` | no | `true` | If `true` (and if `ingest.geoip.downloader.eager.download` is `false`), the missing database is downloaded when the pipeline is created. Else, the download is triggered by when the pipeline is used as the `default_pipeline` or `final_pipeline` in an index.
|
||
|======
|
||
|
||
*Depends on what is available in `database_file`:
|
||
|
||
* If a GeoLite2 City or GeoIP2 City database is used, then the following fields may be added under the `target_field`: `ip`,
|
||
`country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`,
|
||
and `location`. The fields actually added depend on what has been found and which properties were configured in `properties`.
|
||
* If a GeoLite2 Country or GeoIP2 Country database is used, then the following fields may be added under the `target_field`: `ip`,
|
||
`country_iso_code`, `country_name` and `continent_name`. The fields actually added depend on what has been found and which properties
|
||
were configured in `properties`.
|
||
* If the GeoLite2 ASN database is used, then the following fields may be added under the `target_field`: `ip`,
|
||
`asn`, `organization_name` and `network`. The fields actually added depend on what has been found and which properties were configured
|
||
in `properties`.
|
||
* If the GeoIP2 Anonymous IP database is used, then the following fields may be added under the `target_field`: `ip`,
|
||
`hosting_provider`, `tor_exit_node`, `anonymous_vpn`, `anonymous`, `public_proxy`, and `residential_proxy`. The fields actually added
|
||
depend on what has been found and which properties were configured in `properties`.
|
||
* If the GeoIP2 Domain database is used, then the following fields may be added under the `target_field`: `ip`, and `domain`.
|
||
* If the GeoIP2 Enterprise database is used, then the following fields may be added under the `target_field`: `ip`,
|
||
`country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`, `location`, `asn`,
|
||
`organization_name`, `network`, `hosting_provider`, `tor_exit_node`, `anonymous_vpn`, `anonymous`, `public_proxy`, and `residential_proxy`.
|
||
The fields actually added depend on what has been found and which properties were configured in `properties`.
|
||
|
||
|
||
Here is an example that uses the default city database and adds the geographical information to the `geoip` field based on the `ip` field:
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
PUT _ingest/pipeline/geoip
|
||
{
|
||
"description" : "Add geoip info",
|
||
"processors" : [
|
||
{
|
||
"geoip" : {
|
||
"field" : "ip"
|
||
}
|
||
}
|
||
]
|
||
}
|
||
PUT my-index-000001/_doc/my_id?pipeline=geoip
|
||
{
|
||
"ip": "89.160.20.128"
|
||
}
|
||
GET my-index-000001/_doc/my_id
|
||
--------------------------------------------------
|
||
|
||
Which returns:
|
||
|
||
[source,console-result]
|
||
--------------------------------------------------
|
||
{
|
||
"found": true,
|
||
"_index": "my-index-000001",
|
||
"_id": "my_id",
|
||
"_version": 1,
|
||
"_seq_no": 55,
|
||
"_primary_term": 1,
|
||
"_source": {
|
||
"ip": "89.160.20.128",
|
||
"geoip": {
|
||
"continent_name": "Europe",
|
||
"country_name": "Sweden",
|
||
"country_iso_code": "SE",
|
||
"city_name" : "Linköping",
|
||
"region_iso_code" : "SE-E",
|
||
"region_name" : "Östergötland County",
|
||
"location": { "lat": 58.4167, "lon": 15.6167 }
|
||
}
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term":1/"_primary_term" : $body._primary_term/]
|
||
|
||
Here is an example that uses the default country database and adds the
|
||
geographical information to the `geo` field based on the `ip` field. Note that
|
||
this database is downloaded automatically. So this:
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
PUT _ingest/pipeline/geoip
|
||
{
|
||
"description" : "Add geoip info",
|
||
"processors" : [
|
||
{
|
||
"geoip" : {
|
||
"field" : "ip",
|
||
"target_field" : "geo",
|
||
"database_file" : "GeoLite2-Country.mmdb"
|
||
}
|
||
}
|
||
]
|
||
}
|
||
PUT my-index-000001/_doc/my_id?pipeline=geoip
|
||
{
|
||
"ip": "89.160.20.128"
|
||
}
|
||
GET my-index-000001/_doc/my_id
|
||
--------------------------------------------------
|
||
|
||
returns this:
|
||
|
||
[source,console-result]
|
||
--------------------------------------------------
|
||
{
|
||
"found": true,
|
||
"_index": "my-index-000001",
|
||
"_id": "my_id",
|
||
"_version": 1,
|
||
"_seq_no": 65,
|
||
"_primary_term": 1,
|
||
"_source": {
|
||
"ip": "89.160.20.128",
|
||
"geo": {
|
||
"continent_name": "Europe",
|
||
"country_name": "Sweden",
|
||
"country_iso_code": "SE"
|
||
}
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]
|
||
|
||
|
||
Not all IP addresses find geo information from the database, When this
|
||
occurs, no `target_field` is inserted into the document.
|
||
|
||
Here is an example of what documents will be indexed as when information for "80.231.5.0"
|
||
cannot be found:
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
PUT _ingest/pipeline/geoip
|
||
{
|
||
"description" : "Add geoip info",
|
||
"processors" : [
|
||
{
|
||
"geoip" : {
|
||
"field" : "ip"
|
||
}
|
||
}
|
||
]
|
||
}
|
||
|
||
PUT my-index-000001/_doc/my_id?pipeline=geoip
|
||
{
|
||
"ip": "80.231.5.0"
|
||
}
|
||
|
||
GET my-index-000001/_doc/my_id
|
||
--------------------------------------------------
|
||
|
||
Which returns:
|
||
|
||
[source,console-result]
|
||
--------------------------------------------------
|
||
{
|
||
"_index" : "my-index-000001",
|
||
"_id" : "my_id",
|
||
"_version" : 1,
|
||
"_seq_no" : 71,
|
||
"_primary_term": 1,
|
||
"found" : true,
|
||
"_source" : {
|
||
"ip" : "80.231.5.0"
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]
|
||
|
||
[[ingest-geoip-mappings-note]]
|
||
===== Recognizing Location as a Geopoint
|
||
Although this processor enriches your document with a `location` field containing
|
||
the estimated latitude and longitude of the IP address, this field will not be
|
||
indexed as a {ref}/geo-point.html[`geo_point`] type in Elasticsearch without explicitly defining it
|
||
as such in the mapping.
|
||
|
||
You can use the following mapping for the example index above:
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
PUT my_ip_locations
|
||
{
|
||
"mappings": {
|
||
"properties": {
|
||
"geoip": {
|
||
"properties": {
|
||
"location": { "type": "geo_point" }
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
|
||
////
|
||
[source,console]
|
||
--------------------------------------------------
|
||
PUT _ingest/pipeline/geoip
|
||
{
|
||
"description" : "Add geoip info",
|
||
"processors" : [
|
||
{
|
||
"geoip" : {
|
||
"field" : "ip"
|
||
}
|
||
}
|
||
]
|
||
}
|
||
|
||
PUT my_ip_locations/_doc/1?refresh=true&pipeline=geoip
|
||
{
|
||
"ip": "89.160.20.128"
|
||
}
|
||
|
||
GET /my_ip_locations/_search
|
||
{
|
||
"query": {
|
||
"bool": {
|
||
"must": {
|
||
"match_all": {}
|
||
},
|
||
"filter": {
|
||
"geo_distance": {
|
||
"distance": "1m",
|
||
"geoip.location": {
|
||
"lon": 15.6167,
|
||
"lat": 58.4167
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
// TEST[continued]
|
||
|
||
[source,console-result]
|
||
--------------------------------------------------
|
||
{
|
||
"took" : 3,
|
||
"timed_out" : false,
|
||
"_shards" : {
|
||
"total" : 1,
|
||
"successful" : 1,
|
||
"skipped" : 0,
|
||
"failed" : 0
|
||
},
|
||
"hits" : {
|
||
"total" : {
|
||
"value": 1,
|
||
"relation": "eq"
|
||
},
|
||
"max_score" : 1.0,
|
||
"hits" : [
|
||
{
|
||
"_index" : "my_ip_locations",
|
||
"_id" : "1",
|
||
"_score" : 1.0,
|
||
"_source" : {
|
||
"geoip" : {
|
||
"continent_name" : "Europe",
|
||
"country_name" : "Sweden",
|
||
"country_iso_code" : "SE",
|
||
"city_name" : "Linköping",
|
||
"region_iso_code" : "SE-E",
|
||
"region_name" : "Östergötland County",
|
||
"location" : {
|
||
"lon" : 15.6167,
|
||
"lat" : 58.4167
|
||
}
|
||
},
|
||
"ip" : "89.160.20.128"
|
||
}
|
||
}
|
||
]
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
// TESTRESPONSE[s/"took" : 3/"took" : $body.took/]
|
||
////
|
||
|
||
[[manage-geoip-database-updates]]
|
||
==== Manage your own IP geolocation database updates
|
||
|
||
If you can't <<geoip-automatic-updates,automatically update>> your IP geolocation databases
|
||
from the Elastic endpoint, you have a few other options:
|
||
|
||
* <<use-proxy-geoip-endpoint,Use a proxy endpoint>>
|
||
* <<use-custom-geoip-endpoint,Use a custom endpoint>>
|
||
* <<manually-update-geoip-databases,Manually update your IP geolocation databases>>
|
||
|
||
[[use-proxy-geoip-endpoint]]
|
||
**Use a proxy endpoint**
|
||
|
||
If you can't connect directly to the Elastic GeoIP endpoint, consider setting up
|
||
a secure proxy. You can then specify the proxy endpoint URL in the
|
||
<<ingest-geoip-downloader-endpoint,`ingest.geoip.downloader.endpoint`>> setting
|
||
of each node’s `elasticsearch.yml` file.
|
||
|
||
In a strict setup the following domains may need to be added to the allowed
|
||
domains list:
|
||
|
||
* `geoip.elastic.co`
|
||
* `storage.googleapis.com`
|
||
|
||
[[use-custom-geoip-endpoint]]
|
||
**Use a custom endpoint**
|
||
|
||
You can create a service that mimics the Elastic GeoIP endpoint. You can then
|
||
get automatic updates from this service.
|
||
|
||
. Download your `.mmdb` database files from the
|
||
http://dev.maxmind.com/geoip/geoip2/geolite2[MaxMind site].
|
||
|
||
. Copy your database files to a single directory.
|
||
|
||
. From your {es} directory, run:
|
||
+
|
||
[source,sh]
|
||
----
|
||
./bin/elasticsearch-geoip -s my/source/dir [-t target/directory]
|
||
----
|
||
|
||
. Serve the static database files from your directory. For example, you can use
|
||
Docker to serve the files from an nginx server:
|
||
+
|
||
[source,sh]
|
||
----
|
||
docker run -v my/source/dir:/usr/share/nginx/html:ro nginx
|
||
----
|
||
|
||
. Specify the service's endpoint URL in the
|
||
<<ingest-geoip-downloader-endpoint,`ingest.geoip.downloader.endpoint`>> setting
|
||
of each node’s `elasticsearch.yml` file.
|
||
+
|
||
By default, {es} checks the endpoint for updates every three days. To use
|
||
another polling interval, use the <<cluster-update-settings,cluster update
|
||
settings API>> to set
|
||
<<ingest-geoip-downloader-poll-interval,`ingest.geoip.downloader.poll.interval`>>.
|
||
|
||
[[manually-update-geoip-databases]]
|
||
**Manually update your IP geolocation databases**
|
||
|
||
. Use the <<cluster-update-settings,cluster update settings API>> to set
|
||
`ingest.geoip.downloader.enabled` to `false`. This disables automatic updates
|
||
that may overwrite your database changes. This also deletes all downloaded
|
||
databases.
|
||
|
||
. Download your `.mmdb` database files from the
|
||
http://dev.maxmind.com/geoip/geoip2/geolite2[MaxMind site].
|
||
+
|
||
You can also use custom city, country, and ASN `.mmdb` files. These files must
|
||
be uncompressed. The type (city, country, or ASN) will be pulled from the file
|
||
metadata, so the filename does not matter.
|
||
|
||
. On {ess} deployments upload database using
|
||
a {cloud}/ec-custom-bundles.html[custom bundle].
|
||
|
||
. On self-managed deployments copy the database files to `$ES_CONFIG/ingest-geoip`.
|
||
|
||
. In your `geoip` processors, configure the `database_file` parameter to use a
|
||
custom database file.
|
||
|
||
[[ingest-geoip-settings]]
|
||
===== Node Settings
|
||
|
||
The `geoip` processor supports the following setting:
|
||
|
||
`ingest.geoip.cache_size`::
|
||
|
||
The maximum number of results that should be cached. Defaults to `1000`.
|
||
|
||
Note that these settings are node settings and apply to all `geoip` processors, i.e. there is one cache for all defined `geoip` processors.
|
||
|
||
[[geoip-cluster-settings]]
|
||
===== Cluster settings
|
||
|
||
[[ingest-geoip-downloader-enabled]]
|
||
`ingest.geoip.downloader.enabled`::
|
||
(<<dynamic-cluster-setting,Dynamic>>, Boolean)
|
||
If `true`, {es} automatically downloads and manages updates for IP geolocation databases
|
||
from the `ingest.geoip.downloader.endpoint`. If `false`, {es} does not download
|
||
updates and deletes all downloaded databases. Defaults to `true`.
|
||
|
||
[[ingest-geoip-downloader-eager-download]]
|
||
`ingest.geoip.downloader.eager.download`::
|
||
(<<dynamic-cluster-setting,Dynamic>>, Boolean)
|
||
If `true`, {es} downloads IP geolocation databases immediately, regardless of whether a
|
||
pipeline exists with a geoip processor. If `false`, {es} only begins downloading
|
||
the databases if a pipeline with a geoip processor exists or is added. Defaults
|
||
to `false`.
|
||
|
||
[[ingest-geoip-downloader-endpoint]]
|
||
`ingest.geoip.downloader.endpoint`::
|
||
(<<static-cluster-setting,Static>>, string)
|
||
Endpoint URL used to download updates for IP geolocation databases. For example, `https://myDomain.com/overview.json`.
|
||
Defaults to `https://geoip.elastic.co/v1/database`. {es} stores downloaded database files in
|
||
each node's <<es-tmpdir,temporary directory>> at `$ES_TMPDIR/geoip-databases/<node_id>`.
|
||
Note that {es} will make a GET request to `${ingest.geoip.downloader.endpoint}?elastic_geoip_service_tos=agree`,
|
||
expecting the list of metadata about databases typically found in `overview.json`.
|
||
|
||
The GeoIP downloader uses the JDK's builtin cacerts. If you're using a custom endpoint, add the custom https endpoint cacert(s) to the JDK's truststore.
|
||
|
||
[[ingest-geoip-downloader-poll-interval]]
|
||
`ingest.geoip.downloader.poll.interval`::
|
||
(<<dynamic-cluster-setting,Dynamic>>, <<time-units,time value>>)
|
||
How often {es} checks for IP geolocation database updates at the
|
||
`ingest.geoip.downloader.endpoint`. Must be greater than `1d` (one day). Defaults
|
||
to `3d` (three days).
|