[role="xpack"]
[[search-aggregations-metrics-geo-line]]
=== Geo-Line Aggregation
++++
<titleabbrev>Geo-Line</titleabbrev>
++++

The `geo_line` aggregation aggregates all `geo_point` values within a bucket into a `LineString`, ordered
by the chosen `sort` field, such as a date field. The result is a valid
https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] representing the line geometry.

[source,console,id=search-aggregations-metrics-geo-line-simple]
----
PUT test
{
  "mappings": {
    "properties": {
      "my_location": { "type": "geo_point" },
      "group": { "type": "keyword" },
      "@timestamp": { "type": "date" }
    }
  }
}

POST /test/_bulk?refresh
{"index":{}}
{"my_location": {"lat":52.373184, "lon":4.889187}, "@timestamp": "2023-01-02T09:00:00Z"}
{"index":{}}
{"my_location": {"lat":52.370159, "lon":4.885057}, "@timestamp": "2023-01-02T10:00:00Z"}
{"index":{}}
{"my_location": {"lat":52.369219, "lon":4.901618}, "@timestamp": "2023-01-02T13:00:00Z"}
{"index":{}}
{"my_location": {"lat":52.374081, "lon":4.912350}, "@timestamp": "2023-01-02T16:00:00Z"}
{"index":{}}
{"my_location": {"lat":52.371667, "lon":4.914722}, "@timestamp": "2023-01-03T12:00:00Z"}

POST /test/_search?filter_path=aggregations
{
  "aggs": {
    "line": {
      "geo_line": {
        "point": {"field": "my_location"},
        "sort": {"field": "@timestamp"}
      }
    }
  }
}
----

Which returns:

[source,js]
----
{
  "aggregations": {
    "line": {
      "type": "Feature",
      "geometry": {
        "type": "LineString",
        "coordinates": [
            [ 4.889187, 52.373184 ],
            [ 4.885057, 52.370159 ],
            [ 4.901618, 52.369219 ],
            [ 4.912350, 52.374081 ],
            [ 4.914722, 52.371667 ]
        ]
      },
      "properties": {
        "complete": true
      }
    }
  }
}
----
// TESTRESPONSE

The resulting https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] contains both a `LineString` geometry
for the path generated by the aggregation and a map of `properties`.
The `complete` property indicates whether all matched documents were used to generate the geometry.
The `size` option described below can be used to limit the number of documents included in the aggregation,
leading to results with `complete: false`.
Exactly which documents are dropped from the results depends on whether the aggregation is based
on `time_series` or not, as discussed in
<<search-aggregations-metrics-geo-line-grouping-time-series-advantages,more detail below>>.

The above result could be displayed in a map user interface:

image:images/spatial/geo_line.png[Kibana map with museum tour of Amsterdam]
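
GeoJSON positions are written as `[lon, lat]`, the reverse of the `{"lat": ..., "lon": ...}` objects indexed above.
As a minimal sketch, assuming the response body has already been parsed into a Python `dict`, a client might
extract the points as `(lat, lon)` pairs and check `complete` like this. `extract_line` is a hypothetical helper,
not part of any Elasticsearch client.

```python
import json
from typing import List, Tuple

def extract_line(response: dict, agg_name: str = "line") -> Tuple[List[Tuple[float, float]], bool]:
    """Return the geo_line points as (lat, lon) pairs plus the `complete` flag.

    GeoJSON stores each position as [lon, lat], so each pair is swapped here.
    """
    feature = response["aggregations"][agg_name]
    coords = feature["geometry"]["coordinates"]
    points = [(lat, lon) for lon, lat in coords]
    complete = feature["properties"]["complete"]
    return points, complete

# A trimmed copy of the response shown above.
raw = """
{"aggregations": {"line": {"type": "Feature",
  "geometry": {"type": "LineString",
               "coordinates": [[4.889187, 52.373184], [4.885057, 52.370159]]},
  "properties": {"complete": true}}}}
"""
points, complete = extract_line(json.loads(raw))
if not complete:
    print("warning: the line was limited by `size`")
print(points[0])  # (52.373184, 4.889187)
```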

[[search-aggregations-metrics-geo-line-options]]
==== Options

`point`::
(Required)

This option specifies the name of the `geo_point` field.

Example usage configuring `my_location` as the point field:

[source,js]
----
"point": {
  "field": "my_location"
}
----
// NOTCONSOLE

`sort`::
(Required outside <<search-aggregations-metrics-geo-line-grouping-time-series,`time_series`>> aggregations)

This option specifies the name of the numeric field to use as the sort key for ordering the points.
When the `geo_line` aggregation is nested inside a
<<search-aggregations-metrics-geo-line-grouping-time-series,`time_series`>>
aggregation, this field defaults to `@timestamp`, and any other value results in an error.

Example usage configuring `@timestamp` as the sort key:

[source,js]
----
"sort": {
  "field": "@timestamp"
}
----
// NOTCONSOLE

`include_sort`::
(Optional, boolean, default: `false`) When `true`, this option includes an additional array of the sort values
in the feature properties.

`sort_order`::
(Optional, string, default: `"ASC"`) This option accepts one of two values: `"ASC"` or `"DESC"`.
With `"ASC"` the line is sorted in ascending order by the sort key, and with `"DESC"` in descending order.

`size`::
(Optional, integer, default: `10000`) The maximum length of the line represented in the aggregation.
Valid sizes are between 1 and 10000.
Within <<search-aggregations-metrics-geo-line-grouping-time-series,`time_series`>> aggregations,
the aggregation uses line simplification to constrain the size; otherwise it uses truncation.
See <<search-aggregations-metrics-geo-line-grouping-time-series-advantages,below>>
for a discussion on the subtleties involved.
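
To make the interaction of `sort`, `sort_order`, `size` and `include_sort` concrete, here is a minimal client-side
Python model of the non-time-series (truncating) behavior. It is not the Elasticsearch implementation. It assumes
the kept points are the first `size` points in the requested sort order (consistent with the Kodiak Island example
below, where truncation keeps the first 100 points), and it assumes the sort values appear under a `sort` key in
`properties`; check a real response before relying on that name. `model_geo_line` and the document shape are
hypothetical.

```python
from typing import Any, Dict, List

def model_geo_line(docs: List[Dict[str, Any]], size: int = 10000,
                   sort_order: str = "ASC", include_sort: bool = False) -> Dict[str, Any]:
    """Model a truncating geo_line: sort by key, keep `size` points, emit a GeoJSON Feature.

    Each doc is assumed to look like {"lat": ..., "lon": ..., "sort": ...}.
    """
    if not 1 <= size <= 10000:
        raise ValueError("size must be between 1 and 10000")
    if sort_order not in ("ASC", "DESC"):
        raise ValueError('sort_order must be "ASC" or "DESC"')
    ordered = sorted(docs, key=lambda d: d["sort"], reverse=(sort_order == "DESC"))
    kept = ordered[:size]  # truncation: everything after the first `size` points is dropped
    properties: Dict[str, Any] = {"complete": len(kept) == len(docs)}
    if include_sort:
        properties["sort"] = [d["sort"] for d in kept]  # assumed property name
    return {
        "type": "Feature",
        "geometry": {"type": "LineString",
                     "coordinates": [[d["lon"], d["lat"]] for d in kept]},  # GeoJSON [lon, lat]
        "properties": properties,
    }

# The five documents from the first example; ISO timestamps in a single format sort chronologically.
docs = [
    {"lat": 52.371667, "lon": 4.914722, "sort": "2023-01-03T12:00:00Z"},
    {"lat": 52.373184, "lon": 4.889187, "sort": "2023-01-02T09:00:00Z"},
    {"lat": 52.370159, "lon": 4.885057, "sort": "2023-01-02T10:00:00Z"},
    {"lat": 52.369219, "lon": 4.901618, "sort": "2023-01-02T13:00:00Z"},
    {"lat": 52.374081, "lon": 4.912350, "sort": "2023-01-02T16:00:00Z"},
]
full = model_geo_line(docs)
short = model_geo_line(docs, size=3, sort_order="DESC", include_sort=True)
```

With `size=3` and `sort_order="DESC"`, the model keeps the three latest points, newest first, and reports
`complete: false`.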

[[search-aggregations-metrics-geo-line-grouping]]
==== Grouping

The simple example above produces a single track for all the data selected by the query. However, it is far more
common to need to group the data into multiple tracks. For example, flight transponder measurements might be grouped
by flight call-sign, with each flight sorted by timestamp to produce a separate track.

In the following examples, we group the locations of points of interest in the cities of
Amsterdam, Antwerp and Paris.
The tracks are ordered by the planned visit sequence for a walking tour of the museums and other attractions.

To demonstrate the difference between a time-series grouping and a non-time-series grouping, we
first create an index with <<tsds-index-settings,time-series enabled>>,
and then give examples of grouping the same data without time-series and with time-series.

[source,console,id=search-aggregations-metrics-geo-line-grouping-setup]
----
PUT tour
{
  "mappings": {
    "properties": {
      "city": {
        "type": "keyword",
        "time_series_dimension": true
      },
      "category": { "type": "keyword" },
      "route": { "type": "long" },
      "name": { "type": "keyword" },
      "location": { "type": "geo_point" },
      "@timestamp": { "type": "date" }
    }
  },
  "settings": {
    "index": {
      "mode": "time_series",
      "routing_path": [ "city" ],
      "time_series": {
        "start_time": "2023-01-01T00:00:00Z",
        "end_time": "2024-01-01T00:00:00Z"
      }
    }
  }
}

POST /tour/_bulk?refresh
{"index":{}}
{"@timestamp": "2023-01-02T09:00:00Z", "route": 0, "location": "POINT(4.889187 52.373184)", "city": "Amsterdam", "category": "Attraction", "name": "Royal Palace Amsterdam"}
{"index":{}}
{"@timestamp": "2023-01-02T10:00:00Z", "route": 1, "location": "POINT(4.885057 52.370159)", "city": "Amsterdam", "category": "Attraction", "name": "The Amsterdam Dungeon"}
{"index":{}}
{"@timestamp": "2023-01-02T13:00:00Z", "route": 2, "location": "POINT(4.901618 52.369219)", "city": "Amsterdam", "category": "Museum", "name": "Museum Het Rembrandthuis"}
{"index":{}}
{"@timestamp": "2023-01-02T16:00:00Z", "route": 3, "location": "POINT(4.912350 52.374081)", "city": "Amsterdam", "category": "Museum", "name": "NEMO Science Museum"}
{"index":{}}
{"@timestamp": "2023-01-03T12:00:00Z", "route": 4, "location": "POINT(4.914722 52.371667)", "city": "Amsterdam", "category": "Museum", "name": "Nederlands Scheepvaartmuseum"}
{"index":{}}
{"@timestamp": "2023-01-04T09:00:00Z", "route": 5, "location": "POINT(4.401384 51.220292)", "city": "Antwerp", "category": "Attraction", "name": "Cathedral of Our Lady"}
{"index":{}}
{"@timestamp": "2023-01-04T12:00:00Z", "route": 6, "location": "POINT(4.405819 51.221758)", "city": "Antwerp", "category": "Museum", "name": "Snijders&Rockoxhuis"}
{"index":{}}
{"@timestamp": "2023-01-04T15:00:00Z", "route": 7, "location": "POINT(4.405200 51.222900)", "city": "Antwerp", "category": "Museum", "name": "Letterenhuis"}
{"index":{}}
{"@timestamp": "2023-01-05T10:00:00Z", "route": 8, "location": "POINT(2.336389 48.861111)", "city": "Paris", "category": "Museum", "name": "Musée du Louvre"}
{"index":{}}
{"@timestamp": "2023-01-05T14:00:00Z", "route": 9, "location": "POINT(2.327000 48.860000)", "city": "Paris", "category": "Museum", "name": "Musée d'Orsay"}
----
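
The `_bulk` request body above is newline-delimited JSON: each document line is preceded by an action line such as
`{"index":{}}`, and the final line must also end with a newline. If you generate the same data programmatically, a
Python sketch like the following builds such a body. `bulk_body` is a hypothetical helper; sending the request is
left to whichever HTTP or Elasticsearch client you use.

```python
import json
from typing import Dict, Iterable

def bulk_body(docs: Iterable[Dict]) -> str:
    """Build an NDJSON `_bulk` body with one `index` action line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        # ensure_ascii=False keeps names such as "Musée du Louvre" readable;
        # the body must then be sent UTF-8 encoded.
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the last line needs a trailing newline too

paris = [
    {"@timestamp": "2023-01-05T10:00:00Z", "route": 8, "location": "POINT(2.336389 48.861111)",
     "city": "Paris", "category": "Museum", "name": "Musée du Louvre"},
    {"@timestamp": "2023-01-05T14:00:00Z", "route": 9, "location": "POINT(2.327000 48.860000)",
     "city": "Paris", "category": "Museum", "name": "Musée d'Orsay"},
]
body = bulk_body(paris)
print(body)
```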

[[search-aggregations-metrics-geo-line-grouping-terms]]
==== Grouping with terms

For a non-time-series use case, the above data can be grouped using a
<<search-aggregations-bucket-terms-aggregation,terms aggregation>> on the city name.
This works whether or not the `tour` index is defined as a time series index.

[source,console,id=search-aggregations-metrics-geo-line-terms]
----
POST /tour/_search?filter_path=aggregations
{
  "aggregations": {
    "path": {
      "terms": {"field": "city"},
      "aggregations": {
        "museum_tour": {
          "geo_line": {
            "point": {"field": "location"},
            "sort": {"field": "@timestamp"}
          }
        }
      }
    }
  }
}
----
// TEST[continued]

Which returns:

[source,js]
----
{
  "aggregations": {
    "path": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Amsterdam",
          "doc_count": 5,
          "museum_tour": {
            "type": "Feature",
            "geometry": {
              "coordinates": [ [ 4.889187, 52.373184 ], [ 4.885057, 52.370159 ], [ 4.901618, 52.369219 ], [ 4.91235, 52.374081 ], [ 4.914722, 52.371667 ] ],
              "type": "LineString"
            },
            "properties": {
              "complete": true
            }
          }
        },
        {
          "key": "Antwerp",
          "doc_count": 3,
          "museum_tour": {
            "type": "Feature",
            "geometry": {
              "coordinates": [ [ 4.401384, 51.220292 ], [ 4.405819, 51.221758 ], [ 4.4052, 51.2229 ] ],
              "type": "LineString"
            },
            "properties": {
              "complete": true
            }
          }
        },
        {
          "key": "Paris",
          "doc_count": 2,
          "museum_tour": {
            "type": "Feature",
            "geometry": {
              "coordinates": [ [ 2.336389, 48.861111 ], [ 2.327, 48.86 ] ],
              "type": "LineString"
            },
            "properties": {
              "complete": true
            }
          }
        }
      ]
    }
  }
}
----
// TESTRESPONSE

The above results contain an array of buckets, where each bucket is a JSON object with the `key` showing the value
of the `city` field, and an inner aggregation result called `museum_tour` containing a
https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] describing the
actual route between the various attractions in that city.
Each result also includes a `properties` object with a `complete` value, which will be `false` if the geometry
was truncated to the limits specified in the `size` parameter.
Note that when we use `time_series` in the example below, we will get the same results structured a little differently.
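
Conceptually, combining `terms` with `geo_line` partitions the matching documents by `city` and then builds one
sorted line per partition. The Python sketch below models that on a few of the `tour` documents. It is a simplified
client-side illustration (no truncation; `group_lines` and `parse_wkt_point` are hypothetical helpers), not how
Elasticsearch executes the aggregation. Note that the WKT `location` values above use `POINT(lon lat)` order.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def parse_wkt_point(wkt: str) -> Tuple[float, float]:
    """Parse a WKT 'POINT(lon lat)' string into a (lon, lat) tuple."""
    inner = wkt.strip()[len("POINT("):-1]
    lon, lat = (float(v) for v in inner.split())
    return lon, lat

def group_lines(docs: List[Dict]) -> Dict[str, List[List[float]]]:
    """Group docs by city, sort each group by @timestamp and return its coordinates."""
    groups: Dict[str, List[Dict]] = defaultdict(list)
    for doc in docs:
        groups[doc["city"]].append(doc)
    lines = {}
    for city, members in groups.items():
        # ISO-8601 strings in one consistent format sort chronologically.
        members.sort(key=lambda d: d["@timestamp"])
        lines[city] = [list(parse_wkt_point(d["location"])) for d in members]
    return lines

docs = [
    {"@timestamp": "2023-01-05T14:00:00Z", "location": "POINT(2.327000 48.860000)", "city": "Paris"},
    {"@timestamp": "2023-01-04T09:00:00Z", "location": "POINT(4.401384 51.220292)", "city": "Antwerp"},
    {"@timestamp": "2023-01-05T10:00:00Z", "location": "POINT(2.336389 48.861111)", "city": "Paris"},
]
lines = group_lines(docs)
print(lines["Paris"])  # [[2.336389, 48.861111], [2.327, 48.86]]
```

The Paris coordinates match the `Paris` bucket in the response above, even though the documents were supplied
out of order.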

[[search-aggregations-metrics-geo-line-grouping-time-series]]
==== Grouping with time-series

Using the same data as before, we can also perform the grouping with a
<<search-aggregations-bucket-time-series-aggregation,`time_series` aggregation>>.
This groups by TSID, which is defined by the combination of all fields with `time_series_dimension: true`,
in this case the same `city` field used in the previous
<<search-aggregations-bucket-terms-aggregation,terms aggregation>>.
This example only works if the `tour` index is defined as a time series index using `index.mode="time_series"`.

[source,console,id=search-aggregations-metrics-geo-line-time-series]
----
POST /tour/_search?filter_path=aggregations
{
  "aggregations": {
    "path": {
      "time_series": {},
      "aggregations": {
        "museum_tour": {
          "geo_line": {
            "point": {"field": "location"}
          }
        }
      }
    }
  }
}
----
// TEST[continued]

NOTE: The `geo_line` aggregation does not require the `sort` field when nested within a
<<search-aggregations-bucket-time-series-aggregation,`time_series` aggregation>>.
This is because the sort field is set to `@timestamp`, by which all time-series indexes are pre-sorted.
If you do set this parameter to anything other than `@timestamp`, you will get an error.

The above query will result in:

[source,js]
----
{
  "aggregations": {
    "path": {
      "buckets": {
        "{city=Paris}": {
          "key": {
            "city": "Paris"
          },
          "doc_count": 2,
          "museum_tour": {
            "type": "Feature",
            "geometry": {
              "coordinates": [ [ 2.336389, 48.861111 ], [ 2.327, 48.86 ] ],
              "type": "LineString"
            },
            "properties": {
              "complete": true
            }
          }
        },
        "{city=Antwerp}": {
          "key": {
            "city": "Antwerp"
          },
          "doc_count": 3,
          "museum_tour": {
            "type": "Feature",
            "geometry": {
              "coordinates": [ [ 4.401384, 51.220292 ], [ 4.405819, 51.221758 ], [ 4.4052, 51.2229 ] ],
              "type": "LineString"
            },
            "properties": {
              "complete": true
            }
          }
        },
        "{city=Amsterdam}": {
          "key": {
            "city": "Amsterdam"
          },
          "doc_count": 5,
          "museum_tour": {
            "type": "Feature",
            "geometry": {
              "coordinates": [ [ 4.889187, 52.373184 ], [ 4.885057, 52.370159 ], [ 4.901618, 52.369219 ], [ 4.91235, 52.374081 ], [ 4.914722, 52.371667 ] ],
              "type": "LineString"
            },
            "properties": {
              "complete": true
            }
          }
        }
      }
    }
  }
}
----
// TESTRESPONSE

The above results are essentially the same as with the previous `terms` aggregation example, but structured differently.
Here we see the buckets returned as a map, where the key is an internal description of the TSID.
This TSID is unique for each unique combination of fields with `time_series_dimension: true`.
Each bucket contains a `key` field which is also a map of all dimension values for the TSID; in this case only the city
name is used for grouping.
In addition, there is an inner aggregation result called `museum_tour` containing a
https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] describing the
actual route between the various attractions in that city.
Each result also includes a `properties` object with a `complete` value, which will be `false` if the geometry
was simplified to the limits specified in the `size` parameter.
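
Because the two groupings return the same lines in different shapes, a client that supports both may want to
normalize them. The Python sketch below assumes parsed response bodies shaped like the two examples above and uses a
hypothetical `normalize_buckets` helper to turn either the `terms` list of buckets or the `time_series` map of
buckets into a `{key: coordinates}` dictionary. For `time_series` buckets it reads the `city` value from each
bucket's `key` map instead of parsing the internal TSID description.

```python
from typing import Dict, List

def normalize_buckets(path_agg: Dict, dimension: str = "city",
                      sub_agg: str = "museum_tour") -> Dict[str, List[List[float]]]:
    """Map each group to its LineString coordinates, for terms or time_series results."""
    buckets = path_agg["buckets"]
    if isinstance(buckets, list):
        # terms: a list of buckets whose `key` is the term value itself
        items = [(b["key"], b) for b in buckets]
    else:
        # time_series: a map keyed by an internal TSID description;
        # the readable dimension values live in each bucket's `key` map
        items = [(b["key"][dimension], b) for b in buckets.values()]
    return {key: b[sub_agg]["geometry"]["coordinates"] for key, b in items}

paris_line = {"type": "Feature",
              "geometry": {"type": "LineString",
                           "coordinates": [[2.336389, 48.861111], [2.327, 48.86]]},
              "properties": {"complete": True}}
terms_result = {"buckets": [{"key": "Paris", "doc_count": 2, "museum_tour": paris_line}]}
ts_result = {"buckets": {"{city=Paris}": {"key": {"city": "Paris"}, "doc_count": 2,
                                           "museum_tour": paris_line}}}
print(normalize_buckets(ts_result))  # {'Paris': [[2.336389, 48.861111], [2.327, 48.86]]}
```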

[[search-aggregations-metrics-geo-line-grouping-time-series-advantages]]
==== Why group with time-series?

When reviewing the above examples, you might think that there is little difference between using
<<search-aggregations-bucket-terms-aggregation,`terms`>> or
<<search-aggregations-bucket-time-series-aggregation,`time_series`>>
to group the geo-lines. However, there are some important differences in behavior between the two cases.
Time series indexes are stored in a very specific order on disk.
They are pre-grouped by the time-series dimension fields, and pre-sorted by the `@timestamp` field.
This allows the `geo_line` aggregation to be considerably optimized:

* The same memory allocated for the first bucket can be re-used over and over for all subsequent buckets.
This requires substantially less memory than the non-time-series case, where all buckets are collected
concurrently.
* No sorting needs to be done, since the data is pre-sorted by `@timestamp`.
The time-series data will naturally arrive at the aggregation collector in `DESC` order.
This means that if we specify `sort_order:ASC` (the default), we still collect in `DESC` order,
but perform an efficient in-memory reverse before generating the final `LineString` geometry.
* The `size` parameter can be used for a streaming line-simplification algorithm.
Without time-series, we are forced to truncate data, by default after 10000 documents per bucket, in order to
prevent memory usage from being unbounded.
This can result in geo-lines being truncated, and therefore losing important data.
With time-series we can run a streaming line-simplification algorithm, retaining control over memory usage,
while also maintaining the overall geometry shape.
In fact, for most use cases it would work to set this `size` parameter to a much lower bound, and save even more
memory. For example, if the `geo_line` is to be drawn on a display map with a specific resolution, it might look
just as good to simplify to as few as 100 or 200 points. This will save memory on the server, on the network and
in the client.
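
The first two optimizations in the list above can be illustrated with a small Python model: rows arrive already
grouped by TSID and in descending `@timestamp` order, so one buffer can be reused for every bucket, and an ascending
line only needs an in-memory reverse when each bucket ends. This is a hypothetical sketch under those assumptions,
not the Elasticsearch collector; `collect_time_series` and the row shape are illustrative, and `size` is ignored.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical row shape: (tsid, timestamp, lon, lat). Rows are assumed to be grouped
# by tsid and sorted by timestamp in descending order.
Row = Tuple[str, str, float, float]

def collect_time_series(rows: Iterable[Row], sort_order: str = "ASC") -> Dict[str, List[List[float]]]:
    """Build one line per tsid while reusing a single buffer."""
    lines: Dict[str, List[List[float]]] = {}
    buffer: List[List[float]] = []  # allocated once, reused for every bucket
    current: Optional[str] = None

    def flush() -> None:
        if current is None:
            return
        if sort_order == "ASC":
            buffer.reverse()  # collected in DESC order, so reverse in memory
        lines[current] = list(buffer)  # copy the finished line out of the buffer
        buffer.clear()  # empty the buffer so the next bucket can reuse it

    for tsid, _timestamp, lon, lat in rows:
        if tsid != current:
            flush()
            current = tsid
        buffer.append([lon, lat])  # no sorting: rows already arrive in timestamp order
    flush()
    return lines

rows = [
    ("{city=Paris}", "2023-01-05T14:00:00Z", 2.327, 48.86),
    ("{city=Paris}", "2023-01-05T10:00:00Z", 2.336389, 48.861111),
    ("{city=Antwerp}", "2023-01-04T15:00:00Z", 4.4052, 51.2229),
    ("{city=Antwerp}", "2023-01-04T12:00:00Z", 4.405819, 51.221758),
]
print(collect_time_series(rows)["{city=Paris}"])  # [[2.336389, 48.861111], [2.327, 48.86]]
```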

NOTE: There are other significant advantages to working with time-series data and using `time_series` index mode.
These are discussed in the documentation on <<tsds,time series data streams>>.

[[search-aggregations-metrics-geo-line-simplification]]
==== Streaming line simplification

Line simplification is a great way to reduce the size of the final results sent to the client and displayed in a map
user interface. However, these algorithms normally use a lot of memory to perform the simplification, requiring the
entire geometry to be maintained in memory together with supporting data for the simplification itself.
A streaming line simplification algorithm keeps memory usage minimal during the simplification
process by constraining memory to the bounds defined for the simplified geometry. This is only possible if no sorting
is required, which is the case when grouping is done by the
<<search-aggregations-bucket-time-series-aggregation,`time_series` aggregation>>,
running on an index with the `time_series` index mode.

Under these conditions the `geo_line` aggregation allocates memory for the specified `size`, and then fills that
memory with the incoming documents.
Once the memory is completely filled, documents from within the line are removed as new documents are added.
The choice of document to remove is made to minimize the visual impact on the geometry.
This process makes use of the
https://en.wikipedia.org/wiki/Visvalingam%E2%80%93Whyatt_algorithm[Visvalingam–Whyatt algorithm].
Essentially this means points are removed if they have the minimum triangle area, with the triangle defined
by the point under consideration and the points immediately before and after it in the line.
In addition, we calculate the area using spherical coordinates so that no planar distortions affect the choice.
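
The following Python sketch shows the streaming idea under simplifying assumptions. It keeps at most `size` points,
and whenever the buffer overflows it drops the interior point whose triangle with its two neighbors has the smallest
area on a unit sphere. The spherical area uses the Van Oosterom-Strackee solid-angle formula, which is this sketch's
own choice and not necessarily what Elasticsearch uses. For clarity it recomputes every area on each overflow instead
of maintaining a priority queue, and it never removes the first or the newest point. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat) in degrees, matching GeoJSON order

def _unit_vector(lon: float, lat: float) -> Tuple[float, float, float]:
    """Convert a lon/lat position in degrees to a 3D unit vector."""
    lon_r, lat_r = math.radians(lon), math.radians(lat)
    return (math.cos(lat_r) * math.cos(lon_r),
            math.cos(lat_r) * math.sin(lon_r),
            math.sin(lat_r))

def _dot(u, v) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def spherical_triangle_area(p: Point, q: Point, r: Point) -> float:
    """Area of triangle p-q-r on a unit sphere (steradians), via Van Oosterom-Strackee."""
    a, b, c = _unit_vector(*p), _unit_vector(*q), _unit_vector(*r)
    numerator = abs(_dot(a, _cross(b, c)))
    denominator = 1.0 + _dot(a, b) + _dot(b, c) + _dot(c, a)
    return 2.0 * math.atan2(numerator, denominator)

def simplify_streaming(points: Sequence[Point], size: int) -> List[Point]:
    """Keep at most `size` points, dropping the interior point with the smallest triangle area."""
    if size < 2:
        raise ValueError("this sketch needs size >= 2 so both endpoints can be kept")
    line: List[Point] = []
    for point in points:
        line.append(point)
        if len(line) > size:
            # Only interior points are candidates: index 0 and the newest point are kept.
            areas = [spherical_triangle_area(line[i - 1], line[i], line[i + 1])
                     for i in range(1, len(line) - 1)]
            del line[1 + areas.index(min(areas))]
    return line

# A synthetic 209-point wavy path, compared with plain truncation to 100 points.
path = [(i * 0.01, 0.5 * math.sin(i * 0.05)) for i in range(209)]
simplified = simplify_streaming(path, 100)
truncated = path[:100]
print(len(simplified), simplified[-1] == path[-1], truncated[-1] == path[-1])  # 100 True False
```

Both results have 100 points, but only the simplified line still reaches the end of the original path, which is the
same contrast the Kodiak Island example illustrates.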

To demonstrate how much better line simplification is than line truncation, consider this example of the north
shore of Kodiak Island.
The data for this is only 209 points, but if we set `size` to `100` we get dramatic truncation.

image:images/spatial/kodiak_geo_line_truncated.png[North shore of Kodiak Island truncated to 100 points]

The grey line is the entire geometry of 209 points, while the blue line is the first 100 points, a very different
geometry from the original.

Now consider the same geometry simplified to 100 points.

image:images/spatial/kodiak_geo_line_simplified.png[North shore of Kodiak Island simplified to 100 points]

For comparison we have shown the original in grey, the truncated line in blue and the new simplified geometry
in magenta. It is possible to see where the new simplified line deviates from the original, but the overall
geometry appears almost identical and is still clearly recognizable as the north shore of Kodiak Island.