mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-28 17:34:17 -04:00
Documentation for time-series geo_line (#97373)
* Documentation for time-series geo_line
* Fix incorrect ids in geoline docs
* Some updates from review: added image of Kibana map, improved first example, linked to TSDS and added section on line simplification with link to Wikipedia.
* Diagrams of truncation versus simplification
This commit is contained in:
parent
cda4b86afe
commit
8151092b45
6 changed files with 419 additions and 56 deletions
|
@ -1,5 +1,25 @@
|
|||
pr: 94954
|
||||
summary: Asset tracking - geo_line for TSDB
|
||||
summary: Asset tracking - geo_line in time-series aggregations
|
||||
area: Geo
|
||||
type: enhancement
|
||||
issues: [ ]
|
||||
highlight:
|
||||
title: Asset tracking - geo_line in time-series aggregations
|
||||
body: |-
|
||||
The <<search-aggregations-metrics-geo-line,`geo_line` aggregation>> builds tracks from `geo_points`.
|
||||
It has previously needed to use large arrays in memory for collecting points into multiple buckets
|
||||
and sorting those buckets.
|
||||
|
||||
With the advances made in TSDB features and the `time_series` aggregation in particular,
|
||||
it is now possible to rely on data aggregating in both TSID and timestamp order,
|
||||
enabling the removal of all sorting, as well as the use of only a single bucket's
|
||||
worth of memory, a dramatic improvement in memory footprint. In addition, we can use the streaming line
|
||||
simplifier algorithm introduced in https://github.com/elastic/elasticsearch/pull/94859 to replace the previous
|
||||
behaviour of truncating very large tracks with the preferable approach of simplifying those tracks.
|
||||
|
||||
[role="screenshot"]
|
||||
image:images/spatial/kodiak_geo_line_simplified.png[North shore of Kodiak Island simplified to 100 points]
|
||||
|
||||
In this diagram, the grey line is the original geometry, the blue line is the truncated geometry as would be
|
||||
produced by the original `geo_line` aggregation, and the magenta line is the new simplified geometry.
|
||||
notable: false
|
||||
|
|
|
@ -5,7 +5,7 @@
|
|||
<titleabbrev>Geo-Line</titleabbrev>
|
||||
++++
|
||||
|
||||
The `geo_line` aggregation aggregates all `geo_point` values within a bucket into a LineString ordered
|
||||
The `geo_line` aggregation aggregates all `geo_point` values within a bucket into a `LineString` ordered
|
||||
by the chosen `sort` field. This `sort` can be a date field, for example. The bucket returned is a valid
|
||||
https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] representing the line geometry.
|
||||
|
||||
|
@ -14,31 +14,25 @@ https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] representing th
|
|||
PUT test
|
||||
{
|
||||
"mappings": {
|
||||
"dynamic": "strict",
|
||||
"_source": {
|
||||
"enabled": false
|
||||
},
|
||||
"properties": {
|
||||
"my_location": {
|
||||
"type": "geo_point"
|
||||
},
|
||||
"group": {
|
||||
"type": "keyword"
|
||||
},
|
||||
"@timestamp": {
|
||||
"type": "date"
|
||||
}
|
||||
"my_location": { "type": "geo_point" },
|
||||
"group": { "type": "keyword" },
|
||||
"@timestamp": { "type": "date" }
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
POST /test/_bulk?refresh
|
||||
{"index":{}}
|
||||
{"my_location": {"lat":37.3450570, "lon": -122.0499820}, "@timestamp": "2013-09-06T16:00:36"}
|
||||
{"my_location": {"lat":52.373184, "lon":4.889187}, "@timestamp": "2023-01-02T09:00:00Z"}
|
||||
{"index":{}}
|
||||
{"my_location": {"lat": 37.3451320, "lon": -122.0499820}, "@timestamp": "2013-09-06T16:00:37Z"}
|
||||
{"my_location": {"lat":52.370159, "lon":4.885057}, "@timestamp": "2023-01-02T10:00:00Z"}
|
||||
{"index":{}}
|
||||
{"my_location": {"lat": 37.349283, "lon": -122.0505010}, "@timestamp": "2013-09-06T16:00:37Z"}
|
||||
{"my_location": {"lat":52.369219, "lon":4.901618}, "@timestamp": "2023-01-02T13:00:00Z"}
|
||||
{"index":{}}
|
||||
{"my_location": {"lat":52.374081, "lon":4.912350}, "@timestamp": "2023-01-02T16:00:00Z"}
|
||||
{"index":{}}
|
||||
{"my_location": {"lat":52.371667, "lon":4.914722}, "@timestamp": "2023-01-03T12:00:00Z"}
|
||||
|
||||
POST /test/_search?filter_path=aggregations
|
||||
{
|
||||
|
@ -64,18 +58,11 @@ Which returns:
|
|||
"geometry": {
|
||||
"type": "LineString",
|
||||
"coordinates": [
|
||||
[
|
||||
-122.049982,
|
||||
37.345057
|
||||
],
|
||||
[
|
||||
-122.050501,
|
||||
37.349283
|
||||
],
|
||||
[
|
||||
-122.049982,
|
||||
37.345132
|
||||
]
|
||||
[ 4.889187, 52.373184 ],
|
||||
[ 4.885057, 52.370159 ],
|
||||
[ 4.901618, 52.369219 ],
|
||||
[ 4.912350, 52.374081 ],
|
||||
[ 4.914722, 52.371667 ]
|
||||
]
|
||||
},
|
||||
"properties": {
|
||||
|
@ -87,6 +74,19 @@ Which returns:
|
|||
----
|
||||
// TESTRESPONSE
|
||||
|
||||
The resulting https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] contains both a `LineString` geometry
|
||||
for the path generated by the aggregation, as well as a map of `properties`.
|
||||
The `complete` property indicates whether all matched documents were used to generate the geometry.
|
||||
The `size` option described below can be used to limit the number of documents included in the aggregation,
|
||||
leading to results with `complete: false`.
|
||||
Exactly which documents are dropped from results depends on whether the aggregation is based
|
||||
on `time_series` or not, and this is discussed in
|
||||
<<search-aggregations-metrics-geo-line-grouping-time-series-advantages,more detail below>>.
|
||||
|
||||
The above result could be displayed in a map user interface:
|
||||
|
||||
image:images/spatial/geo_line.png[Kibana map with museum tour of Amsterdam]
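
A client can inspect the `complete` property before drawing the line. The following is a minimal Python sketch, assuming the aggregation was named `line` in the request and that the response has already been received as JSON; the response literal and the `describe_line` helper are hypothetical and only mirror the shape of the result shown above.

```python
import json

# Hypothetical response, shaped like the `geo_line` result shown above.
# The aggregation name "line" is an assumption made for this sketch.
response = json.loads("""
{
  "aggregations": {
    "line": {
      "type": "Feature",
      "geometry": {
        "type": "LineString",
        "coordinates": [[4.889187, 52.373184], [4.885057, 52.370159], [4.901618, 52.369219]]
      },
      "properties": {"complete": false}
    }
  }
}
""")


def describe_line(feature):
    """Return (point_count, complete) for a geo_line GeoJSON Feature."""
    coordinates = feature["geometry"]["coordinates"]
    complete = feature["properties"]["complete"]
    return len(coordinates), complete


count, complete = describe_line(response["aggregations"]["line"])
if not complete:
    # Some matching documents were not used, for example because of `size`.
    print(f"line has {count} points, but not all matching documents were used")
```

Checking the flag this way lets a map client warn the user that the displayed track does not include every matching document.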
|
||||
|
||||
[[search-aggregations-metrics-geo-line-options]]
|
||||
==== Options
|
||||
|
||||
|
@ -106,37 +106,378 @@ Example usage configuring `my_location` as the point field:
|
|||
// NOTCONSOLE
|
||||
|
||||
`sort`::
|
||||
(Required)
|
||||
(Required outside <<search-aggregations-metrics-geo-line-grouping-time-series,`time_series`>> aggregations)
|
||||
|
||||
This option specifies the name of the numeric field to use as the sort key
|
||||
for ordering the points
|
||||
This option specifies the name of the numeric field to use as the sort key for ordering the points.
|
||||
When the `geo_line` aggregation is nested inside a
|
||||
<<search-aggregations-metrics-geo-line-grouping-time-series,`time_series`>>
|
||||
aggregation, this field defaults to `@timestamp`, and any other value will result in error.
|
||||
|
||||
Example usage configuring `@timestamp` as the sort key:
|
||||
|
||||
[source,js]
|
||||
----
|
||||
"sort": {
  "field": "@timestamp"
}
|
||||
----
|
||||
// NOTCONSOLE
|
||||
|
||||
`include_sort`::
|
||||
(Optional, boolean, default: `false`)
|
||||
|
||||
This option includes, when true, an additional array of the sort values in the
|
||||
(Optional, boolean, default: `false`) This option includes, when true, an additional array of the sort values in the
|
||||
feature properties.
|
||||
|
||||
`sort_order`::
|
||||
(Optional, string, default: `"ASC"`)
|
||||
|
||||
This option accepts one of two values: "ASC", "DESC".
|
||||
|
||||
(Optional, string, default: `"ASC"`) This option accepts one of two values: "ASC", "DESC".
|
||||
The line is sorted in ascending order by the sort key when set to "ASC", and in descending
|
||||
order when set to "DESC".
|
||||
|
||||
`size`::
|
||||
(Optional, integer, default: `10000`)
|
||||
(Optional, integer, default: `10000`) The maximum length of the line represented in the aggregation.
|
||||
Valid sizes are between one and 10000.
|
||||
Within <<search-aggregations-metrics-geo-line-grouping-time-series,`time_series`>>
|
||||
the aggregation uses line simplification to constrain the size, otherwise it uses truncation.
|
||||
See <<search-aggregations-metrics-geo-line-grouping-time-series-advantages,below>>
|
||||
for a discussion on the subtleties involved.
|
||||
|
||||
The maximum length of the line represented in the aggregation. Valid sizes are
|
||||
between one and 10000.
|
||||
[[search-aggregations-metrics-geo-line-grouping]]
|
||||
==== Grouping
|
||||
|
||||
The simple example above will produce a single track for all the data selected by the query. However, it is far more
|
||||
common to need to group the data into multiple tracks. For example, grouping flight transponder measurements by
|
||||
flight call-sign before sorting each flight by timestamp and producing a separate track for each.
|
||||
|
||||
In the following examples we will group the locations of points of interest in the cities of
|
||||
Amsterdam, Antwerp and Paris.
|
||||
The tracks will be ordered by the planned visit sequence for a walking tour of the museums and other attractions.
|
||||
|
||||
In order to demonstrate the difference between a time-series grouping and a non-time-series grouping, we will
|
||||
first create an index with <<tsds-index-settings,time-series enabled>>,
|
||||
and then give examples of grouping the same data without time-series and with time-series.
|
||||
|
||||
[source,console,id=search-aggregations-metrics-geo-line-grouping-setup]
|
||||
----
|
||||
PUT tour
|
||||
{
|
||||
"mappings": {
|
||||
"properties": {
|
||||
"city": {
|
||||
"type": "keyword",
|
||||
"time_series_dimension": true
|
||||
},
|
||||
"category": { "type": "keyword" },
|
||||
"route": { "type": "long" },
|
||||
"name": { "type": "keyword" },
|
||||
"location": { "type": "geo_point" },
|
||||
"@timestamp": { "type": "date" }
|
||||
}
|
||||
},
|
||||
"settings": {
|
||||
"index": {
|
||||
"mode": "time_series",
|
||||
"routing_path": [ "city" ],
|
||||
"time_series": {
|
||||
"start_time": "2023-01-01T00:00:00Z",
|
||||
"end_time": "2024-01-01T00:00:00Z"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
POST /tour/_bulk?refresh
|
||||
{"index":{}}
|
||||
{"@timestamp": "2023-01-02T09:00:00Z", "route": 0, "location": "POINT(4.889187 52.373184)", "city": "Amsterdam", "category": "Attraction", "name": "Royal Palace Amsterdam"}
|
||||
{"index":{}}
|
||||
{"@timestamp": "2023-01-02T10:00:00Z", "route": 1, "location": "POINT(4.885057 52.370159)", "city": "Amsterdam", "category": "Attraction", "name": "The Amsterdam Dungeon"}
|
||||
{"index":{}}
|
||||
{"@timestamp": "2023-01-02T13:00:00Z", "route": 2, "location": "POINT(4.901618 52.369219)", "city": "Amsterdam", "category": "Museum", "name": "Museum Het Rembrandthuis"}
|
||||
{"index":{}}
|
||||
{"@timestamp": "2023-01-02T16:00:00Z", "route": 3, "location": "POINT(4.912350 52.374081)", "city": "Amsterdam", "category": "Museum", "name": "NEMO Science Museum"}
|
||||
{"index":{}}
|
||||
{"@timestamp": "2023-01-03T12:00:00Z", "route": 4, "location": "POINT(4.914722 52.371667)", "city": "Amsterdam", "category": "Museum", "name": "Nederlands Scheepvaartmuseum"}
|
||||
{"index":{}}
|
||||
{"@timestamp": "2023-01-04T09:00:00Z", "route": 5, "location": "POINT(4.401384 51.220292)", "city": "Antwerp", "category": "Attraction", "name": "Cathedral of Our Lady"}
|
||||
{"index":{}}
|
||||
{"@timestamp": "2023-01-04T12:00:00Z", "route": 6, "location": "POINT(4.405819 51.221758)", "city": "Antwerp", "category": "Museum", "name": "Snijders&Rockoxhuis"}
|
||||
{"index":{}}
|
||||
{"@timestamp": "2023-01-04T15:00:00Z", "route": 7, "location": "POINT(4.405200 51.222900)", "city": "Antwerp", "category": "Museum", "name": "Letterenhuis"}
|
||||
{"index":{}}
|
||||
{"@timestamp": "2023-01-05T10:00:00Z", "route": 8, "location": "POINT(2.336389 48.861111)", "city": "Paris", "category": "Museum", "name": "Musée du Louvre"}
|
||||
{"index":{}}
|
||||
{"@timestamp": "2023-01-05T14:00:00Z", "route": 9, "location": "POINT(2.327000 48.860000)", "city": "Paris", "category": "Museum", "name": "Musée dOrsay"}
|
||||
----
|
||||
|
||||
[[search-aggregations-metrics-geo-line-grouping-terms]]
|
||||
==== Grouping with terms
|
||||
|
||||
Using the above data, for a non-time-series use case, the grouping can be done using a
|
||||
<<search-aggregations-bucket-terms-aggregation,terms aggregation>> based on city name.
|
||||
This would work whether or not we had defined the `tour` index as a time series index.
|
||||
|
||||
[source,console,id=search-aggregations-metrics-geo-line-terms]
|
||||
----
|
||||
POST /tour/_search?filter_path=aggregations
|
||||
{
|
||||
"aggregations": {
|
||||
"path": {
|
||||
"terms": {"field": "city"},
|
||||
"aggregations": {
|
||||
"museum_tour": {
|
||||
"geo_line": {
|
||||
"point": {"field": "location"},
|
||||
"sort": {"field": "@timestamp"}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
// TEST[continued]
|
||||
|
||||
Which returns:
|
||||
|
||||
[source,js]
|
||||
----
|
||||
{
|
||||
"aggregations": {
|
||||
"path": {
|
||||
"doc_count_error_upper_bound": 0,
|
||||
"sum_other_doc_count": 0,
|
||||
"buckets": [
|
||||
{
|
||||
"key": "Amsterdam",
|
||||
"doc_count": 5,
|
||||
"museum_tour": {
|
||||
"type": "Feature",
|
||||
"geometry": {
|
||||
"coordinates": [ [ 4.889187, 52.373184 ], [ 4.885057, 52.370159 ], [ 4.901618, 52.369219 ], [ 4.91235, 52.374081 ], [ 4.914722, 52.371667 ] ],
|
||||
"type": "LineString"
|
||||
},
|
||||
"properties": {
|
||||
"complete": true
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"key": "Antwerp",
|
||||
"doc_count": 3,
|
||||
"museum_tour": {
|
||||
"type": "Feature",
|
||||
"geometry": {
|
||||
"coordinates": [ [ 4.401384, 51.220292 ], [ 4.405819, 51.221758 ], [ 4.4052, 51.2229 ] ],
|
||||
"type": "LineString"
|
||||
},
|
||||
"properties": {
|
||||
"complete": true
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"key": "Paris",
|
||||
"doc_count": 2,
|
||||
"museum_tour": {
|
||||
"type": "Feature",
|
||||
"geometry": {
|
||||
"coordinates": [ [ 2.336389, 48.861111 ], [ 2.327, 48.86 ] ],
|
||||
"type": "LineString"
|
||||
},
|
||||
"properties": {
|
||||
"complete": true
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
// TESTRESPONSE
|
||||
|
||||
The above results contain an array of buckets, where each bucket is a JSON object with the `key` showing the name
|
||||
of the `city` field, and an inner aggregation result called `museum_tour` containing a
|
||||
https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] describing the
|
||||
actual route between the various attractions in that city.
|
||||
Each result also includes a `properties` object with a `complete` value which will be `false` if the geometry
|
||||
was truncated to the limits specified in the `size` parameter.
|
||||
Note that when we use `time_series` in the example below, we will get the same results structured a little differently.
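
The bucket array described above can be turned into a simple lookup of city to coordinates on the client. This is a minimal Python sketch; the `tracks_from_terms` helper is hypothetical, and the sample response is a shortened copy of the result above containing only the Paris bucket.

```python
def tracks_from_terms(response, bucket_agg="path", line_agg="museum_tour"):
    """Map each terms bucket key to the coordinates of its geo_line.

    `bucket_agg` and `line_agg` are the aggregation names used in the
    request above; change them if the request uses other names.
    """
    tracks = {}
    for bucket in response["aggregations"][bucket_agg]["buckets"]:
        feature = bucket[line_agg]
        tracks[bucket["key"]] = feature["geometry"]["coordinates"]
    return tracks


# Shortened copy of the response above (Paris bucket only).
terms_response = {
    "aggregations": {
        "path": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "Paris",
                    "doc_count": 2,
                    "museum_tour": {
                        "type": "Feature",
                        "geometry": {
                            "coordinates": [[2.336389, 48.861111], [2.327, 48.86]],
                            "type": "LineString",
                        },
                        "properties": {"complete": True},
                    },
                }
            ],
        }
    }
}

terms_tracks = tracks_from_terms(terms_response)
```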
|
||||
|
||||
[[search-aggregations-metrics-geo-line-grouping-time-series]]
|
||||
==== Grouping with time-series
|
||||
|
||||
Using the same data as before, we can also perform the grouping with a
|
||||
<<search-aggregations-bucket-time-series-aggregation,`time_series` aggregation>>.
|
||||
This will group by TSID, which is defined as the combination of all fields with `time_series_dimension: true`,
|
||||
in this case the same `city` field used in the previous
|
||||
<<search-aggregations-bucket-terms-aggregation,terms aggregation>>.
|
||||
This example will only work if we defined the `tour` index as a time series index using `index.mode="time_series"`.
|
||||
|
||||
[source,console,id=search-aggregations-metrics-geo-line-time-series]
|
||||
----
|
||||
POST /tour/_search?filter_path=aggregations
|
||||
{
|
||||
"aggregations": {
|
||||
"path": {
|
||||
"time_series": {},
|
||||
"aggregations": {
|
||||
"museum_tour": {
|
||||
"geo_line": {
|
||||
"point": {"field": "location"}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
// TEST[continued]
|
||||
|
||||
NOTE: The `geo_line` aggregation no longer requires the `sort` field when nested within a
|
||||
<<search-aggregations-bucket-time-series-aggregation,`time_series` aggregation>>.
|
||||
This is because the sort field is set to `@timestamp`, which all time-series indexes are pre-sorted by.
|
||||
If you do set this parameter, and set it to something other than `@timestamp` you will get an error.
|
||||
|
||||
The above query will result in:
|
||||
|
||||
[source,js]
|
||||
----
|
||||
{
|
||||
"aggregations": {
|
||||
"path": {
|
||||
"buckets": {
|
||||
"{city=Paris}": {
|
||||
"key": {
|
||||
"city": "Paris"
|
||||
},
|
||||
"doc_count": 2,
|
||||
"museum_tour": {
|
||||
"type": "Feature",
|
||||
"geometry": {
|
||||
"coordinates": [ [ 2.336389, 48.861111 ], [ 2.327, 48.86 ] ],
|
||||
"type": "LineString"
|
||||
},
|
||||
"properties": {
|
||||
"complete": true
|
||||
}
|
||||
}
|
||||
},
|
||||
"{city=Antwerp}": {
|
||||
"key": {
|
||||
"city": "Antwerp"
|
||||
},
|
||||
"doc_count": 3,
|
||||
"museum_tour": {
|
||||
"type": "Feature",
|
||||
"geometry": {
|
||||
"coordinates": [ [ 4.401384, 51.220292 ], [ 4.405819, 51.221758 ], [ 4.4052, 51.2229 ] ],
|
||||
"type": "LineString"
|
||||
},
|
||||
"properties": {
|
||||
"complete": true
|
||||
}
|
||||
}
|
||||
},
|
||||
"{city=Amsterdam}": {
|
||||
"key": {
|
||||
"city": "Amsterdam"
|
||||
},
|
||||
"doc_count": 5,
|
||||
"museum_tour": {
|
||||
"type": "Feature",
|
||||
"geometry": {
|
||||
"coordinates": [ [ 4.889187, 52.373184 ], [ 4.885057, 52.370159 ], [ 4.901618, 52.369219 ], [ 4.91235, 52.374081 ], [ 4.914722, 52.371667 ] ],
|
||||
"type": "LineString"
|
||||
},
|
||||
"properties": {
|
||||
"complete": true
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
// TESTRESPONSE
|
||||
|
||||
The above results are essentially the same as with the previous `terms` aggregation example, but structured differently.
|
||||
Here we see the buckets returned as a map, where the key is an internal description of the TSID.
|
||||
This TSID is unique for each unique combination of fields with `time_series_dimension: true`.
|
||||
Each bucket contains a `key` field which is also a map of all dimension values for the TSID, in this case only the city
|
||||
name is used for grouping.
|
||||
In addition, there is an inner aggregation result called `museum_tour` containing a
|
||||
https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] describing the
|
||||
actual route between the various attractions in that city.
|
||||
Each result also includes a `properties` object with a `complete` value which will be `false` if the geometry
|
||||
was simplified to the limits specified in the `size` parameter.
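
Because the buckets are returned as a map rather than an array, client code that reads them differs slightly from the `terms` case. This is a minimal Python sketch; the `tracks_from_time_series` helper is hypothetical, and the sample response is a shortened copy of the result above. It keys each track by the dimension values in the bucket's `key` object instead of parsing the internal TSID description.

```python
def tracks_from_time_series(response, bucket_agg="path", line_agg="museum_tour"):
    """Map each time-series bucket to the coordinates of its geo_line.

    The result is keyed by a sorted tuple of (dimension, value) pairs taken
    from the bucket's `key` object, so it also works with several dimensions.
    """
    tracks = {}
    for bucket in response["aggregations"][bucket_agg]["buckets"].values():
        dimensions = tuple(sorted(bucket["key"].items()))
        tracks[dimensions] = bucket[line_agg]["geometry"]["coordinates"]
    return tracks


# Shortened copy of the response above (Paris bucket only).
time_series_response = {
    "aggregations": {
        "path": {
            "buckets": {
                "{city=Paris}": {
                    "key": {"city": "Paris"},
                    "doc_count": 2,
                    "museum_tour": {
                        "type": "Feature",
                        "geometry": {
                            "coordinates": [[2.336389, 48.861111], [2.327, 48.86]],
                            "type": "LineString",
                        },
                        "properties": {"complete": True},
                    },
                }
            }
        }
    }
}

time_series_tracks = tracks_from_time_series(time_series_response)
```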
|
||||
|
||||
[[search-aggregations-metrics-geo-line-grouping-time-series-advantages]]
|
||||
==== Why group with time-series?
|
||||
|
||||
When reviewing the above examples, you might think that there is little difference between using
|
||||
<<search-aggregations-bucket-terms-aggregation,`terms`>> or
|
||||
<<search-aggregations-bucket-time-series-aggregation,`time_series`>>
|
||||
to group the geo-lines. However, there are some important differences in behaviour between the two cases.
|
||||
Time series indexes are stored in a very specific order on disk.
|
||||
They are pre-grouped by the time-series dimension fields, and pre-sorted by the `@timestamp` field.
|
||||
This allows the `geo_line` aggregation to be considerably optimized:
|
||||
|
||||
* The same memory allocated for the first bucket can be re-used over and over for all subsequent buckets.
|
||||
This is substantially less memory than required for non-time-series cases where all buckets are collected
|
||||
concurrently.
|
||||
* No sorting needs to be done, since the data is pre-sorted by `@timestamp`.
|
||||
The time-series data will naturally arrive at the aggregation collector in `DESC` order.
|
||||
This means that if we specify `sort_order:ASC` (the default), we still collect in `DESC` order,
|
||||
but efficiently reverse the order in memory before generating the final `LineString` geometry.
|
||||
* The `size` parameter can be used for a streaming line-simplification algorithm.
|
||||
Without time-series, we are forced to truncate data, by default after 10000 documents per bucket, in order to
|
||||
prevent memory usage from being unbounded.
|
||||
This can result in geo-lines being truncated, and therefore losing important data.
|
||||
With time-series we can run a streaming line-simplification algorithm, retaining control over memory usage,
|
||||
while also maintaining the overall geometry shape.
|
||||
In fact, for most use cases it would work to set this `size` parameter to a much lower bound, and save even more
|
||||
memory. For example, if the `geo_line` is to be drawn on a display map with a specific resolution, it might look
|
||||
just as good to simplify to as few as 100 or 200 points. This will save memory on the server, on the network and
|
||||
in the client.
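
The first two points above can be illustrated with a small Python sketch. It is not the Elasticsearch implementation; it only assumes, as described above, that rows arrive grouped by TSID and in descending `@timestamp` order within each group. The `collect_tracks` function and the row layout are hypothetical.

```python
def collect_tracks(rows, ascending=True):
    """Yield (tsid, points) for rows that arrive pre-grouped and pre-sorted.

    `rows` is an iterable of (tsid, timestamp, point) tuples, grouped by tsid
    and sorted by descending timestamp within each tsid. A single buffer is
    re-used for every group and no sorting is performed.
    """
    buffer = []
    current = None
    for tsid, _timestamp, point in rows:
        if current is not None and tsid != current:
            # The group is finished: emit a copy, reversed for ascending order.
            yield current, (buffer[::-1] if ascending else buffer[:])
            buffer.clear()  # re-use the same buffer for the next group
        current = tsid
        buffer.append(point)
    if current is not None:
        yield current, (buffer[::-1] if ascending else buffer[:])


# Rows in the order a time-series index would deliver them:
# grouped by tsid, newest first.
rows = [
    ("a", 3, "p3"), ("a", 2, "p2"), ("a", 1, "p1"),
    ("b", 2, "q2"), ("b", 1, "q1"),
]
ascending_tracks = list(collect_tracks(rows))
descending_tracks = list(collect_tracks(rows, ascending=False))
```

Only one group is held in memory at a time, which is why the memory requirement is independent of the number of buckets.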
|
||||
|
||||
NOTE: There are other significant advantages to working with time-series data and using `time_series` index mode.
|
||||
These are discussed in the documentation on <<tsds,time series data streams>>.
|
||||
|
||||
[[search-aggregations-metrics-geo-line-simplification]]
|
||||
==== Streaming line simplification
|
||||
|
||||
Line simplification is a great way to reduce the size of the final results sent to the client, and displayed in a map
|
||||
user interface. However, normally these algorithms use a lot of memory to perform the simplification, requiring the
|
||||
entire geometry to be maintained in memory together with supporting data for the simplification itself.
|
||||
The use of a streaming line simplification algorithm allows for minimal memory usage during the simplification
|
||||
process by constraining memory to the bounds defined for the simplified geometry. This is only possible if no sorting
|
||||
is required, which is the case when grouping is done by the
|
||||
<<search-aggregations-bucket-time-series-aggregation,`time_series` aggregation>>,
|
||||
running on an index with the `time_series` index mode.
|
||||
|
||||
Under these conditions the `geo_line` aggregation allocates memory to the `size` specified, and then fills that
|
||||
memory with the incoming documents.
|
||||
Once the memory is completely filled, documents from within the line are removed as new documents are added.
|
||||
The choice of document to remove is made to minimize the visual impact on the geometry.
|
||||
This process makes use of the
|
||||
https://en.wikipedia.org/wiki/Visvalingam%E2%80%93Whyatt_algorithm[Visvalingam–Whyatt algorithm].
|
||||
Essentially this means points are removed if they have the minimum triangle area, with the triangle defined
|
||||
by the point under consideration and the two points before and after it in the line.
|
||||
In addition, we calculate the area using spherical coordinates so that no planar distortions affect the choice.
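
The process described above can be sketched in Python as follows. This is an illustration only and not the Elasticsearch implementation: it assumes planar triangle areas instead of spherical ones, it finds the point to remove with a linear scan where a real implementation would more likely use a priority queue, and the names `triangle_area` and `simplify_stream` are hypothetical.

```python
def triangle_area(a, b, c):
    """Planar area of the triangle a-b-c (an assumption of this sketch;
    the text above describes areas calculated in spherical coordinates)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def simplify_stream(points, size):
    """Keep at most `size` points, dropping the least significant interior
    point each time the buffer overflows (Visvalingam-Whyatt criterion)."""
    if size < 2:
        raise ValueError("size must be at least 2 to keep both end points")
    line = []
    for point in points:
        line.append(point)
        if len(line) > size:
            # Interior point whose triangle with its two neighbours is smallest.
            weakest = min(
                range(1, len(line) - 1),
                key=lambda i: triangle_area(line[i - 1], line[i], line[i + 1]),
            )
            del line[weakest]
    return line


points = [(0, 0), (1, 0.01), (2, 0), (3, 5), (4, 0)]
simplified = simplify_stream(points, size=4)
truncated = points[:4]  # truncation keeps the first 4 points and loses the end
```

In this small example the nearly collinear point `(1, 0.01)` is the one removed, so the simplified line still reaches the final point, while the truncated line stops one point short.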
|
||||
|
||||
In order to demonstrate how much better line simplification is than line truncation, consider this example of the north
|
||||
shore of Kodiak Island.
|
||||
The data consists of only 209 points, but if we set `size` to `100` we get dramatic truncation.
|
||||
|
||||
image:images/spatial/kodiak_geo_line_truncated.png[North shore of Kodiak Island truncated to 100 points]
|
||||
|
||||
The grey line is the entire geometry of 209 points, while the blue line is the first 100 points, a very different
|
||||
geometry than the original.
|
||||
|
||||
Now consider the same geometry simplified to 100 points.
|
||||
|
||||
image:images/spatial/kodiak_geo_line_simplified.png[North shore of Kodiak Island simplified to 100 points]
|
||||
|
||||
For comparison we have shown the original in grey, the truncated in blue and the new simplified geometry
|
||||
in magenta. It is possible to see where the new simplified line deviates from the original, but the overall
|
||||
geometry appears almost identical and is still clearly recognizable as the north shore of Kodiak Island.
|
||||
|
|
BIN
docs/reference/images/spatial/geo_line.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 1.5 MiB |
BIN
docs/reference/images/spatial/kodiak_geo_line_simplified.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 34 KiB |
BIN
docs/reference/images/spatial/kodiak_geo_line_truncated.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 31 KiB |
|
@ -9,11 +9,13 @@ Fields of type `geo_point` accept latitude-longitude pairs, which can be used:
|
|||
* to find geopoints within a <<query-dsl-geo-bounding-box-query,bounding box>>,
|
||||
within a certain <<query-dsl-geo-distance-query,distance>> of a central point,
|
||||
or within a <<query-dsl-geo-shape-query,`geo_shape` query>> (for example, points in a polygon).
|
||||
* to aggregate documents by <<search-aggregations-bucket-geodistance-aggregation,distance>> from a central point
|
||||
* to aggregate documents by <<search-aggregations-bucket-geodistance-aggregation,distance>> from a central point.
|
||||
* to aggregate documents by geographic grids: either
|
||||
<<search-aggregations-bucket-geohashgrid-aggregation,`geo_hash`>>,
|
||||
<<search-aggregations-bucket-geotilegrid-aggregation,`geo_tile`>> or
|
||||
<<search-aggregations-bucket-geohexgrid-aggregation,`geo_hex`>>.
|
||||
* to aggregate geopoints into a track using the metrics aggregation
|
||||
<<search-aggregations-metrics-geo-line,`geo_line`>>.
|
||||
* to integrate distance into a document's <<query-dsl-function-score-query,relevance score>>.
|
||||
* to <<geo-sorting,sort>> documents by distance.
|
||||
|
||||
|
|