Commit graph

55 commits

Author SHA1 Message Date
Craig Taverner
40c34cd896
Optimize ST_EXTENT_AGG for geo_shape and cartesian_shape (#119889)
Support for `ST_EXTENT_AGG` was added in https://github.com/elastic/elasticsearch/pull/118829, and then partially optimized in https://github.com/elastic/elasticsearch/pull/118829. This optimization worked only for cartesian_shape fields, and worked by extracting the Extent from the doc-values and re-encoding it as a WKB `BBOX` geometry. This does not work for geo_shape, where we need to retain all 6 integers stored in the doc-values, in order to perform the datelline choice only at reduce time during the final phase of the aggregation.

Since both geo_shape and cartesian_shape perform the aggregations using integers, and the original Extent values in the doc-values are integers, this PR expands the previous optimization by:
* Saving all Extent values into a multi-valued field in an IntBlock for both cartesian_shape and geo_shape
* Simplifying the logic around merging intermediate states for all cases (geo/cartesian and grouped and non-grouped aggs)
* Widening test cases for testing more combinations of aggregations and types, and fixing a few bugs found
* Enhancing cartesian extent to convert from 6 ints to 4 ints at block loading time (for efficiency)
* Fixing bugs in both cartesian and geo extents for generating intermediate state with missing groups (flaky tests in serverless)
* Moved the int order to always match Rectangle for 4-int and Extent for 6-int cases (improved internal consistency)

Since the PR already changed the meaning of the invalid/infinite values of the intermediate state integers, it was already not compatible with the previous cluster versions. We disabled mixed-cluster testing to prevent errors as a result of that. This leaves us the opportunity to make further changes that are mixed-cluster incompatible, hence the decision to perform this consistency update now.
2025-01-16 19:43:51 +01:00
Gal Lalouche
2be4cd983f
ESQL: Support ST_EXTENT_AGG (#117451)
This PR adds support for ST_EXTENT_AGG aggregation, i.e., computing a bounding box over a set of points/shapes (Cartesian or geo). Note the difference between this aggregation and the already implemented scalar function ST_EXTENT.

This isn't a very efficient implementation, and future PRs will attempt to read these extents directly from the doc values.
We currently always use longitude wrapping, i.e., we may wrap around the dateline for a smaller bounding box. Future PRs will let the user control this behavior.
Fixes #104659.
2024-12-13 12:41:24 +02:00
Craig Taverner
c7e985c3b6
Support ST_ENVELOPE and related ST_XMIN, etc. (#116964)
Support ST_ENVELOPE and related ST_XMIN, etc.

Based on the PostGIS equivalents:

https://postgis.net/docs/ST_Envelope.html
https://postgis.net/docs/ST_XMin.html
https://postgis.net/docs/ST_XMax.html
https://postgis.net/docs/ST_YMin.html
https://postgis.net/docs/ST_YMax.html
2024-12-04 12:20:47 +01:00
Ryan Ernst
e5d5c17c99
Use directory name as project name for libs (#115720)
The libs projects are configured to all begin with `elasticsearch-`.
While this is desireable for the artifacts to contain this consistent
prefix, it means the project names don't match up with their
directories. Additionally, it creates complexities for subproject naming
that must be manually adjusted.

This commit adjusts the project names for those under libs to be their
directory names. The resulting artifacts for these libs are kept the
same, all beginning with `elasticsearch-`.
2024-10-29 13:02:28 -07:00
Mark Vieira
a59c182f9f
Add AGPLv3 as a supported license 2024-09-13 15:29:46 -07:00
Craig Taverner
097fc0654f
Add maximum nested depth check to WKT parser (#111843)
* Add maximum nested depth check to WKT parser

This prevents StackOverflowErrors, replacing them with ParseException errors, which is more easily managed by running servers.

* Update docs/changelog/111843.yaml
2024-08-13 20:07:52 +02:00
Simon Cooper
1b8baf1cf8
Convert most uses of BaseMatcher to TypeSafeMatcher (#105764) 2024-03-11 09:12:42 +00:00
Ignacio Vera
a6b36eb20a
Add the possibility to transform WKT to WKB directly (#104030)
enhancement to geo utilities.
2024-01-08 16:10:32 +01:00
Ignacio Vera
9fb550be44
WellKnownBinary#toWKB should not throw an IOException (#100669)
The only reason this method is throwing an exception is because the
method ByteArrayOutputStream#close() is declaring it although it is a
noop. Therefore it can be safely ignored.

Thanks @romseygeek for bringing into attention.
2023-10-11 08:24:32 -04:00
Armin Braun
b7eafce32c
Make some practically static methods static (#97565)
Another round of automated fixes to this, marking things that can be
made static as static. Saves some JIT cycles but also turns some lambdas
from capturing to non-capturing and makes the "utilityness" of some
classes visible.
2023-10-06 23:37:07 +02:00
Craig Taverner
9dbce6f6c7
Asset tracking: geo_line for TSDB (#94954)
* WIP Started geo_line for TSDB work

Starting with YAML tests (which currently pass) and AggregatorTests
(currently failing, likely due to mistake in the tests)

* Update docs/changelog/94954.yaml

* WIP Refactoring to prepare for TSDB geo_line

* Created TimeSeries version of GeoLineAggregator, and wired it in so that time-series aggregations use it, but current behavior is still identical to non-time-series.
* Added both yaml and unit tests for testing that geo_line works with correct results in both time-series and non-time-series cases.
* Added additional tests to verify the grouping behaviour of time-series vs. terms aggs, and the combination of the two.

* WIP Refactoring to prepare for TSDB geo_line

* Started refactoring to re-use simplifier for all buckets

* Fixed bug with leaf collector not changing per segment

* Fixed bug with leaf collector not detecting bucket changes

The bucket id can change within a segment, so we need to detect this and save the geo_line.

* Renamed class since it no longer extends BucketedSort

The original geo_line relied on the BucketedSort for all intelligence.
The time-series geo_line uses none of that, and does its own memory management.

* Fixed bug with geo_point leaking between geo_line buckets

And enhanced unit tests to cover multiple groups

* Code review updates

* Verify that the sort field is specifically the TS timestamp

Only activate the time-series optimizations if the aggregation is both:
* Within a time-series aggregation (ie. tsid and @timestamp ordered)
* The geo_line sort field is @timestamp

* Allow geo_point time-series to skip sort config

Also disables the new geo_line for time-series even if the correct
sort and point fields are used if the point field is not explicitly
configured to be a position metric.

* Support geo_centroid and geo_bounds on position metric

* Update yaml tests for multi-terms tests

* Changed to disallow alternative sort-fields in ts-geo_line

Since the primary criteria for switching to the new algorithm is that
geo_line is within a time-series aggregation, we now disallow any other sort field.

We test the negative case in the yaml tests, but changed the unit tests to
use TermsAggregation to minim the time-series aggregation to get comparable
results.

* For non-time-series check missing sort field early

The old code only threw error if there was data because the check was done
inside the leaf collector just before actually reading the sort field.
And there were no tests for missing sort field.

This commit adds the tests, and checks early so even if data is missing.

* Reviewed TODOs

* Test that behaviour is identical with or without POSITION metric
* Removed fallback code in builder (was switching to old geo_line without POSITION metric)
* Removed two TODO's that are no longer valid concerns
2023-06-15 14:58:25 +02:00
Craig Taverner
38baa36346
Geometry simplifier (#94859)
Support geometry and streaming simplification

There are many opportunities to enable geometry simplification in Elasticsearch, both as an explicit feature available to users, and as an internal optimization technique for reducing memory consumption for complex geometries. For the latter case, it can even be considered a bug fix. This PR provides support for constraining Line and LinearRing sizes to a fixed number of points, and thereby a fixed amount of memory usage.

Consider, for example, the geo_line aggregation. This is similar to the top-10 aggregation, but allows the top-10k (ten thousand) points to be aggregated. This is not only a lot of memory, but can still cause unwanted line truncation for very large geometries. Line simplification is a solution to this. It is likely that a much smaller limit than 10k would suffice, while at the same time not truncating the geometry at all, so we fix a bug (truncation) while improving memory usage (pull limit from 10k down to perhaps just 1k).

This PR provides two APIs:

Streaming:
* By using the simplifier.consume(x, y) method on a stream of points, the total memory used is limited to a linear function of k, the total number of points to retain. This algorithm is at its heart based on the Visvalingam–Whyatt algorithm, with concepts from https://bost.ocks.org/mike/simplify/ and in particular the detailed streaming discussions in the paper at https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.7132&rep=rep1&type=pdf

Full-geometry:
* Simplifying full geometries using the simplifier.simplify(geometry) method can work with most geometry types, even GeometryCollection, but:
  - Some geometries do not get simplified because it makes no sense to: Point, Circle, Rectangle
  - The maxPoints parameter is used as is to apply to the main component (shell for polygons, largest geometry for multi-polygons and geometry collections), and all other sub-components (holes in polygons, etc.) are simplified to a scaled down version of the maxPoints, scaled by the relative size of the sub-component to the main component.
* The simplification itself is done on each Line and LinearRing component using the same streaming algorithm above. Since we use the Visvalingam–Whyatt algorithm, this works is applicable to both streaming and full-geometry simplification with the same essential result, but better control over memory than normal full-geometry simplifiers.

The basic algorithm for simplification on a stream of points requires maintaining two data structures:
* an array of all currently simplified points (implicitly ordered in stream order)
* a priority queue of all but the two end points with an estimated error on each that expresses the cost of removing that point from the line
2023-05-11 17:17:04 +02:00
Ignacio Vera
ab1d891340
Support for store parameter in geo_shape field (#94418)
This commit adds support for storing geo_shape fields as a lucene stored field. The geometry will be stored 
normalised in well-known binary format (WKB).
2023-03-21 12:57:58 +01:00
Simon Cooper
4c46ccacaa
Migrate the remaining uses of Version to TransportVersion (#93384)
Remove get/setVersion methods
2023-02-13 09:15:53 +00:00
Ievgen Degtiarenko
ad63465c64
Make mutateInstance non abstract (#93229) 2023-01-26 09:22:27 +01:00
Oleksandr Porunov
7790f616b8
Move SpatialUtils to geo library (#88088)
Moving parts of the code related to convert circles into polygons.
2022-10-31 12:42:25 +01:00
Tanguy Leroux
f2154e8687
Fix unnecessary string concatenations (#90405) 2022-09-27 16:14:38 +02:00
Chris Hegarty
3071c6a055
Modularize Elasticsearch (#81066)
This PR represents the initial phase of Modularizing Elasticsearch (with
Java Modules).

This initial phase modularizes the core of the Elasticsearch server
with Java Modules, which is then used to load and configure extension
components atop the server. Only a subset of extension components are
modularized at this stage (other components come in a later phase).
Components are loaded dynamically at runtime with custom class loaders
(same as is currently done). Components with a module-info.class are
defined to a module layer.

This architecture is somewhat akin to the Modular JDK, where
applications run on the classpath. In the analogy, the Elasticsearch
server modules are the platform (thus are always resolved and present),
while components without a module-info.class are non-modular code
running atop the Elasticsearch server modules. The extension components
cannot access types from non-exported packages of the server modules, in
the same way that classpath applications cannot access types from
non-exported packages of modules from the JDK. Broadly, the core
Elasticseach java modules simply "wrap" the existing packages and export
them. There are opportunites to export less, which is best done in more
narrowly focused follow-up PRs.

The Elasticsearch distribution startup scripts are updated to put jars
on the module path (the class path is empty), so the distribution will
run the core of the server as java modules. A number of key components
have been retrofitted with module-info.java's too, and the remaining
components can follow later. Unit and functional tests run as
non-modular (since they commonly require package-private access), while
higher-level integration tests, that run the distribution, run as
modular.

Co-authored-by: Chris Hegarty <christopher.hegarty@elastic.co>
Co-authored-by: Ryan Ernst <ryan@iernst.net>
Co-authored-by: Rene Groeschke <rene@elastic.co>
2022-05-20 13:11:42 +01:00
Chris Hegarty
50528d5d79
Add missing explicit no-args ctors (#84763) 2022-03-09 11:08:48 +00:00
Ignacio Vera
ad036be98c
Add WKB support to lib geo (#82706)
Adds a WKB reader/writer in the same fashion as the existing WKT reader/writer.
2022-01-20 08:03:10 +01:00
Artem Prigoda
0699c9351f
Use Java 14 switch expressions (#82178)
JEP 361[https://openjdk.java.net/jeps/361] added support for switch expressions
which can be much more terse and less error-prone than switch statements.

Another useful feature of switch expressions is exhaustiveness: we can make
sure that an enum switch expression covers all the cases at compile time.
2022-01-10 09:53:35 +01:00
Mark Vieira
12ad399c48 Reformat Elasticsearch source 2021-10-27 08:19:51 -07:00
Rory Hunter
e55edf937a
Fix shadowed variables in various places - part 1 (#77555)
Part of #19752.

Fix a number of locations where local variables or parameters are shadowing a field
that is defined in the same class.
2021-09-13 13:48:46 +01:00
Ignacio Vera
07715438b5
Refactor of GeoShape integration tests (#77052)
This commit joins GeoFilterIT and GeoShapeIntegrationIT into one test case called GeoShapeIntegTestCase 
which is moved into the test framework.
2021-09-01 07:21:15 +02:00
Ignacio Vera
d42c0cf016
Make methods in GeoJson and WellKnownText static (#73805)
This change makes all methods on those utility classes static.
2021-06-08 08:37:29 +02:00
Marten
8e46acf6ba
Fix typo in Rectangle() error message (#73124)
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-05-18 10:58:44 -04:00
Rory Hunter
d181b947c2
Remove depth limit from checkstyle negation rule (#70274)
The Checkstyle rule that bans unary negation in favour of an explicit
`== false` has a `maximumDepth` of 2 configured, which meant that it
didn't catch all violations. The `maximumDepth` isn't required (actually
it has a really high default), so this change removes the limit and
fixes the resulting violations.
2021-03-10 22:06:50 +00:00
Rory Hunter
780f273067
Replace NOT operator with explicit false check - part 8 (#68625)
Part 8.

We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-02-08 15:20:34 +00:00
Mark Vieira
a92a647b9f Update sources with new SSPL+Elastic-2.0 license headers
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:

 - Updating LICENSE and NOTICE files throughout the code base, as well
   as those packaged in our published artifacts
 - Update IDE integration to now use the new license header on newly
   created source files
 - Remove references to the "OSS" distribution from our documentation
 - Update build time verification checks to no longer allow Apache 2.0
   license header in Elasticsearch source code
 - Replace all existing Apache 2.0 license headers for non-xpack code
   with updated header (vendored code with Apache 2.0 headers obviously
   remains the same).
 - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
2021-02-02 16:10:53 -08:00
Rory Hunter
ad1f876daa
Replace NOT operator with explicit false check (#67817)
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-01-26 14:47:09 +00:00
Rene Groeschke
680ea07f7f
Remove deprecated usage of testCompile configuration (#57921)
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
2020-06-12 13:34:53 +02:00
Ryan Ernst
c0ee68b0a0
Move publishing configuration to a separate plugin (#56727)
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
2020-05-14 18:56:59 -07:00
Tal Levy
f26ee1298d
Add geo_shape support for geotile_grid and geohash_grid (#55966)
this commit adds aggregation support for the geo_shape field
type on geo*_grid aggregations.

it introduces a Tiler for both tiles and hashes that enables a new type of
ValuesSource to replace the GeoPoint's CellIdSource. This makes it possible
for the existing Aggregator to be re-used, so no new implementations of
the grid aggregators are added.
2020-05-05 08:00:16 -07:00
Ryan Ernst
842ce32870
Use task avoidance with forbidden apis (#55034)
Currently forbidden apis accounts for 800+ tasks in the build. These
tasks are aggressively created by the plugin. In forbidden apis 3.0, we
will get task avoidance
(https://github.com/policeman-tools/forbidden-apis/pull/162), but we
need to ourselves use the same task avoidance mechanisms to not trigger
these task creations. This commit does that for our foribdden apis
usages, in preparation for upgrading to 3.0 when it is released.
2020-04-15 13:23:55 -07:00
Mark Vieira
0e55fdeae9
Re-add origin url information to publish POM files (#55171) 2020-04-14 11:48:36 -07:00
Dominic Page
d1cbdfb753
Geo shape query vs geo point (#52382)
Enable geo_shape query to work on geo_point fields for shapes: circle, polygon, multipolygon, rectangle

see: #48928

Co-Authored-By:  @iverase
2020-03-18 17:03:52 +01:00
Ryan Ernst
bf317e8c4e
Remove comparison to true for booleans (#51723)
While we use `== false` as a more visible form of boolean negation
(instead of `!`), the true case is implied and the true value does not
need to explicitly checked. This commit converts cases that have slipped
into the code checking for `== true`.
2020-01-31 16:34:27 -08:00
Nik Everett
5ea750f2ca
Clean up wire test case a bit (#50627)
* Adds JavaDoc to `AbstractWireTestCase` and
`AbstractWireSerializingTestCase` so it is more obvious you should prefer
the latter if you have a choice
* Moves the `instanceReader` method out of `AbstractWireTestCase` becaue
it is no longer used.
* Marks a bunch of methods final so it is more obvious which classes are
for what.
* Cleans up the side effects of the above.
2020-01-05 14:42:34 -05:00
Igor Motov
a26e4d1e5e
Geo: Switch generated WKT to upper case (#50285)
Switches generated WKT to upper case to
conform to the standard recommendation.

Relates #49568
2019-12-18 07:28:56 -10:00
Rory Hunter
3a3e5f6176
Apply 2-space indent to all gradle scripts (#48849)
Closes #48724. Update `.editorconfig` to make the Java settings the default
for all files, and then apply a 2-space indent to all `*.gradle` files.
Then reformat all the files.
2019-11-13 10:14:04 +00:00
Alpar Torok
ca54b442bf
Remove eclipse conditionals (#44075)
* Remove eclipse conditionals

We used to have some meta projects with a `-test` prefix because
historically eclipse could not distinguish between test and main
source-sets and could only use a single classpath.
This is no longer the case for the past few Eclipse versions.

This PR adds the necessary configuration to correctly categorize source
folders and libraries.
With this change eclipse can import projects, and the visibility rules
are correct e.x. auto compete doesn't offer classes from test code or
`testCompile` dependencies when editing classes in `main`.

Unfortunately the cyclic dependency detection in Eclipse doesn't seem to
take the difference between test and non test source sets into account,
but since we are checking this in Gradle anyhow, it's safe to set to
`warning` in the settings. Unfortunately there is no setting to ignore
it.

This might cause problems when building since Eclipse will probably not
know the right order to build things in so more wirk might be necesarry.
2019-10-03 10:50:46 +03:00
Igor Motov
13a8835e5a
Geo: Change order of parameter in Geometries to lon, lat (#45332)
Changes the order of parameters in Geometries from lat, lon to lon, lat
and moves all Geometry classes are moved to the
org.elasticsearch.geomtery package.

Closes #45048
2019-08-09 13:22:00 -04:00
Igor Motov
612e7e5776
GEO: Switch to using GeoTestUtil to generate random geo shapes (#44635)
Switches to more robust way of generating random test geometries by
reusing lucene's GeoTestUtil. Removes duplicate random geometry
generators by moving them to the test framework.

Closes #37278
2019-07-22 08:51:03 -04:00
Igor Motov
6bd185317e
Geo: add validator that only checks altitude (#43893)
By default, we don't check ranges while indexing geo_shapes. As a
result, it is possible to index geoshapes that contain contain
coordinates outside of -90 +90 and -180 +180 ranges. Such geoshapes
will currently break SQL and ML retrieval mechanism. This commit removes
these restriction from the validator is used in SQL and ML retrieval.
2019-07-10 10:20:39 -04:00
Igor Motov
8029b479b8
Geo: Makes coordinate validator in libs/geo plugable (#43657)
Moves coordinate validation from Geometry constructors into
parser.

Relates #43644
2019-06-27 13:34:33 -04:00
Igor Motov
f6a06d8b22
Geo: Add coerce support to libs/geo WKT parser (#43273)
Adds support for coercing not closed polygons and ignoring Z value
to libs/geo WKT parser.

Closes #43173
2019-06-18 07:03:45 -07:00
Mark Vieira
12d583dbf6
Remove unnecessary usage of Gradle dependency substitution rules (#42773) 2019-06-03 16:18:45 -07:00
Igor Motov
28ad74f889
Geo: Refactor libs/geo parsers (#42549)
Refactors the WKT and GeoJSON parsers from an utility class into an
instantiatable objects. This is a preliminary step in
preparation for moving out coordinate validators from Geometry
constructors. This should allow us to make validators plugable.
2019-05-29 20:05:12 -04:00
Igor Motov
6d3fd8401d
Geo: Add GeoJson parser to libs/geo classes (#41575)
Adds GeoJson parser for Geometry classes defined in libs/geo.

Relates #40908 and #29872
2019-04-29 13:40:30 -04:00
Nick Knize
a8870ef98c
Refactor GeoHashUtils (#40869)
This commit refactors GeoHashUtils class into a new Geohash utility class located in the ES geo library. The intent is to not only better control what geo methods are whitelisted for painless scripting but to clean up the geo utility API in general.
2019-04-25 11:59:13 -05:00