This modifies the ESQL test infrastructure to generate more of the
documentation for functions. It generates the *Description* section, the
*Examples* section, and the *Parameters* section as separate files so we
can use them as needed. It also generates a `layout` file that's just
a guess as to how to render the whole thing. In some cases it'll work
and we can use that instead of hand maintaining a "top level"
description file for the function.
Most newly generated files are unused. We have to choose to pick them up
by replacing the sections we were manually maintaining with an include
of the generated section. Or by replacing the entire hand maintained
file with the generated top level file.
Relates to #104247
* Support ST_INTERSECTS between geometry column and other geometry or string
* Pushdown to lucene for ST_INTERSECTS on GEO_POINT
* Get geo_shape working in ST_INTERSECTS bypassing SingleValueQuery
* Initial work to support cartesian shape queries in ESQL
* Fixed CSV tests for combined ST_INTERSECTS and ST_CENTROID
* Fixed bug in point-in-shape query for CARTESIAN_POINT
* Added unit tests for SpatialIntersects and fixed a few bugs found
* Added comments to public ShapeQueryBuilder class
* Move calls to random() later to avoid security exception
* Refined type checking support in ST_INTERSECTS
Improved the combinations supported as preparation for removing the ugly try/catch way of detecting the difference between WKT and WKB in some code.
* Fixed bugs in incorrect use of doc-values in parameter type matching
Also made a few refinements, including removing one try/catch approach to differentiating between WKT and WKB.
* Removed second place where we used try/catch to differentiate WKT from WKB
This was a workaround for a mistake in the planning, where we incorrectly mapped incoming types to the wrong FieldEvaluators. We fixed that mistake in an earlier commit.
* Fixed flaky tests where GEO was treated as CARTESIAN
We wrongly assumed that incoming constants had no CRS, even when they did. For shapes crossing the dateline this led to different (incorrect) behaviour.
* Fixed a flaky test by removing some point==point optimizations
* Moved spatial intersects to 'spatial' package
When we developed the ST_CENTROID work, this was requested, so let's do it here too.
* Use normal switch on enums
* Cleanup some static utility methods
Now all code paths that can convert a constant string to a geometry use the same code.
* Fixed bugs with non-quantized coordinates, and cleaned up code a little
* Fixed failing test after change to evaluator class names
* Refactored SpatialRelatesFunction into three files, and made evaluatorRules static
This was a general cleanup that makes the code more organized, and it also makes the evaluator rules static so we don't re-create them every time a query is parsed.
* Fixed compile error after rebase
* Removed ConstantAndConstant support, using fold() correctly instead
* better error on circles
* Make sure compound predicates are supported in use-doc-values pushdown
* Testing ENRICH with ST_INTERSECTS
This required adding new data for an ENRICH index, and this data could be tested with a few other related tests, which were also added.
* Added missing mixed-cluster rules for testing only with 8.14
* Fixed some mixed-cluster issues where we failed to mark test for only 8.14
Also added an interesting polygon-polygon intersection case from real data.
* Fix flaky test where cartesian polygons were generated from geo
* Remove support for string literals in ST_INTERSECTS
* Fix failing tests after removing string support
* Removed unused code from previous string literal support (WKT parsing)
* Support case where both fields are points and doc-values
If we have an ST_INTERSECTS and an ST_CENTROID, the centroid asks to load the points as doc-values, and the ST_INTERSECTS therefore needs to support two doc-values points.
* Disallow more than one field from doc-values for ST_INTERSECTS
* Remove unused evaluator classes
* Add tests for multiple doc-values if not in same intersects
* Fix errors after rebase on main
* Fixed bug in missing support for spatial function expressions in EVAL
When a spatial aggregate expects doc-values, this was not being communicated to spatial functions in EVAL, only in WHERE.
* Reduce flaky tests when reading directly from enrich source indices
The test framework does not expect enrich source indices to be used directly in queries, leading to duplicated results on multi-node clusters, so we edit the queries to be less sensitive to this case.
* Fixed failing test
* Code style
* Fixed test file name and added function name annotation
* Added documentation for st_intersects
* Fixed failing show functions test
* Code review changes, notably simplifying the type resolution
* Fixed broken docs link
* Add two new OGC functions ST_X and ST_Y
Recently Nik did work that involved extracting the X and Y coordinates from geo_point data using `to_string(field)` followed by a DISSECT command to re-parse the string to get the X and Y coordinates.
This is much more efficiently achieved using existing known OGC functions `ST_X` and `ST_Y`.
* Update docs/changelog/105768.yaml
* Fixed invalid changelog yaml
* Fixed mixed cluster tests
* Fixed tests and added docs
* Removed false impression that these functions were different for geo/cartesian
With the use of WKB as the core type in the compute engine, many spatial functions are actually the same between these two types, so we should not give the impression they are different.
* Code review comments and reduced object creation.
* Revert temporary StringUtils hack, and fix bug in x/y extraction from WKB
* Revert object creation reduction
* Fixed mistakes in documentation
* Fix automatic generation of spatial function types files
The automatic mapping of spatial function names from class names was not working for spatial types, so the automatic generation of these files did not happen, and in fact existing files were deleted.
In addition, the generation of types files for aggregation functions does not yet exist at all, so the st_centroid.asciidoc file was always deleted. Until such support exists, this file's contents will be moved back into the function definition file.
The railroad diagrams for syntax are now also created. However, not all functions in the documentation actually use these, and certainly none of the `TO_*` type-casting functions do, so we'll not include links to them from the docs, and leave that to the docs team to decide. Personally, I find that while these diagrams are pretty, they add no information and in fact make the documentation look cluttered.
* Refined to use an annotation which is more generic
* Document nested expressions for stats
* More docs
* Apply suggestions from review
- count-distinct.asciidoc
- Content restructured, moving the section about approximate counts to end of doc.
- count.asciidoc
- Clarified that omitting the `expression` parameter in `COUNT` is equivalent to `COUNT(*)`, which counts the number of rows.
- percentile.asciidoc
- Moved the note about `PERCENTILE` being approximate and non-deterministic to end of doc.
- stats.asciidoc
- Clarified the `STATS` command
- Added a note indicating that individual `null` values are skipped during aggregation
* Comment out mentioning a buggy behavior
* Update sum with inline function example, update test file
* Fix typo
* Delete line
* Simplify wording
* Fix conflict fix typo
---------
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
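The two documentation clarifications above (omitting the `expression` in `COUNT` is equivalent to `COUNT(*)`, and individual `null` values are skipped during aggregation) can be illustrated with a small Python sketch. This is only a model of the documented behaviour, not ES|QL's implementation; the `count` helper, the dict-based rows, and the `salary` column are hypothetical, and multivalued fields are ignored.

```python
# Hypothetical model of the documented COUNT semantics; not ES|QL code.
def count(rows, column=None):
    """Count rows, or non-null values of `column` when one is given."""
    if column is None:
        # No expression: behaves like COUNT(*), i.e. counts the rows.
        return len(rows)
    # With an expression: individual null values are skipped.
    return sum(1 for row in rows if row.get(column) is not None)


rows = [{"salary": 10}, {"salary": None}, {"salary": 5}]
print(count(rows))            # number of rows
print(count(rows, "salary"))  # number of non-null salaries
```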
* Add initial structure for ST_CENTROID
* Revert "Revert stab at implementing forStats for doc-values vs source"
This reverts commit cfc4341bf4.
* Refined csv-spec tests with st_centroid
* Spotless disagrees with intellij
* Fixes after reverting fieldmapper code to test GeoPointFieldMapper
* Get GeoPointFieldMapperTests working again after enabling doc-values reading
* Simplify after rebase on main
In particular, field-mappers that do not need to know about fields can have simpler calls.
* Support local physical planning of forStats attributes for spatial aggregations
* Get st_centroid aggregation working on doc-values
We changed it to produce BytesRef, so we don't (yet) need any doc-values types.
* Create both DocValues and SourceValues versions of st_centroid
* Support physical planning of DocValues and SourceValues SpatialCentroid
* Improve test for physical planning of DocValues in SpatialCentroid
* Fixed show functions for st_centroid
* More st_centroid tests with mv_expand
To test single and multi-value centroids
* Fix st_centroid from point literals
The blocks contained BytesRef byte[] with multiple values, and we were ignoring the offsets when decoding, so decoding the first value over and over instead of decoding the subsequent values.
* Teach CsvTests to handle spatial types alternative loading from doc-values
Spatial GEO_POINT and CARTESIAN_POINT load from doc-values in some cases. If the physical planner has planned for this, we need the CsvTests to also take that into account, changing the type of the point field from BytesRefBlock to LongBlock.
* Fixed failing NodeSubclassTests
Required making the new constructor public and enabling Set as a valid parameter in the test framework.
* More complex st_centroid tests and fixed bug with multiple aggs
When there were multiple aggregations in the same STATS, we were inadvertently re-ordering them, causing the wrong Blocks to be fed to the wrong aggregator in the coordinator node.
* Update docs/changelog/104218.yaml
* Fix automatically generated changelog file
* Fixed failing test
The nodes can now sometimes be a Set, which is also a Collection, but not a List, and therefore can never be a subset of the children.
* More tests covering more combinations including MV_EXPAND and grouping
* Added cartesian st_centroid with grouping test
We could not add MV_EXPAND tests since the cartesian data does not have multi-value columns, but the geo_point tests are sufficient for this since they share the same code.
* Reduce flaky tests by sorting results
* Reduce flaky tests by sorting results
* Added tests for stats on stats to ensure planner coped
* Add unit tests to ensure doc-values in query planning complex cases
* Some minor updates from code review
* Fixes after rebase on main
* Get correct error message on unsupported geo_shape for st_centroid
* Refined point vs shape differences after merging main
* Added basic docs
* Delete docs/changelog/104218.yaml
* Revert "Delete docs/changelog/104218.yaml"
This reverts commit 4bc596a442.
* Fixed broken docs tag link
* Simplify BlockReaderSupport in MapperTestCase from code review
* Moved spatial aggregations into a sub-package
* Added some more code review updates, including nested tests
* Get nested functions working, if only from source values for now
* Code review update
* Code review update
* Added second location column to airports for wider testing
* Use second location in tests, including nulls
Includes a test fix for loading and converting nulls to encoded longs.
* Fixed bug supporting multi spatial aggregations in the local node
The local physical planner only marked a single field for stats loading, but marked all spatial aggregations for stats loading, which led to only one aggregation getting the right data, while the rest would get the wrong data.
* Renamed forStats to fieldExtractPreference for clarity
Now the planner decides whether to load data from doc-values. To remove the confusion of preferDocValues==false in the non-spatial cases, we use an ENUM with the default value of NONE, to make it clear we're leaving the choice up to the field type in all non-spatial cases.
* EsqlSpecIT was failing on very high precision centroids on different computers
This was not reproducible on the development machine, but CI machines were sufficiently different to lead to very tiny precision changes over very large Kahan summations. We fixed this by reducing the need for precision checks in clustered integration tests.
* Delete docs/changelog/104218.yaml
* Revert "Delete docs/changelog/104218.yaml"
This reverts commit 12c6980881.
* Fixed changelog entry
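For readers unfamiliar with the Kahan summations mentioned above in connection with the high-precision centroid checks, here is a minimal Python sketch of compensated summation. It is a generic textbook version, not the code used by the compute engine; the function name `kahan_sum` is an assumption made for illustration.

```python
import math


def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # running estimate of the lost low-order bits
    for value in values:
        corrected = value - compensation
        new_total = total + corrected
        # (new_total - total) is what was actually added; the difference
        # from `corrected` is the rounding error of this step.
        compensation = (new_total - total) - corrected
        total = new_total
    return total


values = [0.1] * 1000
print(sum(values), kahan_sum(values), math.fsum(values))
```

Even with compensation, the final bits of a very long sum can depend on details of how the work is split and ordered, which is consistent with the commit's decision to relax precision checks in clustered integration tests.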
This adds unit test cases for all the functions that were missing tests
checking on the correct generation of the Warning headers in case the
execution raised an Exception that led to a `null` result.
This creates the `MV_FIRST` and `MV_LAST` functions that return the
first and last values from a multivalue field. They are no-ops on a
single-valued field. They are quite similar to `MV_MIN` and `MV_MAX`
except they work on positional data rather than relative size. That
sounds like a large distinction, but in practice our multivalued fields
are often sorted. And when they operate on sorted arrays `MV_MIN` does
*the same* thing as `MV_FIRST`.
But there are some cases where it really does matter - say you are
`SPLIT`ing something - so `MV_FIRST(SPLIT("foo;bar;baz", ";"))` gets you
`foo` like you'd expect. No sorting needed.
Relates to #103879
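The positional-versus-value distinction described above can be sketched in Python. This is a model only: the helper names mirror the ES|QL functions, but the implementations are assumptions for illustration, with a Python list standing in for a multivalued field and a bare value standing in for a single-valued one.

```python
# Hypothetical Python stand-ins for the ES|QL functions; not the real code.
def mv_first(value):
    """Return the first value by position; a no-op on a single value."""
    return value[0] if isinstance(value, list) else value


def mv_last(value):
    """Return the last value by position; a no-op on a single value."""
    return value[-1] if isinstance(value, list) else value


def mv_min(value):
    """Return the smallest value by comparison, regardless of position."""
    return min(value) if isinstance(value, list) else value


parts = "foo;bar;baz".split(";")
print(mv_first(parts))  # positional: first element of the split
print(mv_min(parts))    # by value: smallest string
```

On an already sorted list such as `[1, 2, 3]` the two approaches agree, which matches the observation that `MV_MIN` and `MV_FIRST` often return the same thing in practice.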
This updates the use of the exceptions subclassed from
`QlServerException` when the failure reason is user-caused. This ensures
that a 400-class response is returned, instead of a 500-class one.
This optimizes loading fields across many, many indices by resolving the
field loading infrastructure when it's first needed rather than up
front. This speeds things up because, if you are loading from many many
shards, you often don't need to set up the field loading infrastructure
for all shards at all - often you'll just need to set it up for a couple
of the shards.
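The idea of resolving the field loading infrastructure on first use can be sketched as a lazily populated cache. This is a minimal illustration in Python, not the actual Elasticsearch code; `LazyFieldLoaders`, `loader_for`, and the `factory` callback are hypothetical names.

```python
# Hypothetical sketch of lazy, per-shard setup; names are illustrative only.
class LazyFieldLoaders:
    def __init__(self, shard_ids, factory):
        self._shard_ids = set(shard_ids)
        self._factory = factory   # the expensive per-shard setup
        self._loaders = {}        # only shards actually used end up here

    def loader_for(self, shard_id):
        if shard_id not in self._shard_ids:
            raise KeyError(shard_id)
        if shard_id not in self._loaders:
            # First use of this shard: pay the setup cost now.
            self._loaders[shard_id] = self._factory(shard_id)
        return self._loaders[shard_id]

    def initialized_count(self):
        return len(self._loaders)


loaders = LazyFieldLoaders(range(1000), lambda shard: f"loader-{shard}")
loaders.loader_for(3)
loaders.loader_for(7)
print(loaders.initialized_count())  # shards set up so far, out of 1000
```

With eager setup the factory would run once per shard up front; here it runs only for the shards a query actually touches.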
This fixes `null`'s handling in date math. So far the `null` (of type
`NULL`) has been rejected by the type resolution. This is now allowed
through, leading to a `null` result, in line with the other types.
Fixes #103085.
Improve the docs for is_nan, is_finite, is_infinite functions.
This also adjusts the CamelCase to snake_case conversion, to not
consider the last capital letter (like in `IsNaN`).
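One way to picture the adjusted conversion is the Python sketch below. It is an assumption about the rule's shape, not the actual test-infrastructure code: an underscore is inserted before every capital letter except the first character and the final one, so `IsNaN` does not turn into `is_na_n`. The function name `camel_to_snake` is hypothetical.

```python
# Hypothetical sketch of the conversion rule; not the real implementation.
def camel_to_snake(name):
    out = []
    last_index = len(name) - 1
    for i, ch in enumerate(name):
        # Break before a capital, but never at the start and never for
        # a capital that is the very last character of the name.
        if ch.isupper() and 0 < i < last_index:
            out.append("_")
        out.append(ch.lower())
    return "".join(out)


for class_name in ("IsNaN", "IsFinite", "IsInfinite"):
    print(class_name, "->", camel_to_snake(class_name))
```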
* Start working on geo_point and point docs for ESQL
* Added to_cartesianpoint and includes
* Sub-headings for easier reading
* Improve sub-headings
* Hide to_long and support for longs in to_geopoint and to_cartesianpoint
This adds a tiny blurb for each operator to the docs with a railroad
diagram of the operator's syntax and a table of the input and output
types. This also fixes the tests to correctly generate the tables for
operators.
This adds the missing unit tests for the conversion functions.
It also extends the type support by adding the `TEXT` type to those functions that support `KEYWORD` already (which also simplifies the testing, actually). Some functions did have it, some didn't; they now all do.
The change also fixes two defects resulting from better testing coverage: `ToInteger` and `ToUnsignedLong` had some missing necessary exceptions declarations in the decorators for the evaluators.
It also updates `ToInteger`'s `fromDouble()` conversion to use a newly added utility, so that the failed conversions contain the right message (`out of [integer] range`, instead of the confusing `out of [long] range`).
Related: #102488, #102552.
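The point of the `fromDouble()` change is that a failed conversion should name the target type in its message. A rough Python sketch of a range-checked conversion follows; the rounding step, the exception type, and the full message format are assumptions, and only the `out of [integer] range` fragment comes from the description above.

```python
# Hypothetical sketch; rounding and exact message format are assumptions.
INT_MIN, INT_MAX = -(2**31), 2**31 - 1


def double_to_integer(value):
    """Convert a finite float to a 32-bit integer, or fail with a clear message."""
    rounded = round(value)  # assumes a finite input (no NaN or infinity)
    if rounded < INT_MIN or rounded > INT_MAX:
        # Name the *target* type, rather than an intermediate long.
        raise ValueError(f"[{value}] out of [integer] range")
    return rounded


print(double_to_integer(41.7))
try:
    double_to_integer(2147483648.0)
except ValueError as error:
    print(error)
```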
This corrects an earlier mistake in the ES|QL language design. Initially we had thought to have pow return the same type as its inputs, but in practice even for integer inputs this quickly grows out of the representable range, and we returned null much of the time. This also created a lot of edge cases around casting to/from doubles (which the underlying java function uses). The version in this PR follows the java spec, by always casting its inputs to doubles, and returning a double. Doing it this way also allows for a rather significant reduction in lines of code.
I removed many of the tests covering pow-specific edge cases. This seems reasonable to me as I expect `java.lang.Math.pow` to be well behaved and most of those edge cases were around type testing which no longer applies. At the same time, this simplification allows us to leverage the new scalar function testing framework, which means better null coverage, better type coverage, and much easier extensibility.
We do consider this a breaking change, but as the feature is still in tech preview and this is a relatively small surface area, we are not too concerned with disruptions.
Resolves #99055
Relates to #100558
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
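The difference between the two `pow` designs described above can be sketched in Python. Both functions are hypothetical models: `pow_same_type` imitates the earlier idea of keeping the input type and yielding null (`None` here) when the result leaves the 32-bit integer range, and `pow_as_double` imitates the new behaviour of casting both inputs to doubles. Neither is the ES|QL implementation.

```python
import math

INT_MIN, INT_MAX = -(2**31), 2**31 - 1


def pow_same_type(base, exponent):
    """Earlier design (modelled): integer in, integer out, None if out of range."""
    if exponent < 0:
        raise ValueError("this sketch only models non-negative exponents")
    result = base ** exponent
    return result if INT_MIN <= result <= INT_MAX else None


def pow_as_double(base, exponent):
    """New design (modelled): cast both inputs to double, return a double."""
    # Note: Python's math.pow raises OverflowError on overflow, which may
    # differ from how the underlying Java function reports it.
    return math.pow(float(base), float(exponent))


print(pow_same_type(2, 31))  # out of the integer range in the old model
print(pow_as_double(2, 31))  # representable as a double
```

Even a modest input such as `2^31` falls outside the integer range, which illustrates why the earlier design returned null so often.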
This adds more tests for some of the `MV_` functions and updates their
docs now that the railroad diagram and table generated by the tests
covers all of the types.
* Break out 'Limitations' into separate page
* Add REST API docs
* Restructure commands, functions, and operators refs
* Add placeholder for getting started guide
* Group 'Syntax', 'Metafields', and 'MV fields' under 'Language'
* Add placeholder for Kibana page
* Add link from landing page
* Apply uniform formatting to ACOS, CASE, and DATE_PARSE function refs
* Reword default LIMIT
* Add support for COUNT(*)
* Move 'Commands' and 'Functions and operators' to individual pages
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>