- Added SUM() aggregation tests (which also autogenerate the docs)
- Converted non-finite doubles to nulls in aggregator
The complete set of tests depends on
https://github.com/elastic/elasticsearch/issues/110437, as commented in
the code. Once that issue is resolved, the commented-out tests can be
re-enabled and should pass.
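As a rough illustration of the non-finite handling mentioned above, here is a minimal sketch, not the actual aggregator code. The class and method names are hypothetical; it only assumes that a sum which overflows to infinity or becomes NaN should be reported as null.

```java
import java.util.List;

// Hypothetical sketch: names are illustrative, not the real aggregator classes.
final class FiniteSumSketch {
    /** Sums the values and returns null when the total is NaN or infinite. */
    static Double sumOrNull(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        if (Double.isFinite(sum)) {
            return sum;
        }
        // Non-finite result (overflow or NaN input): report null instead.
        return null;
    }

    public static void main(String[] args) {
        System.out.println(sumOrNull(List.of(1.5, 2.5)));
        // Double.MAX_VALUE + Double.MAX_VALUE overflows to Infinity.
        System.out.println(sumOrNull(List.of(Double.MAX_VALUE, Double.MAX_VALUE)));
    }
}
```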
- Added Percentile aggregation tests and autogen docs
- Added a new "appendix" section to FunctionInfo. The existing Percentile docs ended with a long informational section, and we need this to keep it. We already have a "detailedDescription" attribute, but it is rendered right after the description, so using it would make the important parts of the function docs (types, examples...) harder to read. So I'm not reusing it.
`MAX()` currently doesn't work with doubles smaller than
`Double.MIN_VALUE` (Note that `Double.MIN_VALUE` returns the smallest
non-zero positive, not the smallest double).
This PR adds tests for Max and Min, and fixes the bug (Detected by the
tests).
Also, as the tests now generate the docs, replaced the old docs with the
generated ones, and updated the Max&Min examples.
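To illustrate the `Double.MIN_VALUE` pitfall described above, here is a minimal sketch. It assumes the bug came from seeding the running maximum with `Double.MIN_VALUE`; this description does not spell out the exact cause, and the class and method names are hypothetical.

```java
// Hypothetical sketch of the pitfall; not the actual MAX() implementation.
final class MaxSeedSketch {
    /** Seeds with the smallest POSITIVE double, so negative inputs never win. */
    static double maxSeededWithMinValue(double[] values) {
        double max = Double.MIN_VALUE;
        for (double v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    /** Seeds with negative infinity, which every finite double exceeds. */
    static double maxSeededWithNegativeInfinity(double[] values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    public static void main(String[] args) {
        double[] negatives = {-1.0, -2.0};
        // Returns Double.MIN_VALUE (a tiny positive number), not -1.0.
        System.out.println(maxSeededWithMinValue(negatives));
        // Returns -1.0 as expected.
        System.out.println(maxSeededWithNegativeInfinity(negatives));
    }
}
```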
Some work around aggregation tests, with AVG as an example:
- Added tests and autogenerated docs for AVG
- As AVG uses "complex" surrogates (a combination of functions), we can't trivially execute them without a complete plan. As I'm not sure it's worth it for most aggregations, I'm skipping those cases for now, so as to avoid blocking other aggs tests.
The bad side effect of skipping those cases is that most tests in AvgTests are actually ignored (74 of 100)
- Added a new `AbstractAggregationTestCase` base class for tests. It shares most of the code of the function tests, adapted for aggregations, and covers both testing and docs generation.
- Reused the `AbstractFunctionTestCase` class to also let us test evaluators if the aggregation is foldable
- Added a `TopListTests` example
- This includes the docs for Top_list _(Also added a missing include of Ip_prefix docs)_
- Adapted Kibana docs to use `type: "agg"` (@drewdaemon)
The current tests are very basic: Consume a page, generate an output,
all in Single aggregation mode (No intermediates, no grouping). More
complex testing will be added in future PRs
Initial PR of https://github.com/elastic/elasticsearch/issues/109917
This adds the documentation for BUCKET as a grouping function and the
addition of the "direct" invocation mode providing a span (in addition
to the auto mode).
* WIP Started adding ST_CONTAINS
* Add generated evaluators
* Reduced warnings and use correct evaluators
* Refactored tests to remove duplicate code, and fixed Contains/multi-components
* Gradle build disallows using getDeclaredField
* Fixed cases where rectangles cross the dateline
* Fixed meta function tests
* Added ST_WITHIN to support inverting ST_CONTAINS
If the ST_CONTAINS is called with the constant on the left, we either have to create a lot more Evaluators to cover that case, or we have to invert it to ST_WITHIN. This inversion was a much easier option.
* Simplify inversion logic
* Add comment on choice of surrogate approach
* Add unit tests and missing fold() function
* Simple code cleanup
* Add integration tests for literals
* Add more integration tests based on actual data
* Generated documentation files
* Add documentation
* Fixed failing function count test
* Add tests that push-to-source works for ST_CONTAINS and ST_WITHIN
* Test more combinations of WITHIN/CONTAINS and literal on right and left
This also verifies that the re-writing of CONTAINS to WITHIN or vice versa occurs when the literal is on the left.
* test that physical planning also handles doc-values from STATS
* Added more tests for WITHIN/CONTAINS together with CENTROID
This should test the doc-values for points.
* Add cartesian_point tests
* Add cartesian_shape tests
* Disable Lucene-push-down for CARTESIAN data
This is a limitation in Lucene, which we could address as a performance optimization in a future PR, but since it probably requires Lucene changes, it cannot be done in this work.
* Fix doc links
* Added test data and tests for cartesian multi-polygons
Testing INTERSECTS, CONTAINS and WITHIN with multi-polygon fields
* Use required features for spatial points, shapes and centroid
* 8.13.0 is not yet a historical version
This needs to be reverted as soon as 8.13.0 is released
* Added st_intersects and st_contains_within 'features'
* Code review updates
* Re-enable lucene push-down
* Added more required_features
* Fix point contains non-point
* Fix point contains point
* Re-enable lucene push-down in tests too
Forgot to change the physical planner unit tests after re-enabling lucene push-down
* Generate automatic docs
* Use generated examples docs
* Generated examples use '-result' suffix (singular)
* Mark spatial functions as preview/experimental
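The inversion of ST_CONTAINS into ST_WITHIN described in the list above can be sketched as follows. This is a simplified illustration under stated assumptions, not the real evaluator or surrogate code: `Rect`, `contains` and `within` are hypothetical names, the geometry is limited to axis-aligned rectangles, and dateline crossing is ignored.

```java
// Hypothetical sketch: shows why swapping arguments lets one relation serve both.
final class ContainsWithinSketch {
    /** Illustrative axis-aligned rectangle; not an Elasticsearch type. */
    static final class Rect {
        final double minX, minY, maxX, maxY;

        Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }
    }

    /** True when {@code outer} fully encloses {@code inner}. */
    static boolean contains(Rect outer, Rect inner) {
        return outer.minX <= inner.minX && outer.minY <= inner.minY
            && outer.maxX >= inner.maxX && outer.maxY >= inner.maxY;
    }

    /** within(a, b) is contains(b, a): the same check with swapped arguments. */
    static boolean within(Rect a, Rect b) {
        return contains(b, a);
    }

    public static void main(String[] args) {
        Rect literal = new Rect(0, 0, 10, 10);
        Rect field = new Rect(2, 2, 4, 4);
        // contains(literal, field) can be rewritten as within(field, literal),
        // which keeps the field on the left and the constant on the right.
        System.out.println(contains(literal, field) == within(field, literal));
    }
}
```

With the relation expressed this way, a call with the constant on the left needs no extra evaluators: it is rewritten into the opposite relation with the constant on the right.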
This creates the `VALUES` aggregation function which buffers all field
values it receives and emits them as a multivalued field. It can use a
significant amount of memory and will circuit break if it uses too much
memory, but it's really useful for putting together self-join-like
behavior. It sort of functions as a stop-gap measure until we have more
self-join style things.
In the future we'll have spill-to-disk for aggregations and, likely,
some kind of self-join command for aggregations at least so this will be
able to grow beyond memory. But for now, memory it is.
Example:
```
FROM employees
| EVAL first_letter = SUBSTRING(first_name, 0, 1)
| STATS first_name=VALUES(first_name) BY first_letter
| SORT first_letter
;
first_name:keyword | first_letter:keyword
[Anneke, Alejandro, Anoosh, Amabile, Arumugam] | A
[Bezalel, Berni, Bojan, Basil, Brendon, Berhard, Breannda] | B
[Chirstian, Cristinel, Claudi, Charlene] | C
[Duangkaew, Divier, Domenick, Danel] | D
```
I made this work for everything but `geo_point` and `cartesian_point`
because I'm not 100% sure how to integrate with those. We can grab those
in a follow up.
Closes #103600
* Add initial structure for ST_CENTROID
* Revert "Revert stab at implementing forStats for doc-values vs source"
This reverts commit cfc4341bf4.
* Refined csv-spec tests with st_centroid
* Spotless disagrees with intellij
* Fixes after reverting fieldmapper code to test GeoPointFieldMapper
* Get GeoPointFieldMapperTests working again after enabling doc-values reading
* Simplify after rebase on main
In particular, field-mappers that do not need to know about fields can have simpler calls.
* Support local physical planning of forStats attributes for spatial aggregations
* Get st_centroid aggregation working on doc-values
We changed it to produce BytesRef, so we don't (yet) need any doc-values types.
* Create both DocValues and SourceValues versions of st_centroid
* Support physical planning of DocValues and SourceValues SpatialCentroid
* Improve test for physical planning of DocValues in SpatialCentroid
* Fixed show functions for st_centroid
* More st_centroid tests with mv_expand
To test single and multi-value centroids
* Fix st_centroid from point literals
The blocks contained BytesRef byte[] with multiple values, and we were ignoring the offsets when decoding, so we decoded the first value over and over instead of decoding the subsequent values.
* Teach CsvTests to handle spatial types alternative loading from doc-values
Spatial GEO_POINT and CARTESIAN_POINT load from doc-values in some cases. If the physical planner has planned for this, we need the CsvTests to also take that into account, changing the type of the point field from BytesRefBlock to LongBlock.
* Fixed failing NodeSubclassTests
Required making the new constructor public and enabling Set as a valid parameter in the test framework.
* More complex st_centroid tests and fixed bug with multiple aggs
When there were multiple aggregations in the same STATS, we were inadvertently re-ordering them, causing the wrong Blocks to be fed to the wrong aggregator in the coordinator node.
* Update docs/changelog/104218.yaml
* Fix automatically generated changelog file
* Fixed failing test
The nodes can now sometimes be a Set, which is also a Collection, but not a List, and therefore can never be a subset of the children.
* More tests covering more combinations including MV_EXPAND and grouping
* Added cartesian st_centroid with grouping test
We could not add MV_EXPAND tests since the cartesian data does not have multi-value columns, but the geo_point tests are sufficient for this since they share the same code.
* Reduce flaky tests by sorting results
* Reduce flaky tests by sorting results
* Added tests for stats on stats to ensure planner coped
* Add unit tests to ensure doc-values in query planning complex cases
* Some minor updates from code review
* Fixes after rebase on main
* Get correct error message on unsupported geo_shape for st_centroid
* Refined point vs shape differences after merging main
* Added basic docs
* Delete docs/changelog/104218.yaml
* Revert "Delete docs/changelog/104218.yaml"
This reverts commit 4bc596a442.
* Fixed broken docs tag link
* Simplify BlockReaderSupport in MapperTestCase from code review
* Moved spatial aggregations into a sub-package
* Added some more code review updates, including nested tests
* Get nested functions working, if only from source values for now
* Code review update
* Code review update
* Added second location column to airports for wider testing
* Use second location in tests, including nulls
Includes a test fix for loading and converting nulls to encoded longs.
* Fixed bug supporting multi spatial aggregations in the local node
The local physical planner only marked a single field for stats loading, but marked all spatial aggregations for stats loading, which led to only one aggregation getting the right data, while the rest would get the wrong data.
* Renamed forStats to fieldExtractPreference for clarity
Now the planner decides whether to load data from doc-values. To remove the confusion of preferDocValues==false in the non-spatial cases, we use an ENUM with the default value of NONE, to make it clear we're leaving the choice up to the field type in all non-spatial cases.
* EsqlSpecIT was failing on very high precision centroids on different computers
This was not reproducible on the development machine, but CI machines were sufficiently different to lead to very tiny precision changes over very large Kahan summations. We fixed this by reducing the need for precision checks in clustered integration tests.
* Delete docs/changelog/104218.yaml
* Revert "Delete docs/changelog/104218.yaml"
This reverts commit 12c6980881.
* Fixed changelog entry
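For context on the Kahan summations mentioned in the EsqlSpecIT precision note above, here is a minimal sketch of compensated summation next to a naive sum. It is a generic textbook version with hypothetical names, not the centroid aggregator's code, and it does not reproduce the CI-specific differences.

```java
// Hypothetical sketch of Kahan (compensated) summation; not the aggregator code.
final class KahanSumSketch {
    /** Plain left-to-right summation; small addends can be lost entirely. */
    static double naiveSum(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    /** Kahan summation: carries the rounding error of each step forward. */
    static double kahanSum(double[] values) {
        double sum = 0.0;
        double compensation = 0.0; // running estimate of lost low-order bits
        for (double v : values) {
            double corrected = v - compensation;
            double next = sum + corrected;
            // (next - sum) is what was actually added; the rest was rounded away.
            compensation = (next - sum) - corrected;
            sum = next;
        }
        return sum;
    }

    public static void main(String[] args) {
        // 1.0 followed by ten addends that are each too small to change 1.0 alone.
        double[] values = new double[11];
        values[0] = 1.0;
        for (int i = 1; i < values.length; i++) {
            values[i] = 1e-16;
        }
        System.out.println(naiveSum(values));
        System.out.println(kahanSum(values));
    }
}
```

Because the result depends on the exact sequence of floating-point operations, results can differ in the last digits between runs that sum in a different order, which is consistent with relaxing exact-precision checks in the integration tests.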
* Break out 'Limitations' into separate page
* Add REST API docs
* Restructure commands, functions, and operators refs
* Add placeholder for getting started guide
* Group 'Syntax', 'Metafields', and 'MV fields' under 'Language'
* Add placeholder for Kibana page
* Add link from landing page
* Apply uniform formatting to ACOS, CASE, and DATE_PARSE function refs
* Reword default LIMIT
* Add support for COUNT(*)
* Move 'Commands' and 'Functions and operators' to individual pages
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>