10 commits

Author SHA1 Message Date
Fang Xing
8abc8857f2
[ES|QL] weighted_avg (#109993)
* weighted_avg
2024-07-02 18:29:02 -04:00
Iván Cea Fontenla
c89ee3b648
ESQL: Renamed TopList to Top (#110347)
Rename TopList aggregation to Top, after internal discussions
2024-07-02 03:52:24 +10:00
Iván Cea Fontenla
fc0313f429
ESQL: Add aggregations testing base and docs (#110042)
- Added a new `AbstractAggregationTestCase` base class for tests. It shares most of its code with the function tests, adapted for aggregations, and covers both testing and docs generation.
  - Reused the `AbstractFunctionTestCase` class to also let us test evaluators if the aggregation is foldable
- Added a `TopListTests` example
  - This includes the docs for Top_list _(Also added a missing include of Ip_prefix docs)_
- Adapted Kibana docs to use `type: "agg"` (@drewdaemon)

The current tests are very basic: Consume a page, generate an output,
all in Single aggregation mode (No intermediates, no grouping). More
complex testing will be added in future PRs

Initial PR of https://github.com/elastic/elasticsearch/issues/109917
2024-06-27 21:21:55 +10:00
Bogdan Pintea
a21242054b
ESQL: Document BUCKET as a grouping function (#107864)
This adds the documentation for BUCKET as a grouping function and the
addition of the "direct" invocation mode providing a span (in addition
to the auto mode).
2024-04-25 12:38:12 -04:00
Craig Taverner
d915b964ba
Rename ST_CENTROID to ST_CENTROID_AGG (#107226)
* Rename ST_CENTROID to ST_CENTROID_AGG

In order to allow development of a scalar ST_CENTROID function.

* Fix table alignment
2024-04-10 17:56:45 +02:00
Craig Taverner
2380492fac
ESQL: Support ST_CONTAINS and ST_WITHIN (#106503)
* WIP Started adding ST_CONTAINS

* Add generated evaluators

* Reduced warnings and use correct evaluators

* Refactored tests to remove duplicate code, and fixed Contains/multi-components

* Gradle build disallows using getDeclaredField

* Fixed cases where rectangles cross the dateline

* Fixed meta function tests

* Added ST_WITHIN to support inverting ST_CONTAINS

If the ST_CONTAINS is called with the constant on the left, we either have to create a lot more Evaluators to cover that case, or we have to invert it to ST_WITHIN. This inversion was a much easier option.

* Simplify inversion logic

* Add comment on choice of surrogate approach

* Add unit tests and missing fold() function

* Simple code cleanup

* Add integration tests for literals

* Add more integration tests based on actual data

* Generated documentation files

* Add documentation

* Fixed failing function count test

* Add tests that push-to-source works for ST_CONTAINS and ST_WITHIN

* Test more combinations of WITHIN/CONTAINS and literal on right and left

This also verifies that the re-writing of CONTAINS to WITHIN or vice versa occurs when the literal is on the left.

* test that physical planning also handles doc-values from STATS

* Added more tests for WITHIN/CONTAINS together with CENTROID

This should test the doc-values for points.

* Add cartesian_point tests

* Add cartesian_shape tests

* Disable Lucene-push-down for CARTESIAN data

This is a limitation in Lucene, which we could address as a performance optimization in a future PR, but since it probably requires Lucene changes, it cannot be done in this work.

* Fix doc links

* Added test data and tests for cartesian multi-polygons

Testing INTERSECTS, CONTAINS and WITHIN with multi-polygon fields

* Use required features for spatial points, shapes and centroid

* 8.13.0 is not yet historical version

This needs to be reverted as soon as 8.13.0 is released

* Added st_intersects and st_contains_within 'features'

* Code review updates

* Re-enable lucene push-down

* Added more required_features

* Fix point contains non-point

* Fix point contains point

* Re-enable lucene push-down in tests too

Forgot to change the physical planner unit tests after re-enabling lucene push-down

* Generate automatic docs

* Use generated examples docs

* Generated examples use '-result' prefix (singular)

* Mark spatial functions as preview/experimental
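
The inversion described above (evaluating ST_CONTAINS with the constant on the left by rewriting it to ST_WITHIN) can be illustrated with a small sketch. This is not the Elasticsearch implementation: `Rect`, `contains`, `within`, and `contains_with_constant_left` are hypothetical names, and axis-aligned rectangles with inclusive edges stand in for real geometries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; a simplified stand-in for a real geometry."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def contains(outer: Rect, inner: Rect) -> bool:
    """True when `inner` lies entirely inside `outer` (edges included)."""
    return (
        outer.min_x <= inner.min_x
        and outer.min_y <= inner.min_y
        and outer.max_x >= inner.max_x
        and outer.max_y >= inner.max_y
    )


def within(inner: Rect, outer: Rect) -> bool:
    """WITHIN is CONTAINS with the operands swapped."""
    return contains(outer, inner)


def contains_with_constant_left(constant: Rect, field_value: Rect) -> bool:
    """Evaluate CONTAINS(constant, field) through the inverse predicate,
    so that the constant ends up on the right-hand side."""
    return within(field_value, constant)
```

Because the two predicates are mirror images, only one operand order needs dedicated evaluation logic; the other order is handled by swapping the arguments and switching the predicate.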
2024-04-02 10:31:00 +02:00
Nik Everett
fa00e6176f
ESQL: Values aggregation function (#106065)
This creates the `VALUES` aggregation function which buffers all field
values it receives and emits them as a multivalued field. It can use a
significant amount of memory and will circuit break if it uses too much
memory, but it's really useful for putting together self-join-like
behavior. It sort of functions as a stop-gap measure until we have more
self-join style things.

In the future we'll have spill-to-disk for aggregations and, likely,
some kind of self-join command for aggregations at least so this will be
able to grow beyond memory. But for now, memory it is.

Example:

```
  FROM employees
| EVAL first_letter = SUBSTRING(first_name, 0, 1)
| STATS first_name=VALUES(first_name) BY first_letter
| SORT first_letter
;

                                        first_name:keyword | first_letter:keyword
            [Anneke, Alejandro, Anoosh, Amabile, Arumugam] | A
[Bezalel, Berni, Bojan, Basil, Brendon, Berhard, Breannda] | B
                  [Chirstian, Cristinel, Claudi, Charlene] | C
                      [Duangkaew, Divier, Domenick, Danel] | D
```

I made this work for everything but `geo_point` and `cartesian_point`
because I'm not 100% sure how to integrate with those. We can grab those
in a follow up.
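
The buffering behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `values_by_group`, `MemoryBudgetExceeded`, and `max_buffered` are hypothetical names, the cap on buffered values stands in for the circuit breaker, and the sketch keeps every value it receives because the commit message does not say whether duplicates are removed.

```python
from collections import defaultdict


class MemoryBudgetExceeded(RuntimeError):
    """Hypothetical stand-in for a circuit-breaker trip."""


def values_by_group(rows, key, field, max_buffered=1000):
    """Buffer every `field` value per `key` and return them as lists.

    `max_buffered` is an illustrative cap on the total number of buffered
    values; it is not a real Elasticsearch setting.
    """
    groups = defaultdict(list)
    buffered = 0
    for row in rows:
        buffered += 1
        if buffered > max_buffered:
            raise MemoryBudgetExceeded(f"more than {max_buffered} values buffered")
        groups[row[key]].append(row[field])
    return dict(groups)


employees = [
    {"first_name": "Anneke"},
    {"first_name": "Bezalel"},
    {"first_name": "Alejandro"},
]
# Mirror the EVAL step: derive the grouping key from the first letter.
rows = [{**e, "first_letter": e["first_name"][0]} for e in employees]
result = values_by_group(rows, key="first_letter", field="first_name")
```

Everything is held in memory until the input is exhausted, which is why a budget check of some kind is needed.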

Closes #103600
2024-03-21 12:52:04 -04:00
Craig Taverner
eb1c490264
ESQL: Support reading points from doc-values for STATS (#104218)
* Add initial structure for ST_CENTROID

* Revert "Revert stab at implementing forStats for doc-values vs source"

This reverts commit cfc4341bf4.

* Refined csv-spec tests with st_centroid

* Spotless disagrees with intellij

* Fixes after reverting fieldmapper code to test GeoPointFieldMapper

* Get GeoPointFieldMapperTests working again after enabling doc-values reading

* Simplify after rebase on main

In particular, field-mappers that do not need to know about fields can have simpler calls.

* Support local physical planning of forStats attributes for spatial aggregations

* Get st_centroid aggregation working on doc-values

We changed it to produce BytesRef, so we don't (yet) need any doc-values types.

* Create both DocValues and SourceValues versions of st_centroid

* Support physical planning of DocValues and SourceValues SpatialCentroid

* Improve test for physical planning of DocValues in SpatialCentroid

* Fixed show functions for st_centroid

* More st_centroid tests with mv_expand

To test single and multi-value centroids

* Fix st_centroid from point literals

The blocks contained BytesRef byte[] with multiple values, and we were ignoring the offsets when decoding, so decoding the first value over and over instead of decoding the subsequent values.

* Teach CsvTests to handle spatial types alternative loading from doc-values

Spatial GEO_POINT and CARTESIAN_POINT load from doc-values in some cases. If the physical planner has planned for this, we need the CsvTests to also take that into account, changing the type of the point field from BytesRefBlock to LongBlock.

* Fixed failing NodeSubclassTests

Required making the new constructor public and enabling Set as a valid parameter in the test framework.

* More complex st_centroid tests and fixed bug with multiple aggs

When there were multiple aggregations in the same STATS, we were inadvertently re-ordering them, causing the wrong Blocks to be fed to the wrong aggregator in the coordinator node.

* Update docs/changelog/104218.yaml

* Fix automatically generated changelog file

* Fixed failing test

The nodes can now sometimes be a Set, which is also a Collection but not a List, and therefore can never be a subset of the children.

* More tests covering more combinations including MV_EXPAND and grouping

* Added cartesian st_centroid with grouping test

We could not add MV_EXPAND tests since the cartesian data does not have multi-value columns, but the geo_point tests are sufficient for this since they share the same code.

* Reduce flaky tests by sorting results

* Reduce flaky tests by sorting results

* Added tests for stats on stats to ensure planner coped

* Add unit tests to ensure doc-values in query planning complex cases

* Some minor updates from code review

* Fixes after rebase on main

* Get correct error message on unsupported geo_shape for st_centroid

* Refined point vs shape differences after merging main

* Added basic docs

* Delete docs/changelog/104218.yaml

* Revert "Delete docs/changelog/104218.yaml"

This reverts commit 4bc596a442.

* Fixed broken docs tag link

* Simplify BlockReaderSupport in MapperTestCase from code review

* Moved spatial aggregations into a sub-package

* Added some more code review updates, including nested tests

* Get nested functions working, if only from source values for now

* Code review update

* Code review update

* Added second location column to airports for wider testing

* Use second location in tests, including nulls

Includes a test fix for loading and converting nulls to encoded longs.

* Fixed bug supporting multi spatial aggregations in the local node

The local physical planner only marked a single field for stats loading, but marked all spatial aggregations for stats loading, which led to only one aggregation getting the right data, while the rest would get the wrong data.

* Renamed forStats to fieldExtractPreference for clarity

Now the planner decides whether to load data from doc-values. To remove the confusion of preferDocValues==false in the non-spatial cases, we use an ENUM with the default value of NONE, to make it clear we're leaving the choice up to the field type in all non-spatial cases.

* EsqlSpecIT was failing on very high precision centroids on different computers

This was not reproducible on the development machine, but CI machines were sufficiently different to lead to very tiny precision changes over very large Kahan summations. We fixed this by reducing the need for precision checks in clustered integration tests.

* Delete docs/changelog/104218.yaml

* Revert "Delete docs/changelog/104218.yaml"

This reverts commit 12c6980881.

* Fixed changelog entry
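
The precision note above mentions Kahan summation for centroids. As a minimal sketch of the general technique (not the code from this commit), the following shows compensated summation, a centroid built on it, and a tolerance-based comparison. The names `kahan_sum`, `centroid`, and `close_enough`, and the tolerance value, are assumptions made for illustration; the commit message does not say how the precision checks were relaxed.

```python
def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # running estimate of the lost low-order bits
    for value in values:
        adjusted = value - compensation
        new_total = total + adjusted
        # (new_total - total) is what was actually added; subtracting
        # `adjusted` leaves the rounding error of this step.
        compensation = (new_total - total) - adjusted
        total = new_total
    return total


def centroid(points):
    """Mean x and y of a non-empty list of (x, y) pairs, using Kahan sums."""
    n = len(points)
    return (
        kahan_sum(p[0] for p in points) / n,
        kahan_sum(p[1] for p in points) / n,
    )


def close_enough(a, b, tolerance=1e-9):
    """Compare two centroids with a tolerance instead of exact equality."""
    return abs(a[0] - b[0]) <= tolerance and abs(a[1] - b[1]) <= tolerance
```

Comparing with a tolerance rather than exact equality is one common way to keep such tests stable across machines that round slightly differently.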
2024-01-23 16:04:45 +01:00
AlexB
931dcae41d
Add improvements to the ES|QL docs (#101195)
Content and structural improvements to the ES|QL docs

---------

Co-authored-by: Alexandros Batsakis <abatsakis@splunk.com>
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-10-23 07:45:42 -07:00
Abdon Pijpelink
8ac4ba751e
Restructure ES|QL docs (#100806)
* Break out 'Limitations' into separate page

* Add REST API docs

* Restructure commands, functions, and operators refs

* Add placeholder for getting started guide

* Group 'Syntax', 'Metafields', and 'MV fields' under 'Language'

* Add placeholder for Kibana page

* Add link from landing page

* Apply uniform formatting to ACOS, CASE, and DATE_PARSE function refs

* Reword default LIMIT

* Add support for COUNT(*)

* Move 'Commands' and 'Functions and operators' to individual pages

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2023-10-17 17:36:14 +02:00