Commit graph

44 commits

Author SHA1 Message Date
Carlos Delgado
c3a2b19993
[8.x] ESQL QSTR function (#112590) (#113189) 2024-09-23 10:13:53 +02:00
Fang Xing
e8569356ea
[ES|QL] explicit cast a string literal to date_period and time_duration in arithmetic operations (#109193)
explicit cast to date_period and time_duration in arithmic operation
2024-09-09 14:56:43 -04:00
Chris Berkhout
fbaeb1ee61
[ESQL] Add SPACE function (#112350)
Adds the SPACE(number) function, which is equivalent to REPEAT(" ", number).
2024-09-09 21:41:35 +10:00
Iván Cea Fontenla
fc2760cfd4
ESQL: mv_median_absolute_deviation function (#112055)
- Added mv_median_absolute_deviation function
- Added possibility of having a fixed param in Multivalue "ascending" functions
- Add surrogate to MedianAbsoluteDeviation

### Calculations used to avoid overflows
First, a quick recap of how the MAD is calculated:
1. Sort values, and get the median
2. Calculate the difference between each value with the median (`abs(median - value)`)
3. Sort the differences, and get their median

Calculating a MAD may overflow when calculating the differences (Step 2), given the type is a signed number, as the difference is a positive value, with potentially the same value as `POSITIVE_MAX - NEGATIVE_MIN`.
To solve this, some types are up-casted as follow:
- Int: Stored as longs, simple approach
- Long: Stored as longs, but switched to unsigned long representation when calculating the differences
- Unsigned long: No effect; the resulting range is the same
- Doubles: Nothing. If the values overflow to +/-infinity, they're left that way, as we'll just use those outliers to sort

Closes https://github.com/elastic/elasticsearch/issues/111590
2024-09-09 10:04:25 +02:00
Iván Cea Fontenla
65ce50c60a
ESQL: Added mv_percentile function (#111749)
- Added the `mv_percentile(values, percentile)` function
- Used as a surrogate in the `percentile(column, percentile)` aggregation
- Updated docs to specify that the surrogate _should_ be implemented if possible

The same way as mv_median does, this yields exact results (Ignoring double operations error).
For that, some decisions were made, specially in the long evaluator (Check the comments in context in `MvPercentile.java`)

Closes https://github.com/elastic/elasticsearch/issues/111591
2024-08-20 15:29:19 +02:00
Pablo Machado
f79c62157d
ESQL: Add MV_PSERIES_WEIGHTED_SUM for score calculations used by security solution (#109017)
* Create MV_RIEMANN_ZETA scalar multivalue function



---------

Co-authored-by: Nik Everett <nik9000@gmail.com>
2024-07-31 12:08:28 +02:00
Iván Cea Fontenla
bc69827e1e
ESQL: WEIGHTED_AVG aggregation tests and docs (#111449) 2024-07-31 00:42:23 +10:00
Iván Cea Fontenla
735d80dffd
ESQL: Add COUNT and COUNT_DISTINCT aggregation tests (#111409) 2024-07-30 03:07:15 +10:00
Iván Cea Fontenla
826d49448b
ESQL: Added Median and MedianAbsoluteDeviation aggregations tests and kibana docs (#111231) 2024-07-26 22:11:01 +10:00
Iván Cea Fontenla
595d907f61
ESQL: SpatialCentroid aggregation tests and docs (#111236) 2024-07-26 10:41:18 +02:00
Iván Cea Fontenla
101775b93d
Added Sum aggregation tests and docs (#110984)
- Added SUM() agg tests (Which autogenerates docs)
- Converted non-finite doubles to nulls in aggregator

The complete set of tests depends on
https://github.com/elastic/elasticsearch/issues/110437, as commented in
code. After completion, the test can be uncommented and everything
should work fine
2024-07-22 21:43:58 +10:00
Iván Cea Fontenla
0e68117935
Added Percentile aggregation tests and Kibana docs (#111050)
- Added Percentile aggregation tests and autogen docs
- Added a new "appendix" section to FunctionInfo. Existing Percentile docs had a final, long section with info, and we need this to leep it. We have an "detailedDescription" attribute already, but it's right after the description, and it would make it harder to read the important bits of the function (types, examples...). So I'm not reusing it.
2024-07-19 14:28:11 +02:00
Carlos Delgado
453b82706d
Add the EXP ES|QL function (#110879) 2024-07-16 16:36:01 +02:00
Iván Cea Fontenla
5d3512fb33
ESQL: Fix Max doubles bug with negatives and add tests for Max and Min (#110586)
`MAX()` currently doesn't work with doubles smaller than
`Double.MIN_VALUE` (Note that `Double.MIN_VALUE` returns the smallest
non-zero positive, not the smallest double).

This PR adds tests for Max and Min, and fixes the bug (Detected by the
tests).

Also, as the tests now generate the docs, replaced the old docs with the
generated ones, and updated the Max&Min examples.
2024-07-09 21:05:00 +10:00
Iván Cea Fontenla
38cd0b333e
ESQL: AVG aggregation tests and ignore complex surrogates (#110579)
Some work around aggregation tests, with AVG as an example:
- Added tests and autogenerated docs for AVG
- As AVG uses "complex" surrogates (A combination of functions), we can't trivially execute them without a complete plan. As I'm not sure it's worth it for most aggregations, I'm skipping those cases for now, as to avoid blocking other aggs tests.

The bad side effect of skipping those tests is that most tests in AvgTests are actually ignored (74 of 100)
2024-07-09 12:01:46 +02:00
Iván Cea Fontenla
c89ee3b648
ESQL: Renamed TopList to Top (#110347)
Rename TopList aggregation to Top, after internal discussions
2024-07-02 03:52:24 +10:00
Iván Cea Fontenla
fc0313f429
ESQL: Add aggregations testing base and docs (#110042)
- Added a new `AbstractAggregationTestCase` base class for tests, that shares most of the code of function tests, adapted for aggregations. Including both testing and docs generation.
  - Reused the `AbstractFunctionTestCase` class to also let us test evaluators if the aggregation is foldable
- Added a `TopListTests` example
  - This includes the docs for Top_list _(Also added a missing include of Ip_prefix docs)_
- Adapted Kibana docs to use `type: "agg"` (@drewdaemon)

The current tests are very basic: Consume a page, generate an output,
all in Single aggregation mode (No intermediates, no grouping). More
complex testing will be added in future PRs

Initial PR of https://github.com/elastic/elasticsearch/issues/109917
2024-06-27 21:21:55 +10:00
Craig Taverner
536d614694
ES|QL ST_DISTANCE Function (#108764)
* WIP Started refactoring in preparation for ST_DISTANCE

* Initial evaluators for ST_DISTANCE

* Update docs/changelog/108764.yaml

* Fix invalid changelog generated by CI

* Register function and get unit tests working

* Fixed failing meta function description tests, and refined descriptions

* Added initial CsvTests and calculate Geo differently to Cartesian

* Added more csv-spec tests and changed to arcDistance for accuracy

* Added generated docs files

* Link to generated docs

* Fix examples tag for linking from generated docs

* Skip wrapper function

And note that we might want to include instead some of the related intelligence from Circle2D::HaversineDistance class

* Added ST_DWITHIN and more tests for ST_DISTANCE and ST_DWITHIN

* Code style

* Added more tests, this time for sorting on distance

* Fixes after rebase on main

* The ST_DWITHIN cannot use BinarySpatialFunction because it is ternary

So we moved the common code to a separate SpatialTypeResolver, and made a simpler TernarySpatialFunction based on a simple TernaryScalarFunction. This had additional consequences, simplifying the points-only cases.

The main reason for this change was to support StDWithinTests which need to test a lot of things that involve varying all three input types, generating expected error strings, etc. The original hack of just adding to BinarySpatialFunction worked for the actual integration tests, but clearly did not satisfy all the use cases tested by the unit tests.

We also restricted ST_DWITHIN to take only a double as the third argument, because otherwise the number of evaluators would explode, since we need a separate evaluator for each Block type, and Integer and Double use different block types.

* Fixed function count after rebasing on main

* Update docs/changelog/108764.yaml

* Added generated docs for ST_DWITHIN

* Connect docs for ST_DWITHIN

* Add back issue link

* Remove support for ST_DWITHIN

* Update docs/changelog/108764.yaml

* Bring back link to issue in changelog

* Update x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/spatial/StDistance.java

Co-authored-by: Ignacio Vera <iverase@gmail.com>

* Revert reformatting of function descriptions

We should put this into a separate PR

* Github merged commit with incorrectly formatted whitespace

---------

Co-authored-by: Ignacio Vera <iverase@gmail.com>
2024-06-21 11:59:44 +02:00
Parker Timmins
bb3ff8e924
ESQL: add REPEAT string function (#109220)
Add support for the string manipulation function REPEAT(string, number). This function concatenates the string argument with itself the specified number of times. If number is 0 an empty string is returned. If number is less than 0, null is returned and a warning is logged. If number is less than 0 and is a constant, the query will fail without executing.
2024-06-04 16:32:43 -05:00
Luigi Dell'Aquila
5f6e8f687b
ES|QL: add MV_APPEND function (#107001)
Adding `MV_APPEND(value1, value2)` function, that appends two values
creating a single multi-value. If one or both the inputs are
multi-values, the result is the concatenation of all the values, eg.

```
MV_APPEND([a, b], [c, d]) -> [a, b, c, d]
```

~I think for this specific case it makes sense to consider `null` values
as empty arrays, so that~ ~MV_APPEND(value, null) -> value~ ~It is
pretty uncommon for ESQL (all the other functions, apart from
`COALESCE`, short-circuit to `null` when one of the values is null), so
let's discuss this behavior.~

[EDIT] considering the feedback from Andrei, I changed this logic and
made it consistent with the other functions: now if one of the
parameters is null, the function returns null
2024-06-05 03:42:29 +10:00
Iván Cea Fontenla
f16f71e2a2
ESQL: Add ip_prefix function (#109070)
Added ESQL function to get the prefix of an IP. It works now with both
IPv4 and IPv6. For users planning to use it with mixed IPs, we may need
to add a function like "is_ipv4()" first.

**About the skipped test:** There's currently a "bug" in the
evaluators//functions that return null. Evaluators can't handle them.
We'll work on support for that in another PR. It affects other
functions, like `substring()`. In this function, however, it only
affects in "wrong" cases (Like an invalid prefix), so it has no impact.

Fixes https://github.com/elastic/elasticsearch/issues/99064
2024-05-29 10:23:45 -04:00
Iván Cea Fontenla
62b372b4dc
ESQL: CBRT function (#108574)
- Added the cube root function to ESQL (`CBRT(x)`). Nearly identical to SQRT, but without the negative numbers exception
- Added docs generation support for Windows end lines (CRLF), as within the examples, it was writing the "\r" without the "\n" (Which was being converted to "\\n"), and some other inconsistencies
- Some updates to `package-info.java` documentation over how to create functions
- Fixes https://github.com/elastic/elasticsearch/issues/108675

Functions issue: https://github.com/elastic/elasticsearch/issues/98545
2024-05-15 16:50:15 +02:00
Fang Xing
11de886346
[ES|QL] Add/Modify annotations for spatial and conditional functions for better doc generation (#107722)
* annotation for spatial functions and conditional functions
2024-05-10 14:49:25 -04:00
Luigi Dell'Aquila
fed808850d
ES|QL: Add unit tests for now() function (#108498) 2024-05-10 14:28:19 +02:00
Fang Xing
4daac77e3b
[ES|QL] Add/Modify annotations for operators for better doc generation (#108220)
* annotation for operators
2024-05-03 22:59:51 -04:00
Fang Xing
7ae08306a0
mv functions (#107839)
Add annotations for MV functions for better doc generation.
2024-05-01 10:47:22 -04:00
Fang Xing
ad15d50863
[ES|QL] more doc generation via annotations (#107541)
Annotations for math functions, datetime functions, string functions, type conversion functions.
2024-04-22 14:43:36 -04:00
Fang Xing
353abef214
[ES|QL] Base64 decoding and encoding functions (#107390)
* add base64 functions
2024-04-15 18:39:26 -04:00
Nik Everett
aac17616a3
ESQL: Improve tests and docs for some functions (#107331)
This improves the tests and docs for a few functions, specifically `E`,
`FLOOR`, `PI`, `POW`, and `ROUND`. The examples and tested signatures
will get copied into the docs and kibana signatures.


Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2024-04-11 12:41:56 -04:00
Bogdan Pintea
d6f9d1e69e
ESQL: Rename AUTO_BUCKET to just BUCKET (#107197)
This renames the function AUTO_BUCKET to just BUCKET.
It also removes the experimental tagging of the function in the docs, making it generally available.
2024-04-10 12:21:08 +02:00
Luigi Dell'Aquila
2588c72a52
ES|QL: Add unit tests and docs for DATE_TRUNC() (#107145) 2024-04-09 10:41:34 +02:00
Craig Taverner
a7b38394d9
ESQL: Support ST_DISJOINT (#107007)
* WIP Started developing ST_DISJOINT

Initially based on ST_INTERSECTS

* Fix functions list and add spatial point integration tests

* Update docs/changelog/107007.yaml

* More tests for shapes and cartesian-multigeoms

* Some more tests to highlight issues with DISJOINT on cartesian point indices

* Disable Lucene push-down for DISJOINT on cartesian point indices

* Added docs for ST_DISJOINT

* Support DISJOINT in the lucene-pushdown code for cartesian point indexes

* Re-enable push-to-source for DISJOINT on cartesian_point indices

* Fix docs example

* Try fix internal docs links which are not being rendered

* Fixed disjoint on empty geometry

* Added tests on empty linestring, and changed lucene push-down to exception

In lucene code only LineString can be empty, but in Elasticsearch even that is not allowed, resulting in parsing errors. So we cannot get to this code in the lucene push-down and now throw an error instead. The tests now assert on the warnings.

Note that for any predicate DISJOINT and INTERSECTS alike, the predicate fails, because the parsing error results in null, the function returns null, the predicate interprets this as false, and no documents match. This null-in-null-out rule means that DISJOINT and INTERSECTS give the same answer on invalid geometries.
2024-04-08 12:26:26 +02:00
Tommaso Teofili
54eeb622d5
Add ES|QL Locate function (#106899)
* Add ES|QL Locate function
2024-04-05 15:29:54 +02:00
Ioana Tagirta
7b254218fb
Add ES|QL signum function (#106866)
* Add ES|QL signum function

* Update docs/changelog/106866.yaml

* Skip csv tests for versions older than 8.14

* Reference layout docs file and fix instructions for adding functions

* Break csv specs by param type

* More tests
2024-04-04 09:48:35 +02:00
Nik Everett
b97e2d61fb
ESQL: Fixup docs for LOG and LOG10 (#106963)
This merges all of the hand written docs for `LOG` and `LOG10` into the
annotations which updates the `META FUNCTIONS` - now it'll always be the
same as the docs. This also deletes the hand maintained docs and let's
the documentation generation process rebuild it.
2024-04-03 09:46:32 -04:00
Craig Taverner
2380492fac
ESQL: Support ST_CONTAINS and ST_WITHIN (#106503)
* WIP Started adding ST_CONTAINS

* Add generated evaluators

* Reduced warnings and use correct evaluators

* Refactored tests to remove duplicate code, and fixed Contains/multi-components

* Gradle build disallows using getDeclaredField

* Fixed cases where rectangles cross the dateline

* Fixed meta function tests

* Added ST_WITHIN to support inverting ST_CONTAINS

If the ST_CONTAINS is called with the constant on the left, we either have to create a lot more Evaluators to cover that case, or we have to invert it to ST_WITHIN. This inversion was a much easier option.

* Simplify inversion logic

* Add comment on choice of surrogate approach

* Add unit tests and missing fold() function

* Simple code cleanup

* Add integration tests for literals

* Add more integration tests based on actual data

* Generated documentation files

* Add documentation

* Fixed failing function count test

* Add tests that push-to-source works for ST_CONTAINS and ST_WITHIN

* Test more combinations of WITH/CONTAINS and literal on right and left

This also verifies that the re-writing of CONTAINS to WITHIN or vice versa occurs when the literal is on the left.

* test that physical planning also handles doc-values from STATS

* Added more tests for WITHIN/CONTAINS together with CENTROID

This should test the doc-values for points.

* Add cartesian_point tests

* Add cartesian_shape tests

* Disable Lucene-push-down for CARTESIAN data

This is a limitation in Lucene, which we could address as a performance optimization in a future PR, but since it probably requires Lucene changes, it cannot be done in this work.

* Fix doc links

* Added test data and tests for cartesian multi-polygons

Testing INTERSECTS, CONTAINS and WITHIN with multi-polydon fields

* Use required features for spatial points, shapes and centroid

* 8.13.0 is not yet historical version

This needs to be reverted as soon as 8.13.0 is released

* Added st_intersects and st_contains_within 'features'

* Code review updates

* Re-enable lucene push-down

* Added more required_features

* Fix point contains non-point

* Fix point contains point

* Re-enable lucene push-down in tests too

Forgot to change the physical planner unit tests after re-enabling lucene push-down

* Generate automatic docs

* Use generated examples docs

* Generated examples use '-result' prefix (singular)

* Mark spatial functions as preview/experimental
2024-04-02 10:31:00 +02:00
Nik Everett
00b0c54a74
ESQL: Generate docs for the trig functions (#106891)
This updates the in-code docs on the trig functions to line up with the
docs, removes the docs, and uses the now mostly identical generated
docs. This means we only need to document these functions in one place -
right next to the code.
2024-03-29 12:24:31 -04:00
Luigi Dell'Aquila
3e406e2d57
ES|QL: Improve support for TEXT fields in functions (#106810)
Re-submitting https://github.com/elastic/elasticsearch/pull/106688 after
a revert due to a conflict after merge
2024-03-27 08:47:09 -04:00
Luigi Dell'Aquila
720188e95f Revert "ES|QL: Improve support for TEXT fields in functions (#106688)"
This reverts commit 62e3e5fd1b.
2024-03-27 12:14:02 +01:00
Luigi Dell'Aquila
62e3e5fd1b
ES|QL: Improve support for TEXT fields in functions (#106688) 2024-03-27 12:08:10 +01:00
Nik Everett
7c46c735e4
ESQL: Generate docs for ceil (#106616)
This replaces the hand maintained docs for `CEIL` and with the docs
generated by the tests. There shouldn't be any diff in the generated
docs.
2024-03-21 13:01:15 -04:00
Nik Everett
34899069b6
ESQL: Generate a few more docs (#106577)
And improve the error message on csv test failures.
2024-03-21 10:51:35 -04:00
Nik Everett
a1305373f2
ESQL: Use generated docs for abs and acos (#106510)
ESQL: Use generated docs for abs and acos
2024-03-20 16:32:12 -04:00
Nik Everett
1541da5e65
ESQL: Generate more docs (#106367)
This modifies the ESQL test infrastructure to generate more of the
documentation for functions. It generates the *Description* section, the
*Examples* section, and the *Parameters* section as separate files so we
can use them as needed. It also generates a `layout` file that's just
a guess as to how to render the whole thing. In some cases it'll work
and we can use that instead of hand maintaining a "top level"
description file for the function.

Most newly generated files are unused. We have to chose to pick them up
by replacing the sections we were manually maintaining with an include
of the generated section. Or by replacing the entire hand maintained
file with the generated top level file.

Relates to #104247
2024-03-19 15:40:13 -04:00