Resolves#111842
This adds a conversion function that yields DATE_NANOS. Mostly this is straight forward.
It is worth noting that when converting a millisecond date into a nanosecond date, the conversion function truncates it to 0 nanoseconds (i.e. first nanosecond of that millisecond). This is, of course, a bit of an assumption, but I don't have a better assumption we can make. I'd thought about adding a second, optional, parameter to control this behavior, but it's important that TO_DATE_NANOS extend AbstractConvertFunction, which itself extends UnaryScalarFunction, so that it will work correctly with union types. Also, it's unlikely the user will have any better guess than we do for filling in the nanoseconds.
Making that assumption does, however, create some weirdness. Consider two comparisons:
TO_DATETIME("2023-03-23T12:15:03.360103847") == TO_DATETIME("2023-03-23T12:15:03.360") will return true while TO_DATE_NANOS("2023-03-23T12:15:03.360103847") == TO_DATE_NANOS("2023-03-23T12:15:03.360") will return false. This is akin to casting between longs and doubles, where things may compare equal in one type that are not equal in the other. This seems fine, and I can't think of a better way to do it, but it's worth being aware of.
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
- Added mv_median_absolute_deviation function
- Added possibility of having a fixed param in Multivalue "ascending" functions
- Add surrogate to MedianAbsoluteDeviation
### Calculations used to avoid overflows
First, a quick recap of how the MAD is calculated:
1. Sort values, and get the median
2. Calculate the difference between each value with the median (`abs(median - value)`)
3. Sort the differences, and get their median
Calculating a MAD may overflow when calculating the differences (Step 2), given the type is a signed number, as the difference is a positive value, with potentially the same value as `POSITIVE_MAX - NEGATIVE_MIN`.
To solve this, some types are up-casted as follow:
- Int: Stored as longs, simple approach
- Long: Stored as longs, but switched to unsigned long representation when calculating the differences
- Unsigned long: No effect; the resulting range is the same
- Doubles: Nothing. If the values overflow to +/-infinity, they're left that way, as we'll just use those outliers to sort
Closes https://github.com/elastic/elasticsearch/issues/111590
- Added the `mv_percentile(values, percentile)` function
- Used as a surrogate in the `percentile(column, percentile)` aggregation
- Updated docs to specify that the surrogate _should_ be implemented if possible
The same way as mv_median does, this yields exact results (Ignoring double operations error).
For that, some decisions were made, specially in the long evaluator (Check the comments in context in `MvPercentile.java`)
Closes https://github.com/elastic/elasticsearch/issues/111591
- Added SUM() agg tests (Which autogenerates docs)
- Converted non-finite doubles to nulls in aggregator
The complete set of tests depends on
https://github.com/elastic/elasticsearch/issues/110437, as commented in
code. After completion, the test can be uncommented and everything
should work fine
- Added Percentile aggregation tests and autogen docs
- Added a new "appendix" section to FunctionInfo. Existing Percentile docs had a final, long section with info, and we need this to leep it. We have an "detailedDescription" attribute already, but it's right after the description, and it would make it harder to read the important bits of the function (types, examples...). So I'm not reusing it.
`MAX()` currently doesn't work with doubles smaller than
`Double.MIN_VALUE` (Note that `Double.MIN_VALUE` returns the smallest
non-zero positive, not the smallest double).
This PR adds tests for Max and Min, and fixes the bug (Detected by the
tests).
Also, as the tests now generate the docs, replaced the old docs with the
generated ones, and updated the Max&Min examples.
Some work around aggregation tests, with AVG as an example:
- Added tests and autogenerated docs for AVG
- As AVG uses "complex" surrogates (A combination of functions), we can't trivially execute them without a complete plan. As I'm not sure it's worth it for most aggregations, I'm skipping those cases for now, as to avoid blocking other aggs tests.
The bad side effect of skipping those tests is that most tests in AvgTests are actually ignored (74 of 100)
- Added a new `AbstractAggregationTestCase` base class for tests, that shares most of the code of function tests, adapted for aggregations. Including both testing and docs generation.
- Reused the `AbstractFunctionTestCase` class to also let us test evaluators if the aggregation is foldable
- Added a `TopListTests` example
- This includes the docs for Top_list _(Also added a missing include of Ip_prefix docs)_
- Adapted Kibana docs to use `type: "agg"` (@drewdaemon)
The current tests are very basic: Consume a page, generate an output,
all in Single aggregation mode (No intermediates, no grouping). More
complex testing will be added in future PRs
Initial PR of https://github.com/elastic/elasticsearch/issues/109917
* WIP Started refactoring in preparation for ST_DISTANCE
* Initial evaluators for ST_DISTANCE
* Update docs/changelog/108764.yaml
* Fix invalid changelog generated by CI
* Register function and get unit tests working
* Fixed failing meta function description tests, and refined descriptions
* Added initial CsvTests and calculate Geo differently to Cartesian
* Added more csv-spec tests and changed to arcDistance for accuracy
* Added generated docs files
* Link to generated docs
* Fix examples tag for linking from generated docs
* Skip wrapper function
And note that we might want to include instead some of the related intelligence from Circle2D::HaversineDistance class
* Added ST_DWITHIN and more tests for ST_DISTANCE and ST_DWITHIN
* Code style
* Added more tests, this time for sorting on distance
* Fixes after rebase on main
* The ST_DWITHIN cannot use BinarySpatialFunction because it is ternary
So we moved the common code to a separate SpatialTypeResolver, and made a simpler TernarySpatialFunction based on a simple TernaryScalarFunction. This had additional consequences, simplifying the points-only cases.
The main reason for this change was to support StDWithinTests which need to test a lot of things that involve varying all three input types, generating expected error strings, etc. The original hack of just adding to BinarySpatialFunction worked for the actual integration tests, but clearly did not satisfy all the use cases tested by the unit tests.
We also restricted ST_DWITHIN to take only a double as the third argument, because otherwise the number of evaluators would explode, since we need a separate evaluator for each Block type, and Integer and Double use different block types.
* Fixed function count after rebasing on main
* Update docs/changelog/108764.yaml
* Added generated docs for ST_DWITHIN
* Connect docs for ST_DWITHIN
* Add back issue link
* Remove support for ST_DWITHIN
* Update docs/changelog/108764.yaml
* Bring back link to issue in changelog
* Update x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/spatial/StDistance.java
Co-authored-by: Ignacio Vera <iverase@gmail.com>
* Revert reformatting of function descriptions
We should put this into a separate PR
* Github merged commit with incorrectly formatted whitespace
---------
Co-authored-by: Ignacio Vera <iverase@gmail.com>
Add support for the string manipulation function REPEAT(string, number). This function concatenates the string argument with itself the specified number of times. If number is 0 an empty string is returned. If number is less than 0, null is returned and a warning is logged. If number is less than 0 and is a constant, the query will fail without executing.
Adding `MV_APPEND(value1, value2)` function, that appends two values
creating a single multi-value. If one or both the inputs are
multi-values, the result is the concatenation of all the values, eg.
```
MV_APPEND([a, b], [c, d]) -> [a, b, c, d]
```
~I think for this specific case it makes sense to consider `null` values
as empty arrays, so that~ ~MV_APPEND(value, null) -> value~ ~It is
pretty uncommon for ESQL (all the other functions, apart from
`COALESCE`, short-circuit to `null` when one of the values is null), so
let's discuss this behavior.~
[EDIT] considering the feedback from Andrei, I changed this logic and
made it consistent with the other functions: now if one of the
parameters is null, the function returns null
Added ESQL function to get the prefix of an IP. It works now with both
IPv4 and IPv6. For users planning to use it with mixed IPs, we may need
to add a function like "is_ipv4()" first.
**About the skipped test:** There's currently a "bug" in the
evaluators//functions that return null. Evaluators can't handle them.
We'll work on support for that in another PR. It affects other
functions, like `substring()`. In this function, however, it only
affects in "wrong" cases (Like an invalid prefix), so it has no impact.
Fixes https://github.com/elastic/elasticsearch/issues/99064
- Added the cube root function to ESQL (`CBRT(x)`). Nearly identical to SQRT, but without the negative numbers exception
- Added docs generation support for Windows end lines (CRLF), as within the examples, it was writing the "\r" without the "\n" (Which was being converted to "\\n"), and some other inconsistencies
- Some updates to `package-info.java` documentation over how to create functions
- Fixes https://github.com/elastic/elasticsearch/issues/108675
Functions issue: https://github.com/elastic/elasticsearch/issues/98545
This improves the tests and docs for a few functions, specifically `E`,
`FLOOR`, `PI`, `POW`, and `ROUND`. The examples and tested signatures
will get copied into the docs and kibana signatures.
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
This renames the function AUTO_BUCKET to just BUCKET.
It also removes the experimental tagging of the function in the docs, making it generally available.
* WIP Started developing ST_DISJOINT
Initially based on ST_INTERSECTS
* Fix functions list and add spatial point integration tests
* Update docs/changelog/107007.yaml
* More tests for shapes and cartesian-multigeoms
* Some more tests to highlight issues with DISJOINT on cartesian point indices
* Disable Lucene push-down for DISJOINT on cartesian point indices
* Added docs for ST_DISJOINT
* Support DISJOINT in the lucene-pushdown code for cartesian point indexes
* Re-enable push-to-source for DISJOINT on cartesian_point indices
* Fix docs example
* Try fix internal docs links which are not being rendered
* Fixed disjoint on empty geometry
* Added tests on empty linestring, and changed lucene push-down to exception
In lucene code only LineString can be empty, but in Elasticsearch even that is not allowed, resulting in parsing errors. So we cannot get to this code in the lucene push-down and now throw an error instead. The tests now assert on the warnings.
Note that for any predicate DISJOINT and INTERSECTS alike, the predicate fails, because the parsing error results in null, the function returns null, the predicate interprets this as false, and no documents match. This null-in-null-out rule means that DISJOINT and INTERSECTS give the same answer on invalid geometries.
* Add ES|QL signum function
* Update docs/changelog/106866.yaml
* Skip csv tests for versions older than 8.14
* Reference layout docs file and fix instructions for adding functions
* Break csv specs by param type
* More tests
This merges all of the hand written docs for `LOG` and `LOG10` into the
annotations which updates the `META FUNCTIONS` - now it'll always be the
same as the docs. This also deletes the hand maintained docs and let's
the documentation generation process rebuild it.
* WIP Started adding ST_CONTAINS
* Add generated evaluators
* Reduced warnings and use correct evaluators
* Refactored tests to remove duplicate code, and fixed Contains/multi-components
* Gradle build disallows using getDeclaredField
* Fixed cases where rectangles cross the dateline
* Fixed meta function tests
* Added ST_WITHIN to support inverting ST_CONTAINS
If the ST_CONTAINS is called with the constant on the left, we either have to create a lot more Evaluators to cover that case, or we have to invert it to ST_WITHIN. This inversion was a much easier option.
* Simplify inversion logic
* Add comment on choice of surrogate approach
* Add unit tests and missing fold() function
* Simple code cleanup
* Add integration tests for literals
* Add more integration tests based on actual data
* Generated documentation files
* Add documentation
* Fixed failing function count test
* Add tests that push-to-source works for ST_CONTAINS and ST_WITHIN
* Test more combinations of WITH/CONTAINS and literal on right and left
This also verifies that the re-writing of CONTAINS to WITHIN or vice versa occurs when the literal is on the left.
* test that physical planning also handles doc-values from STATS
* Added more tests for WITHIN/CONTAINS together with CENTROID
This should test the doc-values for points.
* Add cartesian_point tests
* Add cartesian_shape tests
* Disable Lucene-push-down for CARTESIAN data
This is a limitation in Lucene, which we could address as a performance optimization in a future PR, but since it probably requires Lucene changes, it cannot be done in this work.
* Fix doc links
* Added test data and tests for cartesian multi-polygons
Testing INTERSECTS, CONTAINS and WITHIN with multi-polydon fields
* Use required features for spatial points, shapes and centroid
* 8.13.0 is not yet historical version
This needs to be reverted as soon as 8.13.0 is released
* Added st_intersects and st_contains_within 'features'
* Code review updates
* Re-enable lucene push-down
* Added more required_features
* Fix point contains non-point
* Fix point contains point
* Re-enable lucene push-down in tests too
Forgot to change the physical planner unit tests after re-enabling lucene push-down
* Generate automatic docs
* Use generated examples docs
* Generated examples use '-result' prefix (singular)
* Mark spatial functions as preview/experimental
This updates the in-code docs on the trig functions to line up with the
docs, removes the docs, and uses the now mostly identical generated
docs. This means we only need to document these functions in one place -
right next to the code.
This modifies the ESQL test infrastructure to generate more of the
documentation for functions. It generates the *Description* section, the
*Examples* section, and the *Parameters* section as separate files so we
can use them as needed. It also generates a `layout` file that's just
a guess as to how to render the whole thing. In some cases it'll work
and we can use that instead of hand maintaining a "top level"
description file for the function.
Most newly generated files are unused. We have to chose to pick them up
by replacing the sections we were manually maintaining with an include
of the generated section. Or by replacing the entire hand maintained
file with the generated top level file.
Relates to #104247