* Document nested expressions for stats
* More docs
* Apply suggestions from review
- count-distinct.asciidoc
- Content restructured, moving the section about approximate counts to end of doc.
- count.asciidoc
- Clarified that omitting the `expression` parameter in `COUNT` is equivalent to `COUNT(*)`, which counts the number of rows.
- percentile.asciidoc
- Moved the note about `PERCENTILE` being approximate and non-deterministic to end of doc.
- stats.asciidoc
- Clarified the `STATS` command
- Added a note indicating that individual `null` values are skipped during aggregation
* Comment out mentioning a buggy behavior
* Update sum with inline function example, update test file
* Fix typo
* Delete line
* Simplify wording
* Fix conflict fix typo
---------
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
* Add initial structure for ST_CENTROID
* Revert "Revert stab at implementing forStats for doc-values vs source"
This reverts commit cfc4341bf4.
* Refined csv-spec tests with st_centroid
* Spotless disagrees with intellij
* Fixes after reverting fieldmapper code to test GeoPointFieldMapper
* Get GeoPointFieldMapperTests working again after enabling doc-values reading
* Simplify after rebase on main
In particular, field-mappers that do not need to know about fields can have simpler calls.
* Support local physical planning of forStats attributes for spatial aggregations
* Get st_centroid aggregation working on doc-values
We changed it to produce BytesRef, so we don't (yet) need any doc-values types.
* Create both DocValues and SourceValues versions of st_centroid
* Support physical planning of DocValues and SourceValues SpatialCentroid
* Improve test for physical planning of DocValues in SpatialCentroid
* Fixed show functions for st_centroid
* More st_centroid tests with mv_expand
To test single and multi-value centroids
* Fix st_centroid from point literals
The blocks contained BytesRef byte[] with multiple values, and we were ignoring the offsets when decoding, so decoding the first value over and over instead of decoding the subsequent values.
* Teach CsvTests to handle spatial types alternative loading from doc-values
Spatial GEO_POINT and CARTESIAN_POINT load from doc-values in some cases. If the physical planner has planned for this, we need the CsvTests to also take that into account, changing the type of the point field from BytesRefBlock to LongBlock.
* Fixed failing NodeSubclassTests
Required making the new constructor public and enabling Set as a valid parameter in the test framework.
* More complex st_centroid tests and fixed bug with multiple aggs
When there were multiple aggregations in the same STATS, we were inadvertently re-ordering them, causing the wrong Blocks to be fed to the wrong aggregator in the coordinator node.
* Update docs/changelog/104218.yaml
* Fix automatically generated changelog file
* Fixed failing test
The nodes can now sometimes be a Set, which is also a Collection, but not a List, and therefore can never be a subset of the children.
* More tests covering more combinations including MV_EXPAND and grouping
* Added cartesian st_centroid with grouping test
We could not add MV_EXPAND tests since the cartesian data does not have multi-value columns, but the geo_point tests are sufficient for this since they share the same code.
* Reduce flaky tests by sorting results
* Reduce flaky tests by sorting results
* Added tests for stats on stats to ensure planner coped
* Add unit tests to ensure doc-values in query planning complex cases
* Some minor updates from code review
* Fixes after rebase on main
* Get correct error message on unsupported geo_shape for st_centroid
* Refined point vs shape differences after merging main
* Added basic docs
* Delete docs/changelog/104218.yaml
* Revert "Delete docs/changelog/104218.yaml"
This reverts commit 4bc596a442.
* Fixed broken docs tag link
* Simplify BlockReaderSupport in MapperTestCase from code review
* Moved spatial aggregations into a sub-package
* Added some more code review updates, including nested tests
* Get nested functions working, if only from source values for now
* Code review update
* Code review update
* Added second location column to airports for wider testing
* Use second location in tests, including nulls
Includes a test fix for loading and converting nulls to encoded longs.
* Fixed bug supporting multi spatial aggregations in the local node
The local physical planner only marked a single field for stats loading, but marked all spatial aggregations for stats loading, which led to only one aggregation getting the right data, while the rest would get the wrong data.
* Renamed forStats to fieldExtractPreference for clarity
Now the planner decides whether to load data from doc-values. To remove the confusion of preferDocValues==false in the non-spatial cases, we use an ENUM with the default value of NONE, to make it clear we're leaving the choice up to the field type in all non-spatial cases.
* EsqlSpecIT was failing on very high precision centroids on different computers
This was not reproducible on the development machine, but CI machines were sufficiently different to lead to very tiny precision changes over very large Kahan summations. We fixed this by reducing the need for precision checks in clustered integration tests.
* Delete docs/changelog/104218.yaml
* Revert "Delete docs/changelog/104218.yaml"
This reverts commit 12c6980881.
* Fixed changelog entry
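To illustrate the EsqlSpecIT precision note in the list above, here is a minimal sketch. It assumes a textbook Kahan (compensated) summation and a hypothetical `closeTo` helper with a relative tolerance; neither name is taken from the actual aggregation or test code.

```java
import java.util.Arrays;

// Hypothetical illustration; not the actual Elasticsearch aggregation or test code.
class KahanSketch {
    // Compensated (Kahan) summation: carries the rounding error lost at each step.
    static double kahanSum(double[] values) {
        double sum = 0.0;
        double compensation = 0.0;
        for (double v : values) {
            double y = v - compensation;
            double t = sum + y;
            compensation = (t - sum) - y;
            sum = t;
        }
        return sum;
    }

    // Compare with a relative tolerance instead of exact equality, so that
    // last-digit differences between machines do not fail the check.
    static boolean closeTo(double expected, double actual, double relativeTolerance) {
        double scale = Math.max(Math.abs(expected), Math.abs(actual));
        return Math.abs(expected - actual) <= relativeTolerance * scale;
    }

    public static void main(String[] args) {
        double[] values = new double[1_000_000];
        Arrays.fill(values, 0.1);
        System.out.println(closeTo(100_000.0, kahanSum(values), 1e-9));
    }
}
```

Checking against a tolerance is one way to reduce the need for exact precision checks; the actual fix may have taken a different route.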
This adds unit test cases for all the functions that were missing tests
checking on the correct generation of the Warning headers in case the
execution raised an Exception that led to a `null` result.
This creates the `MV_FIRST` and `MV_LAST` functions that return the
first and last values from a multivalue field. They are no-ops on a
single-valued field. They are quite similar to `MV_MIN` and `MV_MAX`
except they work on positional data rather than relative size. That
sounds like a large distinction, but in practice our multivalued fields
are often sorted. And when they operate on sorted arrays `MV_MIN` does
*the same* thing as `MV_FIRST`.
But there are some cases where it really does matter - say you are
`SPLIT`ing something - so `MV_FIRST(SPLIT("foo;bar;baz", ";"))` gets you
`foo` like you'd expect. No sorting needed.
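The positional-versus-ordered distinction can be sketched in plain Java. The helper names below (`mvFirst`, `mvLast`, `mvMin`) are hypothetical stand-ins that operate on a `List<String>`, and returning `null` for an empty list is an assumption of this sketch, not a statement about the real functions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for the multivalue functions, for illustration only.
class MvSketch {
    // Positional: whatever value happens to be stored first.
    static String mvFirst(List<String> values) {
        return values.isEmpty() ? null : values.get(0);
    }

    // Positional: whatever value happens to be stored last.
    static String mvLast(List<String> values) {
        return values.isEmpty() ? null : values.get(values.size() - 1);
    }

    // Ordered: the smallest value, regardless of position.
    static String mvMin(List<String> values) {
        return values.isEmpty() ? null : Collections.min(values);
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("foo;bar;baz".split(";"));
        System.out.println(mvFirst(parts)); // foo
        System.out.println(mvLast(parts));  // baz
        System.out.println(mvMin(parts));   // bar
    }
}
```

On an already sorted list the positional and ordered helpers agree, which matches the observation above about sorted multivalued fields.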
Relates to #103879
This updates the use of the exceptions subclassed from
`QlServerException` when the failure reason is user-caused. This ensures
that a 400-class response is returned, instead of a 500-class one.
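As a rough sketch of the idea, assuming a simplified hierarchy where each exception type carries its own HTTP status: the class names `SketchServerException` and `SketchClientException` are hypothetical and do not mirror the real `QlServerException` hierarchy.

```java
// Hypothetical hierarchy for illustration; not the real exception classes.
class StatusSketch {
    static class SketchServerException extends RuntimeException {
        SketchServerException(String message) {
            super(message);
        }

        int status() {
            return 500; // server-side failure
        }
    }

    // Used when the failure reason is user-caused, e.g. an invalid query.
    static class SketchClientException extends SketchServerException {
        SketchClientException(String message) {
            super(message);
        }

        @Override
        int status() {
            return 400; // the caller can fix this by changing the request
        }
    }

    // Map any failure to a response status; unknown failures count as server errors.
    static int statusFor(RuntimeException e) {
        if (e instanceof SketchServerException) {
            return ((SketchServerException) e).status();
        }
        return 500;
    }
}
```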
This optimizes loading fields across many, many indices by resolving the
field loading infrastructure when it's first needed rather than up
front. This speeds things up because, if you are loading from many many
shards, you often don't need to set up the field loading infrastructure
for all shards at all - often you'll just need to set it up for a couple
of the shards.
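The pattern can be sketched with a memoizing `Supplier`. Everything here is hypothetical (`lazy`, `setUpLoader`, the `setups` counter) and only models the idea of deferring per-shard set-up until first use; the sketch is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration of lazy per-shard set-up; not the real loader code.
class LazyLoaderSketch {
    static int setups = 0; // counts how many times the expensive set-up actually ran

    // Wraps a supplier so that the delegate runs once, on first use.
    static <T> Supplier<T> lazy(Supplier<T> delegate) {
        return new Supplier<T>() {
            private T value;
            private boolean initialized;

            @Override
            public T get() {
                if (!initialized) {
                    value = delegate.get();
                    initialized = true;
                }
                return value;
            }
        };
    }

    // Stand-in for the expensive field loading set-up of one shard.
    static String setUpLoader(int shard) {
        setups++;
        return "loader-for-shard-" + shard;
    }

    public static void main(String[] args) {
        List<Supplier<String>> loaders = new ArrayList<>();
        for (int shard = 0; shard < 1000; shard++) {
            final int s = shard;
            loaders.add(lazy(() -> setUpLoader(s))); // nothing is set up yet
        }
        loaders.get(3).get();
        loaders.get(3).get();  // reuses the loader created above
        loaders.get(42).get();
        System.out.println(setups); // 2
    }
}
```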
This fixes the handling of `null` in date math. So far the `null` (of type
`NULL`) has been rejected by the type resolution. This is now allowed
through, leading to a `null` result, in line with the other types.
Fixes #103085.
Improve the docs for is_nan, is_finite, is_infinite functions.
This also adjusts the CamelCase to snake_case conversion, to not
consider the last capital letter (like in `IsNaN`).
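A minimal sketch of such a conversion, assuming the rule is "insert an underscore before every capital letter except the first and the last character". The method `toSnakeCase` is hypothetical, and the real conversion may treat other edge cases differently.

```java
// Hypothetical conversion for illustration; the real implementation may differ.
class SnakeCaseSketch {
    static String toSnakeCase(String camel) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < camel.length(); i++) {
            char c = camel.charAt(i);
            boolean isLast = i == camel.length() - 1;
            // No underscore before the first character or before a trailing capital.
            if (Character.isUpperCase(c) && i > 0 && !isLast) {
                sb.append('_');
            }
            sb.append(Character.toLowerCase(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSnakeCase("IsNaN"));      // is_nan
        System.out.println(toSnakeCase("IsInfinite")); // is_infinite
    }
}
```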
* Start working on geo_point and point docs for ESQL
* Added to_cartesianpoint and includes
* Sub-headings for easier reading
* Improve sub-headings
* Hide to_long and support for longs in to_geopoint and to_cartesianpoint
This adds a tiny blurb for each operator to the docs with a railroad
diagram of the operator's syntax and a table of the input and output
types. This also fixes the tests to correctly generate the tables for
operators.
This adds the missing unit tests for the conversion functions.
It also extends the type support by adding the `TEXT` type to those functions that support `KEYWORD` already (which also simplifies the testing, actually). Some functions did have it, some didn't; they now all do.
The change also fixes two defects uncovered by the better testing coverage: `ToInteger` and `ToUnsignedLong` were missing some necessary exception declarations in the decorators for the evaluators.
It also updates `ToInteger`'s `fromDouble()` conversion to use a newly added utility, so that the failed conversions contain the right message (`out of [integer] range`, instead of the confusing `out of [long] range`).
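The range check can be sketched as follows. The method name `safeDoubleToInt`, the choice to round rather than truncate, and the value prefix in the message are assumptions of this sketch; only the `out of [integer] range` wording comes from the description above.

```java
// Hypothetical utility for illustration; not the actual conversion helper.
class ToIntegerSketch {
    static int safeDoubleToInt(double value) {
        long rounded = Math.round(value); // saturates at Long.MIN_VALUE / Long.MAX_VALUE
        if (Double.isNaN(value) || rounded > Integer.MAX_VALUE || rounded < Integer.MIN_VALUE) {
            // Report the target type, not an intermediate long.
            throw new ArithmeticException("[" + value + "] out of [integer] range");
        }
        return (int) rounded;
    }

    public static void main(String[] args) {
        System.out.println(safeDoubleToInt(41.6));
        try {
            safeDoubleToInt(1e10);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage());
        }
    }
}
```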
Related: #102488, #102552.
This corrects an earlier mistake in the ES|QL language design. Initially we had thought to have pow return the same type as its inputs, but in practice even for integer inputs this quickly grows out of the representable range, and we returned null much of the time. This also created a lot of edge cases around casting to/from doubles (which the underlying java function uses). The version in this PR follows the java spec, by always casting its inputs to doubles, and returning a double. Doing it this way also allows for a rather significant reduction in lines of code.
I removed many of the tests covering pow-specific edge cases. This seems reasonable to me as I expect `java.lang.Math.pow` to be well behaved and most of those edge cases were around type testing which no longer applies. At the same time, this simplification allows us to leverage the new scalar function testing framework, which means better null coverage, better type coverage, and much easier extensibility.
We do consider this a breaking change, but as the feature is still in tech preview and this is a relatively small surface area, we are not too concerned with disruptions.
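The difference between the two designs can be sketched like this. Both methods are hypothetical: `pow` models "always cast to double, return double", while `powKeepingInt` loosely models the earlier "return the input type" idea, with `null` when the result does not fit.

```java
// Hypothetical illustration of the two designs; not the actual function code.
class PowSketch {
    // New behaviour: cast both inputs to double and return a double.
    static double pow(Number base, Number exponent) {
        return Math.pow(base.doubleValue(), exponent.doubleValue());
    }

    // Earlier idea, roughly: keep the integer type and give up (null) on overflow.
    static Integer powKeepingInt(int base, int exponent) {
        double result = Math.pow(base, exponent);
        if (result > Integer.MAX_VALUE || result < Integer.MIN_VALUE) {
            return null;
        }
        return (int) result;
    }

    public static void main(String[] args) {
        System.out.println(pow(10, 10));           // 1.0E10
        System.out.println(powKeepingInt(10, 10)); // null
    }
}
```

Even small integer inputs such as `10` and `10` leave the `int` range, which is the kind of case where the earlier design returned `null`.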
Resolves #99055
Relates to #100558
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This adds more tests for some of the `MV_` functions and updates their
docs now that the railroad diagram and table generated by the tests
covers all of the types.
* Break out 'Limitations' into separate page
* Add REST API docs
* Restructure commands, functions, and operators refs
* Add placeholder for getting started guide
* Group 'Syntax', 'Metafields', and 'MV fields' under 'Language'
* Add placeholder for Kibana page
* Add link from landing page
* Apply uniform formatting to ACOS, CASE, and DATE_PARSE function refs
* Reword default LIMIT
* Add support for COUNT(*)
* Move 'Commands' and 'Functions and operators' to individual pages
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Fixes https://github.com/elastic/elasticsearch/issues/99507
Enhance the SHOW FUNCTIONS command to return as much _structured_ information as
possible about the function signature, i.e.:
- function name
- return type
- param names
- param types
- param descriptions
For now, as an example, the annotations are used only on the `sin()` and
`date_parse()` functions; **if we agree on this approach**, I'll proceed
to:
- enhance all the currently implemented functions with the needed
information
- improve the function tests to verify that every newly
implemented function provides meaningful information
---
This feature can be useful for the end user, but the main goal is to
give Kibana an easy way to produce in-line documentation (contextual
messages, autocomplete) for functions
Similar to the current implementation, which has a `@Named("paramName")`
annotation for function parameters, this PR introduces two more
annotations, `@Param(name, type, description, optional)` and
`@FunctionInfo()`, to provide information about single parameters and
functions.
The result of the `SHOW FUNCTIONS` query will have the following columns:
- name (keyword): the function name
- synopsis (keyword): the full signature of the function, eg. `double sin(n:integer|long|double)`
- argNames (keyword MV): the function argument names
- argTypes (keyword MV): the function argument types
- argDescriptions (keyword MV): a textual description of each function argument
- returnType (keyword): the return type of the function
- description (keyword): a textual description of the function
---
Open questions:
- ~~how structured should *types* be? Eg. should we have
a strict `@Typed("keyword")`/`@Typed({"keyword", "text"})` or should we
have a more generic type description, eg. `@Typed("numeric")`,
`@Typed("any")`? The first one is more useful for API consumption but
it's hard with our complex type system (type classes, custom types,
unsupported and so on); the second one is less structured, but probably
more useful for documentation, that is the most immediate use case of
this feature.~~ All the types are listed explicitly
- ~~we have alternatives for the synopsis, eg.~~
- ~~`functionName(<paramName>:<paramType>, ...): <returnType>`~~
- ~~`<returnType> functionName(<paramName>:<paramType>, ...)`~~
- ~~`<returnType> functionName(<paramType> <paramName>, ...)`~~
Using `<returnType> functionName(<paramName>:<paramType>, ...)` for now. If multiple types are supported, then they will be separated by pipes, eg. `double sin(n:integer|long|double)`.
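A sketch of how such a synopsis could be derived from annotations via reflection. The annotation declarations below are simplified, hypothetical stand-ins: the `@Param` attributes follow the list given above, while the `returnType` and `description` attributes of `@FunctionInfo`, the use of a static method as the annotated element, and the description strings are assumptions of this sketch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.StringJoiner;

// Hypothetical, simplified stand-ins for the annotations described above.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface Param {
    String name();
    String[] type();
    String description() default "";
    boolean optional() default false;
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface FunctionInfo {
    String returnType();             // assumed attribute
    String description() default ""; // assumed attribute
}

class SynopsisSketch {
    @FunctionInfo(returnType = "double", description = "Illustrative description")
    static double sin(@Param(name = "n", type = { "integer", "long", "double" }) double n) {
        return Math.sin(n);
    }

    // Builds "<returnType> functionName(<paramName>:<paramType>, ...)",
    // separating multiple supported types with pipes.
    static String synopsis(Method function) {
        FunctionInfo info = function.getAnnotation(FunctionInfo.class);
        StringJoiner args = new StringJoiner(", ", "(", ")");
        for (Parameter parameter : function.getParameters()) {
            Param param = parameter.getAnnotation(Param.class);
            args.add(param.name() + ":" + String.join("|", param.type()));
        }
        return info.returnType() + " " + function.getName() + args;
    }

    // Convenience lookup that wraps the checked reflection exception.
    static String synopsisOf(String name, Class<?>... parameterTypes) {
        try {
            return synopsis(SynopsisSketch.class.getDeclaredMethod(name, parameterTypes));
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(synopsisOf("sin", double.class)); // double sin(n:integer|long|double)
    }
}
```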
This creates `Block.Ref`, a reference to a `Block` which may or may not
be part of a `Page`. `Block.Ref` is `Releasable` and closing it is a
noop if the `Block` is part of a `Page`, but if it is "free floating"
then closing the `Block.Ref` will close the block.
It also modified `ExpressionEvaluator` to return a `Block.Ref` instead
of a `Block` - so you tend to work with `ExpressionEvaluator`s like
this:
```
try (Block.Ref ref = eval.eval(page)) {
    return ref.block().doStuff();
}
```
This should make it *much* easier to release the memory from `Block`s
built by `ExpressionEvaluator`s.
This change is mostly mechanical, introducing the new signature for
`ExpressionEvaluator`. In a follow up change I'll modify the tests to
make sure we're correctly using it to close pages.
I did think about changing `ExpressionEvaluator` to add a method telling
you if the block that it returns must be closed or not. This would have
been more difficult to work with, and, ultimately, limiting.
Specifically, it is possible for an `ExpressionEvaluator` to *sometimes*
return a free floating block and other times return one that is
contained in a `Page`. Imagine `mv_concat` - it returns the block it
receives if the block doesn't have multivalued fields. Otherwise it
concats things. If that block happens to come directly out of the
`Page`, then `mv_concat` will sometimes produce free floating blocks and
sometimes not.
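The "sometimes free floating, sometimes not" case can be modelled with a small sketch. The `Block` and `Ref` classes below are simplified, hypothetical stand-ins (the real `Block.Ref` API is not reproduced here), and `concat` only imitates the `mv_concat` behaviour described above. The sketch assumes the input block is owned by a `Page`.

```java
// Simplified, hypothetical model of the idea; not the real Block.Ref API.
class BlockRefSketch {
    // A block holds one array of values per position and owns memory until closed.
    static class Block implements AutoCloseable {
        final String[][] positions;
        boolean closed = false;

        Block(String[]... positions) {
            this.positions = positions;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Closing the reference only releases blocks that are not owned by a page.
    static class Ref implements AutoCloseable {
        private final Block block;
        private final boolean floating;

        private Ref(Block block, boolean floating) {
            this.block = block;
            this.floating = floating;
        }

        static Ref inPage(Block block) {
            return new Ref(block, false);
        }

        static Ref floating(Block block) {
            return new Ref(block, true);
        }

        Block block() {
            return block;
        }

        @Override
        public void close() {
            if (floating) {
                block.close();
            }
        }
    }

    // Returns the input block untouched when nothing is multivalued,
    // otherwise builds a new, free floating block.
    static Ref concat(Block input, String delimiter) {
        boolean multivalued = false;
        for (String[] position : input.positions) {
            if (position.length > 1) {
                multivalued = true;
            }
        }
        if (!multivalued) {
            return Ref.inPage(input);
        }
        String[][] joined = new String[input.positions.length][];
        for (int i = 0; i < joined.length; i++) {
            joined[i] = new String[] { String.join(delimiter, input.positions[i]) };
        }
        return Ref.floating(new Block(joined));
    }
}
```

Callers treat both cases the same way: wrap the result in try-with-resources and let the reference decide whether anything needs to be released.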
This prevents `CONCAT` from using an unbounded amount of memory by
hooking its temporary value into the circuit breaker. To do so, it
makes *all* `ExpressionEvaluator`s `Releasable`. Most of the changes in
this PR just plumb that through to every evaluator. The rest of the
changes correctly release evaluators after their use.
I considered another tactic but didn't like it as much, even though the
number of changes would be smaller - I could have created a fresh,
`Releasable` temporary value for every `Page`. It would be pretty
contained to keep the releasable there. But I wanted to share the temporary
state across runs to avoid a bunch of allocations.
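The chosen approach can be sketched roughly as follows. `Breaker` and `Scratch` are hypothetical classes that only model the accounting idea: a temporary buffer that is shared across runs, reserves memory through a breaker as it grows, and gives it back when released. They are not the real circuit breaker or evaluator classes.

```java
import java.util.Arrays;

// Hypothetical model of breaker-tracked scratch space; not the real classes.
class BreakerSketch {
    // Tracks reserved bytes and refuses reservations beyond a fixed limit.
    static class Breaker {
        private final long limit;
        private long used;

        Breaker(long limit) {
            this.limit = limit;
        }

        void reserve(long bytes) {
            if (used + bytes > limit) {
                throw new IllegalStateException("circuit breaker tripped: " + (used + bytes) + " > " + limit);
            }
            used += bytes;
        }

        void release(long bytes) {
            used -= bytes;
        }

        long used() {
            return used;
        }
    }

    // Temporary buffer that is reused across runs and released once at the end.
    static class Scratch implements AutoCloseable {
        private final Breaker breaker;
        private byte[] bytes = new byte[0];

        Scratch(Breaker breaker) {
            this.breaker = breaker;
        }

        // Grows only when needed; the extra bytes are reserved before allocating.
        byte[] grow(int size) {
            if (size > bytes.length) {
                breaker.reserve(size - bytes.length);
                bytes = Arrays.copyOf(bytes, size);
            }
            return bytes;
        }

        @Override
        public void close() {
            breaker.release(bytes.length);
            bytes = new byte[0];
        }
    }
}
```

Sharing one `Scratch` across runs avoids a fresh allocation per run, at the cost of having to release it explicitly when the evaluator is done.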
Here's a script that used to crash before this PR but is fine after:
```
curl -uelastic:password -XDELETE localhost:9200/test
curl -HContent-Type:application/json -uelastic:password -XPUT localhost:9200/test -d'{
  "mappings": {
    "properties": {
      "short": {
        "type": "keyword"
      }
    }
  }
}'
curl -HContent-Type:application/json -uelastic:password -XPUT localhost:9200/test/_doc/1?refresh -d'{"short": "short"}'
echo -n '{"query": "FROM test ' > /tmp/evil
for i in {0..9}; do
  echo -n '| EVAL short = CONCAT(short' >> /tmp/evil
  for j in {1..9}; do
    echo -n ', short' >> /tmp/evil
  done
  echo -n ')' >> /tmp/evil
done
echo '| EVAL len = LENGTH(short) | KEEP len"}' >> /tmp/evil
curl -HContent-Type:application/json -uelastic:password -XPOST localhost:9200/_query?pretty --data-binary @/tmp/evil
```
This swaps the arguments of the `date_extract()`, `date_format()` and
`date_parse()` functions, to align with `date_trunc()`. The field
argument is now always last, even for _format() and _parse(), whose
optional argument will now be provided as the first one.
Added an implementation for the `ends_with` function in ES|QL. `ends_with`
returns a boolean that indicates whether a keyword string ends with
another string. Also made sure that the docs look alright:
<img width="1677" alt="Screenshot 2023-09-16 at 18 10 46"
src="eccd81e1-40a2-4a66-a514-cf3e4205f9da">