415 commits

Author SHA1 Message Date
Mark Tozzi
2482f06f3c
ESQL - docs for to_date_nanos (#120124)
I forgot to link the ToDateNanos docs when I merged that function.
---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-01-14 16:31:24 -05:00
Ioana Tagirta
f5ac68df95
ESQL: Document support for semantic_text field mapping (#120052)
* Document support for semantic_text field mapping

* Address review comments
2025-01-13 22:18:47 +01:00
Nik Everett
c990377c95
ESQL: Limit memory usage of fold (#118602)
`fold` can be surprisingly heavy! The maximally efficient/paranoid thing
would be to fold each expression one time, in the constant folding rule,
and then store the result as a `Literal`. But this PR doesn't do that
because it's a big change. Instead, it creates the infrastructure for
tracking memory usage for folding and plugs it into as many places as
possible. That's not perfect, but it's better.

This infrastructure limits the allocations of fold, similar to the
`CircuitBreaker` infrastructure we use for values, but it's different
in a critical way: you don't manually free any of the values. This is
important because the plan itself isn't `Releasable`, which is required
when using a real CircuitBreaker. We could have tried to make the plan
releasable, but that'd be a huge change.

Right now there's a single limit of 5% of heap per query. We create the
limit at the start of query planning and use it throughout planning.

There are about 40 places that don't yet use it. We should get them
plugged in as quickly as we can manage. After that, we should look to the
maximally efficient/paranoid thing that I mentioned about waiting for
constant folding. That's an even bigger change, one I'm not equipped
to make on my own.
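
The accounting idea described above can be sketched roughly as follows. This is a minimal illustration, not the Elasticsearch implementation: the names `FoldMemoryTracker` and `FoldBudgetExceeded` and the byte counts are hypothetical, and only the 5% per-query budget and the add-only (never released) behaviour come from the message.

```python
class FoldBudgetExceeded(Exception):
    """Raised when folding would use more memory than the per-query budget."""


class FoldMemoryTracker:
    """Add-only accounting: callers track allocations but never release them."""

    def __init__(self, heap_bytes: int, percent: int = 5):
        # Integer arithmetic keeps the limit exact.
        self.limit = heap_bytes * percent // 100
        self.used = 0

    def track(self, nbytes: int) -> None:
        # Refuse the allocation before counting it, so `used` never exceeds `limit`.
        if self.used + nbytes > self.limit:
            raise FoldBudgetExceeded(
                f"folding would use {self.used + nbytes} bytes, limit is {self.limit}"
            )
        self.used += nbytes


# One tracker is created when planning starts and shared by every fold call.
tracker = FoldMemoryTracker(heap_bytes=1_000_000)  # hypothetical heap size
tracker.track(30_000)
tracker.track(15_000)
```

Because nothing is ever released, the tracker can simply be dropped with the plan when the query finishes, which is why the plan does not need to be `Releasable` in this scheme.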
2025-01-13 15:04:27 +00:00
Mark Tozzi
e9f2d78923
Esql additional date format testing (#120000)
This wires up the randomized testing for DateFormat. Prior to this PR, none of the randomized testing was hitting the one-parameter version of the function, so I wired that up as well. This required some compromises on the type signatures; see the comments inline.

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-01-13 14:11:52 +00:00
Aurélien FOUCRET
31f11c3c0c
[ES|QL] Enable KQL function as a tech preview (#119730)
2025-01-10 12:49:28 +01:00
Ievgen Degtiarenko
fd1be8ce6f
Hash functions (#118938)
This change adds md5, sha1 and sha256 hash functions.
2025-01-08 16:44:15 +01:00
Lisa Cawley
ba8beecdb0
[DOCS] More links to new API site (#119377)
2024-12-31 11:32:29 -08:00
Lisa Cawley
5e0fbef58b
[DOCS] Link to new API site (#119038)
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
2024-12-30 16:52:16 +00:00
Carlos Delgado
6ee641bdfd
ESQL - Update WHERE command docs with MATCH and full text functions examples (#118987)
2024-12-19 16:44:53 +01:00
Bogdan Pintea
bc3b629d8d
ESQL: Docs: add example of date bucketing with offset (#116680)
Add an example of how to create date histograms with an offset.

Fixes #114167
2024-12-18 17:12:14 +01:00
Ievgen Degtiarenko
7cf28a910e
ESQL Add esql hash function (#117989)
This change introduces esql hash(alg, input) function that relies on the Java MessageDigest to compute the hash.
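
As a rough analogue of a `hash(alg, input)` function that delegates to a platform digest library, the sketch below uses Python's `hashlib` in place of Java's MessageDigest. The function name `esql_like_hash`, the UTF-8 encoding of the input, and the hex encoding of the result are assumptions made for illustration, not the documented behaviour of the ES|QL function.

```python
import hashlib


def esql_like_hash(alg: str, text: str) -> str:
    """Hash the UTF-8 bytes of `text` with the named algorithm, hex-encoded."""
    # hashlib.new looks the algorithm up by name, much like
    # MessageDigest.getInstance does in Java.
    return hashlib.new(alg, text.encode("utf-8")).hexdigest()


md5_hex = esql_like_hash("md5", "abc")
sha1_hex = esql_like_hash("sha1", "abc")
sha256_hex = esql_like_hash("sha256", "abc")
```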
2024-12-18 09:56:42 +01:00
Mark Tozzi
1e26791515
Esql bucket function for date nanos (#118474)
This adds support for running the bucket function over a date nanos field. Code-wise, this just delegates to DateTrunc, which already supports date nanos, so most of the PR is just tests and the auto-generated docs.

Resolves #118031
2024-12-13 09:25:52 -05:00
Gal Lalouche
2be4cd983f
ESQL: Support ST_EXTENT_AGG (#117451)
This PR adds support for ST_EXTENT_AGG aggregation, i.e., computing a bounding box over a set of points/shapes (Cartesian or geo). Note the difference between this aggregation and the already implemented scalar function ST_EXTENT.

This isn't a very efficient implementation, and future PRs will attempt to read these extents directly from the doc values.
We currently always use longitude wrapping, i.e., we may wrap around the dateline for a smaller bounding box. Future PRs will let the user control this behavior.
Fixes #104659.
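
To illustrate what "longitude wrapping" means for a bounding box, here is a minimal sketch over `(lon, lat)` points. It is an illustration under assumptions, not the code in this PR: the function name `extent` and the tuple layout `(west, east, south, north)` are hypothetical, and a box that crosses the dateline is represented by `west > east`.

```python
def extent(points):
    """Bounding box (west, east, south, north) over (lon, lat) pairs.

    The box may cross the dateline (west > east) when that makes it narrower.
    """
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    west, east = min(lons), max(lons)

    positives = [lon for lon in lons if lon >= 0]
    negatives = [lon for lon in lons if lon < 0]
    if positives and negatives:
        # Candidate box running east from the smallest positive longitude,
        # across the dateline, to the largest negative longitude.
        wrapped_west, wrapped_east = min(positives), max(negatives)
        wrapped_width = (180 - wrapped_west) + (wrapped_east + 180)
        if wrapped_width < east - west:
            west, east = wrapped_west, wrapped_east
    return west, east, min(lats), max(lats)


# Points on both sides of the dateline: the wrapped box is 15 degrees wide,
# the unwrapped one would be 354 degrees wide.
box = extent([(170.0, 10.0), (-175.0, 12.0), (179.0, 8.0)])
```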
2024-12-13 12:41:24 +02:00
Alexander Spies
140d88c59a
ESQL: Dependency check for binary plans (#118326)
Make the dependency checker for query plans take into account binary plans and make sure that fields required from the left hand side are actually obtained from there (and analogously for the right).
2024-12-13 11:38:53 +01:00
Carlos Delgado
eb59b989ef
ESQL: Expand type compatibility for match function and operator (#117555)
2024-12-09 19:56:10 +01:00
kanoshiou
67ee03411b
ESQL: Enable async get to support formatting (#111104)
I've updated the listener for GET /_query/async/{id} to EsqlResponseListener, so it now accepts parameters (delimiter, drop_null_columns and format) like the POST /_query API. Additionally, I have added tests to verify the correctness of the code.

You can now set the format in the request parameters to specify the return style.

Closes #110926
2024-12-09 13:08:48 +01:00
Mark Tozzi
7cd17d2185
Esql compare nanos and millis (#118027)
Resolves #116281

Introduces support for comparing millisecond dates with nanosecond dates, without the need for casting. Millisecond dates outside of the nanosecond date range are handled correctly.
2024-12-06 09:17:32 -05:00
Tommaso Teofili
91605860ee
Term query for ES|QL (#117359)
This commit adds a `term` function for ES|QL to run `TermQueries`.

For example:
FROM test | WHERE term(content, "dog")
2024-12-06 07:42:48 +00:00
Craig Taverner
c7e985c3b6
Support ST_ENVELOPE and related ST_XMIN, etc. (#116964)
Support ST_ENVELOPE and related ST_XMIN, etc.

Based on the PostGIS equivalents:

https://postgis.net/docs/ST_Envelope.html
https://postgis.net/docs/ST_XMin.html
https://postgis.net/docs/ST_XMax.html
https://postgis.net/docs/ST_YMin.html
https://postgis.net/docs/ST_YMax.html
2024-12-04 12:20:47 +01:00
Jan Kuipers
31508f00a1
Document ES|QL categorize limitations (#117892)
* Document ES|QL categorize limitations

* Update x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/grouping/Categorize.java

Co-authored-by: Alexander Spies <alexander.spies@elastic.co>

---------

Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
2024-12-04 09:53:21 +01:00
Marci W
97a626b5ea
Remove ccs banner (#117844)
2024-12-02 14:46:41 -05:00
Mark Tozzi
913e0fbca8
ESQL Date Nanos Addition and Subtraction (#116839)
Resolves #109995

This adds support and tests for addition and subtraction of date nanos with periods and durations. It does not include support for date_diff, which is a separate ticket (#109999). The bulk of the PR is testing, the actual date math is all handled by library functions.

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-12-02 14:08:07 -05:00
Jan Kuipers
ddc8b959ee
ES|QL categorize docs (#117827)
* Move ES|QL categorize out of snapshot functions

* Categorize docs

* Add experimental + fix docs

* Add experimental + fix docs
2024-12-02 16:41:02 +01:00
Nik Everett
9022cccba7
ESQL: CATEGORIZE as a BlockHash (#114317)
Re-implement `CATEGORIZE` in a way that works for multi-node clusters.

This requires that data is first categorized on each data node in a first pass, then the categorizers from each data node are merged on the coordinator node and previously categorized rows are re-categorized.

BlockHashes, used in HashAggregations, already work in a very similar way. E.g. for queries like `... | STATS ... BY field1, field2` they map values for `field1` and `field2` to unique integer ids that are then passed to the actual aggregate functions to identify which "bucket" a row belongs to. When passed from the data nodes to the coordinator, the BlockHashes are also merged to obtain unique ids for every value in `field1, field2` that is seen on the coordinator (not only on the local data nodes).

Therefore, we re-implement `CATEGORIZE` as a special BlockHash.

To choose the correct BlockHash when a query plan is mapped to physical operations, the `AggregateExec` query plan node needs to know that we will be categorizing the field `message` in a query containing `... | STATS ... BY c = CATEGORIZE(message)`. For this reason, _we do not extract the expression_ `c = CATEGORIZE(message)` into an `EVAL` node, in contrast to e.g. `STATS ... BY b = BUCKET(field, 10)`. The expression `c = CATEGORIZE(message)` simply remains inside the `AggregateExec`'s groupings.

**Important limitation:** For now, to use `CATEGORIZE` in a `STATS` command, there can be only 1 grouping (the `CATEGORIZE`) overall.
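
The id-mapping and merge behaviour described above can be sketched with a toy structure. This is only an illustration of the idea: `ToyBlockHash` and its methods are hypothetical names, and the real BlockHashes operate on blocks of values rather than Python tuples.

```python
class ToyBlockHash:
    """Maps each distinct grouping key to a dense integer id (0, 1, 2, ...)."""

    def __init__(self):
        self.ids = {}   # key -> id
        self.keys = []  # id -> key

    def add(self, key) -> int:
        if key not in self.ids:
            self.ids[key] = len(self.keys)
            self.keys.append(key)
        return self.ids[key]

    def merge(self, other: "ToyBlockHash") -> list:
        """Absorb another hash; return a table translating its ids to ours."""
        return [self.add(key) for key in other.keys]


# Two "data nodes" hash (field1, field2) pairs independently.
node_a, node_b = ToyBlockHash(), ToyBlockHash()
ids_a = [node_a.add(k) for k in [("x", 1), ("y", 2), ("x", 1)]]
ids_b = [node_b.add(k) for k in [("y", 2), ("z", 3)]]

# The "coordinator" merges both and remaps node-local ids to global ids.
coordinator = ToyBlockHash()
remap_a = coordinator.merge(node_a)
remap_b = coordinator.merge(node_b)
global_ids_b = [remap_b[i] for i in ids_b]
```

The remap table is the analogue of re-categorizing previously categorized rows: ids assigned locally on a data node are only meaningful after being translated into the coordinator's id space.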
2024-11-27 17:44:55 +01:00
Craig Taverner
8c22fc479f
Make spatial search functions not preview (#117489)
2024-11-25 17:04:48 +01:00
florent-leborgne
fa9f2bff0e
Docs for starred esql queries in Kibana (#117468)
2024-11-25 15:13:23 +01:00
Aurélien FOUCRET
ff58d891a1
ES|QL kql function. (#116764)
2024-11-25 14:22:11 +01:00
Larisa Motova
7e801e0410
[ES|QL] Add a standard deviation function (#116531)
Uses Welford's online algorithm, as well as the parallel version, to
calculate standard deviation.
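
For readers unfamiliar with the two algorithms named above, here is a minimal sketch of Welford's online update and the pairwise (parallel) combination of partial states. The class name `RunningStd` is hypothetical, and returning the population (rather than sample) standard deviation is an assumption made for the example.

```python
import math


class RunningStd:
    """Welford state: count, mean, and M2 (sum of squared deviations)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def add(self, x: float) -> None:
        """Online update with a single new value."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStd") -> None:
        """Parallel combination of two partial states."""
        if other.n == 0:
            return
        total = self.n + other.n
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.n * other.n / total
        self.mean += delta * other.n / total
        self.n = total

    def population_std(self) -> float:
        return math.sqrt(self.m2 / self.n) if self.n else float("nan")


# Two partial aggregations (for example, from two data nodes) merged into one.
left, right = RunningStd(), RunningStd()
for value in [2, 4, 4, 4]:
    left.add(value)
for value in [5, 5, 7, 9]:
    right.add(value)
left.merge(right)
std = left.population_std()
```

The online update needs only a single pass and constant state, and the merge step is what lets partial results computed independently be combined without revisiting the raw values.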
2024-11-22 12:33:46 -10:00
Nik Everett
4ecc7518ef
ESQL: Add docs for MV_PERCENTILE (#117377)
We built this a while back. Let's document it.
2024-11-23 06:41:18 +11:00
Nik Everett
893dfd3c9a
ESQL: Make WEIGHTED_AVG not preview (#117356)
It's not PREVIEW.
2024-11-22 16:28:06 +00:00
Bogdan Pintea
1fe3ed1e85
Add docs for aggs filtering (#116681)
Add documentation for aggs filtering (the WHERE in STATS command).

Fixes: #115083
2024-11-22 13:26:30 +01:00
Luigi Dell'Aquila
a1247b3e60
ES|QL: fix validation of SORT by aggregate functions (#117316)
2024-11-22 12:12:09 +01:00
Carlos Delgado
ea4b41fca8
ESQL - match operator included in non-snapshot builds (#116819)
2024-11-21 07:45:22 +01:00
Mark Tozzi
c3f73d0319
Esql Enable Date Nanos (#117080)
This enables date nanos support as tech preview. Basic operations, like reading values, binary comparisons, and functions that don't care about type should work, but some functions are not yet supported. Most notably, Bucket is not yet supported, although Date_Trunc is and can be used for grouping. See the docs for the full list of limitations.

relates to #109352
2024-11-20 09:31:01 -05:00
Costin Leau
bc785f5ca1
Esql/lookup join grammar (#116515)
First PR for adding LOOKUP JOIN in ESQL.
Introduces grammar and wires the main building blocks to execute a query; follow-ups are required (see #116208 for more details).

Co-authored-by: Nik Everett <nik9000@users.noreply.github.com>
2024-11-19 17:52:24 -08:00
Craig Taverner
f3cd48209e
Added stricter range type checks and runtime warnings for ENRICH (#115091)
It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' used in an int_range match: https://github.com/elastic/elasticsearch/issues/107357

This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. However, we could make the planner stricter and block potentially mismatching types earlier.

However, runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since its inception; in particular, we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index.

In order to not create a backwards compatibility problem, we have compromised with the following:

* Strict range type checking at the planner time for incompatible range types, unless the incoming index field is KEYWORD
* For KEYWORD fields, allow runtime parsing of the fields, but when parsing fails, set the result to null and add a warning

Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time).
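
The runtime rule for KEYWORD fields (parse if possible, otherwise return null and add a warning) can be sketched as below. This covers only the second rule in the list above and is an illustration, not the ENRICH implementation: `match_ip_range`, the warning text, and the sample values are hypothetical.

```python
import ipaddress


def match_ip_range(keyword: str, cidr: str, warnings: list):
    """Return True/False for a parsable IP, or None plus a warning otherwise."""
    try:
        address = ipaddress.ip_address(keyword)
    except ValueError as error:
        # Do not fail the query at runtime: record a warning, yield null.
        warnings.append(f"could not parse [{keyword}] as an IP: {error}")
        return None
    return address in ipaddress.ip_network(cidr)


warnings = []
results = [
    match_ip_range(value, "10.0.0.0/8", warnings)
    for value in ["10.1.2.3", "foo", "192.168.0.1"]
]
```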
2024-11-19 16:34:21 +01:00
Fang Xing
d33bff6468
[ES|QL][DOCS] Add docs for date_period and time_duration (#116368)
* add docs for date_period and time_duration
2024-11-19 07:48:35 -05:00
Bogdan Pintea
b5addca40a
ESQL: Docs: COUNT: add an explanation to the use of the 3VL (#116684)
Add an explanation of why `... OR NULL` is needed with `COUNT(...)`.

Fixes: #99954
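
A small model of three-valued logic shows why `... OR NULL` matters when counting. In this sketch `None` stands in for NULL; the helper names `or3` and `count` and the sample data are hypothetical, and the counting rule (every non-NULL value counts, including false) is the assumption being illustrated.

```python
def or3(a, b):
    """Three-valued OR: True wins, then None (NULL), then False."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def count(values):
    """COUNT-style counting: every non-NULL value counts, including False."""
    return sum(1 for v in values if v is not None)


flags = [True, False, True, False, False]  # e.g. a condition evaluated per row
plain = count(flags)                                # counts all five rows
with_or_null = count(or3(f, None) for f in flags)   # counts only the true rows
```

`false OR NULL` evaluates to NULL, which the count skips, while `true OR NULL` stays true; that is how the idiom turns a count of rows into a count of matching rows.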
2024-11-19 10:37:47 +01:00
Gal Lalouche
c45977a5fd
[ESQL] Update docs format (missing space before '=') (#116808)
2024-11-14 16:05:28 +02:00
Gal Lalouche
591cd591ad
[ES|QL] Update length docs (#116734)
ESQL Update length docs (#116734)
2024-11-14 13:14:43 +02:00
Fang Xing
b37a829efa
[ES|QL] Implicit casting string literal to intervals in EsqlScalarFunction and GroupingFunction (#115814)
* implicit casting from string literals to datetime intervals
2024-11-13 18:25:06 -05:00
Gal Lalouche
b4898c959f
[ES|QL] Add support BYTE_LENGTH scalar function (#116591)
Also added documentation and examples for BIT_LENGTH and LENGTH regarding unicode.
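
The difference between the three length measures on non-ASCII text can be shown with a short example. It assumes strings are encoded as UTF-8 and that LENGTH counts code points; the variable names are illustrative only.

```python
text = "h\u00e9llo"  # "héllo", with é as the single code point U+00E9

length = len(text)                        # number of code points
byte_length = len(text.encode("utf-8"))   # number of bytes in UTF-8
bit_length = byte_length * 8              # number of bits in UTF-8
```

Here the accented letter takes two bytes in UTF-8, so the byte length is one more than the character count.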
2024-11-13 00:42:19 +02:00
Jack Pan
0914679225
Remove trailing semicolon in REPEAT function example (#116218)
Remove trailing semicolon in REPEAT function example (Closes #116156 )
2024-11-11 11:10:05 +01:00
florent-leborgne
ba65914285
refresh ESQL kibana docs (#116441)
2024-11-08 10:39:18 +01:00
Tim Grein
81fd1de76b
Add ES|QL bit_length function (#115792)
2024-11-07 08:51:26 +01:00
Mark Tozzi
744eb507f6
[ESQL] clean up date trunc tests (#116111)
While working on #110008 I discovered that the Date Trunc tests were only running in folding mode, because the interval types are marked as not representable. The correct way to test this is to set the forceLiteral flag for those fields, which will (as the name suggests) force them to be literals even in non-folding tests.

Doing that turned up errors in the evaluatorToString tests, which I fixed. There are two big changes here. First, the second parameter to the evaluator is a Rounding instance, not the actual interval. Since Rounding includes some information about the specific rounding in the toString results, I am just using a starts with matcher to validate the majority of the string, rather than trying to reconstruct the expected rounding string. Second, passing in a literal null for the interval parameter folds the whole expression to null, and thus a completely different toString. I added a clause in AnyNullIsNull to account for this.

While I was in there, I moved some specific test cases to a different file. I know moving code is something we're trying to minimize right now, but this seemed worth it. The tests in question do not depend on the parameters of the test case, but all methods in the class get run for every set of parameters. This was causing these tests to be run many times with the same values, which bloats our test run time and test count. Moving them to a distinct class means they'll only be executed once per test run. I feel like this benefit outweighs the cost of git history complexity.
2024-11-04 15:32:53 +01:00
Craig Taverner
535ad91bdb
Refine ESQL limitations (full-text, TEXT fields, unassigned indexes) (#116098)
* Refine ESQL limitations (full-text, TEXT fields, unassigned indexes)

This PR refactors a section of the ES|QL Limitations page to:
* Refactor both full-text and text-behaves-as-keyword sections to better reflect the new behaviour (the old text implies that no full-text search of any kind exists anywhere, which immediately contradicts the statements directly above it).
* Update text-behaves-as-keyword to include my recent work on making all functions return KEYWORD instead of TEXT or SEMANTIC_TEXT
* Add a section on multi-index querying to cover two limitations (union types and unassigned indexes).

* Fix full-text-search examples
2024-11-01 17:03:49 +01:00
Chris Hegarty
2275894ca0
ES|QL Add full-text search to the functions docs page (#116024)
Now that the match and qstr functions are Tech Previewing, we should add them to the top-level functions doc page.

Co-authored-by: Craig Taverner <craig@amanzi.com>
2024-11-01 12:04:55 +00:00
Craig Taverner
c9c1765986
Remove duplicate 'the the' (#116023)
There were many places where `the the` was typed, in comments, docs and messages. All were incorrect and have been replaced with a single `the`.
2024-10-31 19:14:58 +01:00
Tim Grein
6a3a447f18
Remove double "the" from median absolute deviation description (#115826)
2024-10-31 15:25:20 +01:00