elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-29 18:03:32 -04:00

Author	SHA1	Message	Date
Carlos Delgado	d91d51600e	ESQL - Add Match function options (#120360 )	2025-01-28 08:54:33 +01:00
Mark Tozzi	2482f06f3c	ESQL - docs for to_date_nanos (#120124 ) I forgot to link the ToDateNanos docs when I merged that function. --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>	2025-01-14 16:31:24 -05:00
Ioana Tagirta	f5ac68df95	ESQL: Document support for semantic_text field mapping (#120052 ) * Document support for semantic_text field mapping * Address review comments	2025-01-13 22:18:47 +01:00
Ievgen Degtiarenko	fd1be8ce6f	Hash functions (#118938 ) This change adds md5, sha1 and sha256 hash functions.	2025-01-08 16:44:15 +01:00
Carlos Delgado	6ee641bdfd	ESQL - Update WHERE command docs with MATCH and full text functions examples (#118987 )	2024-12-19 16:44:53 +01:00
Ievgen Degtiarenko	7cf28a910e	ESQL Add esql hash function (#117989 ) This change introduces esql hash(alg, input) function that relies on the Java MessageDigest to compute the hash.	2024-12-18 09:56:42 +01:00
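The `hash(alg, input)` function described in 7cf28a910e can be approximated outside Elasticsearch as follows. This is only a sketch: it uses Python's `hashlib` rather than Java's `MessageDigest`, and the function name `hash_hex`, the hex-string return format, and the restriction to md5, sha1 and sha256 are assumptions made for illustration.

```python
import hashlib

# Algorithms named in the related commits; the exact set accepted by ES|QL is an assumption.
SUPPORTED_ALGORITHMS = {"md5", "sha1", "sha256"}


def hash_hex(algorithm: str, data: str) -> str:
    """Return the digest of `data` as a lowercase hex string (hypothetical helper)."""
    # Accept Java-style names such as "SHA-256" as well as "sha256".
    name = algorithm.lower().replace("-", "")
    if name not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return hashlib.new(name, data.encode("utf-8")).hexdigest()
```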
Gal Lalouche	2be4cd983f	ESQL: Support ST_EXTENT_AGG (#117451 ) This PR adds support for ST_EXTENT_AGG aggregation, i.e., computing a bounding box over a set of points/shapes (Cartesian or geo). Note the difference between this aggregation and the already implemented scalar function ST_EXTENT. This isn't a very efficient implementation, and future PRs will attempt to read these extents directly from the doc values. We currently always use longitude wrapping, i.e., we may wrap around the dateline for a smaller bounding box. Future PRs will let the user control this behavior. Fixes #104659.	2024-12-13 12:41:24 +02:00
Alexander Spies	140d88c59a	ESQL: Dependency check for binary plans (#118326 ) Make the dependency checker for query plans take into account binary plans and make sure that fields required from the left hand side are actually obtained from there (and analogously for the right).	2024-12-13 11:38:53 +01:00
Carlos Delgado	eb59b989ef	ESQL: Expand type compatibility for match function and operator (#117555 )	2024-12-09 19:56:10 +01:00
Tommaso Teofili	91605860ee	Term query for ES|QL (#117359 ) This commit adds a `term` function for ES|QL to run `TermQueries`. For example: FROM test | WHERE term(content, "dog")	2024-12-06 07:42:48 +00:00
Craig Taverner	c7e985c3b6	Support ST_ENVELOPE and related ST_XMIN, etc. (#116964 ) Support ST_ENVELOPE and related ST_XMIN, etc. Based on the PostGIS equivalents: https://postgis.net/docs/ST_Envelope.html https://postgis.net/docs/ST_XMin.html https://postgis.net/docs/ST_XMax.html https://postgis.net/docs/ST_YMin.html https://postgis.net/docs/ST_YMax.html	2024-12-04 12:20:47 +01:00
Jan Kuipers	31508f00a1	Document ES|QL categorize limitations (#117892 ) * Document ES|QL categorize limitations * Update x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/grouping/Categorize.java Co-authored-by: Alexander Spies <alexander.spies@elastic.co> --------- Co-authored-by: Alexander Spies <alexander.spies@elastic.co>	2024-12-04 09:53:21 +01:00
Jan Kuipers	ddc8b959ee	ES|QL categorize docs (#117827 ) * Move ES|QL categorize out of snapshot functions * Categorize docs * Add experimental + fix docs * Add experimental + fix docs	2024-12-02 16:41:02 +01:00
Aurélien FOUCRET	ff58d891a1	ES|QL kql function. (#116764 )	2024-11-25 14:22:11 +01:00
Larisa Motova	7e801e0410	[ES|QL] Add a standard deviation function (#116531 ) Uses Welford's online algorithm, as well as the parallel version, to calculate standard deviation.	2024-11-22 12:33:46 -10:00
Gal Lalouche	b4898c959f	[ES|QL] Add support BYTE_LENGTH scalar function (#116591 ) Also added documentation and examples for BIT_LENGTH and LENGTH regarding unicode.	2024-11-13 00:42:19 +02:00
Tim Grein	81fd1de76b	Add ES|QL bit_length function (#115792 )	2024-11-07 08:51:26 +01:00
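Welford's online algorithm and its parallel (merge) variant, named in 7e801e0410, can be sketched as below. This is a generic illustration and not the Elasticsearch implementation: the class name `WelfordState` is hypothetical, and returning the population (rather than sample) standard deviation is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class WelfordState:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        """Online update: fold one value into the running state."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "WelfordState") -> "WelfordState":
        """Parallel variant: combine two partial states into a new one."""
        if self.count == 0:
            return WelfordState(other.count, other.mean, other.m2)
        if other.count == 0:
            return WelfordState(self.count, self.mean, self.m2)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return WelfordState(n, mean, m2)

    def std_dev(self) -> Optional[float]:
        """Population standard deviation (assumption), or None for no input."""
        if self.count == 0:
            return None
        return math.sqrt(self.m2 / self.count)
```

The merge step is what lets partial results computed on separate shards or drivers be combined without revisiting the raw values.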
Carlos Delgado	a262eb6dbd	Add ESQL match function (#113374 )	2024-10-14 07:31:55 +02:00
Larisa Motova	2155f1bed5	[ES|QL] Add hypot function (#114382 ) Adds a hypotenuse function	2024-10-11 09:33:45 -10:00
Nik Everett	ebe3c0f10d	ESQL: Document MV_SLICE limitations (#114162 ) `MV_SLICE` is useful, but loading values from lucene frequently sorts them so `MV_SLICE` is not as useful as you think it is. It's mostly for after, say, a `SPLIT`. This documents that and adds a link to the section on multivalues. It also moves similar docs to a separate paragraph in the docs for easier reading.	2024-10-09 05:04:36 +11:00
Drew Tate	147461f5b1	[ES|QL] add reverse function (#113297 ) Adds a REVERSE string function	2024-10-04 12:57:37 -05:00
Mark Tozzi	122e728820	[ESQL] Add TO_DATE_NANOS conversion function (#112150 ) Resolves #111842 This adds a conversion function that yields DATE_NANOS. Mostly this is straightforward. It is worth noting that when converting a millisecond date into a nanosecond date, the conversion function truncates it to 0 nanoseconds (i.e. first nanosecond of that millisecond). This is, of course, a bit of an assumption, but I don't have a better assumption we can make. I'd thought about adding a second, optional, parameter to control this behavior, but it's important that TO_DATE_NANOS extend AbstractConvertFunction, which itself extends UnaryScalarFunction, so that it will work correctly with union types. Also, it's unlikely the user will have any better guess than we do for filling in the nanoseconds. Making that assumption does, however, create some weirdness. Consider two comparisons: TO_DATETIME("2023-03-23T12:15:03.360103847") == TO_DATETIME("2023-03-23T12:15:03.360") will return true while TO_DATE_NANOS("2023-03-23T12:15:03.360103847") == TO_DATE_NANOS("2023-03-23T12:15:03.360") will return false. This is akin to casting between longs and doubles, where things may compare equal in one type that are not equal in the other. This seems fine, and I can't think of a better way to do it, but it's worth being aware of. --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2024-09-26 12:03:01 -04:00
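The two comparisons in 122e728820 can be reproduced with plain integer arithmetic. The sketch below models a millisecond date as epoch milliseconds and a nanosecond date as epoch nanoseconds; the function names are hypothetical stand-ins for the ES|QL conversions, and the parser only handles the UTC timestamp shape used in the commit message.

```python
from datetime import datetime, timezone


def parse_epoch_nanos(text: str) -> int:
    """Parse 'YYYY-MM-DDTHH:MM:SS[.fraction]' (assumed UTC) into epoch nanoseconds."""
    seconds_part, _, fraction = text.partition(".")
    dt = datetime.strptime(seconds_part, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    nanos = int(fraction.ljust(9, "0")[:9]) if fraction else 0
    return int(dt.timestamp()) * 1_000_000_000 + nanos


def to_datetime_millis(text: str) -> int:
    """Millisecond precision: anything below a millisecond is dropped."""
    return parse_epoch_nanos(text) // 1_000_000


def to_date_nanos(text: str) -> int:
    """Nanosecond precision: the full fraction is kept."""
    return parse_epoch_nanos(text)


def millis_to_nanos(millis: int) -> int:
    """Millisecond date to nanosecond date: lands on the first nanosecond of that millisecond."""
    return millis * 1_000_000


full = "2023-03-23T12:15:03.360103847"
short = "2023-03-23T12:15:03.360"
same_as_millis = to_datetime_millis(full) == to_datetime_millis(short)
same_as_nanos = to_date_nanos(full) == to_date_nanos(short)
```

Here `same_as_millis` is true and `same_as_nanos` is false, which mirrors the long-versus-double analogy in the commit message.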
Carlos Delgado	8d1b22e7bc	ESQL QSTR function (#112590 )	2024-09-19 16:34:42 +02:00
Carlos Delgado	838b5a860d	ESQL - generate docs for snapshot functions (#113080 )	2024-09-19 07:46:43 +02:00
Fang Xing	e8569356ea	[ES|QL] explicit cast a string literal to date_period and time_duration in arithmetic operations (#109193 ) explicit cast to date_period and time_duration in arithmetic operations	2024-09-09 14:56:43 -04:00
Nik Everett	cf98240950	Update docs from code	2024-09-09 11:28:31 -04:00
Chris Berkhout	fbaeb1ee61	[ESQL] Add `SPACE` function (#112350 ) Adds the SPACE(number) function, which is equivalent to REPEAT(" ", number).	2024-09-09 21:41:35 +10:00
Iván Cea Fontenla	fc2760cfd4	ESQL: mv_median_absolute_deviation function (#112055 ) - Added mv_median_absolute_deviation function - Added possibility of having a fixed param in Multivalue "ascending" functions - Add surrogate to MedianAbsoluteDeviation ### Calculations used to avoid overflows First, a quick recap of how the MAD is calculated: 1. Sort values, and get the median 2. Calculate the difference between each value and the median (`abs(median - value)`) 3. Sort the differences, and get their median Calculating a MAD may overflow when calculating the differences (Step 2), given the type is a signed number, as the difference is a positive value, with potentially the same value as `POSITIVE_MAX - NEGATIVE_MIN`. To solve this, some types are upcast as follows: - Int: Stored as longs, simple approach - Long: Stored as longs, but switched to unsigned long representation when calculating the differences - Unsigned long: No effect; the resulting range is the same - Doubles: Nothing. If the values overflow to +/-infinity, they're left that way, as we'll just use those outliers to sort Closes https://github.com/elastic/elasticsearch/issues/111590	2024-09-09 10:04:25 +02:00
Ioana Tagirta	90f1fb667c	[ES|QL] Document return value for locate in case substring is not found (#112202 ) * Document return value for locate in case substring is not found * Add note that string positions start from 1	2024-09-03 12:46:20 +02:00
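The three MAD steps and the unsigned-long trick from fc2760cfd4 can be sketched as follows. Python integers do not overflow, so 64-bit wrapping is simulated explicitly; every name here is hypothetical, and taking the floor of the average for even-length inputs is an assumption rather than the behavior of `MvMedianAbsoluteDeviation`.

```python
LONG_MIN = -(1 << 63)
LONG_MAX = (1 << 63) - 1
MASK64 = (1 << 64) - 1


def wrap_signed64(x: int) -> int:
    """Simulate what a signed 64-bit register would hold after overflow."""
    x &= MASK64
    return x - (1 << 64) if x > LONG_MAX else x


def abs_diff_as_unsigned64(a: int, b: int) -> int:
    """|a - b| for signed 64-bit inputs; the result always fits in an unsigned 64-bit value."""
    return (max(a, b) - min(a, b)) & MASK64


def median_of_sorted(values):
    """Median of a non-empty sorted list (floor of the average for even lengths: an assumption)."""
    mid = len(values) // 2
    if len(values) % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) // 2


def median_absolute_deviation(values):
    ordered = sorted(values)                      # step 1: sort and take the median
    median = median_of_sorted(ordered)
    diffs = sorted(abs_diff_as_unsigned64(v, median) for v in ordered)  # step 2
    return median_of_sorted(diffs)                # step 3: median of the differences
```

With the input `[LONG_MIN, 0, LONG_MAX]`, the difference `0 - LONG_MIN` is 2^63, which a signed long cannot hold (`wrap_signed64` shows it wrapping to a negative value), while the unsigned interpretation keeps it intact.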
Nik Everett	d8e705d5da	ESQL: Document `date` instead of `datetime` (#111985 ) This changes the generated types tables in the docs to say `date` instead of `datetime`. That's the name of the field in Elasticsearch so it's a lot less confusing to call it that. Closes #111650	2024-08-21 01:59:13 +10:00
Iván Cea Fontenla	65ce50c60a	ESQL: Added mv_percentile function (#111749 ) - Added the `mv_percentile(values, percentile)` function - Used as a surrogate in the `percentile(column, percentile)` aggregation - Updated docs to specify that the surrogate _should_ be implemented if possible As mv_median does, this yields exact results (ignoring floating-point error in double operations). For that, some decisions were made, especially in the long evaluator (check the comments in context in `MvPercentile.java`) Closes https://github.com/elastic/elasticsearch/issues/111591	2024-08-20 15:29:19 +02:00
Nik Everett	dc24003540	ESQL: Profile more timing information (#111855 ) This profiles additional timing information for each individual driver. To the results from `profile` it adds the start and stop time for each driver. That was already in the task status. To the profile and task status it also adds the number of times the driver slept and some more detailed history about a few of those times. Explanation time! The compute engine splits work into some number of `Drivers` per node. Each `Driver` is a single threaded entity - it runs on a thread for a while then does one of three things: 1. Finishes 2. Goes async because one of its `Operator`s has gone async 3. Yields the thread pool because it has run for too long This PR measures the second two. At this point only three operators can go async: * ENRICH * Reading from an empty exchange * Writing to a full exchange We're quite interested in these sleeps at the moment because we think they may be slowing things down. Here's what it looks like when a driver goes async because it wants to read from an empty exchange: ``` ... the rest of the profile ... "sleeps" : { "counts" : { "exchange empty" : 2 }, "first" : [ { "reason" : "exchange empty", "sleep" : "2024-08-13T19:45:57.943Z", "sleep_millis" : 1723578357943, "wake" : "2024-08-13T19:45:58.159Z", "wake_millis" : 1723578358159 }, { "reason" : "exchange empty", "sleep" : "2024-08-13T19:45:58.164Z", "sleep_millis" : 1723578358164, "wake" : "2024-08-13T19:45:58.165Z", "wake_millis" : 1723578358165 } ], "last": [same as above] ``` Every time the driver goes async we count it in the `counts` map - grouped by the reason the driver slept. We also record the sleep and wake times for the first and last ten times the driver sleeps. In this case it only slept twice, so the `first` and `last` ten times are the same array. This should give us a good sense about why drivers sleep while using a limited amount of memory per driver.	2024-08-20 07:29:01 +10:00
Pablo Machado	f79c62157d	ESQL: Add `MV_PSERIES_WEIGHTED_SUM` for score calculations used by security solution (#109017 ) * Create MV_RIEMANN_ZETA scalar multivalue function --------- Co-authored-by: Nik Everett <nik9000@gmail.com>	2024-07-31 12:08:28 +02:00
Iván Cea Fontenla	bc69827e1e	ESQL: WEIGHTED_AVG aggregation tests and docs (#111449 )	2024-07-31 00:42:23 +10:00
Iván Cea Fontenla	735d80dffd	ESQL: Add COUNT and COUNT_DISTINCT aggregation tests (#111409 )	2024-07-30 03:07:15 +10:00
Iván Cea Fontenla	826d49448b	ESQL: Added Median and MedianAbsoluteDeviation aggregations tests and kibana docs (#111231 )	2024-07-26 22:11:01 +10:00
Iván Cea Fontenla	595d907f61	ESQL: SpatialCentroid aggregation tests and docs (#111236 )	2024-07-26 10:41:18 +02:00
Iván Cea Fontenla	101775b93d	Added Sum aggregation tests and docs (#110984 ) - Added SUM() agg tests (Which autogenerates docs) - Converted non-finite doubles to nulls in aggregator The complete set of tests depends on https://github.com/elastic/elasticsearch/issues/110437, as commented in code. After completion, the test can be uncommented and everything should work fine	2024-07-22 21:43:58 +10:00
Iván Cea Fontenla	0e68117935	Added Percentile aggregation tests and Kibana docs (#111050 ) - Added Percentile aggregation tests and autogen docs - Added a new "appendix" section to FunctionInfo. Existing Percentile docs had a final, long section with info, and we need this to keep it. We have a "detailedDescription" attribute already, but it's right after the description, and it would make it harder to read the important bits of the function (types, examples...). So I'm not reusing it.	2024-07-19 14:28:11 +02:00
Carlos Delgado	453b82706d	Add the EXP ES|QL function (#110879 )	2024-07-16 16:36:01 +02:00
Nik Everett	55532c8d6f	ESQL: All descriptions are a full sentence (#110791 ) This asserts that all functions have descriptions that are complete sentences.	2024-07-11 16:44:15 -04:00
Iván Cea Fontenla	2901711c46	ESQL: Add boolean support to Max and Min aggs (#110527 ) - Added support for Booleans on Max and Min - Added some helper methods to BitArray (`set(index, value)` and `fill(from, to, value)`). This way, the container is more similar to other BigArrays, and it's easier to work with Part of https://github.com/elastic/elasticsearch/issues/110346, as Max and Min are dependencies of Top.	2024-07-10 23:10:32 +10:00
Iván Cea Fontenla	5d3512fb33	ESQL: Fix Max doubles bug with negatives and add tests for Max and Min (#110586 ) `MAX()` currently doesn't work with doubles smaller than `Double.MIN_VALUE` (Note that `Double.MIN_VALUE` returns the smallest non-zero positive, not the smallest double). This PR adds tests for Max and Min, and fixes the bug (Detected by the tests). Also, as the tests now generate the docs, replaced the old docs with the generated ones, and updated the Max&Min examples.	2024-07-09 21:05:00 +10:00
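The note in 5d3512fb33 that `Double.MIN_VALUE` is the smallest positive double, not the most negative one, points to a classic pitfall when seeding a running maximum. The sketch below shows how such a bug can arise; it is an illustration under that assumption, not the actual aggregator code, and both function names are hypothetical.

```python
import math

# Python's equivalent of Java's Double.MIN_VALUE: the smallest positive (subnormal) double.
SMALLEST_POSITIVE_DOUBLE = 5e-324


def max_seeded_with_min_value(values):
    """Buggy: the seed is positive, so all-negative inputs never replace it."""
    best = SMALLEST_POSITIVE_DOUBLE
    for v in values:
        if v > best:
            best = v
    return best


def max_seeded_with_negative_infinity(values):
    """Fixed: negative infinity is below every finite double."""
    best = -math.inf
    for v in values:
        if v > best:
            best = v
    return best
```

For the input `[-3.0, -1.5]` the first function returns the seed itself, while the second returns `-1.5`.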
Iván Cea Fontenla	38cd0b333e	ESQL: AVG aggregation tests and ignore complex surrogates (#110579 ) Some work around aggregation tests, with AVG as an example: - Added tests and autogenerated docs for AVG - As AVG uses "complex" surrogates (A combination of functions), we can't trivially execute them without a complete plan. As I'm not sure it's worth it for most aggregations, I'm skipping those cases for now, as to avoid blocking other aggs tests. The bad side effect of skipping those tests is that most tests in AvgTests are actually ignored (74 of 100)	2024-07-09 12:01:46 +02:00
Iván Cea Fontenla	c89ee3b648	ESQL: Renamed TopList to Top (#110347 ) Rename TopList aggregation to Top, after internal discussions	2024-07-02 03:52:24 +10:00
Iván Cea Fontenla	fc0313f429	ESQL: Add aggregations testing base and docs (#110042 ) - Added a new `AbstractAggregationTestCase` base class for tests, that shares most of the code of function tests, adapted for aggregations. Including both testing and docs generation. - Reused the `AbstractFunctionTestCase` class to also let us test evaluators if the aggregation is foldable - Added a `TopListTests` example - This includes the docs for Top_list _(Also added a missing include of Ip_prefix docs)_ - Adapted Kibana docs to use `type: "agg"` (@drewdaemon) The current tests are very basic: Consume a page, generate an output, all in Single aggregation mode (No intermediates, no grouping). More complex testing will be added in future PRs Initial PR of https://github.com/elastic/elasticsearch/issues/109917	2024-06-27 21:21:55 +10:00
Craig Taverner	536d614694	ES\|QL ST_DISTANCE Function (#108764 ) * WIP Started refactoring in preparation for ST_DISTANCE * Initial evaluators for ST_DISTANCE * Update docs/changelog/108764.yaml * Fix invalid changelog generated by CI * Register function and get unit tests working * Fixed failing meta function description tests, and refined descriptions * Added initial CsvTests and calculate Geo differently to Cartesian * Added more csv-spec tests and changed to arcDistance for accuracy * Added generated docs files * Link to generated docs * Fix examples tag for linking from generated docs * Skip wrapper function And note that we might want to include instead some of the related intelligence from Circle2D::HaversineDistance class * Added ST_DWITHIN and more tests for ST_DISTANCE and ST_DWITHIN * Code style * Added more tests, this time for sorting on distance * Fixes after rebase on main * The ST_DWITHIN cannot use BinarySpatialFunction because it is ternary So we moved the common code to a separate SpatialTypeResolver, and made a simpler TernarySpatialFunction based on a simple TernaryScalarFunction. This had additional consequences, simplifying the points-only cases. The main reason for this change was to support StDWithinTests which need to test a lot of things that involve varying all three input types, generating expected error strings, etc. The original hack of just adding to BinarySpatialFunction worked for the actual integration tests, but clearly did not satisfy all the use cases tested by the unit tests. We also restricted ST_DWITHIN to take only a double as the third argument, because otherwise the number of evaluators would explode, since we need a separate evaluator for each Block type, and Integer and Double use different block types. 
* Fixed function count after rebasing on main * Update docs/changelog/108764.yaml * Added generated docs for ST_DWITHIN * Connect docs for ST_DWITHIN * Add back issue link * Remove support for ST_DWITHIN * Update docs/changelog/108764.yaml * Bring back link to issue in changelog * Update x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/spatial/StDistance.java Co-authored-by: Ignacio Vera <iverase@gmail.com> * Revert reformatting of function descriptions We should put this into a separate PR * Github merged commit with incorrectly formatted whitespace --------- Co-authored-by: Ignacio Vera <iverase@gmail.com>	2024-06-21 11:59:44 +02:00
Parker Timmins	bb3ff8e924	ESQL: add REPEAT string function (#109220 ) Add support for the string manipulation function REPEAT(string, number). This function concatenates the string argument with itself the specified number of times. If number is 0 an empty string is returned. If number is less than 0, null is returned and a warning is logged. If number is less than 0 and is a constant, the query will fail without executing.	2024-06-04 16:32:43 -05:00
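The REPEAT rules listed in bb3ff8e924 can be modeled in a few lines. This is a sketch of the described semantics only: the `number_is_constant` flag is a hypothetical stand-in for the planner knowing the argument is a literal, and `ValueError` stands in for the query failing before execution.

```python
import warnings
from typing import Optional


def repeat(string: str, number: int, number_is_constant: bool = False) -> Optional[str]:
    """Model of REPEAT(string, number) as described in the commit message."""
    if number < 0:
        if number_is_constant:
            # A negative constant is rejected up front, before the query runs.
            raise ValueError("number must be non-negative")
        # A negative runtime value yields null plus a warning.
        warnings.warn("negative repeat count; returning null")
        return None
    # Zero yields the empty string; positive counts concatenate the string with itself.
    return string * number
```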
Luigi Dell'Aquila	5f6e8f687b	ES|QL: add MV_APPEND function (#107001 ) Adding `MV_APPEND(value1, value2)` function, that appends two values creating a single multi-value. If one or both the inputs are multi-values, the result is the concatenation of all the values, e.g. ``` MV_APPEND([a, b], [c, d]) -> [a, b, c, d] ``` ~I think for this specific case it makes sense to consider `null` values as empty arrays, so that~ ~MV_APPEND(value, null) -> value~ ~It is pretty uncommon for ESQL (all the other functions, apart from `COALESCE`, short-circuit to `null` when one of the values is null), so let's discuss this behavior.~ [EDIT] considering the feedback from Andrei, I changed this logic and made it consistent with the other functions: now if one of the parameters is null, the function returns null	2024-06-05 03:42:29 +10:00
Iván Cea Fontenla	f16f71e2a2	ESQL: Add ip_prefix function (#109070 ) Added ESQL function to get the prefix of an IP. It works now with both IPv4 and IPv6. For users planning to use it with mixed IPs, we may need to add a function like "is_ipv4()" first. About the skipped test: There's currently a "bug" in the evaluators/functions that return null. Evaluators can't handle them. We'll work on support for that in another PR. It affects other functions, like `substring()`. In this function, however, it only affects "wrong" cases (like an invalid prefix), so it has no impact. Fixes https://github.com/elastic/elasticsearch/issues/99064	2024-05-29 10:23:45 -04:00
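The final MV_APPEND behavior from 5f6e8f687b (concatenate, but return null if either side is null) can be sketched like this. Python lists stand in for multi-values and `None` for null; the function is a model of the described semantics, not the evaluator itself.

```python
def mv_append(value1, value2):
    """Model of MV_APPEND(value1, value2): null in, null out; otherwise concatenate."""
    if value1 is None or value2 is None:
        return None
    # Treat a single value as a one-element multi-value.
    left = value1 if isinstance(value1, list) else [value1]
    right = value2 if isinstance(value2, list) else [value2]
    return left + right
```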

76 commits