elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-07-01 02:43:45 -04:00

Author	SHA1	Message	Date
Luigi Dell'Aquila	bffaabb6f5	ES|QL: improve docs about escaping for GROK, DISSECT, LIKE, RLIKE (#115320)	2024-10-24 09:19:46 +02:00
Mark Tozzi	82f2fb554e	Fix test to not run when the FF is disabled (#114260). Fixes #113661. Don't run the tests when the feature flag is disabled. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2024-10-22 13:41:17 -04:00
Carlos Delgado	7ad1a0c39c	Remove snapshot build restriction for match and qstr functions (#114482)	2024-10-15 08:07:07 +02:00
Carlos Delgado	a262eb6dbd	Add ESQL match function (#113374)	2024-10-14 07:31:55 +02:00
Larisa Motova	2155f1bed5	[ES|QL] Add hypot function (#114382). Adds a hypotenuse function.	2024-10-11 09:33:45 -10:00
Nik Everett	ebe3c0f10d	ESQL: Document MV_SLICE limitations (#114162). `MV_SLICE` is useful, but loading values from Lucene frequently sorts them, so `MV_SLICE` is not as useful as you might think. It's mostly useful after, say, a `SPLIT`. This documents that and adds a link to the section on multivalues. It also moves similar docs to a separate paragraph for easier reading.	2024-10-09 05:04:36 +11:00
Drew Tate	147461f5b1	[ES|QL] add reverse function (#113297). Adds a REVERSE string function.	2024-10-04 12:57:37 -05:00
Mark Tozzi	60ae7463a8	[ESQL] Support datetime data type in Least and Greatest functions (#113961). While working on Date Nanos, I noticed that Least and Greatest didn't have support for datetime. This PR corrects that and adds tests for it. It seems to me that resolveType() is doing the wrong thing for these functions, as it accepts types that then do not have evaluator mappings, but refactoring that seems out of scope right now. --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2024-10-04 09:06:39 -04:00
Luigi Dell'Aquila	9a652829a3	ES|QL: provide snapshot_only info for functions (Kibana) (#113544)	2024-10-02 09:27:05 +02:00
Mark Tozzi	122e728820	[ESQL] Add TO_DATE_NANOS conversion function (#112150). Resolves #111842. This adds a conversion function that yields DATE_NANOS. Mostly this is straightforward. It is worth noting that when converting a millisecond date into a nanosecond date, the conversion function truncates it to 0 nanoseconds (i.e. the first nanosecond of that millisecond). This is, of course, a bit of an assumption, but I don't have a better assumption we can make. I'd thought about adding a second, optional, parameter to control this behavior, but it's important that TO_DATE_NANOS extend AbstractConvertFunction, which itself extends UnaryScalarFunction, so that it will work correctly with union types. Also, it's unlikely the user will have any better guess than we do for filling in the nanoseconds. Making that assumption does, however, create some weirdness. Consider two comparisons: TO_DATETIME("2023-03-23T12:15:03.360103847") == TO_DATETIME("2023-03-23T12:15:03.360") will return true, while TO_DATE_NANOS("2023-03-23T12:15:03.360103847") == TO_DATE_NANOS("2023-03-23T12:15:03.360") will return false. This is akin to casting between longs and doubles, where values may compare equal in one type but not in the other. This seems fine, and I can't think of a better way to do it, but it's worth being aware of. --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2024-09-26 12:03:01 -04:00
Luigi Dell'Aquila	7ba26892f3	ES|QL: make CSV date tests more friendly for Java 23 (#113472). Following [this suggestion](https://github.com/elastic/elasticsearch/pull/113376#issuecomment-2370817089), switching date patterns from week-based years to calendar years, which have the same behavior in Java <= 22 and Java 23.	2024-09-25 02:57:22 +10:00
Nik Everett	58021c3405	ESQL: TOP support for strings (#113183). Adds support to the `TOP` aggregation for `keyword` and `text` field types. Closes #109849	2024-09-24 03:00:18 +10:00
Pm Ching	d68f2fa4a6	fix a couple of docs typos (#112901)	2024-09-20 18:34:24 +03:00
Bogdan Pintea	f7ff00f645	ESQL: Align year diffing to the rest of the units in DATE_DIFF: chronological (#113103). This corrects/switches "year" unit diffing from the current integer subtraction to a chronological subtraction. Consequently, two dates are now (at least) one year apart only if (at least) a full calendar year separates them. The previous implementation simply subtracted the year parts of the dates. Note: this departs from ES SQL's implementation of the same function, which is aligned with MS SQL's implementation and behaves like an integer subtraction. Fixes #112482.	2024-09-20 20:21:29 +10:00
Carlos Delgado	8d1b22e7bc	ESQL QSTR function (#112590)	2024-09-19 16:34:42 +02:00
Carlos Delgado	838b5a860d	ESQL - generate docs for snapshot functions (#113080)	2024-09-19 07:46:43 +02:00
Luigi Dell'Aquila	f7a0196b45	ES|QL: Add 'preview' information to functions docs for Kibana (#112792)	2024-09-12 16:49:55 +02:00
Fang Xing	e8569356ea	[ES|QL] explicit cast a string literal to date_period and time_duration in arithmetic operations (#109193). Explicit cast to date_period and time_duration in arithmetic operations.	2024-09-09 14:56:43 -04:00
Nik Everett	ef3a5a1385	ESQL: Fix CASE when conditions are multivalued (#112401). When CASE hits a multivalued field it previously either crashed on fold or evaluated it to the first value. Since booleans are loaded in sorted order from Lucene, that usually meant `false`. This changes the behavior to line up with the rest of ESQL: multivalued fields are now treated as `false` with a warning. You might say "hey wait! multivalued fields usually become `null`, not `false`!". Yes, dear reader, you are right. Very right. But! `CASE`'s contract is to immediately convert its values into `true` or `false` using the standard boolean tri-valued logic. So `null` just becomes `false` immediately. This is how PostgreSQL, MySQL, and SQLite behave: `SELECT CASE WHEN null THEN 1 ELSE 2 END;` returns `2`. They turn that `null` into `false`. And we're right there with them. Except, of course, that we're turning `[false, false]` and the like into `null` first. See!? It's consistent. Consistently confusing, but sane at least. The warning message just says "treating multivalued field as false" rather than explaining all of that. This also fixes up a few of CASE's docs, which I noticed were kind of busted while working on CASE. I think the docs generation is having a lot of trouble with CASE, so I've manually hacked the right thing into place, but we should figure out a better solution eventually. Closes #112359	2024-09-10 02:32:19 +10:00
Nik Everett	cf98240950	Update docs from code	2024-09-09 11:28:31 -04:00
Chris Berkhout	fbaeb1ee61	[ESQL] Add `SPACE` function (#112350). Adds the SPACE(number) function, which is equivalent to REPEAT(" ", number).	2024-09-09 21:41:35 +10:00
Iván Cea Fontenla	fc2760cfd4	ESQL: mv_median_absolute_deviation function (#112055). - Added mv_median_absolute_deviation function - Added the possibility of having a fixed param in Multivalue "ascending" functions - Added surrogate to MedianAbsoluteDeviation ### Calculations used to avoid overflows First, a quick recap of how the MAD is calculated: 1. Sort the values and get the median 2. Calculate the difference between each value and the median (`abs(median - value)`) 3. Sort the differences and get their median. Calculating a MAD may overflow when calculating the differences (step 2): the type is a signed number, while the difference is a positive value that can be as large as `POSITIVE_MAX - NEGATIVE_MIN`. To solve this, some types are up-cast as follows: - Int: Stored as longs, simple approach - Long: Stored as longs, but switched to an unsigned long representation when calculating the differences - Unsigned long: No effect; the resulting range is the same - Doubles: Nothing. If the values overflow to +/-infinity, they're left that way, as we'll just use those outliers to sort. Closes https://github.com/elastic/elasticsearch/issues/111590	2024-09-09 10:04:25 +02:00
Ioana Tagirta	90f1fb667c	[ES|QL] Document return value for locate in case substring is not found (#112202). * Document return value for locate in case substring is not found * Add note that string positions start from 1	2024-09-03 12:46:20 +02:00
Nik Everett	d8e705d5da	ESQL: Document `date` instead of `datetime` (#111985). This changes the generated types tables in the docs to say `date` instead of `datetime`. That's the name of the field in Elasticsearch, so it's a lot less confusing to call it that. Closes #111650	2024-08-21 01:59:13 +10:00
Iván Cea Fontenla	65ce50c60a	ESQL: Added mv_percentile function (#111749). - Added the `mv_percentile(values, percentile)` function - Used as a surrogate in the `percentile(column, percentile)` aggregation - Updated docs to specify that the surrogate _should_ be implemented if possible. Like mv_median, this yields exact results (ignoring floating-point error in double operations). For that, some decisions were made, especially in the long evaluator (check the comments in context in `MvPercentile.java`). Closes https://github.com/elastic/elasticsearch/issues/111591	2024-08-20 15:29:19 +02:00
Iván Cea Fontenla	e3f378ebd2	ESQL: Strings support for MAX and MIN aggregations (#111544). Support Version, Keyword and Text in Max and Min aggregations. The current implementation of both max and min does the following. For non-grouping: - Store a BytesRef - When there's a new max/min, copy it to the internal array, growing it if needed. For grouping: - Keep an array of BytesRef (null by default: there's no "initial/default value" here, as there's no "MAX" value for a string) - Each BytesRef stores its own array, which will be grown as needed to copy the new max/min. Some notes: - It doesn't shrink the arrays, so as to avoid having to copy and potentially grow them again - It uses raw arrays. But maybe it should use BigArrays so the memory is tracked by the circuit breaker? Part of https://github.com/elastic/elasticsearch/issues/110346	2024-08-20 15:24:55 +02:00
Bogdan Pintea	dd49c33479	ESQL: BUCKET: allow numerical spans as whole numbers (#111874). This relaxes the check on numerical spans to allow them to be specified as whole numbers. Until now they had to be provided as a double. This also expands the tests for date ranges to include string types. Resolves #109340, resolves #104646, resolves #105375.	2024-08-20 13:40:59 +02:00
Nik Everett	dc24003540	ESQL: Profile more timing information (#111855). This profiles additional timing information for each individual driver. To the results from `profile` it adds the start and stop time for each driver. That was already in the task status. To the profile and task status it also adds the number of times the driver slept and some more detailed history about a few of those times. Explanation time! The compute engine splits work into some number of `Drivers` per node. Each `Driver` is a single-threaded entity: it runs on a thread for a while, then does one of three things: 1. Finishes 2. Goes async because one of its `Operator`s has gone async 3. Yields the thread pool because it has run for too long. This PR measures the second two. At this point only three operators can go async: * ENRICH * Reading from an empty exchange * Writing to a full exchange. We're quite interested in these sleeps at the moment because we think they may be slowing things down. Here's what it looks like when a driver goes async because it wants to read from an empty exchange: ``` ... the rest of the profile ... "sleeps" : { "counts" : { "exchange empty" : 2 }, "first" : [ { "reason" : "exchange empty", "sleep" : "2024-08-13T19:45:57.943Z", "sleep_millis" : 1723578357943, "wake" : "2024-08-13T19:45:58.159Z", "wake_millis" : 1723578358159 }, { "reason" : "exchange empty", "sleep" : "2024-08-13T19:45:58.164Z", "sleep_millis" : 1723578358164, "wake" : "2024-08-13T19:45:58.165Z", "wake_millis" : 1723578358165 } ], "last": [same as above] ``` Every time the driver goes async we count it in the `counts` map, grouped by the reason the driver slept. We also record the sleep and wake times for the first and last ten times the driver sleeps. In this case it only slept twice, so the `first` and `last` ten times are the same array. This should give us a good sense of why drivers sleep while using a limited amount of memory per driver.	2024-08-20 07:29:01 +10:00
Nik Everett	2e22e73cdf	ESQL: Remove date_nanos from generated docs (#111884). This removes date_nanos from the docs generated for all of our functions because it's still under construction. I've done so as a sort of one-off hack. My plan is to replace this in a follow-up change with a centralized registry of "under construction" data types, so we can more easily add new data types under a feature flag in the future. We're going to be doing that a fair bit.	2024-08-15 00:22:25 +10:00
Mark Tozzi	67c69bb224	[ESQL] Date nanos type (#110205). Resolves #109987. Add initial support for the date nanos data type. At this point, almost no functions are supported, including casting. This just covers loading and returning the values. Like millisecond dates, nanosecond dates are internally modeled as long values, so we don't need a new block type to support them. This has very patchwork function support. Ideally, I don't think I would have added any function support yet, but the five MV functions you see here declare that they accept any non-spatial type, and their tests fail if they are not wired up for new types. There are other functions, like Values, which also claim to support all non-spatial types but don't currently enforce that in testing, so I didn't add them yet. Finally, there are functions like == which should work for all types but are implemented for a specific list. I've left those for a follow-up ticket as well.	2024-08-07 13:17:26 -04:00
Nik Everett	cc294a1a0f	ESQL: Finish migration of null testing (#111563). This finishes the migration of `null` testing from a test method, namely `testSimpleWithNulls`. It migrates it to `anyNullIsNull` and hand-rolled null cases.	2024-08-05 12:28:15 -04:00
Fang Xing	d87254369a	type = operator in kibana operator definition (#111436)	2024-07-31 11:07:18 -04:00
Pablo Machado	f79c62157d	ESQL: Add `MV_PSERIES_WEIGHTED_SUM` for score calculations used by security solution (#109017). * Create MV_RIEMANN_ZETA scalar multivalue function --------- Co-authored-by: Nik Everett <nik9000@gmail.com>	2024-07-31 12:08:28 +02:00
Iván Cea Fontenla	bc69827e1e	ESQL: WEIGHTED_AVG aggregation tests and docs (#111449)	2024-07-31 00:42:23 +10:00
Iván Cea Fontenla	735d80dffd	ESQL: Add COUNT and COUNT_DISTINCT aggregation tests (#111409)	2024-07-30 03:07:15 +10:00
Iván Cea Fontenla	826d49448b	ESQL: Added Median and MedianAbsoluteDeviation aggregations tests and kibana docs (#111231)	2024-07-26 22:11:01 +10:00
Iván Cea Fontenla	595d907f61	ESQL: SpatialCentroid aggregation tests and docs (#111236)	2024-07-26 10:41:18 +02:00
Fang Xing	66dd2687d5	[ES|QL] Generate docs for unregistered esql functions from annotations (#108749). * render docs for operators	2024-07-22 14:58:17 -04:00
Iván Cea Fontenla	195b916e2b	ESQL: TOP aggregation IP support (#111105). Added IP support to the TOP() aggregation. Slightly adapted the string template organization for esql/compute to (also?) work with specific datatypes. Right now it may be a bit messy, but we need the specific support for cases like this.	2024-07-22 22:35:48 +10:00
Iván Cea Fontenla	101775b93d	Added Sum aggregation tests and docs (#110984). - Added SUM() agg tests (which autogenerate docs) - Converted non-finite doubles to nulls in aggregator. The complete set of tests depends on https://github.com/elastic/elasticsearch/issues/110437, as commented in code. After completion, the test can be uncommented and everything should work fine.	2024-07-22 21:43:58 +10:00
Iván Cea Fontenla	96e1b15b9d	ESQL: Support IP fields in MAX and MIN aggregations (#110921). - Support IP in MAX() and MIN() - Used a custom IpArrayState for it, as it's quite different from the `X-ArrayState.java.st` generated ones - Add IP test cases for aggregation tests	2024-07-19 23:23:13 +10:00
Iván Cea Fontenla	0e68117935	Added Percentile aggregation tests and Kibana docs (#111050). - Added Percentile aggregation tests and autogenerated docs - Added a new "appendix" section to FunctionInfo. The existing Percentile docs had a final, long section with info, and we need this to keep it. We already have a "detailedDescription" attribute, but it comes right after the description, and it would make the important bits of the function (types, examples...) harder to read. So I'm not reusing it.	2024-07-19 14:28:11 +02:00
Carlos Delgado	453b82706d	Add the EXP ES|QL function (#110879)	2024-07-16 16:36:01 +02:00
Iván Cea Fontenla	43a3af66e8	ESQL: Add boolean support to TOP aggregation (#110718). - Added a custom implementation of BooleanBucketedSort to keep the top booleans - Added boolean aggregator to TOP - Added tests (Boolean aggregator tests, Top tests for boolean, and added boolean fields to CSV cases)	2024-07-16 03:14:29 +10:00
Nik Everett	9f001169c6	ESQL: Document the pattern to count TRUE (#110820). This adds to the docs an example of counting the TRUE results of an expression. You do `COUNT(a > 0 OR NULL)`. That turns the `FALSE` into `NULL`, which you need to do because `COUNT(false)` is `1` (it's a value), but `COUNT(null)` is `0` (it's the absence of a value). We would like to make something more intuitive for this one day. But for now, this is what works.	2024-07-12 14:08:22 -04:00
Nik Everett	55532c8d6f	ESQL: All descriptions are a full sentence (#110791). This asserts that all functions have descriptions that are complete sentences.	2024-07-11 16:44:15 -04:00
Iván Cea Fontenla	2901711c46	ESQL: Add boolean support to Max and Min aggs (#110527). - Added support for Booleans on Max and Min - Added some helper methods to BitArray (`set(index, value)` and `fill(from, to, value)`). This way, the container is more similar to other BigArrays, and it's easier to work with. Part of https://github.com/elastic/elasticsearch/issues/110346, as Max and Min are dependencies of Top.	2024-07-10 23:10:32 +10:00
Iván Cea Fontenla	5d3512fb33	ESQL: Fix Max doubles bug with negatives and add tests for Max and Min (#110586). `MAX()` currently doesn't work with doubles smaller than `Double.MIN_VALUE` (note that `Double.MIN_VALUE` returns the smallest non-zero positive value, not the smallest double). This PR adds tests for Max and Min, and fixes the bug (detected by the tests). Also, as the tests now generate the docs, replaced the old docs with the generated ones and updated the Max & Min examples.	2024-07-09 21:05:00 +10:00
Iván Cea Fontenla	38cd0b333e	ESQL: AVG aggregation tests and ignore complex surrogates (#110579). Some work around aggregation tests, with AVG as an example: - Added tests and autogenerated docs for AVG - As AVG uses "complex" surrogates (a combination of functions), we can't trivially execute them without a complete plan. As I'm not sure it's worth it for most aggregations, I'm skipping those cases for now, so as to avoid blocking other aggs tests. The bad side effect of skipping those tests is that most tests in AvgTests are actually ignored (74 of 100).	2024-07-09 12:01:46 +02:00
Fang Xing	8abc8857f2	[ES|QL] weighted_avg (#109993). * weighted_avg	2024-07-02 18:29:02 -04:00

218 commits