elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-06-28 17:34:17 -04:00

Author	SHA1	Message	Date
Carlos Delgado	ea4b41fca8	ESQL - match operator included in non-snapshot builds (#116819 )	2024-11-21 07:45:22 +01:00
Mark Tozzi	c3f73d0319	Esql Enable Date Nanos (#117080 ) This enables date nanos support as tech preview. Basic operations, like reading values, binary comparisons, and functions that don't care about type should work, but some functions are not yet supported. Most notably, Bucket is not yet supported, although Date_Trunc is and can be used for grouping. See the docs for the full list of limitations. relates to #109352	2024-11-20 09:31:01 -05:00
Costin Leau	bc785f5ca1	Esql/lookup join grammar (#116515 ) First PR for adding LOOKUP JOIN in ESQL. Introduces grammar and wires the main building blocks to execute a query; follow-ups are required (see #116208 for more details). Co-authored-by: Nik Everett <nik9000@users.noreply.github.com>	2024-11-19 17:52:24 -08:00
Craig Taverner	f3cd48209e	Added stricter range type checks and runtime warnings for ENRICH (#115091 ) It has been noted that strange or incorrect error messages are returned if the ENRICH command uses incompatible data types, for example a KEYWORD with value 'foo' used in an int_range match: https://github.com/elastic/elasticsearch/issues/107357 This error is thrown at runtime and contradicts the ES|QL policy of only throwing errors at planning time, while at runtime we should instead set results to null and add a warning. We could make the planner stricter and block potentially mismatching types earlier. However, runtime parsing of KEYWORD fields has been a feature of ES|QL ENRICH since its inception; in particular, we even have tests asserting that KEYWORD fields containing parsable IP data can be joined to an ip_range ENRICH index. In order to not create a backwards compatibility problem, we have compromised with the following: * Strict range type checking at the planner time for incompatible range types, unless the incoming index field is KEYWORD * For KEYWORD fields, allow runtime parsing of the fields, but when parsing fails, set the result to null and add a warning Added extra tests to verify behaviour of match policies on non-keyword fields. They all behave as keywords (the enrich field is converted to keyword at policy execution time, and the input data is converted to keyword at lookup time).	2024-11-19 16:34:21 +01:00
Fang Xing	d33bff6468	[ES|QL][DOCS] Add docs for date_period and time_duration (#116368 ) * add docs for date_period and time_duration	2024-11-19 07:48:35 -05:00
Bogdan Pintea	b5addca40a	ESQL: Docs: COUNT: add an explanation to the use of the 3VL (#116684 ) Add an explanation of why `... OR NULL` is needed with `COUNT(...)`. Fixes: #99954	2024-11-19 10:37:47 +01:00
Gal Lalouche	c45977a5fd	[ESQL] Update docs format (missing space before '=') (#116808 )	2024-11-14 16:05:28 +02:00
Gal Lalouche	591cd591ad	[ES|QL] Update length docs (#116734 )	2024-11-14 13:14:43 +02:00
Fang Xing	b37a829efa	[ES|QL] Implicit casting string literal to intervals in EsqlScalarFunction and GroupingFunction (#115814 ) * implicit casting from string literals to datetime intervals	2024-11-13 18:25:06 -05:00
Gal Lalouche	b4898c959f	[ES|QL] Add support BYTE_LENGTH scalar function (#116591 ) Also added documentation and examples for BIT_LENGTH and LENGTH regarding unicode.	2024-11-13 00:42:19 +02:00
Jack Pan	0914679225	Remove trailing semicolon in REPEAT function example (#116218 ) Remove trailing semicolon in REPEAT function example (Closes #116156 )	2024-11-11 11:10:05 +01:00
florent-leborgne	ba65914285	refresh ESQL kibana docs (#116441 )	2024-11-08 10:39:18 +01:00
Tim Grein	81fd1de76b	Add ES|QL bit_length function (#115792 )	2024-11-07 08:51:26 +01:00
Mark Tozzi	744eb507f6	[ESQL] clean up date trunc tests (#116111 ) While working on #110008 I discovered that the Date Trunc tests were only running in folding mode, because the interval types are marked as not representable. The correct way to test this is to set the forceLiteral flag for those fields, which will (as the name suggests) force them to be literals even in non-folding tests. Doing that turned up errors in the evaluatorToString tests, which I fixed. There are two big changes here. First, the second parameter to the evaluator is a Rounding instance, not the actual interval. Since Rounding includes some information about the specific rounding in the toString results, I am just using a starts with matcher to validate the majority of the string, rather than trying to reconstruct the expected rounding string. Second, passing in a literal null for the interval parameter folds the whole expression to null, and thus a completely different toString. I added a clause in AnyNullIsNull to account for this. While I was in there, I moved some specific test cases to a different file. I know moving code is something we're trying to minimize right now, but this seemed worth it. The tests in question do not depend on the parameters of the test case, but all methods in the class get run for every set of parameters. This was causing these tests to be run many times with the same values, which bloats our test run time and test count. Moving them to a distinct class means they'll only be executed once per test run. I feel like this benefit outweighs the cost of git history complexity.	2024-11-04 15:32:53 +01:00
Craig Taverner	535ad91bdb	Refine ESQL limitations (full-text, TEXT fields, unassigned indexes) (#116098 ) * Refine ESQL limitations (full-text, TEXT fields, unassigned indexes) This PR refactors a section of the ES|QL Limitations page to: * Refactor both full-text and text-behaves-as-keyword sections to better reflect the new behaviour (the old text implies that no full-text search of any kind exists anywhere, which immediately contradicts the statements directly above it). * Update text-behaves-as-keyword to include my recent work on making all functions return KEYWORD instead of TEXT or SEMANTIC_TEXT * Add a section on multi-index querying to cover two limitations (union types and unassigned indexes). * Fix full-text-search examples	2024-11-01 17:03:49 +01:00
Chris Hegarty	2275894ca0	ES|QL Add full-text search to the functions docs page (#116024 ) Now that the match and qstr functions are Tech Previewing, we should add them to the top-level functions doc page. Co-authored-by: Craig Taverner <craig@amanzi.com>	2024-11-01 12:04:55 +00:00
Craig Taverner	c9c1765986	Remove duplicate 'the the' (#116023 ) There were many places where `the the` was typed, in comments, docs and messages. All were incorrect and have been replaced with a single `the`.	2024-10-31 19:14:58 +01:00
Tim Grein	6a3a447f18	Remove double "the" from median absolute deviation description (#115826 )	2024-10-31 15:25:20 +01:00
Marci W	2b6828ddcd	Document ?_tstart and ?_tend in Kibana (#114965 ) * Document ?_tstart and ?_tend in Kibana * Edits: restructure, be clearer	2024-10-28 10:14:40 -04:00
Craig Taverner	3d307e0d78	Don't return TEXT type for functions that take TEXT (#114334 ) Always return `KEYWORD` for functions that previously returned `TEXT`, because any change to the value, no matter how small, is enough to render meaningless the original analyzer associated with the `TEXT` field value. In principle, if the attribute is no longer the original `FieldAttribute`, it can no longer claim to have the type `TEXT`. This has been done for all functions: conversion functions, aggregating functions, multi-value functions. There were several that already produced `KEYWORD` for `TEXT` input (e.g. ToString, FromBase64 and ToBase64, MvZip, ToLower, ToUpper, DateFormat, Concat, Left, Repeat, Replace, Right, Split, Substring), but many others incorrectly claimed to produce `TEXT`. This PR makes that now strict, and includes changes to the functions' unit tests so that they no longer allow any function's output to be `TEXT`. One side effect of this change is that methods that take multiple parameters and require all of them to have the same type will now treat TEXT and KEYWORD the same. This was already the case for functions like `Concat`, but is now also the case for `Greatest`, `Least`, `Case`, `Coalesce` and `MvAppend`. An associated change is that the type casting operator `::text` has been entirely removed. It used to map onto the `ToString` function which returned type KEYWORD, and so `::text` really produced a `KEYWORD`, which is a lie, or at least a `bug`, which is now fixed. Should we ever wish to actually produce real `TEXT`, we might love the fact that this operator has been freed up for future use (although it seems likely that function will require parameters to specify the analyzer, so might never be an operator again). ### Backwards compatibility issues: This is a change that will fail BWC tests, since we have many tests that assert on TEXT output to functions. 
For this reason we needed to block two scenarios: * We used the capability `functions_never_emit_text` to prevent 7 csv-spec tests and 2 yaml tests from being run against older versions that still emit text. * We used `skipTest` to also block those two yaml tests from being run against the latest build, but using older yaml files downloaded (as far back as 8.14). In all cases the change observed in these tests was simply the results columns no longer having `text` type, and instead being `keyword`. --------- Co-authored-by: Luigi Dell'Aquila <luigi.dellaquila@gmail.com>	2024-10-25 10:09:53 +02:00
Luigi Dell'Aquila	bffaabb6f5	ES|QL: improve docs about escaping for GROK, DISSECT, LIKE, RLIKE (#115320 )	2024-10-24 09:19:46 +02:00
Mark Tozzi	82f2fb554e	fix test to not run when the FF is disabled (#114260 ) Fixes #113661 Don't run the tests when the feature is disabled. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2024-10-22 13:41:17 -04:00
Carlos Delgado	7ad1a0c39c	Remove snapshot build restriction for match and qstr functions (#114482 )	2024-10-15 08:07:07 +02:00
Kyle Thomas	ee74ce564f	[DOCS] ES|QL: Adding a tip to the WHERE documentation (#114050 ) * Adding a tip to make null field behavior more apparent. * Update docs/reference/esql/processing-commands/where.asciidoc Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com> * Update docs/reference/esql/processing-commands/where.asciidoc Rephrasing for clarity Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com> --------- Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com> Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>	2024-10-14 13:05:12 -05:00
Carlos Delgado	a262eb6dbd	Add ESQL match function (#113374 )	2024-10-14 07:31:55 +02:00
Larisa Motova	2155f1bed5	[ES|QL] Add hypot function (#114382 ) Adds a hypotenuse function	2024-10-11 09:33:45 -10:00
Michael Peterson	fd9d7335c8	CCS metadata is opt-in in ESQL JSON responses (#114437 ) Since Kibana only needs CCS metadata in ESQL responses from certain well-defined locations, we are making CCS metadata opt-in. This feature is patterned after ESQL profiling, where you specify "profile": true in the ESQL body and if you asked for it will be present in the response always (it will be written to the .async-search index and you can’t turn it off in later async-search requests against this particular query ID) and if you didn’t ask for it at the beginning it will never be present (it will NOT be written to the .async-search index when it is persisted). The new option is "include_ccs_metadata": true/false.	2024-10-11 15:03:26 -04:00
Nik Everett	ebe3c0f10d	ESQL: Document MV_SLICE limitations (#114162 ) `MV_SLICE` is useful, but loading values from lucene frequently sorts them so `MV_SLICE` is not as useful as you think it is. It's mostly for after, say, a `SPLIT`. This documents that and adds a link to the section on multivalues. It also moves similar docs to a separate paragraph in the docs for easier reading.	2024-10-09 05:04:36 +11:00
Nik Everett	f633148d10	Docs: ESQL doesn't preserve `null`s in a list (#114335 ) The doc values don't preserve `null`s in a list so ESQL doesn't either. Closes #114324	2024-10-09 03:17:56 +11:00
Drew Tate	147461f5b1	[ES|QL] add reverse function (#113297 ) Adds a REVERSE string function	2024-10-04 12:57:37 -05:00
Mark Tozzi	60ae7463a8	[ESQL] Support datetime data type in Least and Greatest functions (#113961 ) While working on Date Nanos, I noticed that Least and Greatest didn't have support for datetime. This PR corrects that and adds tests for it. It seems to me that resolveType() is doing the wrong thing for these functions, as it accepts types that then do not have evaluator mappings, but refactoring that seems out of scope right now. --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2024-10-04 09:06:39 -04:00
Nik Everett	f4cb32991a	ESQL: change link to profile explanation (#114032 ) Let's use a vendor neutral link.	2024-10-04 01:30:03 +10:00
Nik Everett	fe6f8d4b37	ESQL: Mention EXPLAIN ANALYZE in profile docs (#114025 ) This mentions EXPLAIN ANALYZE and EXPLAIN PLAN in the docs for ESQL's `profile` option. Those are things that folks from PostgreSQL and Oracle are used to and might search for. And `profile` is the closest thing we have to them. EXPLAIN PLAN doesn't run the query - it just tells you what the plan is. ESQL's `profile` always runs the query. So that's different. But it's close! EXPLAIN ANALYZE does run the query. It's pretty much the same.	2024-10-03 10:37:34 -04:00
Luigi Dell'Aquila	9a652829a3	ES|QL: provide snapshot_only info for functions (Kibana) (#113544 )	2024-10-02 09:27:05 +02:00
Michael Peterson	ddba47407d	Collect and display execution metadata for ES|QL cross cluster searches (#112595 ) Enhance ES|QL responses to include information about `took` time (search latency), shards, and clusters against which the query was executed. The goal of this PR is to begin to provide parity between the metadata displayed for cross-cluster searches in _search and ES|QL. This PR adds the following features: - add overall `took` time to all ES|QL query responses. And to emphasize: "all" here means: async search, sync search, local-only and cross-cluster searches, so it goes beyond just CCS. - add `_clusters` metadata to the final response for cross-cluster searches, for both async and sync search (see example below) - tracking/reporting counts of skipped shards from the can_match (SearchShards API) phase of ES|QL processing - marking clusters as skipped if they cannot be connected to (during the field-caps phase of processing) Out of scope for this PR: - honoring the `skip_unavailable` cluster setting - showing `_clusters` metadata in the async response while the search is still running - showing any shard failure messages (since any shard search failures in ES|QL are automatically fatal and _cluster/details is not shown in 4xx/5xx error responses). Note that this also means that the `failed` shard count is always 0 in ES|QL `_clusters` section. Things changed with respect to behavior in `_search`: - the `timed_out` field in `_clusters/details/mycluster` was removed in the ESQL response, since ESQL does not support timeouts. It could be added back later if/when ESQL supports timeouts. - the `failures` array in `_clusters/details/mycluster/_shards` was removed in the ESQL response, since any shard failure causes the whole query to fail. 
Example output from ES|QL CCS:

```es
POST /_query
{
  "query": "from blogs,remote2:bl,remote1:blogs|\nkeep authors.first_name,publish_date|\n limit 5"
}
```

```json
{
  "took": 49,
  "columns": [
    { "name": "authors.first_name", "type": "text" },
    { "name": "publish_date", "type": "date" }
  ],
  "values": [
    [ "Tammy", "2009-11-04T04:08:07.000Z" ],
    [ "Theresa", "2019-05-10T21:22:32.000Z" ],
    [ "Jason", "2021-11-23T00:57:30.000Z" ],
    [ "Craig", "2019-12-14T21:24:29.000Z" ],
    [ "Alexandra", "2013-02-15T18:13:24.000Z" ]
  ],
  "_clusters": {
    "total": 3,
    "successful": 2,
    "running": 0,
    "skipped": 1,
    "partial": 0,
    "failed": 0,
    "details": {
      "(local)": {
        "status": "successful",
        "indices": "blogs",
        "took": 43,
        "_shards": { "total": 13, "successful": 13, "skipped": 0, "failed": 0 }
      },
      "remote2": {
        "status": "skipped", // remote2 was offline when this query was run
        "indices": "remote2:bl",
        "took": 0,
        "_shards": { "total": 0, "successful": 0, "skipped": 0, "failed": 0 }
      },
      "remote1": {
        "status": "successful",
        "indices": "remote1:blogs",
        "took": 47,
        "_shards": { "total": 13, "successful": 13, "skipped": 0, "failed": 0 }
      }
    }
  }
}
```

Fixes https://github.com/elastic/elasticsearch/issues/112402 and https://github.com/elastic/elasticsearch/issues/110935	2024-09-30 16:03:39 -04:00
Mark Tozzi	122e728820	[ESQL] Add TO_DATE_NANOS conversion function (#112150 ) Resolves #111842 This adds a conversion function that yields DATE_NANOS. Mostly this is straightforward. It is worth noting that when converting a millisecond date into a nanosecond date, the conversion function truncates it to 0 nanoseconds (i.e. first nanosecond of that millisecond). This is, of course, a bit of an assumption, but I don't have a better assumption we can make. I'd thought about adding a second, optional, parameter to control this behavior, but it's important that TO_DATE_NANOS extend AbstractConvertFunction, which itself extends UnaryScalarFunction, so that it will work correctly with union types. Also, it's unlikely the user will have any better guess than we do for filling in the nanoseconds. Making that assumption does, however, create some weirdness. Consider two comparisons: TO_DATETIME("2023-03-23T12:15:03.360103847") == TO_DATETIME("2023-03-23T12:15:03.360") will return true while TO_DATE_NANOS("2023-03-23T12:15:03.360103847") == TO_DATE_NANOS("2023-03-23T12:15:03.360") will return false. This is akin to casting between longs and doubles, where things may compare equal in one type that are not equal in the other. This seems fine, and I can't think of a better way to do it, but it's worth being aware of. --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2024-09-26 12:03:01 -04:00
Luigi Dell'Aquila	7ba26892f3	ES|QL: make CSV date tests more friendly for Java 23 (#113472 ) Following [this suggestion](https://github.com/elastic/elasticsearch/pull/113376#issuecomment-2370817089), switching date patterns from week years to calendar years, that have the same behavior in java <=22 and java 23.	2024-09-25 02:57:22 +10:00
Nik Everett	58021c3405	ESQL: TOP support for strings (#113183 ) Adds support to the `TOP` aggregation for `keyword` and `text` field types. Closes #109849	2024-09-24 03:00:18 +10:00
Pm Ching	d68f2fa4a6	fix a couple of docs typos (#112901 )	2024-09-20 18:34:24 +03:00
Bogdan Pintea	f7ff00f645	ESQL: Align year diffing to the rest of the units in DATE_DIFF: chronological (#113103 ) This will correct/switch "year" unit diffing from the current integer subtraction to a chronological subtraction. Consequently, two dates are (at least) one year apart now if (at least) a full calendar year separates them. The previous implementation simply subtracted the year part of the dates. Note: this parts with ES SQL's implementation of the same function, which itself is aligned with MS SQL's implementation, which works equivalently to an integer subtraction. Fixes #112482.	2024-09-20 20:21:29 +10:00
Carlos Delgado	8d1b22e7bc	ESQL QSTR function (#112590 )	2024-09-19 16:34:42 +02:00
Carlos Delgado	838b5a860d	ESQL - generate docs for snapshot functions (#113080 )	2024-09-19 07:46:43 +02:00
Luigi Dell'Aquila	f7a0196b45	ES|QL: Add 'preview' information to functions docs for Kibana (#112792 )	2024-09-12 16:49:55 +02:00
Fang Xing	e8569356ea	[ES|QL] explicit cast a string literal to date_period and time_duration in arithmetic operations (#109193 ) explicit cast to date_period and time_duration in arithmetic operations	2024-09-09 14:56:43 -04:00
Nik Everett	ef3a5a1385	ESQL: Fix CASE when conditions are multivalued (#112401 ) When CASE hits a multivalued field it was previously either crashing on fold or evaluating it to the first value. Since booleans are loaded in sorted order from lucene that usually means `false`. This changes the behavior to line up with the rest of ESQL - now multivalued fields are treated as `false` with a warning. You might say "hey wait! multivalued fields usually become `null`, not `false`!". Yes, dear reader, you are right. Very right. But! `CASE`'s contract is to immediately convert its values into `true` or `false` using the standard boolean tri-valued logic. So `null` just becomes `false` immediately. This is how PostgreSQL, MySQL, and SQLite behave: ``` > SELECT CASE WHEN null THEN 1 ELSE 2 END; 2 ``` They turn that `null` into a false. And we're right there with them. Except, of course, that we're turning `[false, false]` and the like into `null` first. See!? It's consistent. Consistently confusing, but sane at least. The warning message just says "treating multivalued field as false" rather than explaining all of that. This also fixes up a few of CASE's docs which I noticed were kind of busted while working on CASE. I think the docs generation is having a lot of trouble with CASE so I've manually hacked the right thing into place, but we should figure out a better solution eventually. Closes #112359	2024-09-10 02:32:19 +10:00
Nik Everett	cf98240950	Update docs from code	2024-09-09 11:28:31 -04:00
Chris Berkhout	fbaeb1ee61	[ESQL] Add `SPACE` function (#112350 ) Adds the SPACE(number) function, which is equivalent to REPEAT(" ", number).	2024-09-09 21:41:35 +10:00
Iván Cea Fontenla	fc2760cfd4	ESQL: mv_median_absolute_deviation function (#112055 ) - Added mv_median_absolute_deviation function - Added possibility of having a fixed param in Multivalue "ascending" functions - Add surrogate to MedianAbsoluteDeviation ### Calculations used to avoid overflows First, a quick recap of how the MAD is calculated: 1. Sort values, and get the median 2. Calculate the difference between each value and the median (`abs(median - value)`) 3. Sort the differences, and get their median Calculating a MAD may overflow when calculating the differences (Step 2), given the type is a signed number, as the difference is a positive value, with potentially the same value as `POSITIVE_MAX - NEGATIVE_MIN`. To solve this, some types are upcast as follows: - Int: Stored as longs, simple approach - Long: Stored as longs, but switched to unsigned long representation when calculating the differences - Unsigned long: No effect; the resulting range is the same - Doubles: Nothing. If the values overflow to +/-infinity, they're left that way, as we'll just use those outliers to sort Closes https://github.com/elastic/elasticsearch/issues/111590	2024-09-09 10:04:25 +02:00
Liam Thompson	04678e9a15	[DOCS][ESQL] Include bucket in agg functions list (#112513 )	2024-09-05 11:43:20 +02:00
Ioana Tagirta	90f1fb667c	[ES|QL] Document return value for locate in case substring is not found (#112202 ) * Document return value for locate in case substring is not found * Add note that string positions start from 1	2024-09-03 12:46:20 +02:00
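
The DATE_DIFF commit above (f7ff00f645) contrasts two ways of counting years between dates: subtracting the year parts, and counting full calendar years elapsed. The following is a minimal Python sketch of that distinction only, not the Elasticsearch implementation; the function names are hypothetical, and it assumes plain calendar dates with no time-of-day or time zone.

```python
from datetime import date


def year_diff_by_subtraction(start: date, end: date) -> int:
    """Hypothetical helper: difference of the year parts only."""
    return end.year - start.year


def year_diff_chronological(start: date, end: date) -> int:
    """Hypothetical helper: number of full calendar years between two dates.

    Assumption: the result truncates toward zero, so a negative span is
    the mirror image of the corresponding positive span.
    """
    if end < start:
        return -year_diff_chronological(end, start)
    years = end.year - start.year
    # If the anniversary has not been reached yet in the end year,
    # the last year is incomplete and must not be counted.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


if __name__ == "__main__":
    a, b = date(2023, 12, 31), date(2024, 1, 1)
    print(year_diff_by_subtraction(a, b))  # 1: the year parts differ
    print(year_diff_chronological(a, b))   # 0: only one day has elapsed
```

Under these assumptions, one day across a year boundary counts as one year by subtraction but zero years chronologically, which is the behavioral change the commit message describes.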

383 commits