* Take match_phrase out of snapshot and make tech preview
* Update docs/changelog/128925.yaml
* PR feedback
* Adding regenerated test data
* Update docs/changelog/128925.yaml
Co-authored-by: Carlos Delgado <6339205+carlosdelest@users.noreply.github.com>
* [CI] Auto commit changes from spotless
* Checkstyle
* Correct docs
* Hopefully fix docs build
* Found one more bad docs link - here's hoping this now fixes the doc build
* OMG bitten by - vs _
---------
Co-authored-by: Carlos Delgado <6339205+carlosdelest@users.noreply.github.com>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Aurélien FOUCRET <aurelien.foucret@gmail.com>
* Initial commit of match_phrase
* Add MatchPhraseQueryTests
* First pass at CSV specs
* Update docs/changelog/127661.yaml
* Refactor so MatchPhrase doesn't use all fulltext test cases, just text only
* Fix tests
* Add some CSV test cases
* Fix test
* Update changelog
* Update tests
* Comment out MATCH_PHRASE in search-functions Markdown
* Minor PR feedback
* PR feedback - refactor/consolidate code
* Add some more tests
* Fix some tests
* [CI] Auto commit changes from spotless
* Fix tests
* PR feedback - add tests, support boost and numeric data
* Revert "PR feedback - add tests, support boost and numeric data"
This reverts commit 4e7a699e3e.
* Apply testing/PR feedback outside numeric support only
* Regenerate docs
* Add negative test
* Update x-pack/plugin/esql/qa/testFixtures/src/main/resources/match-phrase-function.csv-spec
Co-authored-by: Carlos Delgado <6339205+carlosdelest@users.noreply.github.com>
* Update x-pack/plugin/esql/qa/testFixtures/src/main/resources/match-phrase-function.csv-spec
Co-authored-by: Carlos Delgado <6339205+carlosdelest@users.noreply.github.com>
* Update x-pack/plugin/esql/qa/testFixtures/src/main/resources/match-phrase-function.csv-spec
Co-authored-by: Carlos Delgado <6339205+carlosdelest@users.noreply.github.com>
* PR feedback
* Fix auto-commit error
* Regenerate docs
* Update x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/MatchPhrase.java
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
* Remove non text field types
* Fake test data
* Remove tests that no longer should pass without ip/date/version support
* Put real data in score tests now that I was able to engineer a failure
* Realized the scoring test might be flakey because how it was written, updated
* PR feedback
* PR feedback
* [CI] Auto commit changes from spotless
* Add check to MatchPhrase tests
* Fix merge errors
* [CI] Auto commit changes from spotless
* Test generated docs
* Add additional verifier tests
---------
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Carlos Delgado <6339205+carlosdelest@users.noreply.github.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Added support for the three primary scalar grid functions:
* `ST_GEOHASH(geom, precision)`
* `ST_GEOTILE(geom, precision)`
* `ST_GEOHEX(geom, precision)`
As well as versions of these three that take an optional `geo_shape` boundary (must be a `BBOX` ie. `Rectangle`).
And also supporting conversion functions that convert the grid-id from long to string and back to long.
This work represents the core of the feature to support geo-grid aggregations in ES|QL.
Begins adding support for running "tagged queries" to the compute
engine. Here, it's just the `LuceneSourceOperator` because that's
useful and contained.
Example time! Say you are running:
```
FROM foo
| STATS MAX(v) BY ROUND_TO(g, 0, 100, 1000, 100000)
```
It's *often* faster to run this as four queries:
* The docs that round to `0`
* The docs that round to `100`
* The docs that round to `1000`
* The docs that round to `100000`
This creates an ESQL operator that can run these queries, one after the
other and attach those tags.
Aggs uses this trick and it's *way* faster when it can push down count
queries, but it's still faster when it pushes doc loading things. This
implementation in `LuceneSourceOperator` is quite similar to the doc
loading version in _search.
I don't have performance measurements yet because I haven't plugged this
into the language. In _search we call this `filter-by-filter` and enable
it when each group averages to more than 5000 documents and when there
isn't an `_doc_count` field. It's faster in those cases not to push. I
expect we'll be pretty similar.
Creates a `ROUND_TO` function that rounds it's input to one of the
provided values. Like so:
```
ROUND_TO(v, 0, 5000, 10000, 20000, 40000, 100000)
v | ROUND_TO
0 | 0
100 | 0
6000 | 5000
45001 | 40000
999999 | 100000
```
For some sequences of numbers you could do this with the `/` operator -
but for arbitrary sequences of numbers you needed `CASE` which is quite
slow. And hard to read!
Rewriting the example above would look like:
```
CASE (
v < 5000, 0,
v < 10000, 5000,
v < 20000, 10000,
v < 40000, 20000,
v < 100000, 40000,
100000
)
```
Even better, this is *fast*:
```
(operation) Mode Cnt Score Error Units
round_to_4_via_case avgt 7 138.124 ± 0.738 ns/op
round_to_4 avgt 7 0.805 ± 0.011 ns/op
round_to_3 avgt 7 0.739 ± 0.011 ns/op
round_to_2 avgt 7 0.651 ± 0.009 ns/op
date_trunc avgt 7 2.425 ± 0.018 ns/op
```
I've included a comparison to `DATE_TRUNC` above because we should be
able to rewrite `DATE_TRUNC` into `ROUND_TO` when we know the date range
of the index. This doesn't do it now, but it should be possible.
Documents that the VALUES aggregate function returns unique documents
and points folks to the TOP aggregate function if they want to keep
dupes.
Closes#128091
---------
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Initial Kibana definition files for commands, currently only providing License information. We leave the license field out if it works with BASIC, so the only two files that actually have a license line are:
* CHANGE_POINT: PLATINUM
* RRF: ENTERPRISE
Output function signature license requirements to Kibana definition files, and also test that this matches the actual licensing behaviour of the functions.
ES|QL functions that enforce license checks do so with the `LicenseAware` interface. This does not expose what that functions license level is, but only whether the current active license will be sufficient for that function and its current signature (data types passed in as fields). Rather than add to this interface, we've made the license level information test-only information. This means if a function implements LicenseAware, it also needs to add a method to its test class to specify the license level for the signature being called. All functions will be tested for compliance, so failing to add this will result in test failure. Also if the test license level does not match the enforced license, that will also cause a failure.
* [DOCS][9.0] Improve ESQL reference docs IA
- reorganized es|ql reference documentation from flat list to logical hierarchy
- created three main sections: syntax reference , special fields, advanced operations
- renamed pages with more consistent and task-oriented titles
- aligned navigation titles with page content
- improved introductory text for each section
- used parallel phrasing for similar concepts
- clarified the relationship between reference docs and conceptual docs
Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
- I trimmed the KEEP query in my final iteration in https://github.com/elastic/elasticsearch/pull/127215 but neglected to update the query itself, only the response. This fixes that so the query matches the response.
- 🚘 I also updated the table response to match other ESQL response tables
* [DOCS][ESQL] Cleanup and cross-reference LOOKUP JOIN reference and landing pages
**lookup-join.md (syntax reference)**:
- removed tip formatting for simpler direct link to landing page
- improved parameter formatting and descriptions
- fixed template variable from `{esql}` to `{{esql}}`
**esql-lookup-join.md (landing page)**:
- added "compare with enrich" section header
- simplified "how the command works" with clearer parameter explanation
- added code example in how it works section
- improved image alt text for accessibility
- organized example section with better context and SQL comparison
- added dropdown for sample tables to reduce visual clutter
- added "query" subheading for clearer organization
- included reference to additional examples in command reference
- removed excessive whitespace
* Improve example, add setup code
replaced abstract employee/language example with security monitoring use case
added setup instructions for creating test indices
included sample data loading via bulk api
new practical query example joining firewall logs with threat data
simplified results output showing threat detection scenario
added note about left-join behavior
improved code comments and structure
added required index.mode: lookup setting info
While this change appears subtle at this point, I am using this in a later PR that adds a lot more spatial functions, where nesting them in related groups like this looks much better.
The main impact of this is that the On this page navigator on the right panel of the docs will show the nesting
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
The current LOOKUP JOIN docs include examples that are not tested by the ES|QL tests, unlike most other examples in the documentation. This PR fixes that, changing two examples to use existing tests, and adding a new csv-spec file for the remaining four examples. These four are not required to show results, so the tests have empty data and do not require any results. This means we are testing only the syntax (parsing and semantic analysis), which is sufficient for the docs.
* ES|QL change point docs
* Move ES|QL change_point to tech preview
* Update docs/reference/query-languages/esql/esql-commands.md
Co-authored-by: Craig Taverner <craig@amanzi.com>
* different example + add it the csv tests
* Restructure change_point docs to new structure
* Added generated test examples to change_point docs
* Fixed a few README.md text mistakes and added more details
* fix grammar
* License check
* regen parser
* Update docs/reference/query-languages/esql/_snippets/commands/layout/change_point.md
Co-authored-by: Craig Taverner <craig@amanzi.com>
---------
Co-authored-by: Craig Taverner <craig@amanzi.com>
Modifies TO_IP so it can handle leading `0`s in ipv4s. Here's how it
works now:
```
ROW ip = TO_IP("192.168.0.1") // OK!
ROW ip = TO_IP("192.168.010.1") // Fails
```
This adds
```
ROW ip = TO_IP("192.168.010.1", {"leading_zeros": "octal"})
ROW ip = TO_IP("192.168.010.1", {"leading_zeros": "decimal"})
```
We do this because there isn't a consensus on how to parse leading zeros
in ipv4s. The standard unix tools like `ping` and `ftp` interpret
leading zeros as octal. Java's built in ip parsing interprets them as
decimal. Because folks are using this for security rules we need to
support all the choices.
Closes#125460
This splits the grouping functions in two: those that can be evaluated independently through the EVAL operator (`BUCKET`) and those that don't (like those that that are evaluated through an agg operator, `CATEGORIZE`).
Closes#124608
While the internal structure of the docs is already split into many (over 1000) sub-pages, the final display for the `Functions and Operators` page is a single giant page, making navigation harder. This PR splits it into separate pages, one for each group of similar functions and one for the operators. Twelve new pages.
This PR also bundles a few other related changes. In total what is done is:
* Split functions/operators into 12 pages, one for each group, maintaining the existing split of each function/operator into a snippet with dynamically generated examples
* Split esql-commands.md into source-commands.md and processing-commands.md, each of which is split into individual snippets, one for each command
* Each command snippet has it's examples split out into separate files, if they were examples that were dynamically generated in the older asciidoc system
* The examples files are overwritten by the ES|QL unit tests, using a similar mechanism to the examples written for functions and operators)
* Some additional refinements to the Kibana definition and markdown files (nicer operator headings, and display text)
Originally, `DATE_TRUNC` only supported 1-month and 3-month intervals for months, and 1-year interval for years, while arbitrary intervals were supported for weeks and days. This PR adds support for `DATE_TRUNC` with arbitrary month and year intervals.
Closes#120094
Hides some of the "extra" lines from ESQL's documentation. These lines
are required to make the documentation into nice tests which is
important to make sure the docs don't get out of date. But readers don't
need to see them.
In particular:
* Remove all links (both asciidoc and markdown) from the JSON definition files.
* This required a two phase edit, from asciidoc links to markdown, and then removal of markdown (replace with markdown text). This is because the asciidoc does not have the display text, and because some links were already markdown.
* Split predicates into is_null and is_not_null
* We kept the old combined version because the main docs still use that, so now we have both combined and separate versions, and Kibana can select the version they want.