Commit graph

7 commits

Author SHA1 Message Date
Craig Taverner
a7d1bd8938
Refine .gitattributes to hide generated docs changes (#124742) 2025-03-13 15:32:50 +01:00
Nik Everett
dc4fa26174
Speed up COALESCE significantly (#120139)
```
                      before              after
     (operation)   Score   Error       Score   Error  Units
 coalesce_2_noop  75.949 ± 3.961  ->   0.010 ±  0.001 ns/op  99.9%
coalesce_2_eager  99.299 ± 6.959  ->   4.292 ±  0.227 ns/op  95.7%
 coalesce_2_lazy 113.118 ± 5.747  ->  26.746 ±  0.954 ns/op  76.4%
```

We tend to advise folks that "COALESCE is faster than CASE", but, as of
8.16.0 (https://github.com/elastic/elasticsearch/pull/112295), that wasn't true. I was working with someone a few
days ago to port a scripted_metric aggregation to ESQL and we saw
COALESCE taking ~60% of the time. That won't do.

The trouble is that CASE and COALESCE have to be *lazy*, meaning that
operations like:
```
COALESCE(a, 1 / b)
```
should never emit a warning if `a` is not `null`, even if `b` is `0`. In
8.16 (https://github.com/elastic/elasticsearch/pull/112295) CASE grew an optimization where it could operate non-lazily
if it was flagged as "safe". This brings a similar optimization to
COALESCE, see it above as `coalesce_2_eager`, a 95.7% improvement.
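
As a rough illustration of the lazy/eager distinction described above, here is a minimal Python sketch. It is not the ESQL implementation; `safe_div`, `coalesce_lazy`, and `coalesce_eager` are hypothetical names invented for this example, and a Python warning stands in for an ESQL warning.

```python
import warnings


def safe_div(x, y):
    """Divide, but warn and return None on division by zero (hypothetical helper)."""
    if y == 0:
        warnings.warn("division by zero")
        return None
    return x / y


def coalesce_lazy(*thunks):
    """Evaluate zero-argument callables left to right; stop at the first non-None."""
    for thunk in thunks:
        value = thunk()
        if value is not None:
            return value
    return None


def coalesce_eager(*values):
    """Every argument is computed up front; only acceptable when that is "safe"."""
    for value in values:
        if value is not None:
            return value
    return None
```

With `a = 5` and `b = 0`, `coalesce_lazy(lambda: a, lambda: safe_div(1, b))` returns `5` without ever calling `safe_div`, so no warning is emitted. `coalesce_eager(a, safe_div(1, b))` returns the same value but has already triggered the warning, which is why the eager path is only usable when the arguments cannot warn.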

It also brings an arguably more important optimization: entire-block
execution for COALESCE. The short version is that, if the first
parameter of COALESCE returns no nulls we can return it without doing
anything lazily. There are a few more cases, but the upshot is that
COALESCE is pretty much *free* in cases where long strings of results
are `null` or not `null`. That's the `coalesce_2_noop` line.
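
The entire-block idea can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the actual evaluator: a "block" is modelled as a plain list with `None` for nulls, `coalesce_block` and `eval_second` are hypothetical names, and the all-null branch is one guess at what the "few more cases" might look like.

```python
def coalesce_block(first, eval_second):
    """Two-argument COALESCE over a block.

    first:       list of already-computed values, None meaning null.
    eval_second: callable taking a position and returning the second
                 argument's value there; it is only called where needed.
    """
    # Fast path: the first argument has no nulls, so return it untouched.
    if all(v is not None for v in first):
        return first
    # Assumed extra case: the first argument is entirely null, so the
    # result is just the second argument for every position.
    if all(v is None for v in first):
        return [eval_second(i) for i in range(len(first))]
    # Mixed nulls: fall back to position-by-position lazy evaluation.
    return [v if v is not None else eval_second(i) for i, v in enumerate(first)]
```

In the first branch the second argument is never evaluated at all, which corresponds to the near-zero cost on the `coalesce_2_noop` line.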

Finally, when there are mixed null and non-null values we were using a
single builder with some fairly inefficient paths. This specializes them
per type and skips some slow null-checking where possible. That's the
`coalesce_2_lazy` result, a more modest 76.4%.

NOTE: These %s are improvements on COALESCE itself, or COALESCE with some low-overhead operators like `+`. If COALESCE isn't taking a *ton* of time in your query don't get particularly excited about this. It's fun though.

Closes #119953
2025-01-23 17:40:09 +00:00
Iván Cea Fontenla
2233349f76
ESQL: top_list aggregation (#109386)
Added `top_list(<field>, <limit>, <order>)` aggregation, which collects the
top N values per bucket. Works with the same types as MAX/MIN.
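
To make "top N values per bucket" concrete, here is a small Python sketch of the semantics only. It says nothing about the generated `<Type>BucketedSort` classes; the function name, the `(bucket, value)` row shape, the `"asc"`/`"desc"` spellings, and the choice to skip nulls are all assumptions made for this example.

```python
import heapq
from collections import defaultdict


def top_list(rows, limit, order="desc"):
    """Collect the top `limit` values per bucket from (bucket, value) pairs."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    buckets = defaultdict(list)
    for bucket, value in rows:
        if value is not None:  # assumption: nulls are ignored
            buckets[bucket].append(value)
    # nlargest returns values in descending order, nsmallest in ascending order.
    pick = heapq.nlargest if order == "desc" else heapq.nsmallest
    return {bucket: pick(limit, values) for bucket, values in buckets.items()}
```

For rows `[("a", 3), ("a", 1), ("a", 7), ("b", 2)]` and a limit of 2, the `"desc"` order yields `[7, 3]` for bucket `a` and `[2]` for bucket `b`.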

- Added the aggregation function
- Added a template to generate the aggregators
- Added a template to generate the `<Type>BucketedSort` implementations per-type
  - This structure is based on the `BucketedSort` structure used on the original aggregations. It was modified to better fit the ESQL ecosystem (Blocks based, no docs...)

Also added a guide to create aggregations. Fixes
https://github.com/elastic/elasticsearch/issues/109213
2024-06-20 00:48:45 +10:00
Iván Cea Fontenla
f16f71e2a2
ESQL: Add ip_prefix function (#109070)
Added ESQL function to get the prefix of an IP. It works now with both
IPv4 and IPv6. For users planning to use it with mixed IPs, we may need
to add a function like "is_ipv4()" first.
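
For readers unfamiliar with IP prefixes, the following Python sketch shows one plausible reading of the operation using the standard `ipaddress` module. The message above does not spell out the function's signature, so the two separate prefix lengths (one for IPv4, one for IPv6) and the `None` result for an out-of-range length are assumptions of this example.

```python
import ipaddress


def ip_prefix(ip, prefix_len_v4, prefix_len_v6):
    """Zero out everything after the first N bits of an IPv4 or IPv6 address."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        length, max_len = prefix_len_v4, 32
    else:
        length, max_len = prefix_len_v6, 128
    if not 0 <= length <= max_len:
        return None  # assumption: an invalid prefix length yields null
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{addr}/{length}", strict=False)
    return str(network.network_address)
```

Taking one length per address family lets a single call handle a column that mixes IPv4 and IPv6 values, since the right length is chosen by inspecting each address.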

**About the skipped test:** There's currently a "bug" in the
evaluators/functions that return null. Evaluators can't handle them.
We'll work on support for that in another PR. It affects other
functions, like `substring()`. In this function, however, it only
matters in "wrong" cases (like an invalid prefix), so it has no impact.

Fixes https://github.com/elastic/elasticsearch/issues/99064
2024-05-29 10:23:45 -04:00
Nik Everett
e4cb2c9f6d
ESQL: Add parsing for a LOOKUP command (#109040)
This command will serve as a sort of "inline" enrich. This commit itself
is mostly antlr generated code and paranoid tests that the new `LOOKUP`
keyword doesn't clash with any variables named `lookup`.

I've also marked our ANTLR generated files as `linguist-generated`, which
causes them to be hidden by default in GitHub's UI. You can still click
a button to see them if you like. See
https://docs.github.com/en/repositories/working-with-files/managing-files/customizing-how-changed-files-appear-on-github
2024-05-28 13:32:30 -04:00
Rory Hunter
d6912ebd59
Assert no carriage returns in release notes test samples (#77238)
The expected output files for the generated changelogs should not contain carriage
returns (`\r`). Their presence was causing test failures on Windows. Fix by setting
the EOL character via `.gitattributes`.
2021-09-07 20:45:23 +01:00
Paul Sanwald
3e7fccddaf
Add a CHANGELOG file for release notes. (#29450)
* Add a CHANGELOG file for 7.x release notes.

* update file to include 6.x

* remove confusing comment and small edit to section title

* moving CHANGELOG file under docs directory, as it pertains to release notes.
2018-04-18 07:42:05 -07:00