The current LOOKUP JOIN docs include examples that are not tested by the ES|QL tests, unlike most other examples in the documentation. This PR fixes that, changing two examples to use existing tests, and adding a new csv-spec file for the remaining four examples. These four do not need to show results, so the tests use empty data and expect no results. This means we are testing only the syntax (parsing and semantic analysis), which is sufficient for the docs.
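For reference, a minimal sketch of the kind of syntax-only example involved; the index and field names here are hypothetical placeholders, not the ones from the docs:
```
FROM firewall_logs
| LOOKUP JOIN threat_list ON source_ip  // threat_list must be an index in lookup mode
| WHERE threat_level IS NOT NULL
```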
* ES|QL change point docs
* Move ES|QL change_point to tech preview
* Update docs/reference/query-languages/esql/esql-commands.md
Co-authored-by: Craig Taverner <craig@amanzi.com>
* Different example + add it to the csv tests
* Restructure change_point docs to new structure
* Added generated test examples to change_point docs
* Fixed a few README.md text mistakes and added more details
* fix grammar
* License check
* regen parser
* Update docs/reference/query-languages/esql/_snippets/commands/layout/change_point.md
Co-authored-by: Craig Taverner <craig@amanzi.com>
---------
Co-authored-by: Craig Taverner <craig@amanzi.com>
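For context, a minimal sketch of the command these docs describe, using synthetic data (`CHANGE_POINT value ON key` emits `type` and `pvalue` columns for the detected change):
```
ROW key=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]
| MV_EXPAND key
| EVAL value = CASE(key < 13, 0, 42)   // a step change at key 13
| CHANGE_POINT value ON key
| WHERE type IS NOT NULL               // keep only the detected change point
```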
This fixes an issue where, if a Painless getter method return type
didn't match a Java getter method return type, we would add a cast.
Currently this is adding an extraneous cast.
Closes: #70682
Modifies TO_IP so it can handle leading `0`s in ipv4s. Here's how it
works now:
```
ROW ip = TO_IP("192.168.0.1") // OK!
ROW ip = TO_IP("192.168.010.1") // Fails
```
This adds
```
ROW ip = TO_IP("192.168.010.1", {"leading_zeros": "octal"})
ROW ip = TO_IP("192.168.010.1", {"leading_zeros": "decimal"})
```
We do this because there isn't a consensus on how to parse leading zeros
in ipv4s. The standard unix tools like `ping` and `ftp` interpret
leading zeros as octal, so `192.168.010.1` reads as `192.168.8.1`.
Java's built-in ip parsing interprets them as decimal, so the same
string reads as `192.168.10.1`. Because folks are using this for
security rules we need to support all the choices.
Closes #125460
This adds the interface for search online prewarming with a default NOOP
implementation. It also hooks the interface into the SearchService after
we fork the query phase to the search thread pool.
I suspect the test resets/closes the reference manager
between the refresh and the retrieval of the segment
generation after the refresh.
By executing `segmentGenerationAfterRefresh` while
holding the engine reset lock, we make sure there
are no concurrent engine resets in the meantime.
In the future, we should also ensure that
`IndexShard.refresh()` uses `withEngine`.
Closes #126628
Patchers transform specific classes in some "broken" dependencies to ensure they behave correctly (fixing a bug, disabling some undesired or dangerous behaviour, updating calls to deprecated or removed method overloads).
If we upgrade one of the dependencies we patch, there is a concern that the patchers may not work against the classes in the new version.
This PR addresses this concern by introducing a check on the SHA256 digest of the class, to ensure we are operating on the same bytes the patcher was designed for; if the digest changes, that means the class has changed (e.g. due to a dependency update). If that happens, we break the build process with a specific error, so we can double-check that the patchers still work against the new classes.
Extracted from #126326
Relates to ES-11279
Today we rely on registering the channel after registering the task to
be cancelled to ensure that the task is cancelled even if the channel is
closed concurrently. However, the client may already have processed a
cancellable request on the channel, and therefore this mechanism doesn't
work. With this change we make sure not to register another task after
draining the registrations in order to cancel them.
Closes #88201
This splits the grouping functions in two: those that can be evaluated independently through the EVAL operator (like `BUCKET`) and those that can't (those evaluated through an agg operator, like `CATEGORIZE`).
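A sketch of the distinction, with hypothetical index and field names:
```
// BUCKET can be evaluated independently through the EVAL operator
FROM employees
| EVAL b = BUCKET(salary, 10, 25000, 75000)

// CATEGORIZE cannot; it is only valid as a grouping inside STATS ... BY
FROM sample_data
| STATS count = COUNT(*) BY category = CATEGORIZE(message)
```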
Closes #124608
This covers the case where a data stream's index mode differs from the index mode of its most recent backing index, and the other way around. This doesn't make much sense, but when it happens it can cause confusion; typically, misconfiguration is the reason.
Related to #126637
Adds heuristics to pick an efficient partitioning strategy based on the
index and rewritten query. This speeds up some queries by throwing more
cores at the problem:
```
FROM test | STATS SUM(b)
Before: took: 31 CPU: 222.3%
After: took: 15 CPU: 806.9%
```
It also lowers the overhead of simpler queries by throwing less cores at
the problem when it won't really speed anything up:
```
FROM test
Before: took: 1 CPU: 48.5%
After: took: 1 CPU: 70.4%
```
We have had a `pragma` to control our data partitioning for a long time;
this change just looks at the query to pick a partitioning scheme. The
partitioning options:
* `shard`: use one core per shard
* `segment`: use one core per large segment
* `doc`: break each shard into as many segments as there are cores
`doc` is the fastest, but has a lot of overhead, especially for complex
Lucene queries. `segment` is fast, but doesn't make the most out of CPUs
when there are few segments. `shard` has the lowest overhead.
Previously we always used `segment` partitioning because it is fast and
avoids `doc`'s terrible overhead. With this change we use `doc` when
the top level query matches all documents - those have very very low
overhead even in the `doc` partitioning. That's the twice as fast
example above.
This also uses the `shard` partitioning for queries that don't have to
do much work like `FROM foo` or `FROM foo | LIMIT 1` or
`FROM foo | SORT a`. That's the lower CPU example above.
This forking choice is taken very late on the data node. So queries like
this:
```
FROM test | WHERE @timestamp > "2025-01-01T00:00:00Z" | STATS SUM(b)
```
can also use the `doc` partitioning when all documents are after the
timestamp and all documents have `b`.
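Pulling the heuristics above together (these annotations just restate the description; the queries reuse its sample names):
```
FROM foo                    // shard: not much work to do per document
FROM foo | LIMIT 1          // shard
FROM foo | SORT a           // shard
FROM test | STATS SUM(b)    // doc: the top level query matches all documents

// doc when all documents are after the timestamp and all have `b`,
// segment otherwise:
FROM test | WHERE @timestamp > "2025-01-01T00:00:00Z" | STATS SUM(b)
```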
This PR fixes #119950, where an `IN` query includes `NULL` values with non-NULL `DataType` appearing within the query range. An expression is considered `NULL` when its `DataType` is `NULL` or it is a `Literal` with a value of `null`.
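For illustration only, a hedged sketch of the affected shape (index and field names are hypothetical; `TO_IP(null)` is one way to produce a `NULL` value whose `DataType` is `ip` rather than `NULL`):
```
FROM logs
| WHERE client_ip IN (TO_IP("10.0.0.1"), TO_IP(null))
```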
If updating the `index.time_series.end_time` setting fails for one data stream,
then `UpdateTimeSeriesRangeService` should continue updating this setting for the other data streams.
The following error was observed in the wild:
```
[2025-04-07T08:50:39,698][WARN ][o.e.d.UpdateTimeSeriesRangeService] [node-01] failed to update tsdb data stream end times
java.lang.IllegalArgumentException: [index.time_series.end_time] requires [index.mode=time_series]
at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:636) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.index.IndexSettings$1.validate(IndexSettings.java:619) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.settings.Setting.get(Setting.java:563) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.settings.Setting.get(Setting.java:535) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService.updateTimeSeriesTemporalRange(UpdateTimeSeriesRangeService.java:111) ~[?:?]
at org.elasticsearch.datastreams.UpdateTimeSeriesRangeService$UpdateTimeSeriesExecutor.execute(UpdateTimeSeriesRangeService.java:210) ~[?:?]
at org.elasticsearch.cluster.service.MasterService.innerExecuteTasks(MasterService.java:1075) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:1038) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService.executeAndPublishBatch(MasterService.java:245) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.lambda$run$2(MasterService.java:1691) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.run(MasterService.java:1688) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$5.lambda$doRun$0(MasterService.java:1283) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.cluster.service.MasterService$5.doRun(MasterService.java:1262) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) ~[elasticsearch-8.17.3.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-8.17.3.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1575) ~[?:?]
```
This resulted in a situation where the `index.time_series.end_time` index setting was not updated for any data stream, which then caused data loss as metrics couldn't be indexed, because no suitable backing index could be resolved:
```
the document timestamp [2025-03-26T15:26:10.000Z] is outside of ranges of currently writable indices [[2025-01-31T07:22:43.000Z,2025-02-15T07:24:06.000Z][2025-02-15T07:24:06.000Z,2025-03-02T07:34:07.000Z][2025-03-02T07:34:07.000Z,2025-03-10T12:45:37.000Z][2025-03-10T12:45:37.000Z,2025-03-10T14:30:37.000Z][2025-03-10T14:30:37.000Z,2025-03-25T12:50:40.000Z][2025-03-25T12:50:40.000Z,2025-03-25T14:35:40.000Z
```
This removes all non-test usage of
`Metadata.Builder.put(IndexMetadata.Builder)`
and replaces it with appropriate calls to the equivalent method on
`ProjectMetadata.Builder`.
In most cases this _does not_ make the code project-aware, but it does
reduce the number of deprecated methods in use.