Enhance ES|QL responses to include information about `took` time (search latency), shards, and
clusters against which the query was executed.
The goal of this PR is to begin to provide parity between the metadata displayed for
cross-cluster searches in _search and ES|QL.
This PR adds the following features:
- add overall `took` time to all ES|QL query responses. And to emphasize: "all" here
means: async search, sync search, local-only and cross-cluster searches, so it goes
beyond just CCS.
- add `_clusters` metadata to the final response for cross-cluster searches, for both
async and sync search (see example below)
- tracking/reporting counts of skipped shards from the can_match (SearchShards API)
phase of ES|QL processing
- marking clusters as skipped if they cannot be connected to (during the field-caps
phase of processing)
Out of scope for this PR:
- honoring the `skip_unavailable` cluster setting
- showing `_clusters` metadata in the async response **while** the search is still running
- showing any shard failure messages (since any shard search failures in ES|QL are
automatically fatal and _cluster/details is not shown in 4xx/5xx error responses). Note that
this also means that the `failed` shard count is always 0 in ES|QL `_clusters` section.
Things changed with respect to behavior in `_search`:
- the `timed_out` field in `_clusters/details/mycluster` was removed in the ESQL
response, since ESQL does not support timeouts. It could be added back later
if/when ESQL supports timeouts.
- the `failures` array in `_clusters/details/mycluster/_shards` was removed in the ESQL
response, since any shard failure causes the whole query to fail.
Example output from ES|QL CCS:
```es
POST /_query
{
"query": "from blogs,remote2:bl*,remote1:blogs|\nkeep authors.first_name,publish_date|\n limit 5"
}
```
```json
{
"took": 49,
"columns": [
{
"name": "authors.first_name",
"type": "text"
},
{
"name": "publish_date",
"type": "date"
}
],
"values": [
[
"Tammy",
"2009-11-04T04:08:07.000Z"
],
[
"Theresa",
"2019-05-10T21:22:32.000Z"
],
[
"Jason",
"2021-11-23T00:57:30.000Z"
],
[
"Craig",
"2019-12-14T21:24:29.000Z"
],
[
"Alexandra",
"2013-02-15T18:13:24.000Z"
]
],
"_clusters": {
"total": 3,
"successful": 2,
"running": 0,
"skipped": 1,
"partial": 0,
"failed": 0,
"details": {
"(local)": {
"status": "successful",
"indices": "blogs",
"took": 43,
"_shards": {
"total": 13,
"successful": 13,
"skipped": 0,
"failed": 0
}
},
"remote2": {
"status": "skipped", // remote2 was offline when this query was run
"indices": "remote2:bl*",
"took": 0,
"_shards": {
"total": 0,
"successful": 0,
"skipped": 0,
"failed": 0
}
},
"remote1": {
"status": "successful",
"indices": "remote1:blogs",
"took": 47,
"_shards": {
"total": 13,
"successful": 13,
"skipped": 0,
"failed": 0
}
}
}
}
}
```
Fixes https://github.com/elastic/elasticsearch/issues/112402 and https://github.com/elastic/elasticsearch/issues/110935
Allow nested expressions to be used both for grouping or inside
aggregate functions inside the stats command.
As such the grammar has been tweaked to allow the stats group to have
optional aliasing.
As part of this fix, preserve the original field declaration (including
spaces) for implicit aliases.
Improve validation for incorrect aggregate function use (as arguments,
grouping or inside evals).
Fix#99828
When encountering a multi-value, a single-value function (i.e. all
non-`mv_xxx()`) returns a `null`. This behaviour is opaque to the user.
This PR adds the functionality for these functions to emit a `Warning`
header, so the user is informed about the cause for the `null`s.
Within testing, there are some differences between the emulated
CSV-based tests (`TestPhysical*`) and the REST CSV-tests and thus the
exact messages in the warnings: * The REST ones can push operations to
Lucene; when this happens, a query containing a negation, `not
<predicate>`, can be translated to a `must_not` query, that will include
the `not` in the `Source`. But outside of Lucene, the execution would
consider the predicate first, then the negation. So when the predicate
contains a SV function, only this part's `Source` will show up in the
warning. * When pushed to Lucene, a query is wrapped within the
`SingleValueQuery`. This emits now warnings when encountering MVs (and
returning no match). However, this only happens once the query that it
wraps returns something itself. Comparatively, the `TestPhysical*`
filters will issue a warning for every encountered MV (irrespective of
sigle values within the MV matching or not).
To differentiate between the slightly differing values of the warnings,
one can now append the `#[Emulated:` prefix to a warning, followed by
the value of the warning for the emulated checks, then a corresponding
`]`. Example: `warning:Line 1:24: evaluation of [not(salary_change <
1)] failed, treating result as null. Only first 20 failures
recorded.#[Emulated:Line 1:28: evaluation of [salary_change < 1] failed,
treating result as null. Only first 20 failures recorded.]`
Closes#98743.
* Break out 'Limitations' into separate page
* Add REST API docs
* Restructure commands, functions, and operators refs
* Add placeholder for getting started guide
* Group 'Syntax', 'Metafields', and 'MV fields' under 'Language'
* Add placeholder for Kibana page
* Add link from landing page
* Apply uniform formatting to ACOS, CASE, and DATE_PARSE function refs
* Reword default LIMIT
* Add support for COUNT(*)
* Move 'Commands' and 'Functions and operators' to individual pages
---------
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This adds a new ES|QL endpoint, `_query`, to replace the now deprecated
`_esql`. The latter is still kept for a while, emitting a deprecation
warning.
Fixes ESQL-1379.
Describes how we fetch multivalued fields by default, return them as
json arrays, how the internal sort order is not guaranteed, how most
functions will turn them into null, and how some fields remove
duplicates on save.
---------
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>