mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-28 17:34:17 -04:00
Enhance ES|QL responses to include information about `took` time (search latency), shards, and clusters against which the query was executed.

The goal of this PR is to begin to provide parity between the metadata displayed for cross-cluster searches in _search and ES|QL.

This PR adds the following features:
- add overall `took` time to all ES|QL query responses. And to emphasize: "all" here means: async search, sync search, local-only and cross-cluster searches, so it goes beyond just CCS.
- add `_clusters` metadata to the final response for cross-cluster searches, for both async and sync search (see example below)
- tracking/reporting counts of skipped shards from the can_match (SearchShards API) phase of ES|QL processing
- marking clusters as skipped if they cannot be connected to (during the field-caps phase of processing)

Out of scope for this PR:
- honoring the `skip_unavailable` cluster setting
- showing `_clusters` metadata in the async response **while** the search is still running
- showing any shard failure messages (since any shard search failures in ES|QL are automatically fatal and `_clusters/details` is not shown in 4xx/5xx error responses). Note that this also means that the `failed` shard count is always 0 in ES|QL `_clusters` section.

Things changed with respect to behavior in `_search`:
- the `timed_out` field in `_clusters/details/mycluster` was removed in the ESQL response, since ESQL does not support timeouts. It could be added back later if/when ESQL supports timeouts.
- the `failures` array in `_clusters/details/mycluster/_shards` was removed in the ESQL response, since any shard failure causes the whole query to fail.

Example output from ES|QL CCS:

```es
POST /_query
{
  "query": "from blogs,remote2:bl*,remote1:blogs|\nkeep authors.first_name,publish_date|\n limit 5"
}
```

```json
{
  "took": 49,
  "columns": [
    { "name": "authors.first_name", "type": "text" },
    { "name": "publish_date", "type": "date" }
  ],
  "values": [
    [ "Tammy", "2009-11-04T04:08:07.000Z" ],
    [ "Theresa", "2019-05-10T21:22:32.000Z" ],
    [ "Jason", "2021-11-23T00:57:30.000Z" ],
    [ "Craig", "2019-12-14T21:24:29.000Z" ],
    [ "Alexandra", "2013-02-15T18:13:24.000Z" ]
  ],
  "_clusters": {
    "total": 3,
    "successful": 2,
    "running": 0,
    "skipped": 1,
    "partial": 0,
    "failed": 0,
    "details": {
      "(local)": {
        "status": "successful",
        "indices": "blogs",
        "took": 43,
        "_shards": {
          "total": 13,
          "successful": 13,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote2": {
        "status": "skipped", // remote2 was offline when this query was run
        "indices": "remote2:bl*",
        "took": 0,
        "_shards": {
          "total": 0,
          "successful": 0,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote1": {
        "status": "successful",
        "indices": "remote1:blogs",
        "took": 47,
        "_shards": {
          "total": 13,
          "successful": 13,
          "skipped": 0,
          "failed": 0
        }
      }
    }
  }
}
```

Fixes https://github.com/elastic/elasticsearch/issues/112402 and https://github.com/elastic/elasticsearch/issues/110935
262 lines
4.8 KiB
Text
[[esql-multivalued-fields]]
=== {esql} multivalued fields

++++
<titleabbrev>Multivalued fields</titleabbrev>
++++

{esql} can read from multivalued fields:

[source,console,id=esql-multivalued-fields-reorders]
----
POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": [2, 1] }
{ "index" : {} }
{ "a": 2, "b": 3 }

POST /_query
{
  "query": "FROM mv | LIMIT 2"
}
----

Multivalued fields come back as a JSON array:

[source,console-result]
----
{
  "took": 28,
  "columns": [
    { "name": "a", "type": "long"},
    { "name": "b", "type": "long"}
  ],
  "values": [
    [1, [1, 2]],
    [2,      3]
  ]
}
----
// TESTRESPONSE[s/"took": 28/"took": "$body.took"/]

The relative order of values in a multivalued field is undefined. They'll frequently be in
ascending order, but don't rely on that.
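
Because a cell can hold either a single value or a JSON array, code that consumes the `values` array has to accept both shapes. The following Python sketch shows one way to handle this, assuming the response body is parsed with the standard `json` module. The helper names `as_multivalue` and `rows_as_dicts` are hypothetical, as is the choice to map `null` to an empty list; none of them come from any client library. Because the order of values is undefined, the loop sorts each list before using it.

```python
import json

# The response from the example above, as a client would receive it.
RESPONSE = """
{
  "took": 28,
  "columns": [
    { "name": "a", "type": "long"},
    { "name": "b", "type": "long"}
  ],
  "values": [
    [1, [1, 2]],
    [2,      3]
  ]
}
"""


def as_multivalue(cell):
    """Return a cell as a list: arrays stay lists, scalars become one-element lists.

    Mapping null (None) to an empty list is a choice made for this sketch.
    """
    if cell is None:
        return []
    if isinstance(cell, list):
        return cell
    return [cell]


def rows_as_dicts(body):
    """Pair each row with its column names, normalizing every cell to a list."""
    names = [column["name"] for column in body["columns"]]
    return [
        {name: as_multivalue(cell) for name, cell in zip(names, row)}
        for row in body["values"]
    ]


body = json.loads(RESPONSE)
for row in rows_as_dicts(body):
    # Sort before comparing or displaying: ES|QL does not guarantee value order.
    print(row["a"], sorted(row["b"]))
```

Normalizing every cell to a list keeps downstream code on a single code path. The cost is that single values are always wrapped, even for columns that never hold more than one value.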

[discrete]
[[esql-multivalued-fields-dups]]
==== Duplicate values

Some field types, like <<keyword-field-type,`keyword`>>, remove duplicate values on write:

[source,console,id=esql-multivalued-fields-kwdups]
----
PUT /mv
{
  "mappings": {
    "properties": {
      "b": {"type": "keyword"}
    }
  }
}

POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": ["foo", "foo", "bar"] }
{ "index" : {} }
{ "a": 2, "b": ["bar", "bar"] }

POST /_query
{
  "query": "FROM mv | LIMIT 2"
}
----

And {esql} sees that removal:

[source,console-result]
----
{
  "took": 28,
  "columns": [
    { "name": "a", "type": "long"},
    { "name": "b", "type": "keyword"}
  ],
  "values": [
    [1, ["bar", "foo"]],
    [2,          "bar"]
  ]
}
----
// TESTRESPONSE[s/"took": 28/"took": "$body.took"/]
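
A rough way to picture the `keyword` behavior is that each document stores its values as a sorted set. Duplicates collapse, and when only one value remains the response shows it as a scalar instead of a one-element array. The Python sketch below uses a hypothetical helper, `keyword_stored`, that reproduces the response above under that assumption. It is a mental model, not how the storage layer is implemented, and the sorted order it produces is exactly the kind of ordering you shouldn't rely on.

```python
def keyword_stored(values):
    """Model a keyword field: drop duplicates and sort what remains.

    A single surviving value is returned as a scalar, matching how the
    ES|QL response renders it. This mirrors the example, not the internals.
    """
    unique = sorted(set(values))
    return unique[0] if len(unique) == 1 else unique


# The two documents indexed by the bulk request above.
docs = [
    {"a": 1, "b": ["foo", "foo", "bar"]},
    {"a": 2, "b": ["bar", "bar"]},
]

rows = [[doc["a"], keyword_stored(doc["b"])] for doc in docs]
print(rows)  # [[1, ['bar', 'foo']], [2, 'bar']]
```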

But other types, like `long`, don't remove duplicates:

[source,console,id=esql-multivalued-fields-longdups]
----
PUT /mv
{
  "mappings": {
    "properties": {
      "b": {"type": "long"}
    }
  }
}

POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": [2, 2, 1] }
{ "index" : {} }
{ "a": 2, "b": [1, 1] }

POST /_query
{
  "query": "FROM mv | LIMIT 2"
}
----

And {esql} also sees that:

[source,console-result]
----
{
  "took": 28,
  "columns": [
    { "name": "a", "type": "long"},
    { "name": "b", "type": "long"}
  ],
  "values": [
    [1, [1, 2, 2]],
    [2,    [1, 1]]
  ]
}
----
// TESTRESPONSE[s/"took": 28/"took": "$body.took"/]
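
One rough way to model `long`: values are kept in sorted order and every duplicate survives. This contrasts with `keyword`, which drops duplicates. The helper name `long_stored` is hypothetical, and the sorting only mirrors this example. The relative order of values is still undefined.

```python
def long_stored(values):
    """Model a long field: sort the values but keep every duplicate.

    A single value is returned as a scalar, matching the response format.
    """
    ordered = sorted(values)
    return ordered[0] if len(ordered) == 1 else ordered


# The two documents indexed by the bulk request above.
docs = [
    {"a": 1, "b": [2, 2, 1]},
    {"a": 2, "b": [1, 1]},
]

rows = [[doc["a"], long_stored(doc["b"])] for doc in docs]
print(rows)  # [[1, [1, 2, 2]], [2, [1, 1]]]
```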

This is all at the storage layer. If you store duplicate `long`s and then
convert them to strings, the duplicates will stay:

[source,console,id=esql-multivalued-fields-longdups-tostring]
----
PUT /mv
{
  "mappings": {
    "properties": {
      "b": {"type": "long"}
    }
  }
}

POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": [2, 2, 1] }
{ "index" : {} }
{ "a": 2, "b": [1, 1] }

POST /_query
{
  "query": "FROM mv | EVAL b=TO_STRING(b) | LIMIT 2"
}
----

[source,console-result]
----
{
  "took": 28,
  "columns": [
    { "name": "a", "type": "long"},
    { "name": "b", "type": "keyword"}
  ],
  "values": [
    [1, ["1", "2", "2"]],
    [2,      ["1", "1"]]
  ]
}
----
// TESTRESPONSE[s/"took": 28/"took": "$body.took"/]
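
The duplicates survive because the conversion runs on values that have already been read from storage. It converts them one at a time, and nothing in that step removes repeats. The Python sketch below illustrates that reasoning. `to_string_each` is a hypothetical stand-in for `TO_STRING` applied to a multivalued input, and the input values are the ones the `long` field returned in the earlier response.

```python
def to_string_each(cell):
    """Convert every value of a cell to a string, keeping duplicates and order.

    A scalar stays a scalar; a list stays a list of the same length.
    """
    if isinstance(cell, list):
        return [str(value) for value in cell]
    return str(cell)


# Rows of (a, b) as the long field returned them before conversion.
stored = [[1, [1, 2, 2]], [2, [1, 1]]]

rows = [[a, to_string_each(b)] for a, b in stored]
print(rows)  # [[1, ['1', '2', '2']], [2, ['1', '1']]]
```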

[discrete]
[[esql-multivalued-fields-functions]]
==== Functions

Unless otherwise documented, functions will return `null` when applied to a multivalued
field.

[source,console,id=esql-multivalued-fields-mv-into-null]
----
POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": [2, 1] }
{ "index" : {} }
{ "a": 2, "b": 3 }
----

[source,console]
----
POST /_query
{
  "query": "FROM mv | EVAL b + 2, a + b | LIMIT 4"
}
----
// TEST[continued]
// TEST[warning:Line 1:16: evaluation of [b + 2] failed, treating result as null. Only first 20 failures recorded.]
// TEST[warning:Line 1:16: java.lang.IllegalArgumentException: single-value function encountered multi-value]
// TEST[warning:Line 1:23: evaluation of [a + b] failed, treating result as null. Only first 20 failures recorded.]
// TEST[warning:Line 1:23: java.lang.IllegalArgumentException: single-value function encountered multi-value]

[source,console-result]
----
{
  "took": 28,
  "columns": [
    { "name": "a",     "type": "long"},
    { "name": "b",     "type": "long"},
    { "name": "b + 2", "type": "long"},
    { "name": "a + b", "type": "long"}
  ],
  "values": [
    [1, [1, 2], null, null],
    [2,      3,    5,    5]
  ]
}
----
// TESTRESPONSE[s/"took": 28/"took": "$body.took"/]
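
The rule above can be modeled in a few lines of Python. An operator that expects single values unwraps its inputs. If an input turns out to be multivalued, the operator records a warning and yields `null` instead of failing the whole query. In this sketch, `single`, `add`, and the shortened warning text are illustrative assumptions modeled on the test warnings above. The real evaluator also prefixes line and column numbers and records only the first 20 failures; neither behavior is modeled here.

```python
warnings = []


def single(value):
    """Unwrap a value, raising if it holds more than one entry."""
    if isinstance(value, list):
        if len(value) != 1:
            raise ValueError("single-value function encountered multi-value")
        return value[0]
    return value


def add(left, right, expression):
    """Model of `+`: null when either side is null or multivalued."""
    try:
        left, right = single(left), single(right)
    except ValueError as error:
        warnings.append(f"evaluation of [{expression}] failed: {error}")
        return None
    if left is None or right is None:
        return None
    return left + right


rows = [[1, [1, 2]], [2, 3]]
result = [[a, b, add(b, 2, "b + 2"), add(a, b, "a + b")] for a, b in rows]
print(result)  # [[1, [1, 2], None, None], [2, 3, 5, 5]]
print(len(warnings))  # 2
```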

Work around this limitation by converting the field to a single value with one of:

* <<esql-mv_avg>>
* <<esql-mv_concat>>
* <<esql-mv_count>>
* <<esql-mv_max>>
* <<esql-mv_median>>
* <<esql-mv_min>>
* <<esql-mv_sum>>
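
Most of these functions can be sketched as plain reductions over the values of one cell. The Python below is an approximation for illustration only. The lowercase helper names echo the {esql} functions but are not {esql}. `MV_MEDIAN` is left out, and details such as `null` handling, numeric overflow, and result types are not modeled, beyond `mv_avg` returning a float.

```python
def values_of(cell):
    """Treat a scalar as a multivalue with exactly one entry."""
    return cell if isinstance(cell, list) else [cell]


def mv_min(cell):
    return min(values_of(cell))


def mv_max(cell):
    return max(values_of(cell))


def mv_count(cell):
    return len(values_of(cell))


def mv_sum(cell):
    return sum(values_of(cell))


def mv_avg(cell):
    # Always a float, since the average of integers need not be an integer.
    values = values_of(cell)
    return sum(values) / len(values)


def mv_concat(cell, delimiter):
    # Joins string values in the order they appear in the cell.
    return delimiter.join(values_of(cell))


b = [2, 1]  # the multivalued "b" from the bulk request above
print(mv_min(b), mv_max(b), mv_count(b), mv_sum(b), mv_avg(b))  # 1 2 2 3 1.5
print(mv_concat(["foo", "bar"], ", "))  # foo, bar
```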

[source,console,esql-multivalued-fields-mv-into-null]
----
POST /_query
{
  "query": "FROM mv | EVAL b=MV_MIN(b) | EVAL b + 2, a + b | LIMIT 4"
}
----
// TEST[continued]

[source,console-result]
----
{
  "took": 28,
  "columns": [
    { "name": "a",     "type": "long"},
    { "name": "b",     "type": "long"},
    { "name": "b + 2", "type": "long"},
    { "name": "a + b", "type": "long"}
  ],
  "values": [
    [1, 1, 3, 2],
    [2, 3, 5, 5]
  ]
}
----
// TESTRESPONSE[s/"took": 28/"took": "$body.took"/]
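
Tracing the query by hand shows why every column now has a value. After `EVAL b=MV_MIN(b)`, `b` is single-valued in every row, so `b + 2` and `a + b` no longer hit the multivalue rule. A small Python sketch of that trace, with a hypothetical `mv_min` helper standing in for the {esql} function:

```python
def mv_min(cell):
    """Smallest value of a cell; a scalar is its own minimum."""
    return min(cell) if isinstance(cell, list) else cell


rows = [[1, [1, 2]], [2, 3]]  # "a" and "b" as indexed by the bulk request

result = []
for a, b in rows:
    b = mv_min(b)  # EVAL b=MV_MIN(b): b is now single-valued
    result.append([a, b, b + 2, a + b])  # EVAL b + 2, a + b

print(result)  # [[1, 1, 3, 2], [2, 3, 5, 5]]
```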