elasticsearch/docs/reference/esql/multivalued-fields.asciidoc
Stanislav Malyshev f27f74666f
ES|QL async queries: Partial result on demand (#118122)
Add capability to stop async query on demand
The theory:

- User initiates async search request
- User sends the stop request (POST _query/async/<ID>/stop)
- If the async is finished by that time, it's like regular async get
- If it's not finished, the sinks are closed and the request is forcefully finished
2025-01-23 10:21:52 -07:00

300 lines
5.5 KiB
Text

[[esql-multivalued-fields]]
=== {esql} multivalued fields
++++
<titleabbrev>Multivalued fields</titleabbrev>
++++
{esql} is fine reading from multivalued fields:
[source,console,id=esql-multivalued-fields-reorders]
----
POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": [2, 1] }
{ "index" : {} }
{ "a": 2, "b": 3 }
POST /_query
{
"query": "FROM mv | LIMIT 2"
}
----
Multivalued fields come back as a JSON array:
[source,console-result]
----
{
"took": 28,
"is_partial": false,
"columns": [
{ "name": "a", "type": "long"},
{ "name": "b", "type": "long"}
],
"values": [
[1, [1, 2]],
[2, 3]
]
}
----
// TESTRESPONSE[s/"took": 28/"took": "$body.took"/]
The relative order of values in a multivalued field is undefined. They'll frequently be in
ascending order but don't rely on that.
[discrete]
[[esql-multivalued-fields-dups]]
==== Duplicate values
Some field types, like <<keyword-field-type,`keyword`>> remove duplicate values on write:
[source,console,id=esql-multivalued-fields-kwdups]
----
PUT /mv
{
"mappings": {
"properties": {
"b": {"type": "keyword"}
}
}
}
POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": ["foo", "foo", "bar"] }
{ "index" : {} }
{ "a": 2, "b": ["bar", "bar"] }
POST /_query
{
"query": "FROM mv | LIMIT 2"
}
----
And {esql} sees that removal:
[source,console-result]
----
{
"took": 28,
"is_partial": false,
"columns": [
{ "name": "a", "type": "long"},
{ "name": "b", "type": "keyword"}
],
"values": [
[1, ["bar", "foo"]],
[2, "bar"]
]
}
----
// TESTRESPONSE[s/"took": 28/"took": "$body.took"/]
But other types, like `long` don't remove duplicates.
[source,console,id=esql-multivalued-fields-longdups]
----
PUT /mv
{
"mappings": {
"properties": {
"b": {"type": "long"}
}
}
}
POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": [2, 2, 1] }
{ "index" : {} }
{ "a": 2, "b": [1, 1] }
POST /_query
{
"query": "FROM mv | LIMIT 2"
}
----
And {esql} also sees that:
[source,console-result]
----
{
"took": 28,
"is_partial": false,
"columns": [
{ "name": "a", "type": "long"},
{ "name": "b", "type": "long"}
],
"values": [
[1, [1, 2, 2]],
[2, [1, 1]]
]
}
----
// TESTRESPONSE[s/"took": 28/"took": "$body.took"/]
This is all at the storage layer. If you store duplicate `long`s and then
convert them to strings the duplicates will stay:
[source,console,id=esql-multivalued-fields-longdups-tostring]
----
PUT /mv
{
"mappings": {
"properties": {
"b": {"type": "long"}
}
}
}
POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": [2, 2, 1] }
{ "index" : {} }
{ "a": 2, "b": [1, 1] }
POST /_query
{
"query": "FROM mv | EVAL b=TO_STRING(b) | LIMIT 2"
}
----
[source,console-result]
----
{
"took": 28,
"is_partial": false,
"columns": [
{ "name": "a", "type": "long"},
{ "name": "b", "type": "keyword"}
],
"values": [
[1, ["1", "2", "2"]],
[2, ["1", "1"]]
]
}
----
// TESTRESPONSE[s/"took": 28/"took": "$body.took"/]
[discrete]
[[esql-multivalued-nulls]]
==== `null` in a list
`null` values in a list are not preserved at the storage layer:
[source,console,id=esql-multivalued-fields-multivalued-nulls]
----
POST /mv/_doc?refresh
{ "a": [2, null, 1] }
POST /_query
{
"query": "FROM mv | LIMIT 1"
}
----
[source,console-result]
----
{
"took": 28,
"is_partial": false,
"columns": [
{ "name": "a", "type": "long"},
],
"values": [
[[1, 2]],
]
}
----
// TESTRESPONSE[s/"took": 28/"took": "$body.took"/]
[discrete]
[[esql-multivalued-fields-functions]]
==== Functions
Unless otherwise documented functions will return `null` when applied to a multivalued
field.
[source,console,id=esql-multivalued-fields-mv-into-null]
----
POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": [2, 1] }
{ "index" : {} }
{ "a": 2, "b": 3 }
----
[source,console]
----
POST /_query
{
"query": "FROM mv | EVAL b + 2, a + b | LIMIT 4"
}
----
// TEST[continued]
// TEST[warning:Line 1:16: evaluation of [b + 2] failed, treating result as null. Only first 20 failures recorded.]
// TEST[warning:Line 1:16: java.lang.IllegalArgumentException: single-value function encountered multi-value]
// TEST[warning:Line 1:23: evaluation of [a + b] failed, treating result as null. Only first 20 failures recorded.]
// TEST[warning:Line 1:23: java.lang.IllegalArgumentException: single-value function encountered multi-value]
[source,console-result]
----
{
"took": 28,
"is_partial": false,
"columns": [
{ "name": "a", "type": "long"},
{ "name": "b", "type": "long"},
{ "name": "b + 2", "type": "long"},
{ "name": "a + b", "type": "long"}
],
"values": [
[1, [1, 2], null, null],
[2, 3, 5, 5]
]
}
----
// TESTRESPONSE[s/"took": 28/"took": "$body.took"/]
Work around this limitation by converting the field to single value with one of:
* <<esql-mv_avg>>
* <<esql-mv_concat>>
* <<esql-mv_count>>
* <<esql-mv_max>>
* <<esql-mv_median>>
* <<esql-mv_min>>
* <<esql-mv_sum>>
[source,console,esql-multivalued-fields-mv-into-null]
----
POST /_query
{
"query": "FROM mv | EVAL b=MV_MIN(b) | EVAL b + 2, a + b | LIMIT 4"
}
----
// TEST[continued]
[source,console-result]
----
{
"took": 28,
"is_partial": false,
"columns": [
{ "name": "a", "type": "long"},
{ "name": "b", "type": "long"},
{ "name": "b + 2", "type": "long"},
{ "name": "a + b", "type": "long"}
],
"values": [
[1, 1, 3, 2],
[2, 3, 5, 5]
]
}
----
// TESTRESPONSE[s/"took": 28/"took": "$body.took"/]