[[esql-multivalued-fields]]
=== {esql} multivalued fields

++++
<titleabbrev>Multivalued fields</titleabbrev>
++++

{esql} can read from multivalued fields:

[source,console,id=esql-multivalued-fields-reorders]
----
POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": [2, 1] }
{ "index" : {} }
{ "a": 2, "b": 3 }

POST /_query
{
  "query": "FROM mv | LIMIT 2"
}
----

Multivalued fields come back as a JSON array:

[source,console-result]
----
{
  "columns": [
    { "name": "a", "type": "long"},
    { "name": "b", "type": "long"}
  ],
  "values": [
    [1, [1, 2]],
    [2, 3]
  ]
}
----

The relative order of values in a multivalued field is undefined. They'll frequently be in
ascending order, but don't rely on that.
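
Because the order is undefined, client code that compares results is safer treating a multivalued cell as an unordered collection. The following is a minimal Python sketch, assuming the response body shown above has already been parsed into a dict. `as_list` and `same_values` are hypothetical helper names, not part of any Elasticsearch client:

```python
from collections import Counter


def as_list(cell):
    """Return a cell as a list: arrays stay as-is, single values are wrapped, null becomes empty."""
    if cell is None:
        return []
    return cell if isinstance(cell, list) else [cell]


def same_values(cell, expected):
    """Compare two cells while ignoring order but keeping duplicate counts."""
    return Counter(as_list(cell)) == Counter(as_list(expected))


# Response body copied from the example above.
response = {
    "columns": [{"name": "a", "type": "long"}, {"name": "b", "type": "long"}],
    "values": [[1, [1, 2]], [2, 3]],
}

# Locate column "b" by name, then collect its cell from every row.
b_index = [column["name"] for column in response["columns"]].index("b")
b_cells = [row[b_index] for row in response["values"]]

# The document was indexed as [2, 1]; the comparison holds regardless of returned order.
print(same_values(b_cells[0], [2, 1]))
```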

[discrete]
[[esql-multivalued-fields-dups]]
==== Duplicate values

Some field types, like <<keyword-field-type,`keyword`>>, remove duplicate values on write:

[source,console,id=esql-multivalued-fields-kwdups]
----
PUT /mv
{
  "mappings": {
    "properties": {
      "b": {"type": "keyword"}
    }
  }
}

POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": ["foo", "foo", "bar"] }
{ "index" : {} }
{ "a": 2, "b": ["bar", "bar"] }

POST /_query
{
  "query": "FROM mv | LIMIT 2"
}
----

And {esql} sees that removal:

[source,console-result]
----
{
  "columns": [
    { "name": "a", "type": "long"},
    { "name": "b", "type": "keyword"}
  ],
  "values": [
    [1, ["bar", "foo"]],
    [2, "bar"]
  ]
}
----

But other types, like `long`, don't remove duplicates:

[source,console,id=esql-multivalued-fields-longdups]
----
PUT /mv
{
  "mappings": {
    "properties": {
      "b": {"type": "long"}
    }
  }
}

POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": [2, 2, 1] }
{ "index" : {} }
{ "a": 2, "b": [1, 1] }

POST /_query
{
  "query": "FROM mv | LIMIT 2"
}
----

And {esql} also sees that:

[source,console-result]
----
{
  "columns": [
    { "name": "a", "type": "long"},
    { "name": "b", "type": "long"}
  ],
  "values": [
    [1, [1, 2, 2]],
    [2, [1, 1]]
  ]
}
----

This all happens at the storage layer. If you store duplicate `long`s and then
convert them to strings, the duplicates stay:

[source,console,id=esql-multivalued-fields-longdups-tostring]
----
PUT /mv
{
  "mappings": {
    "properties": {
      "b": {"type": "long"}
    }
  }
}

POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": [2, 2, 1] }
{ "index" : {} }
{ "a": 2, "b": [1, 1] }

POST /_query
{
  "query": "FROM mv | EVAL b=TO_STRING(b) | LIMIT 2"
}
----

[source,console-result]
----
{
  "columns": [
    { "name": "a", "type": "long"},
    { "name": "b", "type": "keyword"}
  ],
  "values": [
    [1, ["1", "2", "2"]],
    [2, ["1", "1"]]
  ]
}
----
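
The results in this section can be summarised with a small Python model. This is only a sketch of the observed behaviour, not how Elasticsearch stores values. `stored_keyword`, `stored_long`, `to_string`, and `unwrap` are hypothetical names, and the sorted output mirrors the examples rather than any ordering guarantee:

```python
def stored_keyword(values):
    """Model of a keyword field: duplicates are removed on write."""
    return sorted(set(values))


def stored_long(values):
    """Model of a long field: duplicates are kept."""
    return sorted(values)


def to_string(values):
    """Model of TO_STRING: converts each value and does not deduplicate."""
    return [str(value) for value in values]


def unwrap(values):
    """A field left with exactly one value is returned as a scalar, not an array."""
    return values[0] if len(values) == 1 else values


# Inputs copied from the bulk requests in this section.
keyword_doc_1 = unwrap(stored_keyword(["foo", "foo", "bar"]))
keyword_doc_2 = unwrap(stored_keyword(["bar", "bar"]))
long_doc_1 = unwrap(stored_long([2, 2, 1]))
long_as_string = unwrap(to_string(stored_long([2, 2, 1])))
```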

[discrete]
[[esql-multivalued-fields-functions]]
==== Functions

Unless otherwise documented, functions return `null` when applied to a multivalued
field. This behavior may change in a later version.

[source,console,id=esql-multivalued-fields-mv-into-null]
----
POST /mv/_bulk?refresh
{ "index" : {} }
{ "a": 1, "b": [2, 1] }
{ "index" : {} }
{ "a": 2, "b": 3 }
----

[source,console]
----
POST /_query
{
  "query": "FROM mv | EVAL b + 2, a + b | LIMIT 4"
}
----
// TEST[continued]
// TEST[warning:Line 1:16: evaluation of [b + 2] failed, treating result as null. Only first 20 failures recorded.]
// TEST[warning:Line 1:16: java.lang.IllegalArgumentException: single-value function encountered multi-value]
// TEST[warning:Line 1:23: evaluation of [a + b] failed, treating result as null. Only first 20 failures recorded.]
// TEST[warning:Line 1:23: java.lang.IllegalArgumentException: single-value function encountered multi-value]

[source,console-result]
----
{
  "columns": [
    { "name": "a", "type": "long"},
    { "name": "b", "type": "long"},
    { "name": "b+2", "type": "long"},
    { "name": "a+b", "type": "long"}
  ],
  "values": [
    [1, [1, 2], null, null],
    [2, 3, 5, 5]
  ]
}
----
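
The `null` results and their warnings can be modelled in a few lines of Python. This is a sketch of the semantics only, not the {esql} implementation. `single_value` and `add` are hypothetical names, and the warning text is shortened from the one {esql} emits:

```python
import warnings


def single_value(fn):
    """Wrap fn so that any multivalued (list) argument yields None plus a warning."""
    def wrapper(*args):
        if any(arg is None for arg in args):
            return None
        if any(isinstance(arg, list) for arg in args):
            # Assumption: one warning per failed evaluation, as in the example.
            warnings.warn("single-value function encountered multi-value")
            return None
        return fn(*args)
    return wrapper


@single_value
def add(x, y):
    return x + y


# Rows (a, b) copied from the example above.
rows = [(1, [1, 2]), (2, 3)]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Mirrors `EVAL b + 2, a + b` for every row.
    results = [(add(b, 2), add(a, b)) for a, b in rows]
```

Here the first row produces `None` for both expressions and records one warning for each, while the second row is computed normally.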

Work around this limitation by converting the field to a single value with one of:

* <<esql-mv_avg>>
* <<esql-mv_concat>>
* <<esql-mv_count>>
* <<esql-mv_max>>
* <<esql-mv_median>>
* <<esql-mv_min>>
* <<esql-mv_sum>>

[source,console,esql-multivalued-fields-mv-into-null]
----
POST /_query
{
  "query": "FROM mv | EVAL b=MV_MIN(b) | EVAL b + 2, a + b | LIMIT 4"
}
----
// TEST[continued]

[source,console-result]
----
{
  "columns": [
    { "name": "a", "type": "long"},
    { "name": "b", "type": "long"},
    { "name": "b+2", "type": "long"},
    { "name": "a+b", "type": "long"}
  ],
  "values": [
    [1, 1, 3, 2],
    [2, 3, 5, 5]
  ]
}
----
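
Reducing the field to a single value first is what makes the arithmetic defined. The following is a self-contained Python sketch of that query, where `mv_min` is a hypothetical stand-in for <<esql-mv_min>> and the rows are the two documents indexed earlier:

```python
def mv_min(cell):
    """Reduce a cell to its smallest value; single values pass through, null stays null."""
    if cell is None:
        return None
    return min(cell) if isinstance(cell, list) else cell


# Rows (a, b) as indexed by the bulk request in this section.
rows = [(1, [2, 1]), (2, 3)]

results = []
for a, b in rows:
    b = mv_min(b)                         # EVAL b=MV_MIN(b)
    results.append((a, b, b + 2, a + b))  # EVAL b + 2, a + b
```

Each tuple in `results` has the same column order as the response above: `a`, `b`, `b+2`, `a+b`.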