mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-04-24 15:17:30 -04:00
ESQL: Add BlockHash#lookup (#107762)
Adds a `lookup` method to `BlockHash` which finds keys that are already in the hash without modifying it, returning the "ordinal" that the `BlockHash` assigned when the key was first passed to `add`.

For multi-column keys this can change the number of values pretty drastically: you get a combinatorial explosion of values. If you have three columns with 2 values each, the most values you can get is 2*2*2=8. If you have five columns with ten values each, you can have 100,000 values in a single position! That's too many.

Let's do an example! This one has a two-row block containing three columns. One row has two values in each column, so it could produce at most 8 values. In this case one of the combinations is missing from the hash, so it only produces 7.

Block:

|    a |    b |    c |
| ----:| ----:| ----:|
|    1 |    4 |    6 |
| 1, 2 | 3, 4 | 5, 6 |

BlockHash contents:

| a | b | c |
| -:| -:| -:|
| 1 | 3 | 5 |
| 1 | 3 | 6 |
| 1 | 4 | 5 |
| 1 | 4 | 6 |
| 2 | 3 | 5 |
| 2 | 3 | 6 |
| 2 | 4 | 6 |

Results:

| ord                 |
| -------------------:|
| 3                   |
| 0, 1, 2, 3, 4, 5, 6 |

The `add` method has a fairly fool-proof mechanism to work around this: it calls its consumers with a callback that can split positions into multiple calls, in batches of roughly 16,000 positions at a time. And aggs use the callback, so you can aggregate over five columns with ten values each. It's slow, but the callbacks let us get through it.

Unlike `add`, `lookup` can't use a callback. We're going to need it to return an `Iterator` of `IntBlock`s containing ordinals; that's just how we're going to use it. That'd be OK, but we can't split a single position across multiple `Block`s. That's just not how `Block` works. So, instead, we fail the query if we produce more than 100,000 entries in a single position. We'd like to stop collecting and emit a warning instead, but that's a problem for another change. 100,000 entries is a single 400kb array, which is quite big.

If we're not bumping into massive rows, we emit `IntBlock`s targeting a particular size in memory. Likely we'll also want to plug in a target number of rows as well, but for now this'll do.
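To make the worked example concrete, here is a minimal sketch of the lookup semantics using a plain `Map` as a hypothetical stand-in for `BlockHash` (the class names `LookupSketch` and `lookupPosition` are invented for illustration; the real `BlockHash` is hash-table-backed and block-oriented). For each position, it takes the cross product of the per-column values and emits ordinals only for combinations already in the hash:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupSketch {
    // Stand-in for the BlockHash contents from the example above: key -> ordinal.
    static final Map<List<Integer>, Integer> HASH = new HashMap<>();
    static {
        int ord = 0;
        for (int[] k : new int[][] {
            { 1, 3, 5 }, { 1, 3, 6 }, { 1, 4, 5 }, { 1, 4, 6 }, { 2, 3, 5 }, { 2, 3, 6 }, { 2, 4, 6 }
        }) {
            HASH.put(List.of(k[0], k[1], k[2]), ord++);
        }
    }

    /** Cross product of the per-column values at one position, looked up without adding. */
    static List<Integer> lookupPosition(int[] a, int[] b, int[] c) {
        List<Integer> ords = new ArrayList<>();
        for (int va : a) {
            for (int vb : b) {
                for (int vc : c) {
                    Integer ord = HASH.get(List.of(va, vb, vc));
                    if (ord != null) { // combinations missing from the hash produce no ordinal
                        ords.add(ord);
                    }
                }
            }
        }
        return ords;
    }

    public static void main(String[] args) {
        // Row 1: single-valued columns, one combination.
        System.out.println(lookupPosition(new int[] { 1 }, new int[] { 4 }, new int[] { 6 }));
        // Row 2: 2*2*2 = 8 combinations, but (2, 4, 5) is not in the hash, so 7 ordinals.
        System.out.println(lookupPosition(new int[] { 1, 2 }, new int[] { 3, 4 }, new int[] { 5, 6 }));
    }
}
```

Run against the example's hash contents, row 1 yields `[3]` and row 2 yields `[0, 1, 2, 3, 4, 5, 6]`, matching the Results table.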
parent 52af16adb1
commit 0f68c673f7
27 changed files with 811 additions and 104 deletions
@@ -0,0 +1,49 @@
```java
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0 and the Server Side Public License, v 1; you may not use this file except
 * in compliance with, at your election, the Elastic License 2.0 or the Server
 * Side Public License, v 1.
 */

package org.elasticsearch.core;

import java.util.Iterator;
import java.util.Objects;

/**
 * An {@link Iterator} with state that must be {@link #close() released}.
 */
public interface ReleasableIterator<T> extends Releasable, Iterator<T> {
    /**
     * Returns a single element iterator over the supplied value.
     */
    static <T extends Releasable> ReleasableIterator<T> single(T element) {
        return new ReleasableIterator<>() {
            private T value = Objects.requireNonNull(element);

            @Override
            public boolean hasNext() {
                return value != null;
            }

            @Override
            public T next() {
                final T res = value;
                value = null;
                return res;
            }

            @Override
            public void close() {
                Releasables.close(value);
            }

            @Override
            public String toString() {
                return "ReleasableIterator[" + value + "]";
            }
        };
    }
}
```
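The interesting property of `single` is ownership transfer: the iterator owns the element until `next()` hands it to the caller, and `close()` releases it only if it was never consumed. A self-contained sketch with minimal stand-ins for the elasticsearch-core types (the `Releasable` stand-in and the `SingleDemo` class are defined here just to make the snippet runnable; they are not the real API):

```java
import java.util.Iterator;
import java.util.Objects;

public class SingleDemo {
    // Minimal stand-in for org.elasticsearch.core.Releasable: close() without checked exceptions.
    interface Releasable extends AutoCloseable {
        @Override
        void close();
    }

    // Trimmed-down version of the interface from the diff, enough to exercise the contract.
    interface ReleasableIterator<T> extends Releasable, Iterator<T> {
        static <T extends Releasable> ReleasableIterator<T> single(T element) {
            return new ReleasableIterator<>() {
                private T value = Objects.requireNonNull(element);

                @Override
                public boolean hasNext() {
                    return value != null;
                }

                @Override
                public T next() {
                    final T res = value;
                    value = null; // ownership moves to the caller
                    return res;
                }

                @Override
                public void close() {
                    if (value != null) { // only release if nobody consumed the element
                        value.close();
                    }
                }
            };
        }
    }

    public static void main(String[] args) {
        boolean[] released = { false };
        Releasable block = () -> released[0] = true;
        try (ReleasableIterator<Releasable> it = ReleasableIterator.single(block)) {
            System.out.println(it.hasNext()); // true: the element has not been consumed yet
        }
        System.out.println(released[0]); // true: close() released the unconsumed element
    }
}
```

If the caller does consume the element via `next()`, `close()` becomes a no-op and releasing it is the caller's job.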