mirror of
https://github.com/elastic/elasticsearch.git
synced 2025-06-28 09:28:55 -04:00
Update sparse_vector field mapping to include default setting for token pruning (#129089)
* Initial checkin of refactored index_options code
* [CI] Auto commit changes from spotless
* initial unit testing
* complete unit tests; add yaml tests
* [CI] Auto commit changes from spotless
* register test feature for sparse vector
* Update docs/changelog/129089.yaml
* update changelog
* add docs
* explicit set default index_options if null
* [CI] Auto commit changes from spotless
* update yaml tests; update docs
* fix yaml tests
* readd auth for teardown
* only serialize index options if not default
* [CI] Auto commit changes from spotless
* serialization refactor; pass index version around
* [CI] Auto commit changes from spotless
* fix transport versions merge
* fix up docs
* [CI] Auto commit changes from spotless
* fix docs; add include_defaults unit and yaml test
* [CI] Auto commit changes from spotless
* override getIndexReaderManager for SemanticQueryBuilderTests
* [CI] Auto commit changes from spotless
* cleanup mapper/builder/tests; index vers. in type still need to refactor / clean YAML tests
* [CI] Auto commit changes from spotless
* cleanups to mapper tests for clarity
* [CI] Auto commit changes from spotless
* move feature into mappers; fix yaml tests
* cleanups; add comments; remove redundant test
* [CI] Auto commit changes from spotless
* escape more periods in the YAML tests
* cleanup mapper and type tests
* [CI] Auto commit changes from spotless
* rename mapping for previous index test
* set explicit number of shards for yaml test

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
parent a324853d43
commit a671505c8a
17 changed files with 2408 additions and 50 deletions
5
docs/changelog/129089.yaml
Normal file
@ -0,0 +1,5 @@
pr: 129089
summary: Update `sparse_vector` field mapping to include default setting for token pruning
area: Mapping
type: enhancement
issues: []
@ -24,6 +24,33 @@ PUT my-index
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Token pruning

```{applies_to}
stack: preview 9.1
```

For newly created indices, token pruning is turned on by default and uses the default pruning configuration. You can control this behaviour using the optional `index_options` parameter for the field:

```console
PUT my-index
{
  "mappings": {
    "properties": {
      "text.tokens": {
        "type": "sparse_vector",
        "index_options": {
          "prune": true,
          "pruning_config": {
            "tokens_freq_ratio_threshold": 5,
            "tokens_weight_threshold": 0.4
          }
        }
      }
    }
  }
}
```
|
||||||
|
|
||||||
See [semantic search with ELSER](docs-content://solutions/search/semantic-search/semantic-search-elser-ingest-pipelines.md) for a complete example on adding documents to a `sparse_vector` mapped field using ELSER.

## Parameters for `sparse_vector` fields [sparse-vectors-params]

@ -36,6 +63,38 @@ The following parameters are accepted by `sparse_vector` fields:
* Exclude the field from [_source](/reference/elasticsearch/rest-apis/retrieve-selected-fields.md#source-filtering).
* Use [synthetic `_source`](/reference/elasticsearch/mapping-reference/mapping-source-field.md#synthetic-source).
|
index_options {applies_to}`stack: preview 9.1`
: (Optional, object) Index options for the `sparse_vector` field. They determine whether tokens are pruned and which parameters the pruning uses. If pruning options are not set in your [`sparse_vector` query](/reference/query-languages/query-dsl/query-dsl-sparse-vector-query.md), Elasticsearch will use the default options configured for the field, if any.

Parameters for `index_options` are:

`prune` {applies_to}`stack: preview 9.1`
: (Optional, boolean) Whether to perform pruning, omitting the non-significant tokens from the query to improve query performance. If `prune` is `true` but `pruning_config` is not specified, pruning will occur using the default values. Default: `true`.

`pruning_config` {applies_to}`stack: preview 9.1`
: (Optional, object) Optional pruning configuration. If enabled, this will omit non-significant tokens from the query in order to improve query performance. This is only used if `prune` is set to `true`. If `prune` is set to `true` but `pruning_config` is not specified, default values will be used. If `prune` is set to `false` but `pruning_config` is specified, an exception will occur.

Parameters for `pruning_config` include:

`tokens_freq_ratio_threshold` {applies_to}`stack: preview 9.1`
: (Optional, integer) Tokens whose frequency is more than `tokens_freq_ratio_threshold` times the average frequency of all tokens in the specified field are considered outliers and pruned. This value must be between 1 and 100. Default: `5`.

`tokens_weight_threshold` {applies_to}`stack: preview 9.1`
: (Optional, float) Tokens whose weight is less than `tokens_weight_threshold` are considered insignificant and pruned. This value must be between 0 and 1. Default: `0.4`.

::::{note}
The default values for `tokens_freq_ratio_threshold` and `tokens_weight_threshold` are the values that gave the best results in tests using ELSERv2.
::::
|
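The constraints on `index_options` described above can be sketched as a small validation routine. This is an illustrative sketch, not the Elasticsearch implementation: the class `IndexOptionsValidationSketch` and its methods are hypothetical, the range bounds are assumed to be inclusive, and treating an unset `prune` together with a `pruning_config` as an error follows the `IndexOptions` constructor shown later in this commit.

```java
final class IndexOptionsValidationSketch {

    /**
     * Validates a hypothetical index_options setting.
     * prune is null when not set; both thresholds are null when pruning_config is absent.
     */
    static void validate(Boolean prune, Integer tokensFreqRatioThreshold, Float tokensWeightThreshold) {
        boolean hasPruningConfig = tokensFreqRatioThreshold != null || tokensWeightThreshold != null;

        // pruning_config is only meaningful when pruning is explicitly enabled.
        if (hasPruningConfig && (prune == null || prune == false)) {
            throw new IllegalArgumentException("[pruning_config] should only be set if [prune] is set to true");
        }
        // Documented range: between 1 and 100 (inclusive bounds are an assumption).
        if (tokensFreqRatioThreshold != null && (tokensFreqRatioThreshold < 1 || tokensFreqRatioThreshold > 100)) {
            throw new IllegalArgumentException("[tokens_freq_ratio_threshold] must be between 1 and 100");
        }
        // Documented range: between 0 and 1 (inclusive bounds are an assumption).
        if (tokensWeightThreshold != null && (tokensWeightThreshold < 0f || tokensWeightThreshold > 1f)) {
            throw new IllegalArgumentException("[tokens_weight_threshold] must be between 0 and 1");
        }
    }

    /** Convenience wrapper: true when validate() accepts the settings. */
    static boolean isValid(Boolean prune, Integer tokensFreqRatioThreshold, Float tokensWeightThreshold) {
        try {
            validate(prune, tokensFreqRatioThreshold, tokensWeightThreshold);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

For example, `isValid(true, null, null)` accepts pruning with default values, while `isValid(false, 5, 0.4f)` is rejected because a pruning configuration is supplied with pruning turned off.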
||||||
|
|
||||||
|
When token pruning is applied, non-significant tokens will be pruned from the query.
Non-significant tokens can be defined as tokens that meet both of the following criteria:
* The token appears much more frequently than most tokens, indicating that it is a very common word and may not benefit the overall search results much.
* The weight/score is so low that the token is likely not very relevant to the original term.

Both the token frequency threshold and weight threshold must show the token is non-significant in order for the token to be pruned.
This ensures that:
* The tokens that are kept are frequent enough and have significant scoring.
* Very infrequent tokens that may not have as high a score are removed.
|
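The two criteria above combine with a logical AND. A minimal sketch of that rule, as the documentation states it, might look as follows. The class `TokenPruningRuleSketch` and its parameter names are hypothetical, and the real implementation may measure frequency or normalize weights differently; only the "both conditions must hold" structure is taken from the text.

```java
final class TokenPruningRuleSketch {

    /**
     * Returns true when a token counts as non-significant under the documented rule:
     * it is an outlier by frequency AND its weight is below the weight threshold.
     */
    static boolean isNonSignificant(
        double tokenFreq,
        double averageTokenFreq,
        float weight,
        float tokensFreqRatioThreshold,
        float tokensWeightThreshold
    ) {
        // Criterion 1: the token is far more frequent than the average token in the field.
        boolean tooFrequent = tokenFreq > tokensFreqRatioThreshold * averageTokenFreq;
        // Criterion 2: the token's weight is too low to matter much.
        boolean lowWeight = weight < tokensWeightThreshold;
        // Both criteria must hold for the token to be pruned.
        return tooFrequent && lowWeight;
    }
}
```

With the default thresholds (`5` and `0.4`), a token that is ten times more frequent than average is pruned only if its weight is also below `0.4`; the same token with a weight of `0.9` is kept.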
||||||
|
|
||||||
|
|
||||||
## Multi-value sparse vectors [index-multi-value-sparse-vectors]
|
||||||
|
|
|
@ -203,7 +203,7 @@ public class TransportVersions {
|
||||||
public static final TransportVersion ML_INFERENCE_CUSTOM_SERVICE_INPUT_TYPE_8_19 = def(8_841_0_55);
|
public static final TransportVersion ML_INFERENCE_CUSTOM_SERVICE_INPUT_TYPE_8_19 = def(8_841_0_55);
|
||||||
public static final TransportVersion RANDOM_SAMPLER_QUERY_BUILDER_8_19 = def(8_841_0_56);
|
public static final TransportVersion RANDOM_SAMPLER_QUERY_BUILDER_8_19 = def(8_841_0_56);
|
||||||
public static final TransportVersion ML_INFERENCE_SAGEMAKER_ELASTIC_8_19 = def(8_841_0_57);
|
public static final TransportVersion ML_INFERENCE_SAGEMAKER_ELASTIC_8_19 = def(8_841_0_57);
|
||||||
|
public static final TransportVersion SPARSE_VECTOR_FIELD_PRUNING_OPTIONS_8_19 = def(8_841_0_58);
|
||||||
public static final TransportVersion V_9_0_0 = def(9_000_0_09);
|
public static final TransportVersion V_9_0_0 = def(9_000_0_09);
|
||||||
public static final TransportVersion INITIAL_ELASTICSEARCH_9_0_1 = def(9_000_0_10);
|
public static final TransportVersion INITIAL_ELASTICSEARCH_9_0_1 = def(9_000_0_10);
|
||||||
public static final TransportVersion INITIAL_ELASTICSEARCH_9_0_2 = def(9_000_0_11);
|
public static final TransportVersion INITIAL_ELASTICSEARCH_9_0_2 = def(9_000_0_11);
|
||||||
|
@ -313,6 +313,7 @@ public class TransportVersions {
|
||||||
public static final TransportVersion STREAMS_LOGS_SUPPORT = def(9_104_0_00);
|
public static final TransportVersion STREAMS_LOGS_SUPPORT = def(9_104_0_00);
|
||||||
public static final TransportVersion ML_INFERENCE_CUSTOM_SERVICE_INPUT_TYPE = def(9_105_0_00);
|
public static final TransportVersion ML_INFERENCE_CUSTOM_SERVICE_INPUT_TYPE = def(9_105_0_00);
|
||||||
public static final TransportVersion ML_INFERENCE_SAGEMAKER_ELASTIC = def(9_106_0_00);
|
public static final TransportVersion ML_INFERENCE_SAGEMAKER_ELASTIC = def(9_106_0_00);
|
||||||
|
public static final TransportVersion SPARSE_VECTOR_FIELD_PRUNING_OPTIONS = def(9_107_0_00);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* STOP! READ THIS FIRST! No, really,
|
* STOP! READ THIS FIRST! No, really,
|
||||||
|
|
|
@ -144,6 +144,7 @@ public class IndexVersions {
|
||||||
public static final IndexVersion INDEX_INT_SORT_INT_TYPE_8_19 = def(8_532_0_00, Version.LUCENE_9_12_1);
|
public static final IndexVersion INDEX_INT_SORT_INT_TYPE_8_19 = def(8_532_0_00, Version.LUCENE_9_12_1);
|
||||||
public static final IndexVersion MAPPER_TEXT_MATCH_ONLY_MULTI_FIELDS_DEFAULT_NOT_STORED_8_19 = def(8_533_0_00, Version.LUCENE_9_12_1);
|
public static final IndexVersion MAPPER_TEXT_MATCH_ONLY_MULTI_FIELDS_DEFAULT_NOT_STORED_8_19 = def(8_533_0_00, Version.LUCENE_9_12_1);
|
||||||
public static final IndexVersion UPGRADE_TO_LUCENE_9_12_2 = def(8_534_0_00, Version.LUCENE_9_12_2);
|
public static final IndexVersion UPGRADE_TO_LUCENE_9_12_2 = def(8_534_0_00, Version.LUCENE_9_12_2);
|
||||||
|
public static final IndexVersion SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT_BACKPORT_8_X = def(8_535_0_00, Version.LUCENE_9_12_2);
|
||||||
public static final IndexVersion UPGRADE_TO_LUCENE_10_0_0 = def(9_000_0_00, Version.LUCENE_10_0_0);
|
public static final IndexVersion UPGRADE_TO_LUCENE_10_0_0 = def(9_000_0_00, Version.LUCENE_10_0_0);
|
||||||
public static final IndexVersion LOGSDB_DEFAULT_IGNORE_DYNAMIC_BEYOND_LIMIT = def(9_001_0_00, Version.LUCENE_10_0_0);
|
public static final IndexVersion LOGSDB_DEFAULT_IGNORE_DYNAMIC_BEYOND_LIMIT = def(9_001_0_00, Version.LUCENE_10_0_0);
|
||||||
public static final IndexVersion TIME_BASED_K_ORDERED_DOC_ID = def(9_002_0_00, Version.LUCENE_10_0_0);
|
public static final IndexVersion TIME_BASED_K_ORDERED_DOC_ID = def(9_002_0_00, Version.LUCENE_10_0_0);
|
||||||
|
@ -175,6 +176,7 @@ public class IndexVersions {
|
||||||
public static final IndexVersion INDEX_INT_SORT_INT_TYPE = def(9_028_0_00, Version.LUCENE_10_2_1);
|
public static final IndexVersion INDEX_INT_SORT_INT_TYPE = def(9_028_0_00, Version.LUCENE_10_2_1);
|
||||||
public static final IndexVersion MAPPER_TEXT_MATCH_ONLY_MULTI_FIELDS_DEFAULT_NOT_STORED = def(9_029_0_00, Version.LUCENE_10_2_1);
|
public static final IndexVersion MAPPER_TEXT_MATCH_ONLY_MULTI_FIELDS_DEFAULT_NOT_STORED = def(9_029_0_00, Version.LUCENE_10_2_1);
|
||||||
public static final IndexVersion UPGRADE_TO_LUCENE_10_2_2 = def(9_030_0_00, Version.LUCENE_10_2_2);
|
public static final IndexVersion UPGRADE_TO_LUCENE_10_2_2 = def(9_030_0_00, Version.LUCENE_10_2_2);
|
||||||
|
public static final IndexVersion SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT = def(9_031_0_00, Version.LUCENE_10_2_2);
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* STOP! READ THIS FIRST! No, really,
|
* STOP! READ THIS FIRST! No, really,
|
||||||
|
|
|
@ -17,6 +17,7 @@ import java.util.Set;
|
||||||
import static org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.RESCORE_VECTOR_QUANTIZED_VECTOR_MAPPING;
|
import static org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.RESCORE_VECTOR_QUANTIZED_VECTOR_MAPPING;
|
||||||
import static org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.RESCORE_ZERO_VECTOR_QUANTIZED_VECTOR_MAPPING;
|
import static org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.RESCORE_ZERO_VECTOR_QUANTIZED_VECTOR_MAPPING;
|
||||||
import static org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.USE_DEFAULT_OVERSAMPLE_VALUE_FOR_BBQ;
|
import static org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper.USE_DEFAULT_OVERSAMPLE_VALUE_FOR_BBQ;
|
||||||
|
import static org.elasticsearch.index.mapper.vectors.SparseVectorFieldMapper.SPARSE_VECTOR_INDEX_OPTIONS_FEATURE;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Spec for mapper-related features.
|
* Spec for mapper-related features.
|
||||||
|
@ -74,7 +75,8 @@ public class MapperFeatures implements FeatureSpecification {
|
||||||
USE_DEFAULT_OVERSAMPLE_VALUE_FOR_BBQ,
|
USE_DEFAULT_OVERSAMPLE_VALUE_FOR_BBQ,
|
||||||
IVF_FORMAT_CLUSTER_FEATURE,
|
IVF_FORMAT_CLUSTER_FEATURE,
|
||||||
IVF_NESTED_SUPPORT,
|
IVF_NESTED_SUPPORT,
|
||||||
SEARCH_LOAD_PER_SHARD
|
SEARCH_LOAD_PER_SHARD,
|
||||||
|
SPARSE_VECTOR_INDEX_OPTIONS_FEATURE
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
|
@ -22,6 +22,9 @@ import org.apache.lucene.search.Query;
|
||||||
import org.apache.lucene.util.BytesRef;
|
import org.apache.lucene.util.BytesRef;
|
||||||
import org.elasticsearch.common.logging.DeprecationCategory;
|
import org.elasticsearch.common.logging.DeprecationCategory;
|
||||||
import org.elasticsearch.common.lucene.Lucene;
|
import org.elasticsearch.common.lucene.Lucene;
|
||||||
|
import org.elasticsearch.common.xcontent.support.XContentMapValues;
|
||||||
|
import org.elasticsearch.core.Nullable;
|
||||||
|
import org.elasticsearch.features.NodeFeature;
|
||||||
import org.elasticsearch.index.IndexVersion;
|
import org.elasticsearch.index.IndexVersion;
|
||||||
import org.elasticsearch.index.IndexVersions;
|
import org.elasticsearch.index.IndexVersions;
|
||||||
import org.elasticsearch.index.analysis.NamedAnalyzer;
|
import org.elasticsearch.index.analysis.NamedAnalyzer;
|
||||||
|
@ -31,6 +34,7 @@ import org.elasticsearch.index.mapper.DocumentParserContext;
|
||||||
import org.elasticsearch.index.mapper.FieldMapper;
|
import org.elasticsearch.index.mapper.FieldMapper;
|
||||||
import org.elasticsearch.index.mapper.MappedFieldType;
|
import org.elasticsearch.index.mapper.MappedFieldType;
|
||||||
import org.elasticsearch.index.mapper.MapperBuilderContext;
|
import org.elasticsearch.index.mapper.MapperBuilderContext;
|
||||||
|
import org.elasticsearch.index.mapper.MappingParserContext;
|
||||||
import org.elasticsearch.index.mapper.SourceLoader;
|
import org.elasticsearch.index.mapper.SourceLoader;
|
||||||
import org.elasticsearch.index.mapper.SourceValueFetcher;
|
import org.elasticsearch.index.mapper.SourceValueFetcher;
|
||||||
import org.elasticsearch.index.mapper.TextSearchInfo;
|
import org.elasticsearch.index.mapper.TextSearchInfo;
|
||||||
|
@ -40,17 +44,27 @@ import org.elasticsearch.inference.WeightedToken;
|
||||||
import org.elasticsearch.inference.WeightedTokensUtils;
|
import org.elasticsearch.inference.WeightedTokensUtils;
|
||||||
import org.elasticsearch.search.fetch.StoredFieldsSpec;
|
import org.elasticsearch.search.fetch.StoredFieldsSpec;
|
||||||
import org.elasticsearch.search.lookup.Source;
|
import org.elasticsearch.search.lookup.Source;
|
||||||
|
import org.elasticsearch.xcontent.ConstructingObjectParser;
|
||||||
|
import org.elasticsearch.xcontent.DeprecationHandler;
|
||||||
|
import org.elasticsearch.xcontent.NamedXContentRegistry;
|
||||||
|
import org.elasticsearch.xcontent.ParseField;
|
||||||
|
import org.elasticsearch.xcontent.ToXContent;
|
||||||
import org.elasticsearch.xcontent.XContentBuilder;
|
import org.elasticsearch.xcontent.XContentBuilder;
|
||||||
|
import org.elasticsearch.xcontent.XContentParser;
|
||||||
import org.elasticsearch.xcontent.XContentParser.Token;
|
import org.elasticsearch.xcontent.XContentParser.Token;
|
||||||
|
import org.elasticsearch.xcontent.XContentType;
|
||||||
|
import org.elasticsearch.xcontent.support.MapXContentParser;
|
||||||
|
|
||||||
import java.io.IOException;
|
import java.io.IOException;
|
||||||
import java.io.UncheckedIOException;
|
import java.io.UncheckedIOException;
|
||||||
import java.util.LinkedHashMap;
|
import java.util.LinkedHashMap;
|
||||||
import java.util.List;
|
import java.util.List;
|
||||||
import java.util.Map;
|
import java.util.Map;
|
||||||
|
import java.util.Objects;
|
||||||
import java.util.stream.Stream;
|
import java.util.stream.Stream;
|
||||||
|
|
||||||
import static org.elasticsearch.index.query.AbstractQueryBuilder.DEFAULT_BOOST;
|
import static org.elasticsearch.index.query.AbstractQueryBuilder.DEFAULT_BOOST;
|
||||||
|
import static org.elasticsearch.xcontent.ConstructingObjectParser.optionalConstructorArg;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* A {@link FieldMapper} that exposes Lucene's {@link FeatureField} as a sparse
|
* A {@link FieldMapper} that exposes Lucene's {@link FeatureField} as a sparse
|
||||||
|
@ -59,6 +73,7 @@ import static org.elasticsearch.index.query.AbstractQueryBuilder.DEFAULT_BOOST;
|
||||||
public class SparseVectorFieldMapper extends FieldMapper {
|
public class SparseVectorFieldMapper extends FieldMapper {
|
||||||
|
|
||||||
public static final String CONTENT_TYPE = "sparse_vector";
|
public static final String CONTENT_TYPE = "sparse_vector";
|
||||||
|
public static final String SPARSE_VECTOR_INDEX_OPTIONS = "index_options";
|
||||||
|
|
||||||
static final String ERROR_MESSAGE_7X = "[sparse_vector] field type in old 7.x indices is allowed to "
|
static final String ERROR_MESSAGE_7X = "[sparse_vector] field type in old 7.x indices is allowed to "
|
||||||
+ "contain [sparse_vector] fields, but they cannot be indexed or searched.";
|
+ "contain [sparse_vector] fields, but they cannot be indexed or searched.";
|
||||||
|
@ -67,17 +82,34 @@ public class SparseVectorFieldMapper extends FieldMapper {
|
||||||
|
|
||||||
static final IndexVersion NEW_SPARSE_VECTOR_INDEX_VERSION = IndexVersions.NEW_SPARSE_VECTOR;
|
static final IndexVersion NEW_SPARSE_VECTOR_INDEX_VERSION = IndexVersions.NEW_SPARSE_VECTOR;
|
||||||
static final IndexVersion SPARSE_VECTOR_IN_FIELD_NAMES_INDEX_VERSION = IndexVersions.SPARSE_VECTOR_IN_FIELD_NAMES_SUPPORT;
|
static final IndexVersion SPARSE_VECTOR_IN_FIELD_NAMES_INDEX_VERSION = IndexVersions.SPARSE_VECTOR_IN_FIELD_NAMES_SUPPORT;
|
||||||
|
static final IndexVersion SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_VERSION = IndexVersions.SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT;
|
||||||
|
static final IndexVersion SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_VERSION_8_X =
|
||||||
|
IndexVersions.SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT_BACKPORT_8_X;
|
||||||
|
|
||||||
|
public static final NodeFeature SPARSE_VECTOR_INDEX_OPTIONS_FEATURE = new NodeFeature("sparse_vector.index_options_supported");
|
||||||
|
|
||||||
private static SparseVectorFieldMapper toType(FieldMapper in) {
|
private static SparseVectorFieldMapper toType(FieldMapper in) {
|
||||||
return (SparseVectorFieldMapper) in;
|
return (SparseVectorFieldMapper) in;
|
||||||
}
|
}
|
||||||
|
|
||||||
public static class Builder extends FieldMapper.Builder {
|
public static class Builder extends FieldMapper.Builder {
|
||||||
|
private final IndexVersion indexVersionCreated;
|
||||||
|
|
||||||
private final Parameter<Boolean> stored = Parameter.storeParam(m -> toType(m).fieldType().isStored(), false);
|
private final Parameter<Boolean> stored = Parameter.storeParam(m -> toType(m).fieldType().isStored(), false);
|
||||||
private final Parameter<Map<String, String>> meta = Parameter.metaParam();
|
private final Parameter<Map<String, String>> meta = Parameter.metaParam();
|
||||||
|
private final Parameter<IndexOptions> indexOptions = new Parameter<>(
|
||||||
|
SPARSE_VECTOR_INDEX_OPTIONS,
|
||||||
|
true,
|
||||||
|
() -> null,
|
||||||
|
(n, c, o) -> parseIndexOptions(c, o),
|
||||||
|
m -> toType(m).fieldType().indexOptions,
|
||||||
|
XContentBuilder::field,
|
||||||
|
Objects::toString
|
||||||
|
).acceptsNull().setSerializerCheck(this::indexOptionsSerializerCheck);
|
||||||
|
|
||||||
public Builder(String name) {
|
public Builder(String name, IndexVersion indexVersionCreated) {
|
||||||
super(name);
|
super(name);
|
||||||
|
this.indexVersionCreated = indexVersionCreated;
|
||||||
}
|
}
|
||||||
|
|
||||||
public Builder setStored(boolean value) {
|
public Builder setStored(boolean value) {
|
||||||
|
@ -87,17 +119,74 @@ public class SparseVectorFieldMapper extends FieldMapper {
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
protected Parameter<?>[] getParameters() {
|
protected Parameter<?>[] getParameters() {
|
||||||
return new Parameter<?>[] { stored, meta };
|
return new Parameter<?>[] { stored, meta, indexOptions };
|
||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
public SparseVectorFieldMapper build(MapperBuilderContext context) {
|
public SparseVectorFieldMapper build(MapperBuilderContext context) {
|
||||||
|
IndexOptions builderIndexOptions = indexOptions.getValue();
|
||||||
|
if (builderIndexOptions == null) {
|
||||||
|
builderIndexOptions = getDefaultIndexOptions(indexVersionCreated);
|
||||||
|
}
|
||||||
|
|
||||||
return new SparseVectorFieldMapper(
|
return new SparseVectorFieldMapper(
|
||||||
leafName(),
|
leafName(),
|
||||||
new SparseVectorFieldType(context.buildFullName(leafName()), stored.getValue(), meta.getValue()),
|
new SparseVectorFieldType(
|
||||||
|
indexVersionCreated,
|
||||||
|
context.buildFullName(leafName()),
|
||||||
|
stored.getValue(),
|
||||||
|
meta.getValue(),
|
||||||
|
builderIndexOptions
|
||||||
|
),
|
||||||
builderParams(this, context)
|
builderParams(this, context)
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
private IndexOptions getDefaultIndexOptions(IndexVersion indexVersion) {
|
||||||
|
return (indexVersion.onOrAfter(SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_VERSION)
|
||||||
|
|| indexVersion.between(SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_VERSION_8_X, IndexVersions.UPGRADE_TO_LUCENE_10_0_0))
|
||||||
|
? IndexOptions.DEFAULT_PRUNING_INDEX_OPTIONS
|
||||||
|
: null;
|
||||||
|
}
|
||||||
|
|
||||||
|
private boolean indexOptionsSerializerCheck(boolean includeDefaults, boolean isConfigured, IndexOptions value) {
|
||||||
|
return includeDefaults || (IndexOptions.isDefaultOptions(value, indexVersionCreated) == false);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
public IndexOptions getIndexOptions() {
|
||||||
|
return fieldType().getIndexOptions();
|
||||||
|
}
|
||||||
|
|
||||||
|
private static final ConstructingObjectParser<IndexOptions, Void> INDEX_OPTIONS_PARSER = new ConstructingObjectParser<>(
|
||||||
|
SPARSE_VECTOR_INDEX_OPTIONS,
|
||||||
|
args -> new IndexOptions((Boolean) args[0], (TokenPruningConfig) args[1])
|
||||||
|
);
|
||||||
|
|
||||||
|
static {
|
||||||
|
INDEX_OPTIONS_PARSER.declareBoolean(optionalConstructorArg(), IndexOptions.PRUNE_FIELD_NAME);
|
||||||
|
INDEX_OPTIONS_PARSER.declareObject(optionalConstructorArg(), TokenPruningConfig.PARSER, IndexOptions.PRUNING_CONFIG_FIELD_NAME);
|
||||||
|
}
|
||||||
|
|
||||||
|
private static SparseVectorFieldMapper.IndexOptions parseIndexOptions(MappingParserContext context, Object propNode) {
|
||||||
|
if (propNode == null) {
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
Map<String, Object> indexOptionsMap = XContentMapValues.nodeMapValue(propNode, SPARSE_VECTOR_INDEX_OPTIONS);
|
||||||
|
|
||||||
|
XContentParser parser = new MapXContentParser(
|
||||||
|
NamedXContentRegistry.EMPTY,
|
||||||
|
DeprecationHandler.IGNORE_DEPRECATIONS,
|
||||||
|
indexOptionsMap,
|
||||||
|
XContentType.JSON
|
||||||
|
);
|
||||||
|
|
||||||
|
try {
|
||||||
|
return INDEX_OPTIONS_PARSER.parse(parser, null);
|
||||||
|
} catch (IOException e) {
|
||||||
|
throw new UncheckedIOException(e);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
public static final TypeParser PARSER = new TypeParser((n, c) -> {
|
public static final TypeParser PARSER = new TypeParser((n, c) -> {
|
||||||
|
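The version gate used by `getDefaultIndexOptions` in the hunk above can be illustrated with plain integers. This is a sketch under stated assumptions, not the Elasticsearch code: the class `PruningDefaultVersionSketch` is hypothetical, the ids are copied from the `IndexVersions` constants in this commit, and `between(lower, upper)` is assumed to include the lower bound and exclude the upper bound.

```java
final class PruningDefaultVersionSketch {

    // Index version ids as they appear in the IndexVersions hunks of this commit.
    static final int PRUNING_SUPPORT_BACKPORT_8_X = 8_535_0_00;
    static final int UPGRADE_TO_LUCENE_10_0_0 = 9_000_0_00;
    static final int PRUNING_SUPPORT = 9_031_0_00;

    /** True when an index created at this version gets default pruning index options. */
    static boolean supportsDefaultPruning(int indexVersionId) {
        // 9.x indices created on or after the version that introduced the feature.
        boolean onOrAfterSupport = indexVersionId >= PRUNING_SUPPORT;
        // 8.x backport range; assumed lower-inclusive, upper-exclusive.
        boolean inBackportRange = indexVersionId >= PRUNING_SUPPORT_BACKPORT_8_X
            && indexVersionId < UPGRADE_TO_LUCENE_10_0_0;
        return onOrAfterSupport || inBackportRange;
    }
}
```

Under these assumptions, an index created between `9_000_0_00` and `9_030_0_00` falls into neither range, so it gets no default `index_options` and the mapper falls back to `null`.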
@ -107,13 +196,31 @@ public class SparseVectorFieldMapper extends FieldMapper {
|
||||||
throw new IllegalArgumentException(ERROR_MESSAGE_8X);
|
throw new IllegalArgumentException(ERROR_MESSAGE_8X);
|
||||||
}
|
}
|
||||||
|
|
||||||
return new Builder(n);
|
return new Builder(n, c.indexVersionCreated());
|
||||||
}, notInMultiFields(CONTENT_TYPE));
|
}, notInMultiFields(CONTENT_TYPE));
|
||||||
|
|
||||||
public static final class SparseVectorFieldType extends MappedFieldType {
|
public static final class SparseVectorFieldType extends MappedFieldType {
|
||||||
|
private final IndexVersion indexVersionCreated;
|
||||||
|
private final IndexOptions indexOptions;
|
||||||
|
|
||||||
public SparseVectorFieldType(String name, boolean isStored, Map<String, String> meta) {
|
public SparseVectorFieldType(IndexVersion indexVersionCreated, String name, boolean isStored, Map<String, String> meta) {
|
||||||
|
this(indexVersionCreated, name, isStored, meta, null);
|
||||||
|
}
|
||||||
|
|
||||||
|
public SparseVectorFieldType(
|
||||||
|
IndexVersion indexVersionCreated,
|
||||||
|
String name,
|
||||||
|
boolean isStored,
|
||||||
|
Map<String, String> meta,
|
||||||
|
@Nullable SparseVectorFieldMapper.IndexOptions indexOptions
|
||||||
|
) {
|
||||||
super(name, true, isStored, false, TextSearchInfo.SIMPLE_MATCH_ONLY, meta);
|
super(name, true, isStored, false, TextSearchInfo.SIMPLE_MATCH_ONLY, meta);
|
||||||
|
this.indexVersionCreated = indexVersionCreated;
|
||||||
|
this.indexOptions = indexOptions;
|
||||||
|
}
|
||||||
|
|
||||||
|
public IndexOptions getIndexOptions() {
|
||||||
|
return indexOptions;
|
||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
|
@ -160,11 +267,30 @@ public class SparseVectorFieldMapper extends FieldMapper {
|
||||||
SearchExecutionContext context,
|
SearchExecutionContext context,
|
||||||
String fieldName,
|
String fieldName,
|
||||||
List<WeightedToken> queryVectors,
|
List<WeightedToken> queryVectors,
|
||||||
boolean shouldPruneTokens,
|
Boolean shouldPruneTokensFromQuery,
|
||||||
TokenPruningConfig tokenPruningConfig
|
TokenPruningConfig tokenPruningConfigFromQuery
|
||||||
) throws IOException {
|
) throws IOException {
|
||||||
return (shouldPruneTokens)
|
Boolean shouldPruneTokens = shouldPruneTokensFromQuery;
|
||||||
? WeightedTokensUtils.queryBuilderWithPrunedTokens(fieldName, tokenPruningConfig, queryVectors, this, context)
|
TokenPruningConfig tokenPruningConfig = tokenPruningConfigFromQuery;
|
||||||
|
|
||||||
|
if (indexOptions != null) {
|
||||||
|
if (shouldPruneTokens == null && indexOptions.prune != null) {
|
||||||
|
shouldPruneTokens = indexOptions.prune;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (tokenPruningConfig == null && indexOptions.pruningConfig != null) {
|
||||||
|
tokenPruningConfig = indexOptions.pruningConfig;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return (shouldPruneTokens != null && shouldPruneTokens)
|
||||||
|
? WeightedTokensUtils.queryBuilderWithPrunedTokens(
|
||||||
|
fieldName,
|
||||||
|
tokenPruningConfig == null ? new TokenPruningConfig() : tokenPruningConfig,
|
||||||
|
queryVectors,
|
||||||
|
this,
|
||||||
|
context
|
||||||
|
)
|
||||||
: WeightedTokensUtils.queryBuilderWithAllTokens(fieldName, queryVectors, this, context);
|
: WeightedTokensUtils.queryBuilderWithAllTokens(fieldName, queryVectors, this, context);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
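The precedence applied in the hunk above (query value first, then the field's `index_options`, then a default configuration) can be sketched in isolation. The class `PruningOptionResolutionSketch` and its generic `resolveConfig` helper are hypothetical stand-ins for illustration; in the commit itself the fallback is `new TokenPruningConfig()`.

```java
final class PruningOptionResolutionSketch {

    /**
     * Decides whether to prune: an explicit query setting wins,
     * otherwise the field's index_options value is used; unset everywhere means no pruning.
     */
    static boolean resolvePrune(Boolean pruneFromQuery, Boolean pruneFromIndexOptions) {
        Boolean prune = pruneFromQuery != null ? pruneFromQuery : pruneFromIndexOptions;
        return prune != null && prune;
    }

    /**
     * Picks the pruning configuration: query first, then index_options,
     * then the supplied default.
     */
    static <T> T resolveConfig(T configFromQuery, T configFromIndexOptions, T defaultConfig) {
        if (configFromQuery != null) {
            return configFromQuery;
        }
        if (configFromIndexOptions != null) {
            return configFromIndexOptions;
        }
        return defaultConfig;
    }
}
```

A query that sets `prune` to `false` therefore disables pruning even when the field mapping enables it, while a query that leaves `prune` unset inherits the mapping's value.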
@ -195,7 +321,7 @@ public class SparseVectorFieldMapper extends FieldMapper {
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
public FieldMapper.Builder getMergeBuilder() {
|
public FieldMapper.Builder getMergeBuilder() {
|
||||||
return new Builder(leafName()).init(this);
|
return new Builder(leafName(), this.fieldType().indexVersionCreated).init(this);
|
||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
|
@ -273,6 +399,12 @@ public class SparseVectorFieldMapper extends FieldMapper {
|
||||||
return CONTENT_TYPE;
|
return CONTENT_TYPE;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
private static boolean indexVersionSupportsDefaultPruningConfig(IndexVersion indexVersion) {
|
||||||
|
// default pruning for 9.1.0+ or 8.19.0+ is true for this index
|
||||||
|
return (indexVersion.onOrAfter(SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_VERSION)
|
||||||
|
|| indexVersion.between(SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_VERSION_8_X, IndexVersions.UPGRADE_TO_LUCENE_10_0_0));
|
||||||
|
}
|
||||||
|
|
||||||
private static class SparseVectorValueFetcher implements ValueFetcher {
|
private static class SparseVectorValueFetcher implements ValueFetcher {
|
||||||
private final String fieldName;
|
private final String fieldName;
|
||||||
private TermVectors termVectors;
|
private TermVectors termVectors;
|
||||||
|
@ -383,4 +515,79 @@ public class SparseVectorFieldMapper extends FieldMapper {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
public static class IndexOptions implements ToXContent {
|
||||||
|
public static final ParseField PRUNE_FIELD_NAME = new ParseField("prune");
|
||||||
|
public static final ParseField PRUNING_CONFIG_FIELD_NAME = new ParseField("pruning_config");
|
||||||
|
public static final IndexOptions DEFAULT_PRUNING_INDEX_OPTIONS = new IndexOptions(true, new TokenPruningConfig());
|
||||||
|
|
||||||
|
final Boolean prune;
|
||||||
|
final TokenPruningConfig pruningConfig;
|
||||||
|
|
||||||
|
        IndexOptions(@Nullable Boolean prune, @Nullable TokenPruningConfig pruningConfig) {
            if (pruningConfig != null && (prune == null || prune == false)) {
                throw new IllegalArgumentException(
                    "["
                        + SPARSE_VECTOR_INDEX_OPTIONS
                        + "] field ["
                        + PRUNING_CONFIG_FIELD_NAME.getPreferredName()
                        + "] should only be set if ["
                        + PRUNE_FIELD_NAME.getPreferredName()
                        + "] is set to true"
                );
            }

            this.prune = prune;
            this.pruningConfig = pruningConfig;
        }

        public static boolean isDefaultOptions(IndexOptions indexOptions, IndexVersion indexVersion) {
            IndexOptions defaultIndexOptions = indexVersionSupportsDefaultPruningConfig(indexVersion)
                ? DEFAULT_PRUNING_INDEX_OPTIONS
                : null;

            return Objects.equals(indexOptions, defaultIndexOptions);
        }

        public Boolean getPrune() {
            return prune;
        }

        public TokenPruningConfig getPruningConfig() {
            return pruningConfig;
        }

        @Override
        public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
            builder.startObject();

            if (prune != null) {
                builder.field(PRUNE_FIELD_NAME.getPreferredName(), prune);
            }
            if (pruningConfig != null) {
                builder.field(PRUNING_CONFIG_FIELD_NAME.getPreferredName(), pruningConfig);
            }

            builder.endObject();
            return builder;
        }

        @Override
        public final boolean equals(Object other) {
            if (other == this) {
                return true;
            }

            if (other == null || getClass() != other.getClass()) {
                return false;
            }

            IndexOptions otherAsIndexOptions = (IndexOptions) other;
            return Objects.equals(prune, otherAsIndexOptions.prune) && Objects.equals(pruningConfig, otherAsIndexOptions.pruningConfig);
        }

        @Override
        public final int hashCode() {
            return Objects.hash(prune, pruningConfig);
        }
    }
}
|
||||||
|
|
|
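Reviewer note: the constructor check and `isDefaultOptions` above can be exercised in isolation. The following is a minimal sketch, not the Elasticsearch class: `IndexOptionsSketch`, `PruningConfig`, and the boolean `versionSupportsDefaultPruning` are hypothetical stand-ins, and the default thresholds `5.0f` / `0.4f` are assumptions (the diff references the constants but does not show their values).

```java
import java.util.Objects;

// Hypothetical stand-in for the mapper's IndexOptions; not the Elasticsearch class.
final class IndexOptionsSketch {
    // Assumed placeholder for TokenPruningConfig, reduced to two thresholds.
    record PruningConfig(float ratioThreshold, float weightThreshold) {}

    // Assumed default: prune=true with a default pruning config (threshold values are assumptions).
    static final IndexOptionsSketch DEFAULT = new IndexOptionsSketch(true, new PruningConfig(5.0f, 0.4f));

    final Boolean prune;
    final PruningConfig pruningConfig;

    IndexOptionsSketch(Boolean prune, PruningConfig pruningConfig) {
        // Same rule as in the diff: a pruning config is only valid when prune is explicitly true.
        if (pruningConfig != null && (prune == null || prune == false)) {
            throw new IllegalArgumentException(
                "[index_options] field [pruning_config] should only be set if [prune] is set to true"
            );
        }
        this.prune = prune;
        this.pruningConfig = pruningConfig;
    }

    // Older index versions have no default options, so "default" there means null.
    static boolean isDefaultOptions(IndexOptionsSketch options, boolean versionSupportsDefaultPruning) {
        IndexOptionsSketch expected = versionSupportsDefaultPruning ? DEFAULT : null;
        return Objects.equals(options, expected);
    }

    @Override
    public boolean equals(Object other) {
        if (other == this) {
            return true;
        }
        if (other == null || getClass() != other.getClass()) {
            return false;
        }
        IndexOptionsSketch that = (IndexOptionsSketch) other;
        return Objects.equals(prune, that.prune) && Objects.equals(pruningConfig, that.pruningConfig);
    }

    @Override
    public int hashCode() {
        return Objects.hash(prune, pruningConfig);
    }
}
```

Comparing against `null` on older versions is what lets the serializer skip `index_options` entirely when nothing was configured, which matches the "only serialize index options if not default" item in the commit message.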
@@ -14,16 +14,25 @@ import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.io.stream.Writeable;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.xcontent.ConstructingObjectParser;
import org.elasticsearch.xcontent.DeprecationHandler;
import org.elasticsearch.xcontent.NamedXContentRegistry;
import org.elasticsearch.xcontent.ParseField;
import org.elasticsearch.xcontent.ToXContentObject;
import org.elasticsearch.xcontent.XContentBuilder;
import org.elasticsearch.xcontent.XContentParser;
import org.elasticsearch.xcontent.XContentType;
import org.elasticsearch.xcontent.support.MapXContentParser;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

import static org.elasticsearch.xcontent.ConstructingObjectParser.optionalConstructorArg;

public class TokenPruningConfig implements Writeable, ToXContentObject {
    public static final String PRUNING_CONFIG_FIELD = "pruning_config";
    public static final ParseField TOKENS_FREQ_RATIO_THRESHOLD = new ParseField("tokens_freq_ratio_threshold");
|
||||||
|
@@ -176,4 +185,38 @@ public class TokenPruningConfig implements Writeable, ToXContentObject {
        }
        return new TokenPruningConfig(ratioThreshold, weightThreshold, onlyScorePrunedTokens);
    }

    public static final ConstructingObjectParser<TokenPruningConfig, Void> PARSER = new ConstructingObjectParser<>(
        PRUNING_CONFIG_FIELD,
        args -> new TokenPruningConfig(
            args[0] == null ? DEFAULT_TOKENS_FREQ_RATIO_THRESHOLD : (Float) args[0],
            args[1] == null ? DEFAULT_TOKENS_WEIGHT_THRESHOLD : (Float) args[1],
            args[2] != null && (Boolean) args[2]
        )
    );

    static {
        PARSER.declareFloat(optionalConstructorArg(), TOKENS_FREQ_RATIO_THRESHOLD);
        PARSER.declareFloat(optionalConstructorArg(), TOKENS_WEIGHT_THRESHOLD);
        PARSER.declareBoolean(optionalConstructorArg(), ONLY_SCORE_PRUNED_TOKENS_FIELD);
    }

    public static TokenPruningConfig parseFromMap(Map<String, Object> pruningConfigMap) {
        if (pruningConfigMap == null) {
            return null;
        }

        try {
            XContentParser parser = new MapXContentParser(
                NamedXContentRegistry.EMPTY,
                DeprecationHandler.IGNORE_DEPRECATIONS,
                pruningConfigMap,
                XContentType.JSON
            );

            return PARSER.parse(parser, null);
        } catch (IOException ioEx) {
            throw new UncheckedIOException(ioEx);
        }
    }
}
|
||||||
|
|
|
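Reviewer note: the parser above fills in defaults for any missing optional argument, and the tests later in this diff expect range errors for out-of-bounds thresholds. A minimal, dependency-free sketch of that behavior follows. `PruningConfigSketch` and `fromMap` are hypothetical names, the default values `5.0f` and `0.4f` and the key `only_score_pruned_tokens` are assumptions not shown in this diff, and the error messages are copied from the test expectations rather than from the real constructor.

```java
import java.util.Map;

// Hypothetical, simplified model of TokenPruningConfig.parseFromMap; not the Elasticsearch class.
final class PruningConfigSketch {
    // Assumed defaults: the diff references these constants but does not show their values.
    static final float DEFAULT_RATIO = 5.0f;
    static final float DEFAULT_WEIGHT = 0.4f;

    final float ratioThreshold;
    final float weightThreshold;
    final boolean onlyScorePrunedTokens;

    PruningConfigSketch(float ratioThreshold, float weightThreshold, boolean onlyScorePrunedTokens) {
        // Range checks mirror the messages asserted in the mapper tests.
        if (ratioThreshold < 1 || ratioThreshold > 100) {
            throw new IllegalArgumentException(
                "[tokens_freq_ratio_threshold] must be between [1] and [100], got " + ratioThreshold
            );
        }
        if (weightThreshold < 0 || weightThreshold > 1) {
            throw new IllegalArgumentException("[tokens_weight_threshold] must be between 0 and 1");
        }
        this.ratioThreshold = ratioThreshold;
        this.weightThreshold = weightThreshold;
        this.onlyScorePrunedTokens = onlyScorePrunedTokens;
    }

    // A null map means "no pruning config"; missing keys fall back to the defaults.
    static PruningConfigSketch fromMap(Map<String, Object> map) {
        if (map == null) {
            return null;
        }
        Object ratio = map.get("tokens_freq_ratio_threshold");
        Object weight = map.get("tokens_weight_threshold");
        Object only = map.get("only_score_pruned_tokens"); // assumed key name
        return new PruningConfigSketch(
            ratio == null ? DEFAULT_RATIO : ((Number) ratio).floatValue(),
            weight == null ? DEFAULT_WEIGHT : ((Number) weight).floatValue(),
            only != null && (Boolean) only
        );
    }
}
```

This is why a mapping that sets only `tokens_freq_ratio_threshold` serializes back with the default `tokens_weight_threshold` filled in, as `serializedMappingWithSomeIndexOptions` expects.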
@@ -14,11 +14,17 @@ import org.apache.lucene.analysis.tokenattributes.TermFrequencyAttribute;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.tests.index.RandomIndexWriter;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.compress.CompressedXContent;
import org.elasticsearch.core.CheckedConsumer;
import org.elasticsearch.core.Nullable;
import org.elasticsearch.index.IndexVersion;
import org.elasticsearch.index.IndexVersions;
import org.elasticsearch.index.mapper.DocumentMapper;
|
||||||
|
@@ -28,20 +34,32 @@ import org.elasticsearch.index.mapper.MapperParsingException;
import org.elasticsearch.index.mapper.MapperService;
import org.elasticsearch.index.mapper.MapperTestCase;
import org.elasticsearch.index.mapper.ParsedDocument;
import org.elasticsearch.index.query.SearchExecutionContext;
import org.elasticsearch.inference.WeightedToken;
import org.elasticsearch.search.lookup.Source;
import org.elasticsearch.search.vectors.SparseVectorQueryWrapper;
import org.elasticsearch.test.index.IndexVersionUtils;
import org.elasticsearch.xcontent.ToXContent;
import org.elasticsearch.xcontent.XContentBuilder;
import org.elasticsearch.xcontent.XContentParseException;
import org.elasticsearch.xcontent.XContentType;
import org.elasticsearch.xcontent.json.JsonXContent;
import org.hamcrest.Matchers;
import org.junit.AssumptionViolatedException;

import java.io.IOException;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import static org.elasticsearch.index.IndexVersions.SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT;
import static org.elasticsearch.index.IndexVersions.UPGRADE_TO_LUCENE_10_0_0;
import static org.elasticsearch.index.mapper.vectors.SparseVectorFieldMapper.NEW_SPARSE_VECTOR_INDEX_VERSION;
import static org.elasticsearch.index.mapper.vectors.SparseVectorFieldMapper.PREVIOUS_SPARSE_VECTOR_INDEX_VERSION;
import static org.elasticsearch.index.mapper.vectors.SparseVectorFieldMapper.SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_VERSION;
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertToXContentEquivalent;
import static org.elasticsearch.xcontent.XContentFactory.jsonBuilder;
import static org.hamcrest.Matchers.containsString;
import static org.hamcrest.Matchers.equalTo;
|
||||||
|
@@ -67,6 +85,98 @@ public class SparseVectorFieldMapperTests extends MapperTestCase {
        b.field("type", "sparse_vector");
    }

    protected void minimalFieldMappingPreviousIndexDefaultsIncluded(XContentBuilder b) throws IOException {
        b.field("type", "sparse_vector");
        b.field("store", false);

        b.startObject("meta");
        b.endObject();

        b.field("index_options", (Object) null);
    }

    protected void minimalMappingWithExplicitDefaults(XContentBuilder b) throws IOException {
        b.field("type", "sparse_vector");
        b.field("store", false);

        b.startObject("meta");
        b.endObject();

        b.startObject("index_options");
        {
            b.field("prune", true);
            b.startObject("pruning_config");
            {
                b.field("tokens_freq_ratio_threshold", TokenPruningConfig.DEFAULT_TOKENS_FREQ_RATIO_THRESHOLD);
                b.field("tokens_weight_threshold", TokenPruningConfig.DEFAULT_TOKENS_WEIGHT_THRESHOLD);
            }
            b.endObject();
        }
        b.endObject();
    }

    protected void minimalMappingWithExplicitIndexOptions(XContentBuilder b) throws IOException {
        b.field("type", "sparse_vector");
        b.startObject("index_options");
        {
            b.field("prune", true);
            b.startObject("pruning_config");
            {
                b.field("tokens_freq_ratio_threshold", 3.0f);
                b.field("tokens_weight_threshold", 0.5f);
            }
            b.endObject();
        }
        b.endObject();
    }

    protected void serializedMappingWithSomeIndexOptions(XContentBuilder b) throws IOException {
        b.field("type", "sparse_vector");
        b.startObject("index_options");
        {
            b.field("prune", true);
            b.startObject("pruning_config");
            {
                b.field("tokens_freq_ratio_threshold", 3.0f);
                b.field("tokens_weight_threshold", TokenPruningConfig.DEFAULT_TOKENS_WEIGHT_THRESHOLD);
            }
            b.endObject();
        }
        b.endObject();
    }

    protected void minimalMappingWithSomeExplicitIndexOptions(XContentBuilder b) throws IOException {
        b.field("type", "sparse_vector");
        b.startObject("index_options");
        {
            b.field("prune", true);
            b.startObject("pruning_config");
            {
                b.field("tokens_freq_ratio_threshold", 3.0f);
            }
            b.endObject();
        }
        b.endObject();
    }

    protected void mappingWithIndexOptionsOnlyPruneTrue(XContentBuilder b) throws IOException {
        b.field("type", "sparse_vector");
        b.startObject("index_options");
        {
            b.field("prune", true);
        }
        b.endObject();
    }

    protected void mappingWithIndexOptionsPruneFalse(XContentBuilder b) throws IOException {
        b.field("type", "sparse_vector");
        b.startObject("index_options");
        {
            b.field("prune", false);
        }
        b.endObject();
    }

    @Override
    protected boolean supportsStoredFields() {
        return false;
|
||||||
|
@@ -120,6 +230,84 @@ public class SparseVectorFieldMapperTests extends MapperTestCase {
        assertTrue(freq1 < freq2);
    }

    public void testDefaultsWithAndWithoutIncludeDefaults() throws Exception {
        XContentBuilder orig = JsonXContent.contentBuilder().startObject();
        createMapperService(fieldMapping(this::minimalMapping)).mappingLookup().getMapper("field").toXContent(orig, INCLUDE_DEFAULTS);
        orig.endObject();

        XContentBuilder withDefaults = JsonXContent.contentBuilder().startObject();
        withDefaults.startObject("field");
        minimalMappingWithExplicitDefaults(withDefaults);
        withDefaults.endObject();
        withDefaults.endObject();

        assertToXContentEquivalent(BytesReference.bytes(withDefaults), BytesReference.bytes(orig), XContentType.JSON);

        XContentBuilder origWithoutDefaults = JsonXContent.contentBuilder().startObject();
        createMapperService(fieldMapping(this::minimalMapping)).mappingLookup()
            .getMapper("field")
            .toXContent(origWithoutDefaults, ToXContent.EMPTY_PARAMS);
        origWithoutDefaults.endObject();

        XContentBuilder withoutDefaults = JsonXContent.contentBuilder().startObject();
        withoutDefaults.startObject("field");
        minimalMapping(withoutDefaults);
        withoutDefaults.endObject();
        withoutDefaults.endObject();

        assertToXContentEquivalent(BytesReference.bytes(withoutDefaults), BytesReference.bytes(origWithoutDefaults), XContentType.JSON);
    }

    public void testDefaultsWithAndWithoutIncludeDefaultsOlderIndexVersion() throws Exception {
        IndexVersion indexVersion = IndexVersionUtils.randomVersionBetween(
            random(),
            UPGRADE_TO_LUCENE_10_0_0,
            IndexVersionUtils.getPreviousVersion(SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_VERSION)
        );

        XContentBuilder orig = JsonXContent.contentBuilder().startObject();
        createMapperService(indexVersion, fieldMapping(this::minimalMapping)).mappingLookup()
            .getMapper("field")
            .toXContent(orig, INCLUDE_DEFAULTS);
        orig.endObject();

        XContentBuilder withDefaults = JsonXContent.contentBuilder().startObject();
        withDefaults.startObject("field");
        minimalFieldMappingPreviousIndexDefaultsIncluded(withDefaults);
        withDefaults.endObject();
        withDefaults.endObject();

        assertToXContentEquivalent(BytesReference.bytes(withDefaults), BytesReference.bytes(orig), XContentType.JSON);

        XContentBuilder origWithoutDefaults = JsonXContent.contentBuilder().startObject();
        createMapperService(indexVersion, fieldMapping(this::minimalMapping)).mappingLookup()
            .getMapper("field")
            .toXContent(origWithoutDefaults, ToXContent.EMPTY_PARAMS);
        origWithoutDefaults.endObject();

        XContentBuilder withoutDefaults = JsonXContent.contentBuilder().startObject();
        withoutDefaults.startObject("field");
        minimalMapping(withoutDefaults);
        withoutDefaults.endObject();
        withoutDefaults.endObject();

        assertToXContentEquivalent(BytesReference.bytes(withoutDefaults), BytesReference.bytes(origWithoutDefaults), XContentType.JSON);
    }

    public void testMappingWithExplicitIndexOptions() throws Exception {
        DocumentMapper mapper = createDocumentMapper(fieldMapping(this::minimalMappingWithExplicitIndexOptions));
        assertEquals(Strings.toString(fieldMapping(this::minimalMappingWithExplicitIndexOptions)), mapper.mappingSource().toString());

        mapper = createDocumentMapper(fieldMapping(this::mappingWithIndexOptionsPruneFalse));
        assertEquals(Strings.toString(fieldMapping(this::mappingWithIndexOptionsPruneFalse)), mapper.mappingSource().toString());

        mapper = createDocumentMapper(fieldMapping(this::minimalMappingWithSomeExplicitIndexOptions));
        assertEquals(Strings.toString(fieldMapping(this::serializedMappingWithSomeIndexOptions)), mapper.mappingSource().toString());

        mapper = createDocumentMapper(fieldMapping(this::mappingWithIndexOptionsOnlyPruneTrue));
        assertEquals(Strings.toString(fieldMapping(this::mappingWithIndexOptionsOnlyPruneTrue)), mapper.mappingSource().toString());
    }

    public void testDotInFieldName() throws Exception {
        DocumentMapper mapper = createDocumentMapper(fieldMapping(this::minimalMapping));
        ParsedDocument parsedDocument = mapper.parse(source(b -> b.field("field", Map.of("foo.bar", 10, "foobar", 20))));
|
||||||
|
@@ -306,7 +494,7 @@ public class SparseVectorFieldMapperTests extends MapperTestCase {
|
||||||
return NEW_SPARSE_VECTOR_INDEX_VERSION;
|
return NEW_SPARSE_VECTOR_INDEX_VERSION;
|
||||||
}
|
}
|
||||||
|
|
||||||
public void testSparseVectorUnsupportedIndex() throws Exception {
|
public void testSparseVectorUnsupportedIndex() {
|
||||||
IndexVersion version = IndexVersionUtils.randomVersionBetween(
|
IndexVersion version = IndexVersionUtils.randomVersionBetween(
|
||||||
random(),
|
random(),
|
||||||
PREVIOUS_SPARSE_VECTOR_INDEX_VERSION,
|
PREVIOUS_SPARSE_VECTOR_INDEX_VERSION,
|
||||||
|
@@ -318,6 +506,393 @@ public class SparseVectorFieldMapperTests extends MapperTestCase {
        assertThat(e.getMessage(), containsString(SparseVectorFieldMapper.ERROR_MESSAGE_8X));
    }

    public void testPruneMustBeBoolean() {
        Exception e = expectThrows(MapperParsingException.class, () -> createMapperService(fieldMapping(b -> {
            b.field("type", "sparse_vector");
            b.startObject("index_options");
            b.field("prune", "othervalue");
            b.endObject();
        })));
        assertThat(e.getMessage(), containsString("[index_options] failed to parse field [prune]"));
        assertThat(e.getCause().getCause(), instanceOf(IllegalArgumentException.class));
        assertThat(
            e.getCause().getCause().getMessage(),
            containsString("Failed to parse value [othervalue] as only [true] or [false] are allowed.")
        );
    }

    public void testPruningConfigurationIsMap() {
        Exception e = expectThrows(MapperParsingException.class, () -> createMapperService(fieldMapping(b -> {
            b.field("type", "sparse_vector");
            b.startObject("index_options");
            b.field("prune", true);
            b.field("pruning_config", "this_is_not_a_map");
            b.endObject();
        })));
        assertThat(e.getMessage(), containsString("[index_options] pruning_config doesn't support values of type:"));
        assertThat(e.getCause(), instanceOf(XContentParseException.class));
        assertThat(
            e.getCause().getMessage(),
            containsString("[index_options] pruning_config doesn't support values of type: VALUE_STRING")
        );
    }

    public void testWithIndexOptionsPruningConfigPruneRequired() throws Exception {

        Exception eTestPruneIsFalse = expectThrows(MapperParsingException.class, () -> createMapperService(fieldMapping(b -> {
            b.field("type", "sparse_vector");
            b.startObject("index_options");
            b.field("prune", false);
            b.startObject("pruning_config");
            b.field("tokens_freq_ratio_threshold", 5.0);
            b.field("tokens_weight_threshold", 0.4);
            b.endObject();
            b.endObject();
        })));
        assertThat(eTestPruneIsFalse.getMessage(), containsString("[index_options] failed to parse field [pruning_config]"));
        assertThat(eTestPruneIsFalse.getCause().getCause().getCause(), instanceOf(IllegalArgumentException.class));
        assertThat(
            eTestPruneIsFalse.getCause().getCause().getCause().getMessage(),
            containsString("[index_options] field [pruning_config] should only be set if [prune] is set to true")
        );

        Exception eTestPruneIsMissing = expectThrows(MapperParsingException.class, () -> createMapperService(fieldMapping(b -> {
            b.field("type", "sparse_vector");
            b.startObject("index_options");
            b.startObject("pruning_config");
            b.field("tokens_freq_ratio_threshold", 5.0);
            b.field("tokens_weight_threshold", 0.4);
            b.endObject();
            b.endObject();
        })));
        assertThat(
            eTestPruneIsMissing.getMessage(),
            containsString("Failed to parse mapping: Failed to build [index_options] after last required field arrived")
        );
        assertThat(eTestPruneIsMissing.getCause().getCause(), instanceOf(IllegalArgumentException.class));
        assertThat(
            eTestPruneIsMissing.getCause().getCause().getMessage(),
            containsString("[index_options] field [pruning_config] should only be set if [prune] is set to true")
        );
    }

    public void testTokensFreqRatioCorrect() {
        Exception eTestInteger = expectThrows(MapperParsingException.class, () -> createMapperService(fieldMapping(b -> {
            b.field("type", "sparse_vector");
            b.startObject("index_options");
            b.field("prune", true);
            b.startObject("pruning_config");
            b.field("tokens_freq_ratio_threshold", "notaninteger");
            b.endObject();
            b.endObject();
        })));
        assertThat(
            eTestInteger.getMessage(),
            containsString("Failed to parse mapping: [0:0] [index_options] failed to parse field [pruning_config]")
        );
        assertThat(eTestInteger.getCause().getCause(), instanceOf(XContentParseException.class));
        assertThat(
            eTestInteger.getCause().getCause().getMessage(),
            containsString("[pruning_config] failed to parse field [tokens_freq_ratio_threshold]")
        );
        assertThat(eTestInteger.getCause().getCause().getCause(), instanceOf(NumberFormatException.class));
        assertThat(eTestInteger.getCause().getCause().getCause().getMessage(), containsString("For input string: \"notaninteger\""));

        Exception eTestRangeLower = expectThrows(MapperParsingException.class, () -> createMapperService(fieldMapping(b -> {
            b.field("type", "sparse_vector");
            b.startObject("index_options");
            b.field("prune", true);
            b.startObject("pruning_config");
            b.field("tokens_freq_ratio_threshold", -2);
            b.endObject();
            b.endObject();
        })));
        assertThat(eTestRangeLower.getMessage(), containsString("[index_options] failed to parse field [pruning_config]"));
        assertThat(eTestRangeLower.getCause().getCause().getCause(), instanceOf(IllegalArgumentException.class));
        assertThat(
            eTestRangeLower.getCause().getCause().getCause().getMessage(),
            containsString("[tokens_freq_ratio_threshold] must be between [1] and [100], got -2.0")
        );

        Exception eTestRangeHigher = expectThrows(MapperParsingException.class, () -> createMapperService(fieldMapping(b -> {
            b.field("type", "sparse_vector");
            b.startObject("index_options");
            b.field("prune", true);
            b.startObject("pruning_config");
            b.field("tokens_freq_ratio_threshold", 101);
            b.endObject();
            b.endObject();
        })));
        assertThat(eTestRangeHigher.getMessage(), containsString("[index_options] failed to parse field [pruning_config]"));
        assertThat(eTestRangeHigher.getCause().getCause().getCause(), instanceOf(IllegalArgumentException.class));
        assertThat(
            eTestRangeHigher.getCause().getCause().getCause().getMessage(),
            containsString("[tokens_freq_ratio_threshold] must be between [1] and [100], got 101.0")
        );
    }

    public void testTokensWeightThresholdCorrect() {
        Exception eTestDouble = expectThrows(MapperParsingException.class, () -> createMapperService(fieldMapping(b -> {
            b.field("type", "sparse_vector");
            b.startObject("index_options");
            b.field("prune", true);
            b.startObject("pruning_config");
            b.field("tokens_weight_threshold", "notadouble");
            b.endObject();
            b.endObject();
        })));
        assertThat(eTestDouble.getMessage(), containsString("[index_options] failed to parse field [pruning_config]"));
        assertThat(eTestDouble.getCause().getCause().getCause(), instanceOf(NumberFormatException.class));
        assertThat(eTestDouble.getCause().getCause().getCause().getMessage(), containsString("For input string: \"notadouble\""));

        Exception eTestRangeLower = expectThrows(MapperParsingException.class, () -> createMapperService(fieldMapping(b -> {
            b.field("type", "sparse_vector");
            b.startObject("index_options");
            b.field("prune", true);
            b.startObject("pruning_config");
            b.field("tokens_weight_threshold", -0.1);
            b.endObject();
            b.endObject();
        })));
        assertThat(eTestRangeLower.getMessage(), containsString("[index_options] failed to parse field [pruning_config]"));
        assertThat(eTestRangeLower.getCause().getCause().getCause(), instanceOf(IllegalArgumentException.class));
        assertThat(
            eTestRangeLower.getCause().getCause().getCause().getMessage(),
            containsString("[tokens_weight_threshold] must be between 0 and 1")
        );

        Exception eTestRangeHigher = expectThrows(MapperParsingException.class, () -> createMapperService(fieldMapping(b -> {
            b.field("type", "sparse_vector");
            b.startObject("index_options");
            b.field("prune", true);
            b.startObject("pruning_config");
            b.field("tokens_weight_threshold", 1.1);
            b.endObject();
            b.endObject();
        })));
        assertThat(eTestRangeHigher.getMessage(), containsString("[index_options] failed to parse field [pruning_config]"));
        assertThat(eTestRangeHigher.getCause().getCause().getCause(), instanceOf(IllegalArgumentException.class));
        assertThat(
            eTestRangeHigher.getCause().getCause().getCause().getMessage(),
            containsString("[tokens_weight_threshold] must be between 0 and 1")
        );
    }
|
||||||
|
|
||||||
|
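Reviewer note: the validation tests above reach the underlying exception with a fixed number of `getCause()` calls, which pins the exact wrapping depth. Where only the type of the underlying failure matters, a small lookup helper is one alternative. This is a sketch only: `CauseChainSketch.findCause` is a hypothetical helper and is not part of the test framework used in this diff.

```java
// Hypothetical helper; not part of MapperTestCase or the Elasticsearch test framework.
final class CauseChainSketch {
    // Returns the first throwable in the cause chain (including the top one)
    // that is an instance of the requested type, or null if none is found.
    static <T extends Throwable> T findCause(Throwable top, Class<T> type) {
        for (Throwable t = top; t != null; t = t.getCause()) {
            if (type.isInstance(t)) {
                return type.cast(t);
            }
        }
        return null;
    }
}
```

The trade-off is that such a helper no longer verifies how deeply the parser wraps the error, which the fixed-depth assertions in this diff do check.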
    private void withSearchExecutionContext(MapperService mapperService, CheckedConsumer<SearchExecutionContext, IOException> consumer)
        throws IOException {
        var mapper = mapperService.documentMapper();
        try (Directory directory = newDirectory()) {
            RandomIndexWriter iw = new RandomIndexWriter(random(), directory);
            var sourceToParse = source(this::writeField);
            ParsedDocument doc1 = mapper.parse(sourceToParse);
            iw.addDocument(doc1.rootDoc());
            iw.close();

            try (DirectoryReader reader = wrapInMockESDirectoryReader(DirectoryReader.open(directory))) {
                var searchContext = createSearchExecutionContext(mapperService, newSearcher(reader));
                consumer.accept(searchContext);
            }
        }
    }

    public void testTypeQueryFinalizationWithRandomOptions() throws Exception {
        for (int i = 0; i < 20; i++) {
            runTestTypeQueryFinalization(
                randomBoolean(), // useIndexVersionBeforeIndexOptions
                randomBoolean(), // useMapperDefaultIndexOptions
                randomBoolean(), // setMapperIndexOptionsPruneToFalse
                randomBoolean(), // queryOverridesPruningConfig
                randomBoolean() // queryOverridesPruneToBeFalse
            );
        }
    }

    public void testTypeQueryFinalizationDefaultsCurrentVersion() throws Exception {
        IndexVersion version = IndexVersion.current();
        MapperService mapperService = createMapperService(version, fieldMapping(this::minimalMapping));

        // query should be pruned by default on newer index versions
        performTypeQueryFinalizationTest(mapperService, null, null, true);
    }

    public void testTypeQueryFinalizationDefaultsPreviousVersion() throws Exception {
        IndexVersion version = IndexVersionUtils.randomVersionBetween(
            random(),
            UPGRADE_TO_LUCENE_10_0_0,
            IndexVersionUtils.getPreviousVersion(SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT)
        );
        MapperService mapperService = createMapperService(version, fieldMapping(this::minimalMapping));

        // query should _not_ be pruned by default on older index versions
        performTypeQueryFinalizationTest(mapperService, null, null, false);
    }

    public void testTypeQueryFinalizationWithIndexExplicit() throws Exception {
        IndexVersion version = IndexVersion.current();
        MapperService mapperService = createMapperService(version, fieldMapping(this::minimalMapping));

        // query should be pruned via explicit index options
        performTypeQueryFinalizationTest(mapperService, null, null, true);
    }

    public void testTypeQueryFinalizationWithIndexExplicitDoNotPrune() throws Exception {
        IndexVersion version = IndexVersion.current();
        MapperService mapperService = createMapperService(version, fieldMapping(this::mappingWithIndexOptionsPruneFalse));

        // query should not be pruned, since the explicit index options disable pruning
        performTypeQueryFinalizationTest(mapperService, null, null, false);
    }

    public void testTypeQueryFinalizationQueryOverridesPruning() throws Exception {
        IndexVersion version = IndexVersion.current();
        MapperService mapperService = createMapperService(version, fieldMapping(this::mappingWithIndexOptionsPruneFalse));

        // query should still be pruned due to query builder setting it
        performTypeQueryFinalizationTest(mapperService, true, new TokenPruningConfig(), true);
    }

    public void testTypeQueryFinalizationQueryOverridesPruningOff() throws Exception {
        IndexVersion version = IndexVersion.current();
        MapperService mapperService = createMapperService(version, fieldMapping(this::mappingWithIndexOptionsPruneFalse));

        // query should not be pruned due to query builder setting it
        performTypeQueryFinalizationTest(mapperService, false, null, false);
    }

private void performTypeQueryFinalizationTest(
|
||||||
|
MapperService mapperService,
|
||||||
|
@Nullable Boolean queryPrune,
|
||||||
|
@Nullable TokenPruningConfig queryTokenPruningConfig,
|
||||||
|
boolean queryShouldBePruned
|
||||||
|
) throws IOException {
|
||||||
|
withSearchExecutionContext(mapperService, (context) -> {
|
||||||
|
SparseVectorFieldMapper.SparseVectorFieldType ft = (SparseVectorFieldMapper.SparseVectorFieldType) mapperService.fieldType(
|
||||||
|
"field"
|
||||||
|
);
|
||||||
|
Query finalizedQuery = ft.finalizeSparseVectorQuery(context, "field", QUERY_VECTORS, queryPrune, queryTokenPruningConfig);
|
||||||
|
|
||||||
|
if (queryShouldBePruned) {
|
||||||
|
assertQueryWasPruned(finalizedQuery);
|
||||||
|
} else {
|
||||||
|
assertQueryWasNotPruned(finalizedQuery);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
private void assertQueryWasPruned(Query query) {
|
||||||
|
assertQueryHasClauseCount(query, 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
private void assertQueryWasNotPruned(Query query) {
|
||||||
|
assertQueryHasClauseCount(query, QUERY_VECTORS.size());
|
||||||
|
}
|
||||||
|
|
||||||
|
private void assertQueryHasClauseCount(Query query, int clauseCount) {
|
||||||
|
SparseVectorQueryWrapper queryWrapper = (SparseVectorQueryWrapper) query;
|
||||||
|
var termsQuery = queryWrapper.getTermsQuery();
|
||||||
|
assertNotNull(termsQuery);
|
||||||
|
var booleanQuery = (BooleanQuery) termsQuery;
|
||||||
|
Collection<Query> clauses = booleanQuery.getClauses(BooleanClause.Occur.SHOULD);
|
||||||
|
assertThat(clauses.size(), equalTo(clauseCount));
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Runs a query finalization test for the scenario described by
|
||||||
|
* the parameters provided
|
||||||
|
* @param useIndexVersionBeforeIndexOptions set to true to use a previous index version before mapper index_options
|
||||||
|
* @param useMapperDefaultIndexOptions set to false to use an explicit, non-default mapper index_options
|
||||||
|
* @param setMapperIndexOptionsPruneToFalse set to true to use prune:false in the mapper index_options
|
||||||
|
* @param queryOverridesPruningConfig set to true to designate the query will provide a pruning_config
|
||||||
|
* @param queryOverridesPruneToBeFalse if true and queryOverridesPruningConfig is true, the query will provide prune:false
|
||||||
|
* @throws IOException
|
||||||
|
*/
|
||||||
|
private void runTestTypeQueryFinalization(
|
||||||
|
boolean useIndexVersionBeforeIndexOptions,
|
||||||
|
boolean useMapperDefaultIndexOptions,
|
||||||
|
boolean setMapperIndexOptionsPruneToFalse,
|
||||||
|
boolean queryOverridesPruningConfig,
|
||||||
|
boolean queryOverridesPruneToBeFalse
|
||||||
|
) throws IOException {
|
||||||
|
MapperService mapperService = getMapperServiceForTest(
|
||||||
|
useIndexVersionBeforeIndexOptions,
|
||||||
|
useMapperDefaultIndexOptions,
|
||||||
|
setMapperIndexOptionsPruneToFalse
|
||||||
|
);
|
||||||
|
|
||||||
|
// check and see if the query should explicitly override the index_options
|
||||||
|
Boolean shouldQueryPrune = queryOverridesPruningConfig ? (queryOverridesPruneToBeFalse == false) : null;
|
||||||
|
|
||||||
|
// get the pruning configuration for the query if it's overriding
|
||||||
|
TokenPruningConfig queryPruningConfig = Boolean.TRUE.equals(shouldQueryPrune) ? new TokenPruningConfig() : null;
|
||||||
|
|
||||||
|
// determine whether the results should be pruned or not
|
||||||
|
// we should _not_ prune if any of the following hold:
|
||||||
|
// - the query explicitly overrides the options and `prune` is set to false
|
||||||
|
// - the query does not override the pruning options and:
|
||||||
|
// - either we are using a previous index version
|
||||||
|
// - or the index_options explicitly sets `prune` to false
|
||||||
|
boolean resultShouldNotBePruned = ((queryOverridesPruningConfig && queryOverridesPruneToBeFalse)
|
||||||
|
|| (queryOverridesPruningConfig == false && (useIndexVersionBeforeIndexOptions || setMapperIndexOptionsPruneToFalse)));
|
||||||
|
|
||||||
|
try {
|
||||||
|
performTypeQueryFinalizationTest(mapperService, shouldQueryPrune, queryPruningConfig, resultShouldNotBePruned == false);
|
||||||
|
} catch (AssertionError e) {
|
||||||
|
String message = "performTypeQueryFinalizationTest failed using parameters: "
|
||||||
|
+ "useIndexVersionBeforeIndexOptions: "
|
||||||
|
+ useIndexVersionBeforeIndexOptions
|
||||||
|
+ ", useMapperDefaultIndexOptions: "
|
||||||
|
+ useMapperDefaultIndexOptions
|
||||||
|
+ ", setMapperIndexOptionsPruneToFalse: "
|
||||||
|
+ setMapperIndexOptionsPruneToFalse
|
||||||
|
+ ", queryOverridesPruningConfig: "
|
||||||
|
+ queryOverridesPruningConfig
|
||||||
|
+ ", queryOverridesPruneToBeFalse: "
|
||||||
|
+ queryOverridesPruneToBeFalse;
|
||||||
|
throw new AssertionError(message, e);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
private IndexVersion getIndexVersionForTest(boolean usePreviousIndex) {
|
||||||
|
return usePreviousIndex
|
||||||
|
? IndexVersionUtils.randomVersionBetween(
|
||||||
|
random(),
|
||||||
|
UPGRADE_TO_LUCENE_10_0_0,
|
||||||
|
IndexVersionUtils.getPreviousVersion(SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT)
|
||||||
|
)
|
||||||
|
: IndexVersionUtils.randomVersionBetween(random(), SPARSE_VECTOR_PRUNING_INDEX_OPTIONS_SUPPORT, IndexVersion.current());
|
||||||
|
}
|
||||||
|
|
||||||
|
private MapperService getMapperServiceForTest(
|
||||||
|
boolean usePreviousIndex,
|
||||||
|
boolean useIndexOptionsDefaults,
|
||||||
|
boolean explicitIndexOptionsDoNotPrune
|
||||||
|
) throws IOException {
|
||||||
|
// get the index version of the test to use
|
||||||
|
// either a current version that supports index options, or a previous version that does not
|
||||||
|
IndexVersion indexVersion = getIndexVersionForTest(usePreviousIndex);
|
||||||
|
|
||||||
|
// if it's using the old index, we always use the minimal mapping without index_options
|
||||||
|
if (usePreviousIndex) {
|
||||||
|
return createMapperService(indexVersion, fieldMapping(this::minimalMapping));
|
||||||
|
}
|
||||||
|
|
||||||
|
// if we set explicitIndexOptionsDoNotPrune, the index_options (if present) will explicitly include "prune: false"
|
||||||
|
if (explicitIndexOptionsDoNotPrune) {
|
||||||
|
return createMapperService(indexVersion, fieldMapping(this::mappingWithIndexOptionsPruneFalse));
|
||||||
|
}
|
||||||
|
|
||||||
|
// either return the default (minimal) mapping or one with an explicit pruning_config
|
||||||
|
return useIndexOptionsDefaults
|
||||||
|
? createMapperService(indexVersion, fieldMapping(this::minimalMapping))
|
||||||
|
: createMapperService(indexVersion, fieldMapping(this::minimalMappingWithExplicitIndexOptions));
|
||||||
|
}
|
||||||
|
|
||||||
|
private static List<WeightedToken> QUERY_VECTORS = List.of(
|
||||||
|
new WeightedToken("pugs", 0.5f),
|
||||||
|
new WeightedToken("cats", 0.4f),
|
||||||
|
new WeightedToken("is", 0.1f)
|
||||||
|
);
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Handles float/double conversion when reading/writing with xcontent by converting all numbers to floats.
|
* Handles float/double conversion when reading/writing with xcontent by converting all numbers to floats.
|
||||||
*/
|
*/
|
||||||
|
|
|
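The expectations encoded in `runTestTypeQueryFinalization` above amount to a three-level precedence: an explicit `prune` on the query wins, otherwise an explicit `prune` in the mapper `index_options` applies, otherwise the default depends on whether the index version supports pruning index options. The following is a minimal sketch of that precedence only, inferred from the test logic; the class and method names (`PruneResolutionSketch`, `shouldPrune`) are hypothetical and this is not the mapper's actual implementation.

```java
// Hypothetical sketch: mirrors the expectations in the tests above,
// not the real SparseVectorFieldMapper code.
final class PruneResolutionSketch {

    /**
     * @param queryPrune        prune flag set on the query, or null if the query does not set it
     * @param indexOptionsPrune prune flag set in the mapper index_options, or null if absent
     * @param indexSupportsPruningOptions true if the index version supports index_options pruning
     */
    static boolean shouldPrune(Boolean queryPrune, Boolean indexOptionsPrune, boolean indexSupportsPruningOptions) {
        if (queryPrune != null) {
            // the query explicitly overrides whatever the mapping says
            return queryPrune;
        }
        if (indexOptionsPrune != null) {
            // the mapping carries an explicit setting
            return indexOptionsPrune;
        }
        // no explicit setting anywhere: prune by default only on index versions with index_options support
        return indexSupportsPruningOptions;
    }

    public static void main(String[] args) {
        System.out.println("defaults, current index:  " + shouldPrune(null, null, true));
        System.out.println("defaults, previous index: " + shouldPrune(null, null, false));
        System.out.println("mapping prune=false:      " + shouldPrune(null, false, true));
        System.out.println("query overrides mapping:  " + shouldPrune(true, false, true));
    }
}
```

Modelling both flags as nullable `Boolean` values keeps "not set" distinct from "set to false", which is the same distinction the diff introduces by changing `shouldPruneTokens` from `boolean` to `Boolean`.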
@ -9,22 +9,35 @@
|
||||||
|
|
||||||
package org.elasticsearch.index.mapper.vectors;
|
package org.elasticsearch.index.mapper.vectors;
|
||||||
|
|
||||||
|
import org.elasticsearch.index.IndexVersion;
|
||||||
|
import org.elasticsearch.index.IndexVersions;
|
||||||
import org.elasticsearch.index.fielddata.FieldDataContext;
|
import org.elasticsearch.index.fielddata.FieldDataContext;
|
||||||
import org.elasticsearch.index.mapper.FieldTypeTestCase;
|
import org.elasticsearch.index.mapper.FieldTypeTestCase;
|
||||||
import org.elasticsearch.index.mapper.MappedFieldType;
|
import org.elasticsearch.index.mapper.MappedFieldType;
|
||||||
|
import org.elasticsearch.test.index.IndexVersionUtils;
|
||||||
|
|
||||||
import java.util.Collections;
|
import java.util.Collections;
|
||||||
|
|
||||||
public class SparseVectorFieldTypeTests extends FieldTypeTestCase {
|
public class SparseVectorFieldTypeTests extends FieldTypeTestCase {
|
||||||
|
|
||||||
public void testDocValuesDisabled() {
|
public void testDocValuesDisabled() {
|
||||||
MappedFieldType fieldType = new SparseVectorFieldMapper.SparseVectorFieldType("field", false, Collections.emptyMap());
|
IndexVersion indexVersion = IndexVersionUtils.randomVersionBetween(
|
||||||
|
random(),
|
||||||
|
IndexVersions.NEW_SPARSE_VECTOR,
|
||||||
|
IndexVersion.current()
|
||||||
|
);
|
||||||
|
MappedFieldType fieldType = new SparseVectorFieldMapper.SparseVectorFieldType(indexVersion, "field", false, Collections.emptyMap());
|
||||||
assertFalse(fieldType.hasDocValues());
|
assertFalse(fieldType.hasDocValues());
|
||||||
expectThrows(IllegalArgumentException.class, () -> fieldType.fielddataBuilder(FieldDataContext.noRuntimeFields("test")));
|
expectThrows(IllegalArgumentException.class, () -> fieldType.fielddataBuilder(FieldDataContext.noRuntimeFields("test")));
|
||||||
}
|
}
|
||||||
|
|
||||||
public void testIsNotAggregatable() {
|
public void testIsNotAggregatable() {
|
||||||
MappedFieldType fieldType = new SparseVectorFieldMapper.SparseVectorFieldType("field", false, Collections.emptyMap());
|
IndexVersion indexVersion = IndexVersionUtils.randomVersionBetween(
|
||||||
|
random(),
|
||||||
|
IndexVersions.NEW_SPARSE_VECTOR,
|
||||||
|
IndexVersion.current()
|
||||||
|
);
|
||||||
|
MappedFieldType fieldType = new SparseVectorFieldMapper.SparseVectorFieldType(indexVersion, "field", false, Collections.emptyMap());
|
||||||
assertFalse(fieldType.isAggregatable());
|
assertFalse(fieldType.isAggregatable());
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
|
@ -999,7 +999,7 @@ public abstract class AbstractQueryTestCase<QB extends AbstractQueryBuilder<QB>>
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
protected void initIndexWriter(RandomIndexWriter indexWriter) {}
|
protected void initIndexWriter(RandomIndexWriter indexWriter) throws IOException {}
|
||||||
}
|
}
|
||||||
|
|
||||||
public static class NullIndexReaderManager extends IndexReaderManager {
|
public static class NullIndexReaderManager extends IndexReaderManager {
|
||||||
|
|
|
@ -60,11 +60,14 @@ public class SparseVectorQueryBuilder extends AbstractQueryBuilder<SparseVectorQ
|
||||||
|
|
||||||
private static final boolean DEFAULT_PRUNE = false;
|
private static final boolean DEFAULT_PRUNE = false;
|
||||||
|
|
||||||
|
static final TransportVersion SPARSE_VECTOR_FIELD_PRUNING_OPTIONS_8_19 = TransportVersions.SPARSE_VECTOR_FIELD_PRUNING_OPTIONS_8_19;
|
||||||
|
static final TransportVersion SPARSE_VECTOR_FIELD_PRUNING_OPTIONS = TransportVersions.SPARSE_VECTOR_FIELD_PRUNING_OPTIONS;
|
||||||
|
|
||||||
private final String fieldName;
|
private final String fieldName;
|
||||||
private final List<WeightedToken> queryVectors;
|
private final List<WeightedToken> queryVectors;
|
||||||
private final String inferenceId;
|
private final String inferenceId;
|
||||||
private final String query;
|
private final String query;
|
||||||
private final boolean shouldPruneTokens;
|
private final Boolean shouldPruneTokens;
|
||||||
|
|
||||||
private final SetOnce<TextExpansionResults> weightedTokensSupplier;
|
private final SetOnce<TextExpansionResults> weightedTokensSupplier;
|
||||||
|
|
||||||
|
@ -84,13 +87,11 @@ public class SparseVectorQueryBuilder extends AbstractQueryBuilder<SparseVectorQ
|
||||||
@Nullable TokenPruningConfig tokenPruningConfig
|
@Nullable TokenPruningConfig tokenPruningConfig
|
||||||
) {
|
) {
|
||||||
this.fieldName = Objects.requireNonNull(fieldName, "[" + NAME + "] requires a [" + FIELD_FIELD.getPreferredName() + "]");
|
this.fieldName = Objects.requireNonNull(fieldName, "[" + NAME + "] requires a [" + FIELD_FIELD.getPreferredName() + "]");
|
||||||
this.shouldPruneTokens = (shouldPruneTokens != null ? shouldPruneTokens : DEFAULT_PRUNE);
|
this.shouldPruneTokens = shouldPruneTokens;
|
||||||
this.queryVectors = queryVectors;
|
this.queryVectors = queryVectors;
|
||||||
this.inferenceId = inferenceId;
|
this.inferenceId = inferenceId;
|
||||||
this.query = query;
|
this.query = query;
|
||||||
this.tokenPruningConfig = (tokenPruningConfig != null
|
this.tokenPruningConfig = tokenPruningConfig;
|
||||||
? tokenPruningConfig
|
|
||||||
: (this.shouldPruneTokens ? new TokenPruningConfig() : null));
|
|
||||||
this.weightedTokensSupplier = null;
|
this.weightedTokensSupplier = null;
|
||||||
|
|
||||||
// Preserve BWC error messaging
|
// Preserve BWC error messaging
|
||||||
|
@ -127,7 +128,14 @@ public class SparseVectorQueryBuilder extends AbstractQueryBuilder<SparseVectorQ
|
||||||
public SparseVectorQueryBuilder(StreamInput in) throws IOException {
|
public SparseVectorQueryBuilder(StreamInput in) throws IOException {
|
||||||
super(in);
|
super(in);
|
||||||
this.fieldName = in.readString();
|
this.fieldName = in.readString();
|
||||||
this.shouldPruneTokens = in.readBoolean();
|
|
||||||
|
if (in.getTransportVersion().isPatchFrom(SPARSE_VECTOR_FIELD_PRUNING_OPTIONS_8_19)
|
||||||
|
|| in.getTransportVersion().onOrAfter(SPARSE_VECTOR_FIELD_PRUNING_OPTIONS)) {
|
||||||
|
this.shouldPruneTokens = in.readOptionalBoolean();
|
||||||
|
} else {
|
||||||
|
this.shouldPruneTokens = in.readBoolean();
|
||||||
|
}
|
||||||
|
|
||||||
this.queryVectors = in.readOptionalCollectionAsList(WeightedToken::new);
|
this.queryVectors = in.readOptionalCollectionAsList(WeightedToken::new);
|
||||||
this.inferenceId = in.readOptionalString();
|
this.inferenceId = in.readOptionalString();
|
||||||
this.query = in.readOptionalString();
|
this.query = in.readOptionalString();
|
||||||
|
@ -161,7 +169,7 @@ public class SparseVectorQueryBuilder extends AbstractQueryBuilder<SparseVectorQ
|
||||||
return query;
|
return query;
|
||||||
}
|
}
|
||||||
|
|
||||||
public boolean shouldPruneTokens() {
|
public Boolean shouldPruneTokens() {
|
||||||
return shouldPruneTokens;
|
return shouldPruneTokens;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -176,7 +184,14 @@ public class SparseVectorQueryBuilder extends AbstractQueryBuilder<SparseVectorQ
|
||||||
}
|
}
|
||||||
|
|
||||||
out.writeString(fieldName);
|
out.writeString(fieldName);
|
||||||
out.writeBoolean(shouldPruneTokens);
|
|
||||||
|
if (out.getTransportVersion().isPatchFrom(SPARSE_VECTOR_FIELD_PRUNING_OPTIONS_8_19)
|
||||||
|
|| out.getTransportVersion().onOrAfter(SPARSE_VECTOR_FIELD_PRUNING_OPTIONS)) {
|
||||||
|
out.writeOptionalBoolean(shouldPruneTokens);
|
||||||
|
} else {
|
||||||
|
out.writeBoolean(shouldPruneTokens);
|
||||||
|
}
|
||||||
|
|
||||||
out.writeOptionalCollection(queryVectors);
|
out.writeOptionalCollection(queryVectors);
|
||||||
out.writeOptionalString(inferenceId);
|
out.writeOptionalString(inferenceId);
|
||||||
out.writeOptionalString(query);
|
out.writeOptionalString(query);
|
||||||
|
@ -199,7 +214,9 @@ public class SparseVectorQueryBuilder extends AbstractQueryBuilder<SparseVectorQ
|
||||||
}
|
}
|
||||||
builder.field(QUERY_FIELD.getPreferredName(), query);
|
builder.field(QUERY_FIELD.getPreferredName(), query);
|
||||||
}
|
}
|
||||||
builder.field(PRUNE_FIELD.getPreferredName(), shouldPruneTokens);
|
if (shouldPruneTokens != null) {
|
||||||
|
builder.field(PRUNE_FIELD.getPreferredName(), shouldPruneTokens);
|
||||||
|
}
|
||||||
if (tokenPruningConfig != null) {
|
if (tokenPruningConfig != null) {
|
||||||
builder.field(PRUNING_CONFIG_FIELD.getPreferredName(), tokenPruningConfig);
|
builder.field(PRUNING_CONFIG_FIELD.getPreferredName(), tokenPruningConfig);
|
||||||
}
|
}
|
||||||
|
@ -231,7 +248,9 @@ public class SparseVectorQueryBuilder extends AbstractQueryBuilder<SparseVectorQ
|
||||||
protected QueryBuilder doRewrite(QueryRewriteContext queryRewriteContext) {
|
protected QueryBuilder doRewrite(QueryRewriteContext queryRewriteContext) {
|
||||||
if (queryVectors != null) {
|
if (queryVectors != null) {
|
||||||
return this;
|
return this;
|
||||||
} else if (weightedTokensSupplier != null) {
|
}
|
||||||
|
|
||||||
|
if (weightedTokensSupplier != null) {
|
||||||
TextExpansionResults textExpansionResults = weightedTokensSupplier.get();
|
TextExpansionResults textExpansionResults = weightedTokensSupplier.get();
|
||||||
if (textExpansionResults == null) {
|
if (textExpansionResults == null) {
|
||||||
return this; // No results yet
|
return this; // No results yet
|
||||||
|
@ -245,7 +264,9 @@ public class SparseVectorQueryBuilder extends AbstractQueryBuilder<SparseVectorQ
|
||||||
shouldPruneTokens,
|
shouldPruneTokens,
|
||||||
tokenPruningConfig
|
tokenPruningConfig
|
||||||
);
|
);
|
||||||
} else if (inferenceId == null) {
|
}
|
||||||
|
|
||||||
|
if (inferenceId == null) {
|
||||||
// Edge case, where inference_id was not specified in the request,
|
// Edge case, where inference_id was not specified in the request,
|
||||||
// but we did not intercept this and rewrite to a query on a field with
|
// but we did not intercept this and rewrite to a query on a field with
|
||||||
// pre-configured inference. So we trap here and output a nicer error message.
|
// pre-configured inference. So we trap here and output a nicer error message.
|
||||||
|
|
|
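The `SparseVectorQueryBuilder` hunks above gate the wire format of `shouldPruneTokens` on the transport version: newer peers exchange an optional (three-state) boolean, older peers a plain boolean. The sketch below illustrates that pattern with plain `java.io` streams. Everything in it is an assumption made for illustration: the class name, the integer `NULLABLE_PRUNE_VERSION` standing in for the transport version check, and the byte encoding of the "not set" state. It also coerces an unset flag to `false` on the legacy path, which is a choice of this sketch rather than something shown in the diff.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of version-gated serialization of a nullable flag.
final class OptionalPruneWireSketch {

    // Stand-in for the transport version that introduced the nullable flag (assumed value).
    static final int NULLABLE_PRUNE_VERSION = 2;

    static byte[] write(Boolean prune, int peerVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (peerVersion >= NULLABLE_PRUNE_VERSION) {
                // three-state encoding (assumed): 0 = false, 1 = true, 2 = not set
                out.writeByte(prune == null ? 2 : (prune ? 1 : 0));
            } else {
                // legacy peers expect a plain boolean; "not set" collapses to false here
                out.writeBoolean(prune != null && prune);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Boolean read(byte[] data, int peerVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            if (peerVersion >= NULLABLE_PRUNE_VERSION) {
                int state = in.readByte();
                return state == 2 ? null : Boolean.valueOf(state == 1);
            }
            // legacy format can never express "not set"
            return in.readBoolean();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("new peer, unset:    " + read(write(null, 2), 2));
        System.out.println("legacy peer, unset: " + read(write(null, 1), 1));
    }
}
```

Reader and writer must branch on the same version condition, as the two hunks above do; otherwise the fields that follow the flag on the stream would be misaligned.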
@ -1144,7 +1144,9 @@ public class SemanticTextFieldMapper extends FieldMapper implements InferenceFie
|
||||||
boolean useLegacyFormat
|
boolean useLegacyFormat
|
||||||
) {
|
) {
|
||||||
return switch (modelSettings.taskType()) {
|
return switch (modelSettings.taskType()) {
|
||||||
case SPARSE_EMBEDDING -> new SparseVectorFieldMapper.Builder(CHUNKED_EMBEDDINGS_FIELD).setStored(useLegacyFormat == false);
|
case SPARSE_EMBEDDING -> new SparseVectorFieldMapper.Builder(CHUNKED_EMBEDDINGS_FIELD, indexVersionCreated).setStored(
|
||||||
|
useLegacyFormat == false
|
||||||
|
);
|
||||||
case TEXT_EMBEDDING -> {
|
case TEXT_EMBEDDING -> {
|
||||||
DenseVectorFieldMapper.Builder denseVectorMapperBuilder = new DenseVectorFieldMapper.Builder(
|
DenseVectorFieldMapper.Builder denseVectorMapperBuilder = new DenseVectorFieldMapper.Builder(
|
||||||
CHUNKED_EMBEDDINGS_FIELD,
|
CHUNKED_EMBEDDINGS_FIELD,
|
||||||
|
|
|
@ -140,7 +140,7 @@ public class SemanticTextHighlighterTests extends MapperServiceTestCase {
|
||||||
tokens,
|
tokens,
|
||||||
null,
|
null,
|
||||||
null,
|
null,
|
||||||
null,
|
false,
|
||||||
null
|
null
|
||||||
);
|
);
|
||||||
NestedQueryBuilder nestedQueryBuilder = new NestedQueryBuilder(fieldType.getChunksField().fullPath(), sparseQuery, ScoreMode.Max);
|
NestedQueryBuilder nestedQueryBuilder = new NestedQueryBuilder(fieldType.getChunksField().fullPath(), sparseQuery, ScoreMode.Max);
|
||||||
|
@ -183,7 +183,7 @@ public class SemanticTextHighlighterTests extends MapperServiceTestCase {
|
||||||
tokens,
|
tokens,
|
||||||
null,
|
null,
|
||||||
null,
|
null,
|
||||||
null,
|
false,
|
||||||
null
|
null
|
||||||
);
|
);
|
||||||
var query = new BoolQueryBuilder().should(sparseQuery).should(new MatchAllQueryBuilder());
|
var query = new BoolQueryBuilder().should(sparseQuery).should(new MatchAllQueryBuilder());
|
||||||
|
|
|
@ -9,14 +9,17 @@ package org.elasticsearch.xpack.inference.queries;
|
||||||
|
|
||||||
import com.carrotsearch.randomizedtesting.annotations.ParametersFactory;
|
import com.carrotsearch.randomizedtesting.annotations.ParametersFactory;
|
||||||
|
|
||||||
|
import org.apache.lucene.document.Document;
|
||||||
|
import org.apache.lucene.document.Field;
|
||||||
|
import org.apache.lucene.document.TextField;
|
||||||
import org.apache.lucene.search.BooleanClause;
|
import org.apache.lucene.search.BooleanClause;
|
||||||
import org.apache.lucene.search.BooleanQuery;
|
import org.apache.lucene.search.BooleanQuery;
|
||||||
import org.apache.lucene.search.BoostQuery;
|
|
||||||
import org.apache.lucene.search.KnnByteVectorQuery;
|
import org.apache.lucene.search.KnnByteVectorQuery;
|
||||||
import org.apache.lucene.search.KnnFloatVectorQuery;
|
import org.apache.lucene.search.KnnFloatVectorQuery;
|
||||||
import org.apache.lucene.search.MatchNoDocsQuery;
|
import org.apache.lucene.search.MatchNoDocsQuery;
|
||||||
import org.apache.lucene.search.Query;
|
import org.apache.lucene.search.Query;
|
||||||
import org.apache.lucene.search.join.ScoreMode;
|
import org.apache.lucene.search.join.ScoreMode;
|
||||||
|
import org.apache.lucene.tests.index.RandomIndexWriter;
|
||||||
import org.elasticsearch.action.ActionListener;
|
import org.elasticsearch.action.ActionListener;
|
||||||
import org.elasticsearch.action.ActionRequest;
|
import org.elasticsearch.action.ActionRequest;
|
||||||
import org.elasticsearch.action.ActionType;
|
import org.elasticsearch.action.ActionType;
|
||||||
|
@ -30,6 +33,7 @@ import org.elasticsearch.common.compress.CompressedXContent;
|
||||||
import org.elasticsearch.common.io.stream.NamedWriteableRegistry;
|
import org.elasticsearch.common.io.stream.NamedWriteableRegistry;
|
||||||
import org.elasticsearch.common.settings.Settings;
|
import org.elasticsearch.common.settings.Settings;
|
||||||
import org.elasticsearch.core.IOUtils;
|
import org.elasticsearch.core.IOUtils;
|
||||||
|
import org.elasticsearch.core.Nullable;
|
||||||
import org.elasticsearch.index.IndexVersion;
|
import org.elasticsearch.index.IndexVersion;
|
||||||
import org.elasticsearch.index.mapper.InferenceMetadataFieldsMapper;
|
import org.elasticsearch.index.mapper.InferenceMetadataFieldsMapper;
|
||||||
import org.elasticsearch.index.mapper.MapperService;
|
import org.elasticsearch.index.mapper.MapperService;
|
||||||
|
@ -81,7 +85,6 @@ import java.util.function.Supplier;
|
||||||
|
|
||||||
import static org.apache.lucene.search.BooleanClause.Occur.FILTER;
|
import static org.apache.lucene.search.BooleanClause.Occur.FILTER;
|
||||||
import static org.apache.lucene.search.BooleanClause.Occur.MUST;
|
import static org.apache.lucene.search.BooleanClause.Occur.MUST;
|
||||||
import static org.apache.lucene.search.BooleanClause.Occur.SHOULD;
|
|
||||||
import static org.elasticsearch.xpack.core.ml.inference.trainedmodel.InferenceConfig.DEFAULT_RESULTS_FIELD;
|
import static org.elasticsearch.xpack.core.ml.inference.trainedmodel.InferenceConfig.DEFAULT_RESULTS_FIELD;
|
||||||
import static org.hamcrest.Matchers.equalTo;
|
import static org.hamcrest.Matchers.equalTo;
|
||||||
import static org.hamcrest.Matchers.instanceOf;
|
import static org.hamcrest.Matchers.instanceOf;
|
||||||
|
@ -180,6 +183,22 @@ public class SemanticQueryBuilderTests extends AbstractQueryTestCase<SemanticQue
|
||||||
applyRandomInferenceResults(mapperService);
|
applyRandomInferenceResults(mapperService);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
protected IndexReaderManager getIndexReaderManager() {
|
||||||
|
// note that because token pruning for sparse vector types is on by default now
|
||||||
|
// we have to have at least one document with the `semantic.inference.chunks.embeddings`
|
||||||
|
// populated or else the weightedTokenUtils will return a MatchNoDocsQuery instead of the
|
||||||
|
// expected BooleanQuery.
|
||||||
|
return new IndexReaderManager() {
|
||||||
|
@Override
|
||||||
|
protected void initIndexWriter(RandomIndexWriter indexWriter) throws IOException {
|
||||||
|
Document document = new Document();
|
||||||
|
document.add(new TextField("semantic.inference.chunks.embeddings", "a b x y", Field.Store.NO));
|
||||||
|
indexWriter.addDocument(document);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
private void applyRandomInferenceResults(MapperService mapperService) throws IOException {
|
private void applyRandomInferenceResults(MapperService mapperService) throws IOException {
|
||||||
// Parse random inference results (or no inference results) to set up the dynamic inference result mappings under the semantic text
|
// Parse random inference results (or no inference results) to set up the dynamic inference result mappings under the semantic text
|
||||||
// field
|
// field
|
||||||
|
@ -240,12 +259,8 @@ public class SemanticQueryBuilderTests extends AbstractQueryTestCase<SemanticQue
|
||||||
assertThat(sparseQuery.getTermsQuery(), instanceOf(BooleanQuery.class));
|
assertThat(sparseQuery.getTermsQuery(), instanceOf(BooleanQuery.class));
|
||||||
|
|
||||||
BooleanQuery innerBooleanQuery = (BooleanQuery) sparseQuery.getTermsQuery();
|
BooleanQuery innerBooleanQuery = (BooleanQuery) sparseQuery.getTermsQuery();
|
||||||
assertThat(innerBooleanQuery.clauses().size(), equalTo(queryTokenCount));
|
// no clauses as tokens would be pruned
|
||||||
innerBooleanQuery.forEach(c -> {
|
assertThat(innerBooleanQuery.clauses().size(), equalTo(0));
|
||||||
assertThat(c.occur(), equalTo(SHOULD));
|
|
||||||
assertThat(c.query(), instanceOf(BoostQuery.class));
|
|
||||||
assertThat(((BoostQuery) c.query()).getBoost(), equalTo(TOKEN_WEIGHT));
|
|
||||||
});
|
|
||||||
}
|
}
|
||||||
|
|
||||||
private void assertTextEmbeddingLuceneQuery(Query query) {
|
private void assertTextEmbeddingLuceneQuery(Query query) {
|
||||||
|
@ -376,18 +391,7 @@ public class SemanticQueryBuilderTests extends AbstractQueryTestCase<SemanticQue
|
||||||
DenseVectorFieldMapper.ElementType denseVectorElementType,
|
DenseVectorFieldMapper.ElementType denseVectorElementType,
|
||||||
boolean useLegacyFormat
|
boolean useLegacyFormat
|
||||||
) throws IOException {
|
) throws IOException {
|
||||||
var modelSettings = switch (inferenceResultType) {
|
var modelSettings = getModelSettingsForInferenceResultType(inferenceResultType, denseVectorElementType);
|
||||||
case NONE -> null;
|
|
||||||
case SPARSE_EMBEDDING -> new MinimalServiceSettings("my-service", TaskType.SPARSE_EMBEDDING, null, null, null);
|
|
||||||
case TEXT_EMBEDDING -> new MinimalServiceSettings(
|
|
||||||
"my-service",
|
|
||||||
TaskType.TEXT_EMBEDDING,
|
|
||||||
TEXT_EMBEDDING_DIMENSION_COUNT,
|
|
||||||
// l2_norm similarity is required for bit embeddings
|
|
||||||
denseVectorElementType == DenseVectorFieldMapper.ElementType.BIT ? SimilarityMeasure.L2_NORM : SimilarityMeasure.COSINE,
|
|
||||||
denseVectorElementType
|
|
||||||
);
|
|
||||||
};
|
|
||||||
|
|
||||||
SourceToParse sourceToParse = null;
|
SourceToParse sourceToParse = null;
|
||||||
if (modelSettings != null) {
|
if (modelSettings != null) {
|
||||||
|
@ -414,6 +418,24 @@ public class SemanticQueryBuilderTests extends AbstractQueryTestCase<SemanticQue
|
||||||
return sourceToParse;
|
return sourceToParse;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
private static MinimalServiceSettings getModelSettingsForInferenceResultType(
|
||||||
|
InferenceResultType inferenceResultType,
|
||||||
|
@Nullable DenseVectorFieldMapper.ElementType denseVectorElementType
|
||||||
|
) {
|
||||||
|
return switch (inferenceResultType) {
|
||||||
|
case NONE -> null;
|
||||||
|
case SPARSE_EMBEDDING -> new MinimalServiceSettings("my-service", TaskType.SPARSE_EMBEDDING, null, null, null);
|
||||||
|
case TEXT_EMBEDDING -> new MinimalServiceSettings(
|
||||||
|
"my-service",
|
||||||
|
TaskType.TEXT_EMBEDDING,
|
||||||
|
TEXT_EMBEDDING_DIMENSION_COUNT,
|
||||||
|
// l2_norm similarity is required for bit embeddings
|
||||||
|
denseVectorElementType == DenseVectorFieldMapper.ElementType.BIT ? SimilarityMeasure.L2_NORM : SimilarityMeasure.COSINE,
|
||||||
|
denseVectorElementType
|
||||||
|
);
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
public static class FakeMlPlugin extends Plugin {
|
public static class FakeMlPlugin extends Plugin {
|
||||||
@Override
|
@Override
|
||||||
public List<NamedWriteableRegistry.Entry> getNamedWriteables() {
|
public List<NamedWriteableRegistry.Entry> getNamedWriteables() {
|
||||||
|
|
|
@ -113,6 +113,20 @@ teardown:
|
||||||
model_id: "text_expansion_model"
|
model_id: "text_expansion_model"
|
||||||
ignore: 404
|
ignore: 404
|
||||||
|
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.delete:
|
||||||
|
index: ["sparse_vector_pruning_test", "test-sparse-vector-without-pruning", "test-sparse-vector-with-pruning"]
|
||||||
|
ignore: 404
|
||||||
|
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.refresh: { }
|
||||||
|
|
||||||
---
|
---
|
||||||
"Test sparse_vector search":
|
"Test sparse_vector search":
|
||||||
- do:
|
- do:
|
||||||
|
@ -184,3 +198,459 @@ teardown:
|
||||||
|
|
||||||
- match: { hits.total.value: 5 }
|
- match: { hits.total.value: 5 }
|
||||||
- match: { hits.hits.0._source.source_text: "the octopus comforter smells" }
|
- match: { hits.hits.0._source.source_text: "the octopus comforter smells" }
|
||||||
|
|
||||||
|
|
||||||
|
---
|
||||||
|
"Check sparse_vector token pruning index_options mappings":
|
||||||
|
|
||||||
|
- requires:
|
||||||
|
cluster_features: 'sparse_vector.index_options_supported'
|
||||||
|
reason: "sparse_vector token pruning index options added support in 8.19"
|
||||||
|
- skip:
|
||||||
|
features: headers
|
||||||
|
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.create:
|
||||||
|
index: sparse_vector_pruning_test
|
||||||
|
body:
|
||||||
|
mappings:
|
||||||
|
properties:
|
||||||
|
text:
|
||||||
|
type: text
|
||||||
|
ml.tokens:
|
||||||
|
type: sparse_vector
|
||||||
|
index_options:
|
||||||
|
prune: true
|
||||||
|
pruning_config:
|
||||||
|
tokens_freq_ratio_threshold: 1.0
|
||||||
|
tokens_weight_threshold: 0.4
|
||||||
|
|
||||||
|
- match: { acknowledged: true }
|
||||||
|
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.get_mapping:
|
||||||
|
index: sparse_vector_pruning_test
|
||||||
|
|
||||||
|
- match: { sparse_vector_pruning_test.mappings.properties.ml.properties.tokens.index_options.prune: true }
|
||||||
|
- match: { sparse_vector_pruning_test.mappings.properties.ml.properties.tokens.index_options.pruning_config.tokens_freq_ratio_threshold: 1.0 }
|
||||||
|
- match: { sparse_vector_pruning_test.mappings.properties.ml.properties.tokens.index_options.pruning_config.tokens_weight_threshold: 0.4 }
|
||||||
|
|
---
"Check sparse_vector token pruning index_options mappings defaults":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      headers:
        Content-Type: application/json
      indices.create:
        index: sparse_vector_pruning_test
        body:
          mappings:
            properties:
              ml.tokens:
                type: sparse_vector

  - match: { acknowledged: true }

  - do:
      headers:
        Content-Type: application/json
      indices.get_field_mapping:
        index: sparse_vector_pruning_test
        fields: ml.tokens
        include_defaults: true

  # the index_options with pruning defaults will be serialized here explicitly
  - match: { sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.prune: true }
  - match: { sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.pruning_config.tokens_freq_ratio_threshold: 5.0 }
  - match: { sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.pruning_config.tokens_weight_threshold: 0.4 }

  - do:
      headers:
        Content-Type: application/json
      indices.get_field_mapping:
        index: sparse_vector_pruning_test
        fields: ml.tokens

  - not_exists: sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.prune
  - not_exists: sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.pruning_config.tokens_freq_ratio_threshold
  - not_exists: sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.pruning_config.tokens_weight_threshold
  - not_exists: sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options

---
"Check sparse_vector token pruning index_options prune missing do not allow config":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      catch: /\[index_options\] field \[pruning_config\] should only be set if \[prune\] is set to true/
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: sparse_vector_pruning_test
        body:
          mappings:
            properties:
              text:
                type: text
              ml.tokens:
                type: sparse_vector
                index_options:
                  pruning_config:
                    tokens_freq_ratio_threshold: 1.0
                    tokens_weight_threshold: 0.4

  - match: { status: 400 }

---
"Check sparse_vector token pruning index_options prune false do not allow config":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      catch: /\[index_options\] field \[pruning_config\] should only be set if \[prune\] is set to true/
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: sparse_vector_pruning_test
        body:
          mappings:
            properties:
              text:
                type: text
              ml.tokens:
                type: sparse_vector
                index_options:
                  prune: false
                  pruning_config:
                    tokens_freq_ratio_threshold: 1.0
                    tokens_weight_threshold: 0.4

  - match: { status: 400 }

---
"Check sparse_vector token pruning index_options tokens freq out of bounds":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      catch: /\[tokens_freq_ratio_threshold\] must be between \[1\] and \[100\]/
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: sparse_vector_pruning_test
        body:
          mappings:
            properties:
              text:
                type: text
              ml.tokens:
                type: sparse_vector
                index_options:
                  prune: true
                  pruning_config:
                    tokens_freq_ratio_threshold: 101.0
                    tokens_weight_threshold: 0.4

  - match: { status: 400 }

---
"Check sparse_vector token pruning index_options tokens weight out of bounds":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      catch: /\[tokens_weight_threshold\] must be between 0 and 1/
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: sparse_vector_pruning_test
        body:
          mappings:
            properties:
              text:
                type: text
              ml.tokens:
                type: sparse_vector
                index_options:
                  prune: true
                  pruning_config:
                    tokens_freq_ratio_threshold: 5.0
                    tokens_weight_threshold: 3.5

  - match: { status: 400 }

---
"Check sparse_vector token pruning index_options in query":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: test-sparse-vector-with-pruning
        body:
          mappings:
            properties:
              content_embedding:
                type: sparse_vector
                index_options:
                  prune: true
                  pruning_config:
                    tokens_freq_ratio_threshold: 1
                    tokens_weight_threshold: 1.0

  - match: { acknowledged: true }

  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: test-sparse-vector-without-pruning
        body:
          mappings:
            properties:
              content_embedding:
                type: sparse_vector
                index_options:
                  prune: false

  - match: { acknowledged: true }

  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      bulk:
        index: test-sparse-vector-with-pruning
        refresh: true
        body: |
          {"index": { "_id": "1" }}
          {"content_embedding":{"cheese": 2.671405,"is": 0.11809908,"comet": 0.26088917}}
          {"index": { "_id": "2" }}
          {"content_embedding":{"planet": 2.3438394,"is": 0.54600334,"astronomy": 0.36015007,"moon": 0.20022368}}
          {"index": { "_id": "3" }}
          {"content_embedding":{"is": 0.6891394,"globe": 0.484035,"ocean": 0.080102935,"underground": 0.053516876}}
  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      bulk:
        index: test-sparse-vector-without-pruning
        refresh: true
        body: |
          {"index": { "_id": "1" }}
          {"content_embedding":{"cheese": 2.671405,"is": 0.11809908,"comet": 0.26088917}}
          {"index": { "_id": "2" }}
          {"content_embedding":{"planet": 2.3438394,"is": 0.54600334,"astronomy": 0.36015007,"moon": 0.20022368}}
          {"index": { "_id": "3" }}
          {"content_embedding":{"is": 0.6891394,"globe": 0.484035,"ocean": 0.080102935,"underground": 0.053516876}}
  - do:
      search:
        index: test-sparse-vector-without-pruning
        body:
          query:
            sparse_vector:
              field: content_embedding
              query_vector:
                cheese: 0.5
                comet: 0.5
                globe: 0.484035
                ocean: 0.080102935
                underground: 0.053516876
                is: 0.54600334

  - match: { hits.total.value: 3 }
  - match: { hits.hits.0._id: "1" }
  - match: { hits.hits.1._id: "3" }
  - match: { hits.hits.2._id: "2" }

  - do:
      search:
        index: test-sparse-vector-with-pruning
        body:
          query:
            sparse_vector:
              field: content_embedding
              query_vector:
                cheese: 0.5
                comet: 0.5
                globe: 0.484035
                ocean: 0.080102935
                underground: 0.053516876
                is: 0.54600334

  - match: { hits.total.value: 2 }
  - match: { hits.hits.0._id: "1" }
  - match: { hits.hits.1._id: "3" }

  - do:
      search:
        index: test-sparse-vector-without-pruning
        body:
          query:
            sparse_vector:
              field: content_embedding
              query_vector:
                cheese: 0.5
                comet: 0.5
                globe: 0.484035
                ocean: 0.080102935
                underground: 0.053516876
                is: 0.54600334
              prune: true
              pruning_config:
                tokens_freq_ratio_threshold: 1
                tokens_weight_threshold: 1.0

  - match: { hits.total.value: 2 }
  - match: { hits.hits.0._id: "1" }
  - match: { hits.hits.1._id: "3" }

  - do:
      search:
        index: test-sparse-vector-with-pruning
        body:
          query:
            sparse_vector:
              field: content_embedding
              query_vector:
                cheese: 0.5
                comet: 0.5
                globe: 0.484035
                ocean: 0.080102935
                underground: 0.053516876
                is: 0.54600334
              prune: false

  - match: { hits.total.value: 3 }
  - match: { hits.hits.0._id: "1" }
  - match: { hits.hits.1._id: "3" }
  - match: { hits.hits.2._id: "2" }

---
"Check sparse_vector should prune by default":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: test-sparse-vector-pruning-default
        body:
          settings:
            number_of_shards: 1
          mappings:
            properties:
              content_embedding:
                type: sparse_vector

  - match: { acknowledged: true }

  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      bulk:
        index: test-sparse-vector-pruning-default
        refresh: true
        body: |
          {"index": { "_id": "1" }}
          {"content_embedding":{"cheese": 2.671405,"is": 0.11809908,"comet": 0.26088917}}
          {"index": { "_id": "2" }}
          {"content_embedding":{"planet": 2.3438394,"is": 0.14600334,"astronomy": 0.36015007,"moon": 0.20022368}}
          {"index": { "_id": "3" }}
          {"content_embedding":{"is": 0.1891394,"globe": 0.484035,"ocean": 0.080102935,"underground": 0.053516876}}
          {"index": { "_id": "4" }}
          {"content_embedding":{"is": 0.1891394}}
          {"index": { "_id": "5" }}
          {"content_embedding":{"is": 0.1891394}}
          {"index": { "_id": "6" }}
          {"content_embedding":{"is": 0.1891394}}
          {"index": { "_id": "7" }}
          {"content_embedding":{"is": 0.1891394}}
          {"index": { "_id": "8" }}
          {"content_embedding":{"is": 0.1891394}}
          {"index": { "_id": "9" }}
          {"content_embedding":{"is": 0.1891394}}
          {"index": { "_id": "10" }}
          {"content_embedding":{"is": 0.1891394}}
          {"index": { "_id": "11" }}
          {"content_embedding":{"is": 0.6, "pugs": 0.6 }}
          {"index": { "_id": "12" }}
          {"content_embedding":{"is": 0.1891394, "pugs": 0.1 }}

  - do:
      search:
        index: test-sparse-vector-pruning-default
        body:
          query:
            sparse_vector:
              field: content_embedding
              query_vector:
                pugs: 0.5
                cats: 0.5
                is: 0.04600334

  - match: { hits.total.value: 2 }
  - match: { hits.hits.0._id: "11" }
  - match: { hits.hits.1._id: "12" }

  - do:
      search:
        index: test-sparse-vector-pruning-default
        body:
          query:
            sparse_vector:
              field: content_embedding
              query_vector:
                pugs: 0.5
                cats: 0.5
                is: 0.04600334
              prune: false

  - match: { hits.total.value: 12 }

|
@@ -112,6 +112,20 @@ teardown:
        model_id: "text_expansion_model"
        ignore: 404

  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.delete:
        index: ["sparse_vector_pruning_test", "test-sparse-vector-without-pruning", "test-sparse-vector-with-pruning"]
        ignore: 404

  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.refresh: { }

---
"Test sparse_vector search":
  - do:
@@ -183,3 +197,459 @@ teardown:

  - match: { hits.total.value: 5 }
  - match: { hits.hits.0._source.source_text: "the octopus comforter smells" }

---
"Check sparse_vector token pruning index_options mappings":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: sparse_vector_pruning_test
        body:
          mappings:
            properties:
              text:
                type: text
              ml.tokens:
                type: sparse_vector
                index_options:
                  prune: true
                  pruning_config:
                    tokens_freq_ratio_threshold: 1.0
                    tokens_weight_threshold: 0.4

  - match: { acknowledged: true }

  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.get_mapping:
        index: sparse_vector_pruning_test

  - match: { sparse_vector_pruning_test.mappings.properties.ml.properties.tokens.index_options.prune: true }
  - match: { sparse_vector_pruning_test.mappings.properties.ml.properties.tokens.index_options.pruning_config.tokens_freq_ratio_threshold: 1.0 }
  - match: { sparse_vector_pruning_test.mappings.properties.ml.properties.tokens.index_options.pruning_config.tokens_weight_threshold: 0.4 }

---
"Check sparse_vector token pruning index_options mappings defaults":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      headers:
        Content-Type: application/json
      indices.create:
        index: sparse_vector_pruning_test
        body:
          mappings:
            properties:
              ml.tokens:
                type: sparse_vector

  - match: { acknowledged: true }

  - do:
      headers:
        Content-Type: application/json
      indices.get_field_mapping:
        index: sparse_vector_pruning_test
        fields: ml.tokens
        include_defaults: true

  # the index_options with pruning defaults will be serialized here explicitly
  - match: { sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.prune: true }
  - match: { sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.pruning_config.tokens_freq_ratio_threshold: 5.0 }
  - match: { sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.pruning_config.tokens_weight_threshold: 0.4 }

  - do:
      headers:
        Content-Type: application/json
      indices.get_field_mapping:
        index: sparse_vector_pruning_test
        fields: ml.tokens

  - not_exists: sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.prune
  - not_exists: sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.pruning_config.tokens_freq_ratio_threshold
  - not_exists: sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.pruning_config.tokens_weight_threshold
  - not_exists: sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options

---
"Check sparse_vector token pruning index_options prune missing do not allow config":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      catch: /\[index_options\] field \[pruning_config\] should only be set if \[prune\] is set to true/
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: sparse_vector_pruning_test
        body:
          mappings:
            properties:
              text:
                type: text
              ml.tokens:
                type: sparse_vector
                index_options:
                  pruning_config:
                    tokens_freq_ratio_threshold: 1.0
                    tokens_weight_threshold: 0.4

  - match: { status: 400 }

---
"Check sparse_vector token pruning index_options prune false do not allow config":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      catch: /\[index_options\] field \[pruning_config\] should only be set if \[prune\] is set to true/
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: sparse_vector_pruning_test
        body:
          mappings:
            properties:
              text:
                type: text
              ml.tokens:
                type: sparse_vector
                index_options:
                  prune: false
                  pruning_config:
                    tokens_freq_ratio_threshold: 1.0
                    tokens_weight_threshold: 0.4

  - match: { status: 400 }

---
"Check sparse_vector token pruning index_options tokens freq out of bounds":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      catch: /\[tokens_freq_ratio_threshold\] must be between \[1\] and \[100\]/
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: sparse_vector_pruning_test
        body:
          mappings:
            properties:
              text:
                type: text
              ml.tokens:
                type: sparse_vector
                index_options:
                  prune: true
                  pruning_config:
                    tokens_freq_ratio_threshold: 101.0
                    tokens_weight_threshold: 0.4

  - match: { status: 400 }

---
"Check sparse_vector token pruning index_options tokens weight out of bounds":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      catch: /\[tokens_weight_threshold\] must be between 0 and 1/
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: sparse_vector_pruning_test
        body:
          mappings:
            properties:
              text:
                type: text
              ml.tokens:
                type: sparse_vector
                index_options:
                  prune: true
                  pruning_config:
                    tokens_freq_ratio_threshold: 5.0
                    tokens_weight_threshold: 3.5

  - match: { status: 400 }

---
"Check sparse_vector token pruning index_options in query":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: test-sparse-vector-with-pruning
        body:
          mappings:
            properties:
              content_embedding:
                type: sparse_vector
                index_options:
                  prune: true
                  pruning_config:
                    tokens_freq_ratio_threshold: 1
                    tokens_weight_threshold: 1.0

  - match: { acknowledged: true }

  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: test-sparse-vector-without-pruning
        body:
          mappings:
            properties:
              content_embedding:
                type: sparse_vector
                index_options:
                  prune: false

  - match: { acknowledged: true }

  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      bulk:
        index: test-sparse-vector-with-pruning
        refresh: true
        body: |
          {"index": { "_id": "1" }}
          {"content_embedding":{"cheese": 2.671405,"is": 0.11809908,"comet": 0.26088917}}
          {"index": { "_id": "2" }}
          {"content_embedding":{"planet": 2.3438394,"is": 0.54600334,"astronomy": 0.36015007,"moon": 0.20022368}}
          {"index": { "_id": "3" }}
          {"content_embedding":{"is": 0.6891394,"globe": 0.484035,"ocean": 0.080102935,"underground": 0.053516876}}
  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      bulk:
        index: test-sparse-vector-without-pruning
        refresh: true
        body: |
          {"index": { "_id": "1" }}
          {"content_embedding":{"cheese": 2.671405,"is": 0.11809908,"comet": 0.26088917}}
          {"index": { "_id": "2" }}
          {"content_embedding":{"planet": 2.3438394,"is": 0.54600334,"astronomy": 0.36015007,"moon": 0.20022368}}
          {"index": { "_id": "3" }}
          {"content_embedding":{"is": 0.6891394,"globe": 0.484035,"ocean": 0.080102935,"underground": 0.053516876}}
  - do:
      search:
        index: test-sparse-vector-without-pruning
        body:
          query:
            sparse_vector:
              field: content_embedding
              query_vector:
                cheese: 0.5
                comet: 0.5
                globe: 0.484035
                ocean: 0.080102935
                underground: 0.053516876
                is: 0.54600334

  - match: { hits.total.value: 3 }
  - match: { hits.hits.0._id: "1" }
  - match: { hits.hits.1._id: "3" }
  - match: { hits.hits.2._id: "2" }

  - do:
      search:
        index: test-sparse-vector-with-pruning
        body:
          query:
            sparse_vector:
              field: content_embedding
              query_vector:
                cheese: 0.5
                comet: 0.5
                globe: 0.484035
                ocean: 0.080102935
                underground: 0.053516876
                is: 0.54600334

  - match: { hits.total.value: 2 }
  - match: { hits.hits.0._id: "1" }
  - match: { hits.hits.1._id: "3" }

  - do:
      search:
        index: test-sparse-vector-without-pruning
        body:
          query:
            sparse_vector:
              field: content_embedding
              query_vector:
                cheese: 0.5
                comet: 0.5
                globe: 0.484035
                ocean: 0.080102935
                underground: 0.053516876
                is: 0.54600334
              prune: true
              pruning_config:
                tokens_freq_ratio_threshold: 1
                tokens_weight_threshold: 1.0

  - match: { hits.total.value: 2 }
  - match: { hits.hits.0._id: "1" }
  - match: { hits.hits.1._id: "3" }

  - do:
      search:
        index: test-sparse-vector-with-pruning
        body:
          query:
            sparse_vector:
              field: content_embedding
              query_vector:
                cheese: 0.5
                comet: 0.5
                globe: 0.484035
                ocean: 0.080102935
                underground: 0.053516876
                is: 0.54600334
              prune: false

  - match: { hits.total.value: 3 }
  - match: { hits.hits.0._id: "1" }
  - match: { hits.hits.1._id: "3" }
  - match: { hits.hits.2._id: "2" }

---
"Check sparse_vector should prune by default":

  - requires:
      cluster_features: 'sparse_vector.index_options_supported'
      reason: "sparse_vector token pruning index options added support in 8.19"
  - skip:
      features: headers

  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      indices.create:
        index: test-sparse-vector-pruning-default
        body:
          settings:
            number_of_shards: 1
          mappings:
            properties:
              content_embedding:
                type: sparse_vector

  - match: { acknowledged: true }

  - do:
      headers:
        Authorization: "Basic dGVzdF91c2VyOngtcGFjay10ZXN0LXBhc3N3b3Jk" #test_user credentials
        Content-Type: application/json
      bulk:
        index: test-sparse-vector-pruning-default
        refresh: true
        body: |
          {"index": { "_id": "1" }}
          {"content_embedding":{"cheese": 2.671405,"is": 0.11809908,"comet": 0.26088917}}
          {"index": { "_id": "2" }}
          {"content_embedding":{"planet": 2.3438394,"is": 0.14600334,"astronomy": 0.36015007,"moon": 0.20022368}}
          {"index": { "_id": "3" }}
          {"content_embedding":{"is": 0.1891394,"globe": 0.484035,"ocean": 0.080102935,"underground": 0.053516876}}
          {"index": { "_id": "4" }}
          {"content_embedding":{"is": 0.1891394}}
          {"index": { "_id": "5" }}
          {"content_embedding":{"is": 0.1891394}}
          {"index": { "_id": "6" }}
          {"content_embedding":{"is": 0.1891394}}
          {"index": { "_id": "7" }}
          {"content_embedding":{"is": 0.1891394}}
          {"index": { "_id": "8" }}
          {"content_embedding":{"is": 0.1891394}}
          {"index": { "_id": "9" }}
          {"content_embedding":{"is": 0.1891394}}
          {"index": { "_id": "10" }}
          {"content_embedding":{"is": 0.1891394}}
          {"index": { "_id": "11" }}
          {"content_embedding":{"is": 0.6, "pugs": 0.6 }}
          {"index": { "_id": "12" }}
          {"content_embedding":{"is": 0.1891394, "pugs": 0.1 }}

  - do:
      search:
        index: test-sparse-vector-pruning-default
        body:
          query:
            sparse_vector:
              field: content_embedding
              query_vector:
                pugs: 0.5
                cats: 0.5
                is: 0.04600334

  - match: { hits.total.value: 2 }
  - match: { hits.hits.0._id: "11" }
  - match: { hits.hits.1._id: "12" }

  - do:
      search:
        index: test-sparse-vector-pruning-default
        body:
          query:
            sparse_vector:
              field: content_embedding
              query_vector:
                pugs: 0.5
                cats: 0.5
                is: 0.04600334
              prune: false

  - match: { hits.total.value: 12 }

|
@@ -89,6 +89,24 @@ setup:
        model_id: text_expansion_model
        wait_for: started

---
teardown:
  - skip:
      features: headers

  - do:
      headers:
        Authorization: "Basic eF9wYWNrX3Jlc3RfdXNlcjp4LXBhY2stdGVzdC1wYXNzd29yZA==" # run as x_pack_rest_user, i.e. the test setup superuser
        Content-Type: application/json
      indices.delete:
        index: ["sparse_vector_pruning_test", "test-sparse-vector-without-pruning", "test-sparse-vector-with-pruning"]
        ignore: 404
  - do:
      headers:
        Authorization: "Basic eF9wYWNrX3Jlc3RfdXNlcjp4LXBhY2stdGVzdC1wYXNzd29yZA==" # run as x_pack_rest_user, i.e. the test setup superuser
        Content-Type: application/json
      indices.refresh: { }

---
"Test sparse_vector search":
  - do:
||||||
|
@@ -510,3 +528,451 @@ setup:
|
||||||
- match: { hits.hits.0._score: 4.0 }
|
- match: { hits.hits.0._score: 4.0 }
|
||||||
- match: { hits.hits.1._id: "parent-foo-bar" }
|
- match: { hits.hits.1._id: "parent-foo-bar" }
|
||||||
- match: { hits.hits.1._score: 2.0 }
|
- match: { hits.hits.1._score: 2.0 }
|
||||||
|
|
||||||
|
---
|
||||||
|
"Check sparse_vector token pruning index_options mappings":
|
||||||
|
|
||||||
|
- requires:
|
||||||
|
cluster_features: 'sparse_vector.index_options_supported'
|
||||||
|
reason: "sparse_vector token pruning index options added support in 8.19"
|
||||||
|
- skip:
|
||||||
|
features: headers
|
||||||
|
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.create:
|
||||||
|
index: sparse_vector_pruning_test
|
||||||
|
body:
|
||||||
|
mappings:
|
||||||
|
properties:
|
||||||
|
text:
|
||||||
|
type: text
|
||||||
|
ml.tokens:
|
||||||
|
type: sparse_vector
|
||||||
|
index_options:
|
||||||
|
prune: true
|
||||||
|
pruning_config:
|
||||||
|
tokens_freq_ratio_threshold: 1.0
|
||||||
|
tokens_weight_threshold: 0.4
|
||||||
|
|
||||||
|
- match: { acknowledged: true }
|
||||||
|
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.get_mapping:
|
||||||
|
index: sparse_vector_pruning_test
|
||||||
|
|
||||||
|
- match: { sparse_vector_pruning_test.mappings.properties.ml.properties.tokens.index_options.prune: true }
|
||||||
|
- match: { sparse_vector_pruning_test.mappings.properties.ml.properties.tokens.index_options.pruning_config.tokens_freq_ratio_threshold: 1.0 }
|
||||||
|
- match: { sparse_vector_pruning_test.mappings.properties.ml.properties.tokens.index_options.pruning_config.tokens_weight_threshold: 0.4 }
|
||||||
|
|
||||||
|
---
|
||||||
|
"Check sparse_vector token pruning index_options mappings defaults":
|
||||||
|
|
||||||
|
- requires:
|
||||||
|
cluster_features: 'sparse_vector.index_options_supported'
|
||||||
|
reason: "sparse_vector token pruning index options added support in 8.19"
|
||||||
|
- skip:
|
||||||
|
features: headers
|
||||||
|
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.create:
|
||||||
|
index: sparse_vector_pruning_test
|
||||||
|
body:
|
||||||
|
mappings:
|
||||||
|
properties:
|
||||||
|
ml.tokens:
|
||||||
|
type: sparse_vector
|
||||||
|
|
||||||
|
- match: { acknowledged: true }
|
||||||
|
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.get_field_mapping:
|
||||||
|
index: sparse_vector_pruning_test
|
||||||
|
fields: ml.tokens
|
||||||
|
include_defaults: true
|
||||||
|
|
||||||
|
# the index_options with pruning defaults will be serialized here explicitly
|
||||||
|
- match: { sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.prune: true }
|
||||||
|
- match: { sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.pruning_config.tokens_freq_ratio_threshold: 5.0 }
|
||||||
|
- match: { sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.pruning_config.tokens_weight_threshold: 0.4 }
|
||||||
|
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.get_field_mapping:
|
||||||
|
index: sparse_vector_pruning_test
|
||||||
|
fields: ml.tokens
|
||||||
|
|
||||||
|
- not_exists: sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.prune
|
||||||
|
- not_exists: sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.pruning_config.tokens_freq_ratio_threshold
|
||||||
|
- not_exists: sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options.pruning_config.tokens_weight_threshold
|
||||||
|
- not_exists: sparse_vector_pruning_test.mappings.ml\.tokens.mapping.tokens.index_options
|
||||||
|
|
||||||
|
---
|
||||||
|
"Check sparse_vector token pruning index_options prune missing do not allow config":
|
||||||
|
|
||||||
|
- requires:
|
||||||
|
cluster_features: 'sparse_vector.index_options_supported'
|
||||||
|
reason: "sparse_vector token pruning index options added support in 8.19"
|
||||||
|
- skip:
|
||||||
|
features: headers
|
||||||
|
|
||||||
|
- do:
|
||||||
|
catch: /\[index_options\] field \[pruning_config\] should only be set if \[prune\] is set to true/
|
||||||
|
headers:
|
||||||
|
Authorization: "Basic eF9wYWNrX3Jlc3RfdXNlcjp4LXBhY2stdGVzdC1wYXNzd29yZA==" # run as x_pack_rest_user, i.e. the test setup superuser
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.create:
|
||||||
|
index: sparse_vector_pruning_test
|
||||||
|
body:
|
||||||
|
mappings:
|
||||||
|
properties:
|
||||||
|
text:
|
||||||
|
type: text
|
||||||
|
ml.tokens:
|
||||||
|
type: sparse_vector
|
||||||
|
index_options:
|
||||||
|
pruning_config:
|
||||||
|
tokens_freq_ratio_threshold: 1.0
|
||||||
|
tokens_weight_threshold: 0.4
|
||||||
|
|
||||||
|
- match: { status: 400 }
|
||||||
|
|
||||||
|
---
|
||||||
|
"Check sparse_vector token pruning index_options prune false do not allow config":
|
||||||
|
|
||||||
|
- requires:
|
||||||
|
cluster_features: 'sparse_vector.index_options_supported'
|
||||||
|
reason: "sparse_vector token pruning index options added support in 8.19"
|
||||||
|
- skip:
|
||||||
|
features: headers
|
||||||
|
|
||||||
|
- do:
|
||||||
|
catch: /\[index_options\] field \[pruning_config\] should only be set if \[prune\] is set to true/
|
||||||
|
headers:
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.create:
|
||||||
|
index: sparse_vector_pruning_test
|
||||||
|
body:
|
||||||
|
mappings:
|
||||||
|
properties:
|
||||||
|
text:
|
||||||
|
type: text
|
||||||
|
ml.tokens:
|
||||||
|
type: sparse_vector
|
||||||
|
index_options:
|
||||||
|
prune: false
|
||||||
|
pruning_config:
|
||||||
|
tokens_freq_ratio_threshold: 1.0
|
||||||
|
tokens_weight_threshold: 0.4
|
||||||
|
|
||||||
|
- match: { status: 400 }
|
||||||
|
|
||||||
|
---
|
||||||
|
"Check sparse_vector token pruning index_options tokens freq out of bounds":
|
||||||
|
|
||||||
|
- requires:
|
||||||
|
cluster_features: 'sparse_vector.index_options_supported'
|
||||||
|
reason: "sparse_vector token pruning index options added support in 8.19"
|
||||||
|
- skip:
|
||||||
|
features: headers
|
||||||
|
|
||||||
|
- do:
|
||||||
|
catch: /\[tokens_freq_ratio_threshold\] must be between \[1\] and \[100\]/
|
||||||
|
headers:
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.create:
|
||||||
|
index: sparse_vector_pruning_test
|
||||||
|
body:
|
||||||
|
mappings:
|
||||||
|
properties:
|
||||||
|
text:
|
||||||
|
type: text
|
||||||
|
ml.tokens:
|
||||||
|
type: sparse_vector
|
||||||
|
index_options:
|
||||||
|
prune: true
|
||||||
|
pruning_config:
|
||||||
|
tokens_freq_ratio_threshold: 101.0
|
||||||
|
tokens_weight_threshold: 0.4
|
||||||
|
|
||||||
|
- match: { status: 400 }
|
||||||
|
|
||||||
|
---
|
||||||
|
"Check sparse_vector token pruning index_options tokens weight out of bounds":
|
||||||
|
|
||||||
|
- requires:
|
||||||
|
cluster_features: 'sparse_vector.index_options_supported'
|
||||||
|
reason: "sparse_vector token pruning index options added support in 8.19"
|
||||||
|
- skip:
|
||||||
|
features: headers
|
||||||
|
|
||||||
|
- do:
|
||||||
|
catch: /\[tokens_weight_threshold\] must be between 0 and 1/
|
||||||
|
headers:
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.create:
|
||||||
|
index: sparse_vector_pruning_test
|
||||||
|
body:
|
||||||
|
mappings:
|
||||||
|
properties:
|
||||||
|
text:
|
||||||
|
type: text
|
||||||
|
ml.tokens:
|
||||||
|
type: sparse_vector
|
||||||
|
index_options:
|
||||||
|
prune: true
|
||||||
|
pruning_config:
|
||||||
|
tokens_freq_ratio_threshold: 5.0
|
||||||
|
tokens_weight_threshold: 3.5
|
||||||
|
|
||||||
|
- match: { status: 400 }
|
||||||
|
|
||||||
|
---
|
||||||
|
"Check sparse_vector token pruning index_options in query":
|
||||||
|
|
||||||
|
- requires:
|
||||||
|
cluster_features: 'sparse_vector.index_options_supported'
|
||||||
|
reason: "sparse_vector token pruning index options added support in 8.19"
|
||||||
|
- skip:
|
||||||
|
features: headers
|
||||||
|
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.create:
|
||||||
|
index: test-sparse-vector-with-pruning
|
||||||
|
body:
|
||||||
|
mappings:
|
||||||
|
properties:
|
||||||
|
content_embedding:
|
||||||
|
type: sparse_vector
|
||||||
|
index_options:
|
||||||
|
prune: true
|
||||||
|
pruning_config:
|
||||||
|
tokens_freq_ratio_threshold: 1
|
||||||
|
tokens_weight_threshold: 1.0
|
||||||
|
settings:
|
||||||
|
number_of_shards: 1
|
||||||
|
|
||||||
|
- match: { acknowledged: true }
|
||||||
|
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.create:
|
||||||
|
index: test-sparse-vector-without-pruning
|
||||||
|
body:
|
||||||
|
mappings:
|
||||||
|
properties:
|
||||||
|
content_embedding:
|
||||||
|
type: sparse_vector
|
||||||
|
index_options:
|
||||||
|
prune: false
|
||||||
|
settings:
|
||||||
|
number_of_shards: 1
|
||||||
|
|
||||||
|
- match: { acknowledged: true }
|
||||||
|
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Content-Type: application/json
|
||||||
|
bulk:
|
||||||
|
index: test-sparse-vector-with-pruning
|
||||||
|
refresh: true
|
||||||
|
body: |
|
||||||
|
{"index": { "_id": "1" }}
|
||||||
|
{"content_embedding":{"cheese": 2.671405,"is": 0.11809908,"comet": 0.26088917}}
|
||||||
|
{"index": { "_id": "2" }}
|
||||||
|
{"content_embedding":{"planet": 2.3438394,"is": 0.54600334,"astronomy": 0.36015007,"moon": 0.20022368}}
|
||||||
|
{"index": { "_id": "3" }}
|
||||||
|
{"content_embedding":{"is": 0.6891394,"globe": 0.484035,"ocean": 0.080102935,"underground": 0.053516876}}
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Content-Type: application/json
|
||||||
|
bulk:
|
||||||
|
index: test-sparse-vector-without-pruning
|
||||||
|
refresh: true
|
||||||
|
body: |
|
||||||
|
{"index": { "_id": "1" }}
|
||||||
|
{"content_embedding":{"cheese": 2.671405,"is": 0.11809908,"comet": 0.26088917}}
|
||||||
|
{"index": { "_id": "2" }}
|
||||||
|
{"content_embedding":{"planet": 2.3438394,"is": 0.54600334,"astronomy": 0.36015007,"moon": 0.20022368}}
|
||||||
|
{"index": { "_id": "3" }}
|
||||||
|
{"content_embedding":{"is": 0.6891394,"globe": 0.484035,"ocean": 0.080102935,"underground": 0.053516876}}
|
||||||
|
- do:
|
||||||
|
search:
|
||||||
|
index: test-sparse-vector-without-pruning
|
||||||
|
body:
|
||||||
|
query:
|
||||||
|
sparse_vector:
|
||||||
|
field: content_embedding
|
||||||
|
query_vector:
|
||||||
|
cheese: 0.5
|
||||||
|
comet: 0.5
|
||||||
|
globe: 0.484035
|
||||||
|
ocean: 0.080102935
|
||||||
|
underground: 0.053516876
|
||||||
|
is: 0.54600334
|
||||||
|
|
||||||
|
- match: { hits.total.value: 3 }
|
||||||
|
- match: { hits.hits.0._id: "1" }
|
||||||
|
- match: { hits.hits.1._id: "3" }
|
||||||
|
- match: { hits.hits.2._id: "2" }
|
||||||
|
|
||||||
|
- do:
|
||||||
|
search:
|
||||||
|
index: test-sparse-vector-with-pruning
|
||||||
|
body:
|
||||||
|
query:
|
||||||
|
sparse_vector:
|
||||||
|
field: content_embedding
|
||||||
|
query_vector:
|
||||||
|
cheese: 0.5
|
||||||
|
comet: 0.5
|
||||||
|
globe: 0.484035
|
||||||
|
ocean: 0.080102935
|
||||||
|
underground: 0.053516876
|
||||||
|
is: 0.54600334
|
||||||
|
|
||||||
|
- match: { hits.total.value: 2 }
|
||||||
|
- match: { hits.hits.0._id: "1" }
|
||||||
|
- match: { hits.hits.1._id: "3" }
|
||||||
|
|
||||||
|
- do:
|
||||||
|
search:
|
||||||
|
index: test-sparse-vector-without-pruning
|
||||||
|
body:
|
||||||
|
query:
|
||||||
|
sparse_vector:
|
||||||
|
field: content_embedding
|
||||||
|
query_vector:
|
||||||
|
cheese: 0.5
|
||||||
|
comet: 0.5
|
||||||
|
globe: 0.484035
|
||||||
|
ocean: 0.080102935
|
||||||
|
underground: 0.053516876
|
||||||
|
is: 0.54600334
|
||||||
|
prune: true
|
||||||
|
pruning_config:
|
||||||
|
tokens_freq_ratio_threshold: 1
|
||||||
|
tokens_weight_threshold: 1.0
|
||||||
|
|
||||||
|
- match: { hits.total.value: 2 }
|
||||||
|
- match: { hits.hits.0._id: "1" }
|
||||||
|
- match: { hits.hits.1._id: "3" }
|
||||||
|
|
||||||
|
- do:
|
||||||
|
search:
|
||||||
|
index: test-sparse-vector-with-pruning
|
||||||
|
body:
|
||||||
|
query:
|
||||||
|
sparse_vector:
|
||||||
|
field: content_embedding
|
||||||
|
query_vector:
|
||||||
|
cheese: 0.5
|
||||||
|
comet: 0.5
|
||||||
|
globe: 0.484035
|
||||||
|
ocean: 0.080102935
|
||||||
|
underground: 0.053516876
|
||||||
|
is: 0.54600334
|
||||||
|
prune: false
|
||||||
|
|
||||||
|
- match: { hits.total.value: 3 }
|
||||||
|
- match: { hits.hits.0._id: "1" }
|
||||||
|
- match: { hits.hits.1._id: "3" }
|
||||||
|
- match: { hits.hits.2._id: "2" }
|
||||||
|
|
||||||
|
---
|
||||||
|
"Check sparse_vector should prune by default":
|
||||||
|
|
||||||
|
- requires:
|
||||||
|
cluster_features: 'sparse_vector.index_options_supported'
|
||||||
|
reason: "sparse_vector token pruning index options added support in 8.19"
|
||||||
|
- skip:
|
||||||
|
features: headers
|
||||||
|
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Content-Type: application/json
|
||||||
|
indices.create:
|
||||||
|
index: test-sparse-vector-pruning-default
|
||||||
|
body:
|
||||||
|
settings:
|
||||||
|
number_of_shards: 1
|
||||||
|
mappings:
|
||||||
|
properties:
|
||||||
|
content_embedding:
|
||||||
|
type: sparse_vector
|
||||||
|
|
||||||
|
- match: { acknowledged: true }
|
||||||
|
|
||||||
|
- do:
|
||||||
|
headers:
|
||||||
|
Content-Type: application/json
|
||||||
|
bulk:
|
||||||
|
index: test-sparse-vector-pruning-default
|
||||||
|
refresh: true
|
||||||
|
body: |
|
||||||
|
{"index": { "_id": "1" }}
|
||||||
|
{"content_embedding":{"cheese": 2.671405,"is": 0.11809908,"comet": 0.26088917}}
|
||||||
|
{"index": { "_id": "2" }}
|
||||||
|
{"content_embedding":{"planet": 2.3438394,"is": 0.14600334,"astronomy": 0.36015007,"moon": 0.20022368}}
|
||||||
|
{"index": { "_id": "3" }}
|
||||||
|
{"content_embedding":{"is": 0.1891394,"globe": 0.484035,"ocean": 0.080102935,"underground": 0.053516876}}
|
||||||
|
{"index": { "_id": "4" }}
|
||||||
|
{"content_embedding":{"is": 0.1891394}}
|
||||||
|
{"index": { "_id": "5" }}
|
||||||
|
{"content_embedding":{"is": 0.1891394}}
|
||||||
|
{"index": { "_id": "6" }}
|
||||||
|
{"content_embedding":{"is": 0.1891394}}
|
||||||
|
{"index": { "_id": "7" }}
|
||||||
|
{"content_embedding":{"is": 0.1891394}}
|
||||||
|
{"index": { "_id": "8" }}
|
||||||
|
{"content_embedding":{"is": 0.1891394}}
|
||||||
|
{"index": { "_id": "9" }}
|
||||||
|
{"content_embedding":{"is": 0.1891394}}
|
||||||
|
{"index": { "_id": "10" }}
|
||||||
|
{"content_embedding":{"is": 0.1891394}}
|
||||||
|
{"index": { "_id": "11" }}
|
||||||
|
{"content_embedding":{"is": 0.6, "pugs": 0.6 }}
|
||||||
|
{"index": { "_id": "12" }}
|
||||||
|
{"content_embedding":{"is": 0.1891394, "pugs": 0.1 }}
|
||||||
|
|
||||||
|
- do:
|
||||||
|
search:
|
||||||
|
index: test-sparse-vector-pruning-default
|
||||||
|
body:
|
||||||
|
query:
|
||||||
|
sparse_vector:
|
||||||
|
field: content_embedding
|
||||||
|
query_vector:
|
||||||
|
pugs: 0.5
|
||||||
|
cats: 0.5
|
||||||
|
is: 0.04600334
|
||||||
|
|
||||||
|
- match: { hits.total.value: 2 }
|
||||||
|
- match: { hits.hits.0._id: "11" }
|
||||||
|
- match: { hits.hits.1._id: "12" }
|
||||||
|
|
||||||
|
- do:
|
||||||
|
search:
|
||||||
|
index: test-sparse-vector-pruning-default
|
||||||
|
body:
|
||||||
|
query:
|
||||||
|
sparse_vector:
|
||||||
|
field: content_embedding
|
||||||
|
query_vector:
|
||||||
|
pugs: 0.5
|
||||||
|
cats: 0.5
|
||||||
|
is: 0.04600334
|
||||||
|
prune: false
|
||||||
|
|
||||||
|
- match: { hits.total.value: 12 }
|
||||||
|
|