Implement synthetic source support for annotated text field (#107735)

This PR adds synthetic source support for `annotated_text` fields. The existing implementation for `text` fields is reused, including test infrastructure, so the majority of the change consists of moving code and making it accessible.

Contributes to #106460, #78744.
Oleksandr Kolomiiets 2024-04-25 10:31:27 -07:00 committed by GitHub
parent 4ef8b3825e
commit e1d902d33b
16 changed files with 824 additions and 300 deletions


@ -0,0 +1,5 @@
pr: 107735
summary: Implement synthetic source support for annotated text field
area: Mapping
type: feature
issues: []


@ -6,7 +6,7 @@ experimental[]
The mapper-annotated-text plugin provides the ability to index text that is a
combination of free-text and special markup that is typically used to identify
items of interest such as people or organisations (see NER or Named Entity Recognition
tools).
The elasticsearch markup allows one or more additional tokens to be injected, unchanged, into the token
@ -18,7 +18,7 @@ include::install_remove.asciidoc[]
[[mapper-annotated-text-usage]]
==== Using the `annotated-text` field
The `annotated-text` tokenizes text content as per the more common {ref}/text.html[`text`] field (see
"limitations" below) but also injects any marked-up annotation tokens directly into
the search index:
@ -49,7 +49,7 @@ in the search index:
--------------------------
GET my-index-000001/_analyze
{
"field": "my_field",
"text":"Investors in [Apple](Apple+Inc.) rejoiced."
}
--------------------------
@ -76,7 +76,7 @@ Response:
"position": 1
},
{
"token": "Apple Inc.", <1>
"start_offset": 13,
"end_offset": 18,
"type": "annotation",
@ -106,7 +106,7 @@ the token stream and at the same position (position 2) as the text token (`apple
We can now perform searches for annotations using regular `term` queries that don't tokenize
the provided search values. Annotations are a more precise way of matching as can be seen
in this example where a search for `Beck` will not match `Jeff Beck`:
[source,console]
@ -133,18 +133,119 @@ GET my-index-000001/_search
}
--------------------------
<1> As well as tokenising the plain text into single words e.g. `beck`, here we
inject the single token value `Beck` at the same position as `beck` in the token stream.
<2> Note annotations can inject multiple tokens at the same position - here we inject both
the very specific value `Jeff Beck` and the broader term `Guitarist`. This enables
broader positional queries e.g. finding mentions of a `Guitarist` near to `strat`.
<3> A benefit of searching with these carefully defined annotation tokens is that a query for
`Beck` will not match document 2 that contains the tokens `jeff`, `beck` and `Jeff Beck`
WARNING: Any use of `=` signs in annotation values e.g. `[Prince](person=Prince)` will
cause the document to be rejected with a parse failure. In future we hope to have a use for
the equals signs so will actively reject documents that contain this today.
[[annotated-text-synthetic-source]]
===== Synthetic `_source`
IMPORTANT: Synthetic `_source` is Generally Available only for TSDB indices
(indices that have `index.mode` set to `time_series`). For other indices
synthetic `_source` is in technical preview. Features in technical preview may
be changed or removed in a future release. Elastic will work to fix
any issues, but features in technical preview are not subject to the support SLA
of official GA features.
`annotated_text` fields support {ref}/mapping-source-field.html#synthetic-source[synthetic `_source`] if they have
a {ref}/keyword.html#keyword-synthetic-source[`keyword`] sub-field that supports synthetic
`_source` or if the `annotated_text` field sets `store` to `true`. Either way, it may
not have {ref}/copy-to.html[`copy_to`].
If using a sub-`keyword` field then the values are sorted in the same way as
a `keyword` field's values are sorted. By default, that means sorted with
duplicates removed. So:
[source,console,id=synthetic-source-text-example-default]
----
PUT idx
{
"mappings": {
"_source": { "mode": "synthetic" },
"properties": {
"text": {
"type": "annotated_text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
PUT idx/_doc/1
{
"text": [
"the quick brown fox",
"the quick brown fox",
"jumped over the lazy dog"
]
}
----
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
Will become:
[source,console-result]
----
{
"text": [
"jumped over the lazy dog",
"the quick brown fox"
]
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]
NOTE: Reordering text fields can have an effect on {ref}/query-dsl-match-query-phrase.html[phrase]
and {ref}/span-queries.html[span] queries. See the discussion about {ref}/position-increment-gap.html[`position_increment_gap`] for more detail. You
can avoid this by making sure the `slop` parameter on the phrase queries
is lower than the `position_increment_gap`. This is the default.
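As a sketch (reusing the `idx` index from the example above), a phrase query whose `slop` stays below the default `position_increment_gap` of 100 will not match across the reordered array values:

[source,console]
----
GET idx/_search
{
  "query": {
    "match_phrase": {
      "text": {
        "query": "quick brown fox",
        "slop": 3 <1>
      }
    }
  }
}
----
<1> Keeping `slop` lower than the `position_increment_gap` (100 by default) prevents the phrase from spanning separate array values.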
If the `annotated_text` field sets `store` to `true` then order and duplicates
are preserved.
[source,console,id=synthetic-source-text-example-stored]
----
PUT idx
{
"mappings": {
"_source": { "mode": "synthetic" },
"properties": {
"text": { "type": "annotated_text", "store": true }
}
}
}
PUT idx/_doc/1
{
"text": [
"the quick brown fox",
"the quick brown fox",
"jumped over the lazy dog"
]
}
----
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
Will become:
[source,console-result]
----
{
"text": [
"the quick brown fox",
"the quick brown fox",
"jumped over the lazy dog"
]
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]
[[mapper-annotated-text-tips]]
==== Data modelling tips
@ -153,13 +254,13 @@ the equals signs so wil actively reject documents that contain this today.
Annotations are normally a way of weaving structured information into unstructured text for
higher-precision search.
`Entity resolution` is a form of document enrichment undertaken by specialist software or people
where references to entities in a document are disambiguated by attaching a canonical ID.
The ID is used to resolve any number of aliases or distinguish between people with the
same name. The hyperlinks connecting Wikipedia's articles are a good example of resolved
entity IDs woven into text.
These IDs can be embedded as annotations in an annotated_text field but it often makes
sense to include them in dedicated structured fields to support discovery via aggregations:
[source,console]
@ -214,20 +315,20 @@ GET my-index-000001/_search
--------------------------
<1> Note the `my_twitter_handles` field contains a list of the annotation values
also used in the unstructured text. (Note the annotated_text syntax requires escaping).
By repeating the annotation values in a structured field this application has ensured that
the tokens discovered in the structured field can be used for search and highlighting
in the unstructured field.
<2> In this example we search for documents that talk about components of the elastic stack
<3> We use the `my_twitter_handles` field here to discover people who are significantly
associated with the elastic stack.
===== Avoiding over-matching annotations
By design, the regular text tokens and the annotation tokens co-exist in the same indexed
field but in rare cases this can lead to some over-matching.
The value of an annotation often denotes a _named entity_ (a person, place or company).
The tokens for these named entities are inserted untokenized, and differ from typical text
tokens because they are normally:
* Mixed case e.g. `Madonna`
@ -235,19 +336,19 @@ tokens because they are normally:
* Can have punctuation or numbers e.g. `Apple Inc.` or `@kimchy`
This means, for the most part, a search for a named entity in the annotated text field will
not have any false positives e.g. when selecting `Apple Inc.` from an aggregation result
you can drill down to highlight uses in the text without "over matching" on any text tokens
like the word `apple` in this context:
the apple was very juicy
However, a problem arises if your named entity happens to be a single term and lower-case e.g. the
company `elastic`. In this case, a search on the annotated text field for the token `elastic`
may match a text document such as this:
they fired an elastic band
To avoid such false matches users should consider prefixing annotation values to ensure
they don't name clash with text tokens e.g.
[elastic](Company_elastic) released version 7.0 of the elastic stack today
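A `term` query for the prefixed value then matches only the annotation token and not the plain text token `elastic` (a sketch; the index and field names are hypothetical):

[source,console]
----
GET my-index-000001/_search
{
  "query": {
    "term": {
      "my_field": "Company_elastic"
    }
  }
}
----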
@ -273,7 +374,7 @@ GET my-index-000001/_search
{
"query": {
"query_string": {
"query": "cats"
}
},
"highlight": {
@ -291,21 +392,21 @@ GET my-index-000001/_search
The annotated highlighter is based on the `unified` highlighter and supports the same
settings but does not use the `pre_tags` or `post_tags` parameters. Rather than using
html-like markup such as `<em>cat</em>` the annotated highlighter uses the same
markdown-like syntax used for annotations and injects a key=value annotation where `_hit_term`
is the key and the matched search term is the value e.g.
The [cat](_hit_term=cat) sat on the [mat](sku3578)
The annotated highlighter tries to be respectful of any existing markup in the original
text:
* If the search term matches exactly the location of an existing annotation then the
`_hit_term` key is merged into the url-like syntax used in the `(...)` part of the
existing annotation.
* However, if the search term overlaps the span of an existing annotation it would break
the markup formatting so the original annotation is removed in favour of a new annotation
with just the search hit information in the results.
* Any non-overlapping annotations in the original text are preserved in highlighter
selections
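The annotated highlighter is requested by setting the highlighter `type` to `annotated` (a minimal sketch reusing the hypothetical index from the earlier examples):

[source,console]
----
GET my-index-000001/_search
{
  "query": {
    "term": { "my_field": "sku3578" }
  },
  "highlight": {
    "fields": {
      "my_field": { "type": "annotated" }
    }
  }
}
----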


@ -41,6 +41,7 @@ There are a couple of restrictions to be aware of:
types:
** <<aggregate-metric-double-synthetic-source, `aggregate_metric_double`>>
** {plugins}/mapper-annotated-text-usage.html#annotated-text-synthetic-source[`annotated-text`]
** <<binary-synthetic-source,`binary`>>
** <<boolean-synthetic-source,`boolean`>>
** <<numeric-synthetic-source,`byte`>>


@ -0,0 +1,19 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/
module org.elasticsearch.index.mapper.annotatedtext {
requires org.elasticsearch.base;
requires org.elasticsearch.server;
requires org.elasticsearch.xcontent;
requires org.apache.lucene.core;
requires org.apache.lucene.highlighter;
// exports nothing
provides org.elasticsearch.features.FeatureSpecification with org.elasticsearch.index.mapper.annotatedtext.Features;
}


@ -21,17 +21,22 @@ import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.elasticsearch.ElasticsearchParseException;
import org.elasticsearch.features.NodeFeature;
import org.elasticsearch.index.IndexVersion;
import org.elasticsearch.index.analysis.AnalyzerScope;
import org.elasticsearch.index.analysis.IndexAnalyzers;
import org.elasticsearch.index.analysis.NamedAnalyzer;
import org.elasticsearch.index.mapper.DocumentParserContext;
import org.elasticsearch.index.mapper.FieldMapper;
import org.elasticsearch.index.mapper.KeywordFieldMapper;
import org.elasticsearch.index.mapper.MapperBuilderContext;
import org.elasticsearch.index.mapper.SourceLoader;
import org.elasticsearch.index.mapper.StringStoredFieldFieldLoader;
import org.elasticsearch.index.mapper.TextFieldMapper;
import org.elasticsearch.index.mapper.TextParams;
import org.elasticsearch.index.mapper.TextSearchInfo;
import org.elasticsearch.index.similarity.SimilarityProvider;
import org.elasticsearch.xcontent.XContentBuilder;
import java.io.IOException;
import java.io.Reader;
@ -41,6 +46,7 @@ import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
@ -58,6 +64,8 @@ import java.util.regex.Pattern;
**/
public class AnnotatedTextFieldMapper extends FieldMapper {
public static final NodeFeature SYNTHETIC_SOURCE_SUPPORT = new NodeFeature("mapper.annotated_text.synthetic_source");
public static final String CONTENT_TYPE = "annotated_text";
private static Builder builder(FieldMapper in) {
@ -114,7 +122,7 @@ public class AnnotatedTextFieldMapper extends FieldMapper {
meta };
}
private AnnotatedTextFieldType buildFieldType(FieldType fieldType, MapperBuilderContext context) {
private AnnotatedTextFieldType buildFieldType(FieldType fieldType, MapperBuilderContext context, MultiFields multiFields) {
TextSearchInfo tsi = new TextSearchInfo(
fieldType,
similarity.get(),
@ -126,12 +134,14 @@ public class AnnotatedTextFieldMapper extends FieldMapper {
store.getValue(),
tsi,
context.isSourceSynthetic(),
TextFieldMapper.SyntheticSourceHelper.syntheticSourceDelegate(fieldType, multiFields),
meta.getValue()
);
}
@Override
public AnnotatedTextFieldMapper build(MapperBuilderContext context) {
MultiFields multiFields = multiFieldsBuilder.build(this, context);
FieldType fieldType = TextParams.buildFieldType(() -> true, store, indexOptions, norms, termVectors);
if (fieldType.indexOptions() == IndexOptions.NONE) {
throw new IllegalArgumentException("[" + CONTENT_TYPE + "] fields must be indexed");
@ -146,8 +156,8 @@ public class AnnotatedTextFieldMapper extends FieldMapper {
return new AnnotatedTextFieldMapper(
name(),
fieldType,
buildFieldType(fieldType, context),
multiFieldsBuilder.build(this, context),
buildFieldType(fieldType, context, multiFields),
multiFields,
copyTo,
this
);
@ -472,15 +482,15 @@ public class AnnotatedTextFieldMapper extends FieldMapper {
}
public static final class AnnotatedTextFieldType extends TextFieldMapper.TextFieldType {
private AnnotatedTextFieldType(
String name,
boolean store,
TextSearchInfo tsi,
boolean isSyntheticSource,
KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate,
Map<String, String> meta
) {
super(name, true, store, tsi, isSyntheticSource, null, meta, false, false);
super(name, true, store, tsi, isSyntheticSource, syntheticSourceDelegate, meta, false, false);
}
public AnnotatedTextFieldType(String name, Map<String, String> meta) {
@ -544,4 +554,36 @@ public class AnnotatedTextFieldMapper extends FieldMapper {
public FieldMapper.Builder getMergeBuilder() {
return new Builder(simpleName(), builder.indexCreatedVersion, builder.analyzers.indexAnalyzers).init(this);
}
@Override
public SourceLoader.SyntheticFieldLoader syntheticFieldLoader() {
if (copyTo.copyToFields().isEmpty() != true) {
throw new IllegalArgumentException(
"field [" + name() + "] of type [" + typeName() + "] doesn't support synthetic source because it declares copy_to"
);
}
if (fieldType.stored()) {
return new StringStoredFieldFieldLoader(name(), simpleName(), null) {
@Override
protected void write(XContentBuilder b, Object value) throws IOException {
b.value((String) value);
}
};
}
var kwd = TextFieldMapper.SyntheticSourceHelper.getKeywordFieldMapperForSyntheticSource(this);
if (kwd != null) {
return kwd.syntheticFieldLoader(simpleName());
}
throw new IllegalArgumentException(
String.format(
Locale.ROOT,
"field [%s] of type [%s] doesn't support synthetic source unless it is stored or has a sub-field of"
+ " type [keyword] with doc values or stored and without a normalizer",
name(),
typeName()
)
);
}
}


@ -0,0 +1,26 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/
package org.elasticsearch.index.mapper.annotatedtext;
import org.elasticsearch.features.FeatureSpecification;
import org.elasticsearch.features.NodeFeature;
import java.util.Set;
/**
* Provides features for annotated text mapper.
*/
public class Features implements FeatureSpecification {
@Override
public Set<NodeFeature> getFeatures() {
return Set.of(
AnnotatedTextFieldMapper.SYNTHETIC_SOURCE_SUPPORT // Added in 8.15
);
}
}


@ -0,0 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the Elastic License
# 2.0 and the Server Side Public License, v 1; you may not use this file except
# in compliance with, at your election, the Elastic License 2.0 or the Server
# Side Public License, v 1.
#
org.elasticsearch.index.mapper.annotatedtext.Features


@ -14,6 +14,7 @@ import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.DocValuesType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexableField;
@ -29,6 +30,7 @@ import org.elasticsearch.index.analysis.AnalyzerScope;
import org.elasticsearch.index.analysis.CharFilterFactory;
import org.elasticsearch.index.analysis.CustomAnalyzer;
import org.elasticsearch.index.analysis.IndexAnalyzers;
import org.elasticsearch.index.analysis.LowercaseNormalizer;
import org.elasticsearch.index.analysis.NamedAnalyzer;
import org.elasticsearch.index.analysis.StandardTokenizerFactory;
import org.elasticsearch.index.analysis.TokenFilterFactory;
@ -38,6 +40,7 @@ import org.elasticsearch.index.mapper.MapperParsingException;
import org.elasticsearch.index.mapper.MapperService;
import org.elasticsearch.index.mapper.MapperTestCase;
import org.elasticsearch.index.mapper.ParsedDocument;
import org.elasticsearch.index.mapper.TextFieldFamilySyntheticSourceTestSetup;
import org.elasticsearch.index.mapper.TextFieldMapper;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.xcontent.ToXContent;
@ -54,6 +57,7 @@ import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import static org.hamcrest.Matchers.containsString;
import static org.hamcrest.Matchers.equalTo;
@ -144,7 +148,8 @@ public class AnnotatedTextFieldMapperTests extends MapperTestCase {
)
);
return IndexAnalyzers.of(
Map.of("default", dflt, "standard", standard, "keyword", keyword, "whitespace", whitespace, "my_stop_analyzer", stop)
Map.of("default", dflt, "standard", standard, "keyword", keyword, "whitespace", whitespace, "my_stop_analyzer", stop),
Map.of("lowercase", new NamedAnalyzer("lowercase", AnalyzerScope.INDEX, new LowercaseNormalizer()))
);
}
@ -595,7 +600,23 @@ public class AnnotatedTextFieldMapperTests extends MapperTestCase {
@Override
protected SyntheticSourceSupport syntheticSourceSupport(boolean ignoreMalformed) {
throw new AssumptionViolatedException("not supported");
assumeFalse("ignore_malformed not supported", ignoreMalformed);
return TextFieldFamilySyntheticSourceTestSetup.syntheticSourceSupport("annotated_text", false);
}
@Override
protected BlockReaderSupport getSupportedReaders(MapperService mapper, String loaderFieldName) {
return TextFieldFamilySyntheticSourceTestSetup.getSupportedReaders(mapper, loaderFieldName);
}
@Override
protected Function<Object, Object> loadBlockExpected(BlockReaderSupport blockReaderSupport, boolean columnReader) {
return TextFieldFamilySyntheticSourceTestSetup.loadBlockExpected(blockReaderSupport, columnReader);
}
@Override
protected void validateRoundTripReader(String syntheticSource, DirectoryReader reader, DirectoryReader roundTripReader) {
TextFieldFamilySyntheticSourceTestSetup.validateRoundTripReader(syntheticSource, reader, roundTripReader);
}
@Override


@ -0,0 +1,197 @@
---
setup:
- requires:
cluster_features: ["mapper.annotated_text.synthetic_source"]
reason: introduced in 8.15.0
---
stored annotated_text field:
- do:
indices.create:
index: test
body:
mappings:
_source:
mode: synthetic
properties:
annotated_text:
type: annotated_text
store: true
- do:
index:
index: test
id: 1
refresh: true
body:
annotated_text: the quick brown fox
- do:
search:
index: test
- match:
hits.hits.0._source:
annotated_text: the quick brown fox
---
annotated_text field with keyword multi-field:
- do:
indices.create:
index: test
body:
mappings:
_source:
mode: synthetic
properties:
annotated_text:
type: annotated_text
fields:
keyword:
type: keyword
- do:
index:
index: test
id: 1
refresh: true
body:
annotated_text: the quick brown fox
- do:
search:
index: test
- match:
hits.hits.0._source:
annotated_text: the quick brown fox
---
multiple values in stored annotated_text field:
- do:
indices.create:
index: test
body:
mappings:
_source:
mode: synthetic
properties:
annotated_text:
type: annotated_text
store: true
- do:
index:
index: test
id: 1
refresh: true
body:
annotated_text: ["world", "hello", "world"]
- do:
search:
index: test
- match:
hits.hits.0._source:
annotated_text: ["world", "hello", "world"]
---
multiple values in annotated_text field with keyword multi-field:
- do:
indices.create:
index: test
body:
mappings:
_source:
mode: synthetic
properties:
annotated_text:
type: annotated_text
fields:
keyword:
type: keyword
- do:
index:
index: test
id: 1
refresh: true
body:
annotated_text: ["world", "hello", "world"]
- do:
search:
index: test
- match:
hits.hits.0._source:
annotated_text: ["hello", "world"]
---
multiple values in annotated_text field with stored keyword multi-field:
- do:
indices.create:
index: test
body:
mappings:
_source:
mode: synthetic
properties:
annotated_text:
type: annotated_text
fields:
keyword:
type: keyword
store: true
doc_values: false
- do:
index:
index: test
id: 1
refresh: true
body:
annotated_text: ["world", "hello", "world"]
- do:
search:
index: test
- match:
hits.hits.0._source:
annotated_text: ["world", "hello", "world"]
---
multiple values in stored annotated_text field with keyword multi-field:
- do:
indices.create:
index: test
body:
mappings:
_source:
mode: synthetic
properties:
annotated_text:
type: annotated_text
store: true
fields:
keyword:
type: keyword
- do:
index:
index: test
id: 1
refresh: true
body:
annotated_text: ["world", "hello", "world"]
- do:
search:
index: test
- match:
hits.hits.0._source:
annotated_text: ["world", "hello", "world"]


@ -1026,7 +1026,7 @@ public final class KeywordFieldMapper extends FieldMapper {
return syntheticFieldLoader(simpleName());
}
SourceLoader.SyntheticFieldLoader syntheticFieldLoader(String simpleName) {
public SourceLoader.SyntheticFieldLoader syntheticFieldLoader(String simpleName) {
if (hasScript()) {
return SourceLoader.SyntheticFieldLoader.NOTHING;
}


@ -390,7 +390,7 @@ public final class TextFieldMapper extends FieldMapper {
store.getValue(),
tsi,
context.isSourceSynthetic(),
syntheticSourceDelegate(fieldType, multiFields),
SyntheticSourceHelper.syntheticSourceDelegate(fieldType, multiFields),
meta.getValue(),
eagerGlobalOrdinals.getValue(),
indexPhrases.getValue()
@ -402,17 +402,6 @@ public final class TextFieldMapper extends FieldMapper {
return ft;
}
private static KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate(FieldType fieldType, MultiFields multiFields) {
if (fieldType.stored()) {
return null;
}
var kwd = getKeywordFieldMapperForSyntheticSource(multiFields);
if (kwd != null) {
return kwd.fieldType();
}
return null;
}
private SubFieldInfo buildPrefixInfo(MapperBuilderContext context, FieldType fieldType, TextFieldType tft) {
if (indexPrefixes.get() == null) {
return null;
@ -1094,7 +1083,7 @@ public final class TextFieldMapper extends FieldMapper {
return isSyntheticSource;
}
KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate() {
public KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate() {
return syntheticSourceDelegate;
}
}
@ -1473,7 +1462,7 @@ public final class TextFieldMapper extends FieldMapper {
};
}
var kwd = getKeywordFieldMapperForSyntheticSource(this);
var kwd = SyntheticSourceHelper.getKeywordFieldMapperForSyntheticSource(this);
if (kwd != null) {
return kwd.syntheticFieldLoader(simpleName());
}
@ -1489,16 +1478,29 @@ public final class TextFieldMapper extends FieldMapper {
);
}
private static KeywordFieldMapper getKeywordFieldMapperForSyntheticSource(Iterable<? extends Mapper> multiFields) {
for (Mapper sub : multiFields) {
if (sub.typeName().equals(KeywordFieldMapper.CONTENT_TYPE)) {
KeywordFieldMapper kwd = (KeywordFieldMapper) sub;
if (kwd.hasNormalizer() == false && (kwd.fieldType().hasDocValues() || kwd.fieldType().isStored())) {
return kwd;
}
public static class SyntheticSourceHelper {
public static KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate(FieldType fieldType, MultiFields multiFields) {
if (fieldType.stored()) {
return null;
}
var kwd = getKeywordFieldMapperForSyntheticSource(multiFields);
if (kwd != null) {
return kwd.fieldType();
}
return null;
}
return null;
public static KeywordFieldMapper getKeywordFieldMapperForSyntheticSource(Iterable<? extends Mapper> multiFields) {
for (Mapper sub : multiFields) {
if (sub.typeName().equals(KeywordFieldMapper.CONTENT_TYPE)) {
KeywordFieldMapper kwd = (KeywordFieldMapper) sub;
if (kwd.hasNormalizer() == false && (kwd.fieldType().hasDocValues() || kwd.fieldType().isStored())) {
return kwd;
}
}
}
return null;
}
}
}


@ -25,7 +25,6 @@ import org.elasticsearch.cluster.metadata.IndexMetadata;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.lucene.Lucene;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.core.Tuple;
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.IndexVersion;
import org.elasticsearch.index.analysis.AnalyzerScope;
@ -45,14 +44,11 @@ import org.elasticsearch.script.StringFieldScript;
import org.elasticsearch.xcontent.XContentBuilder;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import static java.util.Collections.singletonList;
import static java.util.Collections.singletonMap;
@ -658,7 +654,7 @@ public class KeywordFieldMapperTests extends MapperTestCase {
@Override
protected SyntheticSourceSupport syntheticSourceSupport(boolean ignoreMalformed) {
assertFalse("keyword doesn't support ignore_malformed", ignoreMalformed);
return new KeywordSyntheticSourceSupport(
return new KeywordFieldSyntheticSourceSupport(
randomBoolean() ? null : between(10, 100),
randomBoolean(),
usually() ? null : randomAlphaOfLength(2),
@ -666,110 +662,6 @@ public class KeywordFieldMapperTests extends MapperTestCase {
);
}
static class KeywordSyntheticSourceSupport implements SyntheticSourceSupport {
private final Integer ignoreAbove;
private final boolean allIgnored;
private final boolean store;
private final boolean docValues;
private final String nullValue;
private final boolean exampleSortsUsingIgnoreAbove;
KeywordSyntheticSourceSupport(Integer ignoreAbove, boolean store, String nullValue, boolean exampleSortsUsingIgnoreAbove) {
this.ignoreAbove = ignoreAbove;
this.allIgnored = ignoreAbove != null && rarely();
this.store = store;
this.nullValue = nullValue;
this.exampleSortsUsingIgnoreAbove = exampleSortsUsingIgnoreAbove;
this.docValues = store ? randomBoolean() : true;
}
@Override
public SyntheticSourceExample example(int maxValues) {
return example(maxValues, false);
}
public SyntheticSourceExample example(int maxValues, boolean loadBlockFromSource) {
if (randomBoolean()) {
Tuple<String, String> v = generateValue();
Object loadBlock = v.v2();
if (loadBlockFromSource == false && ignoreAbove != null && v.v2().length() > ignoreAbove) {
loadBlock = null;
}
return new SyntheticSourceExample(v.v1(), v.v2(), loadBlock, this::mapping);
}
List<Tuple<String, String>> values = randomList(1, maxValues, this::generateValue);
List<String> in = values.stream().map(Tuple::v1).toList();
List<String> outPrimary = new ArrayList<>();
List<String> outExtraValues = new ArrayList<>();
values.stream().map(Tuple::v2).forEach(v -> {
if (exampleSortsUsingIgnoreAbove && ignoreAbove != null && v.length() > ignoreAbove) {
outExtraValues.add(v);
} else {
outPrimary.add(v);
}
});
List<String> outList = store ? outPrimary : new HashSet<>(outPrimary).stream().sorted().collect(Collectors.toList());
List<String> loadBlock;
if (loadBlockFromSource) {
// The block loader infrastructure will never return nulls. Just zap them all.
loadBlock = in.stream().filter(m -> m != null).toList();
} else if (docValues) {
loadBlock = new HashSet<>(outPrimary).stream().sorted().collect(Collectors.toList());
} else {
loadBlock = List.copyOf(outList);
}
Object loadBlockResult = loadBlock.size() == 1 ? loadBlock.get(0) : loadBlock;
outList.addAll(outExtraValues);
Object out = outList.size() == 1 ? outList.get(0) : outList;
return new SyntheticSourceExample(in, out, loadBlockResult, this::mapping);
}
private Tuple<String, String> generateValue() {
if (nullValue != null && randomBoolean()) {
return Tuple.tuple(null, nullValue);
}
int length = 5;
if (ignoreAbove != null && (allIgnored || randomBoolean())) {
length = ignoreAbove + 5;
}
String v = randomAlphaOfLength(length);
return Tuple.tuple(v, v);
}
private void mapping(XContentBuilder b) throws IOException {
b.field("type", "keyword");
if (nullValue != null) {
b.field("null_value", nullValue);
}
if (ignoreAbove != null) {
b.field("ignore_above", ignoreAbove);
}
if (store) {
b.field("store", true);
}
if (docValues == false) {
b.field("doc_values", false);
}
}
@Override
public List<SyntheticSourceInvalidExample> invalidExample() throws IOException {
return List.of(
new SyntheticSourceInvalidExample(
equalTo(
"field [field] of type [keyword] doesn't support synthetic source because "
+ "it doesn't have doc values and isn't stored"
),
b -> b.field("type", "keyword").field("doc_values", false)
),
new SyntheticSourceInvalidExample(
equalTo("field [field] of type [keyword] doesn't support synthetic source because it declares a normalizer"),
b -> b.field("type", "keyword").field("normalizer", "lowercase")
)
);
}
}
@Override
protected IngestScriptSupport ingestScriptSupport() {
return new IngestScriptSupport() {

View file

@@ -75,7 +75,6 @@ import org.elasticsearch.search.lookup.SourceProvider;
import org.elasticsearch.xcontent.ToXContent;
import org.elasticsearch.xcontent.XContentBuilder;
import org.elasticsearch.xcontent.XContentFactory;
import org.hamcrest.Matcher;
import org.junit.AssumptionViolatedException;
import java.io.IOException;
@@ -1178,120 +1177,12 @@ public class TextFieldMapperTests extends MapperTestCase {
@Override
protected SyntheticSourceSupport syntheticSourceSupport(boolean ignoreMalformed) {
assumeFalse("ignore_malformed not supported", ignoreMalformed);
boolean storeTextField = randomBoolean();
boolean storedKeywordField = storeTextField || randomBoolean();
boolean indexText = randomBoolean();
Integer ignoreAbove = randomBoolean() ? null : between(10, 100);
KeywordFieldMapperTests.KeywordSyntheticSourceSupport keywordSupport = new KeywordFieldMapperTests.KeywordSyntheticSourceSupport(
ignoreAbove,
storedKeywordField,
null,
false == storeTextField
);
return new SyntheticSourceSupport() {
@Override
public SyntheticSourceExample example(int maxValues) {
if (storeTextField) {
SyntheticSourceExample delegate = keywordSupport.example(maxValues, true);
return new SyntheticSourceExample(
delegate.inputValue(),
delegate.expectedForSyntheticSource(),
delegate.expectedForBlockLoader(),
b -> {
b.field("type", "text");
b.field("store", true);
if (indexText == false) {
b.field("index", false);
}
}
);
}
// We'll load from _source if ignore_above is defined, otherwise we load from the keyword field.
boolean loadingFromSource = ignoreAbove != null;
SyntheticSourceExample delegate = keywordSupport.example(maxValues, loadingFromSource);
return new SyntheticSourceExample(
delegate.inputValue(),
delegate.expectedForSyntheticSource(),
delegate.expectedForBlockLoader(),
b -> {
b.field("type", "text");
if (indexText == false) {
b.field("index", false);
}
b.startObject("fields");
{
b.startObject(randomAlphaOfLength(4));
delegate.mapping().accept(b);
b.endObject();
}
b.endObject();
}
);
}
@Override
public List<SyntheticSourceInvalidExample> invalidExample() throws IOException {
Matcher<String> err = equalTo(
"field [field] of type [text] doesn't support synthetic source unless it is stored or"
+ " has a sub-field of type [keyword] with doc values or stored and without a normalizer"
);
return List.of(
new SyntheticSourceInvalidExample(err, TextFieldMapperTests.this::minimalMapping),
new SyntheticSourceInvalidExample(err, b -> {
b.field("type", "text");
b.startObject("fields");
{
b.startObject("l");
b.field("type", "long");
b.endObject();
}
b.endObject();
}),
new SyntheticSourceInvalidExample(err, b -> {
b.field("type", "text");
b.startObject("fields");
{
b.startObject("kwd");
b.field("type", "keyword");
b.field("normalizer", "lowercase");
b.endObject();
}
b.endObject();
}),
new SyntheticSourceInvalidExample(err, b -> {
b.field("type", "text");
b.startObject("fields");
{
b.startObject("kwd");
b.field("type", "keyword");
b.field("doc_values", "false");
b.endObject();
}
b.endObject();
})
);
}
};
return TextFieldFamilySyntheticSourceTestSetup.syntheticSourceSupport("text", true);
}
@Override
protected Function<Object, Object> loadBlockExpected(BlockReaderSupport blockReaderSupport, boolean columnReader) {
if (nullLoaderExpected(blockReaderSupport.mapper(), blockReaderSupport.loaderFieldName())) {
return null;
}
return v -> ((BytesRef) v).utf8ToString();
}
private boolean nullLoaderExpected(MapperService mapper, String fieldName) {
MappedFieldType type = mapper.fieldType(fieldName);
if (type instanceof TextFieldType t) {
if (t.isSyntheticSource() == false || t.canUseSyntheticSourceDelegateForQuerying() || t.isStored()) {
return false;
}
String parentField = mapper.mappingLookup().parentField(fieldName);
return parentField == null || nullLoaderExpected(mapper, parentField);
}
return false;
return TextFieldFamilySyntheticSourceTestSetup.loadBlockExpected(blockReaderSupport, columnReader);
}
@Override
@@ -1300,9 +1191,8 @@ public class TextFieldMapperTests extends MapperTestCase {
}
@Override
protected void validateRoundTripReader(String syntheticSource, DirectoryReader reader, DirectoryReader roundTripReader)
throws IOException {
// Disabled because it currently fails
protected void validateRoundTripReader(String syntheticSource, DirectoryReader reader, DirectoryReader roundTripReader) {
TextFieldFamilySyntheticSourceTestSetup.validateRoundTripReader(syntheticSource, reader, roundTripReader);
}
public void testUnknownAnalyzerOnLegacyIndex() throws IOException {
@@ -1433,21 +1323,7 @@ public class TextFieldMapperTests extends MapperTestCase {
@Override
protected BlockReaderSupport getSupportedReaders(MapperService mapper, String loaderFieldName) {
MappedFieldType ft = mapper.fieldType(loaderFieldName);
String parentName = mapper.mappingLookup().parentField(ft.name());
if (parentName == null) {
TextFieldMapper.TextFieldType text = (TextFieldType) ft;
boolean supportsColumnAtATimeReader = text.syntheticSourceDelegate() != null
&& text.syntheticSourceDelegate().hasDocValues()
&& text.canUseSyntheticSourceDelegateForQuerying();
return new BlockReaderSupport(supportsColumnAtATimeReader, mapper, loaderFieldName);
}
MappedFieldType parent = mapper.fieldType(parentName);
if (false == parent.typeName().equals(KeywordFieldMapper.CONTENT_TYPE)) {
throw new UnsupportedOperationException();
}
KeywordFieldMapper.KeywordFieldType kwd = (KeywordFieldMapper.KeywordFieldType) parent;
return new BlockReaderSupport(kwd.hasDocValues(), mapper, loaderFieldName);
return TextFieldFamilySyntheticSourceTestSetup.getSupportedReaders(mapper, loaderFieldName);
}
public void testBlockLoaderFromParentColumnReader() throws IOException {
@@ -1460,7 +1336,7 @@ public class TextFieldMapperTests extends MapperTestCase {
private void testBlockLoaderFromParent(boolean columnReader, boolean syntheticSource) throws IOException {
boolean storeParent = randomBoolean();
KeywordFieldMapperTests.KeywordSyntheticSourceSupport kwdSupport = new KeywordFieldMapperTests.KeywordSyntheticSourceSupport(
KeywordFieldSyntheticSourceSupport kwdSupport = new KeywordFieldSyntheticSourceSupport(
null,
storeParent,
null,

View file

@@ -0,0 +1,126 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/
package org.elasticsearch.index.mapper;
import org.apache.lucene.tests.util.LuceneTestCase;
import org.elasticsearch.core.Tuple;
import org.elasticsearch.test.ESTestCase;
import org.elasticsearch.xcontent.XContentBuilder;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.stream.Collectors;
import static org.hamcrest.Matchers.equalTo;
public class KeywordFieldSyntheticSourceSupport implements MapperTestCase.SyntheticSourceSupport {
private final Integer ignoreAbove;
private final boolean allIgnored;
private final boolean store;
private final boolean docValues;
private final String nullValue;
private final boolean exampleSortsUsingIgnoreAbove;
KeywordFieldSyntheticSourceSupport(Integer ignoreAbove, boolean store, String nullValue, boolean exampleSortsUsingIgnoreAbove) {
this.ignoreAbove = ignoreAbove;
this.allIgnored = ignoreAbove != null && LuceneTestCase.rarely();
this.store = store;
this.nullValue = nullValue;
this.exampleSortsUsingIgnoreAbove = exampleSortsUsingIgnoreAbove;
this.docValues = store ? ESTestCase.randomBoolean() : true;
}
@Override
public MapperTestCase.SyntheticSourceExample example(int maxValues) {
return example(maxValues, false);
}
public MapperTestCase.SyntheticSourceExample example(int maxValues, boolean loadBlockFromSource) {
if (ESTestCase.randomBoolean()) {
Tuple<String, String> v = generateValue();
Object loadBlock = v.v2();
if (loadBlockFromSource == false && ignoreAbove != null && v.v2().length() > ignoreAbove) {
loadBlock = null;
}
return new MapperTestCase.SyntheticSourceExample(v.v1(), v.v2(), loadBlock, this::mapping);
}
List<Tuple<String, String>> values = ESTestCase.randomList(1, maxValues, this::generateValue);
List<String> in = values.stream().map(Tuple::v1).toList();
List<String> outPrimary = new ArrayList<>();
List<String> outExtraValues = new ArrayList<>();
values.stream().map(Tuple::v2).forEach(v -> {
if (exampleSortsUsingIgnoreAbove && ignoreAbove != null && v.length() > ignoreAbove) {
outExtraValues.add(v);
} else {
outPrimary.add(v);
}
});
List<String> outList = store ? outPrimary : new HashSet<>(outPrimary).stream().sorted().collect(Collectors.toList());
List<String> loadBlock;
if (loadBlockFromSource) {
// The block loader infrastructure will never return nulls. Just zap them all.
loadBlock = in.stream().filter(m -> m != null).toList();
} else if (docValues) {
loadBlock = new HashSet<>(outPrimary).stream().sorted().collect(Collectors.toList());
} else {
loadBlock = List.copyOf(outList);
}
Object loadBlockResult = loadBlock.size() == 1 ? loadBlock.get(0) : loadBlock;
outList.addAll(outExtraValues);
Object out = outList.size() == 1 ? outList.get(0) : outList;
return new MapperTestCase.SyntheticSourceExample(in, out, loadBlockResult, this::mapping);
}
private Tuple<String, String> generateValue() {
if (nullValue != null && ESTestCase.randomBoolean()) {
return Tuple.tuple(null, nullValue);
}
int length = 5;
if (ignoreAbove != null && (allIgnored || ESTestCase.randomBoolean())) {
length = ignoreAbove + 5;
}
String v = ESTestCase.randomAlphaOfLength(length);
return Tuple.tuple(v, v);
}
private void mapping(XContentBuilder b) throws IOException {
b.field("type", "keyword");
if (nullValue != null) {
b.field("null_value", nullValue);
}
if (ignoreAbove != null) {
b.field("ignore_above", ignoreAbove);
}
if (store) {
b.field("store", true);
}
if (docValues == false) {
b.field("doc_values", false);
}
}
@Override
public List<MapperTestCase.SyntheticSourceInvalidExample> invalidExample() throws IOException {
return List.of(
new MapperTestCase.SyntheticSourceInvalidExample(
equalTo(
"field [field] of type [keyword] doesn't support synthetic source because "
+ "it doesn't have doc values and isn't stored"
),
b -> b.field("type", "keyword").field("doc_values", false)
),
new MapperTestCase.SyntheticSourceInvalidExample(
equalTo("field [field] of type [keyword] doesn't support synthetic source because it declares a normalizer"),
b -> b.field("type", "keyword").field("normalizer", "lowercase")
)
);
}
}
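The expected-value computation in `example(...)` above encodes a simple ordering rule for the non-`store` case: values within `ignore_above` come back from doc values (deduplicated and sorted), while longer values are preserved from `_source` and appended afterwards in original order. A minimal standalone sketch of that rule, with class and method names that are illustrative only, not part of the PR:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch of the value ordering KeywordFieldSyntheticSourceSupport expects from
// synthetic source: values within ignore_above are read from doc values
// (deduplicated, sorted); values over ignore_above are kept from source and
// appended afterwards in insertion order.
public class SyntheticKeywordOrdering {
    public static List<String> expectedSyntheticSource(List<String> input, int ignoreAbove) {
        TreeSet<String> primary = new TreeSet<>();   // dedupes and sorts, like doc values
        List<String> ignored = new ArrayList<>();    // kept as-is, like values over ignore_above
        for (String v : input) {
            if (v.length() > ignoreAbove) {
                ignored.add(v);
            } else {
                primary.add(v);
            }
        }
        List<String> out = new ArrayList<>(primary);
        out.addAll(ignored);
        return out;
    }
}
```

This mirrors how `outList` is assembled from `outPrimary` and `outExtraValues` in the class above.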

View file

@@ -1286,7 +1286,7 @@ public abstract class MapperTestCase extends MapperServiceTestCase {
* @param loaderFieldName the field name to use for loading the field
*/
public record BlockReaderSupport(boolean columnAtATimeReader, boolean syntheticSource, MapperService mapper, String loaderFieldName) {
BlockReaderSupport(boolean columnAtATimeReader, MapperService mapper, String loaderFieldName) {
public BlockReaderSupport(boolean columnAtATimeReader, MapperService mapper, String loaderFieldName) {
this(columnAtATimeReader, true, mapper, loaderFieldName);
}
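The hunk above only widens the visibility of the delegating constructor so code outside `MapperTestCase` (such as the new shared setup class) can build a `BlockReaderSupport` with the default `syntheticSource = true`. The delegation pattern itself, sketched standalone with illustrative names:

```java
// Standalone sketch (illustrative names, not ES code) of a record with a shorter
// public constructor that delegates to the canonical one, filling in a default.
public record BlockSupport(boolean columnAtATimeReader, boolean syntheticSource, String loaderFieldName) {
    public BlockSupport(boolean columnAtATimeReader, String loaderFieldName) {
        this(columnAtATimeReader, true, loaderFieldName); // syntheticSource defaults to true
    }
}
```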

View file

@@ -0,0 +1,207 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/
package org.elasticsearch.index.mapper;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.util.BytesRef;
import org.hamcrest.Matcher;
import java.io.IOException;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import static org.elasticsearch.test.ESTestCase.between;
import static org.elasticsearch.test.ESTestCase.randomAlphaOfLength;
import static org.elasticsearch.test.ESTestCase.randomBoolean;
import static org.hamcrest.Matchers.equalTo;
/**
* Provides functionality needed to test synthetic source support in text and text-like fields (e.g. "text", "annotated_text").
*/
public final class TextFieldFamilySyntheticSourceTestSetup {
public static MapperTestCase.SyntheticSourceSupport syntheticSourceSupport(String fieldType, boolean supportsCustomIndexConfiguration) {
return new TextFieldFamilySyntheticSourceSupport(fieldType, supportsCustomIndexConfiguration);
}
public static MapperTestCase.BlockReaderSupport getSupportedReaders(MapperService mapper, String loaderFieldName) {
MappedFieldType ft = mapper.fieldType(loaderFieldName);
String parentName = mapper.mappingLookup().parentField(ft.name());
if (parentName == null) {
TextFieldMapper.TextFieldType text = (TextFieldMapper.TextFieldType) ft;
boolean supportsColumnAtATimeReader = text.syntheticSourceDelegate() != null
&& text.syntheticSourceDelegate().hasDocValues()
&& text.canUseSyntheticSourceDelegateForQuerying();
return new MapperTestCase.BlockReaderSupport(supportsColumnAtATimeReader, mapper, loaderFieldName);
}
MappedFieldType parent = mapper.fieldType(parentName);
if (false == parent.typeName().equals(KeywordFieldMapper.CONTENT_TYPE)) {
throw new UnsupportedOperationException();
}
KeywordFieldMapper.KeywordFieldType kwd = (KeywordFieldMapper.KeywordFieldType) parent;
return new MapperTestCase.BlockReaderSupport(kwd.hasDocValues(), mapper, loaderFieldName);
}
public static Function<Object, Object> loadBlockExpected(MapperTestCase.BlockReaderSupport blockReaderSupport, boolean columnReader) {
if (nullLoaderExpected(blockReaderSupport.mapper(), blockReaderSupport.loaderFieldName())) {
return null;
}
return v -> ((BytesRef) v).utf8ToString();
}
private static boolean nullLoaderExpected(MapperService mapper, String fieldName) {
MappedFieldType type = mapper.fieldType(fieldName);
if (type instanceof TextFieldMapper.TextFieldType t) {
if (t.isSyntheticSource() == false || t.canUseSyntheticSourceDelegateForQuerying() || t.isStored()) {
return false;
}
String parentField = mapper.mappingLookup().parentField(fieldName);
return parentField == null || nullLoaderExpected(mapper, parentField);
}
return false;
}
public static void validateRoundTripReader(String syntheticSource, DirectoryReader reader, DirectoryReader roundTripReader) {
// `reader` here is the reader of the original document and `roundTripReader` reads the document
// created from synthetic source.
// This check fails when synthetic source is constructed using a keyword subfield,
// since in that case values are sorted (due to being read from doc values) while the original document's values are not.
//
// So it is disabled.
}
private static class TextFieldFamilySyntheticSourceSupport implements MapperTestCase.SyntheticSourceSupport {
private final String fieldType;
private final boolean storeTextField;
private final boolean storedKeywordField;
private final boolean indexText;
private final Integer ignoreAbove;
private final KeywordFieldSyntheticSourceSupport keywordSupport;
TextFieldFamilySyntheticSourceSupport(String fieldType, boolean supportsCustomIndexConfiguration) {
this.fieldType = fieldType;
this.storeTextField = randomBoolean();
this.storedKeywordField = storeTextField || randomBoolean();
this.indexText = supportsCustomIndexConfiguration ? randomBoolean() : true;
this.ignoreAbove = randomBoolean() ? null : between(10, 100);
this.keywordSupport = new KeywordFieldSyntheticSourceSupport(ignoreAbove, storedKeywordField, null, false == storeTextField);
}
@Override
public MapperTestCase.SyntheticSourceExample example(int maxValues) {
if (storeTextField) {
MapperTestCase.SyntheticSourceExample delegate = keywordSupport.example(maxValues, true);
return new MapperTestCase.SyntheticSourceExample(
delegate.inputValue(),
delegate.expectedForSyntheticSource(),
delegate.expectedForBlockLoader(),
b -> {
b.field("type", fieldType);
b.field("store", true);
if (indexText == false) {
b.field("index", false);
}
}
);
}
// We'll load from _source if ignore_above is defined, otherwise we load from the keyword field.
boolean loadingFromSource = ignoreAbove != null;
MapperTestCase.SyntheticSourceExample delegate = keywordSupport.example(maxValues, loadingFromSource);
return new MapperTestCase.SyntheticSourceExample(
delegate.inputValue(),
delegate.expectedForSyntheticSource(),
delegate.expectedForBlockLoader(),
b -> {
b.field("type", fieldType);
if (indexText == false) {
b.field("index", false);
}
b.startObject("fields");
{
b.startObject(randomAlphaOfLength(4));
delegate.mapping().accept(b);
b.endObject();
}
b.endObject();
}
);
}
@Override
public List<MapperTestCase.SyntheticSourceInvalidExample> invalidExample() throws IOException {
Matcher<String> err = equalTo(
String.format(
Locale.ROOT,
"field [field] of type [%s] doesn't support synthetic source unless it is stored or"
+ " has a sub-field of type [keyword] with doc values or stored and without a normalizer",
fieldType
)
);
return List.of(
new MapperTestCase.SyntheticSourceInvalidExample(err, b -> b.field("type", fieldType)),
new MapperTestCase.SyntheticSourceInvalidExample(err, b -> {
b.field("type", fieldType);
b.startObject("fields");
{
b.startObject("l");
b.field("type", "long");
b.endObject();
}
b.endObject();
}),
new MapperTestCase.SyntheticSourceInvalidExample(err, b -> {
b.field("type", fieldType);
b.startObject("fields");
{
b.startObject("kwd");
b.field("type", "keyword");
b.field("normalizer", "lowercase");
b.endObject();
}
b.endObject();
}),
new MapperTestCase.SyntheticSourceInvalidExample(err, b -> {
b.field("type", fieldType);
b.startObject("fields");
{
b.startObject("kwd");
b.field("type", "keyword");
b.field("doc_values", "false");
b.endObject();
}
b.endObject();
}),
new MapperTestCase.SyntheticSourceInvalidExample(err, b -> {
b.field("type", fieldType);
b.field("store", "false");
b.startObject("fields");
{
b.startObject("kwd");
b.field("type", "keyword");
b.field("doc_values", "false");
b.endObject();
}
b.endObject();
}),
new MapperTestCase.SyntheticSourceInvalidExample(err, b -> {
b.field("type", fieldType);
b.startObject("fields");
{
b.startObject("kwd");
b.field("type", "keyword");
b.field("doc_values", "false");
b.field("store", "false");
b.endObject();
}
b.endObject();
})
);
}
}
}
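The no-op `validateRoundTripReader` above exists because rebuilding a document from a keyword subfield's doc values sorts and deduplicates multi-valued fields, so the synthetic copy generally cannot be compared field-for-field with the original. A standalone illustration of that mismatch, with names of my own choosing rather than ES code:

```java
import java.util.List;
import java.util.TreeSet;

// Illustrative only: reading multi-valued keywords back from doc values yields
// a sorted, deduplicated list, which usually differs from the original order.
public class RoundTripOrdering {
    public static List<String> throughDocValues(List<String> original) {
        return List.copyOf(new TreeSet<>(original)); // TreeSet iterates in sorted order
    }
}
```

A round-trip comparison of `["b", "a", "b"]` against its doc-values reconstruction `["a", "b"]` fails, which is exactly why the check is disabled rather than asserted.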