mirror of https://github.com/elastic/elasticsearch.git
synced 2025-06-28 17:34:17 -04:00

Implement synthetic source support for annotated text field (#107735)

This PR adds synthetic source support for annotated_text fields. The existing implementation for text fields is reused, including test infrastructure, so the majority of the change is moving code and making it accessible. Contributes to #106460, #78744.

parent 4ef8b3825e
commit e1d902d33b

16 changed files with 824 additions and 300 deletions
docs/changelog/107735.yaml (Normal file, 5 lines)
@ -0,0 +1,5 @@
pr: 107735
summary: Implement synthetic source support for annotated text field
area: Mapping
type: feature
issues: []
@ -6,7 +6,7 @@ experimental[]
The mapper-annotated-text plugin provides the ability to index text that is a
combination of free-text and special markup that is typically used to identify
items of interest such as people or organisations (see NER or Named Entity Recognition
tools).

The elasticsearch markup allows one or more additional tokens to be injected, unchanged, into the token
@ -18,7 +18,7 @@ include::install_remove.asciidoc[]
[[mapper-annotated-text-usage]]
==== Using the `annotated-text` field

The `annotated-text` field tokenizes text content as per the more common {ref}/text.html[`text`] field (see
"limitations" below) but also injects any marked-up annotation tokens directly into
the search index:

@ -49,7 +49,7 @@ in the search index:
--------------------------
GET my-index-000001/_analyze
{
  "field": "my_field",
  "text": "Investors in [Apple](Apple+Inc.) rejoiced."
}
--------------------------
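The markup above is markdown-link-like: the square brackets hold the visible text and the parentheses hold a URL-encoded annotation value, which is why `Apple+Inc.` surfaces as the token `Apple Inc.` in the analysis below. As an illustration only (this is a sketch of the markup semantics, not the plugin's actual parser):

```python
import re
from urllib.parse import unquote_plus

# Markdown-link-like annotation: [visible text](url-encoded value)
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

def extract_annotations(text):
    """Return (plain_text, [(visible_text, decoded_value), ...])."""
    pairs = []

    def strip(match):
        visible, raw = match.group(1), match.group(2)
        pairs.append((visible, unquote_plus(raw)))  # '+' decodes to a space
        return visible  # keep only the visible text in the plain output

    return ANNOTATION.sub(strip, text), pairs

plain, anns = extract_annotations("Investors in [Apple](Apple+Inc.) rejoiced.")
print(plain)  # Investors in Apple rejoiced.
print(anns)   # [('Apple', 'Apple Inc.')]
```

The decoded value is what gets injected, untokenized, into the token stream alongside the tokens from the visible text.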
@ -76,7 +76,7 @@ Response:
      "position": 1
    },
    {
      "token": "Apple Inc.", <1>
      "start_offset": 13,
      "end_offset": 18,
      "type": "annotation",
@ -106,7 +106,7 @@ the token stream and at the same position (position 2) as the text token (`apple


We can now perform searches for annotations using regular `term` queries that don't tokenize
the provided search values. Annotations are a more precise way of matching as can be seen
in this example where a search for `Beck` will not match `Jeff Beck`:

[source,console]
@ -133,18 +133,119 @@ GET my-index-000001/_search
}
--------------------------

<1> As well as tokenising the plain text into single words e.g. `beck`, here we
inject the single token value `Beck` at the same position as `beck` in the token stream.
<2> Note annotations can inject multiple tokens at the same position - here we inject both
the very specific value `Jeff Beck` and the broader term `Guitarist`. This enables
broader positional queries e.g. finding mentions of a `Guitarist` near to `strat`.
<3> A benefit of searching with these carefully defined annotation tokens is that a query for
`Beck` will not match document 2 that contains the tokens `jeff`, `beck` and `Jeff Beck`.

WARNING: Any use of `=` signs in annotation values e.g. `[Prince](person=Prince)` will
cause the document to be rejected with a parse failure. In future we hope to have a use for
the equals signs so will actively reject documents that contain this today.

+[[annotated-text-synthetic-source]]
+===== Synthetic `_source`
+
+IMPORTANT: Synthetic `_source` is Generally Available only for TSDB indices
+(indices that have `index.mode` set to `time_series`). For other indices
+synthetic `_source` is in technical preview. Features in technical preview may
+be changed or removed in a future release. Elastic will work to fix
+any issues, but features in technical preview are not subject to the support SLA
+of official GA features.
+
+`annotated_text` fields support {ref}/mapping-source-field.html#synthetic-source[synthetic `_source`] if they have
+a {ref}/keyword.html#keyword-synthetic-source[`keyword`] sub-field that supports synthetic
+`_source` or if the `annotated_text` field sets `store` to `true`. Either way, it may
+not have {ref}/copy-to.html[`copy_to`].
+
+If using a sub-`keyword` field then the values are sorted in the same way as
+a `keyword` field's values are sorted. By default, that means sorted with
+duplicates removed. So:
+[source,console,id=synthetic-source-text-example-default]
+----
+PUT idx
+{
+  "mappings": {
+    "_source": { "mode": "synthetic" },
+    "properties": {
+      "text": {
+        "type": "annotated_text",
+        "fields": {
+          "raw": {
+            "type": "keyword"
+          }
+        }
+      }
+    }
+  }
+}
+PUT idx/_doc/1
+{
+  "text": [
+    "the quick brown fox",
+    "the quick brown fox",
+    "jumped over the lazy dog"
+  ]
+}
+----
+// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
+
+Will become:
+[source,console-result]
+----
+{
+  "text": [
+    "jumped over the lazy dog",
+    "the quick brown fox"
+  ]
+}
+----
+// TEST[s/^/{"_source":/ s/\n$/}/]
+
+NOTE: Reordering text fields can have an effect on {ref}/query-dsl-match-query-phrase.html[phrase]
+and {ref}/span-queries.html[span] queries. See the discussion about {ref}/position-increment-gap.html[`position_increment_gap`] for more detail. You
+can avoid this by making sure the `slop` parameter on the phrase queries
+is lower than the `position_increment_gap`. This is the default.
+
+If the `annotated_text` field sets `store` to true then order and duplicates
+are preserved.
+[source,console,id=synthetic-source-text-example-stored]
+----
+PUT idx
+{
+  "mappings": {
+    "_source": { "mode": "synthetic" },
+    "properties": {
+      "text": { "type": "annotated_text", "store": true }
+    }
+  }
+}
+PUT idx/_doc/1
+{
+  "text": [
+    "the quick brown fox",
+    "the quick brown fox",
+    "jumped over the lazy dog"
+  ]
+}
+----
+// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
+
+Will become:
+[source,console-result]
+----
+{
+  "text": [
+    "the quick brown fox",
+    "the quick brown fox",
+    "jumped over the lazy dog"
+  ]
+}
+----
+// TEST[s/^/{"_source":/ s/\n$/}/]
+

[[mapper-annotated-text-tips]]
==== Data modelling tips
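The reconstruction behaviour shown above, when a `keyword` sub-field backs synthetic `_source`, can be modelled in one line. A sketch for illustration, not the actual loader:

```python
def synthesize_from_keyword(values):
    # keyword doc values are kept sorted with duplicates removed, so a
    # synthetic _source rebuilt from them loses order and duplicates
    return sorted(set(values))

doc = ["the quick brown fox", "the quick brown fox", "jumped over the lazy dog"]
print(synthesize_from_keyword(doc))
# ['jumped over the lazy dog', 'the quick brown fox']
```

This is why the stored-field variant exists: when the exact order and duplicates of the original values matter, set `store` to `true` instead of relying on the sub-field.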
@ -153,13 +254,13 @@ the equals signs so will actively reject documents that contain this today.
Annotations are normally a way of weaving structured information into unstructured text for
higher-precision search.

`Entity resolution` is a form of document enrichment undertaken by specialist software or people
where references to entities in a document are disambiguated by attaching a canonical ID.
The ID is used to resolve any number of aliases or distinguish between people with the
same name. The hyperlinks connecting Wikipedia's articles are a good example of resolved
entity IDs woven into text.

These IDs can be embedded as annotations in an annotated_text field but it often makes
sense to include them in dedicated structured fields to support discovery via aggregations:

[source,console]
@ -214,20 +315,20 @@ GET my-index-000001/_search
--------------------------

<1> Note the `my_twitter_handles` field contains a list of the annotation values
also used in the unstructured text. (Note the annotated_text syntax requires escaping.)
By repeating the annotation values in a structured field this application has ensured that
the tokens discovered in the structured field can be used for search and highlighting
in the unstructured field.
<2> In this example we search for documents that talk about components of the elastic stack.
<3> We use the `my_twitter_handles` field here to discover people who are significantly
associated with the elastic stack.

===== Avoiding over-matching annotations
By design, the regular text tokens and the annotation tokens co-exist in the same indexed
field but in rare cases this can lead to some over-matching.

The value of an annotation often denotes a _named entity_ (a person, place or company).
The tokens for these named entities are inserted untokenized, and differ from typical text
tokens because they are normally:

* Mixed case e.g. `Madonna`
@ -235,19 +336,19 @@ tokens because they are normally:
* Can have punctuation or numbers e.g. `Apple Inc.` or `@kimchy`

This means, for the most part, a search for a named entity in the annotated text field will
not have any false positives e.g. when selecting `Apple Inc.` from an aggregation result
you can drill down to highlight uses in the text without "over matching" on any text tokens
like the word `apple` in this context:

    the apple was very juicy

However, a problem arises if your named entity happens to be a single term and lower-case e.g. the
company `elastic`. In this case, a search on the annotated text field for the token `elastic`
may match a text document such as this:

    they fired an elastic band

To avoid such false matches users should consider prefixing annotation values to ensure
they don't clash with text tokens e.g.

    [elastic](Company_elastic) released version 7.0 of the elastic stack today
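A trivial helper makes the prefixing convention concrete. The `Company_` prefix is just the example used above, not something the plugin mandates, and the `=` check mirrors the parse failure described in the earlier WARNING:

```python
def annotate(text, value):
    # The plugin rejects '=' in annotation values with a parse failure,
    # so fail fast here too
    if "=" in value:
        raise ValueError("annotation values may not contain '='")
    return f"[{text}]({value})"

def company(name):
    # Prefixing keeps the annotation token distinct from any lower-cased
    # text tokens the analyzer produces
    return annotate(name, f"Company_{name}")

print(company("elastic"))  # [elastic](Company_elastic)
```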
@ -273,7 +374,7 @@ GET my-index-000001/_search
{
  "query": {
    "query_string": {
      "query": "cats"
    }
  },
  "highlight": {
@ -291,21 +392,21 @@ GET my-index-000001/_search

The annotated highlighter is based on the `unified` highlighter and supports the same
settings but does not use the `pre_tags` or `post_tags` parameters. Rather than using
html-like markup such as `<em>cat</em>` the annotated highlighter uses the same
markdown-like syntax used for annotations and injects a key=value annotation where `_hit_term`
is the key and the matched search term is the value e.g.

    The [cat](_hit_term=cat) sat on the [mat](sku3578)

The annotated highlighter tries to be respectful of any existing markup in the original
text:

* If the search term matches exactly the location of an existing annotation then the
`_hit_term` key is merged into the url-like syntax used in the `(...)` part of the
existing annotation.
* However, if the search term overlaps the span of an existing annotation it would break
the markup formatting so the original annotation is removed in favour of a new annotation
with just the search hit information in the results.
* Any non-overlapping annotations in the original text are preserved in highlighter
selections.

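The merge rule in the first bullet can be pictured as appending one more key=value entry to an existing annotation's `(...)` parameters. The `&` separator below is an assumption for illustration, not taken from the plugin documentation above:

```python
def merge_hit_term(params, term):
    # Fold the highlighter's _hit_term entry into an existing annotation's
    # url-like parameter list ('&' separator assumed, not documented above)
    hit = f"_hit_term={term}"
    return f"{params}&{hit}" if params else hit

print(merge_hit_term("sku3578", "mat"))  # sku3578&_hit_term=mat
print(merge_hit_term("", "cat"))         # _hit_term=cat
```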
@ -41,6 +41,7 @@ There are a couple of restrictions to be aware of:
types:

** <<aggregate-metric-double-synthetic-source, `aggregate_metric_double`>>
+** {plugins}/mapper-annotated-text-usage.html#annotated-text-synthetic-source[`annotated-text`]
** <<binary-synthetic-source,`binary`>>
** <<boolean-synthetic-source,`boolean`>>
** <<numeric-synthetic-source,`byte`>>

plugins/mapper-annotated-text/src/main/java/module-info.java (Normal file, 19 lines)
@ -0,0 +1,19 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0 and the Server Side Public License, v 1; you may not use this file except
 * in compliance with, at your election, the Elastic License 2.0 or the Server
 * Side Public License, v 1.
 */

module org.elasticsearch.index.mapper.annotatedtext {
    requires org.elasticsearch.base;
    requires org.elasticsearch.server;
    requires org.elasticsearch.xcontent;
    requires org.apache.lucene.core;
    requires org.apache.lucene.highlighter;

    // exports nothing

    provides org.elasticsearch.features.FeatureSpecification with org.elasticsearch.index.mapper.annotatedtext.Features;
}
@ -21,17 +21,22 @@ import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.elasticsearch.ElasticsearchParseException;
+import org.elasticsearch.features.NodeFeature;
import org.elasticsearch.index.IndexVersion;
import org.elasticsearch.index.analysis.AnalyzerScope;
import org.elasticsearch.index.analysis.IndexAnalyzers;
import org.elasticsearch.index.analysis.NamedAnalyzer;
import org.elasticsearch.index.mapper.DocumentParserContext;
import org.elasticsearch.index.mapper.FieldMapper;
+import org.elasticsearch.index.mapper.KeywordFieldMapper;
import org.elasticsearch.index.mapper.MapperBuilderContext;
+import org.elasticsearch.index.mapper.SourceLoader;
+import org.elasticsearch.index.mapper.StringStoredFieldFieldLoader;
import org.elasticsearch.index.mapper.TextFieldMapper;
import org.elasticsearch.index.mapper.TextParams;
import org.elasticsearch.index.mapper.TextSearchInfo;
import org.elasticsearch.index.similarity.SimilarityProvider;
+import org.elasticsearch.xcontent.XContentBuilder;

import java.io.IOException;
import java.io.Reader;
@ -41,6 +46,7 @@ import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
+import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
@ -58,6 +64,8 @@ import java.util.regex.Pattern;
 **/
public class AnnotatedTextFieldMapper extends FieldMapper {

+    public static final NodeFeature SYNTHETIC_SOURCE_SUPPORT = new NodeFeature("mapper.annotated_text.synthetic_source");
+
    public static final String CONTENT_TYPE = "annotated_text";

    private static Builder builder(FieldMapper in) {
@ -114,7 +122,7 @@ public class AnnotatedTextFieldMapper extends FieldMapper {
            meta };
        }

-        private AnnotatedTextFieldType buildFieldType(FieldType fieldType, MapperBuilderContext context) {
+        private AnnotatedTextFieldType buildFieldType(FieldType fieldType, MapperBuilderContext context, MultiFields multiFields) {
            TextSearchInfo tsi = new TextSearchInfo(
                fieldType,
                similarity.get(),
@ -126,12 +134,14 @@ public class AnnotatedTextFieldMapper extends FieldMapper {
                store.getValue(),
                tsi,
                context.isSourceSynthetic(),
+               TextFieldMapper.SyntheticSourceHelper.syntheticSourceDelegate(fieldType, multiFields),
                meta.getValue()
            );
        }

        @Override
        public AnnotatedTextFieldMapper build(MapperBuilderContext context) {
+           MultiFields multiFields = multiFieldsBuilder.build(this, context);
            FieldType fieldType = TextParams.buildFieldType(() -> true, store, indexOptions, norms, termVectors);
            if (fieldType.indexOptions() == IndexOptions.NONE) {
                throw new IllegalArgumentException("[" + CONTENT_TYPE + "] fields must be indexed");
@ -146,8 +156,8 @@ public class AnnotatedTextFieldMapper extends FieldMapper {
            return new AnnotatedTextFieldMapper(
                name(),
                fieldType,
-               buildFieldType(fieldType, context),
-               multiFieldsBuilder.build(this, context),
+               buildFieldType(fieldType, context, multiFields),
+               multiFields,
                copyTo,
                this
            );
@ -472,15 +482,15 @@ public class AnnotatedTextFieldMapper extends FieldMapper {
    }

    public static final class AnnotatedTextFieldType extends TextFieldMapper.TextFieldType {

        private AnnotatedTextFieldType(
            String name,
            boolean store,
            TextSearchInfo tsi,
            boolean isSyntheticSource,
+           KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate,
            Map<String, String> meta
        ) {
-           super(name, true, store, tsi, isSyntheticSource, null, meta, false, false);
+           super(name, true, store, tsi, isSyntheticSource, syntheticSourceDelegate, meta, false, false);
        }

        public AnnotatedTextFieldType(String name, Map<String, String> meta) {
@ -544,4 +554,36 @@ public class AnnotatedTextFieldMapper extends FieldMapper {
    public FieldMapper.Builder getMergeBuilder() {
        return new Builder(simpleName(), builder.indexCreatedVersion, builder.analyzers.indexAnalyzers).init(this);
    }

+    @Override
+    public SourceLoader.SyntheticFieldLoader syntheticFieldLoader() {
+        if (copyTo.copyToFields().isEmpty() != true) {
+            throw new IllegalArgumentException(
+                "field [" + name() + "] of type [" + typeName() + "] doesn't support synthetic source because it declares copy_to"
+            );
+        }
+        if (fieldType.stored()) {
+            return new StringStoredFieldFieldLoader(name(), simpleName(), null) {
+                @Override
+                protected void write(XContentBuilder b, Object value) throws IOException {
+                    b.value((String) value);
+                }
+            };
+        }
+
+        var kwd = TextFieldMapper.SyntheticSourceHelper.getKeywordFieldMapperForSyntheticSource(this);
+        if (kwd != null) {
+            return kwd.syntheticFieldLoader(simpleName());
+        }
+
+        throw new IllegalArgumentException(
+            String.format(
+                Locale.ROOT,
+                "field [%s] of type [%s] doesn't support synthetic source unless it is stored or has a sub-field of"
+                    + " type [keyword] with doc values or stored and without a normalizer",
+                name(),
+                typeName()
+            )
+        );
+    }
}
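The selection order in `syntheticFieldLoader()` above is: reject `copy_to`, prefer the stored field, fall back to a qualifying `keyword` sub-field, otherwise fail. A Python paraphrase of that control flow (names here are illustrative, not the Java API):

```python
def pick_loader(copy_to, stored, keyword_delegate):
    # Mirrors the fallback chain in syntheticFieldLoader() (sketch only)
    if copy_to:
        raise ValueError("synthetic source doesn't support copy_to")
    if stored:
        return "stored-field-loader"      # read values back from the stored field
    if keyword_delegate is not None:
        return "keyword-delegate-loader"  # reuse the sub-field's doc values
    raise ValueError("store the field or add a keyword sub-field "
                     "with doc values and no normalizer")

print(pick_loader(copy_to=[], stored=True, keyword_delegate=None))
# stored-field-loader
```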
@ -0,0 +1,26 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0 and the Server Side Public License, v 1; you may not use this file except
 * in compliance with, at your election, the Elastic License 2.0 or the Server
 * Side Public License, v 1.
 */

package org.elasticsearch.index.mapper.annotatedtext;

import org.elasticsearch.features.FeatureSpecification;
import org.elasticsearch.features.NodeFeature;

import java.util.Set;

/**
 * Provides features for annotated text mapper.
 */
public class Features implements FeatureSpecification {
    @Override
    public Set<NodeFeature> getFeatures() {
        return Set.of(
            AnnotatedTextFieldMapper.SYNTHETIC_SOURCE_SUPPORT // Added in 8.15
        );
    }
}
@ -0,0 +1,9 @@
#
# Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
# or more contributor license agreements. Licensed under the Elastic License
# 2.0 and the Server Side Public License, v 1; you may not use this file except
# in compliance with, at your election, the Elastic License 2.0 or the Server
# Side Public License, v 1.
#

org.elasticsearch.index.mapper.annotatedtext.Features
@ -14,6 +14,7 @@ import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
+import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.DocValuesType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexableField;
@ -29,6 +30,7 @@ import org.elasticsearch.index.analysis.AnalyzerScope;
import org.elasticsearch.index.analysis.CharFilterFactory;
import org.elasticsearch.index.analysis.CustomAnalyzer;
import org.elasticsearch.index.analysis.IndexAnalyzers;
+import org.elasticsearch.index.analysis.LowercaseNormalizer;
import org.elasticsearch.index.analysis.NamedAnalyzer;
import org.elasticsearch.index.analysis.StandardTokenizerFactory;
import org.elasticsearch.index.analysis.TokenFilterFactory;
@ -38,6 +40,7 @@ import org.elasticsearch.index.mapper.MapperParsingException;
import org.elasticsearch.index.mapper.MapperService;
import org.elasticsearch.index.mapper.MapperTestCase;
import org.elasticsearch.index.mapper.ParsedDocument;
+import org.elasticsearch.index.mapper.TextFieldFamilySyntheticSourceTestSetup;
import org.elasticsearch.index.mapper.TextFieldMapper;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.xcontent.ToXContent;
@ -54,6 +57,7 @@ import java.util.HashSet;
|
||||||
import java.util.List;
|
import java.util.List;
|
||||||
import java.util.Map;
|
import java.util.Map;
|
||||||
import java.util.Set;
|
import java.util.Set;
|
||||||
|
import java.util.function.Function;
|
||||||
|
|
||||||
import static org.hamcrest.Matchers.containsString;
|
import static org.hamcrest.Matchers.containsString;
|
||||||
import static org.hamcrest.Matchers.equalTo;
|
import static org.hamcrest.Matchers.equalTo;
|
||||||
|
@ -144,7 +148,8 @@ public class AnnotatedTextFieldMapperTests extends MapperTestCase {
|
||||||
)
|
)
|
||||||
);
|
);
|
||||||
return IndexAnalyzers.of(
|
return IndexAnalyzers.of(
|
||||||
Map.of("default", dflt, "standard", standard, "keyword", keyword, "whitespace", whitespace, "my_stop_analyzer", stop)
|
Map.of("default", dflt, "standard", standard, "keyword", keyword, "whitespace", whitespace, "my_stop_analyzer", stop),
|
||||||
|
Map.of("lowercase", new NamedAnalyzer("lowercase", AnalyzerScope.INDEX, new LowercaseNormalizer()))
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -595,7 +600,23 @@ public class AnnotatedTextFieldMapperTests extends MapperTestCase {
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
protected SyntheticSourceSupport syntheticSourceSupport(boolean ignoreMalformed) {
|
protected SyntheticSourceSupport syntheticSourceSupport(boolean ignoreMalformed) {
|
||||||
throw new AssumptionViolatedException("not supported");
|
assumeFalse("ignore_malformed not supported", ignoreMalformed);
|
||||||
|
return TextFieldFamilySyntheticSourceTestSetup.syntheticSourceSupport("annotated_text", false);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
protected BlockReaderSupport getSupportedReaders(MapperService mapper, String loaderFieldName) {
|
||||||
|
return TextFieldFamilySyntheticSourceTestSetup.getSupportedReaders(mapper, loaderFieldName);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
protected Function<Object, Object> loadBlockExpected(BlockReaderSupport blockReaderSupport, boolean columnReader) {
|
||||||
|
return TextFieldFamilySyntheticSourceTestSetup.loadBlockExpected(blockReaderSupport, columnReader);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
protected void validateRoundTripReader(String syntheticSource, DirectoryReader reader, DirectoryReader roundTripReader) {
|
||||||
|
TextFieldFamilySyntheticSourceTestSetup.validateRoundTripReader(syntheticSource, reader, roundTripReader);
|
||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
|
|
|
@@ -0,0 +1,197 @@
+---
+setup:
+  - requires:
+      cluster_features: ["mapper.annotated_text.synthetic_source"]
+      reason: introduced in 8.15.0
+
+---
+stored annotated_text field:
+  - do:
+      indices.create:
+        index: test
+        body:
+          mappings:
+            _source:
+              mode: synthetic
+            properties:
+              annotated_text:
+                type: annotated_text
+                store: true
+
+  - do:
+      index:
+        index: test
+        id: 1
+        refresh: true
+        body:
+          annotated_text: the quick brown fox
+
+  - do:
+      search:
+        index: test
+
+  - match:
+      hits.hits.0._source:
+        annotated_text: the quick brown fox
+
+---
+annotated_text field with keyword multi-field:
+  - do:
+      indices.create:
+        index: test
+        body:
+          mappings:
+            _source:
+              mode: synthetic
+            properties:
+              annotated_text:
+                type: annotated_text
+                fields:
+                  keyword:
+                    type: keyword
+
+  - do:
+      index:
+        index: test
+        id: 1
+        refresh: true
+        body:
+          annotated_text: the quick brown fox
+
+  - do:
+      search:
+        index: test
+
+  - match:
+      hits.hits.0._source:
+        annotated_text: the quick brown fox
+
+---
+multiple values in stored annotated_text field:
+  - do:
+      indices.create:
+        index: test
+        body:
+          mappings:
+            _source:
+              mode: synthetic
+            properties:
+              annotated_text:
+                type: annotated_text
+                store: true
+
+  - do:
+      index:
+        index: test
+        id: 1
+        refresh: true
+        body:
+          annotated_text: ["world", "hello", "world"]
+
+  - do:
+      search:
+        index: test
+
+  - match:
+      hits.hits.0._source:
+        annotated_text: ["world", "hello", "world"]
+
+---
+multiple values in annotated_text field with keyword multi-field:
+  - do:
+      indices.create:
+        index: test
+        body:
+          mappings:
+            _source:
+              mode: synthetic
+            properties:
+              annotated_text:
+                type: annotated_text
+                fields:
+                  keyword:
+                    type: keyword
+
+  - do:
+      index:
+        index: test
+        id: 1
+        refresh: true
+        body:
+          annotated_text: ["world", "hello", "world"]
+
+  - do:
+      search:
+        index: test
+
+  - match:
+      hits.hits.0._source:
+        annotated_text: ["hello", "world"]
+
+
+---
+multiple values in annotated_text field with stored keyword multi-field:
+  - do:
+      indices.create:
+        index: test
+        body:
+          mappings:
+            _source:
+              mode: synthetic
+            properties:
+              annotated_text:
+                type: annotated_text
+                fields:
+                  keyword:
+                    type: keyword
+                    store: true
+                    doc_values: false
+
+  - do:
+      index:
+        index: test
+        id: 1
+        refresh: true
+        body:
+          annotated_text: ["world", "hello", "world"]
+
+  - do:
+      search:
+        index: test
+
+  - match:
+      hits.hits.0._source:
+        annotated_text: ["world", "hello", "world"]
+
+---
+multiple values in stored annotated_text field with keyword multi-field:
+  - do:
+      indices.create:
+        index: test
+        body:
+          mappings:
+            _source:
+              mode: synthetic
+            properties:
+              annotated_text:
+                type: annotated_text
+                store: true
+                fields:
+                  keyword:
+                    type: keyword
+
+  - do:
+      index:
+        index: test
+        id: 1
+        refresh: true
+        body:
+          annotated_text: ["world", "hello", "world"]
+
+  - do:
+      search:
+        index: test
+
+  - match:
+      hits.hits.0._source:
+        annotated_text: ["world", "hello", "world"]
@@ -1026,7 +1026,7 @@ public final class KeywordFieldMapper extends FieldMapper {
         return syntheticFieldLoader(simpleName());
     }
 
-    SourceLoader.SyntheticFieldLoader syntheticFieldLoader(String simpleName) {
+    public SourceLoader.SyntheticFieldLoader syntheticFieldLoader(String simpleName) {
         if (hasScript()) {
             return SourceLoader.SyntheticFieldLoader.NOTHING;
         }
@@ -390,7 +390,7 @@ public final class TextFieldMapper extends FieldMapper {
                 store.getValue(),
                 tsi,
                 context.isSourceSynthetic(),
-                syntheticSourceDelegate(fieldType, multiFields),
+                SyntheticSourceHelper.syntheticSourceDelegate(fieldType, multiFields),
                 meta.getValue(),
                 eagerGlobalOrdinals.getValue(),
                 indexPhrases.getValue()
@@ -402,17 +402,6 @@ public final class TextFieldMapper extends FieldMapper {
             return ft;
         }
 
-        private static KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate(FieldType fieldType, MultiFields multiFields) {
-            if (fieldType.stored()) {
-                return null;
-            }
-            var kwd = getKeywordFieldMapperForSyntheticSource(multiFields);
-            if (kwd != null) {
-                return kwd.fieldType();
-            }
-            return null;
-        }
-
         private SubFieldInfo buildPrefixInfo(MapperBuilderContext context, FieldType fieldType, TextFieldType tft) {
             if (indexPrefixes.get() == null) {
                 return null;
@@ -1094,7 +1083,7 @@ public final class TextFieldMapper extends FieldMapper {
             return isSyntheticSource;
         }
 
-        KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate() {
+        public KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate() {
             return syntheticSourceDelegate;
         }
     }
@@ -1473,7 +1462,7 @@ public final class TextFieldMapper extends FieldMapper {
            };
        }
 
-        var kwd = getKeywordFieldMapperForSyntheticSource(this);
+        var kwd = SyntheticSourceHelper.getKeywordFieldMapperForSyntheticSource(this);
        if (kwd != null) {
            return kwd.syntheticFieldLoader(simpleName());
        }
@@ -1489,16 +1478,29 @@ public final class TextFieldMapper extends FieldMapper {
        );
    }
 
-    private static KeywordFieldMapper getKeywordFieldMapperForSyntheticSource(Iterable<? extends Mapper> multiFields) {
-        for (Mapper sub : multiFields) {
-            if (sub.typeName().equals(KeywordFieldMapper.CONTENT_TYPE)) {
-                KeywordFieldMapper kwd = (KeywordFieldMapper) sub;
-                if (kwd.hasNormalizer() == false && (kwd.fieldType().hasDocValues() || kwd.fieldType().isStored())) {
-                    return kwd;
-                }
-            }
-        }
-
-        return null;
+    public static class SyntheticSourceHelper {
+        public static KeywordFieldMapper.KeywordFieldType syntheticSourceDelegate(FieldType fieldType, MultiFields multiFields) {
+            if (fieldType.stored()) {
+                return null;
+            }
+            var kwd = getKeywordFieldMapperForSyntheticSource(multiFields);
+            if (kwd != null) {
+                return kwd.fieldType();
+            }
+            return null;
+        }
+
+        public static KeywordFieldMapper getKeywordFieldMapperForSyntheticSource(Iterable<? extends Mapper> multiFields) {
+            for (Mapper sub : multiFields) {
+                if (sub.typeName().equals(KeywordFieldMapper.CONTENT_TYPE)) {
+                    KeywordFieldMapper kwd = (KeywordFieldMapper) sub;
+                    if (kwd.hasNormalizer() == false && (kwd.fieldType().hasDocValues() || kwd.fieldType().isStored())) {
+                        return kwd;
+                    }
+                }
+            }
+
+            return null;
+        }
    }
 }
@@ -25,7 +25,6 @@ import org.elasticsearch.cluster.metadata.IndexMetadata;
 import org.elasticsearch.common.Strings;
 import org.elasticsearch.common.lucene.Lucene;
 import org.elasticsearch.common.settings.Settings;
-import org.elasticsearch.core.Tuple;
 import org.elasticsearch.index.IndexSettings;
 import org.elasticsearch.index.IndexVersion;
 import org.elasticsearch.index.analysis.AnalyzerScope;
@@ -45,14 +44,11 @@ import org.elasticsearch.script.StringFieldScript;
 import org.elasticsearch.xcontent.XContentBuilder;
 
 import java.io.IOException;
-import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collection;
-import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.function.Function;
-import java.util.stream.Collectors;
 
 import static java.util.Collections.singletonList;
 import static java.util.Collections.singletonMap;
@@ -658,7 +654,7 @@ public class KeywordFieldMapperTests extends MapperTestCase {
     @Override
     protected SyntheticSourceSupport syntheticSourceSupport(boolean ignoreMalformed) {
         assertFalse("keyword doesn't support ignore_malformed", ignoreMalformed);
-        return new KeywordSyntheticSourceSupport(
+        return new KeywordFieldSyntheticSourceSupport(
             randomBoolean() ? null : between(10, 100),
             randomBoolean(),
             usually() ? null : randomAlphaOfLength(2),
@@ -666,110 +662,6 @@ public class KeywordFieldMapperTests extends MapperTestCase {
         );
     }
 
-    static class KeywordSyntheticSourceSupport implements SyntheticSourceSupport {
-        private final Integer ignoreAbove;
-        private final boolean allIgnored;
-        private final boolean store;
-        private final boolean docValues;
-        private final String nullValue;
-        private final boolean exampleSortsUsingIgnoreAbove;
-
-        KeywordSyntheticSourceSupport(Integer ignoreAbove, boolean store, String nullValue, boolean exampleSortsUsingIgnoreAbove) {
-            this.ignoreAbove = ignoreAbove;
-            this.allIgnored = ignoreAbove != null && rarely();
-            this.store = store;
-            this.nullValue = nullValue;
-            this.exampleSortsUsingIgnoreAbove = exampleSortsUsingIgnoreAbove;
-            this.docValues = store ? randomBoolean() : true;
-        }
-
-        @Override
-        public SyntheticSourceExample example(int maxValues) {
-            return example(maxValues, false);
-        }
-
-        public SyntheticSourceExample example(int maxValues, boolean loadBlockFromSource) {
-            if (randomBoolean()) {
-                Tuple<String, String> v = generateValue();
-                Object loadBlock = v.v2();
-                if (loadBlockFromSource == false && ignoreAbove != null && v.v2().length() > ignoreAbove) {
-                    loadBlock = null;
-                }
-                return new SyntheticSourceExample(v.v1(), v.v2(), loadBlock, this::mapping);
-            }
-            List<Tuple<String, String>> values = randomList(1, maxValues, this::generateValue);
-            List<String> in = values.stream().map(Tuple::v1).toList();
-            List<String> outPrimary = new ArrayList<>();
-            List<String> outExtraValues = new ArrayList<>();
-            values.stream().map(Tuple::v2).forEach(v -> {
-                if (exampleSortsUsingIgnoreAbove && ignoreAbove != null && v.length() > ignoreAbove) {
-                    outExtraValues.add(v);
-                } else {
-                    outPrimary.add(v);
-                }
-            });
-            List<String> outList = store ? outPrimary : new HashSet<>(outPrimary).stream().sorted().collect(Collectors.toList());
-            List<String> loadBlock;
-            if (loadBlockFromSource) {
-                // The block loader infrastructure will never return nulls. Just zap them all.
-                loadBlock = in.stream().filter(m -> m != null).toList();
-            } else if (docValues) {
-                loadBlock = new HashSet<>(outPrimary).stream().sorted().collect(Collectors.toList());
-            } else {
-                loadBlock = List.copyOf(outList);
-            }
-            Object loadBlockResult = loadBlock.size() == 1 ? loadBlock.get(0) : loadBlock;
-            outList.addAll(outExtraValues);
-            Object out = outList.size() == 1 ? outList.get(0) : outList;
-            return new SyntheticSourceExample(in, out, loadBlockResult, this::mapping);
-        }
-
-        private Tuple<String, String> generateValue() {
-            if (nullValue != null && randomBoolean()) {
-                return Tuple.tuple(null, nullValue);
-            }
-            int length = 5;
-            if (ignoreAbove != null && (allIgnored || randomBoolean())) {
-                length = ignoreAbove + 5;
-            }
-            String v = randomAlphaOfLength(length);
-            return Tuple.tuple(v, v);
-        }
-
-        private void mapping(XContentBuilder b) throws IOException {
-            b.field("type", "keyword");
-            if (nullValue != null) {
-                b.field("null_value", nullValue);
-            }
-            if (ignoreAbove != null) {
-                b.field("ignore_above", ignoreAbove);
-            }
-            if (store) {
-                b.field("store", true);
-            }
-            if (docValues == false) {
-                b.field("doc_values", false);
-            }
-        }
-
-        @Override
-        public List<SyntheticSourceInvalidExample> invalidExample() throws IOException {
-            return List.of(
-                new SyntheticSourceInvalidExample(
-                    equalTo(
-                        "field [field] of type [keyword] doesn't support synthetic source because "
-                            + "it doesn't have doc values and isn't stored"
-                    ),
-                    b -> b.field("type", "keyword").field("doc_values", false)
-                ),
-                new SyntheticSourceInvalidExample(
-                    equalTo("field [field] of type [keyword] doesn't support synthetic source because it declares a normalizer"),
-                    b -> b.field("type", "keyword").field("normalizer", "lowercase")
-                )
-            );
-        }
-    }
-
     @Override
     protected IngestScriptSupport ingestScriptSupport() {
         return new IngestScriptSupport() {
@@ -75,7 +75,6 @@ import org.elasticsearch.search.lookup.SourceProvider;
 import org.elasticsearch.xcontent.ToXContent;
 import org.elasticsearch.xcontent.XContentBuilder;
 import org.elasticsearch.xcontent.XContentFactory;
-import org.hamcrest.Matcher;
 import org.junit.AssumptionViolatedException;
 
 import java.io.IOException;
@@ -1178,120 +1177,12 @@ public class TextFieldMapperTests extends MapperTestCase {
     @Override
     protected SyntheticSourceSupport syntheticSourceSupport(boolean ignoreMalformed) {
         assumeFalse("ignore_malformed not supported", ignoreMalformed);
-        boolean storeTextField = randomBoolean();
-        boolean storedKeywordField = storeTextField || randomBoolean();
-        boolean indexText = randomBoolean();
-        Integer ignoreAbove = randomBoolean() ? null : between(10, 100);
-        KeywordFieldMapperTests.KeywordSyntheticSourceSupport keywordSupport = new KeywordFieldMapperTests.KeywordSyntheticSourceSupport(
-            ignoreAbove,
-            storedKeywordField,
-            null,
-            false == storeTextField
-        );
-        return new SyntheticSourceSupport() {
-            @Override
-            public SyntheticSourceExample example(int maxValues) {
-                if (storeTextField) {
-                    SyntheticSourceExample delegate = keywordSupport.example(maxValues, true);
-                    return new SyntheticSourceExample(
-                        delegate.inputValue(),
-                        delegate.expectedForSyntheticSource(),
-                        delegate.expectedForBlockLoader(),
-                        b -> {
-                            b.field("type", "text");
-                            b.field("store", true);
-                            if (indexText == false) {
-                                b.field("index", false);
-                            }
-                        }
-                    );
-                }
-                // We'll load from _source if ignore_above is defined, otherwise we load from the keyword field.
-                boolean loadingFromSource = ignoreAbove != null;
-                SyntheticSourceExample delegate = keywordSupport.example(maxValues, loadingFromSource);
-                return new SyntheticSourceExample(
-                    delegate.inputValue(),
-                    delegate.expectedForSyntheticSource(),
-                    delegate.expectedForBlockLoader(),
-                    b -> {
-                        b.field("type", "text");
-                        if (indexText == false) {
-                            b.field("index", false);
-                        }
-                        b.startObject("fields");
-                        {
-                            b.startObject(randomAlphaOfLength(4));
-                            delegate.mapping().accept(b);
-                            b.endObject();
-                        }
-                        b.endObject();
-                    }
-                );
-            }
-
-            @Override
-            public List<SyntheticSourceInvalidExample> invalidExample() throws IOException {
-                Matcher<String> err = equalTo(
-                    "field [field] of type [text] doesn't support synthetic source unless it is stored or"
-                        + " has a sub-field of type [keyword] with doc values or stored and without a normalizer"
-                );
-                return List.of(
-                    new SyntheticSourceInvalidExample(err, TextFieldMapperTests.this::minimalMapping),
-                    new SyntheticSourceInvalidExample(err, b -> {
-                        b.field("type", "text");
-                        b.startObject("fields");
-                        {
-                            b.startObject("l");
-                            b.field("type", "long");
-                            b.endObject();
-                        }
-                        b.endObject();
-                    }),
-                    new SyntheticSourceInvalidExample(err, b -> {
-                        b.field("type", "text");
-                        b.startObject("fields");
-                        {
-                            b.startObject("kwd");
-                            b.field("type", "keyword");
-                            b.field("normalizer", "lowercase");
-                            b.endObject();
-                        }
-                        b.endObject();
-                    }),
-                    new SyntheticSourceInvalidExample(err, b -> {
-                        b.field("type", "text");
-                        b.startObject("fields");
-                        {
-                            b.startObject("kwd");
-                            b.field("type", "keyword");
-                            b.field("doc_values", "false");
-                            b.endObject();
-                        }
-                        b.endObject();
-                    })
-                );
-            }
-        };
+        return TextFieldFamilySyntheticSourceTestSetup.syntheticSourceSupport("text", true);
     }
 
     @Override
     protected Function<Object, Object> loadBlockExpected(BlockReaderSupport blockReaderSupport, boolean columnReader) {
-        if (nullLoaderExpected(blockReaderSupport.mapper(), blockReaderSupport.loaderFieldName())) {
-            return null;
-        }
-        return v -> ((BytesRef) v).utf8ToString();
-    }
-
-    private boolean nullLoaderExpected(MapperService mapper, String fieldName) {
-        MappedFieldType type = mapper.fieldType(fieldName);
-        if (type instanceof TextFieldType t) {
-            if (t.isSyntheticSource() == false || t.canUseSyntheticSourceDelegateForQuerying() || t.isStored()) {
-                return false;
-            }
-            String parentField = mapper.mappingLookup().parentField(fieldName);
-            return parentField == null || nullLoaderExpected(mapper, parentField);
-        }
-        return false;
+        return TextFieldFamilySyntheticSourceTestSetup.loadBlockExpected(blockReaderSupport, columnReader);
     }
 
     @Override
@@ -1300,9 +1191,8 @@ public class TextFieldMapperTests extends MapperTestCase {
     }
 
     @Override
-    protected void validateRoundTripReader(String syntheticSource, DirectoryReader reader, DirectoryReader roundTripReader)
-        throws IOException {
-        // Disabled because it currently fails
+    protected void validateRoundTripReader(String syntheticSource, DirectoryReader reader, DirectoryReader roundTripReader) {
+        TextFieldFamilySyntheticSourceTestSetup.validateRoundTripReader(syntheticSource, reader, roundTripReader);
     }
 
     public void testUnknownAnalyzerOnLegacyIndex() throws IOException {
@@ -1433,21 +1323,7 @@ public class TextFieldMapperTests extends MapperTestCase {
 
     @Override
     protected BlockReaderSupport getSupportedReaders(MapperService mapper, String loaderFieldName) {
-        MappedFieldType ft = mapper.fieldType(loaderFieldName);
-        String parentName = mapper.mappingLookup().parentField(ft.name());
-        if (parentName == null) {
-            TextFieldMapper.TextFieldType text = (TextFieldType) ft;
-            boolean supportsColumnAtATimeReader = text.syntheticSourceDelegate() != null
-                && text.syntheticSourceDelegate().hasDocValues()
-                && text.canUseSyntheticSourceDelegateForQuerying();
-            return new BlockReaderSupport(supportsColumnAtATimeReader, mapper, loaderFieldName);
-        }
-        MappedFieldType parent = mapper.fieldType(parentName);
-        if (false == parent.typeName().equals(KeywordFieldMapper.CONTENT_TYPE)) {
-            throw new UnsupportedOperationException();
-        }
-        KeywordFieldMapper.KeywordFieldType kwd = (KeywordFieldMapper.KeywordFieldType) parent;
-        return new BlockReaderSupport(kwd.hasDocValues(), mapper, loaderFieldName);
+        return TextFieldFamilySyntheticSourceTestSetup.getSupportedReaders(mapper, loaderFieldName);
     }
 
     public void testBlockLoaderFromParentColumnReader() throws IOException {
@@ -1460,7 +1336,7 @@ public class TextFieldMapperTests extends MapperTestCase {
 
     private void testBlockLoaderFromParent(boolean columnReader, boolean syntheticSource) throws IOException {
         boolean storeParent = randomBoolean();
-        KeywordFieldMapperTests.KeywordSyntheticSourceSupport kwdSupport = new KeywordFieldMapperTests.KeywordSyntheticSourceSupport(
+        KeywordFieldSyntheticSourceSupport kwdSupport = new KeywordFieldSyntheticSourceSupport(
            null,
            storeParent,
            null,
@@ -0,0 +1,126 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License
+ * 2.0 and the Server Side Public License, v 1; you may not use this file except
+ * in compliance with, at your election, the Elastic License 2.0 or the Server
+ * Side Public License, v 1.
+ */
+
+package org.elasticsearch.index.mapper;
+
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.elasticsearch.core.Tuple;
+import org.elasticsearch.test.ESTestCase;
+import org.elasticsearch.xcontent.XContentBuilder;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.hamcrest.Matchers.equalTo;
+
+public class KeywordFieldSyntheticSourceSupport implements MapperTestCase.SyntheticSourceSupport {
+    private final Integer ignoreAbove;
+    private final boolean allIgnored;
+    private final boolean store;
+    private final boolean docValues;
+    private final String nullValue;
+    private final boolean exampleSortsUsingIgnoreAbove;
+
+    KeywordFieldSyntheticSourceSupport(Integer ignoreAbove, boolean store, String nullValue, boolean exampleSortsUsingIgnoreAbove) {
+        this.ignoreAbove = ignoreAbove;
+        this.allIgnored = ignoreAbove != null && LuceneTestCase.rarely();
+        this.store = store;
+        this.nullValue = nullValue;
+        this.exampleSortsUsingIgnoreAbove = exampleSortsUsingIgnoreAbove;
+        this.docValues = store ? ESTestCase.randomBoolean() : true;
+    }
+
+    @Override
+    public MapperTestCase.SyntheticSourceExample example(int maxValues) {
+        return example(maxValues, false);
+    }
+
+    public MapperTestCase.SyntheticSourceExample example(int maxValues, boolean loadBlockFromSource) {
+        if (ESTestCase.randomBoolean()) {
+            Tuple<String, String> v = generateValue();
+            Object loadBlock = v.v2();
+            if (loadBlockFromSource == false && ignoreAbove != null && v.v2().length() > ignoreAbove) {
+                loadBlock = null;
+            }
+            return new MapperTestCase.SyntheticSourceExample(v.v1(), v.v2(), loadBlock, this::mapping);
+        }
+        List<Tuple<String, String>> values = ESTestCase.randomList(1, maxValues, this::generateValue);
+        List<String> in = values.stream().map(Tuple::v1).toList();
+        List<String> outPrimary = new ArrayList<>();
+        List<String> outExtraValues = new ArrayList<>();
+        values.stream().map(Tuple::v2).forEach(v -> {
+            if (exampleSortsUsingIgnoreAbove && ignoreAbove != null && v.length() > ignoreAbove) {
+                outExtraValues.add(v);
+            } else {
+                outPrimary.add(v);
+            }
+        });
+        List<String> outList = store ? outPrimary : new HashSet<>(outPrimary).stream().sorted().collect(Collectors.toList());
+        List<String> loadBlock;
+        if (loadBlockFromSource) {
+            // The block loader infrastructure will never return nulls. Just zap them all.
+            loadBlock = in.stream().filter(m -> m != null).toList();
+        } else if (docValues) {
+            loadBlock = new HashSet<>(outPrimary).stream().sorted().collect(Collectors.toList());
+        } else {
+            loadBlock = List.copyOf(outList);
+        }
+        Object loadBlockResult = loadBlock.size() == 1 ? loadBlock.get(0) : loadBlock;
+        outList.addAll(outExtraValues);
+        Object out = outList.size() == 1 ? outList.get(0) : outList;
+        return new MapperTestCase.SyntheticSourceExample(in, out, loadBlockResult, this::mapping);
+    }
+
+    private Tuple<String, String> generateValue() {
+        if (nullValue != null && ESTestCase.randomBoolean()) {
+            return Tuple.tuple(null, nullValue);
+        }
+        int length = 5;
+        if (ignoreAbove != null && (allIgnored || ESTestCase.randomBoolean())) {
+            length = ignoreAbove + 5;
+        }
+        String v = ESTestCase.randomAlphaOfLength(length);
+        return Tuple.tuple(v, v);
+    }
+
+    private void mapping(XContentBuilder b) throws IOException {
+        b.field("type", "keyword");
+        if (nullValue != null) {
+            b.field("null_value", nullValue);
+        }
+        if (ignoreAbove != null) {
+            b.field("ignore_above", ignoreAbove);
+        }
+        if (store) {
+            b.field("store", true);
+        }
+        if (docValues == false) {
+            b.field("doc_values", false);
+        }
+    }
+
+    @Override
+    public List<MapperTestCase.SyntheticSourceInvalidExample> invalidExample() throws IOException {
+        return List.of(
+            new MapperTestCase.SyntheticSourceInvalidExample(
+                equalTo(
+                    "field [field] of type [keyword] doesn't support synthetic source because "
|
||||||
|
+ "it doesn't have doc values and isn't stored"
|
||||||
|
),
|
||||||
|
b -> b.field("type", "keyword").field("doc_values", false)
|
||||||
|
),
|
||||||
|
new MapperTestCase.SyntheticSourceInvalidExample(
|
||||||
|
equalTo("field [field] of type [keyword] doesn't support synthetic source because it declares a normalizer"),
|
||||||
|
b -> b.field("type", "keyword").field("normalizer", "lowercase")
|
||||||
|
)
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
|
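The `example` method above partitions multi-valued input around `ignore_above`: values longer than the threshold bypass doc values and are appended after the primary values. A minimal standalone sketch of that expectation (a hypothetical helper, not part of the PR; the real support also sorts and de-duplicates the primary values when the field is not stored):

```java
import java.util.ArrayList;
import java.util.List;

public class IgnoreAboveSketch {
    // Mirrors the partitioning in example(): values over ignore_above skip
    // doc values and land after the primary (indexed) values in the
    // expected synthetic source.
    static List<String> expectedSyntheticSource(List<String> values, int ignoreAbove) {
        List<String> primary = new ArrayList<>();
        List<String> ignored = new ArrayList<>();
        for (String v : values) {
            (v.length() > ignoreAbove ? ignored : primary).add(v);
        }
        primary.addAll(ignored);
        return primary;
    }

    public static void main(String[] args) {
        // "a-very-long-value" exceeds the threshold of 10, so it trails.
        System.out.println(expectedSyntheticSource(List.of("short", "a-very-long-value", "tiny"), 10));
    }
}
```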
@@ -1286,7 +1286,7 @@ public abstract class MapperTestCase extends MapperServiceTestCase {
      * @param loaderFieldName the field name to use for loading the field
      */
     public record BlockReaderSupport(boolean columnAtATimeReader, boolean syntheticSource, MapperService mapper, String loaderFieldName) {
-        BlockReaderSupport(boolean columnAtATimeReader, MapperService mapper, String loaderFieldName) {
+        public BlockReaderSupport(boolean columnAtATimeReader, MapperService mapper, String loaderFieldName) {
             this(columnAtATimeReader, true, mapper, loaderFieldName);
         }
@@ -0,0 +1,207 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0 and the Server Side Public License, v 1; you may not use this file except
 * in compliance with, at your election, the Elastic License 2.0 or the Server
 * Side Public License, v 1.
 */

package org.elasticsearch.index.mapper;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.util.BytesRef;
import org.hamcrest.Matcher;

import java.io.IOException;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

import static org.elasticsearch.test.ESTestCase.between;
import static org.elasticsearch.test.ESTestCase.randomAlphaOfLength;
import static org.elasticsearch.test.ESTestCase.randomBoolean;
import static org.hamcrest.Matchers.equalTo;

/**
 * Provides functionality needed to test synthetic source support in text and text-like fields (e.g. "text", "annotated_text").
 */
public final class TextFieldFamilySyntheticSourceTestSetup {
    public static MapperTestCase.SyntheticSourceSupport syntheticSourceSupport(String fieldType, boolean supportsCustomIndexConfiguration) {
        return new TextFieldFamilySyntheticSourceSupport(fieldType, supportsCustomIndexConfiguration);
    }

    public static MapperTestCase.BlockReaderSupport getSupportedReaders(MapperService mapper, String loaderFieldName) {
        MappedFieldType ft = mapper.fieldType(loaderFieldName);
        String parentName = mapper.mappingLookup().parentField(ft.name());
        if (parentName == null) {
            TextFieldMapper.TextFieldType text = (TextFieldMapper.TextFieldType) ft;
            boolean supportsColumnAtATimeReader = text.syntheticSourceDelegate() != null
                && text.syntheticSourceDelegate().hasDocValues()
                && text.canUseSyntheticSourceDelegateForQuerying();
            return new MapperTestCase.BlockReaderSupport(supportsColumnAtATimeReader, mapper, loaderFieldName);
        }
        MappedFieldType parent = mapper.fieldType(parentName);
        if (false == parent.typeName().equals(KeywordFieldMapper.CONTENT_TYPE)) {
            throw new UnsupportedOperationException();
        }
        KeywordFieldMapper.KeywordFieldType kwd = (KeywordFieldMapper.KeywordFieldType) parent;
        return new MapperTestCase.BlockReaderSupport(kwd.hasDocValues(), mapper, loaderFieldName);
    }

    public static Function<Object, Object> loadBlockExpected(MapperTestCase.BlockReaderSupport blockReaderSupport, boolean columnReader) {
        if (nullLoaderExpected(blockReaderSupport.mapper(), blockReaderSupport.loaderFieldName())) {
            return null;
        }
        return v -> ((BytesRef) v).utf8ToString();
    }

    private static boolean nullLoaderExpected(MapperService mapper, String fieldName) {
        MappedFieldType type = mapper.fieldType(fieldName);
        if (type instanceof TextFieldMapper.TextFieldType t) {
            if (t.isSyntheticSource() == false || t.canUseSyntheticSourceDelegateForQuerying() || t.isStored()) {
                return false;
            }
            String parentField = mapper.mappingLookup().parentField(fieldName);
            return parentField == null || nullLoaderExpected(mapper, parentField);
        }
        return false;
    }

    public static void validateRoundTripReader(String syntheticSource, DirectoryReader reader, DirectoryReader roundTripReader) {
        // `reader` here is reader of original document and `roundTripReader` reads document
        // created from synthetic source.
        // This check fails when synthetic source is constructed using keyword subfield
        // since in that case values are sorted (due to being read from doc values) but original document isn't.
        //
        // So it is disabled.
    }

    private static class TextFieldFamilySyntheticSourceSupport implements MapperTestCase.SyntheticSourceSupport {
        private final String fieldType;
        private final boolean storeTextField;
        private final boolean storedKeywordField;
        private final boolean indexText;
        private final Integer ignoreAbove;
        private final KeywordFieldSyntheticSourceSupport keywordSupport;

        TextFieldFamilySyntheticSourceSupport(String fieldType, boolean supportsCustomIndexConfiguration) {
            this.fieldType = fieldType;
            this.storeTextField = randomBoolean();
            this.storedKeywordField = storeTextField || randomBoolean();
            this.indexText = supportsCustomIndexConfiguration ? randomBoolean() : true;
            this.ignoreAbove = randomBoolean() ? null : between(10, 100);
            this.keywordSupport = new KeywordFieldSyntheticSourceSupport(ignoreAbove, storedKeywordField, null, false == storeTextField);
        }

        @Override
        public MapperTestCase.SyntheticSourceExample example(int maxValues) {
            if (storeTextField) {
                MapperTestCase.SyntheticSourceExample delegate = keywordSupport.example(maxValues, true);
                return new MapperTestCase.SyntheticSourceExample(
                    delegate.inputValue(),
                    delegate.expectedForSyntheticSource(),
                    delegate.expectedForBlockLoader(),
                    b -> {
                        b.field("type", fieldType);
                        b.field("store", true);
                        if (indexText == false) {
                            b.field("index", false);
                        }
                    }
                );
            }
            // We'll load from _source if ignore_above is defined, otherwise we load from the keyword field.
            boolean loadingFromSource = ignoreAbove != null;
            MapperTestCase.SyntheticSourceExample delegate = keywordSupport.example(maxValues, loadingFromSource);
            return new MapperTestCase.SyntheticSourceExample(
                delegate.inputValue(),
                delegate.expectedForSyntheticSource(),
                delegate.expectedForBlockLoader(),
                b -> {
                    b.field("type", fieldType);
                    if (indexText == false) {
                        b.field("index", false);
                    }
                    b.startObject("fields");
                    {
                        b.startObject(randomAlphaOfLength(4));
                        delegate.mapping().accept(b);
                        b.endObject();
                    }
                    b.endObject();
                }
            );
        }

        @Override
        public List<MapperTestCase.SyntheticSourceInvalidExample> invalidExample() throws IOException {
            Matcher<String> err = equalTo(
                String.format(
                    Locale.ROOT,
                    "field [field] of type [%s] doesn't support synthetic source unless it is stored or"
                        + " has a sub-field of type [keyword] with doc values or stored and without a normalizer",
                    fieldType
                )
            );
            return List.of(
                new MapperTestCase.SyntheticSourceInvalidExample(err, b -> b.field("type", fieldType)),
                new MapperTestCase.SyntheticSourceInvalidExample(err, b -> {
                    b.field("type", fieldType);
                    b.startObject("fields");
                    {
                        b.startObject("l");
                        b.field("type", "long");
                        b.endObject();
                    }
                    b.endObject();
                }),
                new MapperTestCase.SyntheticSourceInvalidExample(err, b -> {
                    b.field("type", fieldType);
                    b.startObject("fields");
                    {
                        b.startObject("kwd");
                        b.field("type", "keyword");
                        b.field("normalizer", "lowercase");
                        b.endObject();
                    }
                    b.endObject();
                }),
                new MapperTestCase.SyntheticSourceInvalidExample(err, b -> {
                    b.field("type", fieldType);
                    b.startObject("fields");
                    {
                        b.startObject("kwd");
                        b.field("type", "keyword");
                        b.field("doc_values", "false");
                        b.endObject();
                    }
                    b.endObject();
                }),
                new MapperTestCase.SyntheticSourceInvalidExample(err, b -> {
                    b.field("type", fieldType);
                    b.field("store", "false");
                    b.startObject("fields");
                    {
                        b.startObject("kwd");
                        b.field("type", "keyword");
                        b.field("doc_values", "false");
                        b.endObject();
                    }
                    b.endObject();
                }),
                new MapperTestCase.SyntheticSourceInvalidExample(err, b -> {
                    b.field("type", fieldType);
                    b.startObject("fields");
                    {
                        b.startObject("kwd");
                        b.field("type", "keyword");
                        b.field("doc_values", "false");
                        b.field("store", "false");
                        b.endObject();
                    }
                    b.endObject();
                })
            );
        }
    }
}
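When a non-stored field's synthetic source is rebuilt from a keyword sub-field's doc values, the recovered values come back sorted and de-duplicated rather than in original order; that is the very effect `validateRoundTripReader` is disabled to tolerate. A self-contained sketch of that expectation (hypothetical helper, not part of the PR):

```java
import java.util.List;
import java.util.TreeSet;

public class SyntheticSourceSketch {
    // Mimics the doc-values-backed case: synthetic source reflects the
    // sorted, de-duplicated set of values, not the original document order.
    static List<String> syntheticValues(List<String> input) {
        return List.copyOf(new TreeSet<>(input));
    }

    public static void main(String[] args) {
        // Duplicates collapse and order becomes lexicographic.
        System.out.println(syntheticValues(List.of("b", "a", "b", "c")));
    }
}
```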