This commit adds a new test framework for configuring and orchestrating
test clusters for both Java and YAML REST testing. This will eventually
replace the existing "test-clusters" Gradle plugin and the build-time
cluster orchestration.
Follow-up from #77144 (comment): convert id/_id to always be strings instead of integers. This makes the type of the value in the Elasticsearch specification string only, instead of string | number.
This change was generated using the following command on Ubuntu:
`find . -type f -name "*.yml" -print0 | xargs -0 sed -i -r 's/([^a-zA-Z0-9_\.]id|[^a-zA-Z0-9_]_id):(\s)([0-9]+)/\1:\2"\3"/g'`
relates #82681
This commit changes the default deprecation logging level to CRITICAL, where default means deprecations emitted by the DeprecationLogger#critical method.
It also introduces WARN deprecations, which are emitted by DeprecationLogger#warn. Log lines emitted at WARN indicate that a functionality is deprecated but will not break at the next major version.
relates #76754
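For illustration, a rough sketch of how the two levels might be used (the logger factory, category values, and message keys here are assumptions, not taken from this change):

```java
import org.elasticsearch.common.logging.DeprecationCategory;
import org.elasticsearch.common.logging.DeprecationLogger;

// Hypothetical usage sketch; exact signatures and packages may differ.
class DeprecationLevelsSketch {
    private static final DeprecationLogger deprecationLogger =
        DeprecationLogger.getLogger(DeprecationLevelsSketch.class);

    void emitDeprecations() {
        // CRITICAL (the default): the deprecated functionality will break at the next major version.
        deprecationLogger.critical(DeprecationCategory.API, "old_endpoint",
            "[/old/endpoint] is deprecated and will be removed in the next major version");

        // WARN: deprecated, but will keep working at the next major version.
        deprecationLogger.warn(DeprecationCategory.SETTINGS, "old_setting",
            "[old.setting] is deprecated, use [new.setting] instead");
    }
}
```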
Mapper.build() currently takes a ContentPath object that it can use to generate
field type names that will include its parent names. We would like to expand field types
to include more information about their parents, and ContentPath does not hold this
information. This commit replaces the ContentPath parameter with a new
MapperBuilderContext, which currently holds only the content path information but
can be expanded in future to hold parent relationship information.
Relates to #75474
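A minimal sketch of the idea: Mapper.Builder#build(ContentPath) becomes Mapper.Builder#build(MapperBuilderContext). The class below is an illustrative stand-in, not the actual implementation, and the buildFullName helper is an assumption:

```java
// Illustrative stand-in for the new builder context; not the real class.
public final class MapperBuilderContextSketch {
    private final String parentPath; // what used to be carried around as a ContentPath, e.g. "parent.child"

    public MapperBuilderContextSketch(String parentPath) {
        this.parentPath = parentPath;
    }

    /** Builds the full field name from the parent path, as ContentPath allowed before. */
    public String buildFullName(String leafName) {
        return parentPath.isEmpty() ? leafName : parentPath + "." + leafName;
    }

    // Parent relationship information could be added here later, which a bare
    // ContentPath cannot hold.
}
```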
* Introduce simple public yaml-rest-test plugin (#76554)
This introduces a basic public yaml rest test plugin that is intended to be used by external
Elasticsearch plugin authors. This is driven by #76215
- Rename yaml-rest-test to intern-yaml-rest-test
- Use public yaml plugin in example plugins
Co-authored-by: Mark Vieira <portugee@gmail.com>
* Fix test assertion after output normalization
Co-authored-by: Mark Vieira <portugee@gmail.com>
This PR implements support for multiple validators on a FieldMapper.Parameter.
The Parameter#setValidator method was replaced by Parameter#addValidator, which can be called multiple times
to add validation to a parameter.
All validators of a parameter are executed in the order in which they were added, and if any of them fails, validation of the parameter fails as a whole.
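A minimal sketch of that behaviour (a stand-in, not the actual FieldMapper.Parameter implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Stand-in for the real Parameter class, to show the ordering and failure semantics.
final class ParameterSketch<T> {
    private final List<Consumer<T>> validators = new ArrayList<>();

    /** Replaces the old setValidator; may be called multiple times. */
    ParameterSketch<T> addValidator(Consumer<T> validator) {
        validators.add(validator);
        return this;
    }

    /** Validators run in insertion order; the first failing validator fails the whole validation. */
    void validate(T value) {
        for (Consumer<T> validator : validators) {
            validator.accept(value); // expected to throw on invalid values
        }
    }
}
```

A parameter could, for example, register a non-negative check and an upper-bound check as two separate validators instead of cramming both into one.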
* Reformatting to keep Checkstyle passing after formatting
* Configure spotless everywhere, and disable the tasks if necessary
* Add XContentBuilder helpers, fix test
ParseContext is used to parse documents. It was easily confused with ParserContext (now renamed to MappingParserContext) which is instead used to parse mappings.
To remove any confusion, this commit renames ParseContext to DocumentParserContext and adapts its subclasses accordingly.
Currently the `_ignored` field indexes and stores the names of every field in a document that has been ignored
because, for example, it was malformed. The `ignore_above` option for keyword-type fields
serves a somewhat similar purpose, so this change adds logic that adds these
fields to the `_ignored` field as well for `keyword`, `wildcard` and
`icu_collation_keyword` fields.
Closes #74228
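A hedged sketch of the kind of check this adds in the affected mappers (the recorder interface is a hypothetical stand-in for the real parse context):

```java
// Illustrative only; not the actual keyword/wildcard mapper code.
interface IgnoredFieldRecorder {
    void addIgnoredField(String fieldName);
}

final class IgnoreAboveSketch {
    /** Returns true if the value should be indexed; otherwise records the field in _ignored. */
    static boolean shouldIndex(String value, int ignoreAbove, String fieldName, IgnoredFieldRecorder context) {
        if (value.length() > ignoreAbove) {
            // The value is skipped as before, but the field name is now also
            // recorded so it shows up in the _ignored metadata field.
            context.addIgnoredField(fieldName);
            return false;
        }
        return true;
    }
}
```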
Modularization of the JDK has been ongoing for several years. Recently
in Java 16 the JDK began enforcing module boundaries by default. While
Elasticsearch does not yet use the module system directly, there are
some side effects even for projects that are not modularized (e.g. #73517).
Before we can even begin to think about how to modularize, we must
Prepare The Way by enforcing that each package exists in only a single jar file,
since the module system does not allow packages to coexist in multiple
modules.
This commit adds a precommit check to the build which detects split
packages. The expectation is that we will add the existing split
packages to the ignore list so that any new classes will not exacerbate
the problem, and the work to clean up these split packages can be
parallelized.
relates #73525
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.
relates #73784
backport #73909
Upgrades to Lucene-8.9 snapshot which includes:
- LUCENE-9507: Custom order for leaves
- LUCENE-9935: Enable bulk merge for stored fields with index sort
The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call nextToken. There is a general
assumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept json objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their Java class, we also know that in the vast majority of
cases this functionality is unused. Only two mappers actually
make use of it: CompletionFieldMapper and GeoPointFieldMapper.
This commit removes external values entirely, and replaces them with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.
Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use ParseContext.switchParser and
wrap the objects using MapXContentParser.wrapObject().
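Inside a plugin mapper's parse method, the migration might look roughly like this (a sketch assuming the methods named above; the exact parameter shapes and the multiFields.parse call are illustrative, not verified):

```java
// Before (external values, now removed): set the structured object on the context
// and let child mappers look it up.
//   context.externalValue(structuredValue);
//   multiFields.parse(this, context);

// After: wrap the structured object in a parser and switch the context to it, so
// multifields simply read from the parser attached to their context.
XContentParser wrapped = MapXContentParser.wrapObject(structuredValue);
multiFields.parse(this, context.switchParser(wrapped));
```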
GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if textOrNull() is called without
the parser being advanced just returns its text input.
Relates to #56063
The FieldNamesFieldMapper is a metadata mapper defining a field that
can be used for exists queries if a mapper does not use doc values or
norms. Currently, data is added to it via a special method on FieldMapper
that pulls the metadata mapper from a mapping lookup, checks to see
if it is enabled, and then adds the relevant value to a lucene document.
This is one of only two places that pulls a metadata mapper from the
MappingLookup, and it would be nice to remove this method. This commit
refactors field name handling by instead storing, in a set on the ParseContext,
the names of fields to be indexed into the field names field, and then
building the field itself in FieldNamesFieldMapper.postParse(). This means
that all of the responsibility for enabling indexing, etc., is handled within
the metadata mapper itself.
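A hypothetical sketch of that flow (method names are illustrative, not the real API):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Stand-in for the relevant parts of the parse context and metadata mapper.
final class FieldNamesSketch {
    private final Set<String> fieldNames = new HashSet<>();

    /** Field mappers record names while parsing, instead of writing to the lucene document directly. */
    void addFieldName(String field) {
        fieldNames.add(field);
    }

    /** The metadata mapper itself decides in postParse whether and how to index the collected names. */
    void postParse(boolean enabled, Consumer<String> indexFieldName) {
        if (enabled == false) {
            return;
        }
        fieldNames.forEach(indexFieldName);
    }
}
```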
Wildcard queries on text fields should not apply the field's analyzer to the
search query. However, we accidentally enabled this in #53127 by moving the
query normalization to the StringFieldType super type. This change fixes this by
separating the notions of normalization and case insensitivity (as implemented in
the `case_insensitive` flag). This is done because we still need to maintain
normalization of the query string when the wildcard query method on the field type is
requested from the `query_string` query parser. Wildcard queries on keyword
fields should also continue to apply the field's normalizer, regardless of
whether the `case_insensitive` flag is set, because normalization could involve
something other than lowercasing (e.g. substituting umlauts, as in the
GermanNormalizationFilter).
Closes #71403
We've had a few bugs in the fields API where it doesn't behave as we'd
expect. Typically this happens because it isn't obvious what we expect. So
we'll try to use randomized testing to ferret out what we want. This adds
a test for most field types that asserts that `fields` works similarly
to `docvalue_fields`. We expect this to be true for most fields.
It does so by forcing all subclasses of `MapperTestCase` to define a
method that makes random values. It declares a few other hooks that
subclasses can override to further randomize the test.
We skip the test for a few field types that don't have doc values:
* `annotated_text`
* `completion`
* `search_as_you_type`
* `text`
We should come up with some way to test these without doc values, even
if it isn't as nice. But that is a problem for another time, I think.
We skip the test for a few more types just because I wanted to cut this
PR in half so we could get to reviewing it earlier. We'll get to those
in a follow up change.
I've filed a few bugs for things that are inconsistent with
`docvalue_fields`. Typically that means that we have to limit the
random values that we generate to those that *do* round trip properly.
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
Sort-of backport of #67443.
Closes #64824. Introduce the concept of categories to deprecation
logging. Every location where we log a deprecation message must now
include a deprecation category.
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.
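For example (the names here are purely illustrative):

```java
// Disallowed by the in-house rule:
if (!request.validate()) {
    throw new IllegalArgumentException("invalid request");
}

// Preferred: compare explicitly against false.
if (request.validate() == false) {
    throw new IllegalArgumentException("invalid request");
}
```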
We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
This change upgrades to the latest Lucene 8.8.0 snapshot.
It also restores the compression on binary doc values that was lost in the last snapshot upgrade.
The compression is now configurable on binary doc values, but we don't expose this functionality yet, so this commit ensures that we pick the same compression mode as previous releases (BEST_COMPRESSION).
We decided to rename `QueryShardContext` to clarify that it supports all parts
of search request execution. Before there was confusion over whether it should
only be used for building queries, or maybe only used in the query phase. This
PR also updates the javadocs.
Closes #64740.
This PR simplifies how the document source is passed to each fetch subphase. A
summary of the strategy:
* For each document, we try to eagerly load the source and store it on
`HitContext`. Most subphases that access source, like source filtering and
highlighting, use `HitContext`. For nested hits, we filter the parent source
and also store this source on `HitContext`.
* Only for non-nested documents, we also store the loaded source on
`QueryShardContext#lookup`. This allows subphases that access source through
`SearchLookup` to use the pre-loaded source when possible. This is now a common
occurrence, since runtime fields are supported in the 'fields' option and may
soon be supported in highlighting.
There is no longer a special `SearchLookup` just for the fetch phase. This was
not necessary and was mostly caused by a misunderstanding of how
`QueryShardContext` should be used.
Addresses #62511.
Mapper.BuilderContext is a simple wrapper around two objects, some
IndexSettings and a ContentPath. The IndexSettings are the same as
those provided in the ParserContext, so we can simplify things here
by removing them and just passing ContentPath directly to
Mapper.Builder#build()
The signature of MappedFieldType#valueFetcher requires MapperService as an argument, which is unfortunate, as that is one of the reasons why FetchContext exposes the whole MapperService.
Such use of MapperService can be replaced by exposing the QueryShardContext, which encapsulates the MapperService.
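Roughly, the change looks like this (a sketch; the format parameter and the exact method shape are assumptions):

```java
// Before: fetching a value needed the whole MapperService.
//   ValueFetcher valueFetcher(MapperService mapperService, String format);

// After: the QueryShardContext, which already encapsulates the MapperService, is enough.
//   ValueFetcher valueFetcher(QueryShardContext context, String format);
```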
Index-time analyzers are currently specified on the MappedFieldType. This
has a number of unfortunate consequences; for example, field mappers that
index data into implementation sub-fields, such as prefix or phrase
accelerators on text fields, need to expose these sub-fields as MappedFieldTypes,
which means that they then appear in field caps, are externally searchable,
etc. It also adds index-time logic to a class that should only be concerned
with search-time behaviour.
This commit removes references to the index analyzer from MappedFieldType.
Instead, FieldMappers that use the terms index can pass either a single analyzer
or a Map of fields to analyzers to their super constructor, which are then
exposed via a new FieldMapper#indexAnalyzers() method; all index-time analysis
is mediated through the delegating analyzer wrapper on MapperService.
In a follow-up, this will make it possible to register multiple field analyzers from
a single FieldMapper, removing the need for 'hidden' mapper implementations
on text field, parent joins, and elsewhere.
Now that all our FieldMapper implementations extend ParametrizedFieldMapper,
we can collapse the two classes together, and remove a load of cruft from
FieldMapper that is unused. In particular:
* we no longer need the lucene FieldType field on FieldMapper
* we no longer use clone() for merging, so we can remove it from all impls
* the serialization code in FieldMapper that assumes we're looking at text fields can go
Some supported field types don't support term queries and throw an exception in their termQuery method. That exception is either an IllegalArgumentException or a QueryShardException. There is logic in MatchQuery that skips the field or not depending on the exception that is thrown.
Also, such field types should hold TextSearchInfo.NONE, but that is not always the case.
With this commit we make the following changes:
- Streamline using TextSearchInfo.NONE in all field types that don't support text queries.
- Standardize the exception thrown when a field type does not support term queries to be IllegalArgumentException. Note that this is not a breaking change, as both exceptions previously translated to a 400 status code.
- Adapt the MatchQuery logic to skip fields that don't support term queries. There is no need to call termQuery with an empty string and catch any exceptions thrown; we can instead check the TextSearchInfo, which already tells us whether the field supports text queries (see the sketch below).
- Add a test method to MapperTestCase that verifies the consistency of a field type by checking that it is not searchable when it uses TextSearchInfo.NONE, and that it is searchable otherwise. This is what triggered all of the above changes.
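A hedged sketch of that skip check (not the actual MatchQuery code; the helper method is illustrative):

```java
// Fields whose TextSearchInfo is NONE cannot support term queries, so MatchQuery
// can skip them without calling termQuery("") and catching exceptions.
static boolean supportsTermQueries(MappedFieldType fieldType) {
    return fieldType.getTextSearchInfo() != TextSearchInfo.NONE;
}
```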
When constructing a value fetcher, the 'parsesArrayValue' flag must match
`FieldMapper#parsesArrayValue`. However, there is nothing in code or tests to
help enforce this.
This PR reworks the value fetcher constructors so that `parsesArrayValue` is
'false' by default. Just as for `FieldMapper#parsesArrayValue`, field types must
explicitly set it to true and ensure the behavior is covered by tests.
Follow-up to #62974.