* Refactor build params for FieldMapper
* more mappers and tests
* more mappers
* more mappers
* spotless
* spotless
* stored by default
* Revert "stored by default"
This reverts commit bbd247d64b.
* restore storeIgnored
* sync
* list valid values for SourceKeepMode
* small refactoring
* spotless
This addresses a long standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while the MappedFieldType name does.
We have renamed Mapper.Builder#name to leafName (#109971) and Mapper#simpleName to leafName (#110030). This commit renames Mapper#name to fullPath for clarity
This required some adjustments in FieldAliasMapper to avoid confusion between the existing path method and fullPath. I renamed path to targetPath for clarity.
ObjectMapper already had a fullPath method that returned name, and was effectively a copy of name, so it could be removed.
This addresses a long standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while
the MappedFieldType name does. We have method called simpleName to signal that, but leafName signals that more clearly and aligns with
the name we have recently introduced in Mapper.Builder (renamed from name to leafName).
Relates to #109971
This addresses a long standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while
the MappedFieldType name does.
Today, we're counting all mappers, including mappers for subfields that
aren't explicitly added to the mapping towards the field limit.
This means that some field types, such as `search_as_you_type` or
`percolator` count as more than one field even though that's not
apparent to users as they're just defining them as a single field in the
mapping.
This change makes it so that each field mapper only counts as one. We're
still counting multi-fields.
This makes it easier to understand for users why the field limit is hit.
~In addition to that, it also simplifies
https://github.com/elastic/elasticsearch/pull/96235 as it makes the
implementation of `Mapper.Builder#getTotalFieldsCount` much easier and
easier to align with `Mapper#getTotalFieldsCount`. This reduces the risk
of over- or under-estimating the field count of a `Mapper.Builder` in
`DocumentParserContext#addDynamicMapper`, which in turn reduces the risk
of data loss due to the issue described here:
https://github.com/elastic/elasticsearch/pull/96235#discussion_r1402495749.~
*Edit: due to https://github.com/elastic/elasticsearch/pull/103865, we
don't need an implementation of `getTotalFieldsCount` or `mapperSize` in
`Mapper.Builder`. Still, this PR more closely aligns
`Mapper#getTotalFieldsCount` with `MappingLookup#getTotalFieldsCount`,
which `DocumentParserContext#addDynamicMapper` uses to determine
whether the field limit is hit*
A potential risk of this is that we're now effectively allowing more
fields in the mapping. It may be surprising to users that more fields
can be added to a mapping. Although, I'd not expect negative
consequences from that. Generally, I'd expect users to be happy about
any change that reduces the risk of data loss.
We could also think about whether to apply the new counting logic only
to new indices (depending on the `IndexVersion`). However, that would
add more complexity and I'm not convinced about the value. We'd then
need to maintain two different ways of counting fields and also require
passing in the `IndexVersion` to `MappingLookup` which previously didn't
require the `IndexVersion`.
This PR is meant as a conversation starter. It would also simplify
https://github.com/elastic/elasticsearch/pull/96235 but I don't think
this blocks that PR in any way.
I'm curious about the opinion of @javanna and @jpountz on this.
Fixes the remaining test leaks in datastream and mustache. Mustache will
require a separate PR to make the search template response classes
properly ref-counted but we can prepare for that here already.
for https://github.com/elastic/elasticsearch/issues/102030
We're leaking quite a few of these parsers. That doesn't seem to be much
of a problem but results in some memory inefficiencies in Jackson here
and there. This PR bulk fixes a bunch of instances that I could easily
automatically fix. I'll open a follow-up for closing the parser on the
document parsing context which also suffers from this but is non-trivial
to fix.
Same as #101175, shorten `client().prepareIndex(index)` and
`client().prepareIndex().setIndex(index)` via a test utility.
Saves lots of code now and sets up some follow-up simplifcations.
Just found that we have a lot of inconsistency and needless verbosity
here in tests. We can just use `assertAcked` in a couple spots
to save `.get`, `.actionGet` etc., especially with the signature
change I added here.
Similar to the TransportVersions holder class, IndexVersions is the new
place to contain all constants for IndexVersion. This commit moves all
existing constants to the new class. It is purely mechanical.
The only reason this method is throwing an exception is because the
method ByteArrayOutputStream#close() is declaring it although it is a
noop. Therefore it can be safely ignored.
Thanks @romseygeek for bringing into attention.
Another round of automated fixes to this, marking things that can be
made static as static. Saves some JIT cycles but also turns some lambdas
from capturing to non-capturing and makes the "utilityness" of some
classes visible.
Data-stream mappings require a @timestamp field to be present and configured
as a date with a specific set of parameters. The index-wide setting of
ignore_malformed can cause problems here if it is set to true, because it needs
to be false for the @timestamp field.
This commit detects if a set of mappings is configured for a datastream by checking
for the presence of a DataStreamTimestampFieldMapper metadata field, and passes
that information on during Mapper construction as part of the MapperBuilderContext.
DateFieldMapper.Builder now checks to see if it is specifically for a data stream timestamp
field, and if it is, sets ignore_malformed to false.
Relates to #96051
The copyTo builder is really hard to reason about when it comes to
mapper merging, because the `reset` method would actually mutate an
existing mapper. That seems dangerous and the whole thing is quite
inefficient as well. -> this PR just removes it and uses a copy
constructor for copy on write, avoiding instance creation on mapper
merges here and there and leaving no doubt about these things being
immutable.
We can dry up and shorten a bunch of spots by using the overloads
we already have in the code. Also, added a new overload for writing a
map with string keys.
Noticed this when benchmarking FieldCaps transport messages.
The `writeList` alias just adds more lines to the code and makes
profiling more annoying to read, lets remove it.
When the subobject property is set to false and we encounter an object
while parsing we need a way to understand if its FieldMapper is able to
parse an object. If that's the case we can provide the entire object to
the FieldMapper otherwise its name becomes the part of the dotted field
name of each internal value.
This has being achieved by adding the `supportsParsingObject()` method
to the `FieldMapper` class. This method defaults to `false` since the
majority of FieldMappers do not support parsing objects and is
overwritten to return `true` by the ones that do support objects.
Replacing the remaining usages that I could automatically replace
and a couple that I did by hand in this PR.
Also, added the same shortcut to the single node tests to save some
duplication there.
Document parsing methods currently throw MapperParsingException. This
isn't very helpful, as it doesn't contain any information about where the parse
error happened - it is designed for parsing mappings, which are realised into
java maps before being examined. This commit introduces a new exception
specifically for document parsing that extends XContentException, so that
it reports the current position of the parser as part of its error message.
Fixes#85083
We currently work out whether or not a mapper should be storing additional
values for synthetic source by looking at the DocumentParserContext. However,
this value does not change for the lifetime of the mapper - it is defined by
metadata on the root mapper and is immutable - and DocumentParserContext
feels like the wrong place for this information as it holds context specific
to the document being parsed.
This commit moves synthetic source status information from DocumentParserContext
to MapperBuilderContext instead. Mappers which need this information retrieve
it at build time and hold it on final fields.
#91043 surfaced some inconsistencies in field names validation between mapping parsing and document parsing. This commit centralizes the validation of field names when parsing mappings to a single place, and attempts to address some of the inconsistencies.
- field names that contain only whitespaces are no longer accepted in mappings. It was previously possible to map a field containing only whitespaces but a document containing such a field would be rejected. We start rejecting mappings with fields that contain only whitespaces for indices that are created from 8.6 on, just in case existing indices contain such fields. This is true also for dotted fields like top. .foo when subobjects are enabled.
- A clear error message is thrown when mappings hold fields with names made of dots only. An ArrayIndexOutOfBoundsException was thrown before
- The error thrown when a field name is empty is now unified with that thrown when an empty field name is provided as part of a document (field name cannot be an empty string)
- When parsing documents (with subobjects set to false), distinguish between the error thrown when a field name is empty and that thrown when a field name is made of whitespaces only
- When parsing documents (with subobjects set to false), accept field names that are made of dots only (these are already accepted in mappings), effectively reverts #90950
This adds support for `ignore_malformed` to numeric fields other than
`scaled_float` in synthetic `_source`. Their values are saved to a
stored field and loaded to render the `_source`.
This makes sure that all field types that support `ignore_malfored` do
so in the same way.
Production changes:
* All mapper has an `ignoreMalformed` method that must return `true` if
the field accepts the `ignore_malformed` mapping parameter was
configured. It defaults to `false` because many fields either don't
have a concept of "malformed" value or don't have the ability to
ignore malformed values.
* Fix the `scaled_float` field to store it's field name in `_ignored` if
it ignores any malfored values. This is how all other field mappers
work.
Test changes:
* `MapperTestCase` forces subclasses to declare if their
`supportIgnoreMalformed` or not.
* If `MapperTestCase` subclasses `supportIgnoreMalfored` they must
define some `exampleMalformedValues`.
* `MapperTestCase` always grows three new tests:
* One that creates a field without setting `ignore_malformed` and
verifies that all `exampleMalformedValues` throw expected errors
* On that explicitly configured `ignore_malformed` to false and, if
`supportIgnoreMalformed` it verifies the errors again. If not
`supportIgnoreMalformed` it verifies that the parameter is unknown.
* On that explicitly configured `ignore_malformed` to true and, if
`supportIgnoreMalformed` it verifies that parsing doesn't produce
errors and correctly produces `_ignored`. If not
`supportIgnoreMalformed` it verifies that the parameter is unknown.
* Moved some subclasesses of `MapperTestCase` from
`internalClusterTests` to `tests`. This isn't strictly required but
that's the right place for them.
Removing the custom dependency checksum functionality in favor of Gradle build-in dependency verification support.
- Use sha256 in favor of sha1 as sha1 is not considered safe these days.
Closes https://github.com/elastic/elasticsearch/issues/69736
This PR uses Lucene-9.3 snapshot in Elasticsearch 8.4. Noticeable changes in this Lucene snapshot:
- Merge-on-refresh (disabled)
- No more pathological merging
- SortedSetDocValues#count for value_count aggs