Commit graph

359 commits

Author SHA1 Message Date
Parker Timmins
9aaba25d58
Simple version of patterned_text with a single doc value for arguments (#129292)
Initial version of patterned_text mapper. Behaves similarly to match_only_text. This version uses a single SortedSetDocValues for a template and another for arguments. It splits the message by delimiters, then classifies a token as an argument if it contains a digit. All arguments are concatenated and inserted as a single doc value. A single inverted index is used, without positions. Phrase queries are still possible, using the SourceConfirmedTextQuery, but are not fast.
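The splitting rule described above can be sketched roughly as follows. This is a minimal Python illustration, not the mapper's code: the delimiter set, the `<arg>` placeholder, the space used to join arguments, and the name `split_template_and_args` are all assumptions, since the commit message specifies none of them. Only the rule itself (split on delimiters, treat any token containing a digit as an argument, concatenate the arguments) comes from the description.

```python
import re

# Hypothetical delimiter set; the commit message does not list the real one.
DELIMITERS = re.compile(r"([\s,;=\[\]\(\)]+)")
# Hypothetical placeholder that marks an argument position in the template.
PLACEHOLDER = "<arg>"


def split_template_and_args(message):
    """Split a message into a template string and a list of argument tokens."""
    template_parts = []
    args = []
    # The capturing group makes re.split keep the delimiters in the output.
    for piece in DELIMITERS.split(message):
        if piece == "":
            continue
        is_delimiter = DELIMITERS.fullmatch(piece) is not None
        if not is_delimiter and any(ch.isdigit() for ch in piece):
            # A token containing a digit is classified as an argument.
            args.append(piece)
            template_parts.append(PLACEHOLDER)
        else:
            template_parts.append(piece)
    return "".join(template_parts), args


template, args = split_template_and_args("user=alice42 took 35ms on node-7")
# All arguments are concatenated into one value (the separator is an assumption).
joined_args = " ".join(args)
```

In this sketch the template and the joined arguments stand in for the two doc values the commit mentions.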
2025-06-25 21:31:32 -05:00
Jordan Powers
5d1999781a
Use optimized text in match_only_text fields (#129371)
Follow-up to #126492 to use the json parsing optimizations for
match_only_text fields.

Relates to #129072.
2025-06-17 08:15:40 -07:00
Martijn van Groningen
a0cc698fa2
Update multi field stored by default index version check (#129386)
Relates to #129126
2025-06-17 12:20:38 +02:00
Simon Cooper
3988ee1935
Check positions on MultiPhraseQueries as well as phrase queries (#129326) 2025-06-12 16:05:07 +01:00
Ignacio Vera
f02a3c423f
Revert "Use IndexOrDocValuesQuery in NumberFieldType#termQuery implementations (#128293)" (#129206)
This reverts commit de7c91c1d9.
2025-06-12 10:10:29 +02:00
Martijn van Groningen
33af83a0ca
Synthetic source: avoid storing multi fields of type text and match_only_text by default. (#129126)
Don't store text and match_only_text fields by default when source mode is synthetic and the field is a multi-field, or when there is a suitable multi-field.

Without this change, ES would store the field twice in a multi-field configuration.

For example:

```
...
"os": {
  "properties": {
    "name": {
      "ignore_above": 1024,
      "type": "keyword",
      "fields": {
        "text": {
          "type": "match_only_text"
        }
      }
    }
...
```

In this case, two stored fields were added: one for the `name` field and one for the `name.text` multi-field.
This change prevents that: a text or match_only_text field that is a multi-field never gets its own stored field.
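The default described in this commit can be read as a small decision rule. The Python sketch below assumes synthetic source is enabled and models only that case. The function name and both flags are hypothetical, and what counts as a "suitable" multi-field is not defined in the message, so it is passed in as a plain boolean. The third case (a standalone field still being stored) is inferred from the wording rather than stated outright.

```python
def needs_own_stored_field(is_multi_field, has_suitable_multi_field):
    """Decide whether a text/match_only_text field gets its own stored field
    by default. Assumes source mode is synthetic (hypothetical helper)."""
    return not (is_multi_field or has_suitable_multi_field)


# `os.name.text` from the mapping above is itself a multi-field.
name_text_stored = needs_own_stored_field(
    is_multi_field=True, has_suitable_multi_field=False
)
# A hypothetical top-level text field that has a suitable sub-field.
parent_text_stored = needs_own_stored_field(
    is_multi_field=False, has_suitable_multi_field=True
)
# A hypothetical standalone text field with no multi-fields at all.
standalone_stored = needs_own_stored_field(
    is_multi_field=False, has_suitable_multi_field=False
)
```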
2025-06-10 16:32:47 +02:00
Benjamin Trent
2a44166a2c
Applying Apache Lucene fix: https://github.com/apache/lucene/pull/14732 (#128671)
* Applying Apache Lucene fix: https://github.com/apache/lucene/pull/14732

* fixing test

* fixing annot
2025-06-02 09:50:25 -04:00
Ignacio Vera
de7c91c1d9
Use IndexOrDocValuesQuery in NumberFieldType#termQuery implementations (#128293) 2025-05-23 16:58:50 +02:00
Oleksandr Kolomiiets
0c1b3acee2
Properly handle multi fields in block loaders with synthetic source enabled (#127483) 2025-04-30 09:33:35 -07:00
Benjamin Trent
3d67e0e7ca
Fix npe when using source confirmed text query against missing field (#127414)
We should check for the field and statistics actually existing when
checking matches and explanation with `match_only_text` fields

closes: https://github.com/elastic/elasticsearch/issues/125635
2025-04-30 03:05:01 +10:00
Oleksandr Kolomiiets
26e2261132
Remove legacy block loader test infrastructure (#127273) 2025-04-25 10:26:27 -07:00
Oleksandr Kolomiiets
5e2b199b94
[TEST] Move test data generation out of logsdb namespace (#119994) 2025-04-23 08:29:32 -07:00
Jordan Powers
71e74bdd66
Store arrays offsets for scaled float fields natively with synthetic source (#125793)
This patch builds on the work in #113757, #122999, #124594, #125529, and 
#125709 to natively store array offsets for scaled float fields instead of
falling back to ignored source when synthetic_source_keep: arrays.
2025-03-28 20:26:29 +01:00
Oleksandr Kolomiiets
033d28e792
Use FallbackSyntheticSourceBlockLoader for shape and geo_shape (#124927) 2025-03-18 08:49:08 -07:00
Nik Everett
50aaa1c2a6
ESQL: Pragma to load from stored fields (#122891)
This creates a `pragma` you can use to request that fields load from a
stored field rather than doc values. It implements that pragma for
`keyword` and number fields.

We expect that, for some disk configuration and some number of fields,
it's faster to load those fields from _source or stored fields than
it is to use doc values. Our default is doc values and on my laptop it's
*always* faster to use doc values. But we don't ship my laptop to every
cluster.

This will let us experiment and debug slow queries by trying to load
fields a different way.

You access this pragma with:
```
curl -HContent-Type:application/json -XPOST localhost:9200/_query?pretty -d '{
    "query": "FROM foo",
    "pragma": {
        "field_extract_preference": "STORED"
    }
}'
```

On a release build you'll need to add `"accept_pragma_risks": true`.
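For scripting the same request, the body from the curl example can be built programmatically. This is a sketch only: `build_esql_request` is a hypothetical helper, and placing `accept_pragma_risks` at the top level of the body is an assumption, since the message does not say where the key goes. The code only builds and serializes the body, it does not send it.

```python
import json


def build_esql_request(query, field_extract_preference="STORED", release_build=False):
    """Build the JSON body for the _query endpoint shown above (hypothetical helper)."""
    body = {
        "query": query,
        "pragma": {"field_extract_preference": field_extract_preference},
    }
    if release_build:
        # Assumption: the flag sits next to "query" and "pragma".
        body["accept_pragma_risks"] = True
    return body


dev_body = build_esql_request("FROM foo")
release_body = build_esql_request("FROM foo", release_build=True)
# Serialized form, suitable for passing to curl -d or an HTTP client.
payload = json.dumps(release_body)
```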
2025-03-12 09:40:42 -04:00
Oleksandr Kolomiiets
99262c6256
Use FallbackSyntheticSourceBlockLoader for boolean and date fields (#124050) 2025-03-05 11:43:47 -08:00
Gal Lalouche
a6e47ae85b
Refactor FieldCapabilities creation by adding a proper builder object (#121310)
Reduce boilerplate associated with creating `FieldCapabilities` instances.
Since it's a class with a huge number of fields, it makes sense to define a builder object, as that can also help with all the Boolean and null blindness going on.
Note that while there is a static Builder class in `FieldCapabilities`, it is not a proper builder object (no setters, still need to pass a lot of otherwise default parameters) and is also package-private. To avoid changing that, I defined a new `FieldCapabilitiesBuilder` class. I also went over the code and refactored places which used the old constructor.
2025-03-05 13:09:36 +01:00
Martijn van Groningen
086329c5cb
Tidy up some noise during indexing with synthetic source. (#123724) 2025-02-28 16:52:17 +00:00
kanoshiou
7326928502
Fix failed ScaledFloatFieldMapperTests (#123144) 2025-02-21 11:34:46 -08:00
kanoshiou
de41d5704b
ESQL: Fix precision of scaled_float field values retrieved from stored source (#122586) 2025-02-20 14:01:34 -08:00
Oleksandr Kolomiiets
ba8c5764f8
Use FallbackSyntheticSourceBlockLoader for unsigned_long and scaled_float fields (#122637) 2025-02-18 09:28:26 -08:00
Oleksandr Kolomiiets
b8d7e99cb9
Use FallbackSyntheticSourceBlockLoader for number fields (#122280) 2025-02-12 16:12:19 -08:00
Chris Hegarty
4baffe4de1
Upgrade to Lucene 10.1.0 (#119308)
This commit upgrades to Lucene 10.1.0.
2025-01-30 13:41:02 +00:00
Kostas Krikellas
8de9539e29
Lazy initialization for SyntheticSourceSupport.loader() (#120896)
* Lazy initialization for `SyntheticSourceSupport.loader()`

* [CI] Auto commit changes from spotless

* add missing

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2025-01-27 17:12:42 +02:00
Rene Groeschke
ba61f8c7f7
Update Gradle wrapper to 8.12 (#118683)
This updates the gradle wrapper to 8.12

We addressed deprecation warnings due to the update that includes:

- Fix change in TestOutputEvent api
- Fix deprecation in groovy syntax
- Use latest ospackage plugin containing our fix
- Remove project usages at execution time
- Fix deprecated project references in repository-old-versions
2024-12-30 15:34:24 +01:00
Armin Braun
e94f145350
Fix a bunch of non-final static fields (#119185)
Fixing almost all missing `final` spots, who knows maybe we get a small speedup from
some constant folding here and there.
2024-12-26 19:14:36 +01:00
Dimitris Rempapis
a514aad3c2
Fix/meta fields bad request (#117229)
A 400 rather than a 5xx error is returned when _source / _seq_no / _feature / _nested_path / _field_names is requested via fields.
2024-12-03 10:58:20 +02:00
Oleksandr Kolomiiets
54db947020
Fix scaled_float test (#117662) 2024-11-28 07:33:35 -08:00
Oleksandr Kolomiiets
2b8e4e727c
Migrate mapper-related modules to internal-*-rest-test (#117298) 2024-11-23 00:35:24 +00:00
Rene Groeschke
f6ac6e1c3b
[Build] Remove deprecated BuildParams (#116984) 2024-11-22 16:30:57 +01:00
Rene Groeschke
13c8aaeffa
[Gradle] Remove static use of BuildParams (#115122)
Static fields don't do well in Gradle with configuration cache enabled.

- Use buildParams extension in build scripts
- Keep BuildParams.ci for now for easy serverless migration
- Tweak testing doc
2024-11-15 17:58:57 +01:00
Kostas Krikellas
4573ab8ec1
[TEST] Replace _source.mode with index.mapping.source.mode in integration tests - take 2 (#116072)
* Reapply "[TEST] Replace _source.mode with index.mapping.source.mode in integra…" (#116069)

This reverts commit e8bf344a28.

* [TEST] Replace _source.mode with index.mapping.source.mode in integration tests

* add reason

* add reason

* spotless

* revert unneeded
2024-11-04 09:39:34 +02:00
Kostas Krikellas
e8bf344a28
Revert "[TEST] Replace _source.mode with index.mapping.source.mode in integra…" (#116069)
This reverts commit a360757968.
2024-11-01 10:53:08 +02:00
Kostas Krikellas
a360757968
[TEST] Replace _source.mode with index.mapping.source.mode in integration tests (#115926)
* Replace _source.mode with index.mapping.source.mode in integration tests

* fix tests

* revert 40_source_mode_setting.yml
2024-11-01 09:46:06 +02:00
Nhat Nguyen
f3b34f3e34
Remove old synthetic source mapping config (#115889)
This change replaces the old synthetic source config in mappings with 
the newly introduced index setting.

Closes #115859
2024-10-30 09:15:16 -07:00
Martijn van Groningen
387062eb80
Sometimes delegate to SourceLoader in ValueSourceReaderOperator for required stored fields (#115114)
If source is required by a block loader then the StoredFieldsSpec that gets populated should be enhanced by SourceLoader#requiredStoredFields(...) in ValuesSourceReaderOperator. Otherwise, in case of synthetic source, many stored fields aren't loaded, which causes only a subset of _source to be synthesized. For example, unmapped fields or field values that exceed the configured ignore_above will not appear in _source.

This happens when field types fall back to a block loader implementation that uses _source. The required field values are then extracted from the source once loaded.

This change also reverts the production code changes introduced via #114903. That change only ensured that the _ignored_source field was added to the required list of stored fields. In reality more fields could be required. This change is a better fix, since it also handles other cases and the SourceLoader implementation indicates which stored fields are needed.
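The idea of letting the source loader contribute its own required stored fields can be sketched as a set union. This Python sketch is illustrative only: `stored_fields_to_load` and the example field names other than `_ignored_source` are hypothetical, and it does not mirror the actual StoredFieldsSpec API.

```python
def stored_fields_to_load(block_loader_fields, requires_source, source_loader_required_fields):
    """Combine the stored fields a block loader asks for with the ones the
    source loader needs to synthesize _source (hypothetical helper)."""
    fields = set(block_loader_fields)
    if requires_source:
        # Defer to the source loader instead of hard-coding _ignored_source.
        fields |= set(source_loader_required_fields)
    return fields


# A block loader that falls back to _source and names no stored fields itself.
with_source = stored_fields_to_load(
    block_loader_fields=[],
    requires_source=True,
    source_loader_required_fields=["_ignored_source", "message._original"],
)
# A block loader that reads one stored field and never touches _source.
without_source = stored_fields_to_load(
    block_loader_fields=["host.name"],
    requires_source=False,
    source_loader_required_fields=["_ignored_source", "message._original"],
)
```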

Closes #115076
2024-10-23 10:20:42 +02:00
Luca Cavanna
8efd08b019
Upgrade to Lucene 10 (#114741)
The most relevant ES changes that upgrading to Lucene 10 requires are:

- use the appropriate IOContext
- Scorer / ScorerSupplier breaking changes
- Regex automaton are no longer determinized by default
- minimize moved to test classes
- introduce Elasticsearch900Codec
- adjust slicing code according to the added support for intra-segment concurrency
- disable intra-segment concurrency in tests
- adjust accessor methods for many Lucene classes that became a record
- adapt to breaking changes in the analysis area

Co-authored-by: Christoph Büscher <christophbuescher@posteo.de>
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
Co-authored-by: ChrisHegarty <chegar999@gmail.com>
Co-authored-by: Brian Seeders <brian.seeders@elastic.co>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Panagiotis Bailis <pmpailis@gmail.com>
Co-authored-by: Benjamin Trent <4357155+benwtrent@users.noreply.github.com>
2024-10-21 13:38:23 +02:00
Martijn van Groningen
c62a96c8ab
Include ignored source as part of loading field values in ValueSourceReaderOperator via BlockSourceReader. (#114903)
Currently, in compute engine when loading source if source mode is synthetic, the synthetic source loader is already used. But the ignored_source field isn't always marked as a required source field, causing the source to potentially miss a lot of fields.

This change includes the _ignored_source field as a required stored field and allows keyword fields without doc values or stored fields to be used in case of synthetic source.

Relying on synthetic source to get the values (because a field doesn't have stored fields / doc values) is slow. In case of synthetic source we already keep ignored fields/values in a special place, named ignored source. Long term, in case of synthetic source we should only load ignored source when a field has no doc values or stored field, as is being explored in #114886, thereby avoiding synthesizing the complete _source in order to get only one field.
2024-10-18 07:49:00 +02:00
Oleksandr Kolomiiets
2c10a18774
Fix block loader tests for token_count (#113718) 2024-10-01 10:25:26 -07:00
Chris Hegarty
32dde26e49
Upgrade to Lucene 9.12.0 (#113333)
This commit upgrades to Lucene 9.12.0.

Co-authored-by: Adrien Grand <jpountz@gmail.com>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Co-authored-by: Chris Hegarty <chegar999@gmail.com>
Co-authored-by: John Wagster <john.wagster@elastic.co>
Co-authored-by: Luca Cavanna <javanna@apache.org>
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2024-10-01 08:39:27 +01:00
Mark Vieira
a59c182f9f
Add AGPLv3 as a supported license 2024-09-13 15:29:46 -07:00
Kostas Krikellas
86a88d735f
Fix synthetic source field names for multi-fields (#112850)
* Fix synthetic source field names for multi-fields

* enable logsdb in randomized tests

* Revert "enable logsdb in randomized tests"

This reverts commit 2e2c22e2bb.

* Update docs/changelog/112850.yaml

* fix
2024-09-13 15:00:55 +03:00
Oleksandr Kolomiiets
082e7211b3
Use fallback synthetic source for copy_to and doc_values: false cases (#112294) 2024-09-10 12:12:51 -07:00
Kostas Krikellas
f3bc281978
Refactor build params for FieldMapper, adding SourceKeepMode (#112455)
* Refactor build params for FieldMapper

* more mappers and tests

* more mappers

* more mappers

* spotless

* spotless

* stored by default

* Revert "stored by default"

This reverts commit bbd247d64b.

* restore storeIgnored

* sync

* list valid values for SourceKeepMode

* small refactoring

* spotless
2024-09-06 14:16:17 +03:00
Oleksandr Kolomiiets
38adbb0724
Prevent synthetic field loaders accessing stored fields from using stale data (#112173) 2024-08-27 14:55:00 -07:00
Luca Cavanna
915e4a50c5
Rename Mapper#name to Mapper#fullPath (#110040)
This addresses a long standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while the MappedFieldType name does.

We have renamed Mapper.Builder#name to leafName (#109971) and Mapper#simpleName to leafName (#110030). This commit renames Mapper#name to fullPath for clarity.
This required some adjustments in FieldAliasMapper to avoid confusion between the existing path method and fullPath. I renamed path to targetPath for clarity.
ObjectMapper already had a fullPath method that returned name, and was effectively a copy of name, so it could be removed.
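The distinction the renames make explicit can be illustrated with a small Python sketch. `MapperName` is a hypothetical stand-in, not the Elasticsearch `Mapper` class. It only shows how a leaf name and a full path differ for a nested field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapperName:
    """Hypothetical model of a mapper's naming: parent path plus leaf name."""

    parent_path: str
    leaf_name: str

    @property
    def full_path(self):
        # The full path includes every parent object; the leaf name does not.
        if self.parent_path:
            return f"{self.parent_path}.{self.leaf_name}"
        return self.leaf_name


nested = MapperName(parent_path="os", leaf_name="name")
top_level = MapperName(parent_path="", leaf_name="message")
```

Here `nested.leaf_name` is `name` while `nested.full_path` is `os.name`. Using one where the other was expected is the kind of confusion the commit message refers to.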
2024-06-21 22:47:27 +02:00
Luca Cavanna
54e7b4d93b
Rename Mapper#simpleName to Mapper#leafName (#110030)
This addresses a long standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while
the MappedFieldType name does. We have a method called simpleName to signal that, but leafName signals that more clearly and aligns with
the name we have recently introduced in Mapper.Builder (renamed from name to leafName).

Relates to #109971
2024-06-21 14:28:36 +02:00
Luca Cavanna
15c7abe111
Rename Mapper#name to Mapper#leafName (#109971)
This addresses a long standing TODO that caused quite a few bugs over time, in that the mapper name does not include its full path, while
the MappedFieldType name does.
2024-06-21 11:48:17 +02:00
Oleksandr Kolomiiets
60a34f2a90
Do not use nested arrays as malformed values in scaled_float and unsigned_long synthetic source tests (#109650)
They don't provide any additional value because arrays are parsed at the
level above and tests already cover arrays.

Fixes #109649.
2024-06-13 07:15:32 +10:00
Oleksandr Kolomiiets
c847235ed0
Support synthetic source for scaled_float and unsigned_long when ignore_malformed is used (#109506) 2024-06-12 11:05:23 -07:00