71 commits

Author SHA1 Message Date
Rene Groeschke
ae569def9c
[Build] Require reason for usesDefaultDistribution (#124707)
This makes using usesDefaultDistribution in our test setup more explicit by requiring a reason why it's needed.
This is helpful as part of revisiting the need for all those usages in our code base.
2025-03-17 08:25:39 +01:00
Moritz Mack
a608f0626e
Added query param ?include_source_on_error for ingest requests (#120725)
A new query parameter `?include_source_on_error` was added for create / index, update and bulk REST APIs to control
whether to include the document source in the error response in case of parsing errors. The default value is `true`.
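
A minimal sketch of how a client might set the new parameter, assuming a plain URL-building helper. The helper name `ingest_url`, the host and the index name are hypothetical; only the parameter name and its default come from this change.

```python
from urllib.parse import urlencode


def ingest_url(base, path, include_source_on_error=True):
    """Build a request URL that sets the include_source_on_error parameter."""
    # Booleans travel as lowercase strings in the query string.
    query = urlencode({"include_source_on_error": str(include_source_on_error).lower()})
    return f"{base}{path}?{query}"


# Hypothetical host and index name, for illustration only.
url = ingest_url("http://localhost:9200", "/my-index/_doc", include_source_on_error=False)
print(url)
```

Passing `false` asks the server to leave the document source out of parsing-error responses; omitting the parameter keeps the default of `true`.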
2025-01-28 09:33:22 +01:00
Martijn van Groningen
e833e7b6c4
Add feature flag for subobjects auto (#114616) 2024-10-12 18:55:27 +02:00
Chris Hegarty
32dde26e49
Upgrade to Lucene 9.12.0 (#113333)
This commit upgrades to Lucene 9.12.0.

Co-authored-by: Adrien Grand <jpountz@gmail.com>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Co-authored-by: Chris Hegarty <chegar999@gmail.com>
Co-authored-by: John Wagster <john.wagster@elastic.co>
Co-authored-by: Luca Cavanna <javanna@apache.org>
Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2024-10-01 08:39:27 +01:00
Salvatore Campagna
208a1fe571
Introduce an ignore_above index-level setting (#113121)
Here we introduce a new index-level setting, `ignore_above`, similar to what we have
for `ignore_malformed`. The setting will apply to all `keyword`, `wildcard` and `flattened`
fields. Each field mapping will still be allowed to override the index-level setting using a
mapping-level `ignore_above` value.
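
The override rule can be sketched as follows. This is only an illustration of the precedence described above, not the server code; the function name and the dict shapes are assumptions.

```python
# Field types the index-level setting applies to, per the description above.
APPLICABLE_TYPES = {"keyword", "wildcard", "flattened"}


def effective_ignore_above(index_level, field_mapping):
    """Return the ignore_above limit for one field, or None if none applies."""
    if field_mapping.get("type") not in APPLICABLE_TYPES:
        return None
    # A mapping-level value wins over the index-level setting.
    return field_mapping.get("ignore_above", index_level)


print(effective_ignore_above(256, {"type": "keyword"}))
print(effective_ignore_above(256, {"type": "keyword", "ignore_above": 1024}))
print(effective_ignore_above(256, {"type": "text"}))
```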
2024-09-23 18:05:02 +02:00
Mark Vieira
a59c182f9f
Add AGPLv3 as a supported license 2024-09-13 15:29:46 -07:00
Simon Cooper
a36d90cf34
Use CLDR locale provider on JDK 23+ (#110222)
JDK 23 removes the COMPAT locale provider, leaving CLDR as the only option. This commit configures Elasticsearch
to use the CLDR provider when on JDK 23, but still use the existing COMPAT provider when on JDK 22 and below.

This causes some differences in locale behaviour; this also adapts various tests to still work whether run on COMPAT or CLDR.
2024-09-04 13:42:40 +01:00
Mikhail Berezovskiy
1163d2e4f9
Rename streamContent/Separator to bulkContent/Separator (#111716)
Rename `xContent.streamSeparator()` and
`RestHandler.supportsStreamContent()` to `xContent.bulkSeparator()` and
`RestHandler.supportsBulkContent()`.

I want to reserve use of "supportsStreamContent" for current work in
HTTP layer to [support incremental content
handling](https://github.com/elastic/elasticsearch/pull/111438) besides
fully aggregated byte buffers. `supportsStreamContent` would indicate
that handler can parse chunks of http content as they arrive.
2024-08-09 06:32:20 +10:00
Alexander Spies
e540732e39
Aggs: Scripted metric allow list (#109444)
Introduces new cluster settings that allow only a certain set of scripts in scripted metrics aggregations:
- search.aggs.only_allowed_metric_scripts, defaults to false
- search.aggs.allowed_inline_metric_scripts, defaults to empty list
- search.aggs.allowed_stored_metric_scripts, defaults to empty list
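
A rough sketch of how the three settings could interact. The matching rules here (exact match on the inline source or on the stored script id) and the function name are assumptions made for illustration; the setting names and defaults are the ones listed above.

```python
def metric_script_allowed(settings, script):
    """Decide whether a scripted metric script passes the allow list."""
    # With the switch off (the default), every script is accepted.
    if not settings.get("search.aggs.only_allowed_metric_scripts", False):
        return True
    if "id" in script:
        allowed = settings.get("search.aggs.allowed_stored_metric_scripts", [])
        return script["id"] in allowed
    allowed = settings.get("search.aggs.allowed_inline_metric_scripts", [])
    return script.get("source") in allowed


settings = {
    "search.aggs.only_allowed_metric_scripts": True,
    "search.aggs.allowed_inline_metric_scripts": ["state.sum = 0"],
    "search.aggs.allowed_stored_metric_scripts": ["my_init_script"],
}
print(metric_script_allowed(settings, {"source": "state.sum = 0"}))
print(metric_script_allowed(settings, {"id": "unknown_script"}))
```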
2024-06-12 14:23:03 +02:00
Jim Ferenczi
4380cd1bd5
Allow rescorer with field collapsing (#107779)
This change adds the support for rescoring collapsed documents.
The rescoring is applied on the top document per group on each shard.
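
To illustrate the order of operations on a single shard, here is a toy sketch: collapse to the top document per group first, then rescore only those documents. The hit shape, field names and rescore function are hypothetical and much simpler than the real search phases.

```python
def collapse_then_rescore(shard_hits, group_field, rescore):
    """Keep the best hit per group, then rescore only those hits."""
    top_per_group = {}
    for hit in shard_hits:
        key = hit[group_field]
        best = top_per_group.get(key)
        if best is None or hit["score"] > best["score"]:
            top_per_group[key] = hit
    # Only the group leaders are rescored; other hits are never touched.
    rescored = [dict(hit, score=rescore(hit)) for hit in top_per_group.values()]
    return sorted(rescored, key=lambda h: h["score"], reverse=True)


hits = [
    {"id": 1, "user": "a", "score": 2.0},
    {"id": 2, "user": "a", "score": 1.0},
    {"id": 3, "user": "b", "score": 1.5},
]
result = collapse_then_rescore(
    hits, "user", lambda h: h["score"] * 3 if h["user"] == "b" else h["score"]
)
print([h["id"] for h in result])
```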

Closes #27243
2024-04-29 08:48:12 +01:00
eyalkoren
6f4e293d29
Add require_data_stream feature (#101872)
Closes #97032

Adding the ability to set `require_data_stream` parameter (boolean) on bulk and indexing APIs.
For document indexing, this flag requires the indexing operation to either be pointed at a data stream, or match a template that will create a data stream.
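
The check described above can be sketched like this, assuming templates are represented only by their index patterns. The function name and inputs are hypothetical; the real resolution logic lives in the server.

```python
from fnmatch import fnmatchcase


def satisfies_require_data_stream(target, data_streams, data_stream_patterns):
    """True if the target is a data stream or would be created as one."""
    if target in data_streams:
        return True
    # Otherwise a template that creates data streams must match the name.
    return any(fnmatchcase(target, pattern) for pattern in data_stream_patterns)


print(satisfies_require_data_stream("logs-app", {"logs-app"}, []))
print(satisfies_require_data_stream("logs-new", set(), ["logs-*"]))
print(satisfies_require_data_stream("plain-index", set(), ["logs-*"]))
```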
2024-01-18 09:15:48 -07:00
Lorenzo Dematté
2b175653d9
YAML test framework: separate skip and requires sections (#104140)
* Introduce Prerequisites criteria (Predicate + factory) for modular skip decisions
- Removed accessors to specific criteria from SkipSection (used only on tests), adjusted test assertions
- Moved Features check (YAML test runner features) to SkipSection build time

* Separated check for xpack/no_xpack
Check for xpack is cluster-configuration (modules installed) dependent, while Features are meant to be "static" test-runner capabilities. We separate them so checks on one (test-runner features) can be run before and separately from the other.

* Consolidate skip() methods
- Divide require and skip predicates
- Divide requires and skip parsing (distinct sections)
- Renaming SkipSection to PrerequisiteSection and related methods/fields (e.g. skip -> evaluate)

* Refactoring tests
- moving and adding VersionRange tests
- adding specific version and os skip tests
- modified parse/validate/build to make SkipSection more unit-testable

* Adding cluster feature-based skip criteria
* Updated javadoc + renaming + better skip reason message
2024-01-15 14:48:36 +01:00
Luca Cavanna
9cd96df179
Add support for index_filter to open pit (#102388)
The open point in time API accepts a list of indices and opens a point in time view against those indices.
Like we do already for field caps, this commit allows users to provide an index_filter parameter as part of
the request body, that will be used to execute the can match phase and exclude the indices that can't possibly
match such filter.
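
A small sketch of the request shape plus a toy version of the can match idea. The index names, the timestamp ranges and both helper names are made up for illustration; the real can match phase runs on the shards and is not this simple.

```python
import json


def open_pit_request(indices, index_filter, keep_alive="1m"):
    """Return the path and JSON body for an open point in time request."""
    path = f"/{','.join(indices)}/_pit?keep_alive={keep_alive}"
    body = json.dumps({"index_filter": index_filter})
    return path, body


def can_match(index_ranges, gte):
    """Keep indices whose newest timestamp could satisfy `@timestamp >= gte`."""
    return [name for name, (_, newest) in index_ranges.items() if newest >= gte]


# Hypothetical indices with (oldest, newest) ISO dates.
index_ranges = {
    "logs-2023-10": ("2023-10-01", "2023-10-31"),
    "logs-2023-11": ("2023-11-01", "2023-11-30"),
}
path, body = open_pit_request(
    list(index_ranges), {"range": {"@timestamp": {"gte": "2023-11-01"}}}
)
print(path)
print(can_match(index_ranges, "2023-11-01"))
```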

Closes #99740
2023-11-21 15:35:49 +01:00
Keith Massey
92ec9d6605
Add executed pipelines to bulk api response (#100031)
This change allows users to pass a new list_executed_pipelines parameter
to the bulk API, which results in an executed_pipelines list being returned.
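
A hedged sketch of building such a request and reading the result. The placement of `executed_pipelines` on each per-item result is an assumption for this example, as are the helper names and the sample response.

```python
import json


def bulk_request(lines, list_executed_pipelines=True):
    """Return the path and NDJSON body for a bulk request."""
    path = "/_bulk?list_executed_pipelines=" + str(list_executed_pipelines).lower()
    # Bulk bodies are newline-delimited JSON and end with a newline.
    body = "".join(json.dumps(line) + "\n" for line in lines)
    return path, body


def executed_pipelines(response):
    """Collect the executed_pipelines list of every item (assumed shape)."""
    return [
        result.get("executed_pipelines", [])
        for item in response["items"]
        for result in item.values()
    ]


path, body = bulk_request([{"index": {"_index": "my-index"}}, {"message": "hello"}])
# Hypothetical response, for illustration only.
sample = {"items": [{"index": {"_id": "1", "executed_pipelines": ["p1", "p2"]}}]}
print(path)
print(executed_pipelines(sample))
```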
2023-10-17 09:39:09 -05:00
Albert Zaharovits
d6df838307
Refactor REST tests to the new internal cluster rule orchestration (#100399)
This PR is migrating some of the ITs that use either the
`elasticsearch.legacy-java-rest-test` or the
`elasticsearch.legacy-yaml-rest-test` gradle test plugins to the new 
`elasticsearch.internal-java-rest-test` and
`elasticsearch.internal-yaml-rest-test` equivalents. This is the list of
the affected ITs:

* SamlAuthenticationIT
* OperatorPrivilegesIT
* ProfileIT
* SetSecurityUserProcessorWithWithSecurityDisabledIT
* AsyncSearchSecurityIT
* SecurityRealmSmokeTestCase
* KibanaSystemIndexIT
* KerberosAuthenticationIT
* ReindexWithSecurityIT and ReindexWithSecurityClientYamlTestSuiteIT
* ReloadSecureSettingsWithPasswordProtectedKeystoreRestIT
* PermissionsIT from slm:qa:with-security
* Permissions IT from runtime-fields:with-security
* Permissions IT from ilm:qa:with-securiy
* GraphWithSecurityIT and GraphWithSecurityInsufficientRoleIT

Related: ES-6751
2023-10-17 07:42:43 -04:00
Armin Braun
b7eafce32c
Make some practically static methods static (#97565)
Another round of automated fixes to this, marking things that can be
made static as static. Saves some JIT cycles but also turns some lambdas
from capturing to non-capturing and makes the "utilityness" of some
classes visible.
2023-10-06 23:37:07 +02:00
Mayya Sharipova
f8c626f792
Track max_score in collapse when requested (#97703)
Previously, max_score was tracked in collapse when requested (track_scores=true)
or when there was no sort in collapse (see PR#27122). This feature
was lost through refactoring and other changes.

This PR restores this feature.

Closes #97653
2023-07-17 06:48:00 -04:00
eyalkoren
3d36b08d28
Fix fields API with subobjects: false (#97092) 2023-07-12 11:35:18 +03:00
Martijn van Groningen
b11cbd43dd
Move matrix stats to aggregations module (#92435)
Running the `matrix_stats_multi_value_field.yaml` test in multi node
test cluster showed a bug, see: 88758ab577
Also removes `MatrixStats` interface, removed usage of deprecated
ValueType enum and removed unused generic usage.

Relates to #90283
2022-12-22 03:15:05 -05:00
Mark Vieira
c2eda511de
Add JUnit rule based integration test cluster orchestration framework (#92379)
This commit adds a new test framework for configuring and orchestrating
test clusters for both Java and YAML REST testing. This will eventually
replace the existing "test-clusters" Gradle plugin and the build-time
cluster orchestration.
2022-12-21 15:33:46 -08:00
Iraklis Psaroudakis
756fcc212d
Log YAML test file on failure (#91349)
Relates #91081
2022-11-09 18:35:36 +02:00
Slobodan Adamović
6f4cee4737
Remove HLRC from security integration tests (#91088)
Removed remaining usages of HLRC from security integration tests.

Relates to #83423
2022-11-01 09:20:56 +01:00
Nik Everett
092f370c10
Remove numbers from aggs tests (#90983)
The line number style numbers prefixing the names of the aggregation
tests don't buy us anything. Worse, they've obfuscated that I forgot to
delete two files after merging their contents into the aggregations
module. I've deleted those as part of this PR.
2022-10-18 14:39:17 -04:00
Nik Everett
a544b127be Fix runtime field build
The runtime field build now relies on the painless execute API
transitively by virtue of using the aggs yaml tests. This pulls them in.
2022-10-18 09:52:21 -04:00
Nik Everett
71b5cad4eb
Move aggregations tests to module (#90953)
We're going to move all aggregations to the module soon and this saves a
little time in the build by only running the tests one time - in the
aggregations module.
2022-10-18 07:38:10 -04:00
Nik Everett
f2a1ee9995
Synthetic _source: test top_hits (#90137)
This adds a test for the `top_hits` aggregation using synthetic
`_source`. It works but let's be a bit paranoid here because it's a
whole new fetch phase.....
2022-09-24 05:55:23 +09:30
Yang Wang
adf8e01286
Add info of effective roles in denial messages (#89680)
When an action is denied due to authorization error, the list of
assigned roles is shown in the error message. However, it is possible
that the effective roles are fewer or more than the assigned list:
* Fewer roles can happen when the role is not defined or the license does
  not permit it
* More roles can happen when anonymous access is enabled

This PR changes the error message to show the effective roles instead of
the assigned roles (whenever possible) to help troubleshooting. In
addition, it also reports any missing roles, i.e. roles that are
assigned but cannot be found.
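
The difference between assigned, effective and missing roles can be sketched as below. The function and its inputs are hypothetical simplifications; the real resolution involves the role store and license state.

```python
def resolve_roles(assigned, defined, licensed, anonymous=()):
    """Return (effective roles, missing roles) for a user."""
    # Missing: assigned to the user but not found anywhere.
    missing = [role for role in assigned if role not in defined]
    # Fewer: undefined or unlicensed roles drop out of the effective set.
    effective = [role for role in assigned if role in defined and role in licensed]
    # More: anonymous access can add roles the user was never assigned.
    for role in anonymous:
        if role not in effective:
            effective.append(role)
    return effective, missing


effective, missing = resolve_roles(
    assigned=["admin", "ghost", "dls_role"],
    defined={"admin", "dls_role"},
    licensed={"admin"},
    anonymous=["viewer"],
)
print(effective, missing)
```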
2022-09-01 11:30:01 +09:30
Mark Tozzi
9ee6a19187
Add ability to select execution mode for cardinality aggregation (#87704)
Plumbs through a new parameter for the cardinality aggregation, to allow configuring the execution mode. This can have significant impacts on speed and memory usage. This PR exposes three collection modes and two heuristics that we can tune going forward. All of these are treated as hints and can be silently ignored, e.g. if not applicable to the given field type. I've changed the default behavior to optimize for time, which potentially uses more memory. Users can override this to get the old behavior if needed.
2022-07-05 09:11:22 -04:00
Nik Everett
8ebf39b7e1
Fixup highlighting with synthetic source (#87667)
Synthetic source has a habit of reordering text fields. This frustrates
highlighting because it *often* wants to use index structures to find
the offsets to values in the field. This disables the FVH highlighter
for multi-valued text fields when synthetic source is enabled and runs
the unified highlighter in "analyze" mode when synthetic source is
enabled. That's *enough* to stop them from spitting out wrong answers.

We might be leaving some performance on the table when the unified
highlighter works on a single valued text field that is indexed with
offsets or term vectors. We don't really expect that to be common at all
though because *generally* folks will enable synthetic source to save
space and adding offsets or term vectors is quite space inefficient. If
it comes up, we might be able to improve here.
2022-06-15 14:49:06 -04:00
Rene Groeschke
95d56cc262
Fix configuration cache compatibility issues in gradle plugins (#87567)
This fixes references to project that make the plugins incompatible with the Gradle
configuration cache. We also remove the custom xpackProject utility, for these reasons:

* Using xpackProject in certain situations can break configuration cache compatibility, as it uses a mutable project object under the hood, which is discouraged in some use cases (e.g. at execution time).

* It always breaks compatibility with --configure-on-demand.

* xpackProject uses the project object of the :x-pack project. Referencing project objects from other subprojects should be avoided where possible to decouple subproject configurations. There's a good explanation of why we want to decouple our project configurations as much as possible here: https://docs.gradle.org/current/userguide/multi_project_configuration_and_execution.html#sec:decoupled_projects

* It adds little value over the default out-of-the-box Gradle API (just use project(':x-pack:someProject') instead of xpackProject('someProject')). In some cases the default API is even shorter, e.g. when this is used as xpackProject('someProject').path instead of just passing :x-pack:someProject.

I'll try to put a bit more context in the PR description in the future to make the motivation behind these kinds of changes clearer upfront.

Related to #57918
2022-06-13 13:20:18 +02:00
Luca Cavanna
a48965decf
Authorize painless execute as index action when an index is specified (#85512)
Painless execute allows users to validate their scripts. Some of the supported script contexts
support providing a sample document as well as an index to pull the mappings from.

The painless execute API requires cluster admin privileges today and while that's ok for the contexts that
don't support providing an index, it is not ideal when an index is provided. In fact users can run scripts
as part of the search API, which requires only the indices/read privilege on the indices that the users
is reading from.

This commit maps the painless execute action to an indices/read action when an index is specified, so that in
that case the same privileges as a search action will be requested to run painless execute.

Relates to #48856
Closes #86428
2022-05-17 17:22:42 +02:00
Nik Everett
a589456b81
Synthetic source (#85649)
This attempts to shrink the index by implementing a "synthetic _source" field.
You configure it in the mapping:
```
{
  "mappings": {
    "_source": {
      "synthetic": true
    }
  }
}
```

And we just stop storing the `_source` field - kind of. When you go to access
the `_source` we regenerate it on the fly by loading doc values. Doc values
don't preserve the original structure of the source you sent so we have to
make some educated guesses. And we have a rule: the source we generate would
result in the same index if you sent it back to us. That way you can use it
for things like `_reindex`.

Fetching the `_source` from doc values does slow down loading somewhat. See
numbers further down.

## Supported fields
This only works for the following fields:
* `boolean`
* `byte`
* `date`
* `double`
* `float`
* `geo_point` (with precision loss)
* `half_float`
* `integer`
* `ip`
* `keyword`
* `long`
* `scaled_float`
* `short`
* `text` (when there is a `keyword` sub-field that is compatible with this feature)


## Educated guesses

The synthetic source generator makes `_source` fields that are:
* sorted alphabetically
* as "objecty" as possible
* pushes all arrays to the "leaf" fields
* sorts most array values
* removes duplicate text and keyword values

These are mostly artifacts of how doc values are stored.

### sorted alphabetically
```
{
  "b": 1,
  "c": 2,
  "a": 3
}
```
becomes
```
{
  "a": 3,
  "b": 1,
  "c": 2
}
```

### as "objecty" as possible
```
{
  "a.b": "foo"
}
```
becomes
```
{
  "a": {
    "b": "foo"
  }
}
```

### pushes all arrays to the "leaf" fields
```
{
  "a": [
    {
      "b": "foo",
      "c": "bar"
    },
    {
      "c": "bort"
    },
    {
      "b": "snort"
    }
  ]
}
```
becomes
```
{
  "a": {
    "b": ["foo", "snort"],
    "c": ["bar", "bort"]
  }
}
```

### sorts most array values
```
{
  "a": [2, 3, 1]
}
```
becomes
```
{
  "a": [1, 2, 3]
}
```

### removes duplicate text and keyword values
```
{
  "a": ["bar", "baz", "baz", "baz", "foo", "foo"]
}
```
becomes
```
{
  "a": ["bar", "baz", "foo"]
}
```
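
The rewrites listed above can be approximated on a plain dict. This is a sketch under stated assumptions, not the synthetic source generator: it assumes every field holds values of one comparable type, that no path is both a leaf and an object, and that a single remaining value is emitted as a scalar. The function name `synthesize` is made up.

```python
def synthesize(source):
    """Approximate the documented _source rewrites on a plain dict."""
    leaves = {}

    def collect(path, value):
        # Flatten objects and arrays down to dotted leaf paths.
        if isinstance(value, dict):
            for key, child in value.items():
                collect(f"{path}.{key}" if path else key, child)
        elif isinstance(value, list):
            for item in value:
                collect(path, item)
        else:
            leaves.setdefault(path, []).append(value)

    collect("", source)

    tree = {}
    for path, values in leaves.items():
        values = sorted(values)
        if all(isinstance(v, str) for v in values):
            values = sorted(set(values))  # strings also lose duplicates
        *parents, leaf = path.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})  # as "objecty" as possible
        node[leaf] = values[0] if len(values) == 1 else values

    def sort_keys(node):
        # Emit keys alphabetically at every level.
        if isinstance(node, dict):
            return {key: sort_keys(node[key]) for key in sorted(node)}
        return node

    return sort_keys(tree)


print(synthesize({"a": [{"b": "foo", "c": "bar"}, {"c": "bort"}, {"b": "snort"}]}))
```

Running the other examples from this section through `synthesize` gives the same "becomes" documents shown above.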
## `_recovery_source`

Elasticsearch's shard "recovery" process needs `_source` *sometimes*. So does
cross cluster replication. If you disable source or filter it somehow we store
a `_recovery_source` field for as long as the recovery process might need it.
When everything is running smoothly that's generally a few seconds or minutes.
Then the field is removed on merge. This synthetic source feature continues
to produce `_recovery_source` and relies on it for recovery. It's *possible*
to synthesize `_source` during recovery but we don't do it.

That means that synthetic source doesn't speed up writing the index. But in the
future we might be able to turn this on to trade writing less data at index
time for slower recovery and cross cluster replication. That's an area of
future improvement.

## perf numbers

I loaded the entire tsdb data set with this change and the size:

```
           standard -> synthetic
store size  31.0 GB ->  7.0 GB  (77.5% reduction)
_source  24695.7 MB -> 47.6 MB  (99.8% reduction - synthetic is in _recovery_source)
```

A second _forcemerge a few minutes after rally finishes should remove the
remaining 47.6MB of _recovery_source.

With this fetching source for 1,000 documents seems to take about 500ms. I
spot checked a lot of different areas and haven't seen any different hit. I
*expect* this performance impact is based on the number of doc values fields
in the index and how sparse they are.
2022-05-10 07:46:58 -04:00
Salvatore Campagna
08141cf875
fix: nested top metrics sort on keyword field (#85058)
Using a double as a return value works only if the field we are
sorting on is a number. If the field is not a value we can convert
to a double, like a non-numeric keyword, converting it to a number
returns `NaN`. Without this patch, sorting takes place on the bucket
key, if the order field points to a non-numeric value. The additional
bucket key comparator is implicitly added as a tie breaker to avoid
non-deterministic sorting of buckets.

With this change we support sorting using any subclass of SortValue.
This means the bucket key will be used just in case of equal values
on the order field.
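
A toy illustration of the two behaviours, using Python values in place of SortValue. The bucket shape and helper names are hypothetical.

```python
import math


def as_double(value):
    """Mimic a numeric-only conversion: non-numeric input becomes NaN."""
    try:
        return float(value)
    except ValueError:
        return float("nan")


def order_buckets(buckets):
    """Sort on the order value itself; the bucket key only breaks ties."""
    return sorted(buckets, key=lambda b: (b["order_value"], b["key"]))


buckets = [
    {"key": "c", "order_value": "pear"},
    {"key": "b", "order_value": "apple"},
    {"key": "a", "order_value": "pear"},
]
# The old numeric path cannot represent a keyword at all.
print(math.isnan(as_double("apple")))
print([b["key"] for b in order_buckets(buckets)])
```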

Issue: #78506
2022-03-22 00:38:51 +01:00
Mark Vieira
2cbc7cda49
Remove build plugin from additional QA projects (#84960) 2022-03-14 14:11:05 -07:00
Ryan Ernst
0ec229050e
Move yaml rest test case to separate test lib (#84835)
The ESClientYamlSuiteTestCase is used to run yaml tests throughout
Elasticsearch. It utilizes the low level rest client in sniffing for
nodes, but the sniffer is not needed anywhere else in the test
framework.

This commit creates a new project, `:test:rest-runner` which is meant to
house the rest test running infrastructure. This has two purposes. First
is to remove the sniffer from the test framework dependencies, because
it transitively depends on Jackson. Second is to setup the runner for
future refactorings where it could be made to not depend on the entire
test framework, though how that could work is left for the future.
2022-03-11 10:51:11 -05:00
Alan Woodward
5ebcf60fbc
Don't run the runtime field YAML tests over TSDB aggs (#84791)
Runtime fields don't support dimension parameters so we can't shadow
keyword fields in the normal way for TSDB yaml tests.
2022-03-09 13:32:47 +00:00
Benjamin Trent
b592d2bf01
New random_sampler aggregation for sampling documents in aggregations (#84363)
This adds a new sampling aggregation that performs a background sampling over all documents in an index. 

The syntax is as follows:
```
{
  "aggregations": {
    "sampling": {
      "random_sampler": {
        "probability": 0.1
      },
      "aggs": {
        "price_percentiles": {
          "percentiles": {
            "field": "taxful_total_price"
          }
        }
      }
    }
  }
}
```

This aggregation provides fast random sampling over the entire document set in order to speed up costly aggregations.

Testing this over a variety of aggregations and data sets, the median speed up when sampling at `0.001` over millions of documents is around 70X speed improvement.
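
The statistical idea can be shown with a toy estimator: keep each document with probability `p` and scale the result by `1/p`. This is not how the aggregation is implemented; the function name, the seed and the data are assumptions for the sketch.

```python
import random


def sampled_sum(values, probability, seed=42):
    """Estimate sum(values) from a random sample scaled by 1/probability."""
    rng = random.Random(seed)
    sample = [v for v in values if rng.random() < probability]
    return sum(sample) / probability


values = [1.0] * 100_000
exact = sum(values)
estimate = sampled_sum(values, probability=0.1)
relative_error = abs(estimate - exact) / exact
print(f"exact={exact:.0f} estimate={estimate:.0f} relative_error={relative_error:.4f}")
```

Only about a tenth of the values are visited here, which is where the speed-up comes from; the price is a relative error that shrinks as more documents are sampled.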

Relative error rate does rely on the size of the data and the aggregation kind. Here are some typically expected numbers when sampling over 10s of millions of documents. `p` is the configured probability and `n` is the number of documents matched by your provided filter query.
2022-03-02 14:32:30 -05:00
Nhat Nguyen
31d703f24c
Introduce lookup runtime fields (#82385)
This PR introduces the lookup runtime fields which are used to retrieve 
data from the related indices. The below search request enriches its
search hits with the location of each IP address from the `ip_location`
index.

```
POST logs/_search
{
  "runtime_mappings": {
    "location": {
      "type": "lookup",
      "lookup_index": "ip_location",
      "query_type": "term",
      "query_input_field": "ip",
      "query_target_field": "_id",
      "fetch_fields": [
        "country",
        "city"
      ]
    }
  },
  "fields": [
    "timestamp",
    "message",
    "location"
  ]
}
```

Response:

```
{
  "hits": {
    "hits": [
      {
        "_index": "logs",
        "_id": "1",
        "fields": {
          "location": [
            {
              "city": [ "Montreal" ],
              "country": [ "Canada" ]
            }
          ],
          "message": [ "the first message" ]
        }
      }
    ]
  }
}
```
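
Conceptually, the lookup above behaves like the following in-memory join. This is a sketch only: the helper name, the IP address and the document contents are hypothetical, and the real field issues a query against the lookup index.

```python
def resolve_lookup(hit, lookup_docs, query_input_field, query_target_field, fetch_fields):
    """Return the fetched fields of every lookup doc matching the hit."""
    wanted = hit[query_input_field]
    matches = [doc for doc in lookup_docs if doc.get(query_target_field) == wanted]
    # Values come back as arrays, as in the fields API response above.
    return [{field: [doc[field]] for field in fetch_fields if field in doc} for doc in matches]


# Hypothetical contents of the `ip_location` index.
ip_location = [
    {"_id": "192.168.1.2", "country": "Canada", "city": "Montreal"},
    {"_id": "10.0.0.1", "country": "France", "city": "Paris"},
]
hit = {"ip": "192.168.1.2", "message": "the first message"}
location = resolve_lookup(hit, ip_location, "ip", "_id", ["country", "city"])
print(location)
```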
2022-02-22 21:36:19 -05:00
Mark Vieira
64929dc5df
Introduce explicit API for configure test cluster feature flags (#83876) 2022-02-14 15:22:33 -08:00
Alan Woodward
8bc46ad959
Add filtering to fieldcaps endpoint (#83636)
Many consumers of the field caps API need to do some post-processing of the
results before they can use them; for instance, Kibana would like to exclude
multifields from certain field selections, or would like to display only geo_point
fields in Maps. ML and QL consumers exclude nested fields in certain
circumstances. This post-processing is possible at the moment, but can be
hacky; and in all cases it involves sending the whole (possibly very large) field
caps response over the wire and then whittling it down in the client. It is also not
guaranteed to be accurate - runtime fields may be incorrectly classified as multifields,
for example.

This commit pushes filtering into Elasticsearch itself, reducing the amount of data
that needs to be transported and ensuring better accuracy. The field caps API gets
two new parameters:

* filters - a comma-delimited list that may contain any combination of: `+metadata`,
  `-metadata`, `-nested`, `-parent`, `-multifield`
* types - a comma-delimited list of field types; only fields that have a type in this set
  will be returned
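
A minimal sketch of building a request with the two new parameters. The helper name and index name are hypothetical; the parameter names and filter values are the ones listed above.

```python
from urllib.parse import urlencode


def field_caps_path(index, fields="*", filters=(), types=()):
    """Build a field caps request path with optional filters and types."""
    params = {"fields": fields}
    if filters:
        params["filters"] = ",".join(filters)
    if types:
        params["types"] = ",".join(types)
    # Keep commas and the wildcard readable; a leading "+" (as in
    # "+metadata") is still percent-encoded so it is not read as a space.
    return f"/{index}/_field_caps?" + urlencode(params, safe=",*")


path = field_caps_path(
    "my-index", filters=["-nested", "-multifield"], types=["keyword", "geo_point"]
)
print(path)
```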

The API will make best-effort attempts to apply the filters post-hoc to responses from
older nodes, so this should still work in a mixed-cluster or cross-cluster situation.

Fixes #82966, #72174
2022-02-10 14:06:26 +00:00
Martijn van Groningen
0ddfad4cd7
Fix release build (#83720)
-  Add `es.index_mode_feature_flag_registered` feature flag to data-streams module's internalClusterTest task.
-  Add `es.random_sampler_feature_flag_registered` feature flag to xpack rest tests with security qa module.

Closes #83722
2022-02-09 18:30:15 -05:00
weizijun
9503e9f4e2
Runtime fields core-with-mapped tests support tsdb (#83577)
As runtime fields do not support `time_series_dimension` and
`time_series_metric`, the tsdb test cases fail. tsdb indices also
require the @timestamp field.

So I improved the `runtimeifyMappingProperties` method logic by adding some
skip rules:

- skip `time_series_dimension` field.
- skip `time_series_metric` field.
- skip `@timestamp` field.
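
The skip rules can be paraphrased in Python as below. This is a loose illustration and not the actual `runtimeifyMappingProperties` method; the function name, the mapping shape and the returned structure are assumptions.

```python
def runtimeify_properties(properties):
    """Return runtime field definitions, leaving tsdb-specific fields mapped."""
    runtime = {}
    for name, mapping in properties.items():
        # tsdb indices need a real @timestamp field.
        if name == "@timestamp":
            continue
        # Runtime fields cannot carry dimension or metric parameters.
        if "time_series_dimension" in mapping or "time_series_metric" in mapping:
            continue
        runtime[name] = {"type": mapping["type"]}
    return runtime


properties = {
    "@timestamp": {"type": "date"},
    "host": {"type": "keyword", "time_series_dimension": True},
    "cpu": {"type": "double", "time_series_metric": "gauge"},
    "message": {"type": "keyword"},
}
print(runtimeify_properties(properties))
```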

This PR also fixes the failing test in
https://github.com/elastic/elasticsearch/issues/83431
2022-02-08 14:21:23 -05:00
Mark Vieira
1c95dfc94e Mute failing runtime fields test 2022-02-02 13:55:03 -08:00
Tim Vernum
d61dda2c01
Remove system-index write-access from superuser role (#81400)
This commit changes the superuser role (as used by the "elastic"
builtin user) so that it no longer has any sort of write access to
restricted indices (system indices).
This improves the safety and security of the cluster, as it means
that there are no out-of-the-box users or roles that can write to,
delete or close the security index.

Superusers can still read from (and monitor) system indices.

Other roles (and users) can still access system indices as specified
in their descriptor. These can be custom such as the
"_es_test_root" role used in the integration test suite, or builtin
roles such as kibana_system.
2022-01-17 12:00:38 +11:00
Artem Prigoda
0699c9351f
Use Java 14 switch expressions (#82178)
[JEP 361](https://openjdk.java.net/jeps/361) added support for switch expressions
which can be much more terse and less error-prone than switch statements.

Another useful feature of switch expressions is exhaustiveness: we can make
sure that an enum switch expression covers all the cases at compile time.
2022-01-10 09:53:35 +01:00
Artem Prigoda
763d6d510f
Use Java 15 text blocks for JSON and multiline strings (#80751)
The ES code base is quite JSON heavy. It uses a lot of multi-line JSON requests in tests which need to be escaped and concatenated which in turn makes them hard to read. Let's try to leverage Java 15 text blocks for representing them.
2021-12-15 18:01:28 +01:00
Mark Vieira
12ad399c48 Reformat Elasticsearch source 2021-10-27 08:19:51 -07:00
Lee Hinman
c017e1acdb
Add deprecation headers to HLRC classes (#79754)
This commit adds the @Deprecated annotation and Javadoc to HLRC classes.
2021-10-25 16:11:16 -06:00
Igor Motov
f6034e643a
TSDB: Add time series information to field caps (#78790)
Exposes information about dimensions and metrics via field caps. This
information will be needed for PromQL support.

Relates to #74660
2021-10-13 11:03:38 -10:00
Chris Hegarty
20c9f756d2
Fix split package org.elasticsearch.common.xcontent (#78831)
Fix the split package org.elasticsearch.common.xcontent, between server and the x-content lib. Move the x-content lib exported package from org.elasticsearch.common.xcontent to org.elasticsearch.xcontent ( following the naming convention of similar libraries ). Removing split packages is a prerequisite to modularization.
2021-10-08 17:14:26 +01:00