Commit graph

674 commits

Author SHA1 Message Date
Salvatore Campagna
e2281a1158
Introduce an IndexSettingsProvider to inject logsdb index mode (#113505)
Here we introduce a new implementation of `IndexSettingProvider` whose goal is to "inject" the
`index.mode` setting with value `logsdb` when a cluster setting `cluster.logsdb.enabled` is `true`.
We also make sure that:
* the existing `index.mode` is not set
* the datastream name matches the `logs-*-*` pattern
* `logs@settings` component template is used
2024-09-26 14:44:03 +02:00
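The conditions listed for #113505 can be read as a single predicate over the index being created. What follows is a minimal sketch in Python, not the Java `IndexSettingProvider` from the commit; the function name and the shape of its arguments are hypothetical and chosen only for illustration.

```python
from fnmatch import fnmatchcase

LOGS_PATTERN = "logs-*-*"  # data stream naming pattern named in the commit message


def inject_index_mode(cluster_settings, template_settings, data_stream_name, component_templates):
    """Return the extra settings to add to a new index (empty when nothing applies)."""
    if not cluster_settings.get("cluster.logsdb.enabled", False):
        return {}
    if "index.mode" in template_settings:
        return {}  # an explicitly configured index.mode is left alone
    if not fnmatchcase(data_stream_name, LOGS_PATTERN):
        return {}
    if "logs@settings" not in component_templates:
        return {}
    return {"index.mode": "logsdb"}


injected = inject_index_mode(
    {"cluster.logsdb.enabled": True}, {}, "logs-nginx-default", ["logs@settings"]
)
```

All four checks must pass before the setting is injected, so a cluster that never sets `cluster.logsdb.enabled` sees no change in behaviour.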
Kostas Krikellas
fffe8844e9
Apply auto-flattening to subobjects: auto (#112092)
* Introduce mode `subobjects=auto` for objects

* Update docs/changelog/110524.yaml

* compilation error

* tests and fixes

* refactor

* spotless

* more tests

* fix nested objects

* fix test

* update fetch test

* add QA coverage

* update tests

* update tests

* update tests

* Apply auto-flattening to `subobjects: auto`

* Update docs/changelog/112092.yaml

* sync

* dont flatten subobjects auto

* refine test

* fix path for nested flattened objects and dynamic

* document `subobjects: auto`

* Apply suggestions from code review

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* comment updates

* restore indentation in comment

* update comment

* update comment

* update comment

* update comment

* rename isFlattenable

* add test for dynamic template

* fix copy_to and noop dynamic updates

* tests

* update comment

* fix tests

* update cluster feature in yaml test

* address comments

---------

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
2024-09-26 11:42:40 +03:00
Albert Zaharovits
71ccf2089f Merge main into multi-project 2024-09-26 09:57:32 +03:00
Mary Gouseti
3d7904bee3
Add template builder (#113444)
Since we are enriching the component templates with more entries such as
the data stream lifecycle and in the future the data stream options, we
add a template builder to help with the code, especially tests.

To highlight the value and prepare for the PRs that will add the data
stream options to the template, we replace calls to the constructor with
all arguments by the builder:
- when there are arguments with null values, or
- when we copy another template and change only a few fields.

This prepares the ground, so when we add data stream options, we will
not need to edit all these places.
2024-09-25 19:00:17 +10:00
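The two cases named in #113444 (many null arguments, and copying with a few changes) are the usual motivation for a builder. A minimal sketch of the pattern in Python follows; the `Template` fields and the `TemplateBuilder` class are hypothetical, simplified stand-ins and do not mirror the Java classes touched by the commit.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Template:
    # Hypothetical, simplified stand-in for a template's content.
    settings: Optional[dict] = None
    mappings: Optional[dict] = None
    aliases: Optional[dict] = None
    lifecycle: Optional[dict] = None


class TemplateBuilder:
    def __init__(self, base: Optional[Template] = None):
        # Start from an existing template (copy case) or from all-None fields.
        self._template = base if base is not None else Template()

    def settings(self, value: dict) -> "TemplateBuilder":
        self._template = replace(self._template, settings=value)
        return self

    def lifecycle(self, value: dict) -> "TemplateBuilder":
        self._template = replace(self._template, lifecycle=value)
        return self

    def build(self) -> Template:
        return self._template


# Instead of Template(None, None, None, {"data_retention": "7d"}):
with_lifecycle = TemplateBuilder().lifecycle({"data_retention": "7d"}).build()
# Copy an existing template and change a single field:
copy = TemplateBuilder(with_lifecycle).settings({"index.number_of_shards": 1}).build()
```

Adding a new field later only requires a new default on `Template` and one builder method; existing call sites stay untouched.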
Tim Vernum
ad6435dede Merge main into multi-project 2024-09-25 12:49:01 +10:00
Felix Barnsteiner
8d223cbf7a
Add support for multi-value dimensions (#112645)
Closes https://github.com/elastic/elasticsearch/issues/110387

Having this in now saves us from having to introduce version checks in
the ES exporter later. We can simply use the same serialization logic
for metric attributes as we do for other signals. This also enables us
to properly map `*.ip` fields to the ip field type as ip fields
containing a list of IPs are not converted to a comma-separated list.
2024-09-23 17:31:18 +10:00
Tim Vernum
f6458344ce Merge main into multi-project 2024-09-23 11:36:52 +10:00
Mary Gouseti
f4f075a2cc
Add failure store status in index response of data streams (#112816)
The failure store status is a flag that indicates how the failure store was used or could be used if enabled. The user can be informed about the usage of the failure store in the following way:

When relevant, we add the optional field `failure_store`. The field is omitted when the use of the failure store is not relevant, for example if a document was successfully indexed in a data stream, if a failure concerns an index, or if the opType is not index or create. In more detail:
- when we have a "success" create/index response, the field `failure_store` will not be present if the document was indexed in a backing index. Otherwise, if it got stored in the failure store, it will have the value `used`.
- when we have a "rejected" create/index response, meaning the document was not persisted in Elasticsearch, we return the field `failure_store`, which is either `not_enabled`, if the document could have ended up in the failure store had it been enabled, or `failed`, if something went wrong and the document was not persisted in the failure store, for example because the cluster is out of space and in read-only mode.

We chose to make it an optional field to reduce the impact of this field on a bulk response. The value will exist in the java object but it will not be returned to the user. The only values that will be displayed are:

- `used`: meaning this document was indexed in the failure store
- `not_enabled`: meaning this document was rejected but could have been stored in the failure store if it was enabled.
- `failed`: meaning this failed document could not be stored in the failure store.

Example:
```
{
  "errors": true,
  "took": 202,
  "items": [
    {
      "create": {
        "_index": ".fs-my-ds-2024.09.04-000002",
        "_id": "iRDDvJEB_J3Inuia2zgH",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 6,
        "_primary_term": 1,
        "status": 201,
        "failure_store": "used"
      }
    },
    {
      "create": {
        "_index": "ds-no-fs",
        "_id": "hxDDvJEB_J3Inuia2jj3",
        "status": 400,
        "error": {
          "type": "document_parsing_exception",
          "reason": "[1:153] failed to parse field [count] of type [long] in document with id 'hxDDvJEB_J3Inuia2jj3'. Preview of field's value: 'bla'",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "For input string: \"bla\""
          }
        },
        "failure_store": "not_enabled"
      }
    },
    {
      "create": {
        "_index": ".ds-my-ds-2024.09.04-000001",
        "_id": "iBDDvJEB_J3Inuia2jj3",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 7,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}
```
2024-09-20 10:53:39 +03:00
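The rules in #112816 for when `failure_store` appears can be summarised as a lookup plus an "omit when not relevant" step. The sketch below is an illustration in Python under stated assumptions: the outcome labels, function names, and argument shapes are hypothetical, and only the three returned values (`used`, `not_enabled`, `failed`) come from the commit message.

```python
from typing import Optional

# Outcome labels are hypothetical names for the cases described in the commit message.
STATUS_BY_OUTCOME = {
    "indexed_in_backing_index": None,                  # plain success: field omitted
    "stored_in_failure_store": "used",
    "rejected_failure_store_disabled": "not_enabled",
    "rejected_failure_store_error": "failed",
}


def failure_store_status(op_type: str, targets_data_stream: bool, outcome: str) -> Optional[str]:
    """Return the value to report, or None when the field should be omitted."""
    if op_type not in ("index", "create") or not targets_data_stream:
        return None  # the failure store is not relevant for this item
    return STATUS_BY_OUTCOME[outcome]


def render_item(op_type: str, body: dict, status: Optional[str]) -> dict:
    """Build a bulk response item, adding failure_store only when it has a value."""
    item = dict(body)
    if status is not None:
        item["failure_store"] = status
    return {op_type: item}


status = failure_store_status("create", True, "stored_in_failure_store")
item = render_item("create", {"_id": "1", "status": 201}, status)
```

Keeping the field optional means a bulk response for a healthy data stream carries no extra bytes per item.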
Lee Hinman
4f221bb4c6
Mark data streams stats API as internal-only (again) (#112712)
This is a redo of https://github.com/elastic/elasticsearch/pull/108745 which was reverted. Now that https://github.com/elastic/elasticsearch/pull/112303 has been merged, there is an alternative to retrieve the `maximum_timestamp`.
2024-09-19 13:24:02 -06:00
Kostas Krikellas
e244216c0f
Configure keeping source in FieldMapper (#112706)
Introduces per-field param `synthetic_source_keep` that overrides the
behavior for keeping the field source in synthetic source mode:
- `none`: no source is stored
- `arrays`: the incoming source is recorded as-is for arrays of a given field
- `all`: the incoming source is recorded as-is for both singleton and array values of a given field

Related to #112012
2024-09-19 23:29:09 +10:00
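To make the three `synthetic_source_keep` values from #112706 concrete, here is a small sketch in Python. The mapping fragment uses a hypothetical field name (`tags`), and `keeps_source` is an illustrative helper that models the description in the commit message rather than the actual `FieldMapper` logic.

```python
# Hypothetical mapping fragment: the field name "tags" is made up for illustration.
mapping = {
    "properties": {
        "tags": {"type": "keyword", "synthetic_source_keep": "arrays"},
    }
}


def keeps_source(mode: str, value) -> bool:
    """Model of whether the incoming source for a field value is recorded as-is."""
    if mode == "none":
        return False
    if mode == "arrays":
        return isinstance(value, list)  # only array values keep their source
    if mode == "all":
        return True  # singletons and arrays alike
    raise ValueError(f"unknown synthetic_source_keep value: {mode}")


mode = mapping["properties"]["tags"]["synthetic_source_keep"]
array_kept = keeps_source(mode, ["b", "a", "b"])
singleton_kept = keeps_source(mode, "a")
```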
Kostas Krikellas
4ff4384550
Retrieve the source for objects and arrays within arrays in a separate parsing phase (#113027)
In synthetic source, storing array elements to `_ignored_source` may
hide other, regular elements from showing up during source synthesizing.
This is due to contents from `_ignored_source` taking precedence over
matching fields from regular source loading. 

To avoid this, arrays are pre-emptively tracked and marked for source
storing, if any of their elements needs to store its source. A second
doc parsing phase is introduced that checks for fields missing values
and records their source, while skipping objects and arrays that don't
contain any such fields.

Fixes #112374
2024-09-19 20:07:31 +10:00
Tim Vernum
d5d5131e25 Merge main into multi-project 2024-09-19 18:52:20 +10:00
Lee Hinman
b94720dca5
Deprecate dot-prefixed indices and composable template index patterns (#112571)
This commit adds a module emitting a deprecation warning when a
dot-prefixed index is manually or automatically created, or when a
composable index template with an index pattern that uses a dot-prefix
is created. This pattern warns that in the future these indices will not
be allowed. In a future breaking change (10.0.0 maybe?) the deprecation
can then be changed to an exception.

These deprecations are only displayed when a non-operator user is using
the API (one that does not set the `X-elastic-product-origin` header).
2024-09-19 05:29:53 +10:00
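The rule described in #112571 can be sketched as a filter over the names or index patterns in a request. This is a hedged illustration in Python: the function name and the way headers are passed are hypothetical, and only the dot-prefix check and the `X-elastic-product-origin` header come from the commit message.

```python
def dot_prefix_deprecations(names_or_patterns, request_headers):
    """Return the names or patterns that would trigger a deprecation warning."""
    # HTTP header names are case-insensitive, so compare in lower case.
    header_names = {name.lower() for name in request_headers}
    if "x-elastic-product-origin" in header_names:
        return []  # requests that set the header are not warned
    return [name for name in names_or_patterns if name.startswith(".")]


warned = dot_prefix_deprecations([".my-index", "logs-app-default", ".hidden-*"], {})
silenced = dot_prefix_deprecations([".my-index"], {"X-elastic-product-origin": "kibana"})
```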
Pete Gillin
81041b47d4
[TEST] Assert DSL merge policy respects end date (#113038)
Backing indexes with an end date in the future may still get writes,
so DSL should not apply the merge policy (first configuring the
settings on the index, then doing the force merge) until that time has
passed. The implementation already does this, because
`DataStreamLifecycleService.run()` calls
`timeSeriesIndicesStillWithinTimeBounds` and adds the resulting
indices to `indicesToExcludeForRemainingRun` before calling
`maybeExecuteForceMerge`. This change simply adds a unit test to
ensure that this behaviour does not regress.

Closes #109030
2024-09-18 12:03:49 +01:00
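The ordering that #113038 tests (exclude indices still within their time bounds, then consider force merge) can be sketched as follows. This is an illustrative Python model with hypothetical names and data shapes; it is not the `DataStreamLifecycleService` code.

```python
from datetime import datetime, timezone


def indices_eligible_for_force_merge(end_times, now):
    """end_times maps index name to its end date, or None when it has no time bounds."""
    # Indices whose end date is still in the future may receive writes.
    still_within_time_bounds = {
        name for name, end in end_times.items() if end is not None and end > now
    }
    return [name for name in end_times if name not in still_within_time_bounds]


now = datetime(2024, 9, 18, 12, 0, tzinfo=timezone.utc)
eligible = indices_eligible_for_force_merge(
    {
        "backing-000001": datetime(2024, 9, 18, 10, 0, tzinfo=timezone.utc),  # already ended
        "backing-000002": datetime(2024, 9, 18, 14, 0, tzinfo=timezone.utc),  # ends in the future
    },
    now,
)
```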
Nhat Nguyen
af7ed9515f
Enable ignore_malformed in logsdb (#113072)
This change enables ignore_malformed by default for newly created 
logsdb indices.

Closes #106822
2024-09-17 22:41:41 -07:00
Lee Hinman
4a0ccbf4b4
Fix verbose get data stream API not requiring extra privileges (#112973)
* Fix verbose get data stream API not requiring extra privileges

When a user uses the `GET /_data_stream?verbose` API to retrieve the verbose version of the response (which includes the `maximum_timestamp`, as added in #112303), the request should be subject to the same privilege checking as the get-data-stream API, meaning that no extra privileges should be required to return the field.

This commit makes the Transport action use an entitled client so that extra privileges are not required, and adds a test to ensure that it works.

* Update docs/changelog/112973.yaml
2024-09-17 10:03:44 -06:00
Salvatore Campagna
f7880ae85f
LogsDB data migration integration testing (#112710)
Here we test reindexing logsdb indices, creating and restoring
snapshots. Note that logsdb uses synthetic source and restoring
source-only snapshots fails due to missing _source.
2024-09-17 16:26:48 +02:00
Oleksandr Kolomiiets
7923870b42
Fix license headers in test files (#112965) 2024-09-16 13:45:33 -07:00
Oleksandr Kolomiiets
9de285e0f1
Add LogsDB challenge tests for reindexing (#112849) 2024-09-16 13:32:40 -07:00
David Turner
3a8835853f
Remove unnecessary bwc from get-aliases API (#112797)
Dropping support for pre-8.12 requests from remote nodes, and also
cleaning up some unnecessary abstraction in the request builder
hierarchy.

Relates #101815
Relates #107984 (drops some unnecessary trappy timeouts)
2024-09-16 06:31:37 +01:00
Niels Bauman
c41ed527b3 Merge main into multi-project 2024-09-14 10:52:45 +02:00
Mark Vieira
a59c182f9f
Add AGPLv3 as a supported license 2024-09-13 15:29:46 -07:00
Niels Bauman
9968e076f7 Merge main into multi-project
# Conflicts:
#	x-pack/plugin/ilm/src/main/java/org/elasticsearch/xpack/ilm/action/ReservedLifecycleAction.java
2024-09-13 12:52:08 +02:00
Oleksandr Kolomiiets
44c9271562
Ensure that fields copied using copy_to are not present in synthetic source (#112625) 2024-09-12 12:18:25 -07:00
Niels Bauman
158d66cc2e Merge main into multi-project 2024-09-12 15:16:45 +02:00
David Turner
8607d40679
Introduce test utils for ingest pipelines (#112733)
Replaces the somewhat-awkward API on `ClusterAdminClient` for
manipulating ingest pipelines with some test-specific utilities that are
easier to use.

Relates #107984 in that this change massively reduces the noise that
would otherwise result from removing the trappy timeouts in these APIs.
2024-09-12 08:22:50 +01:00
David Turner
2db52aeb0e
Fix trappy timeouts in downsample action (#112734)
Relates #107984
2024-09-11 13:18:03 +01:00
Albert Zaharovits
8e1a004260 Metadata -> ProjectMetadata related to index creation (MP-1644)
2024-09-11 13:26:49 +03:00
Oleksandr Kolomiiets
082e7211b3
Use fallback synthetic source for copy_to and doc_values: false cases (#112294) 2024-09-10 12:12:51 -07:00
Tim Vernum
56ba3385c8 Merge main into multi-project 2024-09-10 16:59:33 +02:00
David Turner
ecd887d651
Remove unused compat shims from o.e.a.datastreams (#112697)
Relates #111474
Relates #107984
2024-09-10 21:55:34 +10:00
David Turner
8f07d60c2c
Fix trappy timeouts in o.e.a.a.cluster.* (#112674)
Removes all usages of `TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT` in
cluster-related APIs in `:server`.

Relates #107984
2024-09-10 08:17:09 +01:00
Mary Gouseti
a43ffd57c2
Do not send version conflicts to failure store (#112537)
When indexing to a data stream with a failure store it's possible to get
a version conflict. The reproduction path is the following:

```
PUT /_bulk
{"create":{"_index": "my-ds-with-fs", "_id": "1"}}
{"@timestamp": "2022-01-01", "baz": "quick", "a": "brown", "b": "fox"}
{"create":{"_index": "my-ds-with-fs", "_id": "1"}}
{"@timestamp": "2022-01-01", "baz": "lazy", "a": "dog"}
```

We would like the second document to not be sent to the failure store
and return an error to the user:

```
{
  "errors" : true,
  "took" : 409,
  "items" : [
    {
      "create" : {
        "_index" : ".ds-my-ds-with-fs-xxxxx-xxxx",
        "_id" : "1",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "create" : {
        "_index" : ".ds-my-ds-with-fs-xxxxx-xxxx",
        "_id" : "1",
        "status" : 409,
        "error" : {
          "type" : "version_conflict_engine_exception",
          "reason" : "[1]: version conflict, document already exists (current version [1])",
          "index_uuid" : ".....",
          "shard" : "0",
          "index" : ".ds-my-ds-with-fs-xxxxx-xxxx"
        }
      }
    }
  ]
}
```

The version conflict doc is counted as a rejected doc in APM telemetry.
2024-09-09 20:47:51 +10:00
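The behaviour requested in #112537 amounts to one extra exclusion in the decision to redirect a failed document. A minimal sketch in Python, assuming a hypothetical helper that receives the error type string shown in the bulk responses above; the real decision in Elasticsearch may consider more conditions than this.

```python
VERSION_CONFLICT = "version_conflict_engine_exception"


def redirect_to_failure_store(error_type: str, failure_store_enabled: bool) -> bool:
    """Return True when a failed document should be sent to the failure store."""
    if not failure_store_enabled:
        return False
    # Version conflicts are returned to the user as an error instead.
    return error_type != VERSION_CONFLICT


conflict_redirected = redirect_to_failure_store(VERSION_CONFLICT, True)
parse_failure_redirected = redirect_to_failure_store("document_parsing_exception", True)
```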
Nikolaj Volgushev
ee68d0cb03
Bring back operator and serverless request marking (#112554)
Reverts https://github.com/elastic/elasticsearch/pull/111810
2024-09-06 19:01:10 +10:00
Tim Vernum
26f0b75a3f Merge main into multi-project 2024-09-06 12:37:27 +10:00
Mark Vieira
24f33e95e8
Ensure rest compatibility tests are run when appropriate (#112526) 2024-09-05 08:22:48 -07:00
Kostas Krikellas
d5bae2cdee
Control storing array source with index setting (#112397)
Introduce an index setting that forces storing the source of leaf field
and object arrays in synthetic source mode. Nested objects are excluded
as they already preserve ordering in synthetic source.

Next step is to introduce override params at the mapper level that will
allow disabling the source, or storing the source for arrays (if not
enabled at index level), or storing the source for both arrays and
singletons. This will happen in follow-up changes, so that we can
benchmark the impact of this change in parallel.

Related to #112012
2024-09-05 01:12:19 +10:00
Albert Zaharovits
36a3eb7edc Merge main into multi-project 2024-09-04 10:10:04 +03:00
Albert Zaharovits
28f4a7b4f8 Index Settings provider with project metadata (MP-1630)
Restricts the "index settings" provider that's invoked when creating new
indices to only inspect the current project's metadata (rather than the
whole global metadata).
2024-09-03 15:12:47 +03:00
Niels Bauman
62d544e03d
Allow pathRestricted param in RestGetDataStreamsAction (#112434)
This is a follow-up of #112303 which prohibited this parameter. This
resulted in test failures on serverless.
2024-09-03 01:35:07 +10:00
Mary Gouseti
91f4023e27
Expose global retention settings via data stream lifecycle API (#112210)
In this PR we expose the global retention via the `GET
_data_stream/{target}/_lifecycle` API.

Since the global retention is a main feature of the data stream
lifecycle we chose to expose it by default.

```
GET /_data_stream/my-data-stream/_lifecycle
{
  "global_retention": {
    "default_retention": "7d",
    "max_retention": "365d"
  },
  "data_streams": [...]
}
```
2024-09-02 18:40:08 +10:00
Lee Hinman
4ae88f98dc
Add 'verbose' flag retrieving maximum_timestamp for get data stream API (#112303)
This commit adds support for the `verbose` querystring parameter to the
get data stream API (`GET /_data_stream/{name}`).

The flag defaults to "false".

When set to true, the `maximum_timestamp` for the data stream will be
retrieved and returned for each data stream retrieved. This is the same
information available from the data stream stats API (and internally
uses the same action for retrieval).
2024-08-31 03:18:15 +10:00
Oleksandr Kolomiiets
2dae0533a7
LogsDB QA tests - add dynamic mapping support (#112321) 2024-08-29 12:22:29 -07:00
Tim Vernum
a100bc3131 Merge main into multi-project 2024-08-28 20:22:59 +10:00
Mary Gouseti
bed6e18fa3
Exclude internal data streams from global retention (#112100)
With #111972 we enable users to set up global retention for data streams that are managed by the data stream lifecycle. This will allow users of Elasticsearch to have more control over their data retention, and consequently better resource management of their clusters.

However, there is a small number of data streams that are necessary for the good operation of Elasticsearch and should not follow user-defined retention to avoid surprises.

For this reason, we put forth the following definition of internal data streams.

A data stream is internal if either it is a system index (system flag is true) or its name starts with a dot.

This PR adds the `isInternalDataStream` param in the effective retention calculation making explicit that this is also used to determine the effective retention.
2024-08-28 11:28:35 +03:00
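The definition of an internal data stream in #112100 and its effect on retention can be sketched in a few lines of Python. The `is_internal_data_stream` check follows the commit message; the way default and maximum global retention are combined in `effective_retention_days` is an assumption made for illustration (default applies when nothing is configured, maximum caps the result), and all names and the use of whole days are hypothetical.

```python
from typing import Optional


def is_internal_data_stream(name: str, system: bool) -> bool:
    """Internal means: system flag set, or name starting with a dot."""
    return system or name.startswith(".")


def effective_retention_days(
    configured: Optional[int],
    default_retention: Optional[int],
    max_retention: Optional[int],
    internal: bool,
) -> Optional[int]:
    if internal:
        return configured  # global retention is not applied to internal data streams
    # Assumed rule: fall back to the default, then cap at the maximum.
    retention = configured if configured is not None else default_retention
    if max_retention is not None and (retention is None or retention > max_retention):
        retention = max_retention
    return retention


internal = is_internal_data_stream(".my-internal-stream", system=False)
capped = effective_retention_days(400, 7, 365, internal=False)
untouched = effective_retention_days(400, 7, 365, internal=True)
```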
Simon Cooper
9db1778878
Use StreamOutput::writeWriteable instead of writeTo directly (#112027) 2024-08-27 09:21:43 +01:00
Michael Peterson
0d371978e8
Search coordinator uses event.ingested in cluster state to do rewrites (#111523)
Min/max range for the event.ingested timestamp field (part of Elastic Common
Schema) was added to IndexMetadata in cluster state for searchable snapshots
in #106252.

This commit modifies the search coordinator to rewrite searches to MatchNone
if the query searches a range of event.ingested that, from the min/max range
in cluster state, is known to not overlap. This is the same behavior we currently
have for the @timestamp field.
2024-08-26 09:53:26 -04:00
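The rewrite in #111523 rests on an interval-overlap test between the queried range and the min/max range stored in cluster state. A minimal sketch in Python, assuming inclusive bounds expressed as epoch milliseconds; the function name is hypothetical and the real coordinator logic handles more cases (open-ended ranges, missing metadata).

```python
def can_rewrite_to_match_none(query_gte: int, query_lte: int, index_min: int, index_max: int) -> bool:
    """True when the queried range cannot overlap the index's known min/max range."""
    return query_lte < index_min or query_gte > index_max


# The index holds event.ingested values between 1_000 and 2_000 (epoch millis).
disjoint = can_rewrite_to_match_none(2_500, 3_000, 1_000, 2_000)
overlapping = can_rewrite_to_match_none(1_500, 3_000, 1_000, 2_000)
```

When the function returns `True`, the shard can be skipped without being contacted, which is the point of keeping the range in cluster state.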
Martijn van Groningen
32b4aa3c44
Fix TSDBIndexingIT#testTrimId() test failure. (#112194)
Sometimes initial indexing results in exactly one segment.
However, multiple segments are needed to perform the force merge that purges stored fields for the _id field in a later stage of the test.

This change tweaks the test such that an extra update is performed after initial indexing. This should always create an extra segment, so that this test can actually purge stored fields for the _id field.

Closes #112124
2024-08-26 13:52:20 +07:00
Parker Timmins
1072f2bbab
Add interval based SLM scheduling (#110847)
Add the ability to schedule an SLM policy with a time unit interval schedule rather than a cron job schedule. For example, an SLM policy can be created with the argument "schedule":"30m". This will create a policy that will run 30 minutes after the policy modification_date. It will then run again every time another 30 minutes has passed. Every time the policy is changed, the next snapshot will be re-scheduled to run one interval after the new modification date.
2024-08-22 21:15:29 -05:00
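The interval schedule from #110847 is anchored on the policy's modification date, so the next run time is simple arithmetic. The sketch below is a Python illustration with a hypothetical function name; it is not the SLM scheduler itself.

```python
from datetime import datetime, timedelta, timezone


def next_run(modified: datetime, interval: timedelta, now: datetime) -> datetime:
    """First scheduled time strictly after `now`, counted in whole intervals from `modified`."""
    if now < modified:
        return modified + interval
    elapsed_intervals = (now - modified) // interval  # timedelta // timedelta gives an int
    return modified + (elapsed_intervals + 1) * interval


modified = datetime(2024, 8, 22, 12, 0, tzinfo=timezone.utc)
# Policy modified at 12:00 with "schedule": "30m"; at 12:45 the next run is 13:00.
upcoming = next_run(modified, timedelta(minutes=30), modified + timedelta(minutes=45))
```

Changing the policy resets `modified`, which shifts every later run time, matching the re-scheduling behaviour described in the commit message.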
Mary Gouseti
ed60470518
Display effective retention in the relevant data stream APIs (#112019) 2024-08-22 17:42:49 +03:00