500 commits

Author SHA1 Message Date
Joe Gallo
370fb79471
DateProcessor refactoring (#124349) (#124411) 2025-03-09 05:00:39 +11:00
Joe Gallo
533d0a8750
Refactor RegisteredDomainProcessorTests (#124175) (#124245) 2025-03-07 04:15:31 +11:00
Joe Gallo
126388cc0d
Cleanup RegisteredDomainProcessorTests (#124118) (#124173) 2025-03-06 14:44:49 +11:00
Joe Gallo
aced4fc4d4
Cleanup RegisteredDomainProcessor (#124123) (#124155) 2025-03-06 10:23:49 +11:00
Joe Gallo
0f46b562e6
Optimize IngestCtxMap construction (#120833) (#120926) 2025-01-28 04:32:07 +11:00
Joe Gallo
a491383940
Optimize IngestDocMetadata isAvailable (#120753) (#120801) 2025-01-25 02:51:42 +11:00
Rene Groeschke
6b7cd0339e
Update Gradle wrapper to 8.12 (#118683) (#119363)
This updates the gradle wrapper to 8.12

We addressed deprecation warnings due to the update that includes:

- Fix change in TestOutputEvent api
- Fix deprecation in groovy syntax
- Use latest ospackage plugin containing our fix
- Remove project usages at execution time
- Fix deprecated project references in repository-old-versions

(cherry picked from commit ba61f8c7f7)
2024-12-31 08:36:31 +01:00
Parker Timmins
6ebee669c1
[8.x] Resolve pipelines from template if lazy rollover write (#116031) (#116132)
* Resolve pipelines from template if lazy rollover write  (#116031)

If datastream rollover on write flag is set in cluster state, resolve pipelines from templates rather than from metadata. This fixes the following bug: when a pipeline reroutes every document to another index, and rollover is called with lazy=true (setting the rollover on write flag), changes to the pipeline do not go into effect, because the lack of writes means the data stream never rolls over and pipelines in metadata are not updated. The fix is to resolve pipelines from templates if the lazy rollover flag is set. To improve efficiency we only resolve pipelines once per index in the bulk request, caching the value, and reusing for other requests to the same index.

Fixes: #112781

* Remute tests block merge

* Remute tests block merge
2024-11-03 04:25:33 +11:00
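The per-index caching described in the lazy rollover commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the class `PipelineResolutionCache`, its methods, and the resolver function are all hypothetical names, and pipelines are reduced to plain strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: resolve pipelines at most once per index within one bulk request.
final class PipelineResolutionCache {
    // index name -> resolved pipeline name (simplified to a String for illustration)
    private final Map<String, String> resolved = new HashMap<>();
    // Stand-in for the (expensive) template-based resolution; an assumption, not a real API.
    private final Function<String, String> resolver;
    private int resolverCalls = 0;

    PipelineResolutionCache(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    // Returns the cached value if this index was already resolved, otherwise resolves and caches it.
    String pipelineFor(String index) {
        return resolved.computeIfAbsent(index, name -> {
            resolverCalls++;
            return resolver.apply(name);
        });
    }

    int resolverCalls() {
        return resolverCalls;
    }
}
```

With this shape, a bulk request containing many items for the same index triggers only one resolution for that index.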
Ryan Ernst
dedf9fd6d7
Use directory name as project name for libs (#115720) (#115984)
* Use directory name as project name for libs (#115720)

The libs projects are configured to all begin with `elasticsearch-`.
While this is desirable for the artifacts to contain this consistent
prefix, it means the project names don't match up with their
directories. Additionally, it creates complexities for subproject naming
that must be manually adjusted.

This commit adjusts the project names for those under libs to be their
directory names. The resulting artifacts for these libs are kept the
same, all beginning with `elasticsearch-`.

* fixes
2024-10-31 07:52:10 +11:00
Simon Cooper
2d538c7022
Backport transport changes from #114895 to 8.x (#115909) 2024-10-30 14:17:17 +00:00
Pete Gillin
6ec7a3439d
Add a terminate ingest processor (#114157) (#114343)
This processor simply causes any remaining processors in the pipeline
to be skipped. It will normally be executed conditionally using the
`if` option. (If this pipeline is being called from another pipeline,
the calling pipeline is *not* terminated.)

For example, this:

```
POST /_ingest/pipeline/_simulate
{
  "pipeline":
  {
    "description": "Appends just 'before' to the steps field if the error field is present, or both 'before' and 'after' if not",
    "processors": [
      {
        "append": {
          "field": "steps",
          "value": "before"
        }
      },
      {
        "terminate": {
          "if": "ctx.error != null"
        }
      },
      {
        "append": {
          "field": "steps",
          "value": "after"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "doc1",
      "_source": {
        "name": "okay",
        "steps": []
      }
    },
    {
      "_index": "index",
      "_id": "doc2",
      "_source": {
        "name": "bad",
        "error": "oh no",
        "steps": []
      }
    }
  ]
}
```

returns something like this:

```
{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_version": "-3",
        "_id": "doc1",
        "_source": {
          "name": "okay",
          "steps": [
            "before",
            "after"
          ]
        },
        "_ingest": {
          "timestamp": "2024-10-04T16:25:20.448881Z"
        }
      }
    },
    {
      "doc": {
        "_index": "index",
        "_version": "-3",
        "_id": "doc2",
        "_source": {
          "name": "bad",
          "error": "oh no",
          "steps": [
            "before"
          ]
        },
        "_ingest": {
          "timestamp": "2024-10-04T16:25:20.448932Z"
        }
      }
    }
  ]
}
```
2024-10-09 16:44:57 +01:00
Simon Cooper
a5c05afe70
Explicitly use ISO weekfields for built-in weekyear date formats (#113787)
This is so it doesn't change when changing JDK version and locale database
2024-10-01 14:10:15 +01:00
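To illustrate why pinning the week definition matters, here is a small sketch using the standard `java.time.temporal.WeekFields` API. The Sunday-start definition is only a stand-in for what a locale database might supply; it is an assumption chosen for illustration, not something taken from the commit.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.WeekFields;

final class WeekYearExample {
    // ISO-8601 definition: weeks start on Monday and week 1 has at least 4 days.
    static int isoWeekYear(LocalDate date) {
        return date.get(WeekFields.ISO.weekBasedYear());
    }

    // Stand-in for a locale-derived definition: weeks start on Sunday, week 1 needs only 1 day.
    static int sundayStartWeekYear(LocalDate date) {
        return date.get(WeekFields.of(DayOfWeek.SUNDAY, 1).weekBasedYear());
    }

    public static void main(String[] args) {
        LocalDate newYearsDay = LocalDate.of(2023, 1, 1); // a Sunday
        System.out.println(isoWeekYear(newYearsDay));         // 2022
        System.out.println(sundayStartWeekYear(newYearsDay)); // 2023
    }
}
```

The same calendar date lands in different week-based years depending on the week definition, which is why a format that relies on the ambient locale can change behaviour when the locale data changes.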
Simon Cooper
40f1e5057e
Add blog links to locale deprecation warnings (#113474) 2024-09-25 14:24:05 +01:00
Simon Cooper
8c81222b66
Change default locale of date processors to ENGLISH (#112796) (#113438)
It is English in the docs, so this fixes the code to match the docs. Note that this really impacts Elasticsearch when run on JDK 23 with the CLDR locale database, as in the COMPAT database pre-23, root and en are essentially the same.
2024-09-24 11:04:08 +01:00
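As a rough illustration of an explicit English locale in date parsing, the sketch below uses the standard `DateTimeFormatter` API. The class name, method name, and pattern are hypothetical and are not taken from the date processor itself.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class EnglishLocaleDateExample {
    // Parses text such as "12 Mar 2024"; month names are interpreted using Locale.ENGLISH
    // rather than whatever the JVM default or root locale would provide.
    static LocalDate parseEnglish(String text) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("d MMM uuuu", Locale.ENGLISH);
        return LocalDate.parse(text, formatter);
    }
}
```

Passing the locale explicitly keeps text-based fields such as month names independent of the locale database in use.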
Simon Cooper
7a81384974
Add deprecation warnings for week-date specifiers (#113247)
Week dates also change on JDK 23, so add a deprecation warning if they are used on COMPAT
2024-09-20 16:49:47 +01:00
Simon Cooper
31d5967d35
Remove use of SPI locale for JDK 23+ (#113182)
On JDK 23 we're just going with what CLDR specifies for week-date calculations - the built-in locales are available for ISO weekdate uses.
2024-09-20 16:48:17 +01:00
Simon Cooper
ceb9deff89
Use deprecation logger for CLDR date format specifiers (#112917)
The addition of the logger requires several updates to tests to deal with the possible warning, or muting if there is no way to specify an allowed (but not mandatory) warning
2024-09-19 15:50:37 +01:00
Mark Vieira
0279c0a909
Add AGPLv3 as a supported license 2024-09-13 14:30:33 -07:00
Mark Vieira
24f33e95e8
Ensure rest compatibility tests are run when appropriate (#112526) 2024-09-05 08:22:48 -07:00
Panos Koutsovasilis
29453cb2ce
fix: support all allowed protocol numbers (#111528)
* fix(CommunityIdProcessor): support all allowed protocol numbers

* fix(CommunityIdProcessor): update documentation
2024-08-26 08:37:40 +03:00
Patrick Doyle
35a375329a
Move Guice to org.elasticsearch.injection.guice (#111723)
* Move files and fix imports & module exports
* Other consequences of moving Guice
2024-08-12 10:47:46 -04:00
Moritz Mack
6ca3ac253a
Track raw ingest and storage size separately to support updates by doc (#111179)
This PR starts tracking raw ingest and storage size separately for updates by document.
This is done by capturing the ingest size when initially parsing the update, and storage size when
parsing the final, merged document.

Additionally this renames DocumentSizeObserver to XContentParserDecorator / XContentMeteringParserDecorator
for better reasoning about the code. More renaming will have to follow.
---------

Co-authored-by: Przemyslaw Gomulka <przemyslaw.gomulka@elastic.co>
2024-08-02 09:26:37 +02:00
David Turner
b8af2a066e
Remove usages of more test-only request builders (#111400)
Deprecates for removal the following methods from `ClusterAdminClient`:

- `prepareSearchShards`
- `preparePutStoredScript`
- `prepareDeleteStoredScript`
- `prepareGetStoredScript`

Also replaces all usages of these methods with more suitable test
utilities. This will permit their removal, and the removal of the
corresponding `RequestBuilder` objects, in a followup.

Relates #107984
2024-07-30 07:33:19 +01:00
Ankita Kumar
5761c4afb5
Reconstruct set of indices in BulkRequest (#110672)
Reconstruct indices set in BulkRequest constructor so that the correct thread pool can be used for forwarded bulk requests. Before this fix, forwarded bulk requests were always using the system_write thread pool because the indices set was empty.

Fixes issue https://github.com/elastic/elasticsearch/issues/102792
2024-07-25 20:30:55 -04:00
kanoshiou
9fbdfcf650
Fix unnecessary mustache template evaluation (#110986)
Addresses the performance issue in the date ingest processor where Mustache template evaluation is unnecessarily applied inside a loop. The timezone and locale templates are now evaluated once before the loop, improving efficiency.

closes #110191
---------
Co-authored-by: Joe Gallo <joegallo@gmail.com>
2024-07-22 15:42:58 -05:00
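The optimisation in the Mustache commit above amounts to hoisting a loop-invariant evaluation out of the loop. A minimal sketch, with the template modelled as a plain `Supplier<String>` (an assumption; the real processor uses its own template types) and hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class HoistedTemplateExample {
    // Combines each format with the locale. The locale template is evaluated once,
    // before the loop, instead of once per format.
    static List<String> describe(List<String> formats, Supplier<String> localeTemplate) {
        String locale = localeTemplate.get(); // loop-invariant, so evaluate it a single time
        List<String> result = new ArrayList<>();
        for (String format : formats) {
            result.add(format + "@" + locale);
        }
        return result;
    }
}
```

The output is unchanged, but the number of template evaluations no longer grows with the number of formats.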
Przemyslaw Gomulka
cf03c66c1f
Infrastructure to meter updates by script for ra-s nontimeseries (#108910)
This commit refactors the metering for billing API so that we can hide the implementation details of DocumentSizeObserver creation, and adds an additional field `originatesFromScript` on IndexRequest.
There will no longer be a need for code that checks whether the request was already parsed in the ingest service or UpdateHelper. This logic will be hidden in the implementation.
2024-07-11 10:49:32 +02:00
Przemyslaw Gomulka
b80b739993
Provide document size reporter with MapperService (#109794)
Instead of indexMode, a mapper service is necessary to reliably determine whether an index is a time series data stream.
2024-06-18 11:40:56 +02:00
Przemyslaw Gomulka
44ae540fd7
Provide the DocumentSizeReporter with index mode (#108947)
In order to decide what logic to apply when reporting a document size, we need to know if an index is in time_series mode. This information is in indexSettings.mode.
2024-06-10 11:48:22 +02:00
Parker Timmins
3662d12c9f
Return ingest byte stats even when 0-valued (#108796)
Change the ingest byte stats to always be returned
whether or not they have a value of 0. Add human readable
form of byte stats. Update docs to reflect changes.
2024-05-20 10:52:16 -05:00
Parker Timmins
c5a3342449
Test pipeline run after reroute (#108693)
Add test confirming that pipelines are run after a reroute.
Fix test of two stage reroute. Delete pipelines during teardown
so as to not break other tests using the same pipeline name.

Co-authored-by: Joe Gallo <joegallo@gmail.com>
2024-05-20 10:02:04 -05:00
Parker Timmins
298c6492a5
Make ingest byte stat names more descriptive (#108786)
Current ingest byte stat fields could easily be confused.
Add more descriptive name to make it clear that they do not
count all docs processed by the pipeline.
2024-05-17 12:03:42 -05:00
Larisa Motova
a01baa3d79
Include doc size info in ingest stats (#107240)
Add ingested_in_bytes and produced_in_bytes stats to pipeline ingest stats.
These track how many bytes are ingested and produced by a given pipeline.
For efficiency, these stats are recorded for the first pipeline to process a 
document. Thus, if a pipeline is called as a final pipeline after a default pipeline,
as a pipeline processor, or after a reroute request, a document will not 
contribute to the stats for that pipeline. If a given pipeline has 0 bytes recorded
for both of these stats, due to not being the first pipeline to run any doc, these
stats will not appear in the pipeline's entry in ingest stats.
2024-05-17 08:53:24 -05:00
Przemyslaw Gomulka
1803320db5
Allow RA metrics to be reported upon parsing completed or accumulated (#108726)
RA metrics can be implemented so that they are reported before the document is indexed (as with a new field being added),
or they can be accumulated and reported upon shard commit as additional metadata.

This commit adds a new method, DocumentSizeReporter#onParsingCompleted, and a DocumentSizeAccumulator that is used to accumulate the size between commits. DocumentSizeReporter can be parametrised with a DocumentSizeAccumulator.

based on #108449
2024-05-17 12:54:18 +02:00
Przemyslaw Gomulka
437e7db499
Refactor reporting of RA metrics to not to be done in TransportShardBulkAction (#108449)
Previously, DocumentSizeReporter reported upon indexing being completed in TransportShardBulkAction#onComplete.
This commit renames the method to onIndexingCompleted and moves that reporting to IndexEngine in the serverless plugin.
This will be followed up in a separate PR that will report in an Engine#index subclass (serverless).
2024-05-16 13:57:06 +02:00
Moritz Mack
b71fc0c561
Migrate remaining usage of skip version in YAML specs to cluster_features (#108055) 2024-05-07 09:42:17 +02:00
Parker Timmins
796b0deeec
Simulate should succeed if ignore_missing_pipeline (#108106)
PipelineProcessors with non-existing pipelines should succeed (as a no-op)
if ignore_missing_pipeline=true. Currently, this does not work when pipelines are
simulated with verbose=true. In this case, an error is returned and no results
are shown for subsequent processors. This change allows following processors
to run, and changes the status from error to error_ignored.
2024-05-02 08:35:20 -05:00
Keith Massey
f21bba6ce5
Making test document larger to reliably force StackOverflowError in GsubProcessorTests (#107724)
The size of the document used to trigger a StackOverflowError in
GsubProcessorTests.testStackOverflow() was just large enough to cause it
on a mac. On the linux CI boxes, occasionally it does not cause a
StackOverflowError, and as a result the test fails. This change makes
the document more than 3x larger, making a StackOverflowError
guaranteed. Closes #107416
2024-04-22 17:06:43 -04:00
Jonathan Buttner
a0693a59fb
Muting testStackOverflow (#107465)
Muting https://github.com/elastic/elasticsearch/issues/107416
2024-04-15 09:14:39 -04:00
Keith Massey
ef16be9303
Catching StackOverflowErrors from bad regexes in GsubProcessor and rethrowing as an Exception (#106851) 2024-04-11 15:59:53 -05:00
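The pattern named in the commit title above, catching a `StackOverflowError` from regex matching and rethrowing it as an exception, might look roughly like this. The class name, the choice of `IllegalArgumentException`, and the message text are assumptions for illustration; the source does not show what GsubProcessor actually throws.

```java
import java.util.regex.Pattern;

final class SafeRegexReplace {
    // Applies the replacement, converting a StackOverflowError (which deeply recursive
    // regex matching can raise on large inputs) into an ordinary exception that callers
    // can handle per document.
    static String replaceAll(Pattern pattern, String input, String replacement) {
        try {
            return pattern.matcher(input).replaceAll(replacement);
        } catch (StackOverflowError e) {
            // Exception type and message are hypothetical choices for this sketch.
            throw new IllegalArgumentException(
                "regex [" + pattern.pattern() + "] caused a stack overflow on input of length " + input.length(), e);
        }
    }
}
```

Converting the `Error` into an exception lets the failure be reported against the offending document instead of propagating as a fatal JVM error.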
Moritz Mack
1f5e04b721
Migrate YAML REST tests to synthetic cluster feature check (#107068)
To simplify the migration away from version based skip checks in YAML specs, 
this PR adds a synthetic version feature `gte_vX.Y.Z` for any version at or before 8.14.0.

New test specs for 8.14 or later are expected to use respective new cluster features,
or a test-only feature supplied via ESRestTestCase#createAdditionalFeatureSpecifications
if sufficient.
2024-04-11 18:22:38 +02:00
Przemyslaw Gomulka
84d61579c1
Do not report document metering on system indices (#107041)
For system indices we don't want to emit metrics. DocumentSizeReporter will be created given an index. It will internally contain a SystemIndices instance that will verify the indexName with isSystemName
2024-04-10 13:04:40 +02:00
Keith Massey
c6a0d4f0d7
Pulling KeyValueProcessor.logAndBuildException() into AbstractProcessor (#106931) 2024-03-29 16:29:16 -05:00
Armin Braun
fc8e2b7897
Introduce Predicate Utilities for always true/false use-cases (#105881)
Just a suggestion. I think this would save us a bit of memory here and
there. We have loads of places where the always true lambdas are used
with `Predicate.or/and`. Found this initially when looking into field
caps performance where we used to heavily compose these but many spots
in security and index name resolution gain from these predicates.
The better toString also helps in some cases at least when debugging.
2024-03-04 14:01:21 +01:00
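A sketch of what an always-true predicate utility of this kind could look like. The class and method names here are hypothetical and the code is not copied from the commit; it only illustrates how a shared instance can avoid allocating wrapper lambdas when composed with `and`/`or`, and how it can offer a readable `toString`.

```java
import java.util.Objects;
import java.util.function.Predicate;

final class PredicateSketch {
    private static final Predicate<Object> ALWAYS_TRUE = new Predicate<Object>() {
        @Override
        public boolean test(Object value) {
            return true;
        }

        @Override
        @SuppressWarnings("unchecked")
        public Predicate<Object> and(Predicate<? super Object> other) {
            // true && x == x, so return the other predicate without wrapping it
            return (Predicate<Object>) Objects.requireNonNull(other);
        }

        @Override
        public Predicate<Object> or(Predicate<? super Object> other) {
            // true || x == true, so the composition is still this instance
            return this;
        }

        @Override
        public String toString() {
            return "Predicate[true]";
        }
    };

    // Returns the shared always-true predicate, typed for the caller.
    @SuppressWarnings("unchecked")
    static <T> Predicate<T> alwaysTrue() {
        return (Predicate<T>) ALWAYS_TRUE;
    }
}
```

An always-false counterpart would mirror this, with `and` returning `this` and `or` returning the other predicate.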
Simon Cooper
bc47d18599
Convert uses of map/set creation using a subclass to static creation methods (#105767) 2024-02-23 16:43:26 +00:00
Niels Bauman
b1fcedd7ae
Fix uri_parts processor behaviour for missing extensions (#105689)
The `uri_parts` processor was behaving incorrectly for
URIs that included a dot in the path but did not have an extension.
Also includes YAML REST tests for the same.
2024-02-22 09:41:11 +01:00
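The class of bug described in the `uri_parts` commit above can be illustrated with a small helper that only looks for an extension in the last path segment. This is a hypothetical sketch; the source does not show the processor's actual logic, and the class and method names are invented for illustration.

```java
final class UriExtensionExample {
    // Returns the extension of the last path segment, or null if that segment has none.
    // A dot in an earlier segment (for example "/foo.bar/baz") is not treated as an extension.
    static String extension(String path) {
        if (path == null) {
            return null;
        }
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        int lastDot = lastSegment.lastIndexOf('.');
        if (lastDot < 0 || lastDot == lastSegment.length() - 1) {
            return null; // no dot, or a trailing dot with nothing after it
        }
        return lastSegment.substring(lastDot + 1);
    }
}
```

A naive approach that takes everything after the last dot in the whole path would report `bar/baz` as the extension of `/foo.bar/baz`, which is the kind of incorrect result the commit message describes.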
Przemyslaw Gomulka
a103e3c7a4
Infrastructure for metering the update requests (#105063)
Update requests that send a document (or part of it) should allow for metering the size of that doc.
Update requests that use a script should not be metered (reported size 0).

This commit follows up on #104859.

The parsing of the update's document is done in UpdateHelper, the same pattern we use to meter parsing in IngestService. If a script is used, the size observed will be 0.
The value observed is then reported in the TransportShardBulkAction and thanks to the value being 0 or positive it will not be metering the modified document again.

This commit also renames getDocumentParsingSupplier to getDocumentParsingProvider (this was accidentally omitted in #104859).
2024-02-19 12:05:51 +01:00
Keith Massey
f0ec294382
Limiting the number of nested pipelines that can be executed (#105428)
Limiting the number of nested pipelines that can be executed within a single pipeline to 100
2024-02-13 16:28:31 -06:00
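A limit like the one in the nested pipelines commit above could be enforced with a simple depth guard. The sketch below assumes the limit applies to nesting depth and uses hypothetical names (`NestedPipelineGuard`, `enter`, `exit`) and a hypothetical exception type; only the value 100 comes from the commit message.

```java
final class NestedPipelineGuard {
    static final int MAX_PIPELINES = 100; // limit stated in the commit message

    private int depth = 0;

    // Called when a pipeline starts executing inside the current one.
    void enter(String pipelineId) {
        if (depth >= MAX_PIPELINES) {
            throw new IllegalStateException(
                "Too many nested pipelines. Cannot have more than " + MAX_PIPELINES
                    + " pipelines, but tried to execute [" + pipelineId + "]");
        }
        depth++;
    }

    // Called when a pipeline finishes, whether or not it succeeded.
    void exit() {
        if (depth > 0) {
            depth--;
        }
    }

    int depth() {
        return depth;
    }
}
```

Pairing every `enter` with an `exit` (for example in a `finally` block) keeps the counter accurate when a nested pipeline fails.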
Keith Massey
c884945a93
Adding executedPipelines to the IngestDocument copy constructor (#105427) 2024-02-13 15:11:47 -06:00
Keith Massey
e2b2232569
Improving the performance of the ingest simulate verbose API (#105265)
This updates the simulate verbose API to run in O(N) (for number of pipelines)
time and memory like the simulate and ingest APIs rather than O(N^2).
2024-02-12 16:04:21 -06:00
Dmitry Cherniachenko
a50e58d99a
Use single-char variant of String.indexOf() where possible (#105205)
* Use single-char variant of String.indexOf() where possible

indexOf(char) is more efficient than searching for the same one-character String.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-02-12 14:14:32 -05:00
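For reference, the two `String.indexOf` variants mentioned in the commit above return the same index for a one-character search; the `char` overload simply avoids the general substring search. The helper below is a hypothetical example, not code from the commit.

```java
final class IndexOfExample {
    // Returns the part of "key=value" before the first '=', or the whole string if there is none.
    static String keyOf(String pair) {
        int separator = pair.indexOf('='); // char overload instead of pair.indexOf("=")
        return separator < 0 ? pair : pair.substring(0, separator);
    }
}
```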