Commit graph

11966 commits

Author SHA1 Message Date
kosabogi
ff926182f1
Adds text_similarity task type to inference processor documentation (#113517) (#113612) 2024-09-27 00:38:48 +10:00
István Zoltán Szabó
cf55728d77
[DOCS] Improves semantic text documentation. (#113606) (#113611) 2024-09-27 00:34:37 +10:00
Kostas Krikellas
8539876663
[8.x] Apply auto-flattening to subobjects: auto (#113584)
* Apply auto-flattening to `subobjects: auto` (#112092)

* Introduce mode `subobjects=auto` for objects

* Update docs/changelog/110524.yaml

* compilation error

* tests and fixes

* refactor

* spotless

* more tests

* fix nested objects

* fix test

* update fetch test

* add QA coverage

* update tests

* update tests

* update tests

* Apply auto-flattening to `subobjects: auto`

* Update docs/changelog/112092.yaml

* sync

* dont flatten subobjects auto

* refine test

* fix path for nested flattened objects and dynamic

* document `subobjects: auto`

* Apply suggestions from code review

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* comment updates

* restore indentation in comment

* update comment

* update comment

* update comment

* update comment

* rename isFlattenable

* add test for dynamic template

* fix copy_to and noop dynamic updates

* tests

* update comment

* fix tests

* update cluster feature in yaml test

* address comments

---------

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
(cherry picked from commit fffe8844e9)

# Conflicts:
#	modules/dot-prefix-validation/build.gradle
#	rest-api-spec/build.gradle

* Update build.gradle
2024-09-26 20:17:11 +10:00
Keith Massey
7870e2dbe2
Adding component template substitutions to the simulate ingest API (#113276) (#113567) 2024-09-26 07:32:13 +10:00
Nik Everett
0e6bbb0bea
ESQL: TOP support for strings (#113183) (#113408)
Adds support to the `TOP` aggregation for `keyword` and `text` field
types.

Closes #109849
2024-09-26 05:18:20 +10:00
Liam Thompson
fd775317ed
[DOCS] Create Elasticsearch basics section, refactor quickstarts section (#112436) (#113543)
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
2024-09-26 01:55:19 +10:00
David Kyle
cc3caa228d
[ML] Add deployment threading details and memory usage to telemetry (#113099) (#113516)
Adds deployment threading options and a new memory section reporting
the memory usage for each of the ml features
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
2024-09-25 22:35:09 +10:00
Sam Xiao
ce0681225b
ILM: Add total_shards_per_node setting to searchable snapshot (#112972) (#113493)
Allows setting the index-level total_shards_per_node in the SearchableSnapshot action of ILM to remediate hot spots in shard allocation for searchable snapshot indices.

Closes #112261
2024-09-25 06:53:11 +10:00
Nik Everett
f8dbda3f98
ESQL: Document esql_worker threadpool (#113203) (#113459)
Documents the thread pool we use to run ESQL operations. It's the same
size and queue depth as the `search` thread pool.

Closes #113130
2024-09-24 23:28:53 +10:00
Salvatore Campagna
9a21ca63d7
LogsDB data migration integration testing (#112710) (#113448)
Here we test reindexing logsdb indices, creating and restoring
snapshots. Note that logsdb uses synthetic source and restoring
source only snapshots fails due to missing _source.

(cherry picked from commit f7880ae85f)
2024-09-24 21:47:09 +10:00
Salvatore Campagna
bac208a154
Introduce an ignore_above index-level setting (#113121) (#113414)
Here we introduce a new index-level setting, `ignore_above`, similar to what we have
for `ignore_malformed`. The setting will apply to all `keyword`, `wildcard` and `flattened`
fields. Each field mapping will still be allowed to override the index-level setting using a
mapping-level `ignore_above` value.
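
The override rule described above can be sketched as follows. This is a minimal illustration, not the Elasticsearch implementation: the function names `effective_ignore_above` and `is_indexed` are hypothetical, and the sketch assumes the usual `ignore_above` meaning that strings longer than the limit are not indexed.

```python
from typing import Optional


def effective_ignore_above(index_level: Optional[int],
                           field_level: Optional[int] = None) -> Optional[int]:
    """Hypothetical helper: a mapping-level value wins over the index-level one."""
    return field_level if field_level is not None else index_level


def is_indexed(value: str, limit: Optional[int]) -> bool:
    """Assumption: a string is skipped when it is longer than the limit."""
    return limit is None or len(value) <= limit


# Index-level limit of 5, one field overriding it with 10.
default_limit = effective_ignore_above(index_level=5)
overridden_limit = effective_ignore_above(index_level=5, field_level=10)

print(is_indexed("elastic", default_limit))     # longer than 5
print(is_indexed("elastic", overridden_limit))  # within 10
```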

(cherry picked from commit 208a1fe571)
2024-09-24 06:16:08 +10:00
Liam Thompson
cbe2faead8
fix typos (#113329) (#113400)
Co-authored-by: Pm Ching <41728178+pionCham@users.noreply.github.com>
2024-09-24 02:05:57 +10:00
Liam Thompson
9ae2439a34
[DOCS] Add snippet tests to retriever API docs (#113289) (#113396) 2024-09-24 01:25:32 +10:00
Felix Barnsteiner
0aebbb53d6
[8.x] Add support for multi-value dimensions (#112645) (#113369)
* Add support for multi-value dimensions (#112645)

Closes https://github.com/elastic/elasticsearch/issues/110387

Having this in now affords us not having to introduce version checks in
the ES exporter later. We can simply use the same serialization logic
for metric attributes as we do for other signals. This also enables us
to properly map `*.ip` fields to the ip field type as ip fields
containing a list of IPs are not converted to a comma-separated list.

(cherry picked from commit 8d223cbf7a)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java

* Remove skip test for 8.x

This was just needed for 8.x to 9.0 compatibility tests
2024-09-24 00:05:25 +10:00
Carlos Delgado
c3a2b19993
[8.x] ESQL QSTR function (#112590) (#113189) 2024-09-23 10:13:53 +02:00
Martijn van Groningen
b82afc1377
Added known issue entry for synthetic source bug. (#113269) (#113358)
Added known issue entry for synthetic source bug.

Co-authored-by: Oleksandr Kolomiiets <olkolomiiets@gmail.com>
2024-09-23 15:34:22 +10:00
Iraklis Psaroudakis
6f63a4e08b
fix a couple of docs typos (#112901) (#113283)
Co-authored-by: Pm Ching <41728178+pionCham@users.noreply.github.com>
2024-09-21 01:59:14 +10:00
Bogdan Pintea
6e314d6c2a
ESQL: Align year diffing to the rest of the units in DATE_DIFF: chronological (#113103) (#113258)
This will correct/switch "year" unit diffing from the current integer
subtraction to a chronological subtraction. Consequently, two dates are (at
least) one year apart now if (at least) a full calendar year separates
them. The previous implementation simply subtracted the year part of the
dates.

Note: this diverges from ES SQL's implementation of the same function,
which itself is aligned with MS SQL's implementation, which works
equivalently to an integer subtraction.
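
The difference between the two approaches can be sketched in Python. This is an illustration under assumptions, not the ESQL code: the function names are hypothetical, and plain `date` values stand in for the datetime values that `DATE_DIFF` operates on.

```python
from datetime import date


def year_diff_integer(start: date, end: date) -> int:
    """Previous behavior as described: subtract the year parts only."""
    return end.year - start.year


def year_diff_chronological(start: date, end: date) -> int:
    """Count full calendar years separating the two dates."""
    if end < start:
        return -year_diff_chronological(end, start)
    years = end.year - start.year
    # The last year only counts once the anniversary has been reached.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


new_years_eve = date(2023, 12, 31)
new_years_day = date(2024, 1, 1)

print(year_diff_integer(new_years_eve, new_years_day))        # one day apart, year parts differ
print(year_diff_chronological(new_years_eve, new_years_day))  # no full year has elapsed
```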

Fixes #112482.

(cherry picked from commit f7ff00f645)
2024-09-20 22:31:36 +10:00
István Zoltán Szabó
ec109dd9bf
[DOCS] Fixes adaptive_allocations examples (#113248) (#113254)
Co-authored-by: Jan Kuipers <148754765+jan-elastic@users.noreply.github.com>
2024-09-20 19:54:50 +10:00
Alexander Spies
afae6b2d46
ESQL Docs: Mention Discover/Field Statistics in OOM known issue in 8.15.1/2 (#113196) (#113243) 2024-09-20 19:02:58 +10:00
Pius
83ea259b7c
Update 8.15.1.asciidoc (#113221) (#113240) 2024-09-20 18:29:25 +10:00
Liam Thompson
8a5d68e390
[DOCS] Fix reranking IA, move retrievers to search api overview (#112949) (#113193) 2024-09-20 01:49:59 +10:00
Simon Cooper
ceb9deff89
Use deprecation logger for CLDR date format specifiers (#112917)
The addition of the logger requires several updates to tests to deal with the possible warning, or muting if there is no way to specify an allowed (but not mandatory) warning
2024-09-19 15:50:37 +01:00
David Turner
2ba00c2810
Mention full-cluster restart in initial_master_node docs (#112986) (#113166)
Apparently some users consider "node is restarting" not to apply to a
full-cluster restart. This commit further clarifies that you must not
set `cluster.initial_master_nodes` in a full cluster restart.
2024-09-19 20:06:24 +10:00
Stef Nestor
c9764b86c4
(Doc+) Update example SAML blog for Okta (#112934) (#113098) 2024-09-18 20:30:59 +10:00
István Zoltán Szabó
2f7ad416ce
[DOCS] Gives more details to the load data step of the semantic search tutorials (#113088) (#113094)
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2024-09-18 20:03:10 +10:00
Nik Everett
50703cb988
ESQL: Add known issue to 8.15 docs for OOM due to wide index pattern (#112926) (#112959)
Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
2024-09-16 16:30:42 -04:00
István Zoltán Szabó
08ce93eb01
[DOCS] Fixes response object indentation in semantic text tutorial (#112915) (#112920) 2024-09-16 23:05:28 +10:00
Martijn van Groningen
47be9bb975
[8.x] Remove zstd feature flag for index codec best compression. (#112665) (#112857)
* Remove zstd feature flag for index codec best compression. (#112665)

ZStandard was added via #103374 a few months ago to snapshot builds of Elasticsearch only, and benchmark results have shown that zstd is a better trade-off than deflate when index.codec is set to best_compression.

This change removes the feature flag for ZStandard stored field compression for indices with index.codec set to best_compression.

* Update docs/changelog/112857.yaml

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-09-14 02:48:37 +10:00
István Zoltán Szabó
0c428b4923
[DOCS] Improves inference workflow tutorial. (#112870) (#112879) 2024-09-14 02:01:16 +10:00
István Zoltán Szabó
21183609ae
[DOCS] Simplifies semantic_text tutorial by removing copy_to field (#112864) (#112876) 2024-09-14 01:16:51 +10:00
Benjamin Trent
96cc923dcf
Update knn-query.asciidoc (#112833) (#112868) 2024-09-13 21:40:59 +10:00
Stef Nestor
d039c280af
(Docs+) Flush out Resource+Task troubleshooting (#111773) (#112818)
* (Docs+) Flush out Resource+Task troubleshooting

---------

Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
Co-authored-by: David Turner <david.turner@elastic.co>
2024-09-13 00:09:58 +10:00
István Zoltán Szabó
5b2d861f5a
[DOCS] Rework semantic search main page (#112452) (#112808)
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
2024-09-12 22:30:38 +10:00
Stef Nestor
b9662b505b
(Doc+) Inference Pipeline ignores Mapping Analyzers (#112522) (#112776)
* (Doc+) Inference Pipeline ignores Mapping Analyzers

From internal Dev feedback (will cross-link after): this documents that inference processors within ingest pipelines run before mapping analyzers, effectively ignoring them. If users want analyzers to take effect, they need to select the ingest pipeline processor equivalent of the analyzer and run it earlier in the flow than the inference processor.

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2024-09-12 08:30:07 +10:00
Stef Nestor
98aa3f2572
(Doc+) Terminating Exit Codes (#112530) (#112774)
👋 howdy, team! Mini PR to cross-replicate [this knowledge article](https://support.elastic.co/knowledge/6610ba83) about Elasticsearch's exit codes which expands [this ES doc section](https://www.elastic.co/guide/en/elasticsearch/reference/master/stopping-elasticsearch.html#fatal-errors).
2024-09-12 07:58:18 +10:00
Stef Nestor
a5dad1fe0e
(Doc+) CAT Nodes default columns (#112715) (#112772)
👋 howdy, team!

1. Related to https://github.com/elastic/dev/issues/2631, highlights that customers are usually seeking `heap.percent` instead of `ram.percent`
2. Aligns the claimed "(Default)" columns in the doc with what is returned for a v8.15.1 test cluster
2024-09-12 07:54:54 +10:00
David Turner
f79fb8c25b
Introduce repository integrity verification API (#112348)
Adds an API which scans all the metadata (and optionally the raw data)
in a snapshot repository to look for corruptions or other
inconsistencies.

Closes https://github.com/elastic/elasticsearch/issues/52622 Closes
ES-8560
2024-09-11 23:17:59 +10:00
Mary Gouseti
c1a2d390ef
Update data stream lifecycle telemetry to track global retention (#112451)
Currently, the data stream lifecycle telemetry has the following
structure:

```
{
....
  "data_lifecycle" : {
    "available": true,
    "enabled": true,
    "count": 0,
    "default_rollover_used": true,
    "retention": {
        "minimum_millis": 0,
        "maximum_millis": 0,
        "average_millis": 0.0
    }
  }....
```

In the snippet above you can see that we track:

- The number of data streams managed by the data stream lifecycle by `count`
- If the default rollover has been overwritten by `default_rollover_used`
- The min, max and average of the `data_retention` configured on a data stream level.

In this PR we propose the following extension:

```
....
  "data_lifecycle" : {
    "available": true,
    "enabled": true,
    "count": 0,
    "default_rollover_used": true,
    "effective_retention": { #https://github.com/elastic/dev/issues/2537
        "retained_data_streams": 5,
        "minimum_millis": 0, # Only if retained data streams > 1
        "maximum_millis": 0,
        "average_millis": 0.0
    },
    "data_retention": {
        "configured_data_streams": 5,
        "minimum_millis": 0, # Only if retained data streams > 1
        "maximum_millis": 0,
        "average_millis": 0.0
    },
    "global_retention": {
      "default": {
        "defined": true/false,
        "affected_data_streams": 0,
        "millis": 0
      },
      "max": {
        "defined": true/false,
        "affected_data_streams": 0,
        "millis": 0
      }
    }
```

With this extension we are tracking:

- The number of data streams managed by the data stream lifecycle by `count`
- If the default rollover has been overwritten by `default_rollover_used`
- The min, max and average of the `data_retention` configured on a data stream level and the number of data streams that have it configured. We add the min, max and avg only if there are data streams with data retention configuration to avoid messing with the stats in a dashboard.
- The min, max and average of the `effective_retention` and the number of data streams that are retained. We add the min, max and avg only if there are retained data streams to avoid messing with the stats in a dashboard.
- Global retention stats: whether they are defined, the number of affected data streams, and the actual value.
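
The "only add min, max and avg when there is something to report" rule can be sketched like this. It is a hypothetical illustration, not the telemetry code: `retention_stats` and its parameters are made-up names, and the sketch follows the prose above by omitting the aggregates when no data stream contributes a value.

```python
from typing import Dict, List, Union

Number = Union[int, float]


def retention_stats(retention_millis: List[int], count_key: str) -> Dict[str, Number]:
    """Hypothetical helper: always report the count, add aggregates only if non-empty."""
    stats: Dict[str, Number] = {count_key: len(retention_millis)}
    if retention_millis:
        stats["minimum_millis"] = min(retention_millis)
        stats["maximum_millis"] = max(retention_millis)
        stats["average_millis"] = sum(retention_millis) / len(retention_millis)
    return stats


# No data stream has retention configured: only the count is present.
empty = retention_stats([], "configured_data_streams")

# Two retained data streams: the aggregates are included.
retained = retention_stats([1000, 3000], "retained_data_streams")

print(empty)
print(retained)
```

Omitting the fields, rather than reporting zeros, keeps an empty cluster from dragging down minimum or average values in a dashboard.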

The above metrics allow us to answer questions like:

- How many data streams are affected by global retention.
- How big the difference is between the longest data retention and the max global retention.
- How much the effective retention diverges from the data retention; this shows the impact of the global retention.
2024-09-11 18:31:04 +10:00
kosabogi
6e7a9eb629
Adds details on Kibana access credentials (#112695) 2024-09-11 06:20:08 +02:00
Stanislav Malyshev
9081a951d5
Implement CCS telemetry export as part of _cluster/stats (#112310)
* Implement CCS telemetry export as part of _cluster/stats
2024-09-10 09:31:06 -06:00
István Zoltán Szabó
3636797cfe
[DOCS] Adds path params and available task types to the PUT inference page (#112696)
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2024-09-10 12:43:08 +02:00
Liam Thompson
c2d4543250
[DOCS][101] Refine mappings + documents/indices overviews (#112545) 2024-09-10 12:17:10 +02:00
kosabogi
6da37658ad
#101472 Updates default index.translog.flush_threshold_size value (#112052)
* #101472 Updates default index.translog.flush_threshold_size value

* Update docs/reference/index-modules/translog.asciidoc

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

* Updates the description

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2024-09-10 11:08:53 +02:00
Fang Xing
e8569356ea
[ES|QL] explicit cast a string literal to date_period and time_duration in arithmetic operations (#109193)
Explicitly cast to date_period and time_duration in arithmetic operations
2024-09-09 14:56:43 -04:00
Nik Everett
ef3a5a1385
ESQL: Fix CASE when conditions are multivalued (#112401)
When CASE hits a multivalued field it was previously either crashing on
fold or evaluating it to the first value. Since booleans are loaded in
sorted order from lucene that *usually* means `false`. This changes the
behavior to line up with the rest of ESQL - now multivalued fields are
treated as `false` with a warning.

You might say "hey wait! multivalued fields usually become `null`, not
`false`!". Yes, dear reader, you are right. Very right. But! `CASE`'s
contract is to immediately convert its values into `true` or `false`
using the standard boolean tri-valued logic. So `null` just becomes
`false` immediately. This is how PostgreSQL, MySQL, and SQLite behave:

```
> SELECT CASE WHEN null THEN 1 ELSE 2 END;
2
```

They turn that `null` into a false. And we're right there with them.
Except, of course, that we're turning `[false, false]` and the like into
`null` first. See!? It's consistent. Consistently confusing, but sane at
least.

The warning message just says "treating multivalued field as false"
rather than explaining all of that.
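
The SQLite behavior quoted above can be reproduced with Python's built-in `sqlite3` module. The `condition_is_true` function that follows it is only a hypothetical model of the rule described in this message (multivalued becomes `null`, `null` becomes `false`); it is not the ESQL implementation.

```python
import sqlite3

# SQLite treats a NULL condition as false, so the ELSE branch is taken.
conn = sqlite3.connect(":memory:")
row = conn.execute("SELECT CASE WHEN NULL THEN 1 ELSE 2 END").fetchone()
conn.close()
print(row[0])


def condition_is_true(value):
    """Hypothetical model of how a CASE condition is resolved."""
    if isinstance(value, list):
        if len(value) == 1:
            value = value[0]  # a single value behaves like a scalar
        else:
            value = None      # multivalued: treated as null (ESQL also warns)
    # Tri-valued logic collapses to two values: only a real true passes.
    return value is True


print(condition_is_true([False, False]))
print(condition_is_true([True, True]))
print(condition_is_true(True))
```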

This also fixes up a few of CASE's docs which I noticed were kind of
busted while working on CASE. I think the docs generation is having a
lot of trouble with CASE so I've manually hacked the right thing into
place, but we should figure out a better solution eventually.

Closes #112359
2024-09-10 02:32:19 +10:00
Nik Everett
cf98240950 Update docs from code 2024-09-09 11:28:31 -04:00
David Turner
1977a715df
Add links to network disconnect troubleshooting (#112330)
Makes the docs added in #112271 more discoverable.
2024-09-10 00:59:39 +10:00
Chris Berkhout
fbaeb1ee61
[ESQL] Add SPACE function (#112350)
Adds the SPACE(number) function, which is equivalent to REPEAT(" ", number).
2024-09-09 21:41:35 +10:00
Iván Cea Fontenla
fc2760cfd4
ESQL: mv_median_absolute_deviation function (#112055)
- Added mv_median_absolute_deviation function
- Added possibility of having a fixed param in Multivalue "ascending" functions
- Add surrogate to MedianAbsoluteDeviation

### Calculations used to avoid overflows
First, a quick recap of how the MAD is calculated:
1. Sort values, and get the median
2. Calculate the difference between each value and the median (`abs(median - value)`)
3. Sort the differences, and get their median

Calculating a MAD may overflow when calculating the differences (Step 2), given the type is a signed number, as the difference is a positive value, with potentially the same value as `POSITIVE_MAX - NEGATIVE_MIN`.
To solve this, some types are upcast as follows:
- Int: Stored as longs, simple approach
- Long: Stored as longs, but switched to unsigned long representation when calculating the differences
- Unsigned long: No effect; the resulting range is the same
- Doubles: Nothing. If the values overflow to +/-infinity, they're left that way, as we'll just use those outliers to sort
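
The three steps and the range argument can be sketched in Python. This is an illustration only: Python integers do not overflow, so the 64-bit limits are written out as constants to show where a signed long would run out of room. The function names are hypothetical, and averaging the two middle values for even-length input is an assumption of this sketch.

```python
LONG_MIN = -(2**63)
LONG_MAX = 2**63 - 1
ULONG_MAX = 2**64 - 1


def _median(ordered):
    """Median of an already sorted, non-empty list."""
    if not ordered:
        raise ValueError("median of empty input")
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Assumption: even-length input averages the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_absolute_deviation(values):
    ordered = sorted(values)                        # step 1: sort, take the median
    med = _median(ordered)
    diffs = sorted(abs(med - v) for v in ordered)   # step 2: absolute differences
    return _median(diffs)                           # step 3: median of the differences


print(median_absolute_deviation([1, 2, 3, 4, 100]))

# Worst case for signed longs: the differences no longer fit in a signed
# 64-bit value, but they do fit in the unsigned 64-bit range.
extremes = [LONG_MIN, 0, LONG_MAX]
largest_diff = max(abs(0 - v) for v in extremes)
print(largest_diff > LONG_MAX, largest_diff <= ULONG_MAX)
print(median_absolute_deviation(extremes) == LONG_MAX)
```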

Closes https://github.com/elastic/elasticsearch/issues/111590
2024-09-09 10:04:25 +02:00