10027 commits

Author SHA1 Message Date
Lisa Cawley
2d30bbab21
[DOCS] Semantic search endpoint (#91210) 2022-11-01 09:01:55 -07:00
Abdon Pijpelink
8abd39ab98
Fix typo in stop-tokenfilter.asciidoc (#91128) (#91207)
Since ignore_case is set to true in our custom stop words filter, the matching will be case-insensitive.

(cherry picked from commit a03fba9d77)

Co-authored-by: Siniša Subašić <68671543+sinisuba@users.noreply.github.com>
2022-11-01 15:32:16 +01:00
David Kilfoyle
56397f5d4c
[Docs] Remove feature flag from downsampling page (#91228) 2022-11-01 09:51:22 -04:00
Anthony McGlone
0249d1650f
[DOCS] Update the feature state example in the snapshot and restore docs (#90328) 2022-11-01 10:17:29 +09:00
Lisa Cawley
f0c12cdeea
[DOCS] Fix typo in knn-search.asciidoc (#91206) 2022-10-31 10:07:53 -07:00
Mary Gouseti
d55059afab
Mute reference/cluster/nodes-stats/line_2751 (#91174) 2022-10-28 11:55:53 +02:00
Julie Tibshirani
1b249639f1
Remove experimental marking from kNN search (#91065)
This commit removes the experimental tag from kNN search docs and makes some
docs improvements:
* Add a prominent warning about memory usage in the kNN search guide
* Link to the performance tuning guide from the main guide
* Clarify the memory requirements section in the tuning guide
2022-10-27 18:00:56 +02:00
Yang Wang
882fbe62b5
[Doc] Improve doc for certutil parameter applicability (#91124)
The http command does not take most of the parameters. This PR ensures
it is consistently documented for all parameters.
2022-10-27 09:38:56 +11:00
Frederic Dartayre
fe0036fdbf
Update threadpool.asciidoc (#90098)
* Update threadpool.asciidoc

Starting from 8.0 the value of the `node.processors` setting is bounded by the number of available
processors https://github.com/elastic/elasticsearch/pull/44894

* Update docs/reference/modules/threadpool.asciidoc

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-10-26 14:04:39 -04:00
Craig Taverner
c19f642d94
Refine geo-point and geo-shape docs (#90913)
* Refine geo-point and geo-shape docs

While reviewing the docs for another issue, some deprecated
references to prefix-trees were discovered, leading to interest
in bringing the docs a little more up-to-date.

* Update docs/reference/mapping/types/geo-point.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/mapping/types/geo-shape.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2022-10-26 12:21:34 +02:00
Hendrik Muhs
82a71f6ef6
[Transform] add a health section to transform stats (#90760)
Adds a health section to the transform stats endpoint and implements reporting of assignment, indexing/search and persistence problems, together with an overall health state.
2022-10-25 09:01:21 +02:00
Flavio
83694c37a3
Update docker image (#90730) 2022-10-24 15:52:36 -04:00
Stéphane Campinas
8c44ed1442
Fix itemized list (#90855) 2022-10-24 15:14:17 -04:00
Przemysław Witek
95f484c4fd
[Transform] Expand the docs section regarding mappings deduction in transform's dest index (#91077) 2022-10-24 13:43:22 +02:00
Christos Soulios
1f265eb725
[DOCS] Add release notes for 8.5.0 (#91063)
Forward port PR (#91029) with release notes for version 8.5.0
  - Add release notes for v8.5.0 after BC6 has been cut
2022-10-21 13:17:33 +03:00
Jack Conradson
f28ae4b288
Add support for indexing byte-sized knn vectors (#90774)
This change adds an element_type as an optional mapping parameter for dense vector fields as 
described in #89784. This also adds a byte element_type for dense vector fields that supports storing 
dense vectors using only 8-bits per dimension. This is only supported when the mapping parameter 
index is set to true.

The code follows a similar pattern to our NumberFieldMapper where we have an enum for 
ElementType, and it has methods that DenseVectorFieldType and DenseVectorMapper can delegate to 
to support each available type (just float and byte for now).
2022-10-20 14:45:58 -07:00
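As a rough illustration of #90774, the sketch below builds a mapping body that uses the `element_type` parameter and checks that a vector fits in 8 bits per dimension. The field name `my_vector`, the `dims` value, the `l2_norm` similarity choice, and the helper `is_valid_byte_vector` are hypothetical, and the signed range of -128 to 127 is an assumption that the commit message does not state.

```python
import json

# Hypothetical field name and dimension count, for illustration only.
mapping = {
    "mappings": {
        "properties": {
            "my_vector": {
                "type": "dense_vector",
                "element_type": "byte",  # 8 bits per dimension
                "dims": 3,
                "index": True,           # the commit says byte needs index: true
                "similarity": "l2_norm",
            }
        }
    }
}


def is_valid_byte_vector(vector, dims):
    """Return True if the vector has `dims` integer values in the assumed
    signed 8-bit range (-128..127)."""
    return len(vector) == dims and all(
        isinstance(v, int) and not isinstance(v, bool) and -128 <= v <= 127
        for v in vector
    )


print(json.dumps(mapping, indent=2))
```

A client could run a check like this before indexing so that out-of-range values are caught early rather than rejected by the server.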
Iraklis Psaroudakis
0f4374f4fb
Explain disk headroom settings more in docs (#90763)
Relates to #81406
2022-10-20 18:45:23 +03:00
Roberto Seldner
8e35a6a846
Update documentation with supported IANA numbers (#90531)
Based on this:
https://github.com/elastic/elasticsearch/blob/main/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/CommunityIdProcessor.java#L440-L451
2022-10-19 08:23:11 -05:00
Leaf-Lin
14ef513f2c
[DOCS] Add CCR limitation (#87348)
* Add CCR limitation

closes https://github.com/elastic/elasticsearch/issues/86121

* Add restored index auto follow pattern restriction

https://github.com/elastic/elasticsearch/issues/87055

* Moving content to existing CCR page + several changes

* Remove sections to consolidate limitation information

* Delete separate file

* Remove restored indices from list of things that aren't replicated

Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2022-10-17 16:05:29 -04:00
Lisa Cawley
2dd7732553
[DOCS] Add ML CPP PRs to release notes (#90961) 2022-10-17 09:58:40 -07:00
Mary Gouseti
cfd23d512f
Disk indicator troubleshooting guides (#90504) 2022-10-14 15:24:21 +02:00
Paramdeep Singh
34ff7a9d98
Consolidated Circuit Breaker documentation to include EQL and ML infer (#90809)
Fixes #85851 

Co-authored-by: Iraklis Psaroudakis <kingherc@gmail.com>
2022-10-14 14:33:52 +03:00
Przemyslaw Gomulka
aa922754af
Add known issues entry about date rounding bug (#90721)
add entry to all affected versions

relates #90187
2022-10-14 11:51:02 +02:00
Francisco Fernández Castaño
1a3032beb6
Keep track of average shard write load (#90768)
This commit adds a new field, write_load, into the shard stats. This new stat exposes the average number of write threads used while indexing documents.

Closes #90102
2022-10-13 16:34:45 +02:00
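One way to read "average number of write threads used" in #90768 is as a time-weighted average: total time that write threads were busy, divided by the length of the observation window. The sketch below computes that quantity. It is an assumption about the meaning of `write_load`, not the actual Elasticsearch computation, and the function name and interval representation are hypothetical.

```python
def average_write_load(busy_intervals, window_start, window_end):
    """Average number of busy write threads over [window_start, window_end).

    busy_intervals: list of (start, end) times in seconds, one per write task.
    """
    elapsed = window_end - window_start
    if elapsed <= 0:
        raise ValueError("window must have positive length")
    busy = 0.0
    for start, end in busy_intervals:
        # Only count the part of each task that falls inside the window.
        overlap = min(end, window_end) - max(start, window_start)
        busy += max(0.0, overlap)
    return busy / elapsed


# Two tasks overlapping for half of a 10 second window.
print(average_write_load([(0, 10), (0, 5)], 0, 10))
```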
David Kyle
9e6a784aa5
[ML] Semantic search endpoint (#90450)
Adds a {index}_semantic_search endpoint which first converts the query text into a dense vector
using an NLP text embedding model, then performs a kNN search against an index containing
dense vectors created with the same embedding model.
2022-10-13 13:17:30 +01:00
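The two-step flow described in #90450 (embed the query text, then run a kNN search over vectors produced by the same model) can be sketched in a few lines. Everything here is a toy stand-in: `embed` counts words from a tiny hypothetical vocabulary instead of calling an NLP model, and the search is brute-force cosine similarity rather than the approximate kNN search that Elasticsearch performs.

```python
import math

VOCAB = ["cat", "dog", "fish", "bird"]  # toy vocabulary, not a real model


def embed(text):
    """Toy stand-in for a text embedding model: count vocabulary words."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def semantic_search(query_text, index, k=2):
    """Embed the query with the same function used for the documents,
    then return the ids of the k most similar document vectors."""
    query_vector = embed(query_text)
    scored = [(cosine(query_vector, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


docs = {
    "d1": "the cat chased the bird",
    "d2": "dog and fish",
    "d3": "cat cat cat",
}
index = {doc_id: embed(text) for doc_id, text in docs.items()}
print(semantic_search("where is the cat", index, k=2))
```

The point the sketch makes is that the query and the indexed documents must pass through the same embedding function, otherwise the vectors are not comparable.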
David Roberts
be006e2eee
[ML] Improve categorize_text docs (#90765)
Adds more detail about the meaning of the results
fields of the `categorize_text` aggregation, and
advice about how to use these fields when searching
for messages that match the categories.

Followup to #90723
2022-10-13 10:46:53 +01:00
Julie Tibshirani
f4038b3f15
Add guide for tuning kNN search (#89782)
This 'how to' guide explains performance considerations specific to kNN search.
It takes inspiration from the 'tune for search speed' guide.
2022-10-12 14:53:53 -07:00
Nik Everett
82aeb478db
Synthetic _source: support wildcard field (#90196)
This adds synthetic `_source` support for the `wildcard` field type.
2022-10-12 15:55:13 -04:00
David Kilfoyle
cad87c4d5a
[DOCS] Add Downsampling docs (#88571)
This adds documentation for downsampling of time series indices.
2022-10-12 12:10:16 -04:00
Valeriy Khakhutskyy
95758e88a2
[ML] Explain anomaly score factors (#90675)
This PR surfaces new information about the impact of the factors on the initial anomaly score in the anomaly record:

- single bucket impact is determined by the deviation between actual and typical in the current bucket
- multi-bucket impact is determined by the deviation between actual and typical in the past 12 buckets
- anomaly characteristics are statistical properties of the current anomaly compared to the historical observations
- high variance penalty is the reduction of anomaly score in the buckets with large confidence intervals.
- incomplete bucket penalty is the reduction of anomaly score in the buckets with fewer samples than historically expected.

Additionally, we compute lower- and upper-confidence bounds and the typical value for the anomaly records. This improves the explainability of the cases where the model plot is not activated with only a slight overhead in performance (1-2%).
2022-10-12 16:57:06 +02:00
Luca Cavanna
18942d5b11
Enhance nested depth tracking when parsing queries (#90425)
When parsing queries on the coordinating node, there is currently no way to share state between the different parsing methods (`fromXContent`). The only query that supports a parse context is bool query, which uses the context to track nested depth of queries, added with #66204. Such a nested depth tracking mechanism is not 100% accurate as it tracks bool queries only, while there are many more query types that can hold other queries and hence potentially cause a stack overflow when deeply nested.

This change removes the parsing context that's specific to bool query, introduced with #66204, in favour of generalizing the nested depth tracking to all query types.

The generic tracking is introduced by wrapping the parser and overriding the method that parses named objects through the xcontent registry. Another way would have been to require a context argument when parsing queries, which would mean adding a context argument to all the QueryBuilder#fromXContent static methods. That would be a breaking change for plugins that provide custom queries, hence I went for trying out a different approach.

One aspect that this change requires and introduces is the distinction between parsing a top level query (which will wrap the parser, or it would create the context if we had one), as opposed to parsing an inner query, which goes ahead with the given parser and context. We already have this distinction as we have two different static methods in `AbstractQueryBuilder` but in practice only bool query makes the distinction being the only context-aware query.

In addition to generalizing tracking nested depth when parsing queries, we should be able to adopt this same strategy to track queries usage as part of #90176.

Given that the depth check is now more restrictive, as it counts all compound queries and not only bool, we have decided to raise the default limit to `30` to ensure that users are not going to hit the limit due to this change.
2022-10-12 15:15:06 +02:00
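The idea in #90425 of counting every nested query, not only bool, and rejecting anything past a limit can be illustrated with a small recursive walk. This is a Python sketch over already-parsed JSON, not the Java parser wrapper the commit describes. The `QUERY_NAMES` registry is a hypothetical toy, and whether a depth of exactly 30 is accepted or rejected is an assumption.

```python
QUERY_NAMES = {"bool", "dis_max", "constant_score", "match", "term"}  # toy registry


def max_query_depth(node, limit=30, depth=0):
    """Return the deepest query nesting level found in `node`.

    Every key that names a known query adds one level, whatever its type.
    Raises ValueError as soon as the nesting exceeds `limit`.
    """
    if isinstance(node, dict):
        deepest = depth
        for key, value in node.items():
            child_depth = depth + 1 if key in QUERY_NAMES else depth
            if child_depth > limit:
                raise ValueError(f"query nested deeper than {limit} levels")
            deepest = max(deepest, max_query_depth(value, limit, child_depth))
        return deepest
    if isinstance(node, list):
        return max((max_query_depth(v, limit, depth) for v in node), default=depth)
    return depth


def nest(levels):
    """Wrap a term query in `levels` bool queries."""
    query = {"term": {"field": "value"}}
    for _ in range(levels):
        query = {"bool": {"must": [query]}}
    return query


print(max_query_depth(nest(2)))
```

Because the counter no longer cares which query type it is inside, a `dis_max` wrapped in a `bool` counts as two levels, which is why the commit raises the default limit.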
Albert Zaharovits
73cdc7b80a
DOC CCR Disaster recovery does not handle Security configuration (#85522)
We do not support and don't plan to support disaster recovery arrangements
where Security configuration is replicated between the production and the
disaster recovery cluster because the cluster-local Security APIs assume
exclusive write on the .security system index.
2022-10-12 13:53:53 +03:00
Ed Savage
f355787165
[ML] Allow overriding timestamp field to null in file structure finder (#90764)
Use a magic value of "null" for the timestamp format override to indicate to the analysis that a timestamp is not expected in the input text. This should improve performance when analysing delimited, ndjson or xml formatted text files that don't contain timestamps. For semi-structured text files without timestamps the magic value indicates to treat the text as single line log messages.

see #55219
2022-10-12 09:08:25 +01:00
Dimitris Athanasiou
16bfc550ea
[ML] Add api to update trained model deployment number_of_allocations (#90728)
This commit adds a new API that users can call with:

```
POST _ml/trained_models/{model_id}/deployment/_update
{
  "number_of_allocations": 4
}
```

This allows a user to update the number of allocations for a deployment
that is `started`.

If the allocations are increased we rebalance and let the assignment
planner find how to allocate the additional allocations.

If the allocations are decreased we cannot use the assignment planner.
Instead, we implement the reduction in a new class `AllocationReducer`
that tries to reduce the allocations so that:

  1. availability zone balance is maintained
  2. assignments that can be completely stopped are preferred to release memory
2022-10-12 10:04:23 +03:00
David Roberts
bfccd20155
[ML] Add a regex to the output of the categorize_text aggregation (#90723)
The new `regex` field in `categorize_text` output is created in
the same way as the `regex` field that appears in the category
definitions created by anomaly detection jobs that do categorization.

It consists of the terms that occur in the same order for every
message that matches the category, separated with a `.+?` wildcard.
It therefore matches the category messages and enforces the order
of the terms that occurred in the same order for all messages used
to create the category.

It is not recommended to use the regex as the primary mechanism for
searching for the original documents that were categorized. Search
using a regular expression is very slow. Instead the terms of the
category should be used to search for matching documents, as a
terms search can use the inverted index and hence be much faster.
However, there may be situations where it is useful to use the
`regex` field to test whether a small set of messages that have not
been indexed match the category.
2022-10-10 11:41:16 +01:00
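The construction described in #90723, ordered common terms joined by a `.+?` wildcard, can be sketched as follows. The helper name `category_regex` and the sample terms are hypothetical, and the real `regex` field may include details the commit message does not mention, such as leading or trailing wildcards. As the commit suggests, this kind of check suits a small set of unindexed messages, not a search over an index.

```python
import re


def category_regex(ordered_terms):
    """Join the terms shared by every message in a category with `.+?`."""
    return ".+?".join(re.escape(term) for term in ordered_terms)


pattern = category_regex(["Node", "started"])

messages = ["Node 1 started", "started Node 1", "Node shutting down"]
# re.search is used because the pattern is not anchored at either end.
matches = [m for m in messages if re.search(pattern, m)]
print(pattern, matches)
```

Only the first message matches: the second has the terms in the wrong order and the third lacks one of them, which is the ordering guarantee the commit describes.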
Andrei Dan
b55f5fd77b
Rename the fields reported under details by the disk indicator (#90717)
Currently, we report the count of affected nodes and indices as part of
the disk indicator using a leaky abstraction. Namely, we use the status
we assign to nodes internally based on their disk usage (red,
yellow, green, unknown).

However, these statuses don't have an explicit meaning outside the
implementation details, e.g. a red node would probably convey it's a node
experiencing disk issues but not what kind.

This proposes being explicit in what we return to our health API users
e.g.
```
"details": {
  "indices_with_readonly_block": 2,
  "nodes_with_enough_disk_space": 0,
  "nodes_with_unknown_disk_status": 0,
  "nodes_over_high_watermark": 0,
  "nodes_over_flood_watermark": 2
}
```
2022-10-10 11:30:03 +01:00
Lisa Cawley
db2882cbb5
[DOCS] Add links to clear trained model deployment cache API (#90727) 2022-10-06 10:10:55 -07:00
Brandon Morelli
ced1447db0
docs: update fleet/agent pipeline docs (#90659)
* docs: update fleet/agent pipeline docs

* Apply suggestions from code review

Co-authored-by: Adam Locke <adam.locke@elastic.co>

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-10-05 13:06:58 -07:00
Jack Conradson
8b0d0716d1
Add profiling and documentation for dfs phase (#90536)
Adds profiling statistics for the dfs phase, and adds documentation for both the dfs phase profiling 
and kNN profiling.

Closes #89713
2022-10-05 09:54:36 -07:00
Lisa Cawley
c5c1f46fba
[DOCS] Remove coming tag from 8.4.3 release notes (#90683) 2022-10-05 08:05:41 -07:00
Ievgen Degtiarenko
4d6d979e0e
Deprecate state field in /_cluster/reroute response (#90399) 2022-10-05 08:18:27 +02:00
Lee Hinman
4fe9fc488c
Deprecate 'remove_binary' default of false for ingest attachment processor (#90460)
This commit adds deprecation warning for when the `remove_binary`
setting is unset. In the future we want to change the default to `true`
(it is currently `false`), so this will let a user know they should be
explicit about setting this to ensure the behavior does not change in a
future (breaking) release.

Relates to #86014
2022-10-04 01:04:40 +10:30
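The pattern in #90460, warning when a setting is left unset while keeping the current default, looks roughly like this. It is a Python stand-in for behaviour that lives in the Java attachment processor. The function name and the warning text are hypothetical, and only the `remove_binary` key and its current default of `false` come from the commit.

```python
import warnings


def resolve_remove_binary(processor_config):
    """Return the effective remove_binary value for a processor config dict.

    Warns when the setting is absent, since the default may change later.
    """
    if "remove_binary" not in processor_config:
        warnings.warn(
            "remove_binary is unset; set it explicitly to keep the current "
            "behaviour if the default changes",
            DeprecationWarning,
            stacklevel=2,
        )
        return False  # current default per the commit message
    return bool(processor_config["remove_binary"])


# An explicit value resolves without a warning.
print(resolve_remove_binary({"field": "data", "remove_binary": True}))
```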
Adam Locke
52feb5540b
[Doc] Release notes for v8.4.3 (#90443) (#90538)
* Update docs for v8.4.3 release

* Update release highlights for 8.4.3 version.

* Update docs/reference/release-notes/8.4.3.asciidoc

Co-authored-by: Adam Locke <adam.locke@elastic.co>

* Update docs/reference/release-notes/8.4.3.asciidoc

Co-authored-by: Adam Locke <adam.locke@elastic.co>

* Update docs/reference/release-notes/highlights.asciidoc

Co-authored-by: Adam Locke <adam.locke@elastic.co>

* Make link external type

* Update release notes to include #90319 PR after creating BC2.

* Remove release note for #90302

* Minor grammar fix

Co-authored-by: Adam Locke <adam.locke@elastic.co>
(cherry picked from commit 25a196f214)

# Conflicts:
#	docs/reference/release-notes.asciidoc
#	docs/reference/release-notes/highlights.asciidoc

Co-authored-by: Slobodan Adamović <slobodanadamovic@users.noreply.github.com>
2022-09-30 16:10:26 -04:00
Iraklis Psaroudakis
ad8d064de5
Redefine section on sizing data nodes (#90274)
Now that we have the estimated field mappings heap overhead
in nodes stats, we can refer to them in the guide for sizing
data nodes appropriately.

Relates to #86639
2022-09-30 12:37:21 +03:00
debadair
ef7aaec815
[DOCS] Fixed footnote. Closes #89403 (#90541) 2022-09-29 16:48:02 -07:00
David Turner
c95fb2f3e8
More opinionated docs about http.max_content_length (#90500)
Adds to the docs a note that the `100mb` default for
`http.max_content_length` is the recommended maximum, along with
suggestions for what to do when hitting this limit.
2022-09-29 16:07:38 +01:00
David Kyle
17579ae1af
[ML] Add stat for non cache hit inference time (#90464) 2022-09-29 12:18:27 +01:00
Christos Soulios
1c0f064599
[DOCS] Add release notes for 8.5.0 (#90394) (#90485)
Forward port https://github.com/elastic/elasticsearch/pull/90394 to main
branch

> Added release notes and highlights for release 8.5.0
>
> Relates to #90202 and #90201
2022-09-29 04:44:43 +09:30
Craig Taverner
4c5d24610f
Centroid aggregation for cartesian points and shapes (#89216)
Added Cartesian support for centroid aggregation

* First draft of cartesian-centroid docs
  However, this is largely a duplicate of geo-centroid docs since they are essentially identical behaviour. We should consider merging them.
* Work on isAggregatable caused a minor logic conflict. When that work was done, Point and Shape were not aggregatable, but now they are.
2022-09-28 17:14:30 +02:00
David Roberts
d9ea080d10
[ML] Release native inference functionality as beta (#90418)
Previously this functionality was tech preview (aka experimental).
This PR changes it to beta.
2022-09-28 11:09:02 +01:00