Adding text to clarify that the default pipeline only applies to indexing requests, not updates.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
(cherry picked from commit 4e6e4eab22)
Co-authored-by: Mike Barretta <mike.barretta@elastic.co>
This commit adds support for MPNet based models.
MPNet models differ from BERT style models in that:
- Special tokens are different
- Input to the model doesn't require token positions.
To configure an MPNet tokenizer for your PyTorch MPNet-based model:
```
"tokenization": {
"mpnet": {...}
}
```
The options provided to `mpnet` are the same as the previously supported `bert` configuration.
The migrate to data tiers routing API required ILM to be stopped. This
is fine for "live" runs, but for dry runs this isn't a requirement.
This changes the dry_run to allow the API to run irrespective of the ILM
status.
This fixes the migrate to data tiers routing API to take into account
the scenario where the node attribute configuration for an index is more
accurate than the existing `_tier_preference` configuration.
Previously we would simply remove the node attributes routing if there
was a `_tier_preference` configured for the index.
With this commit, we check whether either the `require.data` or
`include.data` custom routing is colder than the existing `_tier_preference`
configuration (i.e. `cold` vs `data_warm,data_hot`) and update the tier
routing accordingly.
e.g.
{
index.routing.allocation.require.data: "warm",
index.routing.allocation.include.data: "cold",
index.routing.allocation.include._tier_preference: "data_hot"
}
will be migrated to:
{
index.routing.allocation.include._tier_preference: "data_cold,data_warm,data_hot"
}
This also removes the existing invariant that had the `require.data`
configuration take precedence over a possible `include.data`
configuration, and will now migrate the coldest configuration to the
corresponding `_tier_preference`.
e.g.
{
index.routing.allocation.require.data: "warm",
index.routing.allocation.include.data: "cold"
}
will be migrated to:
{
index.routing.allocation.include._tier_preference: "data_cold,data_warm,data_hot"
}
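The "coldest configuration wins" rule described above can be sketched roughly as follows. This is only an illustration in Python, not the actual implementation: the function names (`preference_for`, `migrate`) are hypothetical, and it assumes only the `cold`, `warm` and `hot` attribute values that appear in the examples.

```python
# Illustrative sketch only; names are hypothetical, not Elasticsearch code.
PREFIX = "index.routing.allocation."
# Assumption: only these three tiers are considered, ordered coldest first.
TIERS_COLD_TO_HOT = ["cold", "warm", "hot"]


def preference_for(tier):
    """Build a tier preference that starts at `tier` and falls back to warmer tiers."""
    start = TIERS_COLD_TO_HOT.index(tier)
    return ",".join("data_" + t for t in TIERS_COLD_TO_HOT[start:])


def migrate(settings):
    """Return a copy of `settings` with node attribute routing folded into _tier_preference."""
    attribute_keys = (PREFIX + "require.data", PREFIX + "include.data")
    preference_key = PREFIX + "include._tier_preference"

    # Collect every tier hinted at by the node attribute routings.
    candidates = [settings[k] for k in attribute_keys
                  if settings.get(k) in TIERS_COLD_TO_HOT]

    # The first entry of an existing preference is the tier it targets first.
    existing = settings.get(preference_key)
    if existing:
        first = existing.split(",")[0].strip()
        if first.startswith("data_"):
            first = first[len("data_"):]
        if first in TIERS_COLD_TO_HOT:
            candidates.append(first)

    # Drop the attribute routings and keep everything else.
    migrated = {k: v for k, v in settings.items() if k not in attribute_keys}
    if candidates:
        coldest = min(candidates, key=TIERS_COLD_TO_HOT.index)
        migrated[preference_key] = preference_for(coldest)
    return migrated


example = {
    PREFIX + "require.data": "warm",
    PREFIX + "include.data": "cold",
    PREFIX + "include._tier_preference": "data_hot",
}
print(migrate(example))
```

Applied to either example above, the sketch yields a single `_tier_preference` of `data_cold,data_warm,data_hot`.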
As outlined in elastic/elasticsearch#81604, including the `searchable_snapshot` action in both the hot and cold phases can result in indices not automatically migrating to the cold tier during the cold phase.
This adds a related warning.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Changes:
* Notes that the query string query's `default_field` and `fields` parameters support wildcards.
* Adds an xref to the `index.query.default_field` docs to the `default_field` parameter.
As part of the effort to make the JDBC driver self-sufficient, remove the
ES lib geo dependencies without any replacement.
Currently the JDBC driver takes the WKT text and instantiates a geo
object based on the ES lib geo.
Moving forward the driver will return the WKT string representation
without any conversion letting the user pick the geo library desired.
That can be ES lib geo, jts, spatial4j or others.
Note this is a breaking change.
Relates #80277
We (mostly I) initially advocated for the auto-generated files to
use unique names (names containing a timestamp particle), so that
subsequent invocations of the config step would not conflict with
each other. Moreover, I had hoped that admins would not have to
handle these files directly (that the enrollment process would be used).
However, experience proved otherwise: admins do have to manipulate these
files, and unique configuration names are hard to deal with in scripts
and docs, so this PR is all about using a fixed name for all the
generated files. _Labeling as a bug fix because the feedback is that it
very negatively impacts usability._ Closes
https://github.com/elastic/elasticsearch/issues/81057
This improves reporting of trained model size in the response of the stats API.
In particular, it removes the `model_size_bytes` from the `deployment_stats` section and
replaces it with a top-level `model_size_stats` object that contains:
- `model_size_bytes`: the actual model size
- `required_native_memory_bytes`: the amount of memory required to load a model
In addition, these are now reported for PyTorch models regardless of their deployment state.
Add JwtRealmSettings
Include unit tests and realm security settings documentation. Covers all settings except client authentication mTLS option, and HTTP proxy option.
Refactor Open ID Connect realm to reuse ClaimSetting.java and ClaimParser.java for JWT realm.
This change avoids opening a scroll during reindex/delete_by_query/update_by_query
if the configured max_docs is less than or equal to the number of documents returned by the scroll batch.
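The condition can be sketched as below. This is a minimal Python illustration, not the actual code: `needs_scroll` and its parameters are hypothetical names, and treating an unset max_docs as "keep scrolling" is an assumption.

```python
def needs_scroll(max_docs, batch_hits):
    """Decide whether a scroll is still needed after one batch of hits.

    max_docs:   configured document limit, or None when unset (assumption:
                an unset limit means all matching documents are processed).
    batch_hits: number of documents returned by the scroll batch.
    """
    if max_docs is None:
        return True
    # If one batch already covers max_docs, no scroll is required.
    return max_docs > batch_hits
```

For example, with a batch of 1000 documents, `needs_scroll(10, 1000)` is `False`, while `needs_scroll(5000, 1000)` is `True`.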
After 7.16.2, we'll no longer produce Windows MSI installer packages for Elasticsearch. These packages were previously released in beta and didn't receive widespread adoption.
### Changes:
* Adds a related 7.17 breaking change.
* Adds a related 7.16 deprecation.
* Removes the MSI installation instructions.
* Removes references to the MSI installer.
I plan to port the applicable changes to 8.1 (main), 8.0, 7.17, and 7.16. In the 7.16 ports, I'll leave in the MSI install docs and add related deprecation notes to them instead.
Removes a section covering configuration management tools from the
installation instructions.
After 7.16.2, Elastic will no longer maintain these tools. Previously,
the tools were only supported on a "best effort" basis.
For new jobs, when the analysis config field model_prune_window is not set, use a default value of 30 days or 20 times the bucket span, whichever is greater.
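As a rough illustration of the default described above (a Python sketch; `default_model_prune_window` is a hypothetical name, not the real implementation):

```python
from datetime import timedelta


def default_model_prune_window(bucket_span):
    """Return the greater of 30 days and 20 times the bucket span."""
    return max(timedelta(days=30), 20 * bucket_span)


# A 15 minute bucket span gives 20 * 15m = 5h, so the 30 day floor applies.
print(default_model_prune_window(timedelta(minutes=15)))
# A 2 day bucket span gives 20 * 2d = 40d, which exceeds 30 days.
print(default_model_prune_window(timedelta(days=2)))
```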
Co-authored-by: David Roberts <dave.roberts@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Combines several 8.0 breaking changes for the removal of API endpoints that contain mapping types. These items were separate because we previously organized breaking changes by area.
This is a follow-on to #79162.
This commit deprecates the indices.query.bool.max_clause_count node setting,
and instead configures the maximum clause count for Lucene based on the available
heap and the size of the thread pool.
Closes #46433
This PR adds four new templates that are automatically installed from the Monitoring plugin.
In 8.x, Metricbeat will be writing its data in ECS compliant format, even when used with xpack
mode enabled (stack monitoring). In order to continue to support the legacy data format, new
mappings have been created with the new ECS fields for indexing data, and alias fields for the
legacy format which point to the corresponding ECS fields.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Mat Schaffer <mat@schaffer.me>
* [DOCS] Enroll additional nodes on Docker
* Remove -p option for second node
Co-authored-by: Fabio Busatto <52658645+bytebilly@users.noreply.github.com>
* Rename nodes to align with other Docker docs
* Add elastic network to first node docker run command
* Remove hyphen from node names
Co-authored-by: Fabio Busatto <52658645+bytebilly@users.noreply.github.com>
Updates the remote clusters version compatibility table to include 7.17 and 8.x versions.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
If a node reaches the flood stage watermark then we automatically apply
the `read_only_allow_delete` block to all its indices to prevent any
further growth in data. Users are expected to fix the disk space issue
by adding more space or deleting indices. However some users may prefer
to fix the disk space issues by modifying some of the index settings,
perhaps removing replicas or adjusting an allocation filter to move
shards onto nodes with more space. Today this isn't possible since the
`read_only_allow_delete` block also applies to metadata writes. Blocking
metadata writes isn't necessary to protect against further increases in
disk usage, and makes it harder for users to resolve the disk space
issue, so this commit removes the `METADATA_WRITE` level from the block
definition.
Per issue 60780, the team decided to remove the experimental language from HDR Histogram percentiles and ranks. The feature has been in production for quite some time.
Closes #60780
* [DOCS] Add docs for verifying CA fingerprint
* Update openssl command and explanatory text
* Explain copying CA cert if fingerprint validation isn't possible
* Incorporate new section into the main security config page
* Clarify how cert is used
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
* Split into two, separate sections
* Rename file and update text based on feedback
* Update ref to use new filename
* Remove extra word
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* [DOCS] Remove sentence about security being disabled by default
* Updating introduction
* Remove minimal security page
* Clarify configuring security before starting ES
* Clarifications
* Remove old file
* Add set passwords page
* Update change passwords page, clarify TLS adjustments, and other edits
* Update test
* Minor clarification to intro text
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Previously the ML model snapshot upgrade endpoint did not
provide a way to reliably monitor progress. This could lead
to the upgrade assistant UI thinking that a model snapshot
upgrade had finished when it actually hadn't.
This change adds a new "stats" API that allows external
interested parties to find out the status of each model
snapshot upgrade and which node (if any) each is running on.
Fixes #81519
We say to mark repos as readonly to prevent corruption, but there are
other ways to prevent corruption that people sometimes use instead (e.g.
denying writes at the filesystem/bucket level). It's reasonable to think
that the readonly flag is redundant in that situation, but it's not: they
should still mark the repo as readonly to bypass the cache and
re-read its contents on each access. This commit adds docs to that
effect.
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Reverts an anchor change from #46711.
Previous versions of the docs use the `_shrinking_an_index` anchor for this
section. Preserving that anchor will prevent doc build breaks in future releases.
* Expose the index age in ILM explain output.
ILM already exposes the `age` that ILM will use to transition to the next phase, based on that phase's `min_age`. The `index_age` is based only on the index creation date and it's used to trigger a rollover.
Resolves #64429
Force merge is a very costly action. It may take several hours to run on big indices, but the current force merge REST API does not support the wait_for_completion parameter.
This adds support for the wait_for_completion parameter.
`GET _nodes/stats` returns statistics about indexing pressure for each node.
With this commit `GET _cluster/stats` now returns stats about indexing pressure
computed by aggregating the indexing pressure stats of each node in the
cluster.
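The aggregation amounts to summing each counter across nodes. A minimal Python sketch, assuming every node reports the same nested structure of numeric counters; the field names in the example are simplified placeholders, not the exact response fields:

```python
def merge_into(total, stats):
    """Add every numeric counter in `stats` to `total`, recursing into nested objects."""
    for key, value in stats.items():
        if isinstance(value, dict):
            merge_into(total.setdefault(key, {}), value)
        else:
            total[key] = total.get(key, 0) + value


def aggregate_indexing_pressure(per_node_stats):
    """Sum the indexing pressure stats of all nodes into one cluster-level object."""
    total = {}
    for stats in per_node_stats:
        merge_into(total, stats)
    return total


# Placeholder field names, for illustration only.
nodes = [
    {"memory": {"current": {"all_in_bytes": 100}, "total": {"rejections": 1}}},
    {"memory": {"current": {"all_in_bytes": 250}, "total": {"rejections": 0}}},
]
print(aggregate_indexing_pressure(nodes))
```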
Closes #79788
Today the _Size your shards_ docs focus on shard size and count, but in
fact index count and field count are also important. This commit expands
these docs a bit to cover this observation too.
Today the same-shard allocation decider falls back to checking the
hostname if the node has no host address. In practice nodes will always
have an address so the fallback is dead code. This commit removes that
dead code.
Relates #80702 which will add the ability to distinguish nodes by
hostname regardless of whether they have an address or not, and #80767
which optimizes this area of code - this refactoring should make the
optimization simpler.