diff --git a/docs/reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md b/docs/reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md index 76a05164b125..4913043d1eff 100644 --- a/docs/reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md +++ b/docs/reference/aggregations/_snippets/search-aggregations-metrics-percentile-aggregation-approximate.md @@ -1,6 +1,6 @@ There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`. -Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated. +Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated. The algorithm used by the `percentile` metric is called TDigest (introduced by Ted Dunning in [Computing Accurate Quantiles using T-Digests](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)). diff --git a/docs/reference/aggregations/pipeline.md b/docs/reference/aggregations/pipeline.md index 86dcfc365618..b1f7386e393f 100644 --- a/docs/reference/aggregations/pipeline.md +++ b/docs/reference/aggregations/pipeline.md @@ -230,7 +230,7 @@ An alternate syntax is supported to cope with aggregations or metrics which have ## Dealing with gaps in the data [gap-policy] -Data in the real world is often noisy and sometimes contains **gaps** — places where data simply doesn’t exist. This can occur for a variety of reasons, the most common being: +Data in the real world is often noisy and sometimes contains **gaps** — places where data simply doesn’t exist. This can occur for a variety of reasons, the most common being: * Documents falling into a bucket do not contain a required field * There are no documents matching the query for one or more buckets diff --git a/docs/reference/aggregations/search-aggregations-bucket-composite-aggregation.md b/docs/reference/aggregations/search-aggregations-bucket-composite-aggregation.md index bca1a0dce406..03fc0477eb01 100644 --- a/docs/reference/aggregations/search-aggregations-bucket-composite-aggregation.md +++ b/docs/reference/aggregations/search-aggregations-bucket-composite-aggregation.md @@ -606,7 +606,7 @@ PUT my-index-000001 ``` 1. This index is sorted by `username` first then by `timestamp`. -2. …​ in ascending order for the `username` field and in descending order for the `timestamp` field.1. could be used to optimize these composite aggregations: +2. … in ascending order for the `username` field and in descending order for the `timestamp` field.1. 
could be used to optimize these composite aggregations: diff --git a/docs/reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md b/docs/reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md index 7169b1d28da1..6ff26774e3ec 100644 --- a/docs/reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md +++ b/docs/reference/aggregations/search-aggregations-bucket-datehistogram-aggregation.md @@ -679,7 +679,7 @@ Response: } ``` -The response will contain all the buckets having the relative day of the week as key : 1 for Monday, 2 for Tuesday…​ 7 for Sunday. +The response will contain all the buckets having the relative day of the week as key : 1 for Monday, 2 for Tuesday… 7 for Sunday. diff --git a/docs/reference/aggregations/search-aggregations-bucket-rare-terms-aggregation.md b/docs/reference/aggregations/search-aggregations-bucket-rare-terms-aggregation.md index 3c17abd2db2b..07ac4e2c8f91 100644 --- a/docs/reference/aggregations/search-aggregations-bucket-rare-terms-aggregation.md +++ b/docs/reference/aggregations/search-aggregations-bucket-rare-terms-aggregation.md @@ -7,7 +7,7 @@ mapped_pages: # Rare terms aggregation [search-aggregations-bucket-rare-terms-aggregation] -A multi-bucket value source based aggregation which finds "rare" terms — terms that are at the long-tail of the distribution and are not frequent. Conceptually, this is like a `terms` aggregation that is sorted by `_count` ascending. As noted in the [terms aggregation docs](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), actually ordering a `terms` agg by count ascending has unbounded error. Instead, you should use the `rare_terms` aggregation +A multi-bucket value source based aggregation which finds "rare" terms — terms that are at the long-tail of the distribution and are not frequent. Conceptually, this is like a `terms` aggregation that is sorted by `_count` ascending. As noted in the [terms aggregation docs](/reference/aggregations/search-aggregations-bucket-terms-aggregation.md#search-aggregations-bucket-terms-aggregation-order), actually ordering a `terms` agg by count ascending has unbounded error. Instead, you should use the `rare_terms` aggregation ## Syntax [_syntax_3] @@ -117,7 +117,7 @@ This does, however, mean that a large number of results can be returned if chose ## Max Bucket Limit [search-aggregations-bucket-rare-terms-aggregation-max-buckets] -The Rare Terms aggregation is more liable to trip the `search.max_buckets` soft limit than other aggregations due to how it works. The `max_bucket` soft-limit is evaluated on a per-shard basis while the aggregation is collecting results. It is possible for a term to be "rare" on a shard but become "not rare" once all the shard results are merged together. This means that individual shards tend to collect more buckets than are truly rare, because they only have their own local view. This list is ultimately pruned to the correct, smaller list of rare terms on the coordinating node…​ but a shard may have already tripped the `max_buckets` soft limit and aborted the request. +The Rare Terms aggregation is more liable to trip the `search.max_buckets` soft limit than other aggregations due to how it works. The `max_bucket` soft-limit is evaluated on a per-shard basis while the aggregation is collecting results. It is possible for a term to be "rare" on a shard but become "not rare" once all the shard results are merged together. 
This means that individual shards tend to collect more buckets than are truly rare, because they only have their own local view. This list is ultimately pruned to the correct, smaller list of rare terms on the coordinating node… but a shard may have already tripped the `max_buckets` soft limit and aborted the request. When aggregating on fields that have potentially many "rare" terms, you may need to increase the `max_buckets` soft limit. Alternatively, you might need to find a way to filter the results to return fewer rare values (smaller time span, filter by category, etc), or re-evaluate your definition of "rare" (e.g. if something appears 100,000 times, is it truly "rare"?) diff --git a/docs/reference/aggregations/search-aggregations-bucket-significanttext-aggregation.md b/docs/reference/aggregations/search-aggregations-bucket-significanttext-aggregation.md index 7eaf9b06bd8b..6175926e8002 100644 --- a/docs/reference/aggregations/search-aggregations-bucket-significanttext-aggregation.md +++ b/docs/reference/aggregations/search-aggregations-bucket-significanttext-aggregation.md @@ -21,7 +21,7 @@ Re-analyzing *large* result sets will require a lot of time and memory. It is re * Suggesting "H5N1" when users search for "bird flu" to help expand queries * Suggesting keywords relating to stock symbol $ATI for use in an automated news classifier -In these cases the words being selected are not simply the most popular terms in results. The most popular words tend to be very boring (*and, of, the, we, I, they* …​). The significant words are the ones that have undergone a significant change in popularity measured between a *foreground* and *background* set. If the term "H5N1" only exists in 5 documents in a 10 million document index and yet is found in 4 of the 100 documents that make up a user’s search results that is significant and probably very relevant to their search. 5/10,000,000 vs 4/100 is a big swing in frequency. +In these cases the words being selected are not simply the most popular terms in results. The most popular words tend to be very boring (*and, of, the, we, I, they* … ). The significant words are the ones that have undergone a significant change in popularity measured between a *foreground* and *background* set. If the term "H5N1" only exists in 5 documents in a 10 million document index and yet is found in 4 of the 100 documents that make up a user’s search results that is significant and probably very relevant to their search. 5/10,000,000 vs 4/100 is a big swing in frequency. ## Basic use [_basic_use_2] diff --git a/docs/reference/aggregations/search-aggregations-bucket-terms-aggregation.md b/docs/reference/aggregations/search-aggregations-bucket-terms-aggregation.md index f84deb2d6297..598d8f761752 100644 --- a/docs/reference/aggregations/search-aggregations-bucket-terms-aggregation.md +++ b/docs/reference/aggregations/search-aggregations-bucket-terms-aggregation.md @@ -696,7 +696,7 @@ When aggregating on multiple indices the type of the aggregated field may not be ### Failed Trying to Format Bytes [_failed_trying_to_format_bytes] -When running a terms aggregation (or other aggregation, but in practice usually terms) over multiple indices, you may get an error that starts with "Failed trying to format bytes…​". This is usually caused by two of the indices not having the same mapping type for the field being aggregated. 
+When running a terms aggregation (or other aggregation, but in practice usually terms) over multiple indices, you may get an error that starts with "Failed trying to format bytes… ". This is usually caused by two of the indices not having the same mapping type for the field being aggregated. **Use an explicit `value_type`** Although it’s best to correct the mappings, you can work around this issue if the field is unmapped in one of the indices. Setting the `value_type` parameter can resolve the issue by coercing the unmapped field into the correct type. diff --git a/docs/reference/aggregations/search-aggregations-metrics-boxplot-aggregation.md b/docs/reference/aggregations/search-aggregations-metrics-boxplot-aggregation.md index d8997339a954..5966978a3bc2 100644 --- a/docs/reference/aggregations/search-aggregations-metrics-boxplot-aggregation.md +++ b/docs/reference/aggregations/search-aggregations-metrics-boxplot-aggregation.md @@ -126,7 +126,7 @@ GET latency/_search 1. Compression controls memory usage and approximation error -The TDigest algorithm uses a number of "nodes" to approximate percentiles — the more nodes available, the higher the accuracy (and large memory footprint) proportional to the volume of data. The `compression` parameter limits the maximum number of nodes to `20 * compression`. +The TDigest algorithm uses a number of "nodes" to approximate percentiles — the more nodes available, the higher the accuracy (and large memory footprint) proportional to the volume of data. The `compression` parameter limits the maximum number of nodes to `20 * compression`. Therefore, by increasing the compression value, you can increase the accuracy of your percentiles at the cost of more memory. Larger compression values also make the algorithm slower since the underlying tree data structure grows in size, resulting in more expensive operations. The default compression value is `100`. diff --git a/docs/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md b/docs/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md index 9d1695300774..b45a126bda75 100644 --- a/docs/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md +++ b/docs/reference/aggregations/search-aggregations-metrics-percentile-aggregation.md @@ -60,7 +60,7 @@ By default, the `percentile` metric will generate a range of percentiles: `[ 1, As you can see, the aggregation will return a calculated value for each percentile in the default range. If we assume response times are in milliseconds, it is immediately obvious that the webpage normally loads in 10-720ms, but occasionally spikes to 940-980ms. -Often, administrators are only interested in outliers — the extreme percentiles. We can specify just the percents we are interested in (requested percentiles must be a value between 0-100 inclusive): +Often, administrators are only interested in outliers — the extreme percentiles. We can specify just the percents we are interested in (requested percentiles must be a value between 0-100 inclusive): ```console GET latency/_search @@ -177,7 +177,7 @@ GET latency/_search There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`. -Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. 
To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated. +Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated. The algorithm used by the `percentile` metric is called TDigest (introduced by Ted Dunning in [Computing Accurate Quantiles using T-Digests](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)). @@ -222,7 +222,7 @@ GET latency/_search 1. Compression controls memory usage and approximation error -The TDigest algorithm uses a number of "nodes" to approximate percentiles — the more nodes available, the higher the accuracy (and large memory footprint) proportional to the volume of data. The `compression` parameter limits the maximum number of nodes to `20 * compression`. +The TDigest algorithm uses a number of "nodes" to approximate percentiles — the more nodes available, the higher the accuracy (and large memory footprint) proportional to the volume of data. The `compression` parameter limits the maximum number of nodes to `20 * compression`. Therefore, by increasing the compression value, you can increase the accuracy of your percentiles at the cost of more memory. Larger compression values also make the algorithm slower since the underlying tree data structure grows in size, resulting in more expensive operations. The default compression value is `100`. diff --git a/docs/reference/aggregations/search-aggregations-metrics-weight-avg-aggregation.md b/docs/reference/aggregations/search-aggregations-metrics-weight-avg-aggregation.md index ea83567f7fdf..6829795ae7bc 100644 --- a/docs/reference/aggregations/search-aggregations-metrics-weight-avg-aggregation.md +++ b/docs/reference/aggregations/search-aggregations-metrics-weight-avg-aggregation.md @@ -9,7 +9,7 @@ mapped_pages: A `single-value` metrics aggregation that computes the weighted average of numeric values that are extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents. -When calculating a regular average, each datapoint has an equal "weight" …​ it contributes equally to the final value. Weighted averages, on the other hand, weight each datapoint differently. The amount that each datapoint contributes to the final value is extracted from the document. +When calculating a regular average, each datapoint has an equal "weight" … it contributes equally to the final value. Weighted averages, on the other hand, weight each datapoint differently. The amount that each datapoint contributes to the final value is extracted from the document. As a formula, a weighted average is the `∑(value * weight) / ∑(weight)` diff --git a/docs/reference/elasticsearch-plugins/analysis-kuromoji-analyzer.md b/docs/reference/elasticsearch-plugins/analysis-kuromoji-analyzer.md index 6b1b96e841f6..bed37ed53763 100644 --- a/docs/reference/elasticsearch-plugins/analysis-kuromoji-analyzer.md +++ b/docs/reference/elasticsearch-plugins/analysis-kuromoji-analyzer.md @@ -22,7 +22,7 @@ It supports the `mode` and `user_dictionary` settings from [`kuromoji_tokenizer` The `kuromoji_tokenizer` tokenizer uses characters from the MeCab-IPADIC dictionary to split text into tokens. The dictionary includes some full-width characters, such as `o` and `f`. 
If a text contains full-width characters, the tokenizer can produce unexpected tokens. -For example, the `kuromoji_tokenizer` tokenizer converts the text `Culture of Japan` to the tokens `[ culture, o, f, japan ]` instead of `[ culture, of, japan ]`. +For example, the `kuromoji_tokenizer` tokenizer converts the text `Culture of Japan` to the tokens `[ culture, o, f, japan ]` instead of `[ culture, of, japan ]`. To avoid this, add the [`icu_normalizer` character filter](/reference/elasticsearch-plugins/analysis-icu-normalization-charfilter.md) to a custom analyzer based on the `kuromoji` analyzer. The `icu_normalizer` character filter converts full-width characters to their normal equivalents. diff --git a/docs/reference/elasticsearch-plugins/integrations.md b/docs/reference/elasticsearch-plugins/integrations.md index 56e94ffc8c5d..c8aa783593e1 100644 --- a/docs/reference/elasticsearch-plugins/integrations.md +++ b/docs/reference/elasticsearch-plugins/integrations.md @@ -31,7 +31,7 @@ Integrations are not plugins, but are external tools or modules that make it eas * [Ingest processor template](https://github.com/spinscale/cookiecutter-elasticsearch-ingest-processor): A template for creating new ingest processors. * [Kafka Standalone Consumer (Indexer)](https://github.com/BigDataDevs/kafka-elasticsearch-consumer): Kafka Standalone Consumer [Indexer] will read messages from Kafka in batches, processes(as implemented) and bulk-indexes them into Elasticsearch. Flexible and scalable. More documentation in above GitHub repo’s Wiki. * [Scrutineer](https://github.com/Aconex/scrutineer): A high performance consistency checker to compare what you’ve indexed with your source of truth content (e.g. DB) -* [FS Crawler](https://github.com/dadoonet/fscrawler): The File System (FS) crawler allows to index documents (PDF, Open Office…​) from your local file system and over SSH. (by David Pilato) +* [FS Crawler](https://github.com/dadoonet/fscrawler): The File System (FS) crawler allows to index documents (PDF, Open Office… ) from your local file system and over SSH. (by David Pilato) * [Elasticsearch Evolution](https://github.com/senacor/elasticsearch-evolution): A library to migrate elasticsearch mappings. * [PGSync](https://pgsync.com): A tool for syncing data from Postgres to Elasticsearch. diff --git a/docs/reference/elasticsearch/configuration-reference/networking-settings.md b/docs/reference/elasticsearch/configuration-reference/networking-settings.md index 85f2287d5a0a..d657d514095d 100644 --- a/docs/reference/elasticsearch/configuration-reference/networking-settings.md +++ b/docs/reference/elasticsearch/configuration-reference/networking-settings.md @@ -428,7 +428,7 @@ The `transport.compress` setting always configures local cluster request compres ### Response compression [response-compression] -The compression settings do not configure compression for responses. {{es}} will compress a response if the inbound request was compressed—​even when compression is not enabled. Similarly, {{es}} will not compress a response if the inbound request was uncompressed—​even when compression is enabled. The compression scheme used to compress a response will be the same scheme the remote node used to compress the request. +The compression settings do not configure compression for responses. {{es}} will compress a response if the inbound request was compressed— even when compression is not enabled. 
Similarly, {{es}} will not compress a response if the inbound request was uncompressed— even when compression is enabled. The compression scheme used to compress a response will be the same scheme the remote node used to compress the request. diff --git a/docs/reference/elasticsearch/index-settings/sorting-conjunctions.md b/docs/reference/elasticsearch/index-settings/sorting-conjunctions.md index ad37b309c0be..f2d5a92f353e 100644 --- a/docs/reference/elasticsearch/index-settings/sorting-conjunctions.md +++ b/docs/reference/elasticsearch/index-settings/sorting-conjunctions.md @@ -5,7 +5,7 @@ mapped_pages: # Use index sorting to speed up conjunctions [index-modules-index-sorting-conjunctions] -Index sorting can be useful in order to organize Lucene doc ids (not to be conflated with `_id`) in a way that makes conjunctions (a AND b AND …​) more efficient. In order to be efficient, conjunctions rely on the fact that if any clause does not match, then the entire conjunction does not match. By using index sorting, we can put documents that do not match together, which will help skip efficiently over large ranges of doc IDs that do not match the conjunction. +Index sorting can be useful in order to organize Lucene doc ids (not to be conflated with `_id`) in a way that makes conjunctions (a AND b AND … ) more efficient. In order to be efficient, conjunctions rely on the fact that if any clause does not match, then the entire conjunction does not match. By using index sorting, we can put documents that do not match together, which will help skip efficiently over large ranges of doc IDs that do not match the conjunction. This trick only works with low-cardinality fields. A rule of thumb is that you should sort first on fields that both have a low cardinality and are frequently used for filtering. The sort order (`asc` or `desc`) does not matter as we only care about putting values that would match the same clauses close to each other. diff --git a/docs/reference/elasticsearch/index-settings/sorting.md b/docs/reference/elasticsearch/index-settings/sorting.md index 567aa876d226..9c123c002087 100644 --- a/docs/reference/elasticsearch/index-settings/sorting.md +++ b/docs/reference/elasticsearch/index-settings/sorting.md @@ -35,7 +35,7 @@ PUT my-index-000001 ``` 1. This index is sorted by the `date` field -2. …​ in descending order. +2. … in descending order. It is also possible to sort the index by more than one field: @@ -64,7 +64,7 @@ PUT my-index-000001 ``` 1. This index is sorted by `username` first then by `date` -2. …​ in ascending order for the `username` field and in descending order for the `date` field. +2. … in ascending order for the `username` field and in descending order for the `date` field. Index sorting supports the following settings: diff --git a/docs/reference/elasticsearch/mapping-reference/array.md b/docs/reference/elasticsearch/mapping-reference/array.md index c53fd182abec..5650d98198ff 100644 --- a/docs/reference/elasticsearch/mapping-reference/array.md +++ b/docs/reference/elasticsearch/mapping-reference/array.md @@ -26,7 +26,7 @@ When adding a field dynamically, the first value in the array determines the fie Arrays with a mixture of data types are *not* supported: [ `10`, `"some string"` ] -An array may contain `null` values, which are either replaced by the configured [`null_value`](/reference/elasticsearch/mapping-reference/null-value.md) or skipped entirely. An empty array `[]` is treated as a missing field — a field with no values. 
+An array may contain `null` values, which are either replaced by the configured [`null_value`](/reference/elasticsearch/mapping-reference/null-value.md) or skipped entirely. An empty array `[]` is treated as a missing field — a field with no values. Nothing needs to be pre-configured in order to use arrays in documents, they are supported out of the box: diff --git a/docs/reference/elasticsearch/mapping-reference/eager-global-ordinals.md b/docs/reference/elasticsearch/mapping-reference/eager-global-ordinals.md index 65f46ec49250..2265661d0c19 100644 --- a/docs/reference/elasticsearch/mapping-reference/eager-global-ordinals.md +++ b/docs/reference/elasticsearch/mapping-reference/eager-global-ordinals.md @@ -41,7 +41,7 @@ PUT my-index-000001/_mapping } ``` -When `eager_global_ordinals` is enabled, global ordinals are built when a shard is [refreshed](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-refresh) — Elasticsearch always loads them before exposing changes to the content of the index. This shifts the cost of building global ordinals from search to index-time. Elasticsearch will also eagerly build global ordinals when creating a new copy of a shard, as can occur when increasing the number of replicas or relocating a shard onto a new node. +When `eager_global_ordinals` is enabled, global ordinals are built when a shard is [refreshed](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-refresh) — Elasticsearch always loads them before exposing changes to the content of the index. This shifts the cost of building global ordinals from search to index-time. Elasticsearch will also eagerly build global ordinals when creating a new copy of a shard, as can occur when increasing the number of replicas or relocating a shard onto a new node. Eager loading can be disabled at any time by updating the `eager_global_ordinals` setting: diff --git a/docs/reference/elasticsearch/mapping-reference/flattened.md b/docs/reference/elasticsearch/mapping-reference/flattened.md index f9866ac3c348..80d45d0ed8ee 100644 --- a/docs/reference/elasticsearch/mapping-reference/flattened.md +++ b/docs/reference/elasticsearch/mapping-reference/flattened.md @@ -90,7 +90,7 @@ Currently, flattened object fields can be used with the following query types: When querying, it is not possible to refer to field keys using wildcards, as in `{ "term": {"labels.time*": 1541457010}}`. Note that all queries, including `range`, treat the values as string keywords. Highlighting is not supported on `flattened` fields. -It is possible to sort on a flattened object field, as well as perform simple keyword-style aggregations such as `terms`. As with queries, there is no special support for numerics — all values in the JSON object are treated as keywords. When sorting, this implies that values are compared lexicographically. +It is possible to sort on a flattened object field, as well as perform simple keyword-style aggregations such as `terms`. As with queries, there is no special support for numerics — all values in the JSON object are treated as keywords. When sorting, this implies that values are compared lexicographically. Flattened object fields currently cannot be stored. It is not possible to specify the [`store`](/reference/elasticsearch/mapping-reference/mapping-store.md) parameter in the mapping. 
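For instance, a rough sketch of the keyword-style sort and `terms` aggregation described above, assuming a `flattened` field named `labels` with illustrative `priority` and `release` keys (these names are not part of this change):

```console
GET my-index-000001/_search
{
  "sort": [
    { "labels.priority": "asc" } <1>
  ],
  "aggs": {
    "releases": {
      "terms": { "field": "labels.release" } <2>
    }
  }
}
```

1. The `labels` field and its keys are hypothetical; values are compared lexicographically, so `"10"` sorts before `"2"`.
2. All values are treated as keywords, even ones that look numeric.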
diff --git a/docs/reference/elasticsearch/mapping-reference/mapping-date-format.md b/docs/reference/elasticsearch/mapping-reference/mapping-date-format.md index 603b2422ac7c..7827dc55a574 100644 --- a/docs/reference/elasticsearch/mapping-reference/mapping-date-format.md +++ b/docs/reference/elasticsearch/mapping-reference/mapping-date-format.md @@ -23,7 +23,7 @@ PUT my-index-000001 } ``` -Many APIs which support date values also support [date math](/reference/elasticsearch/rest-apis/common-options.md#date-math) expressions, such as `now-1m/d` — the current time, minus one month, rounded down to the nearest day. +Many APIs which support date values also support [date math](/reference/elasticsearch/rest-apis/common-options.md#date-math) expressions, such as `now-1m/d` — the current time, minus one month, rounded down to the nearest day. ## Custom date formats [custom-date-formats] diff --git a/docs/reference/elasticsearch/mapping-reference/mapping-index-field.md b/docs/reference/elasticsearch/mapping-reference/mapping-index-field.md index 1b6f4f6cf006..07d0be350fe7 100644 --- a/docs/reference/elasticsearch/mapping-reference/mapping-index-field.md +++ b/docs/reference/elasticsearch/mapping-reference/mapping-index-field.md @@ -57,7 +57,7 @@ GET index_1,index_2/_search 4. Accessing the `_index` field in scripts -The `_index` field is exposed virtually — it is not added to the Lucene index as a real field. This means that you can use the `_index` field in a `term` or `terms` query (or any query that is rewritten to a `term` query, such as the `match`, `query_string` or `simple_query_string` query), as well as `prefix` and `wildcard` queries. However, it does not support `regexp` and `fuzzy` queries. +The `_index` field is exposed virtually — it is not added to the Lucene index as a real field. This means that you can use the `_index` field in a `term` or `terms` query (or any query that is rewritten to a `term` query, such as the `match`, `query_string` or `simple_query_string` query), as well as `prefix` and `wildcard` queries. However, it does not support `regexp` and `fuzzy` queries. Queries on the `_index` field accept index aliases in addition to concrete index names. diff --git a/docs/reference/elasticsearch/mapping-reference/parent-join.md b/docs/reference/elasticsearch/mapping-reference/parent-join.md index 365b10b90aa6..23d0e34441b9 100644 --- a/docs/reference/elasticsearch/mapping-reference/parent-join.md +++ b/docs/reference/elasticsearch/mapping-reference/parent-join.md @@ -139,7 +139,7 @@ The only case where the join field makes sense is if your data contains a one-to ## Searching with parent-join [_searching_with_parent_join] -The parent-join creates one field to index the name of the relation within the document (`my_parent`, `my_child`, …​). +The parent-join creates one field to index the name of the relation within the document (`my_parent`, `my_child`, … ). It also creates one field per parent/child relation. The name of this field is the name of the `join` field followed by `#` and the name of the parent in the relation. So for instance for the `my_parent` → [`my_child`, `another_child`] relation, the `join` field creates an additional field named `my_join_field#my_parent`. 
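As a rough illustration of how these generated fields can be used — assuming the `my_parent` → `my_child` relation above and a hypothetical parent document with id `1` — child documents can be grouped by their parent:

```console
GET my-index-000001/_search
{
  "query": {
    "parent_id": { <1>
      "type": "my_child",
      "id": "1"
    }
  },
  "aggs": {
    "parents": {
      "terms": {
        "field": "my_join_field#my_parent", <2>
        "size": 10
      }
    }
  }
}
```

1. Matches `my_child` documents whose parent is the (hypothetical) document `1`.
2. The generated relation field holds the parent id, so a `terms` aggregation on it groups child documents by parent.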
diff --git a/docs/reference/elasticsearch/rest-apis/api-conventions.md b/docs/reference/elasticsearch/rest-apis/api-conventions.md index dcddd77cde79..574e96e03ac9 100644 --- a/docs/reference/elasticsearch/rest-apis/api-conventions.md +++ b/docs/reference/elasticsearch/rest-apis/api-conventions.md @@ -55,7 +55,7 @@ For example, the following `traceparent` value would produce the following `trac ## GET and POST requests [get-requests] -A number of {{es}} GET APIs—​most notably the search API—​support a request body. While the GET action makes sense in the context of retrieving information, GET requests with a body are not supported by all HTTP libraries. All {{es}} GET APIs that require a body can also be submitted as POST requests. Alternatively, you can pass the request body as the [`source` query string parameter](#api-request-body-query-string) when using GET. +A number of {{es}} GET APIs— most notably the search API— support a request body. While the GET action makes sense in the context of retrieving information, GET requests with a body are not supported by all HTTP libraries. All {{es}} GET APIs that require a body can also be submitted as POST requests. Alternatively, you can pass the request body as the [`source` query string parameter](#api-request-body-query-string) when using GET. ## Cron expressions [api-cron-expressions] @@ -120,10 +120,10 @@ All elements are required except for `year`. See [Cron special characters](#cron : Increment. Use to separate values when specifying a time increment. The first value represents the starting point, and the second value represents the interval. For example, if you want the schedule to trigger every 20 minutes starting at the top of the hour, you could specify `0/20` in the `minutes` field. Similarly, specifying `1/5` in `day_of_month` field will trigger every 5 days starting on the first day of the month. `L` -: Last. Use in the `day_of_month` field to mean the last day of the month—​day 31 for January, day 28 for February in non-leap years, day 30 for April, and so on. Use alone in the `day_of_week` field in place of `7` or `SAT`, or after a particular day of the week to select the last day of that type in the month. For example `6L` means the last Friday of the month. You can specify `LW` in the `day_of_month` field to specify the last weekday of the month. Avoid using the `L` option when specifying lists or ranges of values, as the results likely won’t be what you expect. +: Last. Use in the `day_of_month` field to mean the last day of the month— day 31 for January, day 28 for February in non-leap years, day 30 for April, and so on. Use alone in the `day_of_week` field in place of `7` or `SAT`, or after a particular day of the week to select the last day of that type in the month. For example `6L` means the last Friday of the month. You can specify `LW` in the `day_of_month` field to specify the last weekday of the month. Avoid using the `L` option when specifying lists or ranges of values, as the results likely won’t be what you expect. `W` -: Weekday. Use to specify the weekday (Monday-Friday) nearest the given day. As an example, if you specify `15W` in the `day_of_month` field and the 15th is a Saturday, the schedule will trigger on the 14th. If the 15th is a Sunday, the schedule will trigger on Monday the 16th. If the 15th is a Tuesday, the schedule will trigger on Tuesday the 15th. 
However if you specify `1W` as the value for `day_of_month`, and the 1st is a Saturday, the schedule will trigger on Monday the 3rd—​it won’t jump over the month boundary. You can specify `LW` in the `day_of_month` field to specify the last weekday of the month. You can only use the `W` option when the `day_of_month` is a single day—​it is not valid when specifying a range or list of days. +: Weekday. Use to specify the weekday (Monday-Friday) nearest the given day. As an example, if you specify `15W` in the `day_of_month` field and the 15th is a Saturday, the schedule will trigger on the 14th. If the 15th is a Sunday, the schedule will trigger on Monday the 16th. If the 15th is a Tuesday, the schedule will trigger on Tuesday the 15th. However if you specify `1W` as the value for `day_of_month`, and the 1st is a Saturday, the schedule will trigger on Monday the 3rd— it won’t jump over the month boundary. You can specify `LW` in the `day_of_month` field to specify the last weekday of the month. You can only use the `W` option when the `day_of_month` is a single day— it is not valid when specifying a range or list of days. `#` : Nth XXX day in a month. Use in the `day_of_week` field to specify the nth XXX day of the month. For example, if you specify `6#1`, the schedule will trigger on the first Friday of the month. Note that if you specify `3#5` and there are not 5 Tuesdays in a particular month, the schedule won’t trigger that month. diff --git a/docs/reference/elasticsearch/rest-apis/common-options.md b/docs/reference/elasticsearch/rest-apis/common-options.md index b51918052fee..44e9abb580cd 100644 --- a/docs/reference/elasticsearch/rest-apis/common-options.md +++ b/docs/reference/elasticsearch/rest-apis/common-options.md @@ -23,7 +23,7 @@ Statistics are returned in a format suitable for humans (e.g. `"exists_time": "1 ## Date Math [date-math] -Most parameters which accept a formatted date value — such as `gt` and `lt` in [`range` queries](/reference/query-languages/query-dsl/query-dsl-range-query.md), or `from` and `to` in [`daterange` aggregations](/reference/aggregations/search-aggregations-bucket-daterange-aggregation.md) — understand date maths. +Most parameters which accept a formatted date value — such as `gt` and `lt` in [`range` queries](/reference/query-languages/query-dsl/query-dsl-range-query.md), or `from` and `to` in [`daterange` aggregations](/reference/aggregations/search-aggregations-bucket-daterange-aggregation.md) — understand date maths. The expression starts with an anchor date, which can either be `now`, or a date string ending with `||`. This anchor date can optionally be followed by one or more maths expressions: @@ -264,7 +264,7 @@ By default `flat_settings` is set to `false`. Some queries and APIs support parameters to allow inexact *fuzzy* matching, using the `fuzziness` parameter. -When querying `text` or `keyword` fields, `fuzziness` is interpreted as a [Levenshtein Edit Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) — the number of one character changes that need to be made to one string to make it the same as another string. +When querying `text` or `keyword` fields, `fuzziness` is interpreted as a [Levenshtein Edit Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) — the number of one character changes that need to be made to one string to make it the same as another string. 
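For example, a minimal fuzzy `match` query sketch (the index and the `user.name` field are illustrative, not from this change):

```console
GET my-index-000001/_search
{
  "query": {
    "match": {
      "user.name": {
        "query": "kimchy", <1>
        "fuzziness": "AUTO" <2>
      }
    }
  }
}
```

1. The field and search term are hypothetical examples.
2. `AUTO` chooses an allowed edit distance based on the length of each term.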
The `fuzziness` parameter can be specified as: diff --git a/docs/reference/elasticsearch/rest-apis/compatibility.md b/docs/reference/elasticsearch/rest-apis/compatibility.md index d2d4a19e6081..c4f57b788fa9 100644 --- a/docs/reference/elasticsearch/rest-apis/compatibility.md +++ b/docs/reference/elasticsearch/rest-apis/compatibility.md @@ -16,7 +16,7 @@ When an API is targeted for removal or is going to be changed in a non-compatibl When you request REST API compatibility, {{es}} attempts to honor the previous REST API version. {{es}} attempts to apply the most compatible URL, request body, response body, and HTTP parameters. -For compatible APIs, this has no effect—​it only impacts calls to APIs that have breaking changes from the previous version. An error can still be returned in compatibility mode if {{es}} cannot automatically resolve the incompatibilities. +For compatible APIs, this has no effect— it only impacts calls to APIs that have breaking changes from the previous version. An error can still be returned in compatibility mode if {{es}} cannot automatically resolve the incompatibilities. ::::{important} REST API compatibility does not guarantee the same behavior as the prior version. It instructs {{es}} to automatically resolve any incompatibilities so the request can be processed instead of returning an error. diff --git a/docs/reference/elasticsearch/rest-apis/paginate-search-results.md b/docs/reference/elasticsearch/rest-apis/paginate-search-results.md index 36c389a9f420..9fd0484ea759 100644 --- a/docs/reference/elasticsearch/rest-apis/paginate-search-results.md +++ b/docs/reference/elasticsearch/rest-apis/paginate-search-results.md @@ -324,7 +324,7 @@ POST /_search/scroll } ``` -1. `GET` or `POST` can be used and the URL should not include the `index` name — this is specified in the original `search` request instead. +1. `GET` or `POST` can be used and the URL should not include the `index` name — this is specified in the original `search` request instead. 2. The `scroll` parameter tells Elasticsearch to keep the search context open for another `1m`. 3. The `scroll_id` parameter @@ -332,7 +332,7 @@ POST /_search/scroll The `size` parameter allows you to configure the maximum number of hits to be returned with each batch of results. Each call to the `scroll` API returns the next batch of results until there are no more results left to return, ie the `hits` array is empty. ::::{important} -The initial search request and each subsequent scroll request each return a `_scroll_id`. While the `_scroll_id` may change between requests, it doesn’t always change — in any case, only the most recently received `_scroll_id` should be used. +The initial search request and each subsequent scroll request each return a `_scroll_id`. While the `_scroll_id` may change between requests, it doesn’t always change — in any case, only the most recently received `_scroll_id` should be used. :::: @@ -360,7 +360,7 @@ GET /_search?scroll=1m A scroll returns all the documents which matched the search at the time of the initial search request. It ignores any subsequent changes to these documents. The `scroll_id` identifies a *search context* which keeps track of everything that {{es}} needs to return the correct documents. The search context is created by the initial request and kept alive by subsequent requests. -The `scroll` parameter (passed to the `search` request and to every `scroll` request) tells Elasticsearch how long it should keep the search context alive. Its value (e.g. 
`1m`, see [Time units](/reference/elasticsearch/rest-apis/api-conventions.md#time-units)) does not need to be long enough to process all data — it just needs to be long enough to process the previous batch of results. Each `scroll` request (with the `scroll` parameter) sets a new expiry time. If a `scroll` request doesn’t pass in the `scroll` parameter, then the search context will be freed as part of *that* `scroll` request. +The `scroll` parameter (passed to the `search` request and to every `scroll` request) tells Elasticsearch how long it should keep the search context alive. Its value (e.g. `1m`, see [Time units](/reference/elasticsearch/rest-apis/api-conventions.md#time-units)) does not need to be long enough to process all data — it just needs to be long enough to process the previous batch of results. Each `scroll` request (with the `scroll` parameter) sets a new expiry time. If a `scroll` request doesn’t pass in the `scroll` parameter, then the search context will be freed as part of *that* `scroll` request. Normally, the background merge process optimizes the index by merging together smaller segments to create new, bigger segments. Once the smaller segments are no longer needed they are deleted. This process continues during scrolling, but an open search context prevents the old segments from being deleted since they are still in use. diff --git a/docs/reference/elasticsearch/rest-apis/retrieve-selected-fields.md b/docs/reference/elasticsearch/rest-apis/retrieve-selected-fields.md index d39aaa3f0c21..51a965bc4001 100644 --- a/docs/reference/elasticsearch/rest-apis/retrieve-selected-fields.md +++ b/docs/reference/elasticsearch/rest-apis/retrieve-selected-fields.md @@ -575,7 +575,7 @@ Stored field values fetched from the document itself are always returned as an a Also only leaf fields can be returned via the `stored_fields` option. If an object field is specified, it will be ignored. ::::{note} -On its own, `stored_fields` cannot be used to load fields in nested objects — if a field contains a nested object in its path, then no data will be returned for that stored field. To access nested fields, `stored_fields` must be used within an [`inner_hits`](/reference/elasticsearch/rest-apis/retrieve-inner-hits.md) block. +On its own, `stored_fields` cannot be used to load fields in nested objects — if a field contains a nested object in its path, then no data will be returned for that stored field. To access nested fields, `stored_fields` must be used within an [`inner_hits`](/reference/elasticsearch/rest-apis/retrieve-inner-hits.md) block. :::: diff --git a/docs/reference/elasticsearch/rest-apis/search-suggesters.md b/docs/reference/elasticsearch/rest-apis/search-suggesters.md index 1a39320b573b..937bf7a7e9a8 100644 --- a/docs/reference/elasticsearch/rest-apis/search-suggesters.md +++ b/docs/reference/elasticsearch/rest-apis/search-suggesters.md @@ -147,7 +147,7 @@ Common suggest options include: % : The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified, then the number cannot be fractional. The shard level document frequencies are used for this option. % % `max_term_freq` -% : The maximum threshold in number of documents in which a suggest text token can exist in order to be included. 
Can be a relative percentage number (e.g., 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified, then fractional can not be specified. Defaults to 0.01f. This can be used to exclude high frequency terms — which are usually spelled correctly — from being spellchecked. This also improves the spellcheck performance. The shard level document frequencies are used for this option. +% : The maximum threshold in number of documents in which a suggest text token can exist in order to be included. Can be a relative percentage number (e.g., 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified, then fractional can not be specified. Defaults to 0.01f. This can be used to exclude high frequency terms — which are usually spelled correctly — from being spellchecked. This also improves the spellcheck performance. The shard level document frequencies are used for this option. % % `string_distance` % : Which string distance implementation to use for comparing how similar suggested terms are. Five possible values can be specified: @@ -305,7 +305,7 @@ Basic phrase suggest API parameters include: : Sets up suggestion highlighting. If not provided then no `highlighted` field is returned. If provided must contain exactly `pre_tag` and `post_tag`, which are wrapped around the changed tokens. If multiple tokens in a row are changed the entire phrase of changed tokens is wrapped rather than each token. `collate` -: Checks each suggestion against the specified `query` to prune suggestions for which no matching docs exist in the index. The collate query for a suggestion is run only on the local shard from which the suggestion has been generated from. The `query` must be specified and it can be templated. Refer to [Search templates](docs-content://solutions/search/search-templates.md). The current suggestion is automatically made available as the `{{suggestion}}` variable, which should be used in your query. You can still specify your own template `params` — the `suggestion` value will be added to the variables you specify. Additionally, you can specify a `prune` to control if all phrase suggestions will be returned; when set to `true` the suggestions will have an additional option `collate_match`, which will be `true` if matching documents for the phrase was found, `false` otherwise. The default value for `prune` is `false`. +: Checks each suggestion against the specified `query` to prune suggestions for which no matching docs exist in the index. The collate query for a suggestion is run only on the local shard from which the suggestion has been generated from. The `query` must be specified and it can be templated. Refer to [Search templates](docs-content://solutions/search/search-templates.md). The current suggestion is automatically made available as the `{{suggestion}}` variable, which should be used in your query. You can still specify your own template `params` — the `suggestion` value will be added to the variables you specify. Additionally, you can specify a `prune` to control if all phrase suggestions will be returned; when set to `true` the suggestions will have an additional option `collate_match`, which will be `true` if matching documents for the phrase was found, `false` otherwise. The default value for `prune` is `false`. ```console POST test/_search @@ -417,7 +417,7 @@ The parameters that direct generators support include: : The minimal threshold in number of documents a suggestion should appear in. 
This can be specified as an absolute number or as a relative percentage of number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f and is not enabled. If a value higher than 1 is specified, then the number cannot be fractional. The shard level document frequencies are used for this option. `max_term_freq` -: The maximum threshold in number of documents in which a suggest text token can exist in order to be included. Can be a relative percentage number (e.g., 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified, then fractional can not be specified. Defaults to 0.01f. This can be used to exclude high frequency terms — which are usually spelled correctly — from being spellchecked. This also improves the spellcheck performance. The shard level document frequencies are used for this option. +: The maximum threshold in number of documents in which a suggest text token can exist in order to be included. Can be a relative percentage number (e.g., 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified, then fractional can not be specified. Defaults to 0.01f. This can be used to exclude high frequency terms — which are usually spelled correctly — from being spellchecked. This also improves the spellcheck performance. The shard level document frequencies are used for this option. `pre_filter` : A filter (analyzer) that is applied to each of the tokens passed to this candidate generator. This filter is applied to the original token before candidates are generated. @@ -719,7 +719,7 @@ When set to true, this option can slow down search because more suggestions need ### Fuzzy queries [fuzzy] -The completion suggester also supports fuzzy queries — this means you can have a typo in your search and still get results back. +The completion suggester also supports fuzzy queries — this means you can have a typo in your search and still get results back. ```console POST music/_search?pretty diff --git a/docs/reference/elasticsearch/rest-apis/shard-request-cache.md b/docs/reference/elasticsearch/rest-apis/shard-request-cache.md index e8448c157934..f65b0bdb1c14 100644 --- a/docs/reference/elasticsearch/rest-apis/shard-request-cache.md +++ b/docs/reference/elasticsearch/rest-apis/shard-request-cache.md @@ -6,7 +6,7 @@ mapped_pages: When a search request is run against an index or against many indices, each involved shard runs the search locally and returns its local results to the coordinating node, which combines these shard-level results into a global result set. -The shard-level request cache module caches the local results on each shard. This allows frequently used (and potentially heavy) search requests to return results almost instantly. The requests cache is a very good fit for the logging use case, where only the most recent index is being actively updated — results from older indices will be served directly from the cache. +The shard-level request cache module caches the local results on each shard. This allows frequently used (and potentially heavy) search requests to return results almost instantly. The requests cache is a very good fit for the logging use case, where only the most recent index is being actively updated — results from older indices will be served directly from the cache. 
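For example, a sketch of opting a single aggregation-only request into the cache (the index and the `colors` field are illustrative):

```console
GET /my-index-000001/_search?request_cache=true <1>
{
  "size": 0, <2>
  "aggs": {
    "popular_colors": {
      "terms": { "field": "colors" }
    }
  }
}
```

1. The `request_cache` query-string parameter enables caching for this request.
2. Only requests with `size: 0` are cached, since hits are not stored in the cache.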
You can control the size and expiration of the cache at the node level using the [shard request cache settings](/reference/elasticsearch/configuration-reference/shard-request-cache-settings.md). @@ -20,7 +20,7 @@ Scripted queries that use the API calls which are non-deterministic, such as `Ma ## Cache invalidation [_cache_invalidation] -The cache is smart — it keeps the same *near real-time* promise as uncached search. +The cache is smart — it keeps the same *near real-time* promise as uncached search. Cached results are invalidated automatically whenever the shard refreshes to pick up changes to the documents or when you update the mapping. In other words you will always get the same results from the cache as you would for an uncached search request. @@ -76,7 +76,7 @@ Requests where `size` is greater than `0` will not be cached even if the request ## Cache key [_cache_key] -A hash of the whole JSON body is used as the cache key. This means that if the JSON changes — for instance if keys are output in a different order — then the cache key will not be recognised. +A hash of the whole JSON body is used as the cache key. This means that if the JSON changes — for instance if keys are output in a different order — then the cache key will not be recognised. ::::{tip} Most JSON libraries support a canonical mode, which ensures that JSON keys are always emitted in the same order. This canonical mode can be used in the application to ensure that a request is always serialized in the same way. diff --git a/docs/reference/elasticsearch/rest-apis/sort-search-results.md b/docs/reference/elasticsearch/rest-apis/sort-search-results.md index 812fc68d54af..455dd7258fa0 100644 --- a/docs/reference/elasticsearch/rest-apis/sort-search-results.md +++ b/docs/reference/elasticsearch/rest-apis/sort-search-results.md @@ -113,7 +113,7 @@ Elasticsearch supports sorting by array or multi-valued fields. The `mode` optio `median` : Use the median of all values as sort value. Only applicable for number based array fields. -The default sort mode in the ascending sort order is `min` — the lowest value is picked. The default sort mode in the descending order is `max` — the highest value is picked. +The default sort mode in the ascending sort order is `min` — the lowest value is picked. The default sort mode in the descending order is `max` — the highest value is picked. ### Sort mode example usage [_sort_mode_example_usage] diff --git a/docs/reference/enrich-processor/attachment.md b/docs/reference/enrich-processor/attachment.md index 78cda9958740..02fa88bee4d8 100644 --- a/docs/reference/enrich-processor/attachment.md +++ b/docs/reference/enrich-processor/attachment.md @@ -21,7 +21,7 @@ $$$attachment-options$$$ | `target_field` | no | attachment | The field that will hold the attachment information | | `indexed_chars` | no | 100000 | The number of chars being used for extraction to prevent huge fields. Use `-1` for no limit. | | `indexed_chars_field` | no | `null` | Field name from which you can overwrite the number of chars being used for extraction. See `indexed_chars`. | -| `properties` | no | all properties |  Array of properties to select to be stored. Can be `content`, `title`, `name`, `author`, `keywords`, `date`, `content_type`, `content_length`, `language` | +| `properties` | no | all properties | Array of properties to select to be stored. 
Can be `content`, `title`, `name`, `author`, `keywords`, `date`, `content_type`, `content_length`, `language` | | `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document | | `remove_binary` | encouraged | `false` | If `true`, the binary `field` will be removed from the document. This option is not required, but setting it explicitly is encouraged, and omitting it will result in a warning. | | `resource_name` | no | | Field containing the name of the resource to decode. If specified, the processor passes this resource name to the underlying Tika library to enable [Resource Name Based Detection](https://tika.apache.org/1.24.1/detection.html#Resource_Name_Based_Detection). | diff --git a/docs/reference/enrich-processor/dissect-processor.md b/docs/reference/enrich-processor/dissect-processor.md index 6d2c5723db1a..e769a9f05caf 100644 --- a/docs/reference/enrich-processor/dissect-processor.md +++ b/docs/reference/enrich-processor/dissect-processor.md @@ -89,9 +89,9 @@ $$$dissect-key-modifiers-table$$$ ### Right padding modifier (`->`) [dissect-modifier-skip-right-padding] -The algorithm that performs the dissection is very strict in that it requires all characters in the pattern to match the source string. For example, the pattern `%{{fookey}} %{{barkey}}` (1 space), will match the string "foo bar" (1 space), but will not match the string "foo  bar" (2 spaces) since the pattern has only 1 space and the source string has 2 spaces. +The algorithm that performs the dissection is very strict in that it requires all characters in the pattern to match the source string. For example, the pattern `%{{fookey}} %{{barkey}}` (1 space), will match the string "foo bar" (1 space), but will not match the string "foo bar" (2 spaces) since the pattern has only 1 space and the source string has 2 spaces. -The right padding modifier helps with this case. Adding the right padding modifier to the pattern `%{fookey->} %{{barkey}}`, It will now will match "foo bar" (1 space) and "foo  bar" (2 spaces) and even "foo          bar" (10 spaces). +The right padding modifier helps with this case. Adding the right padding modifier to the pattern `%{fookey->} %{{barkey}}`, It will now will match "foo bar" (1 space) and "foo bar" (2 spaces) and even "foo bar" (10 spaces). Use the right padding modifier to allow for repetition of the characters after a `%{keyname->}`. @@ -102,7 +102,7 @@ Right padding modifier example | | | | --- | --- | | **Pattern** | `%{ts->} %{{level}}` | -| **Input** | 1998-08-10T17:15:42,466          WARN | +| **Input** | 1998-08-10T17:15:42,466 WARN | | **Result** | * ts = 1998-08-10T17:15:42,466
* level = WARN
| The right padding modifier may be used with an empty key to help skip unwanted data. For example, the same input string, but wrapped with brackets requires the use of an empty right padded key to achieve the same result. @@ -112,7 +112,7 @@ Right padding modifier with empty key example | | | | --- | --- | | **Pattern** | `[%{{ts}}]%{->}[%{{level}}]` | -| **Input** | [1998-08-10T17:15:42,466]            [WARN] | +| **Input** | [1998-08-10T17:15:42,466]            [WARN] | | **Result** | * ts = 1998-08-10T17:15:42,466
* level = WARN
| diff --git a/docs/reference/query-languages/esql/esql-process-data-with-dissect-grok.md b/docs/reference/query-languages/esql/esql-process-data-with-dissect-grok.md index 0e21b34b4bb0..ee4edeb7f91e 100644 --- a/docs/reference/query-languages/esql/esql-process-data-with-dissect-grok.md +++ b/docs/reference/query-languages/esql/esql-process-data-with-dissect-grok.md @@ -120,9 +120,9 @@ $$$esql-dissect-key-modifiers-table$$$ #### Right padding modifier (`->`) [esql-dissect-modifier-skip-right-padding] -The algorithm that performs the dissection is very strict in that it requires all characters in the pattern to match the source string. For example, the pattern `%{{fookey}} %{{barkey}}` (1 space), will match the string "foo bar" (1 space), but will not match the string "foo  bar" (2 spaces) since the pattern has only 1 space and the source string has 2 spaces. +The algorithm that performs the dissection is very strict in that it requires all characters in the pattern to match the source string. For example, the pattern `%{{fookey}} %{{barkey}}` (1 space) will match the string "foo bar" (1 space), but will not match the string "foo  bar" (2 spaces) since the pattern has only 1 space and the source string has 2 spaces. -The right padding modifier helps with this case. Adding the right padding modifier to the pattern `%{fookey->} %{{barkey}}`, It will now will match "foo bar" (1 space) and "foo  bar" (2 spaces) and even "foo          bar" (10 spaces). +The right padding modifier helps with this case. Adding the right padding modifier to the pattern `%{fookey->} %{{barkey}}`, it will now match "foo bar" (1 space) and "foo  bar" (2 spaces) and even "foo          bar" (10 spaces). Use the right padding modifier to allow for repetition of the characters after a `%{keyname->}`. diff --git a/docs/reference/query-languages/query-dsl/compound-queries.md b/docs/reference/query-languages/query-dsl/compound-queries.md index bfcdf6c4c45a..d51f728472eb 100644 --- a/docs/reference/query-languages/query-dsl/compound-queries.md +++ b/docs/reference/query-languages/query-dsl/compound-queries.md @@ -10,7 +10,7 @@ Compound queries wrap other compound or leaf queries, either to combine their re The queries in this group are: [`bool` query](/reference/query-languages/query-dsl/query-dsl-bool-query.md) -: The default query for combining multiple leaf or compound query clauses, as `must`, `should`, `must_not`, or `filter` clauses. The `must` and `should` clauses have their scores combined — the more matching clauses, the better — while the `must_not` and `filter` clauses are executed in filter context. +: The default query for combining multiple leaf or compound query clauses, as `must`, `should`, `must_not`, or `filter` clauses. The `must` and `should` clauses have their scores combined — the more matching clauses, the better — while the `must_not` and `filter` clauses are executed in filter context. [`boosting` query](/reference/query-languages/query-dsl/query-dsl-boosting-query.md) : Return documents which match a `positive` query, but reduce the score of documents which also match a `negative` query.
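Editor's note (not part of the upstream change): to make the `bool` clause semantics described in the hunk above concrete, here is a minimal, illustrative request. The `title`, `tags`, `status`, and `publish_date` fields are hypothetical. Only the `must` and `should` clauses contribute to the relevance score, while `must_not` and `filter` run in filter context and simply include or exclude documents:

```console
GET /_search
{
  "query": {
    "bool": {
      "must":     [ { "match": { "title":  "quick brown fox" } } ],
      "should":   [ { "match": { "tags":   "wildlife" } } ],
      "must_not": [ { "term":  { "status": "draft" } } ],
      "filter":   [ { "range": { "publish_date": { "gte": "now-1y/d" } } } ]
    }
  }
}
```

A document that also matches the `should` clause ranks higher, but the `should` clause stays optional here because a `must` clause is present.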
diff --git a/docs/reference/query-languages/query-dsl/query-dsl-bool-query.md b/docs/reference/query-languages/query-dsl/query-dsl-bool-query.md index cb4dc712979f..5f64e7d462b0 100644 --- a/docs/reference/query-languages/query-dsl/query-dsl-bool-query.md +++ b/docs/reference/query-languages/query-dsl/query-dsl-bool-query.md @@ -104,7 +104,7 @@ While nesting `bool` queries can be powerful, it can also lead to complex and sl ## Scoring with `bool.filter` [score-bool-filter] -Queries specified under the `filter` element have no effect on scoring — scores are returned as `0`. Scores are only affected by the query that has been specified. For instance, all three of the following queries return all documents where the `status` field contains the term `active`. +Queries specified under the `filter` element have no effect on scoring — scores are returned as `0`. Scores are only affected by the query that has been specified. For instance, all three of the following queries return all documents where the `status` field contains the term `active`. This first query assigns a score of `0` to all documents, as no scoring query has been specified: diff --git a/docs/reference/query-languages/query-dsl/query-dsl-function-score-query.md b/docs/reference/query-languages/query-dsl/query-dsl-function-score-query.md index cb6c642fe257..6cac2464b05c 100644 --- a/docs/reference/query-languages/query-dsl/query-dsl-function-score-query.md +++ b/docs/reference/query-languages/query-dsl/query-dsl-function-score-query.md @@ -341,7 +341,7 @@ GET /_search : The point of origin used for calculating distance. Must be given as a number for numeric field, date for date fields and geo point for geo fields. Required for geo and numeric field. For date fields the default is `now`. Date math (for example `now-1h`) is supported for origin. `scale` -: Required for all types. Defines the distance from origin + offset at which the computed score will equal `decay` parameter. For geo fields: Can be defined as number+unit (1km, 12m,…​). Default unit is meters. For date fields: Can to be defined as a number+unit ("1h", "10d",…​). Default unit is milliseconds. For numeric field: Any number. +: Required for all types. Defines the distance from origin + offset at which the computed score will equal `decay` parameter. For geo fields: Can be defined as number+unit (1km, 12m,… ). Default unit is meters. For date fields: Can to be defined as a number+unit ("1h", "10d",… ). Default unit is milliseconds. For numeric field: Any number. `offset` : If an `offset` is defined, the decay function will only compute the decay function for documents with a distance greater than the defined `offset`. The default is 0. diff --git a/docs/reference/query-languages/query-dsl/query-dsl-multi-match-query.md b/docs/reference/query-languages/query-dsl/query-dsl-multi-match-query.md index 8d1e4c9c1536..38b818ae769e 100644 --- a/docs/reference/query-languages/query-dsl/query-dsl-multi-match-query.md +++ b/docs/reference/query-languages/query-dsl/query-dsl-multi-match-query.md @@ -144,7 +144,7 @@ Also, accepts `analyzer`, `boost`, `operator`, `minimum_should_match`, `fuzzines :name: operator-min -The `best_fields` and `most_fields` types are *field-centric* — they generate a `match` query **per field**. This means that the `operator` and `minimum_should_match` parameters are applied to each field individually, which is probably not what you want. +The `best_fields` and `most_fields` types are *field-centric* — they generate a `match` query **per field**. 
This means that the `operator` and `minimum_should_match` parameters are applied to each field individually, which is probably not what you want. Take this query for example: diff --git a/docs/reference/query-languages/query-dsl/query-dsl-query-string-query.md b/docs/reference/query-languages/query-dsl/query-dsl-query-string-query.md index 1292362cdf68..72f0062037ee 100644 --- a/docs/reference/query-languages/query-dsl/query-dsl-query-string-query.md +++ b/docs/reference/query-languages/query-dsl/query-dsl-query-string-query.md @@ -168,9 +168,9 @@ The `time_zone` parameter does **not** affect the [date math](/reference/elastic The query string mini-language is used by the Query string and by the `q` query string parameter in the [`search` API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search). -The query string is parsed into a series of *terms* and *operators*. A term can be a single word — `quick` or `brown` — or a phrase, surrounded by double quotes — `"quick brown"` — which searches for all the words in the phrase, in the same order. +The query string is parsed into a series of *terms* and *operators*. A term can be a single word — `quick` or `brown` — or a phrase, surrounded by double quotes — `"quick brown"` — which searches for all the words in the phrase, in the same order. -Operators allow you to customize the search — the available options are explained below. +Operators allow you to customize the search — the available options are explained below. #### Field names [_field_names] @@ -221,7 +221,7 @@ Wildcard searches can be run on individual terms, using `?` to replace a single ``` qu?ck bro* ``` -Be aware that wildcard queries can use an enormous amount of memory and perform very badly — just think how many terms need to be queried to match the query string `"a* b* c*"`. +Be aware that wildcard queries can use an enormous amount of memory and perform very badly — just think how many terms need to be queried to match the query string `"a* b* c*"`. ::::{warning} Pure wildcards `\*` are rewritten to [`exists`](/reference/query-languages/query-dsl/query-dsl-exists-query.md) queries for efficiency. As a consequence, the wildcard `"field:*"` would match documents with an empty value like the following: @@ -405,7 +405,7 @@ states that: * `fox` must be present * `news` must not be present -* `quick` and `brown` are optional — their presence increases the relevance +* `quick` and `brown` are optional — their presence increases the relevance The familiar boolean operators `AND`, `OR` and `NOT` (also written `&&`, `||` and `!`) are also supported but beware that they do not honor the usual precedence rules, so parentheses should be used whenever multiple operators are used together. For instance the previous query could be rewritten as: diff --git a/docs/reference/query-languages/query-dsl/span-queries.md b/docs/reference/query-languages/query-dsl/span-queries.md index b5bcbd130c2f..7f1db44312bf 100644 --- a/docs/reference/query-languages/query-dsl/span-queries.md +++ b/docs/reference/query-languages/query-dsl/span-queries.md @@ -32,7 +32,7 @@ The queries in this group are: : Wraps another span query, and excludes any documents which match that query. [`span_or` query](/reference/query-languages/query-dsl/query-dsl-span-query.md) -: Combines multiple span queries — returns documents which match any of the specified queries. +: Combines multiple span queries — returns documents which match any of the specified queries. 
[`span_term` query](/reference/query-languages/query-dsl/query-dsl-span-term-query.md) : The equivalent of the [`term` query](/reference/query-languages/query-dsl/query-dsl-term-query.md) but for use with other span queries. diff --git a/docs/reference/query-languages/sql/sql-functions-conditional.md b/docs/reference/query-languages/sql/sql-functions-conditional.md index e65ff92c63ff..2d1f1b06a19a 100644 --- a/docs/reference/query-languages/sql/sql-functions-conditional.md +++ b/docs/reference/query-languages/sql/sql-functions-conditional.md @@ -154,7 +154,7 @@ COALESCE( 2. 2nd expression -…​ +… **N**th expression @@ -196,7 +196,7 @@ GREATEST( 2. 2nd expression -…​ +… **N**th expression @@ -355,7 +355,7 @@ LEAST( 2. 2nd expression -…​ +… **N**th expression diff --git a/docs/reference/query-languages/sql/sql-functions-datetime.md b/docs/reference/query-languages/sql/sql-functions-datetime.md index bd9aa6bb762b..0d9c8d3755a8 100644 --- a/docs/reference/query-languages/sql/sql-functions-datetime.md +++ b/docs/reference/query-languages/sql/sql-functions-datetime.md @@ -1153,7 +1153,7 @@ DAY_NAME(datetime_exp) <1> **Output**: string -**Description**: Extract the day of the week from a date/datetime in text format (`Monday`, `Tuesday`…​). +**Description**: Extract the day of the week from a date/datetime in text format (`Monday`, `Tuesday`… ). ```sql SELECT DAY_NAME(CAST('2018-02-19T10:23:27Z' AS TIMESTAMP)) AS day; @@ -1321,7 +1321,7 @@ MONTH_NAME(datetime_exp) <1> **Output**: string -**Description**: Extract the month from a date/datetime in text format (`January`, `February`…​). +**Description**: Extract the month from a date/datetime in text format (`January`, `February`… ). ```sql SELECT MONTH_NAME(CAST('2018-02-19T10:23:27Z' AS TIMESTAMP)) AS month; diff --git a/docs/reference/query-languages/sql/sql-lexical-structure.md b/docs/reference/query-languages/sql/sql-lexical-structure.md index 855edf57a16d..06f313b13f9e 100644 --- a/docs/reference/query-languages/sql/sql-lexical-structure.md +++ b/docs/reference/query-languages/sql/sql-lexical-structure.md @@ -19,7 +19,7 @@ Take the following example: SELECT * FROM table ``` -This query has four tokens: `SELECT`, `*`, `FROM` and `table`. The first three, namely `SELECT`, `*` and `FROM` are *key words* meaning words that have a fixed meaning in SQL. The token `table` is an *identifier* meaning it identifies (by name) an entity inside SQL such as a table (in this case), a column, etc…​ +This query has four tokens: `SELECT`, `*`, `FROM` and `table`. The first three, namely `SELECT`, `*` and `FROM` are *key words* meaning words that have a fixed meaning in SQL. The token `table` is an *identifier* meaning it identifies (by name) an entity inside SQL such as a table (in this case), a column, etc… As one can see, both key words and identifiers have the *same* lexical structure and thus one cannot know whether a token is one or the other without knowing the SQL language; the complete list of key words is available in the [reserved appendix](/reference/query-languages/sql/sql-syntax-reserved.md). Do note that key words are case-insensitive meaning the previous example can be written as: @@ -127,7 +127,7 @@ A few characters that are not alphanumeric have a dedicated meaning different fr | --- | --- | | `*` | The asterisk (or wildcard) is used in some contexts to denote all fields for a table. Can be also used as an argument to some aggregate functions. | | `,` | Commas are used to enumerate the elements of a list. 
| -| `.` | Used in numeric constants or to separate identifiers qualifiers (catalog, table, column names, etc…​). | +| `.` | Used in numeric constants or to separate identifiers qualifiers (catalog, table, column names, etc… ). | | `()` | Parentheses are used for specific SQL commands, function declarations or to enforce precedence. | diff --git a/docs/reference/query-languages/sql/sql-syntax-select.md b/docs/reference/query-languages/sql/sql-syntax-select.md index f1d131c029a1..3eea3d1394e9 100644 --- a/docs/reference/query-languages/sql/sql-syntax-select.md +++ b/docs/reference/query-languages/sql/sql-syntax-select.md @@ -120,7 +120,7 @@ where: `table_name` : Represents the name (optionally qualified) of an existing table, either a concrete or base one (actual index) or alias. -If the table name contains special SQL characters (such as `.`,`-`,`*`,etc…​) use double quotes to escape them: +If the table name contains special SQL characters (such as `.`,`-`,`*`,etc… ) use double quotes to escape them: ```sql SELECT * FROM "emp" LIMIT 1; diff --git a/docs/reference/scripting-languages/painless/painless-operators.md b/docs/reference/scripting-languages/painless/painless-operators.md index 9dcebd835535..06f0e559f4f7 100644 --- a/docs/reference/scripting-languages/painless/painless-operators.md +++ b/docs/reference/scripting-languages/painless/painless-operators.md @@ -22,9 +22,9 @@ An operator is the most basic action that can be taken to evaluate values in a s | [Map Initialization](/reference/scripting-languages/painless/painless-operators-reference.md#map-initialization-operator) | [Reference](/reference/scripting-languages/painless/painless-operators-reference.md) | [:] | 1 | left → right | | [Map Access](/reference/scripting-languages/painless/painless-operators-reference.md#map-access-operator) | [Reference](/reference/scripting-languages/painless/painless-operators-reference.md) | [] | 1 | left → right | | [Post Increment](/reference/scripting-languages/painless/painless-operators-numeric.md#post-increment-operator) | [Numeric](/reference/scripting-languages/painless/painless-operators-numeric.md) | ++ | 1 | left → right | -| [Post Decrement](/reference/scripting-languages/painless/painless-operators-numeric.md#post-decrement-operator) | [Numeric](/reference/scripting-languages/painless/painless-operators-numeric.md) |  —  | 1 | left → right | +| [Post Decrement](/reference/scripting-languages/painless/painless-operators-numeric.md#post-decrement-operator) | [Numeric](/reference/scripting-languages/painless/painless-operators-numeric.md) | — | 1 | left → right | | [Pre Increment](/reference/scripting-languages/painless/painless-operators-numeric.md#pre-increment-operator) | [Numeric](/reference/scripting-languages/painless/painless-operators-numeric.md) | ++ | 2 | right → left | -| [Pre Decrement](/reference/scripting-languages/painless/painless-operators-numeric.md#pre-decrement-operator) | [Numeric](/reference/scripting-languages/painless/painless-operators-numeric.md) |  —  | 2 | right → left | +| [Pre Decrement](/reference/scripting-languages/painless/painless-operators-numeric.md#pre-decrement-operator) | [Numeric](/reference/scripting-languages/painless/painless-operators-numeric.md) | — | 2 | right → left | | [Unary Positive](/reference/scripting-languages/painless/painless-operators-numeric.md#unary-positive-operator) | [Numeric](/reference/scripting-languages/painless/painless-operators-numeric.md) | + | 2 | right → left | | [Unary 
Negative](/reference/scripting-languages/painless/painless-operators-numeric.md#unary-negative-operator) | [Numeric](/reference/scripting-languages/painless/painless-operators-numeric.md) | - | 2 | right → left | | [Boolean Not](/reference/scripting-languages/painless/painless-operators-boolean.md#boolean-not-operator) | [Boolean](/reference/scripting-languages/painless/painless-operators-boolean.md) | ! | 2 | right → left | diff --git a/docs/reference/search-connectors/index.md b/docs/reference/search-connectors/index.md index 7f1ed16cf0a4..30452ad9b11a 100644 --- a/docs/reference/search-connectors/index.md +++ b/docs/reference/search-connectors/index.md @@ -14,14 +14,14 @@ $$$es-connectors-native$$$ :::{note} -This page is about Search connectors that synchronize third-party data into {{es}}. If you’re looking for Kibana connectors to integrate with services like generative AI model providers, refer to [Kibana Connectors](docs-content://deploy-manage/manage-connectors.md). +This page is about Search connectors that synchronize third-party data into {{es}}. If you’re looking for Kibana connectors to integrate with services like generative AI model providers, refer to [Kibana Connectors](docs-content://deploy-manage/manage-connectors.md). ::: -A _connector_ is an Elastic integration that syncs data from an original data source to {{es}}. Use connectors to create searchable, read-only replicas of your data in {{es}}. +A _connector_ is an Elastic integration that syncs data from an original data source to {{es}}. Use connectors to create searchable, read-only replicas of your data in {{es}}. Each connector extracts the original files, records, or objects; and transforms them into documents within {{es}}. -These connectors are written in Python and the source code is available in the [`elastic/connectors`](https://github.com/elastic/connectors/tree/main/connectors/sources) repo. +These connectors are written in Python and the source code is available in the [`elastic/connectors`](https://github.com/elastic/connectors/tree/main/connectors/sources) repo. ## Available connectors @@ -87,6 +87,6 @@ In order to set up, configure, and run a connector you’ll be moving between yo ### Data source prerequisites -The first decision you need to make before deploying a connector is which third party service (data source) you want to sync to {{es}}. See the list of [available connectors](#available-connectors). +The first decision you need to make before deploying a connector is which third party service (data source) you want to sync to {{es}}. See the list of [available connectors](#available-connectors). -Note that each data source will have specific prerequisites you’ll need to meet to authorize the connector to access its data. For example, certain data sources may require you to create an OAuth application, or create a service account. You’ll need to check the [individual connector documentation](connector-reference.md) for these details. \ No newline at end of file +Note that each data source will have specific prerequisites you’ll need to meet to authorize the connector to access its data. For example, certain data sources may require you to create an OAuth application, or create a service account. You’ll need to check the [individual connector documentation](connector-reference.md) for these details. 
\ No newline at end of file diff --git a/docs/reference/text-analysis/analysis-edgengram-tokenizer.md b/docs/reference/text-analysis/analysis-edgengram-tokenizer.md index f53276722e78..b8b5b4af99df 100644 --- a/docs/reference/text-analysis/analysis-edgengram-tokenizer.md +++ b/docs/reference/text-analysis/analysis-edgengram-tokenizer.md @@ -59,12 +59,12 @@ See [Limitations of the `max_gram` parameter](#max-gram-limits). Character classes may be any of the following: - * `letter` —  for example `a`, `b`, `ï` or `京` - * `digit` —  for example `3` or `7` - * `whitespace` —  for example `" "` or `"\n"` - * `punctuation` — for example `!` or `"` - * `symbol` —  for example `$` or `√` - * `custom` —  custom characters which need to be set using the `custom_token_chars` setting. + * `letter` — for example `a`, `b`, `ï` or `京` + * `digit` — for example `3` or `7` + * `whitespace` — for example `" "` or `"\n"` + * `punctuation` — for example `!` or `"` + * `symbol` — for example `$` or `√` + * `custom` — custom characters which need to be set using the `custom_token_chars` setting. `custom_token_chars` diff --git a/docs/reference/text-analysis/analysis-ngram-tokenizer.md b/docs/reference/text-analysis/analysis-ngram-tokenizer.md index cf0e64d8f914..2af849d0568e 100644 --- a/docs/reference/text-analysis/analysis-ngram-tokenizer.md +++ b/docs/reference/text-analysis/analysis-ngram-tokenizer.md @@ -46,12 +46,12 @@ The `ngram` tokenizer accepts the following parameters: Character classes may be any of the following: - * `letter` —  for example `a`, `b`, `ï` or `京` - * `digit` —  for example `3` or `7` - * `whitespace` —  for example `" "` or `"\n"` - * `punctuation` — for example `!` or `"` - * `symbol` —  for example `$` or `√` - * `custom` —  custom characters which need to be set using the `custom_token_chars` setting. + * `letter` — for example `a`, `b`, `ï` or `京` + * `digit` — for example `3` or `7` + * `whitespace` — for example `" "` or `"\n"` + * `punctuation` — for example `!` or `"` + * `symbol` — for example `$` or `√` + * `custom` — custom characters which need to be set using the `custom_token_chars` setting. `custom_token_chars`
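Editor's note (not part of the upstream change): as an illustration of the `token_chars` classes and the `custom_token_chars` setting listed above, an `ngram` tokenizer can be restricted to letters, digits, and a few custom characters so that tokens never span whitespace or punctuation. The index name and gram lengths below are placeholders:

```console
PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [ "letter", "digit", "custom" ],
          "custom_token_chars": "+-_"
        }
      }
    }
  }
}
```

With these settings, analyzing `2 Quick+Foxes` yields grams such as `Qui`, `uic`, `ick`, `ck+`, and `k+F`, while the space after `2` still breaks the token stream.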