Mirror of https://github.com/elastic/elasticsearch.git, synced 2025-06-29 09:54:06 -04:00
Edits to text of Profile API documentation (#38742)

Minor edits of text.

parent c9f0ca00d8
commit 703908ad7f

1 changed file with 24 additions and 24 deletions
@@ -204,16 +204,16 @@ by a unique ID
 Because a search request may be executed against one or more shards in an index, and a search may cover
 one or more indices, the top level element in the profile response is an array of `shard` objects.
-Each shard object lists it's `id` which uniquely identifies the shard. The ID's format is
+Each shard object lists its `id` which uniquely identifies the shard. The ID's format is
 `[nodeID][indexName][shardID]`.
 
 The profile itself may consist of one or more "searches", where a search is a query executed against the underlying
-Lucene index. Most Search Requests submitted by the user will only execute a single `search` against the Lucene index.
+Lucene index. Most search requests submitted by the user will only execute a single `search` against the Lucene index.
 But occasionally multiple searches will be executed, such as including a global aggregation (which needs to execute
 a secondary "match_all" query for the global context).
 
 Inside each `search` object there will be two arrays of profiled information:
-a `query` array and a `collector` array. Alongside the `search` object is an `aggregations` object that contains the profile information for the aggregations. In the future, more sections may be added, such as `suggest`, `highlight`, etc
+a `query` array and a `collector` array. Alongside the `search` object is an `aggregations` object that contains the profile information for the aggregations. In the future, more sections may be added, such as `suggest`, `highlight`, etc.
 
 There will also be a `rewrite` metric showing the total time spent rewriting the query (in nanoseconds).
 
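The `[nodeID][indexName][shardID]` format described in this hunk can be split apart mechanically. A minimal sketch in Python (the node ID value below is made up, and the regex is illustrative, not taken from Elasticsearch):

```python
import re

# A shard `id` has the format [nodeID][indexName][shardID], e.g.
# "[2aE02wS1R8q_QFnYu6vDVQ][twitter][1]" (hypothetical node ID).
SHARD_ID = re.compile(r"^\[([^\]]+)\]\[([^\]]+)\]\[(\d+)\]$")

def parse_shard_id(shard_id):
    """Split a profile shard id into (node_id, index_name, shard_number)."""
    m = SHARD_ID.match(shard_id)
    if m is None:
        raise ValueError("unexpected shard id: %r" % shard_id)
    node_id, index_name, shard_number = m.groups()
    return node_id, index_name, int(shard_number)

print(parse_shard_id("[2aE02wS1R8q_QFnYu6vDVQ][twitter][1]"))
# ('2aE02wS1R8q_QFnYu6vDVQ', 'twitter', 1)
```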
@@ -344,12 +344,12 @@ The meaning of the stats are as follows:
 `build_scorer`::
 
 This parameter shows how long it takes to build a Scorer for the query. A Scorer is the mechanism that
-iterates over matching documents generates a score per-document (e.g. how well does "foo" match the document?).
+iterates over matching documents and generates a score per-document (e.g. how well does "foo" match the document?).
 Note, this records the time required to generate the Scorer object, not actually score the documents. Some
 queries have faster or slower initialization of the Scorer, depending on optimizations, complexity, etc.
 {empty} +
 {empty} +
-This may also showing timing associated with caching, if enabled and/or applicable for the query
+This may also show timing associated with caching, if enabled and/or applicable for the query
 
 `next_doc`::
 
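The stats this section documents come in pairs: a nanosecond timing (`build_scorer`, `next_doc`, ...) and a matching `*_count` of invocations. A sketch of combining the two, assuming a `breakdown` dict shaped like the docs describe (the numbers are made up):

```python
# Hypothetical `breakdown` for one query node: a timing in nanoseconds
# for each profiled method, plus a matching `*_count` of invocations.
breakdown = {
    "build_scorer": 4000, "build_scorer_count": 1,
    "next_doc": 53876, "next_doc_count": 5,
    "score": 10000, "score_count": 5,
}

def per_invocation(breakdown):
    """Average nanoseconds per invocation for each profiled method."""
    out = {}
    for key, nanos in breakdown.items():
        if key.endswith("_count"):
            continue  # counts are denominators, not timings
        count = breakdown.get(key + "_count", 0)
        out[key] = nanos / count if count else 0.0
    return out

print(per_invocation(breakdown))
```

A per-invocation average like this can make a slow `build_scorer` (one expensive call) easy to tell apart from a slow `next_doc` (many cheap calls in a tight loop).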
@@ -369,7 +369,7 @@ The meaning of the stats are as follows:
 
 `matches`::
 
-Some queries, such as phrase queries, match documents using a "Two Phase" process. First, the document is
+Some queries, such as phrase queries, match documents using a "two-phase" process. First, the document is
 "approximately" matched, and if it matches approximately, it is checked a second time with a more rigorous
 (and expensive) process. The second phase verification is what the `matches` statistic measures.
 {empty} +
@@ -384,7 +384,7 @@ The meaning of the stats are as follows:
 
 `score`::
 
-This records the time taken to score a particular document via it's Scorer
+This records the time taken to score a particular document via its Scorer
 
 `*_count`::
 Records the number of invocations of the particular method. For example, `"next_doc_count": 2,`
@@ -394,7 +394,7 @@ The meaning of the stats are as follows:
 ==== `collectors` Section
 
 The Collectors portion of the response shows high-level execution details. Lucene works by defining a "Collector"
-which is responsible for coordinating the traversal, scoring and collection of matching documents. Collectors
+which is responsible for coordinating the traversal, scoring, and collection of matching documents. Collectors
 are also how a single query can record aggregation results, execute unscoped "global" queries, execute post-query
 filters, etc.
 
@@ -422,16 +422,16 @@ Looking at the previous example:
 // TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/]
 
 We see a single collector named `SimpleTopScoreDocCollector` wrapped into `CancellableCollector`. `SimpleTopScoreDocCollector` is the default "scoring and sorting"
-`Collector` used by Elasticsearch. The `reason` field attempts to give a plain english description of the class name. The
+`Collector` used by Elasticsearch. The `reason` field attempts to give a plain English description of the class name. The
 `time_in_nanos` is similar to the time in the Query tree: a wall-clock time inclusive of all children. Similarly, `children` lists
 all sub-collectors. The `CancellableCollector` that wraps `SimpleTopScoreDocCollector` is used by Elasticsearch to detect if the current
 search was cancelled and stop collecting documents as soon as it occurs.
 
-It should be noted that Collector times are **independent** from the Query times. They are calculated, combined
+It should be noted that Collector times are **independent** from the Query times. They are calculated, combined,
 and normalized independently! Due to the nature of Lucene's execution, it is impossible to "merge" the times
 from the Collectors into the Query section, so they are displayed in separate portions.
 
-For reference, the various collector reason's are:
+For reference, the various collector reasons are:
 
 [horizontal]
 `search_sorted`::
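Because `time_in_nanos` is inclusive of all children, a collector's own cost is its time minus the sum of its children's times. A sketch of that calculation over a tree shaped like the `CancellableCollector` / `SimpleTopScoreDocCollector` example in this hunk (the nanosecond values are made up):

```python
# Hypothetical collector tree shaped like the docs describe: each node has
# `name`, `reason`, `time_in_nanos` (inclusive of children), and `children`.
tree = {
    "name": "CancellableCollector",
    "reason": "search_cancelled",
    "time_in_nanos": 22577,
    "children": [
        {
            "name": "SimpleTopScoreDocCollector",
            "reason": "search_top_hits",
            "time_in_nanos": 9291,
            "children": [],
        }
    ],
}

def self_times(node):
    """Yield (name, self_time): since `time_in_nanos` is inclusive,
    self time is the node's time minus its children's times."""
    children = node.get("children", [])
    own = node["time_in_nanos"] - sum(c["time_in_nanos"] for c in children)
    yield node["name"], own
    for child in children:
        yield from self_times(child)

print(dict(self_times(tree)))
# {'CancellableCollector': 13286, 'SimpleTopScoreDocCollector': 9291}
```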
@@ -457,7 +457,7 @@ For reference, the various collector reason's are:
 `search_multi`::
 
 A collector that wraps several other collectors. This is seen when combinations of search, aggregations,
-global aggs and post_filters are combined in a single search.
+global aggs, and post_filters are combined in a single search.
 
 `search_timeout`::
 
@@ -473,7 +473,7 @@ For reference, the various collector reason's are:
 `global_aggregation`::
 
 A collector that executes an aggregation against the global query scope, rather than the specified query.
-Because the global scope is necessarily different from the executed query, it must execute it's own
+Because the global scope is necessarily different from the executed query, it must execute its own
 match_all query (which you will see added to the Query section) to collect your entire dataset
 
 
@@ -648,9 +648,9 @@ And the response:
 // TESTRESPONSE[s/\.\.\.//]
 // TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/]
 // TESTRESPONSE[s/"id": "\[P6-vulHtQRWuD4YnubWb7A\]\[test\]\[0\]"/"id": $body.profile.shards.0.id/]
-<1> The ``"aggregations"` portion has been omitted because it will be covered in the next section
+<1> The `"aggregations"` portion has been omitted because it will be covered in the next section
 
-As you can see, the output is significantly verbose from before. All the major portions of the query are
+As you can see, the output is significantly more verbose than before. All the major portions of the query are
 represented:
 
 1. The first `TermQuery` (user:test) represents the main `term` query
@@ -662,14 +662,14 @@ The Collector tree is fairly straightforward, showing how a single CancellableCo
 
 ==== Understanding MultiTermQuery output
 
-A special note needs to be made about the `MultiTermQuery` class of queries. This includes wildcards, regex and fuzzy
+A special note needs to be made about the `MultiTermQuery` class of queries. This includes wildcards, regex, and fuzzy
 queries. These queries emit very verbose responses, and are not overly structured.
 
 Essentially, these queries rewrite themselves on a per-segment basis. If you imagine the wildcard query `b*`, it technically
 can match any token that begins with the letter "b". It would be impossible to enumerate all possible combinations,
-so Lucene rewrites the query in context of the segment being evaluated. E.g. one segment may contain the tokens
+so Lucene rewrites the query in context of the segment being evaluated, e.g., one segment may contain the tokens
 `[bar, baz]`, so the query rewrites to a BooleanQuery combination of "bar" and "baz". Another segment may only have the
-token `[bakery]`, so query rewrites to a single TermQuery for "bakery".
+token `[bakery]`, so the query rewrites to a single TermQuery for "bakery".
 
 Due to this dynamic, per-segment rewriting, the clean tree structure becomes distorted and no longer follows a clean
 "lineage" showing how one query rewrites into the next. At present time, all we can do is apologize, and suggest you
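The per-segment rewrite described in this hunk can be mimicked in a few lines. This is purely illustrative (it is not how Lucene is implemented): each segment's token list is scanned for terms matching the wildcard, exactly the `b*` / `[bar, baz]` / `[bakery]` example above:

```python
import fnmatch

# Illustrative only: mimic how a MultiTermQuery such as the wildcard `b*`
# is rewritten per segment into the concrete terms that segment contains.
segments = [
    {"tokens": ["bar", "baz", "foo"]},
    {"tokens": ["bakery"]},
]

def rewrite_wildcard(pattern, segment):
    """Return the concrete terms the wildcard expands to in one segment.
    Several matches behave like a BooleanQuery; one like a TermQuery."""
    return [t for t in segment["tokens"] if fnmatch.fnmatch(t, pattern)]

for i, seg in enumerate(segments):
    print(i, rewrite_wildcard("b*", seg))
# 0 ['bar', 'baz']  -> BooleanQuery combination of "bar" and "baz"
# 1 ['bakery']      -> single TermQuery for "bakery"
```

Because each segment produces a different expansion, the profiled query tree shows a different rewritten shape per segment, which is why the output loses its clean lineage.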
@@ -729,7 +729,7 @@ GET /twitter/_search
 // TEST[s/_search/_search\?filter_path=profile.shards.aggregations/]
 // TEST[continued]
 
-Which yields the following aggregation profile output
+This yields the following aggregation profile output:
 
 [source,js]
 --------------------------------------------------
@@ -797,7 +797,7 @@ Which yields the following aggregation profile output
 
 From the profile structure we can see that the `my_scoped_agg` is internally being run as a `LongTermsAggregator` (because the field it is
 aggregating, `likes`, is a numeric field). At the same level, we see a `GlobalAggregator` which comes from `my_global_agg`. That
-aggregation then has a child `LongTermsAggregator` which from the second terms aggregation on `likes`.
+aggregation then has a child `LongTermsAggregator` which comes from the second term's aggregation on `likes`.
 
 The `time_in_nanos` field shows the time executed by each aggregation, and is inclusive of all children. While the overall time is useful,
 the `breakdown` field will give detailed stats about how the time was spent.
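The aggregation profile is a tree of the same shape: each node has a `type`, `description`, inclusive `time_in_nanos`, and optional `children`. A sketch of walking it into an indented outline, using a fragment modeled on the `GlobalAggregator` / `LongTermsAggregator` example in this hunk (the timings and the child's description are made up):

```python
# Hypothetical aggregation profile fragment, shaped like the docs describe:
# `my_global_agg` runs as a GlobalAggregator with a LongTermsAggregator child.
aggs = [
    {
        "type": "GlobalAggregator",
        "description": "my_global_agg",
        "time_in_nanos": 754,
        "children": [
            {
                "type": "LongTermsAggregator",
                "description": "likes_terms",  # hypothetical name
                "time_in_nanos": 377,
            }
        ],
    }
]

def outline(nodes, depth=0):
    """Render each aggregator indented by depth; times include children."""
    lines = []
    for node in nodes:
        lines.append("  " * depth
                     + f'{node["type"]} ({node["description"]}): '
                     + f'{node["time_in_nanos"]}ns')
        lines.extend(outline(node.get("children", []), depth + 1))
    return lines

print("\n".join(outline(aggs)))
```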
@@ -859,7 +859,7 @@ The meaning of the stats are as follows:
 ==== Performance Notes
 
 Like any profiler, the Profile API introduces a non-negligible overhead to search execution. The act of instrumenting
-low-level method calls such as `collect`, `advance` and `next_doc` can be fairly expensive, since these methods are called
+low-level method calls such as `collect`, `advance`, and `next_doc` can be fairly expensive, since these methods are called
 in tight loops. Therefore, profiling should not be enabled in production settings by default, and should not
 be compared against non-profiled query times. Profiling is just a diagnostic tool.
 
@@ -871,11 +871,11 @@ not have a drastic effect compared to other components in the profiled query.
 ==== Limitations
 
 - Profiling currently does not measure the search fetch phase nor the network overhead
-- Profiling also does not account for time spent in the queue, merging shard responses on the coordinating node or
-additional work like e.g. building global ordinals (an internal data structure used to speed up search)
+- Profiling also does not account for time spent in the queue, merging shard responses on the coordinating node, or
+additional work such as building global ordinals (an internal data structure used to speed up search)
 - Profiling statistics are currently not available for suggestions, highlighting, `dfs_query_then_fetch`
 - Profiling of the reduce phase of aggregation is currently not available
 - The Profiler is still highly experimental. The Profiler is instrumenting parts of Lucene that were
 never designed to be exposed in this manner, and so all results should be viewed as a best effort to provide detailed
-diagnostics. We hope to improve this over time. If you find obviously wrong numbers, strange query structures or
+diagnostics. We hope to improve this over time. If you find obviously wrong numbers, strange query structures, or
 other bugs, please report them!