docs: 7.15 APM updates (#112775)
docs/apm/dependencies.asciidoc
@ -0,0 +1,32 @@
[role="xpack"]
[[dependencies]]
=== Dependencies

APM agents collect details about external calls made from instrumented services.
Sometimes, these external calls resolve into a downstream service that's instrumented;
in these cases, you can use <<distributed-tracing,distributed tracing>> to drill down into problematic downstream services.
Other times, though, it's not possible to instrument a downstream dependency,
for example, a database or a third-party service.
**Dependencies** gives you a window into these uninstrumented, downstream dependencies.

[role="screenshot"]
image::apm/images/dependencies.png[Dependencies view in the APM app in Kibana]

Many application issues are caused by slow or unresponsive downstream dependencies.
And because a single slow dependency can significantly impact the end-user experience,
it's important to be able to quickly identify these problems and determine the root cause.

Select a dependency to see detailed latency, throughput, and failed transaction rate metrics.

[role="screenshot"]
image::apm/images/dependencies-drilldown.png[Dependencies drilldown view in the APM app in Kibana]

When viewing a dependency, consider your pattern of usage with that dependency.
If your usage pattern _hasn't_ increased or decreased,
but the experience has degraded, with increased latency or more errors,
there's likely a problem with the dependency that needs to be addressed.

If your usage pattern _has_ changed, the dependency view can quickly show you whether
that pattern change exists in all upstream services, or just a subset of your services.
You might then start digging into traces coming from
impacted services to determine why that pattern change has occurred.
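
The usage-pattern reasoning above can be sketched as a small triage helper. This is a minimal, hypothetical illustration rather than anything the APM app exposes: `DependencyWindow`, `triage_dependency`, the metric fields, and the default 20% change threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DependencyWindow:
    """Hypothetical summary of one dependency over a time window."""
    throughput: float    # calls per minute from the selected service
    latency_ms: float    # average latency in milliseconds
    failure_rate: float  # fraction of failed calls, 0.0 to 1.0


def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after` (0.25 means +25%)."""
    if before == 0:
        return 0.0 if after == 0 else float("inf")
    return (after - before) / before


def triage_dependency(baseline: DependencyWindow, current: DependencyWindow,
                      threshold: float = 0.2) -> str:
    """Apply the usage-pattern heuristic; `threshold` is an assumed cutoff."""
    usage_changed = abs(relative_change(baseline.throughput, current.throughput)) > threshold
    degraded = (
        relative_change(baseline.latency_ms, current.latency_ms) > threshold
        or relative_change(baseline.failure_rate, current.failure_rate) > threshold
    )
    if not usage_changed and degraded:
        # Same usage, worse experience: suspect the dependency itself.
        return "investigate the dependency"
    if usage_changed:
        # Usage moved: check whether all upstream services or only some changed.
        return "compare upstream services"
    return "no action"
```

For example, a baseline of 100 calls/min at 50 ms compared with 102 calls/min at 90 ms returns `"investigate the dependency"`: usage is flat but latency rose by 80%.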
@ -4,19 +4,21 @@
TIP: {apm-overview-ref-v}/errors.html[Errors] are groups of exceptions with a similar exception or log message.

-The *Errors* overview provides a high-level view of the error message and culprit,
-the number of occurrences, and the most recent occurrence.
-Just like the transaction overview, you'll notice we group together like errors.
-This makes it very easy to quickly see which errors are affecting your services,
+The *Errors* overview provides a high-level view of the exceptions that APM agents catch,
+or that users manually report with APM agent APIs.
+Like errors are grouped together to make it easy to quickly see which errors are affecting your services,
and to take actions to rectify them.
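
Grouping like errors can be pictured as computing a fingerprint for each error and bucketing on it. APM Server computes its own grouping key; this sketch only illustrates the idea, and the fields it hashes (exception type and stack frames, falling back to the log message), the error dict shape, and the `grouping_key`/`group_errors` names are assumptions.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def grouping_key(exception_type: Optional[str], frames: Sequence[str],
                 log_message: Optional[str] = None) -> str:
    """Hypothetical fingerprint: exception type plus stack frames, or the log message."""
    if exception_type is not None:
        # Exceptions of the same type raised from the same frames share a key,
        # even when their messages differ.
        parts = [exception_type, *frames]
    else:
        # Plain logged errors fall back to the message text.
        parts = [log_message or ""]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def group_errors(errors: List[dict]) -> Dict[str, List[dict]]:
    """Bucket error records (hypothetical dict shape) by grouping key."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for err in errors:
        key = grouping_key(err.get("type"), err.get("frames", []), err.get("message"))
        groups[key].append(err)
    return dict(groups)
```

Two `ValueError`s raised from the same frame with different messages land in one group, while a logged "disk full" message forms its own group.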

+A service returning a 5xx code from a request handler, controller, etc., will not create
+an exception that an APM agent can catch, and will therefore not show up in this view.
+
[role="screenshot"]
-image::apm/images/apm-errors-overview.png[Example view of the errors overview in the APM app in Kibana]
+image::apm/images/apm-errors-overview.png[APM Errors overview]

Selecting an error group ID or error message brings you to the *Error group*.

[role="screenshot"]
-image::apm/images/apm-error-group.png[Example view of the error group page in the APM app in Kibana]
+image::apm/images/apm-error-group.png[APM Error group]

Here, you'll see the error message, culprit, and the number of occurrences over time.

@ -29,6 +29,7 @@ start with:
* <<services>>
* <<traces>>
+* <<dependencies>>
* <<service-maps>>

Notice something awry? Select a service or trace and dive deeper with:
@ -46,6 +47,8 @@ include::services.asciidoc[]
include::traces.asciidoc[]

+include::dependencies.asciidoc[]
+
include::service-maps.asciidoc[]

include::service-overview.asciidoc[]

docs/apm/images/dependencies-drilldown.png
docs/apm/images/dependencies.png
@ -4,8 +4,7 @@
[partintro]
--
-The APM app in {kib} is provided with the basic license.
-It allows you to monitor your software services and applications in real-time;
+The APM app in {kib} allows you to monitor your software services and applications in real-time;
visualize detailed performance information on your services,
identify and analyze errors,
and monitor host-level and agent-specific metrics like JVM and Go runtime metrics.

@ -69,34 +69,43 @@ image::apm/images/traffic-transactions.png[Traffic and transactions]
[discrete]
[[service-error-rates]]
-=== Error rate and errors
+=== Failed transaction rate and errors

-The *Error rate* chart displays the average error rates relating to the service, within a specific time range.
-An HTTP response code greater than 400 does not necessarily indicate a failed transaction.
-<<transaction-error-rate,Learn more>>.
+The failed transaction rate represents the percentage of failed transactions from the perspective of the selected service.
+It's useful for visualizing unexpected increases, decreases, or irregular patterns in a service's transactions.
++
+[TIP]
+====
+HTTP **transactions** from the HTTP server perspective do not consider a `4xx` status code (client error) as a failure
+because the failure was caused by the caller, not the HTTP server. Thus, `event.outcome=success` and there will be no increase in failed transaction rate.
+
+HTTP **spans** from the client perspective, however, are considered failures if the HTTP status code is ≥ 400.
+These spans will set `event.outcome=failure` and increase the failed transaction rate.
+
+If there is no HTTP status, both transactions and spans are considered successful unless an error is reported.
+====
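
The rules in the TIP can be expressed as a short classification function. This is a sketch under stated assumptions, not the agents' actual code: the `Event` record, its fields, and `failure_rate` are hypothetical, and the sketch assumes that a server-side `5xx` marks a transaction as failed, which the TIP implies but does not state.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Event:
    """Hypothetical, simplified APM event: a transaction or a span."""
    kind: str                          # "transaction" (server side) or "span" (client side)
    http_status: Optional[int] = None  # None when the event has no HTTP status
    error_reported: bool = False       # True if an error was captured for this event


def outcome(event: Event) -> str:
    """Return the `event.outcome` value ("success" or "failure") per the rules above."""
    if event.http_status is None:
        # No HTTP status: successful unless an error is reported.
        return "failure" if event.error_reported else "success"
    if event.kind == "transaction":
        # Server perspective: a 4xx is the caller's fault; only 5xx fails (assumption).
        return "failure" if event.http_status >= 500 else "success"
    # Client (span) perspective: any status >= 400 is a failure.
    return "failure" if event.http_status >= 400 else "success"


def failure_rate(events: Iterable[Event]) -> float:
    """Percentage of events whose outcome is "failure" (0.0 for no events)."""
    outcomes = [outcome(e) for e in events]
    if not outcomes:
        return 0.0
    return 100.0 * outcomes.count("failure") / len(outcomes)
```

Passing only a service's transactions to `failure_rate` gives a percentage in the sense defined above: transactions with statuses 200, 404, 500, and 503 give 50.0, because the 404 counts as a success from the server's perspective.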

The *Errors* table provides a high-level view of each error message when it first and last occurred,
along with the total number of occurrences. This makes it very easy to quickly see which errors affect
your services and take actions to rectify them. To do so, click *View errors*.

[role="screenshot"]
-image::apm/images/error-rate.png[Error rate and errors]
+image::apm/images/error-rate.png[failed transaction rate and errors]

[discrete]
[[service-span-duration]]
=== Span types average duration and dependencies

-The *Average duration by span type* chart visualizes each span type's average duration and helps you determine
+The *Time spent by span type* chart visualizes each span type's average duration and helps you determine
which spans could be slowing down transactions. The "app" label displayed under the
chart indicates that something was happening within the application. This could signal that the
agent does not have auto-instrumentation for whatever was happening during that time or that the time was spent in the
application code and not in database or external requests.
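
To make the "app" label concrete, the sketch below splits one transaction's duration by span type and attributes the uncovered remainder to "app". It is a simplification, not how the APM app computes the chart: it assumes spans are non-overlapping direct children of the transaction, and `time_by_span_type` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def time_by_span_type(transaction_ms: float,
                      spans: List[Tuple[str, float]]) -> Dict[str, float]:
    """Split a transaction's duration by span type.

    `spans` is a list of (span type, duration in ms). Assumes spans never
    overlap, so whatever time they do not cover is attributed to "app":
    application code, or work the agent did not instrument.
    """
    totals: Dict[str, float] = defaultdict(float)
    for span_type, duration_ms in spans:
        totals[span_type] += duration_ms
    covered = sum(totals.values())
    # Clamp at zero in case span timings exceed the transaction duration.
    totals["app"] += max(transaction_ms - covered, 0.0)
    return dict(totals)
```

A 100 ms transaction with `db` spans of 30 ms and 10 ms and a 20 ms `external` span yields 40 ms of `db`, 20 ms of `external`, and 40 ms of `app`.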

The *Dependencies* table displays a list of downstream services or external connections relevant
-to the service at the selected time range. The table displays latency, traffic, error rate, and the impact of
+to the service at the selected time range. The table displays latency, throughput, failed transaction rate, and the impact of
each dependency. By default, dependencies are sorted by _Impact_ to show the most used and the slowest dependency.
-If there is a particular dependency you are interested in, click *View service map* to view the related
-<<service-maps, service map>>.
+If there is a particular dependency you are interested in, click *<<dependencies,View dependencies>>* to learn more about it.
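
Sorting by _Impact_ can be illustrated with a small ranking function. The exact formula the APM app uses is not given here, so this sketch assumes impact is the total time spent in a dependency (average latency multiplied by the number of calls), rescaled to 0-100 across all dependencies; `rank_by_impact` and the dependency names are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_by_impact(stats: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Rank dependencies by a hypothetical impact score, highest first.

    `stats` maps a dependency name to (average latency in ms, calls in the
    time range). Total time spent = latency * calls; the score rescales that
    to 0-100 across all dependencies (min-max normalization).
    """
    total_time = {name: latency * calls for name, (latency, calls) in stats.items()}
    if not total_time:
        return []
    lo, hi = min(total_time.values()), max(total_time.values())
    spread = hi - lo
    scores = {
        # With a single dependency (or all equal), every score is 100.
        name: (100.0 * (t - lo) / spread) if spread else 100.0
        for name, t in total_time.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With `{"postgresql": (20.0, 1000), "redis": (1.0, 5000), "api.example.com": (300.0, 10)}`, `postgresql` ranks first and `api.example.com` last: under this assumption, a moderately slow but heavily used dependency outweighs a very slow one that is rarely called.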

NOTE: Displaying dependencies for services instrumented with the Real User Monitoring (RUM) agent
requires an agent version ≥ v5.6.3.

@ -106,11 +115,11 @@ image::apm/images/spans-dependencies.png[Span type duration and dependencies]
[discrete]
[[service-instances]]
-=== All instances
+=== Instances

-The *All instances* table displays a list of all the available service instances within the selected time range.
-Depending on how the service runs, the instance could be a host or a container. The table displays latency, traffic,
-errors, CPU usage, and memory usage for each instance. By default, instances are sorted by _Throughput_.
+The *Instances* table displays a list of all the available service instances within the selected time range.
+Depending on how the service runs, the instance could be a host or a container. The table displays latency, throughput,
+failed transaction rate, CPU usage, and memory usage for each instance. By default, instances are sorted by _Throughput_.

[role="screenshot"]
image::apm/images/all-instances.png[All instances]

@ -8,7 +8,7 @@
APM is available via the navigation sidebar in {Kib}.
If you have not already installed and configured Elastic APM,
-the *Setup Instructions* in Kibana will get you started.
+the *Add data* page will get you started.

[role="screenshot"]
image::apm/images/apm-setup.png[Installation instructions on the APM page in Kibana]

@ -17,10 +17,9 @@ image::apm/images/apm-setup.png[Installation instructions on the APM page in Kib
[[apm-configure-index-pattern]]
=== Load the index pattern

-Index patterns tell Kibana which Elasticsearch indices you want to explore.
+Index patterns tell {kib} which {es} indices you want to explore.
An APM index pattern is necessary for certain features in the APM app, like the query bar.
-To set up the correct index pattern,
-simply click *Load Kibana objects* at the bottom of the Setup Instructions.
+To set up the correct index pattern, on the *Add data* page, click *Load Kibana objects*.

[role="screenshot"]
image::apm/images/apm-index-pattern.png[Setup index pattern for APM in Kibana]

@ -8,7 +8,7 @@ APM agents automatically collect performance metrics on HTTP requests, database
[role="screenshot"]
image::apm/images/apm-transactions-overview.png[Example view of transactions table in the APM app in Kibana]

-The *Latency*, *transactions per minute*, *Error rate*, and *Average duration by span type*
+The *Latency*, *transactions per minute*, *Failed transaction rate*, and *Average duration by span type*
charts display information on all transactions associated with the selected service:

*Latency*::
@ -23,17 +23,17 @@ Useful for determining if more responses than usual are being served with a part
Like in the latency graph, you can zoom in on anomalies to further investigate them.

[[transaction-error-rate]]
-*Error rate*::
-The error rate represents the percentage of failed transactions from the perspective of the selected service.
+*Failed transaction rate*::
+The failed transaction rate represents the percentage of failed transactions from the perspective of the selected service.
It's useful for visualizing unexpected increases, decreases, or irregular patterns in a service's transactions.
+
[TIP]
====
HTTP **transactions** from the HTTP server perspective do not consider a `4xx` status code (client error) as a failure
-because the failure was caused by the caller, not the HTTP server. Thus, there will be no increase in error rate.
+because the failure was caused by the caller, not the HTTP server. Thus, `event.outcome=success` and there will be no increase in failed transaction rate.

HTTP **spans** from the client perspective, however, are considered failures if the HTTP status code is ≥ 400.
-These spans will increase the error rate.
+These spans will set `event.outcome=failure` and increase the failed transaction rate.

If there is no HTTP status, both transactions and spans are considered successful unless an error is reported.
====

@ -97,7 +97,7 @@ This page is visually similar to the transaction overview, but it shows data fro
the selected transaction group.

[role="screenshot"]
-image::apm/images/apm-transaction-response-dist.png[Example view of response time distribution]
+image::apm/images/apm-transactions-overview.png[Example view of response time distribution]

[[transaction-duration-distribution]]
==== Latency distribution

@ -110,10 +110,10 @@ It's the requests on the right, the ones taking longer than average, that we pro
[role="screenshot"]
image::apm/images/apm-transaction-duration-dist.png[Example view of latency distribution graph]

-Select a latency duration _bucket_ to display up to ten trace samples.
+Click and drag to select a latency duration _bucket_ to display up to 500 trace samples.
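
Selecting a range on the latency distribution chart amounts to filtering transactions by duration and keeping a capped number of samples. The sketch below assumes a half-open range `[lo_ms, hi_ms)` and keeps samples in their original order; `trace_samples` is a hypothetical helper, and only the 500-sample limit comes from the text above.

```python
from typing import List, Sequence

MAX_SAMPLES = 500  # the limit stated above


def trace_samples(durations_ms: Sequence[float], lo_ms: float, hi_ms: float,
                  limit: int = MAX_SAMPLES) -> List[int]:
    """Return indexes of transactions whose latency falls in [lo_ms, hi_ms)."""
    if lo_ms > hi_ms:
        # Assumption: dragging right-to-left selects the same range.
        lo_ms, hi_ms = hi_ms, lo_ms
    hits = [i for i, d in enumerate(durations_ms) if lo_ms <= d < hi_ms]
    return hits[:limit]
```

For durations `[5.0, 120.0, 250.0, 90.0, 400.0]`, selecting 100 to 300 ms returns the indexes `[1, 2]`.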

[[transaction-trace-sample]]
-==== Trace sample
+==== Trace samples

Trace samples are based on the _bucket_ selection in the *Latency distribution* chart;
update the samples by selecting a new _bucket_.

@ -167,4 +167,11 @@ and solve problems.
[role="screenshot"]
image::apm/images/apm-logs-tab.png[APM logs tab]

-// To do: link to log correlation
+[[transaction-latency-correlations]]
+==== Correlations
+
+Correlations surface attributes of your data that are potentially correlated with high-latency or erroneous transactions.
+To learn more, see <<correlations>>.
+
+[role="screenshot"]
+image::apm/images/correlations-hover.png[APM latency correlations]
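
As a rough illustration of what "potentially correlated" means, the sketch below scores each attribute value by how much more often it appears in slow transactions than in all transactions. This is a deliberately simple stand-in, not the statistical method the APM app uses; `latency_correlations`, the attribute names, and the latency threshold are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def latency_correlations(transactions: List[Tuple[Dict[str, str], float]],
                         threshold_ms: float) -> List[Tuple[str, float]]:
    """Rank attribute values by how over-represented they are in slow transactions.

    Each transaction is (attributes, duration_ms). A transaction is "slow" when
    its duration is at least `threshold_ms`. The score is the share of slow
    transactions carrying a value minus that value's share across all of them.
    """
    total = len(transactions)
    slow = [attrs for attrs, duration in transactions if duration >= threshold_ms]
    if total == 0 or not slow:
        return []
    overall = Counter(f"{k}={v}" for attrs, _ in transactions for k, v in attrs.items())
    in_slow = Counter(f"{k}={v}" for attrs in slow for k, v in attrs.items())
    scores = {term: in_slow[term] / len(slow) - overall[term] / total for term in overall}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

If both transactions slower than 500 ms ran on `host=b`, while only half of all transactions did, `host=b` ranks first with a score of 0.5.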