docs: 7:15 APM updates (#112775)

Brandon Morelli 2021-09-23 11:52:10 -07:00 committed by GitHub
parent 1084ce7124
commit bc1c6dca47
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
21 changed files with 85 additions and 34 deletions

@@ -0,0 +1,32 @@
+[role="xpack"]
+[[dependencies]]
+=== Dependencies
+APM agents collect details about external calls made from instrumented services.
+Sometimes, these external calls resolve into a downstream service that's instrumented -- in these cases,
+you can use <<distributed-tracing,distributed tracing>> to drill down into problematic downstream services.
+Other times, though, it's not possible to instrument a downstream dependency --
+such as a database or third-party service.
+**Dependencies** gives you a window into these uninstrumented, downstream dependencies.
+[role="screenshot"]
+image::apm/images/dependencies.png[Dependencies view in the APM app in Kibana]
+Many application issues are caused by slow or unresponsive downstream dependencies.
+And because a single, slow dependency can significantly impact the end-user experience,
+it's important to be able to quickly identify these problems and determine the root cause.
+Select a dependency to see detailed latency, throughput, and failed transaction rate metrics.
+[role="screenshot"]
+image::apm/images/dependencies-drilldown.png[Dependencies drilldown view in the APM app in Kibana]
+When viewing a dependency, consider your pattern of usage with that dependency.
+If your usage pattern _hasn't_ increased or decreased,
+but the experience has been negatively affected (with an increase in latency or errors),
+there's likely a problem with the dependency that needs to be addressed.
+If your usage pattern _has_ changed, the dependency view can quickly show you whether
+that pattern change exists in all upstream services, or just a subset of your services.
+You might then start digging into traces coming from
+impacted services to determine why that pattern change has occurred.
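The usage-pattern reasoning above can be sketched as a small decision function. This is a hypothetical TypeScript illustration, not part of the APM app. The `DependencyWindow` shape, the `triage` and `changedUpstreams` helpers, and the 10% tolerance are all assumptions chosen for the example.

```typescript
// Hypothetical snapshot of one dependency's metrics over a time window.
interface DependencyWindow {
  throughput: number; // calls per minute
  latencyMs: number; // average latency
  failureRate: number; // fraction of failed calls, 0..1
}

// Hypothetical tolerance: relative changes up to 10% count as "unchanged".
const TOLERANCE = 0.1;

function relativeChange(before: number, after: number): number {
  if (before === 0) return after === 0 ? 0 : Infinity;
  return (after - before) / before;
}

type Verdict = "likely-dependency-problem" | "usage-pattern-changed" | "no-change";

function triage(before: DependencyWindow, after: DependencyWindow): Verdict {
  // First question: did our own usage of the dependency change?
  const usageChanged =
    Math.abs(relativeChange(before.throughput, after.throughput)) > TOLERANCE;
  if (usageChanged) return "usage-pattern-changed";
  // Usage is stable: higher latency or any rise in failures points at the dependency.
  const degraded =
    relativeChange(before.latencyMs, after.latencyMs) > TOLERANCE ||
    after.failureRate > before.failureRate;
  return degraded ? "likely-dependency-problem" : "no-change";
}

// When usage changed, list the upstream services whose call volume to the
// dependency moved, to see whether the change is global or limited to a subset.
function changedUpstreams(
  before: Record<string, number>,
  after: Record<string, number>,
): string[] {
  const names = Object.keys(before).concat(
    Object.keys(after).filter((n) => !(n in before)),
  );
  return names.filter((n) => {
    const b = n in before ? before[n] : 0;
    const a = n in after ? after[n] : 0;
    return Math.abs(relativeChange(b, a)) > TOLERANCE;
  });
}
```

If `changedUpstreams` returns every upstream service, the usage change is global; a short list narrows down which services' traces to inspect.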

@@ -4,19 +4,21 @@
TIP: {apm-overview-ref-v}/errors.html[Errors] are groups of exceptions with a similar exception or log message.
-The *Errors* overview provides a high-level view of the error message and culprit,
-the number of occurrences, and the most recent occurrence.
-Just like the transaction overview, you'll notice we group together like errors.
-This makes it very easy to quickly see which errors are affecting your services,
+The *Errors* overview provides a high-level view of the exceptions that APM agents catch,
+or that users manually report with APM agent APIs.
+Similar errors are grouped together to make it easy to quickly see which errors are affecting your services,
and to take actions to rectify them.
+A service returning a 5xx code from a request handler, controller, etc., will not create
+an exception that an APM agent can catch, and will therefore not show up in this view.
[role="screenshot"]
-image::apm/images/apm-errors-overview.png[Example view of the errors overview in the APM app in Kibana]
+image::apm/images/apm-errors-overview.png[APM Errors overview]
Selecting an error group ID or error message brings you to the *Error group*.
[role="screenshot"]
-image::apm/images/apm-error-group.png[Example view of the error group page in the APM app in Kibana]
+image::apm/images/apm-error-group.png[APM Error group]
Here, you'll see the error message, culprit, and the number of occurrences over time.
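To show how grouping turns many captured exceptions into a short list with an occurrence count and a most recent occurrence, here is a minimal TypeScript sketch. Elastic APM computes its own grouping key, so the key used here (exception type plus culprit) is a hypothetical simplification. The `CapturedError`, `ErrorGroup`, and `groupErrors` names are also illustrative. A 5xx response that raises no exception never becomes a `CapturedError`, so it cannot appear in any group.

```typescript
// Hypothetical shape of a captured exception; field names are illustrative.
interface CapturedError {
  exceptionType: string;
  message: string;
  culprit: string; // e.g. the function where the exception was thrown
  timestamp: number; // epoch milliseconds
}

interface ErrorGroup {
  key: string;
  message: string; // message of the most recent occurrence
  culprit: string;
  occurrences: number;
  lastSeen: number;
}

// Simplified, hypothetical grouping key: same exception type from the same culprit.
function groupingKey(e: CapturedError): string {
  return `${e.exceptionType}@${e.culprit}`;
}

function groupErrors(errors: CapturedError[]): ErrorGroup[] {
  const groups: Record<string, ErrorGroup> = {};
  for (const e of errors) {
    const key = groupingKey(e);
    if (!(key in groups)) {
      groups[key] = { key, message: e.message, culprit: e.culprit, occurrences: 1, lastSeen: e.timestamp };
      continue;
    }
    const g = groups[key];
    g.occurrences += 1;
    if (e.timestamp > g.lastSeen) {
      // Keep the most recent occurrence's time and message.
      g.lastSeen = e.timestamp;
      g.message = e.message;
    }
  }
  // Most frequent groups first.
  return Object.keys(groups)
    .map((k) => groups[k])
    .sort((a, b) => b.occurrences - a.occurrences);
}
```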

@@ -29,6 +29,7 @@ start with:
* <<services>>
* <<traces>>
+* <<dependencies>>
* <<service-maps>>
Notice something awry? Select a service or trace and dive deeper with:
@@ -46,6 +47,8 @@ include::services.asciidoc[]
include::traces.asciidoc[]
+include::dependencies.asciidoc[]
include::service-maps.asciidoc[]
include::service-overview.asciidoc[]

Binary file not shown (size before: 194 KiB, after: 68 KiB).

Binary file not shown (size before: 230 KiB, after: 361 KiB).

Binary file not shown (size before: 550 KiB, after: 508 KiB).

Binary file not shown (size before: 545 KiB, after: 546 KiB).

Binary file not shown (size before: 281 KiB, after: 456 KiB).

Binary file not shown (size before: 191 KiB, after: 140 KiB).

Binary file not shown (size before: 60 KiB, after: 78 KiB).

Binary file not shown (size before: 307 KiB, after: 252 KiB).

Binary file not shown (size before: 531 KiB, after: 516 KiB).

Binary file not shown (size before: 221 KiB, after: 340 KiB).

Binary file not shown (new file, 453 KiB).

Binary file not shown (new file, 373 KiB).

Binary file not shown (size before: 186 KiB, after: 207 KiB).

Binary file not shown (size before: 224 KiB, after: 192 KiB).

@@ -4,8 +4,7 @@
[partintro]
--
-The APM app in {kib} is provided with the basic license.
-It allows you to monitor your software services and applications in real-time;
+The APM app in {kib} allows you to monitor your software services and applications in real-time;
visualize detailed performance information on your services,
identify and analyze errors,
and monitor host-level and agent-specific metrics like JVM and Go runtime metrics.

@@ -69,34 +69,43 @@ image::apm/images/traffic-transactions.png[Traffic and transactions]
[discrete]
[[service-error-rates]]
-=== Error rate and errors
+=== Failed transaction rate and errors
-The *Error rate* chart displays the average error rates relating to the service, within a specific time range.
-An HTTP response code greater than 400 does not necessarily indicate a failed transaction.
-<<transaction-error-rate,Learn more>>.
+The failed transaction rate represents the percentage of failed transactions from the perspective of the selected service.
+It's useful for visualizing unexpected increases, decreases, or irregular patterns in a service's transactions.
++
+[TIP]
+====
+HTTP **transactions** from the HTTP server perspective do not consider a `4xx` status code (client error) as a failure
+because the failure was caused by the caller, not the HTTP server. Thus, `event.outcome=success` and there will be no increase in failed transaction rate.
+HTTP **spans** from the client perspective however, are considered failures if the HTTP status code is ≥ 400.
+These spans will set `event.outcome=failure` and increase the failed transaction rate.
+If there is no HTTP status, both transactions and spans are considered successful unless an error is reported.
+====
The *Errors* table provides a high-level view of each error message when it first and last occurred,
along with the total number of occurrences. This makes it very easy to quickly see which errors affect
your services and take actions to rectify them. To do so, click *View errors*.
[role="screenshot"]
-image::apm/images/error-rate.png[Error rate and errors]
+image::apm/images/error-rate.png[failed transaction rate and errors]
[discrete]
[[service-span-duration]]
=== Span types average duration and dependencies
-The *Average duration by span type* chart visualizes each span type's average duration and helps you determine
+The *Time spent by span type* chart visualizes each span type's average duration and helps you determine
which spans could be slowing down transactions. The "app" label displayed under the
chart indicates that something was happening within the application. This could signal that the
agent does not have auto-instrumentation for whatever was happening during that time or that the time was spent in the
application code and not in database or external requests.
The *Dependencies* table displays a list of downstream services or external connections relevant
-to the service at the selected time range. The table displays latency, traffic, error rate, and the impact of
+to the service at the selected time range. The table displays latency, throughput, failed transaction rate, and the impact of
each dependency. By default, dependencies are sorted by _Impact_ to show the most used and the slowest dependency.
-If there is a particular dependency you are interested in, click *View service map* to view the related
-<<service-maps, service map>>.
+If there is a particular dependency you are interested in, click *<<dependencies,View dependencies>>* to learn more about it.
NOTE: Displaying dependencies for services instrumented with the Real User Monitoring (RUM) agent
requires an agent version ≥ v5.6.3.
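Sorting by _Impact_ combines how often a dependency is called with how slow it is. One plausible way to express that, assumed here rather than taken from the APM app's source, is to multiply average latency by throughput (total time spent per minute) and scale the results linearly onto 0 to 100. The `DependencyStats` shape and the `rankByImpact` helper are hypothetical.

```typescript
interface DependencyStats {
  name: string;
  avgLatencyMs: number; // average latency of calls to the dependency
  throughputPerMin: number; // calls per minute
}

interface RankedDependency extends DependencyStats {
  impact: number; // 0 (least) to 100 (most)
}

// Total time spent waiting on the dependency per minute.
function totalTime(d: DependencyStats): number {
  return d.avgLatencyMs * d.throughputPerMin;
}

function rankByImpact(deps: DependencyStats[]): RankedDependency[] {
  if (deps.length === 0) return [];
  const totals = deps.map(totalTime);
  const min = Math.min(...totals);
  const max = Math.max(...totals);
  return deps
    .map((d) => {
      const t = totalTime(d);
      // Scale linearly onto 0..100; if every total is equal, rank them all 100.
      const impact = max === min ? 100 : ((t - min) / (max - min)) * 100;
      return { ...d, impact };
    })
    .sort((a, b) => b.impact - a.impact);
}
```

Under this assumption, a fast but heavily used dependency and a rarely used but slow one can land close together, which matches the intent of surfacing both "most used" and "slowest".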
@@ -106,11 +115,11 @@ image::apm/images/spans-dependencies.png[Span type duration and dependencies]
[discrete]
[[service-instances]]
-=== All instances
+=== Instances
-The *All instances* table displays a list of all the available service instances within the selected time range.
-Depending on how the service runs, the instance could be a host or a container. The table displays latency, traffic,
-errors, CPU usage, and memory usage for each instance. By default, instances are sorted by _Throughput_.
+The *Instances* table displays a list of all the available service instances within the selected time range.
+Depending on how the service runs, the instance could be a host or a container. The table displays latency, throughput,
+failed transaction rate, CPU usage, and memory usage for each instance. By default, instances are sorted by _Throughput_.
[role="screenshot"]
image::apm/images/all-instances.png[All instances]

@@ -8,7 +8,7 @@
APM is available via the navigation sidebar in {Kib}.
If you have not already installed and configured Elastic APM,
-the *Setup Instructions* in Kibana will get you started.
+the *Add data* page will get you started.
[role="screenshot"]
image::apm/images/apm-setup.png[Installation instructions on the APM page in Kibana]
@@ -17,10 +17,9 @@ image::apm/images/apm-setup.png[Installation instructions on the APM page in Kib
[[apm-configure-index-pattern]]
=== Load the index pattern
-Index patterns tell Kibana which Elasticsearch indices you want to explore.
+Index patterns tell {kib} which {es} indices you want to explore.
An APM index pattern is necessary for certain features in the APM app, like the query bar.
-To set up the correct index pattern,
-simply click *Load Kibana objects* at the bottom of the Setup Instructions.
+To set up the correct index pattern, on the *Add data* page, click *Load Kibana objects*.
[role="screenshot"]
image::apm/images/apm-index-pattern.png[Setup index pattern for APM in Kibana]

@@ -8,7 +8,7 @@ APM agents automatically collect performance metrics on HTTP requests, database
[role="screenshot"]
image::apm/images/apm-transactions-overview.png[Example view of transactions table in the APM app in Kibana]
-The *Latency*, *transactions per minute*, *Error rate*, and *Average duration by span type*
+The *Latency*, *transactions per minute*, *Failed transaction rate*, and *Average duration by span type*
charts display information on all transactions associated with the selected service:
*Latency*::
@@ -23,17 +23,17 @@ Useful for determining if more responses than usual are being served with a part
Like in the latency graph, you can zoom in on anomalies to further investigate them.
[[transaction-error-rate]]
-*Error rate*::
-The error rate represents the percentage of failed transactions from the perspective of the selected service.
+*Failed transaction rate*::
+The failed transaction rate represents the percentage of failed transactions from the perspective of the selected service.
It's useful for visualizing unexpected increases, decreases, or irregular patterns in a service's transactions.
+
[TIP]
====
HTTP **transactions** from the HTTP server perspective do not consider a `4xx` status code (client error) as a failure
-because the failure was caused by the caller, not the HTTP server. Thus, there will be no increase in error rate.
+because the failure was caused by the caller, not the HTTP server. Thus, `event.outcome=success` and there will be no increase in failed transaction rate.
HTTP **spans** from the client perspective however, are considered failures if the HTTP status code is ≥ 400.
-These spans will increase the error rate.
+These spans will set `event.outcome=failure` and increase the failed transaction rate.
If there is no HTTP status, both transactions and spans are considered successful unless an error is reported.
====
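The rules in the TIP above can be written as a small function that derives an outcome for each event, plus a helper that turns outcomes into a failure percentage. This TypeScript sketch uses a hypothetical `HttpEvent` shape. Treating `5xx` server responses as failures is an assumption, since the TIP only says that `4xx` does not count for transactions.

```typescript
type Outcome = "success" | "failure";

// Hypothetical event shape for this sketch.
interface HttpEvent {
  kind: "transaction" | "span"; // server-side transaction or client-side span
  statusCode?: number; // undefined when the event has no HTTP status
  errorReported: boolean; // true if an error was captured for this event
}

function deriveOutcome(e: HttpEvent): Outcome {
  if (e.statusCode === undefined) {
    // No HTTP status: successful unless an error is reported.
    return e.errorReported ? "failure" : "success";
  }
  if (e.kind === "transaction") {
    // Server perspective: a 4xx is the caller's fault. Failing on 5xx is an assumption.
    return e.statusCode >= 500 ? "failure" : "success";
  }
  // Client perspective: any status >= 400 is a failure.
  return e.statusCode >= 400 ? "failure" : "success";
}

// Percentage of events whose derived outcome is "failure".
// With no events, report 0 rather than dividing by zero.
function failedRate(events: HttpEvent[]): number {
  if (events.length === 0) return 0;
  let failed = 0;
  for (const e of events) {
    if (deriveOutcome(e) === "failure") failed += 1;
  }
  return (failed / events.length) * 100;
}
```

With these rules, a server answering `404` leaves its failed transaction rate unchanged, while a client span that receives the same `404` counts as a failure.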
@@ -97,7 +97,7 @@ This page is visually similar to the transaction overview, but it shows data fro
the selected transaction group.
[role="screenshot"]
-image::apm/images/apm-transaction-response-dist.png[Example view of response time distribution]
+image::apm/images/apm-transactions-overview.png[Example view of response time distribution]
[[transaction-duration-distribution]]
==== Latency distribution
@@ -110,10 +110,10 @@ It's the requests on the right, the ones taking longer than average, that we pro
[role="screenshot"]
image::apm/images/apm-transaction-duration-dist.png[Example view of latency distribution graph]
-Select a latency duration _bucket_ to display up to ten trace samples.
+Click and drag to select a latency duration _bucket_ to display up to 500 trace samples.
[[transaction-trace-sample]]
-==== Trace sample
+==== Trace samples
Trace samples are based on the _bucket_ selection in the *Latency distribution* chart;
update the samples by selecting a new _bucket_.
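Selecting a bucket amounts to filtering transactions by a duration range and keeping a bounded number of them. Here is a minimal sketch that assumes a hypothetical `TransactionSample` shape and keeps the first matches in input order. Which samples the APM app actually picks is not described here.

```typescript
interface TransactionSample {
  traceId: string;
  durationMs: number;
}

const MAX_SAMPLES = 500; // upper bound on displayed trace samples

function samplesInSelection(
  all: TransactionSample[],
  fromMs: number,
  toMs: number,
): TransactionSample[] {
  // Dragging right-to-left yields a reversed range, so normalize it first.
  const lo = Math.min(fromMs, toMs);
  const hi = Math.max(fromMs, toMs);
  return all
    .filter((t) => t.durationMs >= lo && t.durationMs <= hi)
    .slice(0, MAX_SAMPLES);
}
```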
@@ -167,4 +167,11 @@ and solve problems.
[role="screenshot"]
image::apm/images/apm-logs-tab.png[APM logs tab]
// To do: link to log correlation
+[[transaction-latency-correlations]]
+==== Correlations
+Correlations surface attributes of your data that are potentially correlated with high-latency or erroneous transactions.
+To learn more, see <<correlations>>.
+[role="screenshot"]
+image::apm/images/correlations-hover.png[APM latency correlations]
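To build intuition for what a correlation surfaces, the sketch below compares how often each attribute value appears among slow transactions versus among all transactions, using a simple lift ratio. This is a swapped-in technique for illustration and is not necessarily how the APM app computes correlations. The `Txn` shape, the `latencyLift` helper, and the fixed latency threshold are assumptions.

```typescript
interface Txn {
  durationMs: number;
  attributes: Record<string, string>; // e.g. { "host.name": "a" }
}

interface AttributeScore {
  field: string;
  value: string;
  lift: number; // share among slow transactions divided by share among all
}

// Transactions at or above slowThresholdMs count as "high latency".
// Assumes attribute field names contain no "=" (used as a key separator).
function latencyLift(txns: Txn[], slowThresholdMs: number): AttributeScore[] {
  const slowCount = txns.filter((t) => t.durationMs >= slowThresholdMs).length;
  if (slowCount === 0) return [];
  const countAll: Record<string, number> = {};
  const countSlow: Record<string, number> = {};
  for (const t of txns) {
    const isSlow = t.durationMs >= slowThresholdMs;
    for (const field of Object.keys(t.attributes)) {
      const key = field + "=" + t.attributes[field];
      countAll[key] = (countAll[key] || 0) + 1;
      if (isSlow) countSlow[key] = (countSlow[key] || 0) + 1;
    }
  }
  // Only values seen in slow transactions can be correlated with slowness.
  return Object.keys(countSlow)
    .map((key) => {
      const sep = key.indexOf("=");
      const shareSlow = countSlow[key] / slowCount;
      const shareAll = countAll[key] / txns.length;
      return { field: key.slice(0, sep), value: key.slice(sep + 1), lift: shareSlow / shareAll };
    })
    .sort((a, b) => b.lift - a.lift);
}
```

A lift well above 1 means the value is overrepresented among slow transactions, which is a hint worth investigating, not proof of a cause.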