[docs] 7.10 APM docs updates (#80605)
|
@ -18,12 +18,22 @@ image::apm/images/apm-alert.png[Create an alert in the APM app]
|
|||
For a walkthrough of the alert flyout panel, including detailed information on each configurable property,
|
||||
see Kibana's <<defining-alerts,defining alerts>>.
|
||||
|
||||
The APM app supports two different types of threshold alerts: transaction duration, and error rate.
|
||||
Below, we'll create one of each.
|
||||
The APM app supports four different types of alerts:
|
||||
|
||||
* Transaction duration anomaly:
|
||||
alerts when the service's transaction duration reaches a certain anomaly score
|
||||
* Transaction duration threshold:
|
||||
alerts when the service's transaction duration exceeds a given time limit over a given time frame
|
||||
* Transaction error rate threshold:
|
||||
alerts when the service's transaction error rate is above the selected rate over a given time frame
|
||||
* Error count threshold:
|
||||
alerts when the service exceeds a selected number of errors over a given time frame
|
||||
|
||||
Below, we'll walk through the creation of two of these alerts.
|
||||
|
||||
[float]
|
||||
[[apm-create-transaction-alert]]
|
||||
=== Create a transaction duration alert
|
||||
=== Example: create a transaction duration alert
|
||||
|
||||
Transaction duration alerts trigger when the duration of a specific transaction type in a service exceeds a defined threshold.
|
||||
This guide will create an alert for the `opbeans-java` service based on the following criteria:
|
||||
|
@ -57,9 +67,9 @@ Enter a name for the connector,
|
|||
and paste the webhook URL.
|
||||
See Slack's webhook documentation if you need to create one.
|
||||
|
||||
Add a message body in markdown format.
|
||||
A default message is provided as a starting point for your alert.
|
||||
You can use the https://mustache.github.io/[Mustache] template syntax, i.e., `{{variable}}`
|
||||
to pass alert values at the time a condition is detected to an action.
|
||||
to pass additional alert values at the time a condition is detected to an action.
|
||||
A list of available variables can be accessed by selecting the
|
||||
**add variable** button image:apm/images/add-variable.png[add variable button].
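
To illustrate how `{{variable}}` placeholders are filled in, here is a minimal sketch of Mustache-style variable substitution. It is not Kibana's implementation and handles only plain `{{name}}` tags (no sections, partials, or escaping). The variable names `serviceName` and `threshold` and the sample values are hypothetical; the **add variable** button lists the names that are actually available.

```typescript
// Minimal Mustache-style substitution: replaces {{name}} with values[name].
// Illustration only; real Mustache also supports sections, partials, and HTML escaping.
function renderTemplate(
  template: string,
  values: Record<string, string | number>
): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match: string, name: string) =>
    // As in Mustache, a variable with no value renders as an empty string.
    name in values ? String(values[name]) : ""
  );
}

// Hypothetical variable names and values, chosen only for this example.
const message = renderTemplate(
  "Alert for {{serviceName}}: duration exceeded {{threshold}} ms",
  { serviceName: "opbeans-java", threshold: 1500 }
);
```

With the sample values above, `message` reads `Alert for opbeans-java: duration exceeded 1500 ms`.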
|
||||
|
||||
|
@ -67,7 +77,7 @@ Select **Save**. The alert has been created and is now active!
|
|||
|
||||
[float]
|
||||
[[apm-create-error-alert]]
|
||||
=== Create an error rate alert
|
||||
=== Example: create an error rate alert
|
||||
|
||||
Error rate alerts trigger when the number of errors in a service exceeds a defined threshold.
|
||||
This guide creates an alert for the `opbeans-python` service based on the following criteria:
|
||||
|
@ -94,9 +104,9 @@ Based on the alert criteria, define the following alert details:
|
|||
Select the **Email** action type and click **Create a connector**.
|
||||
Fill out the required details: sender, host, port, etc., and click **Save**.
|
||||
|
||||
Add a message body in markdown format.
|
||||
A default message is provided as a starting point for your alert.
|
||||
You can use the https://mustache.github.io/[Mustache] template syntax, i.e., `{{variable}}`
|
||||
to pass alert values at the time a condition is detected to an action.
|
||||
to pass additional alert values at the time a condition is detected to an action.
|
||||
A list of available variables can be accessed by selecting the
|
||||
**add variable** button image:apm/images/add-variable.png[add variable button].
|
||||
|
||||
|
|
|
@ -69,7 +69,7 @@ the host filter will still be applied.
|
|||
|
||||
These filters are very useful for quickly and easily removing noise from your data.
|
||||
With just a click, you can filter your transactions by the transaction result,
|
||||
host, container ID, and more.
|
||||
host, container ID, Kubernetes pod, and more.
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/local-filter.png[Local filters available in the APM app in Kibana]
|
BIN docs/apm/images/service-quick-health.png (new file, 179 KiB)
|
@ -14,7 +14,12 @@ Machine learning jobs are created per environment, and are based on a service's
|
|||
Because jobs are created at the environment level,
|
||||
you can add new services to your existing environments without the need for additional machine learning jobs.
|
||||
|
||||
After a machine learning job is created, results are shown in two places:
|
||||
Results from machine learning jobs are shown in multiple places throughout the APM app:
|
||||
|
||||
* The **Services overview** provides a quick-glance view of the general health of all of your services.
|
||||
+
|
||||
[role="screenshot"]
|
||||
image::apm/images/service-quick-health.png[Example view of anomaly scores on response times in the APM app]
|
||||
|
||||
* The transaction duration chart will show the expected bounds and add an annotation when the anomaly score is 75 or above.
|
||||
+
|
||||
|
|
|
@ -33,7 +33,7 @@ distributed tracing will not work, and the connection will not be drawn on the m
|
|||
Select the **Service Map** tab to get started.
|
||||
By default, all instrumented services and connections are shown.
|
||||
Whether you're onboarding a new engineer, or just trying to grasp the big picture,
|
||||
click around, zoom in and out, and begin to visualize how your services are connected.
|
||||
drag things around, zoom in and out, and begin to visualize how your services are connected.
|
||||
|
||||
If there's a specific service that interests you, select that service to highlight its connections.
|
||||
Clicking **Focus map** will refocus the map on that specific service and lock the connection highlighting.
|
||||
|
|
|
@ -2,8 +2,13 @@
|
|||
[[services]]
|
||||
=== Services overview
|
||||
|
||||
The *Services* overview gives you quick insights into the health and general performance of all of your instrumented services.
|
||||
Services are sorted by the `service.name` configured in each of the {apm-agents-ref}[APM agents] you’ve installed.
|
||||
The *Services* overview page provides a quick, high-level overview of the health and general
|
||||
performance of all instrumented services.
|
||||
|
||||
To help surface potential issues, services are sorted by their health status:
|
||||
**critical** > **warning** > **healthy** > **unknown**.
|
||||
Health status is powered by machine learning and requires anomaly detection to be enabled.
|
||||
Learn more in <<machine-learning-integration,machine learning>>.
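
The health-status ordering described above can be sketched as a simple ranked sort. This is an illustration under assumptions, not the APM app's code: the `ServiceRow` shape, the `sortByHealth` helper, and the sample data are all hypothetical.

```typescript
type HealthStatus = "critical" | "warning" | "healthy" | "unknown";

// Hypothetical row shape; the real services table carries more fields.
interface ServiceRow {
  name: string;
  health: HealthStatus;
}

// Lower rank sorts first: critical > warning > healthy > unknown.
const HEALTH_RANK: Record<HealthStatus, number> = {
  critical: 0,
  warning: 1,
  healthy: 2,
  unknown: 3,
};

// Returns a new array ordered by health status and leaves the input untouched.
function sortByHealth(services: ServiceRow[]): ServiceRow[] {
  return [...services].sort((a, b) => HEALTH_RANK[a.health] - HEALTH_RANK[b.health]);
}

// Sample data for illustration only.
const sorted = sortByHealth([
  { name: "opbeans-python", health: "healthy" },
  { name: "opbeans-java", health: "critical" },
  { name: "opbeans-node", health: "unknown" },
  { name: "opbeans-ruby", health: "warning" },
]);
```

Services whose health is **unknown**, for example because anomaly detection is not enabled, end up at the bottom of the list.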
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/apm-services-overview.png[Example view of services table the APM app in Kibana]
|
||||
image::apm/images/apm-services-overview.png[Example view of services table in the APM app in Kibana]
|
||||
|
|
|
@ -3,7 +3,7 @@
|
|||
=== Trace sample timeline
|
||||
|
||||
The trace sample timeline visualization is a bird's-eye view of what your application was doing while it was trying to respond to a request.
|
||||
This makes it useful for visualizing where the selected transaction spent most of its time.
|
||||
This makes it useful for visualizing where a selected transaction spent most of its time.
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/apm-transaction-sample.png[Example of distributed trace colors in the APM app in Kibana]
|
||||
|
@ -43,9 +43,12 @@ this makes finding possible bottlenecks throughout your application much easier
|
|||
image::apm/images/apm-distributed-tracing.png[Example view of the distributed tracing in APM app in Kibana]
|
||||
|
||||
Remember that, by definition, a distributed trace includes more than one transaction.
|
||||
When viewing these distributed traces in the timeline waterfall, you'll see this image:apm/images/transaction-icon.png[APM icon] icon,
|
||||
When viewing distributed traces in the timeline waterfall,
|
||||
you'll see this icon: image:apm/images/transaction-icon.png[APM icon],
|
||||
which indicates the next transaction in the trace.
|
||||
These transactions can be expanded and viewed in detail by clicking on them.
|
||||
For easier problem isolation, transactions can be collapsed in the waterfall by clicking
|
||||
the icon to the left of the transactions.
|
||||
Transactions can also be expanded and viewed in detail by clicking on them.
|
||||
|
||||
After exploring these traces,
|
||||
you can return to the full trace by clicking *View full trace*.
|
||||
|
|
|
@ -7,7 +7,8 @@ and which services were part of it.
|
|||
In addition to the Traces overview, you can view your application traces in the <<spans,trace sample timeline waterfall>>.
|
||||
|
||||
The *Traces* overview displays the entry transaction for all traces in your application.
|
||||
If you're using <<distributed-tracing>>, this view is key to finding the critical paths within your application.
|
||||
If you're using <<distributed-tracing,distributed tracing>>,
|
||||
this view is key to finding the critical paths within your application.
|
||||
Transactions with the same name are grouped together and only shown once in this table.
|
||||
|
||||
By default, transactions are sorted by _Impact_.
|
||||
|
|
|
@ -10,7 +10,24 @@ Selecting a <<services,*service*>> brings you to the *transactions* overview.
|
|||
[role="screenshot"]
|
||||
image::apm/images/apm-transactions-overview.png[Example view of transactions table in the APM app in Kibana]
|
||||
|
||||
The *time spent by span type*, *transaction duration*, and *requests per minute* chart display information on all transactions associated with the selected service:
|
||||
The *transaction duration*, *transactions per minute*, *transaction error rate*, and *time spent by span type*
|
||||
charts display information on all transactions associated with the selected service:
|
||||
|
||||
*Transaction duration*::
|
||||
Response times for this service, broken down into average, 95th, and 99th percentile.
|
||||
If there's a weird spike that you'd like to investigate,
|
||||
you can simply zoom in on the graph - this will adjust the specific time range,
|
||||
and all of the data on the page will update accordingly.
|
||||
|
||||
*Transactions per minute*::
|
||||
Visualizes response codes: `2xx`, `3xx`, `4xx`, etc.,
|
||||
and is useful for determining if you're serving more of one code than you typically do.
|
||||
Like in the Transaction duration graph, you can zoom in on anomalies to further investigate them.
|
||||
|
||||
*Transaction error rate*::
|
||||
Visualize the total number of transactions with errors divided by the total number of transactions.
|
||||
Any unexpected increases, decreases, or irregular patterns can be investigated further
|
||||
with the <<errors,errors overview>>.
|
||||
|
||||
*Time spent by span type*::
|
||||
Visualize where your application is spending most of its time.
|
||||
|
@ -22,17 +39,6 @@ This could be a sign that the agent does not have auto-instrumentation for whate
|
|||
+
|
||||
It's important to note that if you have asynchronous spans, the sum of all span times may exceed the duration of the transaction.
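
As a worked example of the *transaction error rate* definition above (transactions with errors divided by total transactions), here is a minimal sketch. The function name and the choice to report `0` for an empty time window are assumptions made for this example, not documented behavior of the APM app.

```typescript
// Error rate = transactions with errors / total transactions.
// Assumption: a window with no transactions yields 0 rather than NaN.
function transactionErrorRate(failed: number, total: number): number {
  if (total <= 0) return 0;
  return failed / total;
}

// 5 failed transactions out of 200 total.
const rate = transactionErrorRate(5, 200);
const ratePercent = rate * 100;
```

Here `rate` is `0.025`, which a chart would show as 2.5%.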
|
||||
|
||||
*Transaction duration*::
|
||||
Response times for this service, broken down into average, 95th, and 99th percentile.
|
||||
If there's a weird spike that you'd like to investigate,
|
||||
you can simply zoom in on the graph - this will adjust the specific time range,
|
||||
and all of the data on the page will update accordingly.
|
||||
|
||||
*Requests per minute*::
|
||||
Visualize response codes: `2xx`, `3xx`, `4xx`, etc.,
|
||||
and is useful for determining if you're serving more of one code than you typically do.
|
||||
Like in the Transaction duration graph, you can zoom in on anomalies to further investigate them.
|
||||
|
||||
[[transactions-table]]
|
||||
==== Transactions table
|
||||
|
||||
|
@ -61,42 +67,45 @@ refer to the documentation for each {apm-agents-ref}[APM Agent] you've implement
|
|||
==== RUM Transaction overview
|
||||
|
||||
The transaction overview page is customized for the JavaScript RUM Agent.
|
||||
This page highlights things like *page load times*, *transactions per minute*, and even the *average page load duration distribution by country*.
|
||||
Specifically, the page highlights *page load times* for your service:
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/apm-geo-ui.png[average page load duration distribution]
|
||||
|
||||
This data is available due to the geo-ip and user agent pipelines being enabled by default,
|
||||
which allows for the capture of geo-location and user agent data.
|
||||
These visualizations make it easy for you to visualize performance information about your
|
||||
end-users' experience based on their location.
|
||||
Additional RUM goodies, like core vitals, and visitor breakdown by browser, location, and device,
|
||||
are available in the Observability User Experience tab.
|
||||
// To do
|
||||
// Add link to the Observability UE docs when complete
|
||||
|
||||
[[transaction-details]]
|
||||
==== Transaction details
|
||||
|
||||
Selecting a transaction group will bring you to the *transaction* details.
|
||||
Transaction details include a high-level overview of the time spent by span type,
|
||||
transaction group duration, requests per minute, and transaction group duration distribution.
|
||||
It's important to note that all of these graphs show data from every transaction within the selected transaction group.
|
||||
This page is visually similar to the transaction overview, but it shows data from all transactions within
|
||||
the selected transaction group.
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/apm-transaction-response-dist.png[Example view of response time distribution]
|
||||
|
||||
Up to ten sampled transactions are also displayed.
|
||||
These sampled transactions are based on your selection in the *Transactions duration distribution*.
|
||||
You can update the sampled transactions by selecting a new _bucket_ in the transactions duration distribution graph.
|
||||
The number of requests per bucket is displayed when hovering over the graph, and the selected bucket is highlighted to stand out.
|
||||
These sampled transactions are based on the _bucket_ selection in the *Transactions duration distribution* chart.
|
||||
You can update the sampled transactions by selecting a new _bucket_.
|
||||
The number of requests per bucket is displayed when hovering over the graph,
|
||||
and the selected bucket is highlighted to stand out.
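
To make the idea of _buckets_ concrete, here is a minimal sketch that groups transaction durations into fixed-width buckets and counts the requests in each. The fixed width, the millisecond unit, and the `bucketDurations` helper are assumptions made for illustration; the source does not describe how the APM app chooses its bucket boundaries.

```typescript
// Counts durations per fixed-width bucket.
// Each key is the bucket's lower bound in milliseconds.
function bucketDurations(
  durationsMs: number[],
  bucketWidthMs: number
): Record<number, number> {
  const counts: Record<number, number> = {};
  for (const d of durationsMs) {
    const lowerBound = Math.floor(d / bucketWidthMs) * bucketWidthMs;
    counts[lowerBound] = (counts[lowerBound] ?? 0) + 1;
  }
  return counts;
}

// Sample durations for illustration only.
const buckets = bucketDurations([12, 48, 95, 130, 145, 870], 100);
```

With this sample data, three requests fall in the 0-100 ms bucket, two in the 100-200 ms bucket, and one slow request sits alone in the 800-900 ms bucket.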
|
||||
|
||||
The screenshot below shows a typical distribution, and indicates most of our requests were served quickly--awesome!
|
||||
It's the requests on the right, the ones taking longer than average, that we probably want to focus on.
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/apm-transaction-duration-dist.png[Example view of transactions duration distribution graph]
|
||||
|
||||
This graph shows a typical distribution, and indicates most of our requests were served quickly--awesome!
|
||||
It's the requests on the right, the ones taking longer than average, that we probably want to focus on.
|
||||
|
||||
When you select one of these buckets,
|
||||
When you select a bucket,
|
||||
you're presented with up to ten trace samples.
|
||||
Each sample has a trace timeline waterfall that shows what a typical request in that bucket was doing.
|
||||
By investigating this timeline waterfall, we can hopefully determine _why_ this request was slow and then implement a fix.
|
||||
Each sample has a trace timeline waterfall that shows how a typical request in that bucket executed.
|
||||
This waterfall is useful for understanding the parent/child hierarchy of transactions and spans,
|
||||
and ultimately determining _why_ a request was slow.
|
||||
For large waterfalls, expand problematic transactions and collapse well-performing ones
|
||||
for easier problem isolation and troubleshooting.
|
||||
|
||||
[role="screenshot"]
|
||||
image::apm/images/apm-transaction-sample.png[Example view of transactions sample]
|
||||
|
|
|
@ -14,6 +14,7 @@ Also, check out the https://discuss.elastic.co/c/apm[APM discussion forum].
|
|||
* <<troubleshooting-too-many-transactions>>
|
||||
* <<troubleshooting-unknown-route>>
|
||||
* <<troubleshooting-fields-unsearchable>>
|
||||
* <<service-map-rum-connections>>
|
||||
|
||||
[float]
|
||||
[[no-apm-data-found]]
|
||||
|
@ -180,3 +181,19 @@ setup.template.append_fields:
|
|||
type: object
|
||||
dynamic: true
|
||||
----
|
||||
|
||||
[float]
|
||||
[[service-map-rum-connections]]
|
||||
=== Service maps: no connection between client and server
|
||||
|
||||
If the service map is not showing an expected connection between the client and server,
|
||||
it's likely because you haven't configured
|
||||
{apm-agent-rum}/configuration.html#distributed-tracing-origins[`distributedTracingOrigins`].
|
||||
|
||||
|
||||
This setting is necessary, for example, for cross-origin requests.
|
||||
If you have a basic web application that provides data via an API on `localhost:4000`,
|
||||
and serves HTML from `localhost:4001`, you'd need to set `distributedTracingOrigins: ['https://localhost:4000']`
|
||||
to ensure the origin is monitored as a part of distributed tracing.
|
||||
In other words, `distributedTracingOrigins` is consulted prior to the agent adding the
|
||||
distributed tracing `traceparent` header to each request.
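
The check described above can be sketched as follows. This is a simplified illustration, not the RUM agent's actual code: `shouldAddTraceparent` is a hypothetical helper, the `/api/products` path is made up, and the sketch assumes that same-origin requests are traced without extra configuration.

```typescript
// Returns true when the request's origin is the page's own origin or is
// listed in distributedTracingOrigins. Hypothetical helper for illustration.
function shouldAddTraceparent(
  requestUrl: string,
  pageOrigin: string,
  distributedTracingOrigins: string[]
): boolean {
  const requestOrigin = new URL(requestUrl).origin;
  // Assumption: same-origin requests need no extra configuration.
  if (requestOrigin === pageOrigin) return true;
  return distributedTracingOrigins.some(
    (allowed) => new URL(allowed).origin === requestOrigin
  );
}

// The page is served from localhost:4001 and calls an API on localhost:4000.
const pageOrigin = "https://localhost:4001";
const apiUrl = "https://localhost:4000/api/products"; // hypothetical path

const withoutConfig = shouldAddTraceparent(apiUrl, pageOrigin, []);
const withConfig = shouldAddTraceparent(apiUrl, pageOrigin, ["https://localhost:4000"]);
```

In this sketch `withoutConfig` is `false` and `withConfig` is `true`: the cross-origin API call only gets a `traceparent` header once its origin is listed, which is why the connection is otherwise missing from the service map.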
|
||||
|
|