[role="xpack"]
[[service-overview]]
=== Service overview

Selecting a non-mobile <<services,*service*>> brings you to the *Service overview*.
The *Service overview* contains a wide variety of charts and tables that provide
high-level visibility into how a service is performing across your infrastructure:

* Service details like service version, runtime version, framework, and APM agent name and version
* Container and orchestration information
* Cloud provider, machine type, service name, region, and availability zone
* Serverless function names and event trigger type
* Latency, throughput, and errors over time
* Service dependencies

[discrete]
[[service-time-comparison]]
=== Time series and expected bounds comparison

For insight into the health of your services, you can compare how a service
performs relative to a previous time frame or to the expected bounds from the
corresponding {anomaly-job}. For example: Has latency been slowly increasing
over time? Did the service experience a sudden spike? Is the throughput similar
to what the {ml} job expects? Enabling a comparison can provide the answer.

[role="screenshot"]
image::apm/images/time-series-expected-bounds-comparison.png[Time series and expected bounds comparison]

Select the *Comparison* box to apply a time-based or expected bounds comparison.
The time-based comparison options are based on the selected time filter range:

[options="header"]
|====
|Time filter | Time comparison options

|≤ 24 hours
|One day or one week

|> 24 hours and ≤ 7 days
|One week

|> 7 days
|An identical amount of time immediately before the selected time range
|====
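
The mapping from time filter range to comparison options can be sketched in a few lines of Python. This is only an illustration of the rules in the table, not the app's implementation: the function names, the `Previous period` label, and the idea that "One day" and "One week" shift the selected range back by that amount are assumptions made for the example.

```python
from datetime import datetime, timedelta


def comparison_options(time_range: timedelta) -> list:
    """Return the time-based comparison options for a selected time filter range."""
    if time_range <= timedelta(hours=24):
        return ["One day", "One week"]
    if time_range <= timedelta(days=7):
        return ["One week"]
    # Ranges longer than 7 days are compared with the period of identical
    # length immediately before the selected range.
    return ["Previous period"]


def comparison_window(start: datetime, end: datetime, option: str):
    """Return the (start, end) of the comparison period.

    Assumption: "One day" and "One week" shift the selected range back by
    that amount; the "Previous period" label is made up for this sketch.
    """
    offsets = {
        "One day": timedelta(days=1),
        "One week": timedelta(weeks=1),
        "Previous period": end - start,
    }
    offset = offsets[option]
    return start - offset, end - offset


# A 10-day selection is compared with the 10 days right before it.
start, end = datetime(2024, 3, 11), datetime(2024, 3, 21)
print(comparison_options(end - start))  # ['Previous period']
print(comparison_window(start, end, "Previous period"))
```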

You can use the expected bounds comparison if {ml-jobs} exist in your selected
environment and you have
{ml-docs}/setup.html#kib-visibility-spaces[access to the {ml-features}].

[discrete]
[[service-latency]]
=== Latency

The *Latency* chart shows response times for the service. You can filter the chart to display the average,
95th percentile, or 99th percentile latency for the service.

[role="screenshot"]
image::apm/images/latency.png[Service latency]
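
To see why the percentile views can tell a different story than the average, consider the following minimal sketch. It uses the nearest-rank method on a plain list of durations as a stand-in; the APM app computes these aggregations server-side and its exact percentile estimation may differ. The function names and sample values are hypothetical.

```python
import math
from statistics import fmean


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(durations_ms):
    """Summarize transaction durations the way the chart filter does:
    average, 95th percentile, and 99th percentile."""
    return {
        "avg": fmean(durations_ms),
        "p95": percentile(durations_ms, 95),
        "p99": percentile(durations_ms, 99),
    }


# 98 fast requests and two slow outliers: the average moves a little,
# the 95th percentile not at all, and the 99th percentile exposes the tail.
durations = [100] * 98 + [2000, 3000]
print(latency_summary(durations))
```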

[discrete]
[[service-throughput-transactions]]
=== Throughput and transactions

// tag::throughput-transactions[]
The *Throughput* chart visualizes the average number of transactions per minute for the selected service.

The *Transactions* table displays a list of _transaction groups_ for the
selected service and includes the latency, traffic, error rate, and impact of each transaction group.
Transactions that share the same name are grouped, and only one entry is displayed for each group.

By default, transaction groups are sorted by _Impact_ to show the most used and slowest endpoints in your
service. If there is a particular endpoint you are interested in, click *View transactions* to view a
list of similar transactions on the <<transactions, transactions overview>> page.

[role="screenshot"]
image::apm/images/traffic-transactions.png[Traffic and transactions]
// end::throughput-transactions[]
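
The grouping and sorting behind the *Transactions* table can be approximated as follows. This is a sketch under stated assumptions rather than the app's actual query: the `Transaction` type and field names are hypothetical, and impact is modeled here as the total time spent in a group (average latency multiplied by the number of transactions), which is one simple way to rank endpoints that are both heavily used and slow.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transaction:
    name: str
    duration_ms: float
    failed: bool = False


def transaction_groups(transactions, window_minutes):
    """Group transactions by name and compute one table row per group."""
    grouped = defaultdict(list)
    for tx in transactions:
        grouped[tx.name].append(tx)

    rows = []
    for name, txs in grouped.items():
        total_ms = sum(tx.duration_ms for tx in txs)
        rows.append({
            "name": name,
            "latency_ms": total_ms / len(txs),            # average latency
            "throughput_tpm": len(txs) / window_minutes,  # transactions per minute
            "failed_rate": sum(tx.failed for tx in txs) / len(txs),
            "impact": total_ms,  # assumption: total time spent in the group
        })
    # Highest impact first, mirroring the default sort order.
    return sorted(rows, key=lambda row: row["impact"], reverse=True)


sample = [
    Transaction("GET /a", 100),
    Transaction("GET /a", 300, failed=True),
    Transaction("GET /b", 1000),
]
for row in transaction_groups(sample, window_minutes=2):
    print(row)
```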

[discrete]
[[service-error-rates]]
=== Failed transaction rate and errors

// tag::ftr[]
The failed transaction rate represents the percentage of failed transactions from the perspective of the selected service.
It's useful for visualizing unexpected increases, decreases, or irregular patterns in a service's transactions.

[TIP]
====
HTTP **transactions** from the HTTP server perspective do not consider a `4xx` status code (client error) as a failure
because the failure was caused by the caller, not the HTTP server. Thus, `event.outcome=success` and there will be no increase in failed transaction rate.

HTTP **spans** from the client perspective, however, are considered failures if the HTTP status code is ≥ 400.
These spans will set `event.outcome=failure` and increase the failed transaction rate.

If there is no HTTP status, both transactions and spans are considered successful unless an error is reported.
====
// end::ftr[]
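
The outcome rules in the tip above can be written down as a small decision function. This is a hedged sketch, not agent code: the function names are hypothetical, and treating `5xx` responses as failures for server-side transactions is an assumption that the tip implies but does not state.

```python
from typing import Optional


def http_outcome(kind: str, status_code: Optional[int], error_reported: bool = False) -> str:
    """Return the event.outcome value for an HTTP transaction or span.

    kind is "transaction" (HTTP server perspective) or "span" (client perspective).
    """
    if status_code is None:
        # Without an HTTP status, only a reported error marks a failure.
        return "failure" if error_reported else "success"
    if kind == "transaction":
        # A 4xx is the caller's fault, so the server-side transaction succeeds.
        # Assumption: 5xx responses count as failures.
        return "failure" if status_code >= 500 else "success"
    if kind == "span":
        # From the client perspective, any status of 400 or above is a failure.
        return "failure" if status_code >= 400 else "success"
    raise ValueError(f"unknown kind: {kind!r}")


def failed_rate(outcomes) -> float:
    """Percentage of outcomes that are failures."""
    outcomes = list(outcomes)
    if not outcomes:
        return 0.0
    return 100.0 * outcomes.count("failure") / len(outcomes)


print(http_outcome("transaction", 404))  # success
print(http_outcome("span", 404))         # failure
print(failed_rate(["success", "failure", "success", "success"]))  # 25.0
```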

The *Errors* table provides a high-level view of each error message, including when it first and last occurred
and the total number of occurrences. This makes it easy to see which errors affect
your services and take action to rectify them. To do so, click *View errors*.

[role="screenshot"]
image::apm/images/error-rate.png[Failed transaction rate and errors]

[discrete]
[[service-span-duration]]
=== Span types average duration and dependencies

The *Time spent by span type* chart visualizes each span type's average duration and helps you determine
which spans could be slowing down transactions. The "app" label displayed under the
chart indicates that something was happening within the application. This could signal that the APM
agent does not have auto-instrumentation for whatever was happening during that time or that the time was spent in the
application code and not in database or external requests.

// tag::dependencies[]
The *Dependencies* table displays a list of downstream services or external connections relevant
to the service at the selected time range. The table displays latency, throughput, failed transaction rate, and the impact of
each dependency. By default, dependencies are sorted by _Impact_ to show the most used and the slowest dependencies.
If there is a particular dependency you are interested in, click *<<dependencies,View dependencies>>* to learn more about it.

NOTE: Displaying dependencies for services instrumented with the Real User Monitoring (RUM) agent
requires an agent version ≥ v5.6.3.

[role="screenshot"]
image::apm/images/spans-dependencies.png[Span type duration and dependencies]
// end::dependencies[]
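
One way to read the "app" label in the *Time spent by span type* chart is as the part of a transaction's duration that no instrumented span accounts for. The sketch below illustrates that idea for a single transaction. It assumes the spans run one after another without overlapping, and the function name and input shape are hypothetical; the chart itself is built from aggregated data and may be computed differently.

```python
from collections import defaultdict


def time_spent_by_span_type(transaction_ms, spans):
    """Split a transaction's duration across span types.

    spans is an iterable of (span_type, duration_ms) pairs. Time not covered
    by any span is attributed to "app", that is, uninstrumented work or time
    spent in application code.
    """
    totals = defaultdict(float)
    for span_type, duration_ms in spans:
        totals[span_type] += duration_ms
    instrumented_ms = sum(totals.values())
    # Assumption: spans do not overlap, so the remainder is never negative
    # in practice; clamp anyway to keep the sketch robust.
    totals["app"] = max(0.0, transaction_ms - instrumented_ms)
    return dict(totals)


# A 200 ms transaction with 70 ms of database time and 30 ms of external
# calls leaves 100 ms attributed to "app".
print(time_spent_by_span_type(200, [("db", 50), ("external", 30), ("db", 20)]))
```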

[discrete]
[[service-cold-start]]
=== Cold start rate

The cold start rate chart is specific to serverless services and displays the
percentage of requests that trigger a cold start of a serverless function.
A cold start occurs when a serverless function has not been used for a certain period of time.
Analyzing the cold start rate can be useful for deciding how much memory to allocate to a function,
or when to remove a large dependency.

The cold start rate chart is currently supported for <<apm-lambda-cold-start-info,AWS Lambda>>
functions and Azure Functions.

[role="screenshot"]
image::apm/images/lambda-cold-start.png[Lambda cold start graph]
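
As a rough illustration of the metric, the cold start rate is the share of invocations flagged as cold starts, expressed as a percentage. The function name and the boolean-per-invocation input are assumptions made for this sketch.

```python
def cold_start_rate(cold_start_flags) -> float:
    """Percentage of invocations that triggered a cold start.

    cold_start_flags holds one boolean per invocation: True for a cold start.
    """
    flags = list(cold_start_flags)
    if not flags:
        return 0.0
    return 100.0 * sum(flags) / len(flags)


# One cold start in four invocations.
print(cold_start_rate([True, False, False, False]))  # 25.0
```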

[discrete]
[[service-instances]]
=== Instances

The *Instances* table displays a list of all the available service instances within the selected time range.
Depending on how the service runs, the instance could be a host or a container. The table displays latency, throughput,
failed transaction rate, CPU usage, and memory usage for each instance. By default, instances are sorted by _Throughput_.

[role="screenshot"]
image::apm/images/all-instances.png[All instances]

[discrete]
[[service-metadata]]
=== Service metadata

To view metadata relating to the service agent and, if relevant, the container and cloud provider,
click each icon located at the top of the page beside the service name.

[role="screenshot"]
image::apm/images/metadata-icons.png[Service metadata]

*Service information*

* Service version
* Runtime name and version
* Framework name
* APM agent name and version

*Container information*

* Operating system
* Containerized (yes or no)
* Total number of instances
* Orchestration

*Cloud provider information*

* Cloud provider
* Cloud service name
* Availability zones
* Machine types
* Project ID
* Region

*Serverless information*

* Function name(s)
* Event trigger type

*Alerts*

* Recently fired alerts