[APM] Document serverless-specific UI (#135178)

Co-authored-by: Alexander Wert <AlexanderWert@users.noreply.github.com>
2025-04-23 17:28:26 -04:00 · 2022-07-14 15:04:57 -06:00 · 2022-07-14 15:04:57 -06:00 · 3044cb7ba5
commit 3044cb7ba5
parent 4498161a47
8 changed files with 90 additions and 12 deletions
--- a/docs/apm/correlations.asciidoc
+++ b/docs/apm/correlations.asciidoc
@@ -21,7 +21,7 @@ NOTE: Queries within the {apm-app} are also applied to the correlations.
 ==== Find high transaction latency correlations

 The correlations on the *Latency correlations* tab help you discover which
-attributes are contributing to increased transaction latency. 
+attributes are contributing to increased transaction latency.

 [role="screenshot"]
 image::apm/images/correlations-hover.png[Latency correlations]
@@ -74,7 +74,7 @@ The table is sorted by scores, which are mapped to high, medium, or low impact
 levels. Attributes with high impact levels are more likely to contribute to
 failed transactions. By default, the attribute with the highest score is added
 to the chart. To see a different attribute in the chart, select its row in the
-table. 
+table.

 For example, in the screenshot below, there are attributes such as a specific
 node and pod name that have medium impact on the failed transactions.
@@ -86,4 +86,4 @@ Select the `+` filter to create a new query in the {apm-app} for transactions
 with one or more of these attributes. If you are unfamiliar with a field, click
 the icon beside its name to view its most popular values and optionally filter
 on those values too. Each time that you add another attribute, it is filtering
-out more and more noise and bringing you closer to a diagnosis.
+out more and more noise and bringing you closer to a diagnosis.
--- a/docs/apm/how-to-guides.asciidoc
+++ b/docs/apm/how-to-guides.asciidoc
@@ -12,6 +12,7 @@ Learn how to perform common APM app tasks.
 * <<filters>>
 * <<correlations>>
 * <<machine-learning-integration>>
+* <<apm-lambda>>
 * <<advanced-queries>>
 * <<transactions-annotations>>

@@ -30,6 +31,8 @@ include::correlations.asciidoc[]

 include::machine-learning.asciidoc[]

+include::lambda.asciidoc[]
+
 include::advanced-queries.asciidoc[]

 include::deployment-annotations.asciidoc[]
--- a/docs/apm/images/lambda-cold-start-trace.png
+++ b/docs/apm/images/lambda-cold-start-trace.png
--- a/docs/apm/images/lambda-cold-start.png
+++ b/docs/apm/images/lambda-cold-start.png
--- a/docs/apm/images/lambda-correlations.png
+++ b/docs/apm/images/lambda-correlations.png
--- a/docs/apm/lambda.asciidoc
+++ b/docs/apm/lambda.asciidoc
@@ -0,0 +1,51 @@
+[role="xpack"]
+[[apm-lambda]]
+=== Observe Lambda functions
+
+Elastic APM provides performance and error monitoring for AWS Lambda functions.
+Get insight into function execution and runtime behavior, as well as visibility into how your Lambda functions relate to and depend on other services.
+
+To set up Lambda monitoring, see the relevant
+{apm-guide-ref}/monitoring-aws-lambda.html[quick start guide].
+
+[float]
+[[apm-lambda-cold-start-info]]
+==== Cold starts
+
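+A cold start occurs when a Lambda function is invoked and no initialized execution environment is available, for example because the function has not been used for a certain period of time. The Lambda worker that receives the request must prepare a new execution environment before it can run the function.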
+
+Cold starts are an unavoidable byproduct of the serverless world, but visibility into how they impact your services can help you make better decisions about factors like how much memory to allocate to a function, whether to enable provisioned concurrency, or whether it's time to consider removing a large dependency.
+
+[float]
+[[apm-lambda-cold-start-rate]]
+===== Cold start rate
+
+The cold start rate (that is, the proportion of requests that experience a cold start) is displayed per service and per transaction.
+
+[role="screenshot"]
+image::apm/images/lambda-cold-start.png[lambda cold start graph]
+
+Cold starts are also displayed in the trace waterfall, where you can drill down into individual traces and see trace metadata like AWS request ID, trigger type, and trigger request ID.
+
+[role="screenshot"]
+image::apm/images/lambda-cold-start-trace.png[lambda cold start trace]
+
+[float]
+[[apm-lambda-cold-start-latency]]
+===== Latency distribution correlation
+
+To visualize the impact of Lambda cold starts on latency, use the <<correlations-latency,latency correlations>> feature and select the `faas.coldstart` field.
+
+[role="screenshot"]
+image::apm/images/lambda-correlations.png[lambda correlations example]
+
+[float]
+[[apm-lambda-service-config]]
+==== AWS Lambda function grouping
+
+The default APM agent configuration results in one APM service per AWS Lambda function,
+where the Lambda function name is the service name.
+
+In some use cases, it makes more sense to logically group multiple Lambda functions under a single
+APM service. You can achieve this by setting the `ELASTIC_APM_SERVICE_NAME` environment variable
+on related Lambda functions to the same value.
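
Note on the hunk above (not part of the commit): the grouping rule that `lambda.asciidoc` describes can be sketched in a few lines of Python. This is a minimal illustration of the documented behavior, not the APM agent's implementation. The function names and the `group_functions_by_service` helper are hypothetical; only the `ELASTIC_APM_SERVICE_NAME` variable and the fall-back to the function name come from the text.

```python
from collections import defaultdict


def group_functions_by_service(function_env):
    """Map each APM service name to the Lambda functions that report under it.

    function_env: dict of Lambda function name -> dict of environment variables.
    Assumption (from the docs text): when ELASTIC_APM_SERVICE_NAME is not set,
    the Lambda function name is used as the service name.
    """
    groups = defaultdict(list)
    for function_name, env in function_env.items():
        service_name = env.get("ELASTIC_APM_SERVICE_NAME") or function_name
        groups[service_name].append(function_name)
    # Sort for stable, readable output.
    return {service: sorted(names) for service, names in groups.items()}


# Hypothetical functions: two share a service name, one keeps the default.
functions = {
    "checkout-create-order": {"ELASTIC_APM_SERVICE_NAME": "checkout"},
    "checkout-refund": {"ELASTIC_APM_SERVICE_NAME": "checkout"},
    "thumbnailer": {},
}
services = group_functions_by_service(functions)
```

Here the two `checkout-*` functions would appear as one `checkout` service, while `thumbnailer` remains its own service.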
--- a/docs/apm/service-overview.asciidoc
+++ b/docs/apm/service-overview.asciidoc
@@ -8,7 +8,8 @@ high-level visibility into how a service is performing across your infrastructur

 * Service details like service version, runtime version, framework, and agent name and version
 * Container and orchestration information
-* Cloud provider, machine type, and availability zone
+* Cloud provider, machine type, service name, region, and availability zone
+* Serverless function names and event trigger type
 * Latency, throughput, and errors over time
 * Service dependencies

@@ -16,10 +17,10 @@ high-level visibility into how a service is performing across your infrastructur
 [[service-time-comparison]]
 === Time series and expected bounds comparison

-For insight into the health of your services, you can compare how a service 
-performs relative to a previous time frame or to the expected bounds from the 
-corresponding {anomaly-job}. For example, has latency been slowly increasing 
-over time, did the service experience a sudden spike, is the throughput similar 
+For insight into the health of your services, you can compare how a service
+performs relative to a previous time frame or to the expected bounds from the
+corresponding {anomaly-job}. For example, has latency been slowly increasing
+over time, did the service experience a sudden spike, is the throughput similar
 to what the {ml} job expects – enabling a comparison can provide the answer.

 [role="screenshot"]
@@ -42,8 +43,8 @@ The time-based comparison options are based on the selected time filter range:
 |An identical amount of time immediately before the selected time range
 |====

-You can use the expected bounds comparison if {ml-jobs} exist in your selected 
-environment and you have 
+You can use the expected bounds comparison if {ml-jobs} exist in your selected
+environment and you have
 {ml-docs}/setup.html#kib-visibility-spaces[access to the {ml-features}].

 [discrete]
@@ -79,7 +80,7 @@ image::apm/images/traffic-transactions.png[Traffic and transactions]

 The failed transaction rate represents the percentage of failed transactions from the perspective of the selected service.
 It's useful for visualizing unexpected increases, decreases, or irregular patterns in a service's transactions.
-+
+
 [TIP]
 ====
 HTTP **transactions** from the HTTP server perspective do not consider a `4xx` status code (client error) as a failure
@@ -119,6 +120,17 @@ requires an agent version ≥ v5.6.3.
 [role="screenshot"]
 image::apm/images/spans-dependencies.png[Span type duration and dependencies]

+[discrete]
+[[service-cold-start]]
+=== Cold start rate
+
+The cold start rate chart is specific to serverless services.
+It displays the percentage of requests that trigger a cold start of a serverless function.
+See <<apm-lambda-cold-start-info>> for more information.
+
+[role="screenshot"]
+image::apm/images/lambda-cold-start.png[lambda cold start graph]
+
 [discrete]
 [[service-instances]]
 === Instances
@@ -157,9 +169,16 @@ image::apm/images/metadata-icons.png[Service metadata]
 *Cloud provider information*

 * Cloud provider
+* Cloud service name
 * Availability zones
 * Machine types
 * Project ID
+* Region
+
+*Serverless information*
+
+* Function name(s)
+* Event trigger type

 *Alerts*

--- a/docs/apm/transactions.asciidoc
+++ b/docs/apm/transactions.asciidoc
@@ -8,7 +8,7 @@ APM agents automatically collect performance metrics on HTTP requests, database
 [role="screenshot"]
 image::apm/images/apm-transactions-overview.png[Example view of transactions table in the APM app in Kibana]

-The *Latency*, *transactions per minute*, *Failed transaction rate*, and *Average duration by span type*
+The *Latency*, *Throughput*, *Failed transaction rate*, *Average duration by span type*, and *Cold start rate*
 charts display information on all transactions associated with the selected service:

 *Latency*::
@@ -48,6 +48,10 @@ This could be a sign that the agent does not have auto-instrumentation for whate
 +
 It's important to note that if you have asynchronous spans, the sum of all span times may exceed the duration of the transaction.

+*Cold start rate*::
+This chart applies only to serverless transactions. It displays the percentage of requests that trigger a cold start of a serverless function.
+See <<apm-lambda-cold-start-info>> for more information.
+
 [discrete]
 [[transactions-table]]
 === Transactions table
@@ -149,6 +153,7 @@ Learn more about a trace sample in the *Metadata* tab:
 * Agent information
 * URL
 * User - Requires additional configuration, but allows you to see which user experienced the current transaction.
+* FaaS information, like cold start, AWS request ID, trigger type, and trigger request ID

 TIP: All of this data is stored in documents in Elasticsearch.
 This means you can select "Actions - View transaction in Discover" to see the actual Elasticsearch document under the discover tab.
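
Note on the cold start rate added in this commit (not part of the commit): the docs define it as the percentage of requests that trigger a cold start. The Python sketch below shows that arithmetic over a list of transaction documents. It assumes each serverless transaction carries a boolean under a nested `faas.coldstart` key, mirroring the `faas.coldstart` field named in `lambda.asciidoc`. The document shape and the `cold_start_rate` helper are hypothetical, and the aggregation the APM app actually runs may differ.

```python
def cold_start_rate(transactions):
    """Return the percentage of serverless transactions that were cold starts.

    transactions: iterable of dicts; serverless ones are assumed to look like
    {"faas": {"coldstart": True}}. Transactions without that field are skipped.
    Returns None when no transaction carries the field.
    """
    flags = [
        bool(t["faas"]["coldstart"])
        for t in transactions
        if "coldstart" in t.get("faas", {})
    ]
    if not flags:
        return None
    return 100.0 * sum(flags) / len(flags)


# Hypothetical sample: four serverless transactions, one of them a cold start,
# plus one non-serverless transaction that is ignored.
sample = [
    {"faas": {"coldstart": True}},
    {"faas": {"coldstart": False}},
    {"faas": {"coldstart": False}},
    {"faas": {"coldstart": False}},
    {"http": {"status_code": 200}},
]
rate = cold_start_rate(sample)  # 25.0
```

Returning `None` instead of `0.0` for services with no serverless transactions keeps "no data" distinct from "no cold starts", which matches the docs' statement that the chart applies only to serverless services.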