Previously we had mapped host IPs both as `keyword` and as `ip` in order to
work around #140266. This has been fixed with #154111, so we can rely on
the `ip` field and remove the duplicate mapping.
Relates #154111
Relates #140266
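For illustration, the resulting mapping excerpt could look roughly like the sketch below (simplified; the real mapping contains many more fields):

```typescript
// After the fix: host.ip is mapped only as an `ip` field; the duplicate
// keyword mapping has been removed.
const hostMappingExcerpt = {
  properties: {
    host: {
      properties: {
        ip: { type: 'ip' },
      },
    },
  },
};
```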
Read the ES mappings from JSON files instead of having them hard-coded as JS/TS.
We currently need the same mappings in another repository in JSON format.
Using JSON files in both places eases automated comparison to detect divergences.
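A minimal sketch of how the mappings could be loaded from a shared JSON file at runtime (the path and helper name below are illustrative, not the actual plugin code):

```typescript
import { readFileSync } from 'fs';
import { join } from 'path';

// Illustrative helper: load an index mapping from a JSON file that is shared
// with the other repository, instead of duplicating it as a TS object.
function loadMapping(name: string): Record<string, unknown> {
  const path = join(__dirname, 'mappings', `${name}.json`);
  return JSON.parse(readFileSync(path, 'utf8'));
}

// e.g. const stackFrameMappings = loadMapping('profiling-stackframes');
```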
---------
Co-authored-by: Tim Rühsen <tim.ruehsen@gmx.de>
With this commit we avoid storing doc values for
`stackframe.function.name` to save disk space as doc values are not
needed due to our access pattern. We also sort stack traces by stack
frame ids to improve disk layout.
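For illustration, the relevant excerpt of the stackframes mapping could look roughly like this (field path and `keyword` type are simplified assumptions):

```typescript
// Simplified excerpt: doc values are disabled because the field is only
// fetched by document id, never sorted or aggregated on.
const stackFrameMappingExcerpt = {
  properties: {
    'stackframe.function.name': {
      type: 'keyword',
      doc_values: false,
    },
  },
};
```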
This PR adds initial support for inline stackframes as described in
elastic/prodfiler#2918.
It also adds tests and a minor refactor to account for the removal of
synthetic source from stackframes (see elastic/prodfiler#2850).
For stackframes, the profiling stack is composed of multiple write paths
into Elasticsearch and multiple read paths out of Elasticsearch:
* there are three services that can write into Elasticsearch (`APM
agent`, `pf-elastic-collector`, and `pf-elastic-symbolizer`).
* there are also two ways to read from Elasticsearch (the profiling
plugin in Elasticsearch, and a combination of `search` and `mget`
calls).
This PR was written to handle all permutations of these paths. Reviewers
who wish to try out the PR should keep this in mind. I also wrote tests to
cover these permutations.
Note: Future PRs will add full support for inline stackframes. At this
time, we only read the first inlined stackframe since the UI does not
support inline stackframes.
---------
Co-authored-by: Tim Rühsen <tim.ruhsen@elastic.co>
## Summary
Flamegraph and TopN Functions currently don't display.
The reason is missing backwards compatibility in the handling/parsing
of stackframes.
This is a quick fix to avoid blocking other PRs.
See https://github.com/elastic/prodfiler/issues/2951
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
## Summary
We don't use the stored source type field for frames; instead, we parse it
from the query response.
In the near future, we will remove this field completely from the
`profiling-stackframes` index, because we have no reliable way to
determine the type of source code for a frame of type 'native' or
'kernel'.
For interpreted languages we have this information stored in the
`profiling-stacktraces` index.
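As a rough illustration (the type name and helper below are hypothetical, not the plugin's actual API), the underlying rule is that only interpreted frames carry reliable source information:

```typescript
// Hypothetical helper: 'native' and 'kernel' frames have no reliable source
// type, so only interpreted frames report source information (which is read
// from the profiling-stacktraces index).
type FrameKind = 'native' | 'kernel' | 'interpreted';

function hasReliableSourceType(kind: FrameKind): boolean {
  return kind === 'interpreted';
}
```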
This PR adds two endpoints to clear each respective cache:
* `DELETE {BASE_KIBANA_PATH}/api/profiling/v1/cache/executables`
* `DELETE {BASE_KIBANA_PATH}/api/profiling/v1/cache/stackframes`
Related to https://github.com/elastic/prodfiler/issues/2759
#### Design choices
1. The `DELETE` method was chosen instead of `PUT` or `POST` since the
semantics of `DELETE` match the expected behavior for the related issue.
2. Each endpoint will remove all items from the respective cache. A
separate API for each cache allows us to selectively clear the necessary
cache without the downsides of a catch-all endpoint to clear all caches.
This gives us the flexibility to add more endpoints if needed. Given the
tradeoff between complexity now and later, we decided to implement
general invalidation now, with the option to invalidate specific items
later.
3. The RESTful design allows us to clear specific items later (e.g.
`DELETE {BASE_KIBANA_PATH}/api/profiling/v1/cache/executables/{ID}`
could clear only executable `ID` from the cache).
4. Each endpoint returns an empty payload on success. However, the
Kibana logs reflect the actions taken and how many cache items were
affected.
5. The stacktrace cache was ignored since it is not affected by symbols
written to Elasticsearch.
6. The endpoints are not directly accessible from the UI since they are
expected to be called outside of Kibana. However, the endpoints can be
called manually from the browser's console.
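For reference, a minimal sketch of such a manual call (using `fetch`; the base URL is whatever your Kibana deployment uses, and the `kbn-xsrf` header is the standard Kibana requirement for non-GET API calls):

```typescript
// Illustrative only: clear one of the profiling caches via the new endpoints.
async function clearProfilingCache(
  kibanaBaseUrl: string,
  cache: 'executables' | 'stackframes'
) {
  const res = await fetch(`${kibanaBaseUrl}/api/profiling/v1/cache/${cache}`, {
    method: 'DELETE',
    // Kibana rejects non-GET API calls without this header.
    headers: { 'kbn-xsrf': 'true' },
  });
  if (!res.ok) {
    throw new Error(`Failed to clear ${cache} cache: ${res.status}`);
  }
}
```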
* Derive address and file ID from base64 encoding
This skips the intermediate deserialization step to a buffer object.
* Move run-length encoding methods
* Decode run-length directly from base64 encoding
This skips the intermediate deserialization step to a buffer object (a sketch follows this commit list).
* Minor refactor
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
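A minimal sketch of the direct run-length decoding mentioned above. The (count, value) byte-pair layout and the function name are assumptions for illustration; the real implementation may differ:

```typescript
// Sketch: decode a base64 string of (count, value) byte pairs straight into
// the expanded array, without first materializing a Buffer-like object.
function runLengthDecodeBase64(encoded: string, expectedLength: number): number[] {
  const bytes = atob(encoded); // binary string; charCodeAt(i) yields the i-th byte
  const out = new Array<number>(expectedLength);
  let pos = 0;
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    const count = bytes.charCodeAt(i);
    const value = bytes.charCodeAt(i + 1);
    for (let j = 0; j < count && pos < expectedLength; j++) {
      out[pos++] = value;
    }
  }
  return out;
}
```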
* Remove total and sampled traces from API
* Remove Samples array from flamegraph API
These values are redundant with CountInclusive, so they could be removed
without issue.
* Remove totalCount and eventsIndex
These values are no longer needed.
* Remove samples from callee tree
* Refactor columnar view model into separate file
* Add more lazy-loaded flamegraph calculations
* Fix spacing in frame label
* Remove frame information API
* Improve test coverage
* Fix type error
* Replace fnv-plus with a custom 64-bit FNV-1a hash (a sketch of the hash follows this list)
* Add exceptions for linting errors
* Add workaround for frame type truncation bug
* Replace prior workaround for truncation bug
This fix supersedes the prior workaround and addresses the truncation at
its source.
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
* Replace non-null assertion with nullish coalescing
* Remove createFrameGroup
* Remove callers
* Use adjacency list representation for tree
* Move frame type map outside function
* Inline frame group name
* Replace FrameGroupID with ID
* Create columnar view model in client
* Add instrumentation for flamegraph
I inlined the construction of the flamegraph into the respective route
so that we could add fine-grained instrumentation. We now use APM and
console logging to understand how long flamegraph construction takes.
* Remove unnecessary Set usage
* Remove superfluous clone
This was likely added when we needed to avoid infinite recursion when
serializing to JSON. It no longer serves a useful purpose.
* Pass in pre-calculated frame group info
I noticed that we were creating frame group info multiple times so I
added it as a parameter for the intermediate node.
* Sort callees in one place
Callees should be sorted first by samples in decreasing order and then by
frame group ID. Combining the two sorts makes the post-processing clearer to
future readers and maintainers (a comparator sketch follows this list).
* Capitalize fields in preparation for merging
* Align both node data structures
* Pass metadata instead of copying fields
* Refactor frame label method
* Use pre-calculated array length
* Use pre-allocated array
* Refactor intermediate node
* Remove intermediate node structure
* Move if statement out of for loop
* Fix comments
* Sort sibling nodes by frame group ID
* Calculate graph size during creation
* Add missing groupStackFrameMetadataByStackTrace
* Fix formatting
* Fix generated callee source
* Fix creation of frame group
* Fix test
* Remove filter for relevant traces
* Stop passing frame group
* Create root node inside createCallerCalleeGraph
* Fix timestamps
* Remove frame group comparator
* Add instrumentation for topN functions
* Allow for missing stacktraces
* Use Date.now instead
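For context on the fnv-plus replacement above, a standard 64-bit FNV-1a over UTF-8 bytes looks roughly like the sketch below (this is the textbook algorithm with the standard offset basis and prime, not necessarily the exact code landed in the PR):

```typescript
// Sketch of a 64-bit FNV-1a hash over a UTF-8 string, using BigInt.
const FNV_OFFSET_64 = 0xcbf29ce484222325n;
const FNV_PRIME_64 = 0x100000001b3n;
const MASK_64 = 0xffffffffffffffffn;

function fnv1a64(input: string): bigint {
  const bytes = new TextEncoder().encode(input);
  let hash = FNV_OFFSET_64;
  for (const byte of bytes) {
    hash ^= BigInt(byte);
    hash = (hash * FNV_PRIME_64) & MASK_64; // keep the product within 64 bits
  }
  return hash;
}

// e.g. fnv1a64('main').toString(16)
```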
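And for the "Sort callees in one place" commit, the combined comparator could look roughly like this (the node shape and field names are assumptions for illustration):

```typescript
// Hypothetical node shape: a callee with a sample count and a frame group id.
interface CalleeNode {
  Samples: number;
  FrameGroupID: string;
}

// Combined comparator: decreasing sample count first, frame group id as tie-breaker.
function compareCallees(a: CalleeNode, b: CalleeNode): number {
  return b.Samples - a.Samples || a.FrameGroupID.localeCompare(b.FrameGroupID);
}

// usage: callees.sort(compareCallees);
```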