Mirror of https://github.com/elastic/kibana.git, synced 2025-04-23 09:19:04 -04:00.
# Backport

This will backport the following commits from `main` to `8.6`:

- [Fleet Usage telemetry extension (#145353)](https://github.com/elastic/kibana/pull/145353)

### Questions?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport).

---

**Fleet Usage telemetry extension (#145353)**

## Summary

Closes https://github.com/elastic/ingest-dev/issues/1261

Added a telemetry snippet for each requirement below. Please review and let me know if any changes are needed; I also asked a few questions below. @jlind23 @kpollich

Item 6 is blocked by an [elasticsearch change](https://github.com/elastic/elasticsearch/pull/91701) to give kibana_system the missing privilege to read `logs-elastic_agent*` indices.

Took inspiration for task versioning from https://github.com/elastic/kibana/pull/144494/files#diff-0c7c49bf5c55c45c19e9c42d5428e99e52c3a39dd6703633f427724d36108186

- [x] 1. Elastic Agent versions
Versions of all the Elastic Agents running: the `agent.version` field on `.fleet-agents` documents.

```
"agent_versions": [
    "8.6.0"
  ],
```

- [x] 2. Fleet Server configuration
I think we can query `.fleet-policies` where some `input` has `type: 'fleet-server'` for this, as well as use the `Fleet Server Hosts` settings that we define via saved objects in Fleet.

```
"fleet_server_config": {
    "policies": [
      {
        "input_config": {
          "server": {
            "limits.max_agents": 10000
          },
          "server.runtime": "gc_percent:20"
        }
      }
    ]
  }
```

- [x] 3. Number of policies
Count of the `.fleet-policies` index.

To confirm, did we mean agent policies here?

```
"agent_policies": {
    "count": 7,
```

- [x] 4. Output types contained in those policies
Collecting this from TypeScript logic, querying the `.fleet-policies` index. The alternative would be to write a Painless script (because the `outputs` are an object with dynamic keys, we can't do an aggregation directly).

```
"agent_policies": {
    "output_types": [
      "elasticsearch"
    ]
  }
```

Did we mean to collect just the types here, or other info as well, e.g. output URLs?

- [x] 5. Average number of checkin failures
We only have the most recent checkin status and timestamp on `.fleet-agents`.

Do we mean here to publish the total last-checkin failure count, e.g. 3 if 3 agents are currently in a failed checkin status? Or do we mean to publish specific info for all agents (`last_checkin_status`, `last_checkin` time, `last_checkin_message`)? Are `error` and `degraded` the only statuses we want to send?

```
"agent_last_checkin_status": {
    "error": 0,
    "degraded": 0
  },
```

- [ ] 6. Top 3 most common errors in the Elastic Agent logs

Do we mean elastic-agent logs only, or fleet-server logs as well (maybe separately)?

I found an alternative way to query the message field, using a sampler and a categorize text aggregation (a client-side sketch follows this description):

```
GET logs-elastic_agent*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "log.level": "error"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-1h"
            }
          }
        }
      ]
    }
  },
  "aggregations": {
    "message_sample": {
      "sampler": {
        "shard_size": 200
      },
      "aggs": {
        "categories": {
          "categorize_text": {
            "field": "message",
            "size": 10
          }
        }
      }
    }
  }
}
```

Example response:

```
"aggregations": {
  "message_sample": {
    "doc_count": 112,
    "categories": {
      "buckets": [
        {
          "doc_count": 73,
          "key": "failed to unenroll offline agents",
          "regex": ".*?failed.+?to.+?unenroll.+?offline.+?agents.*?",
          "max_matching_length": 36
        },
        {
          "doc_count": 7,
          "key": """stderr panic close of closed channel \n\ngoroutine running Stop \ngithub.com/elastic/beats/v7/libbeat/cmd/instance Beat launch.func5 \n\t/go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go \n
```

- [x] 7. Number of checkin failures over the past period of time

I think this is almost the same as item 5. The difference would be to report only new failures that happened in the last hour, or to report all agents in a failure state (which would be an increasing number if an agent stays in the failed state). Do we want these as 2 separate telemetry fields?

EDIT: removed the last-1hr query; instead added a new field to report agents enrolled per policy (top 10). See comments below.

```
"agent_checkin_status": {
    "error": 3,
    "degraded": 0
  },
  "agents_per_policy": [2, 1000],
```

- [x] 8. Number of Elastic Agents and number of Fleet Servers

This is already there in the existing telemetry:

```
"agents": {
    "total_enrolled": 0,
    "healthy": 0,
    "unhealthy": 0,
    "offline": 0,
    "total_all_statuses": 1,
    "updating": 0
  },
  "fleet_server": {
    "total_enrolled": 0,
    "healthy": 0,
    "unhealthy": 0,
    "offline": 0,
    "updating": 0,
    "total_all_statuses": 0,
    "num_host_urls": 1
  },
```

### Checklist

- [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
This commit is contained in:
parent b6907b84b0
commit 7b99f4c6cd

10 changed files with 783 additions and 204 deletions
x-pack/plugins/fleet/server/collectors/agent_collectors.ts

```diff
@@ -7,8 +7,9 @@
 import type { SavedObjectsClient, ElasticsearchClient } from '@kbn/core/server';
 
-import type { FleetConfigType } from '../../common/types';
+import { AGENTS_INDEX } from '../../common';
 import * as AgentService from '../services/agents';
+import { appContextService } from '../services';
 
 export interface AgentUsage {
   total_enrolled: number;
@@ -20,7 +21,6 @@ export interface AgentUsage {
 }
 
 export const getAgentUsage = async (
-  config: FleetConfigType,
   soClient?: SavedObjectsClient,
   esClient?: ElasticsearchClient
 ): Promise<AgentUsage> => {
@@ -47,3 +47,84 @@ export const getAgentUsage = async (
     updating,
   };
 };
+
+export interface AgentData {
+  agent_versions: string[];
+  agent_checkin_status: {
+    error: number;
+    degraded: number;
+  };
+  agents_per_policy: number[];
+}
+
+const DEFAULT_AGENT_DATA = {
+  agent_versions: [],
+  agent_checkin_status: { error: 0, degraded: 0 },
+  agents_per_policy: [],
+};
+
+export const getAgentData = async (
+  esClient: ElasticsearchClient,
+  abortController: AbortController
+): Promise<AgentData> => {
+  try {
+    const transformLastCheckinStatusBuckets = (resp: any) =>
+      ((resp?.aggregations?.last_checkin_status as any).buckets ?? []).reduce(
+        (acc: any, bucket: any) => {
+          if (acc[bucket.key] !== undefined) acc[bucket.key] = bucket.doc_count;
+          return acc;
+        },
+        { error: 0, degraded: 0 }
+      );
+    const response = await esClient.search(
+      {
+        index: AGENTS_INDEX,
+        query: {
+          bool: {
+            filter: [
+              {
+                term: {
+                  active: 'true',
+                },
+              },
+            ],
+          },
+        },
+        size: 0,
+        aggs: {
+          versions: {
+            terms: { field: 'agent.version' },
+          },
+          last_checkin_status: {
+            terms: { field: 'last_checkin_status' },
+          },
+          policies: {
+            terms: { field: 'policy_id' },
+          },
+        },
+      },
+      { signal: abortController.signal }
+    );
+    const versions = ((response?.aggregations?.versions as any).buckets ?? []).map(
+      (bucket: any) => bucket.key
+    );
+    const statuses = transformLastCheckinStatusBuckets(response);
+
+    const agentsPerPolicy = ((response?.aggregations?.policies as any).buckets ?? []).map(
+      (bucket: any) => bucket.doc_count
+    );
+
+    return {
+      agent_versions: versions,
+      agent_checkin_status: statuses,
+      agents_per_policy: agentsPerPolicy,
+    };
+  } catch (error) {
+    if (error.statusCode === 404) {
+      appContextService.getLogger().debug('Index .fleet-agents does not exist yet.');
+    } else {
+      throw error;
+    }
+    return DEFAULT_AGENT_DATA;
+  }
+};
```
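For context on the `transformLastCheckinStatusBuckets` helper above: a `terms` aggregation on `last_checkin_status` returns one bucket per status value, and the reducer keeps only keys already present in the `{ error: 0, degraded: 0 }` seed. A minimal sketch of that folding step, using a hypothetical response fragment:

```ts
// Hypothetical terms-aggregation response fragment, shaped like the one
// getAgentData receives from .fleet-agents.
const resp = {
  aggregations: {
    last_checkin_status: {
      buckets: [
        { key: 'error', doc_count: 3 },
        { key: 'degraded', doc_count: 1 },
        { key: 'online', doc_count: 40 }, // not in the seed, so ignored
      ],
    },
  },
};

const statuses = resp.aggregations.last_checkin_status.buckets.reduce(
  (acc: Record<string, number>, bucket) => {
    // Only statuses present in the seed survive; 'online' is dropped.
    if (acc[bucket.key] !== undefined) acc[bucket.key] = bucket.doc_count;
    return acc;
  },
  { error: 0, degraded: 0 }
);

console.log(statuses); // { error: 3, degraded: 1 }
```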
x-pack/plugins/fleet/server/collectors/agent_policies.ts (new file, 61 lines)
```ts
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

import type { ElasticsearchClient } from '@kbn/core/server';

import { AGENT_POLICY_INDEX } from '../../common';
import { ES_SEARCH_LIMIT } from '../../common/constants';
import { appContextService } from '../services';

export interface AgentPoliciesUsage {
  count: number;
  output_types: string[];
}

const DEFAULT_AGENT_POLICIES_USAGE = {
  count: 0,
  output_types: [],
};

export const getAgentPoliciesUsage = async (
  esClient: ElasticsearchClient,
  abortController: AbortController
): Promise<AgentPoliciesUsage> => {
  try {
    const res = await esClient.search(
      {
        index: AGENT_POLICY_INDEX,
        size: ES_SEARCH_LIMIT,
        track_total_hits: true,
        rest_total_hits_as_int: true,
      },
      { signal: abortController.signal }
    );

    const agentPolicies = res.hits.hits;

    const outputTypes = new Set<string>();
    agentPolicies.forEach((item) => {
      const source = (item._source as any) ?? {};
      Object.keys(source.data.outputs).forEach((output) => {
        outputTypes.add(source.data.outputs[output].type);
      });
    });

    return {
      count: res.hits.total as number,
      output_types: Array.from(outputTypes),
    };
  } catch (error) {
    if (error.statusCode === 404) {
      appContextService.getLogger().debug('Index .fleet-policies does not exist yet.');
    } else {
      throw error;
    }
    return DEFAULT_AGENT_POLICIES_USAGE;
  }
};
```
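The output-type collection above is the "TypeScript logic instead of a Painless script" mentioned in item 4 of the summary: because `outputs` has dynamic keys, the code walks each policy document and collects the `type` values into a set. A standalone sketch with hypothetical hits shaped like `.fleet-policies` documents (compare the fixture in the integration test further down):

```ts
// Hypothetical .fleet-policies hits for illustration only.
const hits = [
  { _source: { data: { outputs: { default: { type: 'elasticsearch' } } } } },
  {
    _source: {
      data: {
        outputs: {
          default: { type: 'elasticsearch' },
          monitoring: { type: 'logstash' },
        },
      },
    },
  },
];

const outputTypes = new Set<string>();
for (const hit of hits) {
  const source = (hit._source as any) ?? {};
  // Keys of `outputs` are dynamic ('default', 'monitoring', ...), so we
  // iterate instead of aggregating in Elasticsearch.
  for (const name of Object.keys(source.data.outputs)) {
    outputTypes.add(source.data.outputs[name].type);
  }
}

console.log(Array.from(outputTypes)); // ['elasticsearch', 'logstash']
```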
x-pack/plugins/fleet/server/collectors/fleet_server_collector.ts

```diff
@@ -7,6 +7,8 @@
 import type { SavedObjectsClient, ElasticsearchClient } from '@kbn/core/server';
 
+import { PACKAGE_POLICY_SAVED_OBJECT_TYPE, SO_SEARCH_LIMIT } from '../constants';
+
 import { packagePolicyService } from '../services';
 import { getAgentStatusForAgentPolicy } from '../services/agents';
 import { listFleetServerHosts } from '../services/fleet_server_host';
@@ -84,3 +86,47 @@ export const getFleetServerUsage = async (
     num_host_urls: numHostsUrls,
   };
 };
+
+export const getFleetServerConfig = async (soClient: SavedObjectsClient): Promise<any> => {
+  const res = await packagePolicyService.list(soClient, {
+    page: 1,
+    perPage: SO_SEARCH_LIMIT,
+    kuery: `${PACKAGE_POLICY_SAVED_OBJECT_TYPE}.package.name:fleet_server`,
+  });
+  const getInputConfig = (item: any) => {
+    const config = (item.inputs[0] ?? {}).compiled_input;
+    if (config?.server) {
+      // whitelist only server limits, timeouts and runtime, sometimes fields are coming in "server.limits" format instead of nested object
+      const newConfig = Object.keys(config)
+        .filter((key) => key.startsWith('server'))
+        .reduce((acc: any, curr: string) => {
+          if (curr === 'server') {
+            acc.server = {};
+            Object.keys(config.server)
+              .filter(
+                (key) =>
+                  key.startsWith('limits') ||
+                  key.startsWith('timeouts') ||
+                  key.startsWith('runtime')
+              )
+              .forEach((serverKey: string) => {
+                acc.server[serverKey] = config.server[serverKey];
+                return acc;
+              });
+          } else {
+            acc[curr] = config[curr];
+          }
+          return acc;
+        }, {});
+
+      return newConfig;
+    } else {
+      return {};
+    }
+  };
+  const policies = res.items.map((item) => ({
+    input_config: getInputConfig(item),
+  }));
+
+  return { policies };
+};
```
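To make the whitelisting in `getInputConfig` concrete: only keys starting with `server` survive at the top level, and within the nested `server` object only `limits`/`timeouts`/`runtime` fields are kept. A hypothetical standalone re-implementation of that filter, applied to the fixture used by the integration test further down:

```ts
// Hypothetical compiled_input mirroring the integration test fixture.
const compiledInput: Record<string, any> = {
  server: { port: 8220, host: '0.0.0.0', 'limits.max_agents': 3000, other: 'other' },
  'server.runtime': 'gc_percent:20',
  ssl: 'ssl',
};

const filtered = Object.keys(compiledInput)
  .filter((key) => key.startsWith('server'))
  .reduce((acc: Record<string, any>, curr) => {
    if (curr === 'server') {
      acc.server = {};
      Object.keys(compiledInput.server)
        .filter(
          (k) => k.startsWith('limits') || k.startsWith('timeouts') || k.startsWith('runtime')
        )
        .forEach((k) => (acc.server[k] = compiledInput.server[k]));
    } else {
      acc[curr] = compiledInput[curr];
    }
    return acc;
  }, {});

console.log(filtered);
// { server: { 'limits.max_agents': 3000 }, 'server.runtime': 'gc_percent:20' }
// 'port', 'host', 'other' are dropped (not limits/timeouts/runtime);
// 'ssl' is dropped (does not start with 'server').
```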
x-pack/plugins/fleet/server/collectors/register.ts

```diff
@@ -11,13 +11,14 @@ import type { CoreSetup } from '@kbn/core/server';
 import type { FleetConfigType } from '..';
 
 import { getIsAgentsEnabled } from './config_collectors';
-import { getAgentUsage } from './agent_collectors';
+import { getAgentUsage, getAgentData } from './agent_collectors';
 import type { AgentUsage } from './agent_collectors';
 import { getInternalClients } from './helpers';
 import { getPackageUsage } from './package_collectors';
 import type { PackageUsage } from './package_collectors';
-import { getFleetServerUsage } from './fleet_server_collector';
+import { getFleetServerUsage, getFleetServerConfig } from './fleet_server_collector';
 import type { FleetServerUsage } from './fleet_server_collector';
+import { getAgentPoliciesUsage } from './agent_policies';
 
 export interface Usage {
   agents_enabled: boolean;
@@ -26,11 +27,33 @@ export interface Usage {
   fleet_server: FleetServerUsage;
 }
 
-export const fetchUsage = async (core: CoreSetup, config: FleetConfigType) => {
+export const fetchFleetUsage = async (
+  core: CoreSetup,
+  config: FleetConfigType,
+  abortController: AbortController
+) => {
+  const [soClient, esClient] = await getInternalClients(core);
+  if (!soClient || !esClient) {
+    return;
+  }
+  const usage = {
+    agents_enabled: getIsAgentsEnabled(config),
+    agents: await getAgentUsage(soClient, esClient),
+    fleet_server: await getFleetServerUsage(soClient, esClient),
+    packages: await getPackageUsage(soClient),
+    ...(await getAgentData(esClient, abortController)),
+    fleet_server_config: await getFleetServerConfig(soClient),
+    agent_policies: await getAgentPoliciesUsage(esClient, abortController),
+  };
+  return usage;
+};
+
+// used by kibana daily collector
+const fetchUsage = async (core: CoreSetup, config: FleetConfigType) => {
   const [soClient, esClient] = await getInternalClients(core);
   const usage = {
     agents_enabled: getIsAgentsEnabled(config),
-    agents: await getAgentUsage(config, soClient, esClient),
+    agents: await getAgentUsage(soClient, esClient),
     fleet_server: await getFleetServerUsage(soClient, esClient),
     packages: await getPackageUsage(soClient),
   };
@@ -41,7 +64,7 @@ export const fetchAgentsUsage = async (core: CoreSetup, config: FleetConfigType)
   const [soClient, esClient] = await getInternalClients(core);
   const usage = {
     agents_enabled: getIsAgentsEnabled(config),
-    agents: await getAgentUsage(config, soClient, esClient),
+    agents: await getAgentUsage(soClient, esClient),
     fleet_server: await getFleetServerUsage(soClient, esClient),
   };
   return usage;
```
x-pack/plugins/fleet/server/integration_tests/fleet_usage_telemetry.test.ts (new file, 256 lines)

```ts
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

import path from 'path';

import * as kbnTestServer from '@kbn/core/test_helpers/kbn_server';

import { fetchFleetUsage } from '../collectors/register';

import { waitForFleetSetup } from './helpers';

const logFilePath = path.join(__dirname, 'logs.log');

describe('fleet usage telemetry', () => {
  let core: any;
  let esServer: kbnTestServer.TestElasticsearchUtils;
  let kbnServer: kbnTestServer.TestKibanaUtils;
  const registryUrl = 'http://localhost';

  const startServers = async () => {
    const { startES } = kbnTestServer.createTestServers({
      adjustTimeout: (t) => jest.setTimeout(t),
      settings: {
        es: {
          license: 'trial',
        },
        kbn: {},
      },
    });

    esServer = await startES();
    const startKibana = async () => {
      const root = kbnTestServer.createRootWithCorePlugins(
        {
          xpack: {
            fleet: {
              registryUrl,
              agentPolicies: [
                {
                  name: 'Second preconfigured policy',
                  description: 'second policy',
                  is_default: false,
                  is_managed: true,
                  id: 'test-456789',
                  namespace: 'default',
                  monitoring_enabled: [],
                  package_policies: [],
                },
              ],
            },
          },
          logging: {
            appenders: {
              file: {
                type: 'file',
                fileName: logFilePath,
                layout: {
                  type: 'json',
                },
              },
            },
            loggers: [
              {
                name: 'root',
                appenders: ['file'],
              },
              {
                name: 'plugins.fleet',
                level: 'info',
              },
            ],
          },
        },
        { oss: false }
      );

      await root.preboot();
      const coreSetup = await root.setup();
      const coreStart = await root.start();

      return {
        root,
        coreSetup,
        coreStart,
        stop: async () => await root.shutdown(),
      };
    };
    kbnServer = await startKibana();
    await waitForFleetSetup(kbnServer.root);
  };

  const stopServers = async () => {
    if (kbnServer) {
      await kbnServer.stop();
    }

    if (esServer) {
      await esServer.stop();
    }

    await new Promise((res) => setTimeout(res, 10000));
  };

  beforeAll(async () => {
    await startServers();

    const esClient = kbnServer.coreStart.elasticsearch.client.asInternalUser;
    await esClient.bulk({
      index: '.fleet-agents',
      body: [
        {
          create: {
            _id: 'agent1',
          },
        },
        {
          agent: {
            version: '8.6.0',
          },
          last_checkin_status: 'error',
          last_checkin: '2022-11-21T12:26:24Z',
          active: true,
          policy_id: 'policy1',
        },
        {
          create: {
            _id: 'agent2',
          },
        },
        {
          agent: {
            version: '8.5.1',
          },
          last_checkin_status: 'degraded',
          last_checkin: '2022-11-21T12:27:24Z',
          active: true,
          policy_id: 'policy1',
        },
        {
          create: {
            _id: 'inactive',
          },
        },
        {
          agent: {
            version: '8.5.1',
          },
          last_checkin_status: 'online',
          last_checkin: '2021-11-21T12:27:24Z',
          active: false,
          policy_id: 'policy1',
        },
      ],
      refresh: 'wait_for',
    });

    await esClient.create({
      index: '.fleet-policies',
      id: 'policy1',
      body: {
        data: {
          id: 'fleet-server-policy',
          outputs: {
            default: {
              type: 'elasticsearch',
            },
          },
        },
      },
      refresh: 'wait_for',
    });

    const soClient = kbnServer.coreStart.savedObjects.createInternalRepository();
    await soClient.create('ingest-package-policies', {
      name: 'fleet_server-1',
      namespace: 'default',
      package: {
        name: 'fleet_server',
        title: 'Fleet Server',
        version: '1.2.0',
      },
      enabled: true,
      policy_id: 'fleet-server-policy',
      inputs: [
        {
          compiled_input: {
            server: {
              port: 8220,
              host: '0.0.0.0',
              'limits.max_agents': 3000,
              other: 'other',
            },
            'server.runtime': 'gc_percent:20',
            ssl: 'ssl',
          },
        },
      ],
    });
  });

  afterAll(async () => {
    await stopServers();
  });

  beforeEach(() => {
    core = { getStartServices: jest.fn().mockResolvedValue([kbnServer.coreStart]) };
  });

  it('should fetch usage telemetry', async () => {
    const usage = await fetchFleetUsage(core, { agents: { enabled: true } }, new AbortController());

    expect(usage).toEqual(
      expect.objectContaining({
        agents_enabled: true,
        agents: {
          total_enrolled: 2,
          healthy: 0,
          unhealthy: 0,
          offline: 2,
          total_all_statuses: 3,
          updating: 0,
        },
        fleet_server: {
          total_all_statuses: 0,
          total_enrolled: 0,
          healthy: 0,
          unhealthy: 0,
          offline: 0,
          updating: 0,
          num_host_urls: 0,
        },
        packages: [],
        agent_versions: ['8.5.1', '8.6.0'],
        agent_checkin_status: { error: 1, degraded: 1 },
        agents_per_policy: [2],
        fleet_server_config: {
          policies: [
            {
              input_config: {
                server: {
                  'limits.max_agents': 3000,
                },
                'server.runtime': 'gc_percent:20',
              },
            },
          ],
        },
        agent_policies: { count: 3, output_types: ['elasticsearch'] },
      })
    );
  });
});
```
x-pack/plugins/fleet/server/plugin.ts

```diff
@@ -101,7 +101,11 @@ import {
   AgentServiceImpl,
   PackageServiceImpl,
 } from './services';
-import { registerFleetUsageCollector, fetchUsage, fetchAgentsUsage } from './collectors/register';
+import {
+  registerFleetUsageCollector,
+  fetchAgentsUsage,
+  fetchFleetUsage,
+} from './collectors/register';
 import { getAuthzFromRequest, makeRouterWithFleetAuthz } from './routes/security';
 import { FleetArtifactsClient } from './services/artifacts';
 import type { FleetRouter } from './types/request_context';
@@ -383,14 +387,9 @@ export class FleetPlugin
 
     // Register usage collection
     registerFleetUsageCollector(core, config, deps.usageCollection);
-    const fetch = async () => fetchUsage(core, config);
-    this.fleetUsageSender = new FleetUsageSender(
-      deps.taskManager,
-      core,
-      fetch,
-      this.kibanaVersion,
-      this.isProductionMode
-    );
+    const fetch = async (abortController: AbortController) =>
+      await fetchFleetUsage(core, config, abortController);
+    this.fleetUsageSender = new FleetUsageSender(deps.taskManager, core, fetch);
     registerFleetUsageLogger(deps.taskManager, async () => fetchAgentsUsage(core, config));
 
     const router: FleetRouter = core.http.createRouter<FleetRequestHandlerContext>();
```
x-pack/plugins/fleet/server/services/fleet_usage_sender.ts (deleted, 187 lines)

```ts
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */
import type {
  ConcreteTaskInstance,
  TaskManagerStartContract,
  TaskManagerSetupContract,
} from '@kbn/task-manager-plugin/server';
import type { CoreSetup } from '@kbn/core/server';

import type { Usage } from '../collectors/register';

import { appContextService } from './app_context';

const EVENT_TYPE = 'fleet_usage';

export class FleetUsageSender {
  private taskManager?: TaskManagerStartContract;
  private taskId = 'Fleet-Usage-Sender-Task';
  private taskType = 'Fleet-Usage-Sender';

  constructor(
    taskManager: TaskManagerSetupContract,
    core: CoreSetup,
    fetchUsage: () => Promise<Usage>,
    kibanaVersion: string,
    isProductionMode: boolean
  ) {
    taskManager.registerTaskDefinitions({
      [this.taskType]: {
        title: 'Fleet Usage Sender',
        timeout: '1m',
        maxAttempts: 1,
        createTaskRunner: ({ taskInstance }: { taskInstance: ConcreteTaskInstance }) => {
          return {
            async run() {
              appContextService.getLogger().info('Running Fleet Usage telemetry send task');

              try {
                const usageData = await fetchUsage();
                appContextService.getLogger().debug(JSON.stringify(usageData));
                core.analytics.reportEvent(EVENT_TYPE, usageData);
              } catch (error) {
                appContextService
                  .getLogger()
                  .error('Error occurred while sending Fleet Usage telemetry: ' + error);
              }
            },

            async cancel() {},
          };
        },
      },
    });
    this.registerTelemetryEventType(core);
  }

  public async start(taskManager: TaskManagerStartContract) {
    this.taskManager = taskManager;

    appContextService.getLogger().info(`Task ${this.taskId} scheduled with interval 1h`);
    await this.taskManager?.ensureScheduled({
      id: this.taskId,
      taskType: this.taskType,
      schedule: {
        interval: '1h',
      },
      scope: ['fleet'],
      state: {},
      params: {},
    });
  }

  /**
   * took schema from [here](https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/server/collectors/register.ts#L53) and adapted to EBT format
   */
  private registerTelemetryEventType(core: CoreSetup): void {
    core.analytics.registerEventType({
      eventType: EVENT_TYPE,
      schema: {
        agents_enabled: { type: 'boolean', _meta: { description: 'agents enabled' } },
        agents: {
          properties: {
            total_enrolled: {
              type: 'long',
              _meta: {
                description: 'The total number of enrolled agents, in any state',
              },
            },
            healthy: {
              type: 'long',
              _meta: {
                description: 'The total number of enrolled agents in a healthy state',
              },
            },
            unhealthy: {
              type: 'long',
              _meta: {
                description: 'The total number of enrolled agents in an unhealthy state',
              },
            },
            updating: {
              type: 'long',
              _meta: {
                description: 'The total number of enrolled agents in an updating state',
              },
            },
            offline: {
              type: 'long',
              _meta: {
                description: 'The total number of enrolled agents currently offline',
              },
            },
            total_all_statuses: {
              type: 'long',
              _meta: {
                description: 'The total number of agents in any state, both enrolled and inactive',
              },
            },
          },
        },
        fleet_server: {
          properties: {
            total_enrolled: {
              type: 'long',
              _meta: {
                description: 'The total number of enrolled Fleet Server agents, in any state',
              },
            },
            total_all_statuses: {
              type: 'long',
              _meta: {
                description:
                  'The total number of Fleet Server agents in any state, both enrolled and inactive.',
              },
            },
            healthy: {
              type: 'long',
              _meta: {
                description: 'The total number of enrolled Fleet Server agents in a healthy state.',
              },
            },
            unhealthy: {
              type: 'long',
              _meta: {
                description:
                  'The total number of enrolled Fleet Server agents in an unhealthy state',
              },
            },
            updating: {
              type: 'long',
              _meta: {
                description:
                  'The total number of enrolled Fleet Server agents in an updating state',
              },
            },
            offline: {
              type: 'long',
              _meta: {
                description: 'The total number of enrolled Fleet Server agents currently offline',
              },
            },
            num_host_urls: {
              type: 'long',
              _meta: {
                description: 'The number of Fleet Server hosts configured in Fleet settings.',
              },
            },
          },
        },
        packages: {
          type: 'array',
          items: {
            properties: {
              name: { type: 'keyword' },
              version: { type: 'keyword' },
              enabled: { type: 'boolean' },
            },
          },
        },
      },
    });
  }
}
```
x-pack/plugins/fleet/server/services/index.ts

```diff
@@ -63,4 +63,4 @@ export type { PackageService, PackageClient } from './epm';
 // Fleet server policy config
 export { migrateSettingsToFleetServerHost } from './fleet_server_host';
 
-export { FleetUsageSender } from './fleet_usage_sender';
+export { FleetUsageSender } from './telemetry/fleet_usage_sender';
```
x-pack/plugins/fleet/server/services/telemetry/fleet_usage_sender.ts (new file, 132 lines)

```ts
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */
import type {
  ConcreteTaskInstance,
  TaskManagerStartContract,
  TaskManagerSetupContract,
} from '@kbn/task-manager-plugin/server';
import { throwUnrecoverableError } from '@kbn/task-manager-plugin/server';
import type { CoreSetup } from '@kbn/core/server';
import { withSpan } from '@kbn/apm-utils';

import type { Usage } from '../../collectors/register';

import { appContextService } from '../app_context';

import { fleetUsagesSchema } from './fleet_usages_schema';

const EVENT_TYPE = 'fleet_usage';

export class FleetUsageSender {
  private taskManager?: TaskManagerStartContract;
  private taskVersion = '1.0.0';
  private taskType = 'Fleet-Usage-Sender';
  private wasStarted: boolean = false;
  private interval = '1h';
  private timeout = '1m';
  private abortController = new AbortController();

  constructor(
    taskManager: TaskManagerSetupContract,
    core: CoreSetup,
    fetchUsage: (abortController: AbortController) => Promise<Usage | undefined>
  ) {
    taskManager.registerTaskDefinitions({
      [this.taskType]: {
        title: 'Fleet Usage Sender',
        timeout: this.timeout,
        maxAttempts: 1,
        createTaskRunner: ({ taskInstance }: { taskInstance: ConcreteTaskInstance }) => {
          return {
            run: async () => {
              return withSpan({ name: this.taskType, type: 'telemetry' }, () =>
                this.runTask(taskInstance, core, () => fetchUsage(this.abortController))
              );
            },

            cancel: async () => {
              this.abortController.abort('task timed out');
            },
          };
        },
      },
    });
    this.registerTelemetryEventType(core);
  }

  private runTask = async (
    taskInstance: ConcreteTaskInstance,
    core: CoreSetup,
    fetchUsage: () => Promise<Usage | undefined>
  ) => {
    if (!this.wasStarted) {
      appContextService.getLogger().debug('[runTask()] Aborted. Task not started yet');
      return;
    }
    // Check that this task is current
    if (taskInstance.id !== this.taskId) {
      throwUnrecoverableError(new Error('Outdated task version for task: ' + taskInstance.id));
      return;
    }
    appContextService.getLogger().info('Running Fleet Usage telemetry send task');

    try {
      const usageData = await fetchUsage();
      if (!usageData) {
        return;
      }
      appContextService.getLogger().debug(JSON.stringify(usageData));
      core.analytics.reportEvent(EVENT_TYPE, usageData);
    } catch (error) {
      appContextService
        .getLogger()
        .error('Error occurred while sending Fleet Usage telemetry: ' + error);
    }
  };

  private get taskId() {
    return `${this.taskType}-${this.taskVersion}`;
  }

  public async start(taskManager: TaskManagerStartContract) {
    this.taskManager = taskManager;

    if (!taskManager) {
      appContextService.getLogger().error('missing required service during start');
      return;
    }

    this.wasStarted = true;

    try {
      appContextService.getLogger().info(`Task ${this.taskId} scheduled with interval 1h`);

      await this.taskManager.ensureScheduled({
        id: this.taskId,
        taskType: this.taskType,
        schedule: {
          interval: this.interval,
        },
        scope: ['fleet'],
        state: {},
        params: {},
      });
    } catch (e) {
      appContextService.getLogger().error(`Error scheduling task, received error: ${e}`);
    }
  }

  /**
   * took schema from [here](https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/server/collectors/register.ts#L53) and adapted to EBT format
   */
  private registerTelemetryEventType(core: CoreSetup): void {
    core.analytics.registerEventType({
      eventType: EVENT_TYPE,
      schema: fleetUsagesSchema,
    });
  }
}
```
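This is the task-versioning pattern the summary says was borrowed from #144494: the version is encoded in the task id, so after an upgrade the previously persisted task instance no longer matches `this.taskId`, `runTask` drops it via `throwUnrecoverableError`, and `ensureScheduled` registers the task under the new id. A minimal sketch of just the id check, assuming a hypothetical bump to `1.1.0`:

```ts
// Hypothetical illustration of the version-in-id check used by runTask above.
const taskType = 'Fleet-Usage-Sender';
const currentVersion = '1.1.0'; // assume the version was bumped in a release
const currentTaskId = `${taskType}-${currentVersion}`;

// A task instance persisted by an older Kibana still carries the old id:
const persistedTaskId = `${taskType}-1.0.0`;

if (persistedTaskId !== currentTaskId) {
  // The real runner calls throwUnrecoverableError(new Error(...)) here,
  // telling Task Manager to discard the stale task instead of retrying it.
  console.log(`Outdated task version for task: ${persistedTaskId}`);
}
```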
x-pack/plugins/fleet/server/services/telemetry/fleet_usages_schema.ts (new file, 168 lines)

```ts
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

import type { RootSchema } from '@kbn/analytics-client';

export const fleetUsagesSchema: RootSchema<any> = {
  agents_enabled: { type: 'boolean', _meta: { description: 'agents enabled' } },
  agents: {
    properties: {
      total_enrolled: {
        type: 'long',
        _meta: {
          description: 'The total number of enrolled agents, in any state',
        },
      },
      healthy: {
        type: 'long',
        _meta: {
          description: 'The total number of enrolled agents in a healthy state',
        },
      },
      unhealthy: {
        type: 'long',
        _meta: {
          description: 'The total number of enrolled agents in an unhealthy state',
        },
      },
      updating: {
        type: 'long',
        _meta: {
          description: 'The total number of enrolled agents in an updating state',
        },
      },
      offline: {
        type: 'long',
        _meta: {
          description: 'The total number of enrolled agents currently offline',
        },
      },
      total_all_statuses: {
        type: 'long',
        _meta: {
          description: 'The total number of agents in any state, both enrolled and inactive',
        },
      },
    },
  },
  fleet_server: {
    properties: {
      total_enrolled: {
        type: 'long',
        _meta: {
          description: 'The total number of enrolled Fleet Server agents, in any state',
        },
      },
      total_all_statuses: {
        type: 'long',
        _meta: {
          description:
            'The total number of Fleet Server agents in any state, both enrolled and inactive.',
        },
      },
      healthy: {
        type: 'long',
        _meta: {
          description: 'The total number of enrolled Fleet Server agents in a healthy state.',
        },
      },
      unhealthy: {
        type: 'long',
        _meta: {
          description: 'The total number of enrolled Fleet Server agents in an unhealthy state',
        },
      },
      updating: {
        type: 'long',
        _meta: {
          description: 'The total number of enrolled Fleet Server agents in an updating state',
        },
      },
      offline: {
        type: 'long',
        _meta: {
          description: 'The total number of enrolled Fleet Server agents currently offline',
        },
      },
      num_host_urls: {
        type: 'long',
        _meta: {
          description: 'The number of Fleet Server hosts configured in Fleet settings.',
        },
      },
    },
  },
  packages: {
    type: 'array',
    items: {
      properties: {
        name: { type: 'keyword' },
        version: { type: 'keyword' },
        enabled: { type: 'boolean' },
      },
    },
  },
  agent_versions: {
    type: 'array',
    items: {
      type: 'keyword',
      _meta: { description: 'The agent versions enrolled in this deployment.' },
    },
  },
  agents_per_policy: {
    type: 'array',
    items: {
      type: 'long',
      _meta: { description: 'Agent counts enrolled per agent policy.' },
    },
  },
  fleet_server_config: {
    properties: {
      policies: {
        type: 'array',
        items: {
          properties: {
            input_config: { type: 'pass_through' },
          },
        },
      },
    },
  },
  agent_policies: {
    properties: {
      count: {
        type: 'long',
        _meta: {
          description: 'Number of agent policies',
        },
      },
      output_types: {
        type: 'array',
        items: {
          type: 'keyword',
          _meta: { description: 'Output types of agent policies' },
        },
      },
    },
  },
  agent_checkin_status: {
    properties: {
      error: {
        type: 'long',
        _meta: {
          description: 'Count of agent last checkin status error',
        },
      },
      degraded: {
        type: 'long',
        _meta: {
          description: 'Count of agent last checkin status degraded',
        },
      },
    },
  },
};
```
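Putting the pieces together: the hourly task reports an EBT event whose payload must conform to this schema. A sample payload, taken from the integration test expectations above (the `reportEvent` call is commented out because it only makes sense inside the plugin):

```ts
// Sample `fleet_usage` event payload matching fleetUsagesSchema; the values
// are the ones asserted in the integration test above.
const sampleEvent = {
  agents_enabled: true,
  agents: { total_enrolled: 2, healthy: 0, unhealthy: 0, offline: 2, total_all_statuses: 3, updating: 0 },
  fleet_server: { total_enrolled: 0, healthy: 0, unhealthy: 0, offline: 0, updating: 0, total_all_statuses: 0, num_host_urls: 0 },
  packages: [],
  agent_versions: ['8.5.1', '8.6.0'],
  agents_per_policy: [2],
  fleet_server_config: {
    policies: [{ input_config: { server: { 'limits.max_agents': 3000 }, 'server.runtime': 'gc_percent:20' } }],
  },
  agent_policies: { count: 3, output_types: ['elasticsearch'] },
  agent_checkin_status: { error: 1, degraded: 1 },
};
// core.analytics.reportEvent('fleet_usage', sampleEvent);
```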