The entire system generates detailed telemetry for all features, which can be consumed across Volterra services. This telemetry provides observability of infrastructure, applications, connectivity, and security services across a distributed environment, and allows netops, devops, and application teams to troubleshoot and optimize their applications without placing an additional burden on application developers. Four types of telemetry data are collected from the distributed system: metrics, logs, alerts, and events. Some of these logs, metrics, and events are also post-processed to detect anomalies, analyze application APIs and security issues, create graph visualizations, etc.
This telemetry data provides different outcomes to different types of users:
Volterra SRE - the goal of our site reliability engineers is to ensure that customer services and our global infrastructure are operational and meet their service level objectives.
Customer Operations - based on the RBAC and policy configured, central operations teams can consume a significant amount of data for observability of their infrastructure, network, applications, and the end users of their applications. Rich visibility is available on the Volterra Console for instant visualization of this data, and APIs are available for integrating with other tools.
Customer Application Teams - depending on the RBAC and policy configured, application teams get observability of the application and network services that relate to their specific applications.
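How RBAC narrows what each audience sees can be pictured with a minimal sketch. It assumes a hypothetical telemetry record tagged with an application namespace and a role that is either granted a set of namespaces or tenant-wide access; the record fields and the `visible_records` helper are illustrative only and are not Volterra's API.

```python
from dataclasses import dataclass


# Hypothetical telemetry record; field names are illustrative, not Volterra's schema.
@dataclass(frozen=True)
class TelemetryRecord:
    kind: str       # "metric", "log", "alert", or "event"
    namespace: str  # application namespace the record belongs to
    name: str


def visible_records(records, allowed_namespaces):
    """Return only the records the caller's role may read.

    allowed_namespaces=None models an operations role with tenant-wide
    access; a set of namespaces models an application team's scope.
    """
    if allowed_namespaces is None:
        return list(records)
    return [r for r in records if r.namespace in allowed_namespaces]


records = [
    TelemetryRecord("metric", "shop", "request_rate"),
    TelemetryRecord("log", "billing", "access_log"),
]
print(visible_records(records, {"shop"}))  # only the "shop" metric
```

In this sketch an application team scoped to `{"shop"}` sees only its own application's telemetry, while an operations role passed `None` sees everything in the tenant.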
If you’re interested in further details of how the features described in this guide work, you can find out more about Volterra’s observability architecture in the Concepts section.
A complex, distributed system collects logs, metrics, alerts, and traces from our global infrastructure as well as from each of the Volterra Nodes deployed across users’ cloud and edge locations.
From a user’s point of view, there are two methods to get observability into applications and services deployed across multi-cloud, network, and edge sites: use the Volterra SaaS portal for centralized dashboards, or use Volterra APIs to integrate with 3rd-party tools. The system collects four types of telemetry and observability data from distributed sources and aggregates them:
Metrics - The system collects many time-series metrics for Infrastructure (CPU, memory, disk, interfaces, connectivity, and latency) and for Applications and Application Services (deployment status, application health, request rate, errors, duration, latency, and throughput).
Logs - Three types of logs are aggregated across the system: system logs, application logs, and access logs (request and response). Application logs are currently not stored automatically by the system, so the user needs to decide how to store them.
Alerts - Alerts can relate to user services (e.g., application restart, site connectivity lost, out of memory) or to infrastructure services (e.g., Volterra service restarted, connectivity errors). All of these alerts are available in the dashboard and can be integrated with external systems such as PagerDuty using the APIs. Some alerts relating to infrastructure services are handled and mitigated automatically by the Volterra SRE team and do not require any action from the customer.
Many of these logs and metrics are post-processed to detect anomalies, analyze application APIs and security issues, create graph visualizations, etc. For example, metrics are also used to generate a health score for sites and applications, determined by statistical analysis of the metrics.
Metrics, logs, alerts, and events are automatically stored by Volterra for each tenant and are retained for a default of 14 days. If the user needs a longer retention period, this time period can be extended. Audit logs are retained for 6 months, as there may be regulatory and compliance needs for longer retention.
The above observability data is available to the user through two mechanisms:
Volterra Console - Using a web browser and credentials, the user can access various dashboards and graphs relating to their infrastructure and applications.
The following concepts are used for Volterra’s observability features:
The following How-to guides are examples of using our Observability features: