Logging, Monitoring, and Debugging
There are a variety of ways in which you can monitor, log, and debug the execution of platform application services, tools, and APIs.
- Logging application services (Log forwarder and Elasticsearch)
- The monitoring application service and Grafana dashboards
- Kubernetes tools
- Event logs
- Cluster Support Logs
- API error information
For further troubleshooting assistance, visit Iguazio Support.
Logging Application Services
The platform has a default tenant-wide log-forwarder application service (log-forwarder) for forwarding application-service logs.
The logs are forwarded to an instance of the Elasticsearch open-source search and analytics engine by using the open-source Filebeat log-shipper utility.
The log-forwarder service is disabled by default.
To enable it, edit the log-forwarder service from the dashboard, set the URL of the target Elasticsearch instance, and apply your changes to deploy the service.
Typically, the log-forwarder service should be configured to work with your own remote off-cluster instance of Elasticsearch.
- The default transfer protocol, which is used when the URL doesn’t begin with “http://” or “https://”, is HTTPS.
- The log-forwarder service doesn’t support off-cluster Elasticsearch user authentication, regardless of whether you use an HTTP or HTTPS URL.
- The default port, which is used when the URL doesn’t end with “:<port number>”, is port 80 for HTTP and port 443 for HTTPS.
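To make these URL defaults concrete, the following is a minimal sketch of the resolution rules described above as a small shell helper. The function name and the example hostnames are hypothetical, for illustration only; they are not part of the platform.

```shell
# normalize_es_url: apply the documented Elasticsearch-URL defaults --
# no scheme means HTTPS, and no port means 80 (HTTP) or 443 (HTTPS).
# (Hypothetical helper for illustration; not a platform utility.)
normalize_es_url() {
  local url="$1"
  case "$url" in
    http://*|https://*) ;;        # scheme given: keep it
    *) url="https://$url" ;;      # no scheme: default to HTTPS
  esac
  local rest="${url#*://}"
  case "$rest" in
    *:*) ;;                       # port already present: keep it
    *) case "$url" in
         https://*) url="$url:443" ;;
         http://*)  url="$url:80" ;;
       esac ;;
  esac
  echo "$url"
}

normalize_es_url "es.example.com"          # https://es.example.com:443
normalize_es_url "http://es.example.com"   # http://es.example.com:80
```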
In cloud platform environments you can also configure the log-forwarder service to work with a tenant-wide Elasticsearch service that uses a pre-deployed on-cluster vanilla installation of Elasticsearch.
To do so, you first need to create such an Elasticsearch service.
After enabling the log-forwarder service, you can view the forwarded logs in the configured Elasticsearch instance.
The Monitoring Application Service and Grafana Dashboards
The platform has a default tenant-wide monitoring application service (monitoring) for monitoring application services and gathering performance statistics and additional data.
This service is enabled by default.
All Grafana user services in the platform that are created or restarted after the monitoring service is enabled have the following predefined monitoring dashboards:
- Application Services Monitoring — displays information for all the managed application services.
- Nuclio Functions Monitoring - Overview — displays information for all the deployed serverless Nuclio functions.
- Nuclio Functions Monitoring - Per Function — displays information for a specific Nuclio function, as set in the dashboard filter.
When there’s an enabled shared Grafana user service in the platform, the name of the monitoring service (monitoring) on the dashboard links to the predefined Grafana monitoring dashboards.
This service is accessible from the platform dashboard to users with the IT Admin management policy. The monitoring service also supports the following:
- The application-services and Nuclio-functions monitoring data that’s displayed in the platform dashboard and in the platform’s Grafana monitoring dashboards. Note that the predefined monitoring Grafana dashboards display information only when the monitoring service is enabled; the service is enabled by default.
- Auto scaling of Nuclio functions.
Kubernetes Tools
You can use the Kubernetes get pods command to display information about the cluster’s pods:
kubectl get pods
You can use the logs command to view the logs for a specific pod; replace POD with the name of one of the pods returned by the get pods command:
kubectl logs POD
You can use the top pod command to view pod resource metrics and monitor resource consumption; replace [POD] with the name of one of the pods returned by the get pods command, or omit it to display metrics for all pods:
kubectl top pod [POD]
The get pods and logs commands require the “Log Reader” service account or higher. The top pod command requires the “Service Admin” service account.
For more information about these commands, see the Kubernetes kubectl reference documentation.
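As a quick illustration of working with the get pods output, the following sketch keeps the header line plus any pod that isn’t in the Running state. The sample listing is fabricated for demonstration; in practice, pipe real kubectl get pods output through the same awk filter.

```shell
# Fabricated 'kubectl get pods' output for demonstration purposes
sample_output() {
cat <<'EOF'
NAME                          READY   STATUS    RESTARTS   AGE
grafana-7d9f8b6c4-xk2lp       1/1     Running   0          3d
log-forwarder-5b8c9d7f6-q2w   1/1     Running   0          3d
monitoring-6c7d8e9f0-a1b      0/1     Pending   2          1h
EOF
}

# Keep the header row (NR == 1) and any pod whose STATUS column
# (field 3) isn't 'Running' -- a quick way to spot unhealthy pods.
sample_output | awk 'NR == 1 || $3 != "Running"'
```

Against a live cluster, the equivalent would be `kubectl get pods | awk 'NR == 1 || $3 != "Running"'`.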
Event Logs
The dashboard displays the following event tabs:
- The Event Log tab displays system event logs.
- The Alerts tab displays system alerts.
- The Audit tab displays a subset of the system events for audit purposes — security events (such as a failed login) and user actions (such as creation and deletion of a container).
Cluster Support Logs
Users with the IT Admin management policy can collect and download support logs for the platform clusters from the dashboard. Log collection is triggered for a data cluster, but the logs are collected from both the data and application cluster nodes.
You can trigger collection of cluster support logs from the dashboard in one of two ways (note that you cannot run multiple collection jobs concurrently):
- On the Clusters page, open the action menu for a data cluster in the clusters table (Type = “Data”); then, select the Collect logs menu option.
- On the Clusters page, display the Support Logs tab for a specific data cluster — either by selecting the Support logs option from the cluster’s action menu or by selecting the data cluster and then selecting the Support Logs tab; then, select Collect Logs from the action toolbar.
You can view the status of all collection jobs and download archive files of the collected logs from the data cluster’s Support Logs tab.
API Error Information
The platform APIs return error codes and error and warning messages to help you debug problems with your application. See, for example, the Error Information documentation in the Data-Service Web-API General Structure reference documentation.
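The exact error schema is defined in the referenced documentation; as a generic illustration, the following sketch extracts an error message from a JSON error body. The response body and its field names (ErrorCode, ErrorMessage) are fabricated placeholders here — check the Error Information reference for the actual structure your API returns.

```shell
# Fabricated example of a JSON error response body (placeholder schema)
response='{"ErrorCode":-2,"ErrorMessage":"No such file or directory"}'

# Pull out the ErrorMessage value with sed -- a minimal sketch;
# prefer a real JSON parser such as jq in production scripts.
msg=$(printf '%s' "$response" | sed -n 's/.*"ErrorMessage":"\([^"]*\)".*/\1/p')
echo "API error: $msg"
```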