Monitoring your Kubernetes cluster has never been easier!
Navigate the Kubernetes cluster using New Relic’s explorer
The New Relic Kubernetes cluster explorer uses the data collected by the Kubernetes integration to show the status of your cluster, from the control plane to nodes and pods.
After installing our Kubernetes integration, you can start instrumenting the services that run in your cluster. To learn more about how to do this, please check our Monitor services running on Kubernetes page.
You can find out about the health of each entity, explore logs, and see how your apps are performing. With the Events integration, everything that happens in your cluster becomes visible, and logs brought in using the logs plugin are also available.
Understand how the health of your Kubernetes resources impacts your workloads, all in one place.
The cluster explorer represents your most relevant cluster data on a chart with the shape of a ship’s wheel — which is also Kubernetes’s logo.
- Outer ring: Contains up to 24 nodes of your cluster, selected as the most relevant based on the number of active alerts. Hover over each node to check its resource consumption and the percentage of allocatable pods in use.
- Inner rings: Contain the pods of each node. Pods with active alerts are shown in the third innermost ring, and pods that are pending or unable to run are in the center.
Hover the mouse over each node or pod to get a quick overview of its resource usage. You can click each node and pod to view its resource usage over time or to get more information about its health and active alerts. Colors are based on predefined alert conditions: Yellow pods have active warning alerts, while red pods have active critical alerts.
one.newrelic.com > Kubernetes: Click any pod to get more information about its status and health, and to dig deeper into application data and traces, logs, and events.
Click a node to see the following data:
- Pod statistics
- CPU, memory, and storage consumption against allocatable amounts
- Number of pods running on the node against the allocatable number of pods
For each pod, depending on the integrations and features you’ve enabled, you can see:
- Pod status and metadata, including namespace and deployment
- Container status and statistics
- Active alerts (both warning and critical)
- Kubernetes events that happened in that pod
- APM data and traces (if you’ve linked your APM data)
- A link to the pods’ and containers’ logs, collected using the Kubernetes plugin for log management in New Relic
Cluster and control plane statistics are always visible on the left side.
Cluster explorer node table
Below the cluster explorer is the node table, which lists all the nodes of the selected cluster, namespace, or deployment. Like all other usage indicators, the table shows consumption against allocatable resources.
Search and filter your cluster data
The main way to modify the data view in the cluster explorer is by using the top bar to search for specific attributes or values. All the attributes and values collected by the Kubernetes integration can be combined to narrow down the cluster view.
one.newrelic.com > Kubernetes cluster explorer: All your Kubernetes cluster’s attributes and data points can be used to filter the cluster explorer view.
You can also change the time frame using the time picker in the upper right corner. The Auto-refresh box turns the cluster explorer into a real-time dashboard that refreshes every 60 seconds.
one.newrelic.com > Kubernetes cluster explorer: The time picker lets you select several predefined time spans. To reload the data every minute, check the auto-refresh box.
Cluster Overview dashboard
The cluster Overview dashboard is an essential tool for monitoring and managing your Kubernetes clusters. It provides real-time visibility into the health and performance of the containerized applications running on the cluster, helping administrators quickly identify and resolve any issues that arise.
one.newrelic.com > Kubernetes > your_cluster > Overview: You can access the Kubernetes dashboard within the Kubernetes cluster explorer. It shows useful overview data about the status and performance of the cluster and its containerized workloads.
It’s designed to help answer questions such as:
- Are there any pods that are pending or failed?
- Which DaemonSets, Deployments, StatefulSets, HPAs, or other Kubernetes resources are unhealthy?
- Are all nodes ready and able to host pods?
- How many pods, containers, nodes, or other Kubernetes resources are in the cluster?
- When did a spike in Kubernetes warning events occur?
- When did one or more pods enter a pending state?
- Are there any pods unable to be scheduled to a node?
- Can my nodes host additional pods?
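If you want to cross-check a few of these answers from the command line, kubectl’s built-in field selectors and node listings give a rough equivalent. This is only a sketch; the dashboard answers the same questions with history and context attached.

```
# Pods that are currently pending or failed, across all namespaces.
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
kubectl get pods --all-namespaces --field-selector=status.phase=Failed

# Node readiness at a glance, plus allocated resources on a specific node.
kubectl get nodes
kubectl describe node <node-name> | grep -A 6 "Allocated resources"
```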
IMPORTANT
We recommend that you run nri-kubernetes version 3.2.0 or greater for the best experience.
- In version 3.2.0, a containerRestartDelta metric was introduced, which is used in the Container Restarts widget.
- In version 2.7.0, node status metrics were introduced, which are used in the Node Status Conditions widget.
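If you aren’t sure which version of the integration is deployed, you can check the container images it runs. The sketch below assumes the integration was installed into a namespace named newrelic; adjust the namespace to match your setup.

```
# List the images used by the integration pods to spot the nri-kubernetes version.
kubectl get pods -n newrelic \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}' \
  | grep nri-kubernetes
```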
How Kubernetes changes your monitoring strategy
If you ever meet someone who tells you, “Kubernetes is easy to understand,” most people would agree they’re not telling you the truth!
Kubernetes requires a new approach to monitoring, especially when you are migrating away from traditional hosts like VMs or on-prem servers.
Containers can live for only a few minutes at a time, since they’re deployed and redeployed to adjust to usage demand. How can you troubleshoot something that no longer exists?
These containers are also spread out across several hosts on physical servers worldwide. It can be hard to connect a failing process to the affected application without the proper context for the metrics you are collecting.
To monitor a large number of short-lived containers, Kubernetes has built-in tools and APIs that help you understand the performance of your applications. A monitoring strategy that takes advantage of Kubernetes will give you a bird’s eye view of your entire application’s performance, even if containers running your applications are continuously moving between hosts or being scaled up and down.
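For example, the Metrics API and the cluster event stream are two of those built-in tools. Assuming the Metrics Server add-on is installed in your cluster, you can query both directly with kubectl:

```
# Current CPU and memory usage per node and per pod (requires the Metrics Server add-on).
kubectl top nodes
kubectl top pods --all-namespaces

# Recent cluster events, sorted oldest to newest.
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
```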
Increased monitoring responsibilities
To get full visibility into your stack, you need to monitor your infrastructure. Modern tech stacks have made the relationship between applications and their infrastructure more complicated than in the past.
Traditional infrastructure
In a traditional infrastructure environment, you only have two things to monitor: your applications and the hosts (servers or VMs) running them.
The introduction of containers
In 2013, Docker introduced containerization to the world. Containers are used to package and run an application, along with its dependencies, in an isolated, predictable, and repeatable way. This adds a layer of abstraction between your infrastructure and your applications. Containers are similar to traditional hosts, in that they run workloads on behalf of the application.
Kubernetes
With Kubernetes, full visibility into your stack means collecting telemetry data on the containers that are constantly being automatically spun up and dying while also collecting telemetry data on Kubernetes itself. Gone are the days of checking a few lights on the server sitting in your garage!
There are four distinct components that need to be monitored in a Kubernetes environment, each with its own specifics and challenges:
- Infrastructure (worker nodes)
- Containers
- Applications
- Kubernetes clusters (control plane)
Correlating application metrics with infrastructure metrics using metadata
While making it easier to build scalable applications, Kubernetes has blurred the lines between application and infrastructure. If you are a developer, your primary focus is on the application and not the cluster’s performance, but the cluster’s underlying components can have a direct effect on how well your application performs. For example, a bug in a Kubernetes application might be caused by an issue with the physical infrastructure, but it could also result from a configuration mistake or coding problem.
When using Kubernetes, monitoring your application isn’t optional; it’s a necessity!
Most application performance monitoring (APM) language agents don’t care where an application is running. It could be running on an ancient Linux server in a forgotten rack or on the latest Amazon Elastic Compute Cloud (Amazon EC2) instance. Yet when monitoring applications managed by an orchestration layer, having context into infrastructure can be very useful for debugging or troubleshooting. For example, you can relate an application error trace to the container, pod, or host that it’s running on.
Configuring labels in Kubernetes
Kubernetes automates the creation and deletion of containers with varying lifespans. This entire process needs to be monitored. With so many moving pieces, a clear organization-wide labeling policy needs to be in place in order to match metrics to a corresponding application, pod, namespace, node, etc.
By attaching consistent labels across different objects, you can easily query your Kubernetes cluster for those objects. For example, suppose you get a call from your developers asking if the production environment is down. If the production pods have a “prod” label, you can run the following kubectl command to list all of them and check their status.
```
kubectl get pods -l name=prod

NAME                             READY   STATUS         RESTARTS   AGE
router-worker-6db6999875-b8t8m   0/1     ErrImagePull   0          1d4h
router-worker-6db6999875-7fn7z   1/1     Running        0          47s
router-worker-6db6999875-8rl9b   1/1     Running        3          10h
router-worker-6db6999875-b8t8m   1/1     Running        2          11h
```
In this example, you might spot that one of the prod pods has an issue pulling an image and pass that information along to the developers who own that pod. If you didn’t have labels, you would have to manually grep the output of kubectl get pods.
Common labeling conventions
In the example above, you saw an instance in which pods are labeled “prod” to identify their environment. Every team operates differently, but the following naming conventions are commonly found regardless of the team you work on:
Labels by environment
You can label entities according to the environment they belong to. For example:

```
env: production
env: qa
env: development
env: staging
```
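As a quick illustration, here’s one way to attach an env label to a running pod and then use it as a selector. The pod name reuses the earlier example; in practice you’d usually set the label in the Deployment’s pod template so every replica inherits it automatically.

```
# Attach an environment label to an existing pod, then filter on it.
kubectl label pod router-worker-6db6999875-7fn7z env=production
kubectl get pods -l env=production
```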
Labels by team
Creating tags for team names can be helpful to understand which team, group, department, or region was responsible for a change that led to a performance issue.
```
### Team tags
team: backend
team: frontend
team: db

### Role tags
roles: architecture
roles: devops
roles: pm

### Region tags
region: emea
region: america
region: asia
```
Kubernetes recommended labels
Kubernetes provides a list of recommended labels that allow a baseline grouping of resource objects. The app.kubernetes.io prefix distinguishes between the labels recommended by Kubernetes and the custom labels that you may separately add using a company.com prefix. Some of the most popular recommended Kubernetes labels are listed below.
| Key | Description |
| --- | --- |
| app.kubernetes.io/name | Name of the application (such as redis) |
| app.kubernetes.io/instance | Unique name for this specific instance of the application (such as redis-department-a) |
| app.kubernetes.io/component | A descriptive identifier of what the component is for (such as login-cache) |
| app.kubernetes.io/part-of | The higher-level application using this resource (such as company-auth) |
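Once resources carry these recommended labels, the same selector works across every kind of object. Here’s a sketch using the hypothetical company-auth application from the table above:

```
# List every pod, deployment, and service that belongs to the "company-auth" application.
kubectl get pods,deployments,services -l app.kubernetes.io/part-of=company-auth
```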
With all of your Kubernetes objects labeled, you can query your observability data to get a bird’s eye view of your infrastructure and applications. You can examine every layer in your stack by filtering your metrics. And, you can drill into more granular details to find the root cause of an issue.
Therefore, having a clear, standardized strategy for creating easy-to-understand labels and selectors should be an important part of your monitoring and alerting strategy for Kubernetes. Ultimately, health and performance metrics can only be aggregated by labels that you set.
Next steps
We also provide a pre-built dashboard and set of alerts for Kubernetes if you’re ready to try things out. To use them, sign up for a New Relic account today. Your free account includes one user with full access to all of New Relic, 5 basic users who can view your reporting, and 100 GB of free data ingest per month.
If you are a nonprofit organization looking to level up your observability tools, New Relic offers free tools and full platform features, along with 1 TB of free monthly data ingest and 5 user accounts, to all qualifying global nonprofit customers! Just email us at hgruber(at)newrelic(dot)com to find out more!