💻 Mastering Kubernetes Troubleshooting: 30 Essential Commands for Day-to-Day Operations 🚀

Kubernetes, the powerful container orchestration platform, is the backbone of modern cloud-native applications. However, its complexity can be overwhelming when troubleshooting issues. Whether you're dealing with pod failures, network issues, or scaling challenges, knowing the right commands is crucial to keep your cluster running smoothly.

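This article highlights 30 essential Kubernetes troubleshooting commands that every DevOps engineer and administrator should master to streamline daily operations. Namespaced commands act on the current namespace unless you add -n <namespace> to target another one or -A to cover all namespaces.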


1. Cluster Information and Health Checks

Start by checking the overall health and status of your Kubernetes cluster.

  1. Get cluster information

     kubectl cluster-info
    

    Displays the addresses of the control plane (API server) and core cluster services such as CoreDNS.

  2. Check cluster nodes

     kubectl get nodes
    

    Lists all nodes in the cluster and their statuses.

  3. Detailed node status

     kubectl describe node <node-name>
    

    Provides detailed information about a specific node, including its conditions (such as MemoryPressure or DiskPressure), allocated resources, and recent events.

  4. Check cluster events

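     kubectl get events -A --sort-by='.metadata.creationTimestamp'


    Shows recent events across all namespaces, sorted oldest to newest so the latest appear at the bottom. Warnings such as FailedScheduling or BackOff often point straight at the problem.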

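The node checks above lend themselves to a small script. The sketch below is a hypothetical helper (not_ready_nodes is an illustrative name, not a kubectl feature) that assumes the default kubectl get nodes column layout: NAME, STATUS, ROLES, AGE, VERSION. It reads the command's output on stdin and prints every node whose STATUS is anything other than plain Ready, which also catches cordoned nodes reported as Ready,SchedulingDisabled.

```shell
# Hypothetical helper: list nodes that are not plainly "Ready".
# Reads `kubectl get nodes --no-headers` output from stdin; in the default
# layout column 1 is NAME and column 2 is STATUS.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1, $2 }'
}
```

For example, pipe kubectl get nodes --no-headers into not_ready_nodes and run kubectl describe node on each name it prints. Parsing the human-readable table is a shortcut; for anything long-lived, -o json or -o jsonpath output is more stable.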

2. Pod Management and Debugging

  1. List all pods

     kubectl get pods -A
    

    Displays pods across all namespaces to identify issues at a glance.

  2. Detailed pod information

     kubectl describe pod <pod-name>
    

    Offers a detailed breakdown of a pod’s configuration and current state, including an Events section that often explains why a pod is Pending or restarting.

  3. View pod logs

     kubectl logs <pod-name>
    

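    Prints a container's logs to debug application-level issues. For a multi-container pod, add -c <container-name>; for a container that has crashed and restarted, add --previous to see the logs of the prior instance.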

  4. Stream pod logs

     kubectl logs -f <pod-name>
    

    Continuously stream logs for real-time monitoring.

  5. Execute commands in a pod

     kubectl exec -it <pod-name> -- /bin/bash
    

    Opens a shell in a running container for troubleshooting. Many minimal images do not include bash; use /bin/sh instead if the command fails.

  6. Check pod resource usage

     kubectl top pod <pod-name>

    Displays the current CPU and memory usage of a pod. This requires the Metrics Server add-on to be running in the cluster.

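Scanning kubectl get pods -A by eye gets slow on a large cluster. As a sketch, the hypothetical unhealthy_pods function below assumes the default column order of kubectl get pods -A --no-headers (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). It flags pods whose STATUS is neither Running nor Completed, plus Running pods in which not every container is ready.

```shell
# Hypothetical helper: flag pods that look unhealthy.
# Reads `kubectl get pods -A --no-headers` output from stdin; the default
# columns are NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE.
unhealthy_pods() {
  awk '{
    split($3, ready, "/")                    # READY is "<ready>/<total>"
    if ($4 != "Running" && $4 != "Completed") {
      print $1 "/" $2, $4, $3                # e.g. Pending, CrashLoopBackOff
    } else if ($4 == "Running" && ready[1] + 0 < ready[2] + 0) {
      print $1 "/" $2, $4, $3                # running, but some containers not ready
    }
  }'
}
```

A typical loop is kubectl get pods -A --no-headers | unhealthy_pods, followed by kubectl describe pod and kubectl logs --previous (with -n <namespace>) on each flagged pod.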

3. Network and Connectivity

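  1. Inspect services

     kubectl get svc

    Lists the services in the current namespace with their type, cluster IP, and ports, so you can confirm each application is exposed as intended.

  2. Debug service issues

     kubectl describe svc <service-name>

    Shows the details of a specific service, including its selector and the endpoints it currently routes to.

  3. Test DNS resolution

     kubectl exec -it <pod-name> -- nslookup <service-name>

    Checks whether DNS resolution works from inside the cluster; use <service-name>.<namespace> for a service in another namespace. This requires nslookup in the container image. If it is missing, run the lookup from a temporary pod instead, for example kubectl run dns-test --rm -it --image=busybox --restart=Never -- nslookup <service-name>.

  4. Inspect network policies

     kubectl get networkpolicy

    Lists the NetworkPolicies in the current namespace so you can check which traffic between pods is allowed.

  5. Trace service endpoints

     kubectl get endpoints <service-name>

    Shows the pod IPs backing the service. An empty list usually means the service selector matches no ready pods.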

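An empty endpoint list is one of the most common reasons a service does not respond. The hypothetical check_endpoints helper below is a sketch that reads the IPs printed by a JSONPath query on the service's Endpoints object and reports whether any ready endpoint exists. The function name and message wording are illustrative.

```shell
# Hypothetical helper: report whether a Service has ready endpoints.
# Pipe in the IPs printed by:
#   kubectl get endpoints <service-name> -o jsonpath='{.subsets[*].addresses[*].ip}'
# Only ready addresses appear there; unready pods are listed under
# notReadyAddresses instead. Returns non-zero when no ready endpoint exists.
check_endpoints() {
  local svc="$1"
  # Word-split the space-separated IP list into positional parameters.
  set -- $(cat)
  if [ "$#" -eq 0 ]; then
    echo "$svc: no ready endpoints (check the selector and pod readiness)"
    return 1
  fi
  echo "$svc: $# ready endpoint(s): $*"
}
```

If it reports no ready endpoints, compare the selector shown by kubectl describe svc <service-name> with the pod labels shown by kubectl get pods --show-labels.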

4. Deployment and Replica Management

  1. List deployments

     kubectl get deployments

    Shows the ready, up-to-date, and available replica counts for each deployment, so you can spot any that are not fully running.

  2. Inspect deployment status

     kubectl describe deployment <deployment-name>

    Provides details about replica counts, rollout conditions, and events that explain problems during updates.

  3. Scale a deployment

     kubectl scale deployment <deployment-name> --replicas=<number>

    Adjusts the number of replicas to match workload demand.

  4. Roll back a deployment

     kubectl rollout undo deployment <deployment-name>

    Reverts to the previous revision of the deployment. Add --to-revision=<revision> to target a specific revision listed by kubectl rollout history deployment <deployment-name>.

  5. Check rollout status

     kubectl rollout status deployment <deployment-name>

    Watches the rollout until it completes or fails, confirming whether an update is progressing as expected.

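To find which deployments need attention before drilling into one, the hypothetical degraded_deployments helper below prints each deployment whose READY count is below the desired count. It is a sketch that assumes the default kubectl get deployments --no-headers columns: NAME, READY, UP-TO-DATE, AVAILABLE, AGE.

```shell
# Hypothetical helper: list deployments with fewer ready replicas than desired.
# Reads `kubectl get deployments --no-headers` output from stdin; column 2
# (READY) has the form "<ready>/<desired>".
degraded_deployments() {
  awk '{
    split($2, r, "/")
    if (r[1] + 0 < r[2] + 0) print $1, $2
  }'
}
```

Run it as kubectl get deployments --no-headers | degraded_deployments. Adding -A would put NAMESPACE first and shift the columns by one, so the awk field numbers would need adjusting.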

5. Persistent Volumes and Storage

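  1. List persistent volumes

     kubectl get pv

    Shows each PersistentVolume with its capacity, access modes, reclaim policy, status (Available, Bound, Released, or Failed), and the claim bound to it.

  2. Inspect persistent volume claims

     kubectl get pvc

    Shows whether each PersistentVolumeClaim in the namespace is Bound to a volume or stuck in Pending.

  3. Describe a persistent volume

     kubectl describe pv <pv-name>

    Provides details about the volume's capacity, access modes, storage class, and backing source.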

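A claim stuck in Pending is the usual storage problem, and the StorageClass it requests is often the cause. The default kubectl get pvc table leaves the VOLUME, CAPACITY, and ACCESS MODES cells blank for unbound claims, which shifts whitespace-separated fields, so this sketch assumes custom-columns output instead (name, phase, storage class). unbound_pvcs is a hypothetical helper name.

```shell
# Hypothetical helper: report PVCs that are not Bound, with their StorageClass.
# Expects output from:
#   kubectl get pvc --no-headers \
#     -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,CLASS:.spec.storageClassName
# custom-columns prints <none> for an unset field, so every row has three columns.
unbound_pvcs() {
  awk '$2 != "Bound" { print $1, $2, "storageClass=" $3 }'
}
```

For each claim it prints, kubectl describe pvc <pvc-name> shows the provisioning events, and kubectl get storageclass confirms whether the requested class exists.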

6. Resource Monitoring and Usage

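  1. Check resource usage by node

     kubectl top node

    Displays CPU and memory usage for each node. As with kubectl top pod, the Metrics Server must be running in the cluster.

  2. Check pod resource limits

     kubectl describe pod <pod-name> | grep -i -A 2 "limits"

    Shows the limits defined for each container in the pod. The -A 2 option also prints the lines after each "Limits:" heading, which hold the actual CPU and memory values.

  3. Monitor resource quotas

     kubectl get resourcequotas

    Lists the ResourceQuotas in the current namespace. Use kubectl describe resourcequota to compare current usage against the hard limits.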

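kubectl top node reports percentages but does not highlight outliers. The hypothetical hot_nodes helper below is a sketch that assumes the kubectl top node --no-headers columns NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY% and prints nodes whose CPU or memory percentage exceeds a threshold (80 by default, an arbitrary choice). Rows without metrics, shown as <unknown>, are skipped.

```shell
# Hypothetical helper: flag nodes whose CPU% or MEMORY% exceeds a threshold.
# Reads `kubectl top node --no-headers` output from stdin; columns are
# NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%. Requires metrics-server.
hot_nodes() {
  awk -v limit="${1:-80}" '{
    cpu = $3; mem = $5
    sub(/%$/, "", cpu); sub(/%$/, "", mem)
    # Skip rows where metrics are not yet available (e.g. "<unknown>").
    if (cpu !~ /^[0-9]+$/ || mem !~ /^[0-9]+$/) next
    if (cpu + 0 > limit + 0 || mem + 0 > limit + 0)
      print $1, "cpu=" cpu "%", "mem=" mem "%"
  }'
}
```

For example: kubectl top node --no-headers | hot_nodes 85.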

7. Configuration and Secrets

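  1. List config maps

     kubectl get configmaps

    Lists all ConfigMaps in the current namespace.

  2. Inspect secrets

     kubectl get secrets

    Lists the Secrets in the current namespace with their type and number of data keys, so you can confirm a secret exists before a pod references it. The values themselves are not printed.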

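kubectl get secrets shows that a secret exists but not what it contains. When you need to confirm a value, a sketch like the hypothetical secret_value helper below reads one key with a JSONPath query and decodes it, since Secret data is stored base64-encoded. It assumes a base64 that accepts -d (GNU coreutils and BusyBox do; some older macOS versions expect -D), defaults to the default namespace, and prints the plaintext value to your terminal, so use it with care.

```shell
# Hypothetical helper: print one decoded value from a Secret.
# Usage: secret_value <secret-name> <key> [namespace]
# Warning: this prints the plaintext value to the terminal.
secret_value() {
  local name="$1" key="$2" ns="${3:-default}" path encoded
  # Escape dots so keys such as tls.crt work in a JSONPath expression.
  path=$(printf '%s' "$key" | sed 's/\./\\./g')
  encoded=$(kubectl get secret "$name" -n "$ns" -o jsonpath="{.data.$path}") || return 1
  if [ -z "$encoded" ]; then
    echo "key $key not found in secret $ns/$name" >&2
    return 1
  fi
  # Secret data is base64-encoded in the API object.
  printf '%s' "$encoded" | base64 -d
}
```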

8. Miscellaneous Utilities

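  1. Dry-run a manifest

     kubectl apply -f <file-name.yaml> --dry-run=client

    Previews the result of applying a manifest without changing the cluster. Use --dry-run=server to also have the API server validate the request.

  2. Delete a problematic resource

     kubectl delete pod <pod-name>

    Removes a malfunctioning pod. If a controller manages the pod (for example, a Deployment through its ReplicaSet), a replacement is created, which makes this a quick way to restart it; a standalone pod is gone for good.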

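Deleting a pod only acts as a restart when something recreates it. As a cautious sketch, the hypothetical delete_if_managed helper below checks the pod's ownerReferences first and refuses to delete a pod that has no owner. It defaults to the default namespace, and its name and messages are illustrative.

```shell
# Hypothetical helper: delete a pod only if a controller (ReplicaSet,
# StatefulSet, DaemonSet, Job, ...) owns it and will recreate it.
# Usage: delete_if_managed <pod-name> [namespace]
delete_if_managed() {
  local pod="$1" ns="${2:-default}" owner
  owner=$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.metadata.ownerReferences[*].kind}') || return 1
  if [ -z "$owner" ]; then
    # A bare pod with no owner would not come back after deletion.
    echo "refusing to delete $ns/$pod: no owner, it would not be recreated" >&2
    return 1
  fi
  echo "$ns/$pod is owned by $owner; deleting so it is recreated"
  kubectl delete pod "$pod" -n "$ns"
}
```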

Conclusion

Troubleshooting Kubernetes effectively involves understanding its architecture and mastering key commands. The 30 commands we've discussed cover many scenarios, helping you diagnose and fix issues efficiently.

As you get more comfortable with these commands, you'll notice a significant improvement in managing complex Kubernetes environments. Keep this list handy, practice often, and enhance your DevOps workflow!

What are your favorite Kubernetes troubleshooting commands? Share them in the comments! 🚀
