It’s 3:00 am and your phone sounds the alarm. You wake up, groggy, tired, and see that your Linux server is down.
Several errors are going off, and by the look of things, your Linux server might be on fire (not literally).
Not sure what to check when troubleshooting Linux systems?
Well, don’t waste countless hours troubleshooting when you can have the issue resolved quickly. Your time is valuable, and so is your SLA.
Here is a guide to the Linux commands to know when troubleshooting.
What Is Running?
- pstree shows running processes as a tree. Adding “-a” displays the command line of each process.
- ps aux lists every process on the system. The flags mean:
- a lists the processes of all users on the system
- u provides the information in detail
- x lists processes with no controlling terminal, such as daemons.
- netstat -a lists all ports, listening and connected
- netstat -l lists all LISTENING ports
- netstat -at lists all TCP ports
- netstat -au lists all UDP ports
- netstat -tupln will combine the above commands into one. The output is clean and provides a lot of information.
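When a server is misbehaving, the full process list is often too long to read at a glance. The following is a minimal sketch, assuming the a, u, and x flags above are used together as `ps aux` and that the output follows the usual procps layout with %MEM in the fourth column. The function name `top_by_mem` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads "ps aux" output on stdin and prints the header
# line followed by the N rows with the highest %MEM value (column 4).
top_by_mem() {
    local count="${1:-5}" header
    IFS= read -r header            # the first line of ps output is the header
    printf '%s\n' "$header"
    sort -k4,4nr | head -n "$count"
}

# Run against the live system only when ps is installed.
if command -v ps >/dev/null 2>&1; then
    ps aux | top_by_mem 5
fi
```

Changing the sort key to column 3 would rank by %CPU instead.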
CPU and RAM
- free -m lists available memory in MB
- -g to list available memory in GB
- Display system uptime
- Note: An uptime of only a few minutes or hours indicates a recent reboot and may point to a system problem.
- top displays information such as tasks, memory, CPU, and swap.
- q to quit
- Shift+O to sort
- Similar to the top command earlier, but it has more details and a cleaner, modern UI.
- List PCI devices
- Display Hardware and BIOS information
- Display and change ethernet card settings
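The uptime note above can be turned into a quick check. This is a minimal sketch, assuming a Linux host that exposes /proc/uptime. The function name `uptime_hours` and its optional file argument are hypothetical and exist only to make the helper easy to try out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hours since boot, read from /proc/uptime.
# An optional first argument points at another file in the same format.
uptime_hours() {
    # The first field of /proc/uptime is the number of seconds since boot.
    awk '{ printf "%.1f\n", $1 / 3600 }' "${1:-/proc/uptime}"
}

if [ -r /proc/uptime ]; then
    echo "Hours since boot: $(uptime_hours)"
fi
```

A value of only a few hours on a server that should have been up for weeks is a hint to look at the logs around the reboot time.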
iostat -x 2
Display extended disk I/O stats every 2 seconds.
vmstat 2 10
Display virtual memory statistics every 2 seconds up to 10 times.
mpstat 2 10
Display CPU statistics every 2 seconds up to 10 times.
For more details, you can invoke:
mpstat -P ALL 2 10
This prints CPU usage per core every 2 seconds, 10 times. If one core is consistently much busier than the others, the application on the server may not be properly multi-threaded and may require code changes.
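Comparing cores by eye gets tedious on machines with many CPUs. The sketch below is one possible way to summarize the per-core output. It assumes the sysstat layout in which `mpstat` ends with "Average:" rows and %idle is the last column. The function name `core_spread` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads "mpstat -P ALL <interval> <count>" output on
# stdin and reports the busiest core, the idlest core and the gap between them.
core_spread() {
    awk '$1 == "Average:" && $2 ~ /^[0-9]+$/ {
             busy = 100 - $NF          # last column is %idle
             if (n == 0 || busy > max) { max = busy; maxcpu = $2 }
             if (n == 0 || busy < min) { min = busy; mincpu = $2 }
             n++
         }
         END {
             if (n == 0) { print "no per-core averages found"; exit 1 }
             printf "busiest: cpu%s %.1f%%  idlest: cpu%s %.1f%%  spread: %.1f\n",
                    maxcpu, max, mincpu, min, max - min
         }'
}

# Usage (LC_ALL=C keeps the decimal separator a dot):
#   LC_ALL=C mpstat -P ALL 2 10 | core_spread
```

A large spread is only a hint. Some workloads are single-threaded by design, so treat the number as a starting point for investigation.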
I don’t think a lot of people know about “dstat.”
dstat is a tool for generating system stats, and can be viewed as a replacement for vmstat, iostat, and ifstat.
Mount Points and Filesystems
- Mount a file system
- Unmount a filesystem
- Display File System Tables
vgs or vgdisplay
- Display info about volume groups
pvs or pvdisplay
- Display physical volume information
lvs or lvdisplay
- Display logical volume information
df -h
“Disk free,” but human readable. You’ll use it to display disk size and usage.
lsof +D /path/to/dir
List open files under the target directory. This is useful during troubleshooting. A typical use case is finding out what is making a log file grow: running “lsof +D” on the log directory shows which process has the file open for writing.
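On a busy log directory, `lsof +D` lists readers as well as writers. The following sketch narrows the output to processes that hold a regular file open for writing. It assumes the default lsof column order (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME). The function name `writers_only` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads "lsof +D <dir>" output on stdin and prints
# COMMAND, PID and NAME for regular files opened for writing.
writers_only() {
    # Column 4 (FD) is a descriptor number followed by w (write) or
    # u (read/write); column 5 (TYPE) is REG for regular files;
    # column 9 is the path (paths containing spaces are cut at the first space).
    awk 'NR > 1 && $5 == "REG" && $4 ~ /^[0-9]+[wu]/ { print $1, $2, $9 }'
}

# Usage (the directory is only an example):
#   lsof +D /var/log | writers_only
```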
Kernel, Interrupts, and Network Usage
- Display all available kernel configuration parameters
- Display number of interrupts per IRQ
cat /proc/net/ip_conntrack # may take some time on busy servers
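- Dump the connection-tracking table of the Linux Netfilter system (iptables firewall), which records the state of each connection and allows for advanced, state-based filtering. On newer kernels the file is /proc/net/nf_conntrack.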
- Display sockets summary
System Logs and Kernel Messages
- Display message buffer of the kernel
- Display system log messages
- Display messages related to authentication and authorization, including failed login attempts.
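Failed login attempts are easier to judge when grouped by source address. This is a minimal sketch, assuming OpenSSH-style "Failed password ... from ADDRESS" lines in the authentication log. The log paths in the usage comment are the common defaults on Debian/Ubuntu and RHEL-family systems and may differ on your host. The function name `failed_logins_by_ip` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads an authentication log on stdin and prints
# "<count> <address>" for failed SSH password attempts, highest count first.
failed_logins_by_ip() {
    awk '/Failed password/ {
             # The source address is the word that follows "from".
             for (i = 1; i < NF; i++)
                 if ($i == "from") { hits[$(i + 1)]++; break }
         }
         END { for (ip in hits) print hits[ip], ip }' | sort -k1,1nr
}

# Usage (path is an assumption, pick the one that exists on your system):
#   failed_logins_by_ip < /var/log/auth.log     # Debian/Ubuntu
#   failed_logins_by_ip < /var/log/secure       # RHEL/CentOS
```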
# Display crons for all users (requires root)
for user in $(cat /etc/passwd | cut -f1 -d:); do crontab -l -u $user; done
- Display user specific crons
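The all-users loop prints crontabs back to back, which makes it hard to tell whose entry is whose. The variant below is a sketch that labels each user's output and keeps going when a user has no crontab. The function names `passwd_users` and `all_crontabs` are hypothetical, and the optional file argument exists only so the helper can be tried against a copy of /etc/passwd.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the user names (first field) of a passwd file.
passwd_users() {
    cut -d: -f1 "${1:-/etc/passwd}"
}

# Hypothetical helper: print every user's crontab under a "### user" header.
# Reading other users' crontabs requires root.
all_crontabs() {
    local user
    for user in $(passwd_users "$@"); do
        echo "### $user"
        crontab -l -u "$user" 2>/dev/null || echo "(no crontab or not permitted)"
    done
}

# Usage:
#   sudo bash -c "$(declare -f passwd_users all_crontabs); all_crontabs"
```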