Synopses & Reviews
“If you’re a developer trying to figure out why your application is not responding at 3 am, you need this book! This is now my go-to book when diagnosing production issues. It has saved me hours in troubleshooting complicated operations problems.” –Trotter Cashion, cofounder, Mashion
DevOps can help developers, QAs, and admins work together to solve Linux server problems far more rapidly, significantly improving IT performance, availability, and efficiency. To gain these benefits, however, team members need common troubleshooting skills and practices.
In DevOps Troubleshooting: Linux Server Best Practices, award-winning Linux expert Kyle Rankin brings together all the standardized, repeatable techniques your team needs to stop finger-pointing, collaborate effectively, and quickly solve virtually any Linux server problem. Rankin walks you through using DevOps techniques to troubleshoot everything from boot failures and corrupt disks to lost email and downed websites. You’ll master indispensable skills for diagnosing high-load systems and network problems in production environments.
Rankin shows how to
- Master DevOps’ approach to troubleshooting and proven Linux server problem-solving principles
- Diagnose slow servers and applications by identifying CPU, RAM, and disk I/O bottlenecks
- Understand healthy boots, so you can identify failure points and fix them
- Solve full or corrupt disk issues that prevent disk writes
- Track down the sources of network problems
- Troubleshoot DNS, email, and other network services
- Isolate and diagnose Apache and Nginx Web server failures and slowdowns
- Solve problems with MySQL and Postgres database servers and queries
- Identify hardware failures–even notoriously elusive intermittent failures
Synopsis
The DevOps approach to system administration describes a world where Linux developers and sysadmins work far more closely than in traditional environments. DevOps is optimized to support today's confluence of Linux trends, including cloud migrations, the rise of startups using hosted services, greater requirements to analyze big data, and the transition to NoSQL databases. One of DevOps' most powerful benefits is its support for better, faster troubleshooting. In this book, pioneering Linux sysadmin Kyle Rankin teaches DevOps' standardized, repeatable troubleshooting techniques, showing administrators and developers how to work together to reduce costs and improve effectiveness. Rankin helps both developers and admins fill the gaps in their respective troubleshooting skillsets, so they can stop pointing fingers and start getting results. Writing clearly and simply, he walks through using DevOps techniques to identify and resolve these and other common problems in Linux environments:
- Slow servers (including CPU, RAM, and disk I/O bottlenecks)
- System boot failures
- Full or corrupt disks
- Down servers and websites
- Failures in DNS server hostname resolution
- Email delivery problems
- Slow databases
- Hardware faults
About the Author
Kyle Rankin, a senior systems administrator and DevOps engineer, is president of the North Bay Linux Users’ Group and is an award-winning columnist for Linux Journal. Rankin speaks frequently on open source software at SCALE, OSCON, Linux World Expo, Penguicon, and many Linux user groups. His other books include The Official Ubuntu Server Book; Knoppix Hacks, Second Edition; Knoppix Pocket Reference; and Linux Multimedia Hacks. He is also coauthor of Ubuntu Hacks.
Table of Contents
Preface
Acknowledgments
About the Author
Chapter 1: Troubleshooting Best Practices
Divide the Problem Space
Practice Good Communication When Collaborating
Favor Quick, Simple Tests over Slow, Complex Tests
Favor Past Solutions
Document Your Problems and Solutions
Know What Changed
Understand How Systems Work
Use the Internet, but Carefully
Resist Rebooting
Chapter 2: Why Is the Server So Slow? Running Out of CPU, RAM, and Disk I/O
System Load
Diagnose Load Problems with top
Troubleshoot High Load after the Fact
Chapter 3: Why Won’t the System Boot? Solving Boot Problems
The Linux Boot Process
BIOS Boot Order
Fix GRUB
Disable Splash Screens
Can’t Mount the Root File System
Can’t Mount Secondary File Systems
Chapter 4: Why Can’t I Write to the Disk? Solving Full or Corrupt Disk Issues
When the Disk Is Full
Out of Inodes
The File System Is Read-Only
Repair Corrupted File Systems
Repair Software RAID
Chapter 5: Is the Server Down? Tracking Down the Source of Network Problems
Server A Can’t Talk to Server B
Troubleshoot Slow Networks
Packet Captures
Chapter 6: Why Won’t the Hostnames Resolve? Solving DNS Server Issues
DNS Client Troubleshooting
DNS Server Troubleshooting
Chapter 7: Why Didn’t My Email Go Through? Tracing Email Problems
Trace an Email Request
Understand Email Headers
Problems Sending Email
Problems Receiving Email
Chapter 8: Is the Website Down? Tracking Down Web Server Problems
Is the Server Running?
Test a Web Server from the Command Line
HTTP Status Codes
Parse Web Server Logs
Get Web Server Statistics
Solve Common Web Server Problems
Chapter 9: Why Is the Database Slow? Tracking Down Database Problems
Search Database Logs
Is the Database Running?
Get Database Metrics
Identify Slow Queries
Chapter 10: It’s the Hardware’s Fault! Diagnosing Common Hardware Problems
The Hard Drive Is Dying
Test RAM for Errors
Network Card Failures
The Server Is Too Hot
Power Supply Failures
Index