Being a system administrator is like being Han Solo: you must solve problems left and right, you must be on top of everything at all times, and, most importantly, you must ensure that nothing crashes.
It’s not an easy job — but if you have the right tools (like the Millennium Falcon), the right people on your team (like Chewbacca), and the right policies and procedures (like The Force), then you can thrive. To help you enjoy success and happiness as a system administrator — and bring order to the chaos — here are 10 best practices to work and live by:
1. Be nice. Be likable. Be the best you that you can!
Now, I know you’re thinking: “Seriously? This is a best practice?” Well, yes it is, and it’s an often overlooked one, too. I’m not saying that you should go around giving everyone fist bumps and saying “looks like someone has a case of the Mondays” (in fact, never, ever do that!). What I’m saying is that part of your job is being the “bridge” between human beings and technology/systems. You’re like an ambassador and a guide. If you embrace this role and responsibility, your career satisfaction and your professional opportunities for growth will surge.
Yes, I know that end users can be frustrating at times, and I also know that you’re a human being, not a machine. But if you do your best and stay professional at all times (or as much as possible), then you’ll reap the rewards very quickly.
2. Monitor Your Systems
Always, always, always monitor your systems! They’re like little kids. You need to keep an eye on them, or they’ll get into trouble. And don’t assume that “no news is good news.” When things are eerily quiet, you can’t know that everything is okay unless you have data that says so; something bad may well be happening in the background.
Ensure that you have across-the-board insight into your network environment, including memory usage, network traffic, CPU load and storage capacity. Plus, analyze your databases and your users’ data needs to anticipate future capacity requirements. And of course, establish a baseline of your system’s normal operating behavior, so that an alarm can go off when something is out of the ordinary. Having a proactive warning system is an essential part of your success (and your sanity)!
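To make the baseline idea concrete, here is a minimal Python sketch. It assumes you already collect a metric such as CPU utilization somewhere; the sample readings, the function names and the three-standard-deviation threshold are all illustrative assumptions, not a recommendation for any particular monitoring tool.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize normal behavior as (mean, standard deviation)."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Return True when value sits more than `threshold` standard
    deviations away from the baseline mean."""
    average, spread = baseline
    if spread == 0:
        # A perfectly flat baseline: any different value is unusual.
        return value != average
    return abs(value - average) / spread > threshold


# Hypothetical CPU utilization readings (percent) from a normal period.
normal_cpu = [22.0, 25.0, 24.0, 23.0, 26.0, 24.0, 25.0, 23.0]
baseline = build_baseline(normal_cpu)

# Two made-up new readings: one ordinary, one far outside the baseline.
for reading in (24.5, 91.0):
    status = "ALERT" if is_anomalous(reading, baseline) else "ok"
    print(f"cpu={reading:.1f}% {status}")
```

A real setup would feed this from your monitoring data and would likely keep separate baselines for different times of day, but the principle is the same: define normal first, then alert on the deviation.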
3. Perform Disaster Recovery Planning
Are you ready for a disaster? You’d better be. And not just a major disaster that affects your whole datacenter, but also the kind that can impact a single system. If you don’t have a robust and practical emergency plan ready to go, then create one ASAP. Make sure it spells out what needs to be done and by whom, in case you’re away from the office when disaster strikes. You don’t want people panicking; you need them to be prepared.
Also keep in mind that the best disaster recovery plans are not designed for “if” something happens, but “when” something happens. Unfortunately, no amount of regular backups, redundancy or load balancing will prevent a disaster from happening (but of course, you should still do all of these).
4. Keep Your Users Informed
An informed user is a confident, happy and, most importantly, SAFE user. Always let users know when to expect upgrades or changes. Be clear and brief when communicating, but also (as advised in #1) be friendly. Bear in mind that some people are afraid of technology, and intimidated by the details. You want to build trusting relationships with your users, so that they’ll follow the rules, and tell you if they suspect something is wrong or unusual. If they’re afraid you’ll snap at them or make them feel stupid, they’ll keep quiet until something serious goes wrong.
5. Back Up Everything
If you choose to adopt just one of these best practices, then this should be it, because it could save your career more than once. Always (I repeat, always) have a solid backup policy that has been tested and proven to work, even if your servers are replicated. A replicated error is still an error! Expect the best, but prepare for the worst, and test your restores regularly!
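One way to test a restore is to compare checksums of the original files against the restored copies. The Python sketch below does that for a small zip archive created on the spot; the helper names (`sha256_of`, `find_restore_mismatches`) and the sample file are made up for illustration, and your real backup tool will have its own restore procedure.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_restore_mismatches(original_dir, restored_dir):
    """List relative paths that are missing or different after a restore."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(source) != sha256_of(restored):
            problems.append(str(relative))
    return problems


# Demo in a throwaway directory: back up, restore, then verify.
with tempfile.TemporaryDirectory() as workspace:
    data = Path(workspace) / "data"
    data.mkdir()
    (data / "app.conf").write_text("port=8080\n")  # hypothetical file

    archive = shutil.make_archive(str(Path(workspace) / "backup"), "zip", root_dir=data)
    restored = Path(workspace) / "restored"
    shutil.unpack_archive(archive, restored)

    print("mismatches:", find_restore_mismatches(data, restored))
```

An empty list means every file came back intact. Anything else is a restore problem you want to find now, not during an outage.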
6. Check Your Log Files
Regularly check your log files for any errors and warnings, so they can alert you to problems before they become a threat to your servers and everything they support. And don’t worry that going through all of your log files will take forever. There are great tools out there to help make the process much simpler and faster.
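Even without a dedicated tool, a few lines of Python can do a first pass over a log file. This sketch assumes each line contains a severity keyword such as `WARNING` or `ERROR`; the log format and the sample lines are invented for the example, so adjust the pattern to whatever your systems actually write.

```python
import re
from collections import Counter

# Assumed severity keywords; real log formats vary.
LEVEL_PATTERN = re.compile(r"\b(WARNING|ERROR|CRITICAL)\b")


def summarize_log(lines):
    """Count flagged severities and collect the lines that contain them."""
    counts = Counter()
    flagged = []
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
            flagged.append(line.rstrip("\n"))
    return counts, flagged


# Made-up sample lines; in practice, pass an open file object instead.
sample_log = [
    "10:00:01 INFO service started",
    "10:00:05 WARNING disk usage at 85%",
    "10:00:09 ERROR could not connect to database",
    "10:00:12 INFO retrying connection",
    "10:00:15 ERROR could not connect to database",
]

counts, flagged = summarize_log(sample_log)
for level, total in counts.most_common():
    print(f"{level}: {total}")
for line in flagged:
    print("  ", line)
```

Run something like this on a schedule and you get a short summary to skim instead of thousands of lines to scroll through.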
7. Implement Strong Security
We can never talk enough about the importance of security! Your security efforts should align with the data that you need to protect, and be supported by the least privilege principle, a role-based security system, monitoring of critical services, and regular vulnerability and penetration testing. Also, watch for any signs of a break-in, and be prepared with an emergency plan (as noted in #3), so that you know when and to whom you need to report any signs of system compromise.
8. Document Your Work
I’m not saying that you need to write a novel here. But complete documentation is vital to good systems operations and administration. Also, don’t hesitate to add comments to your scripts. More is better than less, since you might forget what your scripts are for if you don’t run them for a while, and the commands you use might not be obvious or self-evident to others. Always think, “if someone had to take over my work tomorrow, would they have enough information to be successful?” If the answer is yes, then you’re good to go. If the answer is no, then you have some writing to do!
9. Automate Routine & Complex Tasks
Time is money, and your time is more valuable than that of many others in your workplace (no offense to them, but it’s just the truth!). If there’s a task that you repeat often in your day-to-day routine, then you need automation to come to the rescue. Also consider automating complex tasks (even if you don’t do them often).
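As a small taste of what automation can look like, here is a Python sketch for one routine chore: finding files that have not been modified in a given number of days. The function name, the seven-day cutoff and the demo files are assumptions for illustration. Note that it only reports what it finds, which is a sensible first step before you let any script delete things.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_stale_files(directory, max_age_days, now=None):
    """Return files under `directory` last modified more than
    `max_age_days` days ago. Reports only; nothing is deleted."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        path
        for path in Path(directory).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


# Demo in a throwaway directory with one artificially aged file.
with tempfile.TemporaryDirectory() as scratch:
    old_file = Path(scratch) / "old.log"
    new_file = Path(scratch) / "new.log"
    old_file.write_text("old\n")
    new_file.write_text("new\n")

    ten_days_ago = time.time() - 10 * SECONDS_PER_DAY
    os.utime(old_file, (ten_days_ago, ten_days_ago))

    for path in find_stale_files(scratch, max_age_days=7):
        print("stale:", path.name)
```

Hook a script like this into your scheduler of choice, and a chore you used to do by hand becomes a report that arrives on its own.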
10. Beware the Late Friday Afternoon Task
It’s a familiar scenario: it’s Friday afternoon, and you’re 10 minutes away from leaving the office to start a well-deserved weekend. Out of nowhere, a colleague asks you to “look into a little problem.” Diving in right then is almost ALWAYS a mistake! It’s not that you shouldn’t be helpful. It’s that in the system administrator world, many so-called “little problems” turn into really (really) big problems. And that means instead of leaving the office in 10 minutes, you might be stuck there for the next 3 hours…or more. You’re much better off scheduling the work for the following week, when you can give it the time and attention it deserves during normal working hours.
Bonus Best Practice: Have an Emergency Exit
Here’s a bonus best practice that many experienced system administrators swear by: always have an emergency exit in case things are not going the way you thought they would (which happens more often than not!). Borrow a strategy from the Hansel and Gretel playbook by leaving a trail of breadcrumbs, so that you can get back to your starting point. And be sure to apply all of your changes on a test system before doing so on your production systems, confirming that everything is, as Data would say, “fully functional” before going live.
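In script form, a breadcrumb can be as simple as a backup copy taken before every change. The Python sketch below copies a config file, applies a change, checks the result and restores the copy if anything goes wrong. The function names, the `.bak` suffix and the `port=` config format are all hypothetical, chosen only to keep the example short.

```python
import shutil
import tempfile
from pathlib import Path


def apply_with_rollback(config_path, change, validate):
    """Apply `change` to a file, keeping a backup as the way back.

    Returns True if the change was kept, False if it was rolled back.
    """
    config_path = Path(config_path)
    backup_path = config_path.with_name(config_path.name + ".bak")
    shutil.copy2(config_path, backup_path)  # the breadcrumb
    try:
        change(config_path)
        if not validate(config_path):
            raise ValueError("validation failed")
    except Exception:
        shutil.copy2(backup_path, config_path)  # back to the starting point
        return False
    return True


def set_bad_port(path):
    """A deliberately broken change, to show the rollback."""
    path.write_text("port=not-a-number\n")


def port_is_numeric(path):
    """Accept only a config of the form port=<digits>."""
    key, _, value = path.read_text().strip().partition("=")
    return key == "port" and value.isdigit()


with tempfile.TemporaryDirectory() as scratch:
    config = Path(scratch) / "service.conf"  # hypothetical config file
    config.write_text("port=8080\n")

    kept = apply_with_rollback(config, set_bad_port, port_is_numeric)
    print("change kept:", kept)
    print("current config:", config.read_text().strip())
```

The same pattern scales up: snapshot before you patch, export before you migrate, and always know exactly how you would undo what you are about to do.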
Your Turn
So, there you go folks! I hope that you find these 10 (plus bonus) best practices helpful in your career. If you’re an experienced system administrator, please comment on the above, and also share your wise tips and insights for the community. We’re all in this together.
As always, please let us know your thoughts by using the comment feature of the blog. You can also visit our forums to get help and submit feature requests.