“Remember that you are a Black Swan.” ― Nassim Nicholas Taleb
I spent much of 1999 preparing for the Y2K bug. When the clock struck midnight, nothing broke. The business felt it received very little return for our efforts.
In 2000 and 2001, I helped Empire Blue Cross with IT disaster planning. The 9/11 attack on the office at WTC 2 tested our plans and, afterward, I found disaster planning budgets suddenly plentiful.
Some might consider these experiences failures. They are wrong. Highly visible threats get people out of their comfort zone and serve to motivate investment in preparation for a "worst-case" scenario.
The COVID19 outbreak has reached near-pandemic levels. Now is not the time for alarm. However, it is the perfect opportunity to take a moment and think about how ready your business is for catastrophe. The US CDC issued Interim Guidance for Businesses and Employers this month. Every business leader should be familiar with its contents. The WHO just told countries to prepare for a pandemic. Markets are reacting.
In this post, I bundle up practical Business Continuity advice for IT teams. Often, BC is considered only as a business-wide problem. In reality, IT has some unique BC considerations. These recommendations are generally useful in preparing for any kind of disaster, from a viral pandemic to hurricanes, earthquakes, civil unrest, regional fires, or a nearby release of hazardous materials.
IT Continuity
Infected workers must stay home, even if they are showing minimal symptoms. Others must be at home to take care of an infected child or parent. Widespread school and daycare closures could keep a large swath of parents tied to the house. Air travel could become more painful than usual. Regional quarantines are being imposed in Italy, China, and South Korea.
The best thing you can do is to enable remote work.
Just like everything else in computers, this feature must be tested. Plan an office-wide remote workday in the coming weeks.
- Ensure each team member's home connection is stable enough for work. Note: in a pandemic situation, home internet services will likely be overwhelmed with whole neighborhoods working (or streaming) from home.
- Confirm videoconferencing and phone bridge licenses are sufficient.
- Validate your VPN/Citrix/remote access servers can support the entire team working at once.
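Before the remote workday, the checklist above can be turned into a quick paper exercise. The following is a minimal sketch, not a vendor integration: the `CapacityCheck` class, the service names, and every number are hypothetical placeholders to replace with figures from your own license portals and remote access servers.

```python
from dataclasses import dataclass


@dataclass
class CapacityCheck:
    name: str      # what is being checked, e.g. "VPN concurrent sessions"
    capacity: int  # seats, licenses, or sessions available today
    demand: int    # people expected to need it at the same time

    @property
    def shortfall(self) -> int:
        """Number of people left without access (0 if capacity is enough)."""
        return max(0, self.demand - self.capacity)


def remote_day_report(checks):
    """Return one readable line per check, flagging any shortfall."""
    lines = []
    for check in checks:
        if check.shortfall:
            lines.append(f"SHORT: {check.name} needs {check.shortfall} more")
        else:
            lines.append(f"OK: {check.name}")
    return lines


if __name__ == "__main__":
    headcount = 40  # hypothetical team size
    checks = [
        CapacityCheck("VPN concurrent sessions", capacity=25, demand=headcount),
        CapacityCheck("Videoconference hosts", capacity=50, demand=headcount),
        CapacityCheck("Phone bridge seats", capacity=40, demand=headcount),
    ]
    for line in remote_day_report(checks):
        print(line)
```

A spreadsheet does the same job. The point is to compare whole-team demand against each limit before the test day, rather than discovering the VPN cap at 9 a.m.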
Knowledge Sharing: A potential pandemic raises the chance critical staff will be out of the office or even completely unavailable. Access to services and systems is paramount: if you don't have one, deploy a shared password manager immediately. Everything in your business with a login should be accessible by at least two individuals. Caution: 1Password is my choice, but be sure that folks are storing business credentials in "shared", not "private" vaults.
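The "at least two individuals" rule is easy to audit once you have an inventory of who can reach what. Here is a rough sketch that assumes you have assembled that inventory by hand as a dictionary. It does not talk to 1Password or any other password manager, and the login names and people are made up.

```python
def single_owner_logins(access, minimum=2):
    """Return logins reachable by fewer than `minimum` distinct people."""
    return sorted(
        login for login, people in access.items() if len(set(people)) < minimum
    )


if __name__ == "__main__":
    # Hypothetical inventory: login name -> people who can retrieve it
    # from a shared vault.
    access = {
        "aws-root": ["alice", "bob"],
        "domain-registrar": ["alice"],
        "payroll": [],
    }
    print(single_owner_logins(access))  # ['domain-registrar', 'payroll']
```

Anything this returns is a login that becomes unreachable if a single person is out sick.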
Documentation and cross-training are also essential. Like the other measures in this post, they will yield compound returns on investment in the coming years, even in the optimistic case that we see no major disasters.
Supply Chain: A direct supply chain from China is vital to manufacturers like Apple. COVID19 has already impacted Apple's financial forecasting, and we might not even see the iPhone 12 this Fall. Say you're a startup SaaS... is your supply chain important?
Take a moment to ponder if you can go 3+ months without ordering new laptops, phones, storage, or networking devices. In addition to manufacturer shortages due to closed factories, package deliveries see significant delays. Good luck on-boarding new hires!
If you have the capital to create a small hardware stockpile, do it. If not, this may be one of the few cases where debt makes sense. Lease a few extra laptops.
Dependencies: Can your team make forward progress without an internet connection? Git is supposed to make this easy, but heavy dependencies on npm or GitHub could stop developers in their tracks. Standard vendoring practices for local development can keep the wheels of progress spinning.
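One way to spot-check offline readiness for a Node project is to confirm that every dependency declared in `package.json` already has a local copy. This is a sketch under some assumptions: the project uses the standard npm layout (`package.json` plus a `node_modules` directory), the function name `missing_local_packages` is my own, and it only looks at directly declared packages, not transitive ones.

```python
import json
from pathlib import Path


def missing_local_packages(project_dir):
    """List declared npm dependencies with no local copy under node_modules.

    Only direct dependencies and devDependencies are checked; transitive
    dependencies are out of scope for this sketch.
    """
    project = Path(project_dir)
    manifest = json.loads((project / "package.json").read_text())
    declared = {
        **manifest.get("dependencies", {}),
        **manifest.get("devDependencies", {}),
    }
    return sorted(
        name
        for name in declared
        # Scoped names like "@scope/pkg" map to nested directories.
        if not (project / "node_modules" / name / "package.json").is_file()
    )
```

Calling `missing_local_packages(".")` from a project root returns the packages a developer could not install without the network. An empty list is a good sign, though not proof that a full build works offline.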
Monitoring and alerting: Significant absenteeism guarantees your on-call rotation will blow up. An alert routing solution like PagerDuty will help tamp down the flames. By investing time now in prioritizing alerts, you can adjust alarms based on team availability.
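To make "adjust alarms based on team availability" concrete, here is a small sketch of one possible policy. It is not PagerDuty's API or configuration format. The thresholds, the three-level priority scheme, and the alert names are all assumptions chosen for illustration.

```python
def max_paging_priority(available, full_rotation):
    """Return the lowest-urgency priority that should still page a human.

    Priority 1 is most urgent. Hypothetical policy: with 75% or more of
    the rotation available, page P1-P3; with 50% or more, P1-P2;
    otherwise only P1.
    """
    if full_rotation <= 0:
        raise ValueError("full_rotation must be positive")
    ratio = available / full_rotation
    if ratio >= 0.75:
        return 3
    if ratio >= 0.5:
        return 2
    return 1


def route(alerts, available, full_rotation):
    """Split (name, priority) alerts into 'page' and 'ticket' lists."""
    cutoff = max_paging_priority(available, full_rotation)
    routed = {"page": [], "ticket": []}
    for name, priority in alerts:
        routed["page" if priority <= cutoff else "ticket"].append(name)
    return routed


if __name__ == "__main__":
    alerts = [("db-down", 1), ("disk-80pct", 3), ("api-latency", 2)]
    print(route(alerts, available=2, full_rotation=6))
```

With only two of six responders available, this policy pages for `db-down` alone and sends the rest to a ticket queue. The prioritization itself is the part worth doing now, while the whole team is around to argue about it.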
Game Day
"Disruption to everyday life might be severe," says Nancy Messonnier, who leads the coronavirus response for the U.S. Centers for Disease Control and Prevention. "We are asking the American public to work with us to prepare for the expectation that this is going to be bad."
In an ideal world, IT teams would carve out time to perform tabletop Disaster Recovery (DR) and Business Continuity (BC) exercises quarterly. Sadly, these drills get overlooked far too often. We may have the utmost confidence in our load-balancers, our server availability, and our geographic redundancy. Engineers gravitate to technical problems – the human element gets pushed aside.
"A learning organization, disaster recovery testing, game days, and chaos engineering tools are all important components of a continuously resilient system." – Adrian Cockcroft, Failure Modes and Continuous Resilience
One software developer mantra is: "if it's not tested, it's not done." Documenting policies and processes is only the beginning. Everyone on the team must be familiar with them through hands-on experience. Google's approach to DiRT (Disaster Recovery Testing) is a spectacular model:
Situational Awareness
You do not want to be stuck in the office during an emergency. If an evacuation is necessary, your team wants to be with their family and community. We're all pretty aware of news as it happens, but you must be prepared to act decisively. Don't hesitate to get your team on the road before traffic makes the trip home impossible.
Generally, ensure you are receiving NOAA Severe Weather Alerts; I use the AccuWeather app for this. Subscribe to your local reverse-911 system. Specific to Coronavirus, checking the Johns Hopkins CSSE map and this update feed may give you an early heads-up for your region.
If disaster strikes, a NOAA Weather Radio and/or device with satellite weather alerts could be your only source of news. Throw one in the storage closet.
Work Environment
Person-to-person Hygiene: Individuals are a critical link in the health safety chain. Put up a poster or send a reminder email about personal health hygiene. Hopefully, basics like hand washing, not touching your face, and coughing into your elbow are already common knowledge. I'm also a big fan of switching to elbow bumps instead of handshakes (aka: "the Ebola handshake"). This could seem socially awkward to some, but doing it as a team can make it fun.
Office Hygiene: Give hand sanitizer and surface wipes a prominent place in your office. Smaller personal/team fridges are better than full-sized ones. Talk through your cleaning contract with both the cleaning team and your staff. Your team needs to know precisely which housekeeping tasks are their responsibility.
Travel and Events: The CDC has issued a Level 3 travel warning for China and South Korea: Avoid all nonessential travel to these regions. Iran, Italy, and Japan are at Level 2, in which older adults and those with chronic illness should stay home. Hong Kong is at a "watch" level. Latin America's first case was just detected.
This list is likely to grow until a pandemic is declared. At that point, containment efforts such as quarantine and travel restrictions will be mostly abandoned.
Conference planners should engage health safety experts for events in the coming months. As of now, the 2020 Olympics are on for Tokyo, but IOC members are raising concerns. There is no reason to cancel events in the US at this time, but every reason to think ahead and plan for the impacts of mandated social distancing or quarantine.
Emergency Supplies: The Red Cross recommends three days of emergency supplies stored at the office for short-term sheltering in place. This includes: nine meals' worth of non-perishable food and three gallons of water per person; fans or indoor-safe propane heaters, as appropriate; toilet paper, soap, and paper towels; flashlights and batteries.
Medical: Ensure your office space has a well-stocked First Aid kit. You would typically go to the hospital for an in-office paper-cutter accident, but with a potentially overloaded health system, the wait could be many hours. You need to be able to treat "minor" injuries without visiting the ER. If your team doesn't have emergency medical experience, bring the Red Cross in for First Aid training. Every office should stock an Automated External Defibrillator.
Personal Protective Equipment (PPE): Face masks and latex gloves are the norm in healthcare, and visible all over Asia right now. I doubt that IT teams should stock them in the office. Face masks are most effective at keeping someone who is already sick from spreading a virus. They are much less useful for keeping a healthy wearer from getting sick. If someone has symptoms, send them home immediately. Good interpersonal and office hygiene should be a higher priority.
Environmental Protection: In 2016, a toxic gas cloud threatened almost 20 million people near Los Angeles. If your office is within ten miles of a highway, railway, or factory, you should have what it takes to protect your team from this danger. Identify an interior conference or basement room, keep some plastic wrap and duct tape around, and purchase a carbon monoxide detector.
Power: Every desk should have an Uninterruptible Power Supply (UPS). Your equipment gets protected from surges and brownouts during "normal" times. During an emergency, you can recharge your phone several times with the thing.
If your building is lucky enough to have a generator, awesome. UPSs are still required to protect electronics from the power spike your generator may produce when it starts up. Check with your facilities expert, so your team knows how long you can expect your fuel supply to last. Also, understand your fuel delivery vendor's SLA: they can only deliver so much fuel each day in a crisis, and you pay for your place on the list.
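How many phone recharges a desk UPS really delivers depends on its battery. As a back-of-the-envelope sketch, the calculation below assumes a small UPS with a 12 V, 7 Ah battery, a phone with a 12 Wh battery, and a 60% fudge factor for inverter losses, charger losses, and usable depth of discharge. All three figures are my assumptions, not specifications for any particular product, so check the labels on your own gear.

```python
def phone_charges(ups_volts, ups_amp_hours, phone_watt_hours, efficiency=0.6):
    """Estimate full phone charges available from a UPS battery.

    `efficiency` is an assumed fudge factor covering inverter losses,
    charger losses, and usable depth of discharge.
    """
    usable_wh = ups_volts * ups_amp_hours * efficiency
    return int(usable_wh // phone_watt_hours)


if __name__ == "__main__":
    # Assumed figures: 12 V x 7 Ah UPS battery, 12 Wh phone battery.
    print(phone_charges(12, 7, 12))  # 4
```

Under these assumptions a small desk UPS is good for a handful of recharges, which is still meaningful when the grid is down. Larger units scale up accordingly.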
Personal
Planning: Remind staff to make or update their emergency plan. The Red Cross recommends a two-week supply of food, water (1 gal. per person per day), and essentials at home. There is a lot of stress and emotion involved in considering disastrous situations. Some people actively "prep," while others have more pressing matters or concerns. Decisions in this space are an individual choice.
Communications: Getting in touch with family and friends is vital in an emergency. Cell phone network capacity could quickly become exhausted. Landlines are more likely to work in an emergency, but do you have one at home? I have a friend in Seattle who's worried about staying connected in case of a serious earthquake. He purchased inReach Satellite Communicators for each of his family members.
Prescriptions: A pharmacy is the last place you want to visit during a pandemic. Imagine being stuck for hours in a long line of sick people. Additionally, we may see pharma supply chain problems due to factory closures in China. Encourage staff to fill and keep on-hand an extra month's worth of prescriptions.
Emergency Car Supplies: Stock your car in case you get stuck in it on a snowy night. It happens regularly here in Colorado. On the other side of the country, hurricane evacuations have left people stuck on the highway overnight. Toss a blanket, small shovel, First Aid kit, fire extinguisher, snacks, and a gallon of water in the trunk.
Looking Forward
There is so much we don't know about COVID19: transmission details, percentage of cases requiring hospitalization, morbidity... if containment is even possible. With luck, COVID19 will fizzle out much like MERS. In that happy case, your efforts on these IT Business Continuity controls will not go to waste.
Just like my Y2K preparations two decades ago, investing in this work now will pay off for the next decade. Rather than worrying about COVID19, keep your focus on how powerful it is for your business to be ready for whatever is around the corner.
"Antifragility is beyond resilience or robustness. The resilient resists shocks and stays the same; the antifragile gets better." – Nassim Nicholas Taleb, Antifragile: Things That Gain from Disorder
I'm focused where software, infrastructure, and data meet security, operations, and performance. Follow or DM me on Twitter at @nedmcclain.