The client I was working with had undergone a management shakeup over the previous year. The CIO left, replaced by someone who brought in several new managers. The result was a lot of IT and DevOps staff turnover. Many skilled staff who knew how everything worked at the company left amid the uncertainty. There were not enough senior people left to train all of the new hires. Without direction, new hires didn't always know what was important. A lot of things fell through the cracks, including data recovery.
The Compass from the Ridge and Valley Sculpture at the Arboretum at Penn State
In the AWS environment, the RDS databases were automatically backed up and retained for 1 to 35 days, depending on the method used to create the RDS database. The company's backup procedures called for the RDS snapshots to be copied to S3 buckets, where they were required to be retained for three months. But the task that copied the backups to the S3 buckets had stopped functioning months earlier. After investigating, the team determined that the copy task was tied to an AWS IAM role that had been modified in error, removing the role's ability to write data to the bucket. Configuration management issues like this occur, especially in dynamic environments.
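A silent failure like this is easier to catch when something checks the age of the newest copy instead of assuming the copy task ran. The following is a minimal sketch of that idea, not the client's actual tooling. The function name, the database names, and the one-day threshold are all hypothetical, and in practice the timestamps would come from a listing of the bucket's contents.

```python
from datetime import datetime, timedelta, timezone


def find_stale_backups(last_copied, now, max_age=timedelta(days=1)):
    """Return names of databases whose newest S3 copy is older than max_age.

    last_copied maps a database name to the timestamp of its most recent
    successful copy, or None if no copy was ever recorded.
    """
    stale = []
    for name in sorted(last_copied):
        copied_at = last_copied[name]
        if copied_at is None or now - copied_at > max_age:
            stale.append(name)
    return stale


# Hypothetical example data: one healthy copy, one months old, one missing.
now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
last_copied = {
    "orders-db": now - timedelta(hours=6),
    "billing-db": now - timedelta(days=90),
    "reports-db": None,
}
print(find_stale_backups(last_copied, now))  # ['billing-db', 'reports-db']
```

A check of this kind watches the outcome (a recent copy exists) and not the mechanism, so it would flag the problem regardless of whether the cause was an IAM change, a disabled schedule, or something else.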
For the on-premises environment, the company used Veeam to back up systems to on-site storage devices for short-term storage. The backups were then copied to AWS S3 buckets for longer retention. Unfortunately, a set of the Veeam backup jobs failed each day. Alerts had been emailed to the IT team, but they were ignored. The dedicated backup administrator had left amid the company changes, and a team with many other pressing responsibilities was assigned to oversee the backups. It wasn't a high priority for them.
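When daily failure emails become background noise, one option is to escalate only the jobs that keep failing. The sketch below assumes the job results have already been exported as a simple pass/fail history per job. It does not use any Veeam API, and the job names and the three-run threshold are hypothetical.

```python
def jobs_to_escalate(history, threshold=3):
    """Return jobs whose last `threshold` runs all failed.

    history maps a job name to a list of run results, oldest first,
    where True means the run succeeded and False means it failed.
    """
    flagged = []
    for job in sorted(history):
        recent = history[job][-threshold:]
        # Only flag when there are enough runs and none of them succeeded.
        if len(recent) == threshold and not any(recent):
            flagged.append(job)
    return flagged


# Hypothetical history: one job failing repeatedly, one with a single blip.
history = {
    "file-server": [True, False, False, False],
    "mail-server": [True, True, False, True],
}
print(jobs_to_escalate(history))  # ['file-server']
```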
These issues were easily correctable with a few hours' work but could have been very costly if staff at the company had inadvertently deleted data or the company had been targeted by a ransomware gang. In either case, the company would not have been able to restore critical business data. We discussed the Center for Internet Security Controls, specifically Control 11 - Data Recovery. We also discussed the basics of data backup and recovery to better protect company and customer data.
The Overview for CIS Control 11 is: Establish and maintain data recovery practices sufficient to restore in-scope enterprise assets to a pre-incident and trusted state.
Control 11 includes five safeguards. They are:
11.1 Establish and Maintain a Data Recovery Process
11.2 Perform Automated Backups
11.3 Protect Recovery Data
11.4 Establish and Maintain an Isolated Instance of Recovery Data
11.5 Test Data Recovery
Why is this control critical? Organizations need and use data to make decisions and provide services to customers. If data is not available or loses its integrity, the organization and its customers could be negatively impacted. The CIS Controls document refers to an example of an attacker encrypting a company's data for ransom. In such a case, the company would need backup data from before the point when the attacker encrypted it. Unfortunately, many organizations find their backup data retention is not long enough to protect them from this attack.
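The retention problem comes down to simple date arithmetic: a backup only helps if it was taken before the data was compromised and has not yet aged out. Here is a small sketch of that check, assuming one backup per day. The function names and the dates are made up for illustration.

```python
from datetime import date, timedelta


def retained_backups(today, retention_days):
    """Return the dates of daily backups still held under the retention period."""
    return [today - timedelta(days=d) for d in range(retention_days)]


def newest_clean_backup(backups, compromised_on):
    """Return the most recent backup taken before the compromise, or None."""
    clean = [b for b in backups if b < compromised_on]
    return max(clean) if clean else None


# Hypothetical scenario: the compromise began three weeks before discovery.
today = date(2024, 3, 31)
compromised_on = date(2024, 3, 10)

print(newest_clean_backup(retained_backups(today, 7), compromised_on))   # None
print(newest_clean_backup(retained_backups(today, 90), compromised_on))  # 2024-03-09
```

With seven days of retention, every surviving backup postdates the compromise. With 90 days, a clean copy from the day before is still available.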
Another challenge companies face with data recovery is that they have so much data that they may not have an effective method to restore it in a timely fashion. It might take them weeks to restore the data, which could result in lost business.
In my own work, I commonly see companies with lax practices around their data backup and recovery testing. The CIS Controls document recommends that on a quarterly basis, or when new data sources or technologies are introduced, companies evaluate their backups and attempt to restore data in a test environment. It's necessary to verify that data, systems, and applications can be restored successfully and within a reasonable time period in case of a serious incident.
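A restore test has two questions to answer: did the data come back intact, and did it come back fast enough? The sketch below shows one way to record both, comparing SHA-256 checksums and measuring the elapsed time against a recovery time target. The function names, the sample data, and the four-hour target are assumptions for illustration and are not taken from the CIS document.

```python
import hashlib
from datetime import timedelta


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def evaluate_restore_test(source, restored, elapsed, target):
    """Summarize a restore test: data integrity and time against the target."""
    intact = sha256_of(source) == sha256_of(restored)
    on_time = elapsed <= target
    return {"intact": intact, "on_time": on_time, "passed": intact and on_time}


# Hypothetical test: the data matches, but the restore took too long.
source = b"customer records"
restored = b"customer records"
result = evaluate_restore_test(
    source, restored, elapsed=timedelta(hours=9), target=timedelta(hours=4)
)
print(result)  # {'intact': True, 'on_time': False, 'passed': False}
```

Keeping the two results separate makes it clear whether a failed test points to a corrupted backup or to a restore process that is too slow.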
The news is replete with stories of ransomware gangs compromising companies and holding their data hostage. You can better protect your company from becoming the next victim by implementing the CIS Critical Controls. Review your data backup strategy regularly to confirm it is current. Test your data recovery practices to verify you can restore data in case of accidental deletion or a ransomware attack.
Next month I'll discuss Center for Internet Security Control 12 - Network Infrastructure Management.