04 Nov

Object storage in the cloud: Is backup needed?

    Data stored in a cloud block-storage service can be lost forever if it is not properly backed up. This article explains how object storage works very differently from block storage and how it offers better built-in protections.

    What is Object Storage?

    Each major cloud vendor offers an object storage service: Amazon’s Simple Storage Service (S3), Azure’s Blob Store, and Google’s Cloud Storage.

    Think of object storage systems like a file system with no hierarchical structure of directories and subdirectories. Where a file system uses a combination of a directory structure and file name to identify and locate a file, every object stored in an object storage system gets a unique identifier (UID) based on its content.

    The UID is then used both to identify an object and to retrieve it. The UID is created by running the content of the file through a cryptographic hash algorithm, such as SHA-1. (To get an idea of how SHA-1 works, you can generate your own SHA-1 hash from any amount of text with a tool such as sha1sum.) Any item, such as a file, a block, a group of files or blocks, or a portion of a block or file, can be stored as an object.
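
    As a minimal sketch of content-based identifiers, the Python snippet below hashes an object's bytes with SHA-1 using the standard hashlib module. The function name object_id and the sample strings are illustrative assumptions, not any vendor's actual scheme.

```python
import hashlib


def object_id(content: bytes) -> str:
    """Return a content-derived identifier: the SHA-1 hex digest of the bytes."""
    return hashlib.sha1(content).hexdigest()


# Identical content always yields the same identifier...
first = object_id(b"hello world")
same = object_id(b"hello world")

# ...while even a one-character change yields a different one.
changed = object_id(b"hello world!")

print(first == same)     # identical content, identical ID
print(first == changed)  # different content, different ID
```

    Because the identifier depends only on the content, it can serve as both the name of the object and a checksum of it.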

    One huge difference between object and block storage is that every object stored in object storage is automatically replicated to at least three availability zones. This means that a natural or other disaster could take out two availability zones, and you would still have any data stored within the object storage system. Block storage, by contrast, is typically replicated only within a single availability zone, so a single large outage can destroy the data.

    How the replication works is also very different. Object replication is done at the object level, whereas cloud block storage and typical RAID systems replicate at the block level.

    Objects are also never modified. If an object needs to be changed, the new content is simply stored as a new object. If versioning is enabled, the previous version of the object is saved for historical purposes. If not, the previous version is simply deleted. This is very different from block storage, where files or blocks are edited in place, and the previous versions are never saved unless you use some kind of additional protection system.
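
    The behavior described above can be sketched with a toy in-memory bucket. The class VersionedBucket and its methods are hypothetical names made up for illustration; real services expose versioning through their own APIs.

```python
from typing import Dict, List


class VersionedBucket:
    """Toy in-memory bucket: stored objects are never edited, only superseded."""

    def __init__(self, versioning: bool = True) -> None:
        self.versioning = versioning
        self._history: Dict[str, List[bytes]] = {}

    def put(self, key: str, content: bytes) -> int:
        """Store content as a new object and return its version index."""
        versions = self._history.setdefault(key, [])
        if not self.versioning:
            versions.clear()  # without versioning, the previous object is deleted
        versions.append(bytes(content))  # bytes objects are immutable
        return len(versions) - 1

    def get(self, key: str, version: int = -1) -> bytes:
        """Return the requested version (default: the latest)."""
        return self._history[key][version]

    def version_count(self, key: str) -> int:
        """Return how many versions of the key are currently retained."""
        return len(self._history.get(key, []))


bucket = VersionedBucket(versioning=True)
bucket.put("report.txt", b"draft")
bucket.put("report.txt", b"final")

latest = bucket.get("report.txt")               # the newest object
original = bucket.get("report.txt", version=0)  # the earlier version is still there
```

    With versioning=False, the same two put calls would leave only the second object behind.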

    Object-storage services can also be set up to withstand even a regional disaster that would take out all availability zones in a region.

    Amazon does this using cross-region replication that must be configured by the customer. Microsoft’s geo-redundant storage includes replication across regions, and Google offers dual-region and multi-region storage that does the same thing. Combined with the versioning features built into all object-storage systems, this makes data stored in such systems much more resilient than data stored in the block-storage systems offered by any of these vendors.

    Whereas block volumes and filesystems were designed for performance, object storage was designed with data integrity as its primary goal. For example, the unique identifier can be used at any time to ensure that a given copy of an object has not been corrupted. All the system has to do is rerun the object through the process that created the unique identifier. If the UID is still the same, the contents of the object have not changed. If the contents of the object have changed due to bit rot or some other reason, the system will automatically detect that because the UID will change. It can then automatically repair the object by retrieving a good copy from another region. No block device or file system of which I’m aware has this level of data integrity built into it.
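
    The check-and-repair idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: replicas are modeled as a plain dictionary, SHA-1 stands in for whatever hash a given service uses, and the names uid, verify_and_repair, and the zone labels are hypothetical.

```python
import hashlib
from typing import Dict, List


def uid(content: bytes) -> str:
    """Recompute the content-based identifier (SHA-1 is assumed here)."""
    return hashlib.sha1(content).hexdigest()


def verify_and_repair(expected_uid: str, replicas: Dict[str, bytes]) -> List[str]:
    """Rehash every replica, overwrite corrupted ones with an intact copy,
    and return the names of the replicas that were repaired."""
    good = next((c for c in replicas.values() if uid(c) == expected_uid), None)
    if good is None:
        raise ValueError("no intact replica available")

    repaired = []
    for name, content in list(replicas.items()):
        if uid(content) != expected_uid:
            replicas[name] = good  # replace the damaged copy
            repaired.append(name)
    return repaired


expected = uid(b"payload")
replicas = {
    "zone-a": b"payload",
    "zone-b": b"payl0ad",  # simulated bit rot
    "zone-c": b"payload",
}
repaired = verify_and_repair(expected, replicas)
```

    After the call, repaired lists only the damaged replica, and all three copies hash to the expected UID again.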

    Object storage has taken a lot of heat due to what is known as the open-bucket problem, where important and sensitive data is stored in a bucket whose permissions were not properly managed. (Think of a bucket as a very large container containing related objects.)

    Large customer databases have been exposed via this problem, largely because customers simply did not understand how object storage works. Creating an open bucket is sometimes deliberate, because it lets you easily distribute files to many people by simply giving them a direct link to an object. But that also means it’s relatively easy to create an open bucket by accident and give away your trade secrets to the world.

    Follow best practices

    A simple Google search for best practices for your favorite object storage vendor will yield the resources you need to do the right thing. For example, Amazon has a best-practices page that gives a number of common-sense suggestions, like disabling public-access and rewrite permissions for everyone. Microsoft also has a best-practices page, and so does Google. You should be able to find a number of third-party articles to guide you along the way as well.

    One common suggestion is to identify only what access is required for a given application and to grant that level of access and no more. It may be a lot easier simply to grant every application full access to your object-storage buckets, but it is a security disaster waiting to happen. Also consider using role-based administration, which can be used to easily grant and revoke access as needed.
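
    As one hedged illustration of granting only the access that is required, the Python sketch below builds an AWS-style IAM policy document that allows a single read action on a single prefix. The bucket name example-reports, the prefix monthly/, and the helper read_only_policy are made up for this example; other vendors use different policy formats.

```python
import json
from typing import Dict


def read_only_policy(bucket: str, prefix: str) -> Dict:
    """Build a policy that allows reading objects under one prefix and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],  # read only: no put, delete, or list
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


# Hypothetical bucket and prefix, used only for illustration.
policy = read_only_policy("example-reports", "monthly/")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

    An application that only reads monthly reports would get this policy rather than full access to the bucket, which limits the damage if its credentials leak.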

    Does object storage need to be backed up?

    Deciding whether to back up object storage is not as simple as deciding whether to back up a block volume. Unlike block volumes, object storage automatically includes many levels of protection against various things that might do your company harm, including optional write-once-read-many (WORM) protection. If you follow all of the best practices available to you, including cross-region replication, you could easily argue that there is no scenario under which all of your data could disappear and cause you to reach for your backup. A data-protection expert could help create a sound strategy.

    Having said that, it is difficult to argue with those who say that object-storage services are still written by humans who can make mistakes. They would say that if data residing in object storage is mission critical, you should back it up.

    It is important to mention that there are a number of ways to do that. For example, you could use a completely different level of service for your backup (AWS Glacier Deep Archive, Azure Archive Storage, or Google Coldline) to hold a “just in case” copy of your object data. If your data is that important, then you should consider backing it up in that way, as well as ensuring the copy is in a different account and region, just like with block storage.

    Be aware of what you’re using

    Block volumes need to be backed up, so make sure you are backing them up. Block-storage snapshots should also be replicated to another region and account. Object storage offers a much higher level of resiliency, since it is automatically replicated to multiple availability zones. But nothing is infallible, so weigh that information and make your own decision.
