Object storage: The blob creeping from niche to mainstream
Can you be scalable AND floatable?
Storage dull? Dry? Uninteresting? Not a bit. Everybody and everything uses data storage. Without it we'd be lost. And thanks in part to the growth of cloud computing and big data, storage has risen up the agenda.
In the big data universe, things are changing. Our methods of naming, storing and retrieving filesystems need to be reinvented to keep pace with the swelling data volumes that will extend from petabytes into zettabytes and ultimately yottabytes. Could object storage be the answer for these new massive data environments?
Object storage: a definition
Object storage is the discipline or practice of labelling units of data as objects rather than files. An object consists of data in the same way that a file does, but it does not reside in a hierarchy of any sort for classification, whether by name, file size or anything else. Instead, data storage objects “float” in a storage memory pool with an individual flat address space.
Object storage sits well with the über-flexible world of cloud. This is because each unit benefits from an extended metadata identifier to allow its retrieval without the user needing to know its real physical location. Suddenly data storage automation sounds a lot easier.
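To make the flat address space concrete, here is a minimal Python sketch. The `FlatObjectStore` class, its method names and the metadata fields are all hypothetical, invented for illustration, and bear no relation to any vendor's API. The point is only that objects are found by ID or by metadata, never by a path or a physical location.

```python
import uuid


class FlatObjectStore:
    """Toy in-memory object store: one flat namespace, no directories."""

    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata)

    def put(self, data: bytes, **metadata) -> str:
        """Store data with its metadata and return a generated object ID."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        """Retrieve by ID; the caller never learns where the bytes live."""
        return self._objects[object_id][0]

    def find(self, **criteria) -> list:
        """Return the IDs of objects whose metadata matches every criterion."""
        return [
            oid
            for oid, (_, meta) in self._objects.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        ]


store = FlatObjectStore()
scan_id = store.put(b"...image bytes...", kind="x-ray", patient="1234")
store.put(b"...video bytes...", kind="video")
matches = store.find(kind="x-ray")  # a list holding only scan_id
```

A real system would spread the objects across many disks and nodes, but the calling code would look much the same.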
According to OpenStack’s official documentation, object storage provides an API-accessible storage platform that can be integrated directly into applications or used for backup, archiving and data retention.
As there isn’t a notion of RAID, volumes, or aggregates, object storage can be treated as a “pooled capacity” so applications and users can consume the desired amount of storage at any one time. This means that, if the system works, the guesswork of capacity planning is eliminated. Volumes no longer need to be tied to a particular server or application. If application ‘A’ unexpectedly grows by 80 per cent, there is no need to reconfigure and reallocate storage volumes as application ‘A’ has access to the pooled storage capacity.
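The pooled-capacity idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `StoragePool` class, the terabyte figures and the application names. It is not a model of any particular product.

```python
class StoragePool:
    """Toy capacity pool: applications draw from one shared total."""

    def __init__(self, capacity_tb: float):
        self.capacity_tb = capacity_tb
        self.used_tb = {}  # application name -> TB consumed

    def free_tb(self) -> float:
        return self.capacity_tb - sum(self.used_tb.values())

    def consume(self, app: str, tb: float) -> None:
        """Let an application take space; no per-application volume exists."""
        if tb > self.free_tb():
            raise RuntimeError("pool exhausted; add capacity")
        self.used_tb[app] = self.used_tb.get(app, 0.0) + tb

    def add_capacity(self, tb: float) -> None:
        """Grow the pool in any unit without touching existing allocations."""
        self.capacity_tb += tb


pool = StoragePool(100.0)
pool.consume("A", 10.0)
pool.consume("B", 30.0)
pool.consume("A", 8.0)  # application A grows by 80 per cent
```

Application A's growth needs no reconfiguration because nothing was ever carved out for it; it simply takes more of the shared pool.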
Sean Derrington, of cloud storage provider Exablox, adds: “Perhaps more importantly, storage capacity can be increased in any ‘unit’ desired. Since there are no volumes, capacity can be non-disruptively added to the existing pool in near real-time - eliminating the need to purchase and plan nine or 12 months ahead. When storage is added, the file system seen by applications and users doesn’t change. The only thing noticeable is the storage they have access to has increased.”
In terms of industry standards we have OpenStack Swift. This open source object storage system is described by its development team as a highly available, distributed, “eventually consistent” object/blob store. That distributed part is important; Swift helps replicate the objects across servers and multiple locations to make retrieval as easy as possible.
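What “eventually consistent” means in practice can be shown with a toy model. The Python sketch below is not how Swift itself is implemented; the `Replica` and `ReplicatedStore` classes and the single-replica write path are simplifying assumptions. It shows only the general behaviour: a write is visible on one copy first, and a later replication pass brings the other copies into line.

```python
class Replica:
    """One copy of the data, for example on a separate server or site."""

    def __init__(self, name: str):
        self.name = name
        self.objects = {}  # key -> (version, data)


class ReplicatedStore:
    """Toy store whose writes reach one replica first and the rest later."""

    def __init__(self, replicas):
        self.replicas = replicas
        self._version = 0

    def put(self, key: str, data: bytes) -> None:
        # The write lands on the first replica only; the others catch up later.
        self._version += 1
        self.replicas[0].objects[key] = (self._version, data)

    def replicate(self) -> None:
        # Background pass: copy the newest version of each key to every replica.
        newest = {}
        for replica in self.replicas:
            for key, (version, data) in replica.objects.items():
                if key not in newest or version > newest[key][0]:
                    newest[key] = (version, data)
        for replica in self.replicas:
            replica.objects.update(newest)

    def get(self, key: str, replica_index: int):
        entry = self.replicas[replica_index].objects.get(key)
        return None if entry is None else entry[1]


store = ReplicatedStore([Replica("london"), Replica("paris"), Replica("tokyo")])
store.put("logo.png", b"v1")
before = store.get("logo.png", 2)  # the tokyo copy has not caught up yet
store.replicate()
after = store.get("logo.png", 2)   # now every replica agrees
```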
According to ZFS-storage software supplier Nexenta, the jury is still out on Swift. The firm says that there are well-recognized limitations and some flaws in the Swift design, but still it is gaining increasing popularity. Nexenta asserts that today Swift fills the niche that is not covered by Dropbox et al, so BYOD users will eventually use it. This means Swift could be on track to become the preferred backend, although of course Amazon, Google and Microsoft will compete for that space leveraging their respective proprietary closed-source technologies.
Simon Robinson is research vice president for storage technologies at 451 Research. In his report “Object storage looks like a technology whose time has come”, Robinson explains that object storage, on paper at least, seems like an appealing option. “It's radically simpler than traditional SAN and even NAS, it scales much better from a capacity standpoint, and it's especially well-suited for cost-effectively storing the reams of unstructured data – think files, videos, music and images – that are being created in this 'big data' era.”
Software-defined storage prowess
As positive as this sounds, Robinson’s team say that according to their research, the adoption of object storage remains a “minority sport”. But the analyst points out that growth may yet be spurred by the many cloud service providers who are keenly interested in developing cloud storage services that will help them compete with Amazon Web Services - and object storage represents achievable “software-defined storage prowess” in this regard.
Nexenta also points out that Intel’s continuing work on x86 architecture and instruction sets to accelerate SHA, RAID, CRC and erasure coding is “very timely and promising” just now. “Those are the functions that a storage appliance executes, generating sometimes multiple processor cycles per each stored byte. Local deduplication, for instance, uses cryptographic strong hashing - this may be SHA-256, SHA-512 or SHA-3. In that sense, deduplication definitely requires specific capabilities from the CPUs (or GPUs if available),” according to the company.
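The hashing Nexenta describes is easy to try with Python's standard `hashlib` module. The sketch below is illustrative only: the `fingerprint` helper and the sample payloads are made up, and it says nothing about how any appliance schedules this work on a CPU or GPU.

```python
import hashlib


def fingerprint(block: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of a block using the named hashlib algorithm."""
    return hashlib.new(algorithm, block).hexdigest()


block_a = b"the same payload"
block_b = b"the same payload"
block_c = b"a different payload"

# Identical content always yields an identical fingerprint, which is what
# lets a store recognise a duplicate without comparing the bytes themselves.
is_duplicate = fingerprint(block_a) == fingerprint(block_b)
is_distinct = fingerprint(block_a) != fingerprint(block_c)

# Digest length in hex characters for the three families named above.
lengths = {name: len(fingerprint(block_a, name))
           for name in ("sha256", "sha512", "sha3_256")}
```

Every stored byte passes through the hash function, which is why hardware acceleration of these algorithms matters to storage vendors.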
Who's in then?
So who else is playing the object storage game? Along with Intel, Hitachi Data Systems (HDS), NetApp, IBM and EMC are all interesting players - with all of them looking at ways to integrate object stores into mainstream applications.
As you might also expect of a company its size, HP has an OpenStack-based object storage proposition, which the firm promotes as a way to store and retrieve objects in a “highly redundant cluster” of publicly accessible physical machines hosted in its own HP datacenters. HP plays a hand at accessibility for storage-focused programmers and DevOps professionals who want to remain inside their favourite language environment and not deal with the “guts of a REST API” to achieve their aims. The firm provides a dedicated ‘bindings’ offering for developers to code against HP Cloud Object Storage in this instance.
But should object storage be regarded as some kind of niche storage technology only of interest to massive data environments such as healthcare, media and entertainment and the cloud storage providers themselves? “Object storage is used by over 700 HDS customers worldwide and deployed for many different reasons, for example long-term archives, internet content stores, private cloud repositories, as well as acting as a replacement for traditional backup with its built-in protection services,” says Lynn Collier of HDS.
But how should object storage be accessed - via REST, CDMI, XAM controls or perhaps through a file interface? In the case of the Hitachi Content Platform, Collier explains that the solution is flexible with access via standard file system protocols such as NFS & CIFS as well as REST, WebDAV & SMTP for email to provide open (but secure) access and retrieval to objects. “The adoption of XAM has not yet proved fully successful and the take up from leading ISVs was limited. We are also currently considering CDMI support as an alternative access method,” she says.
Software is eating infrastructure
Is this a new dawn in data storage? Is the failure of filesystems a symptom of specialised storage hardware's inability to scale technically and economically to meet emerging data storage volume? Yes, says Stuart McCaul of Basho, the database specialists. This is because software is eating infrastructure i.e. it is now providing the reliability guarantees of traditional specialised storage hardware on commodity hardware.
“For a time, Distributed File Systems were a reasonable stop-gap for companies struggling to vertically scale their filers. However, businesses also need to keep operations running efficiently while scaling out horizontally. Object storage helps keep operations efficient by simplifying security and eliminating filesystem admin tasks. We believe access to object storage should be customer friendly, which means supporting multiple access methods. Basho's large object storage platform is Riak CS and we've worked to make access to Riak CS compatible with the RESTful S3 API as well as OpenStack Swift,” says McCaul.
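Supporting both access methods largely comes down to addressing the same object in two ways. The Python sketch below builds the two URL shapes: S3's path-style form, which is endpoint/bucket/key, and Swift's form, which is endpoint/v1/account/container/object. The endpoint, account, bucket and object names are hypothetical. Authentication and request signing, which real requests need, are left out entirely, and nothing here is taken from Riak CS documentation.

```python
from urllib.parse import quote

ENDPOINT = "https://storage.example.com"  # hypothetical endpoint


def s3_style_url(endpoint: str, bucket: str, key: str) -> str:
    """Path-style S3 addressing: <endpoint>/<bucket>/<key>."""
    return f"{endpoint}/{quote(bucket)}/{quote(key)}"


def swift_style_url(endpoint: str, account: str, container: str, obj: str) -> str:
    """Swift addressing: <endpoint>/v1/<account>/<container>/<object>."""
    return f"{endpoint}/v1/{quote(account)}/{quote(container)}/{quote(obj)}"


s3_url = s3_style_url(ENDPOINT, "media", "videos/intro clip.mp4")
swift_url = swift_style_url(ENDPOINT, "AUTH_demo", "media", "videos/intro clip.mp4")
```

In both cases the object is fetched with an HTTP GET and stored with an HTTP PUT against that URL. The slash inside the key is only part of the name, not a directory.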
And here’s the interesting part in terms of adoption. Basho’s own Riak CS also scales down and is used by companies like DVAG to offer a private file sync-n-share service to internal IT users. So object storage should perhaps be considered “just one part” of a CIO's storage service catalogue i.e. perhaps a third tier online archive, a fourth tier online backup or as a new, distinct web tier for next generation services.
Another significant benefit of object storage is the ability to perform other functions based on object/hash calculations. Exablox’s Derrington says that since files are managed as objects, it's “relatively easy” to perform functions like inline deduplication and encryption. “The benefits of deduplication from a backup/recovery perspective are well understood: now they can apply to primary storage as well,” he says.
Decomposed deduplicated data
“There are many benefits to managing object storage that are particularly attractive to organizations that don’t have the skill set or are looking to increase the ratio of terabytes to admin for their organisation. With object storage there is no notion of RAID, volumes, or LUNs. Every file written is decomposed into a data block and a hash is calculated on the data block so it's treated as an ‘object’. Consequently, erasure coding or replication is used to provide resiliency in the case of device or drive failures,” he adds.
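The decomposition Derrington describes can be sketched as follows. This is a simplified illustration, not Exablox's implementation: the `DedupStore` class, the fixed four-byte block size and the choice of SHA-256 are all assumptions. Each file is cut into blocks, each block is stored once under its hash, and the file becomes a list of hashes.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the example stays readable


class DedupStore:
    """Toy content-addressed store with inline deduplication."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes, each unique block kept once
        self.files = {}   # file name -> ordered list of digests

    def write(self, name: str, data: bytes) -> None:
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # skip blocks already held
            recipe.append(digest)
        self.files[name] = recipe

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.files[name])


store = DedupStore()
store.write("a.bin", b"AAAABBBBAAAA")
store.write("b.bin", b"BBBBCCCC")
unique_blocks = len(store.blocks)                         # AAAA, BBBB, CCCC
logical_blocks = sum(len(r) for r in store.files.values())
```

Five logical blocks are written but only three are stored. Resiliency through replication or erasure coding would then be applied to those unique blocks, which this sketch does not attempt.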
In terms of access then, how do you get over the historical predicament where organisations had to write custom APIs (often proprietary to a storage vendor) for their applications to access object-based storage? Exablox had customer APIs and the public cloud storage providers (e.g. Amazon S3) offered object storage via RESTful APIs - in both cases object storage was out of reach for the vast majority of applications and users. Fortunately, says Derrington, some object storage vendors are providing more common ways to access their storage like CIFS/SMB, NFS, or iSCSI.
As well aligned and positive as many of these object storage developments sound, we have to remember that not all object storage systems are created equal and that the implementation/deployment approaches taken can vary significantly.
Quantum’s Laurent Fanichet reminds us that like every technology play, it’s more than the technology; it’s a total systems approach that makes it a solution. “Turning over all the responsibility for data ‘keys’ to an external application that has not had years of investment carries risk, as some of the users of legacy object storage CAS (content addressable storage) systems have discovered. In some cases, CAS workflow applications lost their keys and suddenly the customer had a CAS system full of data that they could neither read nor delete because the object store itself had no central intelligence of what it was storing. This vulnerability is an area where we’ve made significant investment,” he says.
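The failure Fanichet describes is easy to reproduce in miniature. In the Python sketch below, both classes are hypothetical and do not represent Quantum's or any CAS vendor's design. One store hands out keys and keeps no catalogue the application can consult; the other can list what it holds.

```python
import uuid


class OpaqueObjectStore:
    """Hands out keys on write and offers no way to list them afterwards."""

    def __init__(self):
        self._data = {}

    def put(self, data: bytes) -> str:
        key = uuid.uuid4().hex
        self._data[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._data[key]

    def delete(self, key: str) -> None:
        del self._data[key]

    def stored_bytes(self) -> int:
        return sum(len(v) for v in self._data.values())


class CataloguedObjectStore(OpaqueObjectStore):
    """Same store, but it can report which keys it is holding."""

    def list_keys(self) -> list:
        return list(self._data)


opaque = OpaqueObjectStore()
app_keys = [opaque.put(b"scan-1"), opaque.put(b"scan-2")]
app_keys.clear()  # the external application loses its key database
stranded = opaque.stored_bytes()  # the data still occupies space

catalogued = CataloguedObjectStore()
catalogued.put(b"scan-1")
recovered = catalogued.list_keys()  # keys can be rebuilt from the store itself
```

Once the application's keys are gone, the opaque store is full of data that its public interface can neither read nor delete.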
Fanichet warns that the desire for information quality is driving data ingest ‘granularity’ - increasing the resolution of the data, which generates massive data growth. He says that these ‘high grain’ data sets are being created by more and more users - and they are being kept indefinitely.
Surely not, not niche
These realities result in high growth of the large-scale archives of ‘big’ data files - videos, images, sensor information - that object storage technology is most suited to support, especially when you factor in the requirement for secure metadata organisation. Can we still call it a niche when these large-scale data files are generated everywhere and are part of our daily lives? Probably not.
Shane Harris, director at CommVault, explains that his firm’s Simpana 10 product exists in this space as a hardware-agnostic software storage solution. Simpana 10 supports file, block and object storage as well as tape, disk and cloud storage tiers.
“Simpana software can function as an object-based storage platform and can send data to external object storage platforms. We support solutions like the recently announced Quantum Lattus-D Object Storage platform and the OpenStack cloud. The real difference is when Simpana 10 with ContentStore is used as an object storage platform, we can intelligently expose data for more use cases through API integration,” he says. Simpana 10 integrates with cloud platforms through an HTTP/REST interface. Because users can access all of their stored information through a single platform, it will be easier for them to repurpose backup and archive data for a range of uses, including test and development, disaster recovery, eDiscovery, mobility and self-service access.
So back to our initial question. Is object storage a cure for failing filesystems in the new world of big data?
Quantum’s Fanichet issues a definite and defiant no. “Like any new architecture, next-gen object storage by itself is not a panacea. The very system that makes object storage so scalable (no central metadata index) also gives it vulnerability. File systems and databases have years of engineering work invested in them to assure data integrity and availability,” he points out.
Will object storage interest developers?
So is this the point at which DBAs and DevOps professionals (or even pure-play programmers) start to find storage interesting at last? HDS's Collier says yes! Her team regards the development capabilities of intelligent object stores as enormous and integration or SMART ingest into object stores will make interesting projects for many DBAs and DevOps professionals.
She is not alone. John Burwell, Basho consulting engineer and Apache CloudStack PMC member, says: “Looking at the prevalence of organisations using S3 to host static web content (e.g. software downloads, images, blogs, single page web applications etc.), I think we can say that DevOps and developers alike are currently replacing traditional database LOB storage with object storage. In addition to the cost savings and reduced operational complexity, the native HTTP interface provided by object storage provides a natural interface to access and integrate data directly into HTML documents.”
Technology platform shifts happen in roughly five-year cycles, so is there enough time for object storage to blossom? The smart money is on a much smaller time window than that; perhaps two to four years is all we have. From inherently software-defined (and hence more controllable) roots, an intelligent deployment of object storage will come into play only when needed.
In the words of Nexenta: “Object storage won’t be a niche technology. In fact, we believe that NAS will gradually relegate itself to specific applications requiring instant consistency and limited scale (see above). In addition to the big data wave, BYOD is evolving to become the primary driving force behind the growth of object storage. It will cross the chasm when we see backup, archive and disaster recovery being re-implemented via ‘objects’, thus boosting the already exponential growth of the ‘cloudified’ petabytes.”
Data storage that is as controllable as a cloud isn’t an option; it will soon be a prerequisite for service-based virtualised computing architectures. Object storage could just be the answer. ®