Object Stores: Addressable Capacity vs. Object Count – Part 1
by Hu Yoshida on Jul 12, 2013
In our last post together, Bob Primmer talked about the concept of object stores and how they enable us to store data without the need to understand the underlying infrastructure. When asked if object stores are a replacement for file systems, Bob said that they were an augmentation of file systems.
A common misperception is that object stores are a replacement for file systems; instead, they are an augmentation. The file system is tightly coupled to the operating system and provides a well-established mechanism for organizing files within a hierarchy of directories. By contrast, an object store focuses on changing the presentation layer to the storage consumer (typically an application) through a simplified interface (REST) while achieving enormous scale by aggregating many file systems into a single, higher-order grouping.
Recently some scale-out NAS systems have been marketed as scaling to petabytes. The basic scale-out NAS architecture consists of NAS nodes with internal storage that are clustered together through a private back-end LAN. One scale-out storage vendor claims to scale to 20PB with low-cost internal storage. Below is Bob’s response when I asked him how to measure scale when it comes to object stores:
“In the 10 years that I’ve been working on object stores I’ve frequently been asked the question: how big can this system scale? This is a natural question, but the metric most often used to measure such scale betrays a misunderstanding of the fundamental difference between an object store and traditional storage. With object stores, the limitation to scale is unlikely to be the amount of addressable storage, but rather the number of individual objects that the system can store.”
In this post I’ll focus on the first of these two measures, addressable capacity. I’ll follow this up with an article on the effect of object count limitations.
The most common measure given for a storage system’s scale is how much capacity it can address. For example, a single instance of the Hitachi Content Platform (HCP) can address 80PB of storage capacity with 80 nodes (presently, 80 nodes is the maximum cluster size for HCP). This means that a single node can address a petabyte of data, which reduces the share of node cost (including power, cooling and maintenance) in total system cost. One of the reasons that people say object stores are well suited for big data repositories is the substantial reduction in the number of servers required to address the data.
In the last year I’ve often been asked to compare HCP (an object store) to a scale-out NAS. Such a comparison doesn’t make sense to me since the primary design center for scale-out NAS is compute-intensive jobs, not storing large volumes of data. However, I’ve heard this comparison frequently enough that it’s clear to me that people conflate object stores and scale-out NAS, so I’ll address that here. While I don’t want to take up accounting and dive into the myriad factors involved in an exhaustive cost analysis, let me try to illustrate just how great a difference there is between the two technologies for the basic use case of a large data repository.
The table below compares the number of nodes required for HCP to address 1 petabyte and 80 petabytes against a prototypical scale-out NAS system. Present-day commercial scale-out NAS systems typically address ~36TB per node, with some products offering a “dense configuration” where a single node can address ~108TB. In the table below we plug in a cost of $3,200 for a node without disks to get an aggregate node cost for an 80PB repository. The final column shows the multiple to be applied to power and cooling costs for each scale-out NAS configuration compared to HCP.
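The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration derived from the figures quoted in the text (1PB per HCP node, ~36TB and ~108TB per scale-out NAS node, $3,200 per diskless node). It assumes decimal units (1PB = 1,000TB), rounds node counts up, ignores any minimum cluster size, and takes the power and cooling multiple to be the simple ratio of node counts; the function and variable names are hypothetical.

```python
import math

TB_PER_PB = 1000        # assumption: decimal units, 1 PB = 1,000 TB
NODE_COST_USD = 3200    # from the text: cost of a node without disks

# Addressable capacity per node, in TB, as quoted in the text.
TB_PER_NODE = {
    "HCP": 1000,                     # 80 PB across 80 nodes
    "scale-out NAS": 36,
    "scale-out NAS (dense)": 108,
}


def nodes_required(capacity_pb, tb_per_node):
    """Whole nodes needed to address capacity_pb petabytes (rounded up)."""
    return math.ceil(capacity_pb * TB_PER_PB / tb_per_node)


def compare(capacity_pb):
    """Node count, aggregate node cost, and node-count ratio versus HCP."""
    hcp_nodes = nodes_required(capacity_pb, TB_PER_NODE["HCP"])
    rows = {}
    for name, tb in TB_PER_NODE.items():
        n = nodes_required(capacity_pb, tb)
        rows[name] = {
            "nodes": n,
            "node_cost_usd": n * NODE_COST_USD,
            # assumption: power/cooling scales linearly with node count
            "multiple_vs_hcp": n / hcp_nodes,
        }
    return rows


if __name__ == "__main__":
    for capacity in (1, 80):
        print(f"{capacity} PB repository")
        for name, row in compare(capacity).items():
            print(f"  {name:24s} nodes={row['nodes']:5d} "
                  f"cost=${row['node_cost_usd']:>10,} "
                  f"multiple={row['multiple_vs_hcp']:.1f}x")
```

Under these assumptions, an 80PB repository needs 80 HCP nodes, 2,223 standard scale-out NAS nodes, or 741 dense scale-out NAS nodes, and the aggregate node cost scales with those counts.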
As the table illustrates, the difference in configurations is stark. However, this shouldn’t be too surprising as the design center for scale-out NAS is different from that of an object store. The design of scale-out NAS is optimized for compute-intensive applications, whereas the design for a typical object store is optimized for large data repositories. The problem comes when a vendor tries to apply a single solution to too many problem domains. There are good and valuable use cases for both object stores and scale-out NAS systems; it’s just a question of applying the appropriate technology to the right problem. In the next blog we’ll tackle the issue of why object count is the more important measure for determining the scale of an object store.