When is CAS Not CAS?
by Hu Yoshida on Apr 13, 2007
Acronyms are so prolific in the IT industry that they often lead to misunderstanding or misinformation. I recently talked to an analyst who categorized our Hitachi Content Archive Platform (HCAP) with CAS, which is another vendor’s product. If an industry analyst could be confused, then I thought I should post a blog to clarify the difference between a content archive platform and content addressable storage. The difference is in the names behind the acronyms.
When a data object is ingested into Content Addressable Storage, the object is put through an algorithm which calculates a code based on the bit pattern of the data. This code is known as a hash, and it is used to address the data object. The hash is not generated from the meaning of the content; it is calculated from the bit pattern of the data content, and that is why this is called content addressable storage. The hash is not aware of the content. The producer of the data object must keep track of which hash goes with which content.
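To make the idea concrete, here is a minimal sketch of content addressing in Python. It is only an illustration and not how any vendor's product is implemented: the class name `ContentAddressableStore`, the in-memory dictionary, and the choice of SHA-256 as the hash algorithm are all my assumptions.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory store (hypothetical): the address is a hash of the bytes."""

    def __init__(self):
        self._objects = {}  # hash address -> raw bytes

    def put(self, data: bytes) -> str:
        # The address is computed purely from the bit pattern of the data.
        # SHA-256 is an assumption; the post does not name an algorithm.
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        # Retrieval needs the exact address: no ticket, no coat.
        return self._objects[address]


store = ContentAddressableStore()
ticket = store.put(b"Q1 2007 invoice for ACME")
same_ticket = store.put(b"Q1 2007 invoice for ACME")    # identical bits
other_ticket = store.put(b"Q1 2007 invoice for ACME.")  # one byte added
```

Identical bits always produce the same address, and a one-byte change produces a different one. Nothing in the address says what the object is about, so the caller has to keep its own record of which address belongs to which document.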
I have heard some compare this to a hat check at a restaurant where we check our coats before being seated. The hat check attendant takes our coat, hangs it on a rack and returns to us a ticket which we must use to retrieve our coat later. No ticket, no coat. The ticket only tells us that a coat was hung on a certain rack. It does not tell us who the coat belonged to or what the coat looked like. If I happen to check in a lot of coats for my friends, I am responsible for keeping the tickets and remembering which friend wore which coat. If I lost the tickets or mixed up the tickets, I would have a tough time finding the right coats.
HCAP, on the other hand, works like your public library. When a book is ingested into a library, the librarian creates an entry in a card catalogue based upon metadata such as title, author, subject, and a short description of the book’s content. Now anyone with the proper credentials, a library card, can retrieve a book by looking it up in the card catalogue. One does not need to know the title of a specific book; one can search for a book by title, subject, or synopsis, since the card catalogue has awareness of the book content.
This approach is based on the ISO OAIS (Open Archival Information System) standard. There are many steps in the process of archiving. HCAP does not create the object. A producer application, like KVS or IXOS, encapsulates the raw data into an object that contains all the information needed to interpret the data object and hands it over to HCAP during the ingestion process. HCAP will create a fingerprint, or hash, of the object for immutability purposes, to verify that the object has not been changed during its storage and at retrieval. The metadata, along with the policies needed to preserve the archive object, is stored in a database, and the object is stored in a file. HCAP can ingest objects from multiple heterogeneous producers. After ingestion, HCAP manages the long-term preservation of the archive object. It also provides access to the objects by authorized users who can search by file attributes, metadata, or a bit map search. You can’t do this with a CAS system that simply stores objects.
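The contrast can be sketched in a few lines of Python. This is a toy model under my own assumptions, not HCAP's actual design or API: the names `MetadataArchive`, `ArchiveRecord`, `ingest`, `search`, and `retrieve` are hypothetical, a dictionary stands in for the database, and SHA-256 stands in for whatever fingerprint algorithm is really used. Policies and access control are left out.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ArchiveRecord:
    data: bytes        # the archived object
    metadata: dict     # catalogue entry supplied by the producer
    fingerprint: str   # hash taken at ingestion, used only for integrity


class MetadataArchive:
    """Toy archive (hypothetical): objects are found by metadata, not by hash."""

    def __init__(self):
        self._records = {}  # object id -> ArchiveRecord
        self._next_id = 1

    def ingest(self, data: bytes, metadata: dict) -> int:
        object_id = self._next_id
        self._next_id += 1
        fingerprint = hashlib.sha256(data).hexdigest()  # assumed algorithm
        self._records[object_id] = ArchiveRecord(data, dict(metadata), fingerprint)
        return object_id

    def search(self, **criteria) -> list:
        # Return ids of all objects whose metadata matches every criterion.
        return [
            object_id
            for object_id, record in self._records.items()
            if all(record.metadata.get(k) == v for k, v in criteria.items())
        ]

    def retrieve(self, object_id: int) -> bytes:
        record = self._records[object_id]
        # Recompute the fingerprint to verify the object is unchanged.
        if hashlib.sha256(record.data).hexdigest() != record.fingerprint:
            raise ValueError("object changed since ingestion")
        return record.data


archive = MetadataArchive()
first = archive.ingest(b"lease agreement", {"author": "Lee", "subject": "contracts"})
second = archive.ingest(b"supply agreement", {"author": "Kim", "subject": "contracts"})
by_subject = archive.search(subject="contracts")
by_author = archive.search(author="Kim")
```

Here the hash never leaves the archive. A user who has never seen either object can still find both by subject, which is the card catalogue at work, while the fingerprint is checked quietly at retrieval.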
When is CAS not CAS? When it is a content archiving system, and not content addressable storage.