open -a TextEdit 000000cc-00000080–000b
by Michael Hay on Apr 4, 2011
I cannot resist starting out the post this way; the title of this post is how one would open a file stored in a Remote BLOB Store (RBS), a part of MS-SQL 2008 and later, stored on a Mac — that is if we assume that the file is a *.txt file. Said BLOB/file could be retrieved from an “easily comprehensible” directory like ‘/blobstore/9de21b5-e4c3-4ef1-ad71-9b123021f2ba/f6c43087-816f-4b08-a68d-a77a04452c81.’ (Note this bit is a joke and was inspired from a blog post on Todd Klindt’s SharePoint Admin Blog.)
To be very clear, BLOBs in RBS are not managed by humans and are assumed to be managed 100% by MS-SQL Server, that is they are machine managed data. Further the stated design target of RBS is to improve the performance for both the DBMS layer and the storage layer with respect to BLOBs that were formerly stored directly in the DBMS. So this reinforces the notion that data stored in a BLOBstore managed by MS-SQL Server is not meant for human consumption.
When Microsoft (MS) first debuted RBS, Hitachi Data Systems made a commitment to write an RBS provider (stay tuned on this front). We totally understood the intention of clearing BLOBs out of the DBMS to increase DB performance and scale while easing backups. From discussions with MS at the time we also understood that the primary motivation for building the innovative RBS layer was to ensure that customers who had stored piles of BLOBs in MS-SQL had a path to increase the scale of their applications. More specifically, during the informal discussion geospatial applications were cited, in which many satellite images were persisted in MS-SQL as BLOBs with machine-generated metadata (they were from a satellite after all). Some way was needed to clear these beasties out of the DBMS, allowing MS’s customers to do more on each MS-SQL server, so along came RBS. So with all of that in mind we signed up, and then…
Microsoft’s SharePoint team produced a product that has been the fastest to achieve more than $1 billion in annual revenues (actually Microsoft stated that SharePoint made $1.3 billion as of 2009). We can link SharePoint’s rapid success to its ease of use resulting in an almost democratic IT experience, creating headaches for IT administrators who exhibit a love-hate relationship with SharePoint.
They love the fact that it is so easy to use, modify and set up, but they hate the fact that it creates problems like storage consumption spiraling out of control and “fuzzy governance.” As a result, the SharePoint development team realized they had exactly the same problem as the custom application developers: DBMS instances with BLOBs growing out of control. So they decided to turn off their old implementation, External BLOB Storage (EBS), and move to RBS for SharePoint 2010.
Almost immediately after the SharePoint team adopted RBS, a pack of companies started updating their SharePoint archiving products to include RBS. Yet at Hitachi we realized that the competition was playing our game, so we could take our time. Our game was in fact the product we call Hitachi Data Discovery for Microsoft SharePoint (HDD-MS), and because we already had an implementation that was independent of EBS and RBS and used documented SharePoint APIs, an immediate move was not warranted. (Note: I have blogged a lot about HDD-MS. Here is a good gateway link to many such posts: Managing SharePoint Growth.) Let me provide you with a bit of the analysis that we did when thinking about when and if we should transition HDD-MS to RBS, or accelerate our RBS development.

All of these attributes were important to Hitachi, but because I started off the discussion illustrating that RBS is machine optimized, I want to focus on that. We recognized long ago that to unlock the true potential of data and generate new information (see my previous post on Big Data) we must start storing data independent of the application in an open and readable format. That is because we cannot predict the future, in which we might be seeking the union of disparate data types to find hidden gems in the noise, remastering the content in a new format, etc. It also means that we need to think not only about the data but also about the data’s metadata. HDD-MS extracts both the object and its metadata from SharePoint, persisting both on the storage media in a human (and machine) readable format. RBS, by contrast, is 100% machine optimized, using hints from MS-SQL Server to construct the directory and object naming schemas so that MS-SQL performance and scale are optimized.
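To make the contrast concrete, here is a minimal Python sketch. It is not how RBS or HDD-MS is actually implemented: the function names, the directory layout, and the JSON sidecar format are all assumptions made up for illustration. The first function names a BLOB with opaque GUIDs, the way a machine-managed store might, while the second keeps the object under its original name with its metadata written next to it in a readable file.

```python
import json
import tempfile
import uuid
from pathlib import Path


def store_machine_optimized(root: Path, content: bytes) -> Path:
    """Store a BLOB under opaque GUID names (hypothetical layout, not the real RBS one)."""
    container = root / str(uuid.uuid4())
    container.mkdir(parents=True, exist_ok=True)
    blob_path = container / str(uuid.uuid4())
    blob_path.write_bytes(content)
    return blob_path


def store_human_readable(root: Path, name: str, content: bytes, metadata: dict) -> Path:
    """Store the object under its own name plus a JSON metadata sidecar (hypothetical format)."""
    root.mkdir(parents=True, exist_ok=True)
    object_path = root / name
    object_path.write_bytes(content)
    sidecar = root / (name + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return object_path


def demo() -> dict:
    """Store the same bytes both ways and report what a person could recover from disk."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        content = b"Quarterly report"
        metadata = {"author": "jdoe", "site": "finance", "title": "Quarterly report"}
        blob = store_machine_optimized(base / "blobstore", content)
        obj = store_human_readable(base / "archive", "quarterly-report.txt", content, metadata)
        sidecar = obj.parent / (obj.name + ".metadata.json")
        return {
            "blob_name": blob.name,  # an opaque GUID, meaningful only to the database
            "object_name": obj.name,  # the original, recognizable file name
            "same_bytes": blob.read_bytes() == obj.read_bytes(),
            "recovered_metadata": json.loads(sidecar.read_text(encoding="utf-8")),
        }


if __name__ == "__main__":
    print(demo())
```

Both stores hold identical bytes. The difference is what survives without the application: in the first case only the database knows what the GUID-named file is, while in the second the name and the metadata can be read straight off the disk.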
Both design choices are perfectly fine, and while there is some overlap, depending upon a customer’s problem one may be more applicable than the other. For instance, if the customer is really after optimizing the MS-SQL database attached to their SharePoint infrastructure, then RBS may be the best fit. On the other hand, if a “future-proof” archive is key, then HDD-MS is the best choice (note this is a Hitachi internal best practice). Here are a few examples beyond HDD-MS illustrating this point:
- Hitachi Data Ingestor (HDI) – stores the file system object + all metadata onto HCP
- Hitachi NAS Platform (HNAS) – stores the file system object + all metadata onto HCP
- Universal Volume Manager (UVM) – when implemented, users can select a mode that merely virtualizes but does not change the LU; this allows the user to back out of the virtualization process quickly
- HCP SMTP Interface – stores emails from Exchange in a user and machine comprehensible format including persisting individual emails as *.eml or *.mbox format
Now I’ll be the first to admit that my primary argument and the others I hinted at make sense for SharePoint. But what about the applications that make use of MS-SQL Server for which we haven’t optimized our products or implemented specific plugins? What about a user who doesn’t care about perfectly archiving metadata and the content itself? That is where RBS shines! If a customer has written their own application making use of MS-SQL 2008 and they want to store BLOBs outside of the DBMS: RBS is the right choice. If an application like Microsoft Dynamics wants to store BLOBs outside of the MS-SQL DBMS: RBS is the right choice.
In short, RBS is the right tool for a set of use cases and HDD-MS is the right tool for another set. So when our competitors were jumping on the RBS bandwagon and racing to the finish line to solve SharePoint problems there, we were standing on the finish line waiting to cheer the company who makes it to second place.
Comments (5)
Frank T Wilkinson on 04 Apr 2011 at 2:39 pm
I agree with the comments and direction for positioning RBS vs. HDD-MS. What Michael describes as the key differences and use cases for each is a good level set for understanding the nuances of each when deciding which road to take. What Michael describes above only addresses the capacity issue, and while that is important, there is another, perhaps larger, issue to address: most organizations want a way to convert SharePoint data to a “record”, preserving the item itself as well as applying a retention policy to the data. When combined with the HDS HCP, this enables the data to be stored as an immutable object. This is critical not only for search capabilities but also for controlling and implementing a data classification policy. The ability to migrate seamlessly and/or archive data is critical for corporate governance and policy control. As far as the SQL DB goes, at some point some customers may want to prune or archive SQL data into the HCP for the purpose of corporate governance and eDiscovery search capabilities. This would have a twofold advantage: first, the data would be stored, indexed and fully searchable; second, it allows for the offloading of data from the SQL DB and, by doing so, improves performance.
This would address the two issues that RBS cannot:
1. Ability to archive SharePoint data along with its associated meta-data to enable fast retrievals and search capabilities and stored as an immutable object and enable the data to be “cloud” ready
2. SQL archiving would address the issue of DB performance and capacity utilization and would negate the need for RBS to be enabled. This would also allow for the record and its associated meta-data to also be immutable
If this is the case then the utilization of RBS may be a moot point.
The biggest issue with RBS is that it is an ALL OR NOTHING approach: there is no logical rule to push specific data from the SQL DB; rather, everything is pulled out. That is an issue if you are trying to enable “compliance”, because there is no way to have RBS push data out based upon a policy such as last accessed, modification date, or age. RBS, while a “data” mover, does not add any capabilities for seamless searches or audits. Don’t get me wrong, I believe that there are definite use cases for when to use RBS and when to utilize a more flexible solution; we simply have more options now, and the question is, which tool is right for you?
dnz on 27 Apr 2011 at 12:07 am
I am interested in the comparison between RBS and HDD-MS for the SharePoint use case. We are considering using HDD-MS with HCP and SP2007. SP2010 is also on the roadmap and will be installed in the next 12 months. The main drivers are:
1) Improve performance of SP2007 (by reducing DB size)
2) Archiving of SP data for eDiscovery purposes
We also want to:
3) Maintain native SP2007 search capability
4) Single search tool for all eDiscovery purposes (Archived data, SP2007, email, file shares, etc)
Unfortunately, HDD-MS cannot satisfy points (3) and (4) as you cannot use the native SP search facility for full content searches of archived content (metadata only). Archived content can be searched separately using HCP or HDDS but the ‘live’ data in SP can’t through these tools. As a result of these limitations we will not be implementing HDD-MS with SP2007.
We are now looking at whether SP2010 will be any better. Options such as using RBS to store BLOBs on an HNAS and then archiving the HNAS files are being considered. I would be interested to hear of any potential solutions you may have.
Michael Hay on 27 Apr 2011 at 1:54 pm
dnz, let’s cover points 3 and 4 first.
For (3 & 4) you are correct, and to be clear, for SP2007 we can search metadata on the content archived to HNAS or HCP from the SP2007 UI. For SP2010 the full content search is preserved even when the content is moved to an archive store, including the case of FAST Search being used as the index for SP2010.
This means that for SP2010 we can answer both 3 & 4 but for SP2007 only partially.
For better eDiscovery support let me suggest HDDS, which can search HNAS, NetApp, HCP, Windows, etc. If you add this to your archived SharePoint content on HNAS or HCP you’ll get everything you want and then some. Further, if you deploy HCP you can point the Exchange journaling to HCP and you don’t need middleware to facilitate the email archive. This lets the data be freed from the application for future use cases you may not imagine. Finally, we’ve also been explicitly tuning HDDS for eDiscovery related activities. As a result there are a lot of bells and whistles for it in this area.
Techno-Musings >> Blog Archive >> Announcing the Hitachi RBS Provider on 18 May 2011 at 11:49 am
[...] implied in my previous post that Hitachi was soon to release our own RBS provider. Today I’m pleased to announce that our [...]
HDS Blogs: Announcing the Hitachi RBS Provider - HDS Blog on 27 Jun 2012 at 7:59 am
[...] implied in my previous post that Hitachi was soon to release our own RBS provider. Today I’m pleased to announce that our [...]


