To BIN or not to BIN, that is the question
by Hu Yoshida on Jul 7, 2010
Hamlet was depressed when he posed the question “to be or not to be.” There was no question in Barry Burke’s mind when StorageNerve asked Michael Hay, “Where is the Hitachi BINfile?” and Michael answered, “Hitachi doesn’t have the concept of a ‘BINfile’.”
Barry’s immediate response was that “EVERY intelligent storage array has the equivalent of a Binfile.” Barry also pointed out that the correct name is .BIN file.
For those of you who may not know what a .BIN file is, StorageNerve provides a comprehensive description of it in the post “EMC Symmetrix: BIN file.”
A .BIN file is used only with the Symmetrix, where it holds the array’s configuration information. It requires EMC services for the initial installation and for hardware upgrades. It was used for the first Symmetrix in 1990 and is still used today with the VMAX. I am not aware of any other storage array that has a .BIN file, not even the EMC CLARiiON. .BIN file changes are loaded into the front-end directors, back-end directors, and global cache in a process called IML (Initial Memory Load).
Hitachi does not require a .BIN file to map configurations into our directors or cache; the mapping of our cache is dynamic. Starting in the mid-1990s with the introduction of the 7700 Freedom storage arrays, we have stored the configuration data in a mirrored control store that is directly accessible to the front-end and back-end directors over buses that are separate from the connections to the data cache. We can change the configuration of the data cache simply by changing bits in the control store. This enables non-disruptive configuration changes, upgrades, maintenance, tiering, dynamic provisioning, and mapping of cache to external storage arrays for storage virtualization. Keeping control data in a separate storage area has other advantages. Performance is increased by eliminating cache contention between control data and user data. Privacy of user data is ensured for remote call-home maintenance, since remote maintenance can only access the control store, which is separate from the data cache.
Boot information is kept in flash storage on the front-end and back-end directors of the USP V and VM, and our control and data stores are backed up by batteries. This enables us to offer a diskless version of the USP VM. Users can configure the USP V or VM through Device Manager or Storage Navigator software.
So for Hitachi storage arrays, there is no need for a .BIN file. I do not know of any other intelligent storage array that has a .BIN file. It only seems to be a requirement of the Symmetrix architecture, which is now over 20 years old.
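To make the “change bits in the control store” idea concrete, here is a toy Python model, not Hitachi’s actual design: a mirrored control store in which each bit marks one cache segment as assigned or free, and every change is written to both mirrors. The class name, the one-bit-per-segment layout and the segment count are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ControlStore:
    """Toy mirrored control store: bit n is 1 if cache segment n is assigned.
    Hypothetical layout for illustration, not Hitachi's actual format."""

    segments: int
    copy_a: bytearray = field(init=False)
    copy_b: bytearray = field(init=False)

    def __post_init__(self) -> None:
        size = (self.segments + 7) // 8          # one bit per segment, rounded up
        self.copy_a = bytearray(size)
        self.copy_b = bytearray(size)

    def _locate(self, segment: int) -> tuple[int, int]:
        if not 0 <= segment < self.segments:
            raise ValueError(f"segment {segment} out of range")
        return divmod(segment, 8)                # (byte index, bit index)

    def assign(self, segment: int) -> None:
        byte, bit = self._locate(segment)
        for copy in (self.copy_a, self.copy_b):  # same write to both mirrors
            copy[byte] |= 1 << bit

    def release(self, segment: int) -> None:
        byte, bit = self._locate(segment)
        for copy in (self.copy_a, self.copy_b):
            copy[byte] &= ~(1 << bit) & 0xFF

    def is_assigned(self, segment: int) -> bool:
        byte, bit = self._locate(segment)
        return bool((self.copy_a[byte] >> bit) & 1)

    def mirrors_agree(self) -> bool:
        return self.copy_a == self.copy_b
```

In this model a reconfiguration such as store.assign(10) is just a pair of byte writes that never touch the user-data cache, which is the property the paragraph above credits for non-disruptive changes.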
Comments (9)
While you are correct that the .BIN file today holds the configuration data of a Symmetrix VMAX that will be required to restart the system from a cold boot, you continue to misrepresent its utility, use and operational implications.
Despite your argument that the configuration of the USP-V/VM is not stored in a singular “file”, the fact remains that ALL storage arrays (and servers, and desktops, and laptops, and yes, even iPads and iPhones) must, BY DEFINITION, keep a definitive description on persistent storage to define how the system should be configured when rebooted. That the USP-V/VM keeps this configuration data in flash storage on each director instead of in one centralized file does not change the fact that there is, indeed, the operational equivalent of a .BIN file on the USP-V; yours is just lots of little “files” on each of the directors.
So please, don’t insult the intelligence of your audience with assertions to the contrary.
What you and your HDS competitive folks apparently have not yet realized (or perhaps intentionally choose to ignore) is that the utility of the .BIN file on a VMAX is radically different from that of the .BIN file used on the first Symmetrix arrays. Today, *almost* all configuration changes are handled dynamically, via direct system calls that change the system configuration in real time without involving the .BIN file at all. Of course, the .BIN is also updated to reflect the changes that are made dynamically, but that is effectively no different from the need to update the flash-based configuration of a USP-V director.
And while I admit that there are still hardware configuration changes that are not yet fully dynamic and thus still require .BIN updates to implement, I can also tell you that the days are numbered for these: a non-disruptive Enginuity software update later this year will make all configuration changes AND hardware upgrades/replacement fully dynamic. At that point, the .BIN file will be relegated to a role exactly analogous to CONFIG.SYS and/or the Windows Registry. The notoriety of the .BIN file will then be deprecated to the point of irrelevance.
So, when you and your cohorts spew misinformation about the .BIN file as you are wont to do, it merely underscores the point that you are not a credible source for information about how Symmetrix works. Please, don’t stop: it makes it much easier for our field to earn the customers’ trust when competitors try to win deals based upon misinformation such as yours. Thanks!!!
Oh, and thanks also for pointing folks to Devang’s post on the .BIN file… hopefully many will take advantage of his much more accurate insights and analysis. To his credit, he has worked with me on numerous occasions to improve the accuracy of his posts, and I am sure he will happily update them again when the aforementioned Enginuity update is released and the .BIN file is deprecated.
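What Barry describes amounts to a write-through pattern: apply the change to the running system first, then rewrite the persistent description so that a cold boot comes back in the same state. Here is a minimal Python sketch, assuming a JSON file stands in for the .BIN; the real .BIN format is not modeled, and ArrayConfig and its methods are hypothetical.

```python
import json
import os
import tempfile


class ArrayConfig:
    """Toy write-through configuration: the live (in-memory) map changes first,
    then the persistent copy is rewritten so a cold boot sees the same state.
    Class name and JSON file format are hypothetical."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.live: dict[str, str] = {}
        if os.path.exists(path):                 # "cold boot": reload saved state
            with open(path, encoding="utf-8") as f:
                self.live = json.load(f)

    def set(self, key: str, value: str) -> None:
        self.live[key] = value                   # dynamic change, effective immediately
        self._persist()                          # keep the boot-time description in sync

    def _persist(self) -> None:
        # Write a temporary file, then rename it over the old one, so readers
        # never see a partially written configuration.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.live, f)
        os.replace(tmp, self.path)
```

Whether that persistent copy is one file or many per-director fragments does not change the pattern, which is the crux of the argument in the two comments above.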
All modern storage arrays have data and metadata.
You are right that most customers debate data availability endlessly but ignore the following questions:
1. What does the metadata contain?
2. Where and how is the metadata stored?
3. How is the metadata retrieved and updated?
The metadata mainly contains:
1. LUN to Physical Drive mapping
2. LUN to Server mapping
3. Remote and Local Copy Configuration and change tracking
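The three categories above can be pictured as one metadata structure. This Python sketch is hypothetical: the field names, the use of port-name strings for hosts, and track-level change tracking are assumptions, not either vendor’s on-array layout.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Extent:
    drive: str          # physical drive identifier (hypothetical naming)
    start_block: int
    block_count: int


@dataclass
class ArrayMetadata:
    # 1. LUN -> physical drive extents backing it
    lun_to_extents: dict[int, list[Extent]] = field(default_factory=dict)
    # 2. LUN -> host ports allowed to see it
    lun_to_hosts: dict[int, set[str]] = field(default_factory=dict)
    # 3. copy pairs (source LUN, target LUN) -> set of changed tracks
    copy_pairs: dict[tuple[int, int], set[int]] = field(default_factory=dict)

    def capacity_blocks(self, lun: int) -> int:
        return sum(e.block_count for e in self.lun_to_extents.get(lun, []))

    def record_write(self, lun: int, track: int) -> None:
        # Change tracking: mark the track dirty for every pair sourced from this LUN.
        for (src, _dst), dirty in self.copy_pairs.items():
            if src == lun:
                dirty.add(track)
```

Every host write touches category 3, while categories 1 and 2 change only on provisioning, which is why where and how this structure is stored matters.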
In the case of the USPV, when the field engineer installs a RAID group, the LDEV-to-PDEV mapping is automatically updated in shared memory. From what I understand, in the case of the DMX the field engineer has to modify a bin file whenever a RAID group or additional capacity is installed. This is a drawback and leaves room for errors.
In the case of LUN-to-server mapping, EMC ControlCenter issues SYMAPI calls that in turn modify the LUN-to-server mapping. This is no different from the USPV, where Storage Navigator issues Java RMI calls to the SVP laptop, which in turn modifies shared memory.
In the case of copy configuration, EMC and HDS are the same: the config is hand crafted.
One of the drawbacks in the EMC architecture is that the metadata is a file and is retrieved and updated on the same paths as the customer data.
What we need are open standards that all storage vendors can use to design their arrays.
For example, the entire industry relies on the SCSI INQUIRY command to discover LUNs on a storage array. The server only knows a logical bus address, target number, and LUN number. When data is written to a LUN, the storage array translates the bus, target, and LUN to a set of physical drive addresses where the data resides.
One could define an embedded LDAP standard for this lookup: the server queries the LDAP server, which returns the physical address where the data resides. This LDAP server could also provide services such as LUN and snapshot discovery, whereby a lot of work done by SAN admins is offloaded to server admins. Imagine an ideal world where all storage vendors used an embedded LDAP server. Migrating from one array to another would be much simpler.
Unfortunately, the reality is that, unlike the Ethernet industry, the storage industry is a closed world, which makes things much harder for customers.
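The translation described above can be sketched in a few lines of Python. This assumes plain striping (RAID 0) across the drives behind a LUN and a made-up lookup table; real arrays add parity, sparing, caching and virtualization layers on top.

```python
from typing import NamedTuple


class Nexus(NamedTuple):
    bus: int
    target: int
    lun: int


class PhysicalAddress(NamedTuple):
    drive: str
    block: int


# Hypothetical table: the drives behind each (bus, target, LUN) and the
# stripe depth in blocks.
LUN_MAP: dict[Nexus, tuple[list[str], int]] = {
    Nexus(0, 1, 0): (["d00", "d01", "d02", "d03"], 128),
}


def translate(nexus: Nexus, lba: int) -> PhysicalAddress:
    """Map a host logical block address to a drive and block, assuming
    simple striping (RAID 0) across the LUN's drives."""
    drives, stripe_blocks = LUN_MAP[nexus]
    stripe_number, offset = divmod(lba, stripe_blocks)
    drive_index = stripe_number % len(drives)    # stripes rotate across drives
    row = stripe_number // len(drives)           # full passes over all drives
    return PhysicalAddress(drives[drive_index], row * stripe_blocks + offset)
```

The host only ever supplies the nexus and a logical block address; everything on the right-hand side of translate() stays inside the array, and that hidden table is the state a common standard would have to expose.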
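Since this proposal is hypothetical, code can only illustrate the idea. Here is an in-memory Python stand-in, not a real LDAP server or client: entries carry LDAP-style distinguished names, and a subtree search lets a server admin discover the LUNs and snapshots visible to a host. The DN layout and attribute names (objectClass, physical, source) are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    dn: str                                      # LDAP-style distinguished name (invented schema)
    attrs: dict[str, str] = field(default_factory=dict)


class StorageDirectory:
    """In-memory stand-in for the embedded directory proposed above.
    Not a real LDAP implementation; DN layout and attributes are hypothetical."""

    def __init__(self) -> None:
        self.entries: list[DirectoryEntry] = []

    def add(self, dn: str, **attrs: str) -> None:
        self.entries.append(DirectoryEntry(dn, dict(attrs)))

    def search(self, base: str, **match: str) -> list[DirectoryEntry]:
        # Subtree search: every entry at or under `base` whose attributes match all filters.
        suffix = "," + base
        return [
            e for e in self.entries
            if (e.dn == base or e.dn.endswith(suffix))
            and all(e.attrs.get(k) == v for k, v in match.items())
        ]
```

For example, search("host=db01,array=a1", objectClass="lun") would return only the LUNs mapped to that host, without the server admin needing array-specific tools.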
Customers are eagerly awaiting the day when the Symmetrix/V-Max bin file issues go away.
The real problem it causes is as follows (this is the most common example, but other hardware upgrades apply as well):
Every time new physical disks have to be installed, the bin file needs to be changed. This normally means 5 business days (read: a week) during which no configuration changes (capacity provisioning) can be made on the array. In a live production environment where capacity provisioning is done literally every day, this is unacceptable. We have been able to reduce this to three days, but it still causes major headaches.
I am glad that this will soon be taken care of. On other high-end systems, it is a matter of ~one hour, not days…
Barry, I am glad that you are listening…
It is not a question of whether or not there is configuration data. The real problem, as Soikki, Devang, and Vinod all point out, is that it is disruptive!
As Barry admits in his comment:
“And while I admit that there are still hardware configuration changes that are not yet fully dynamic and thus still require .BIN updates to implement, I can also tell you that the days are numbered for these: a non-disruptive Enginuity software update later this year will make all configuration changes AND hardware upgrades/replacement fully dynamic. At that point, the .BIN file will be relegated to a role exactly analogous to CONFIG.SYS and/or the Windows Registry. The notoriety of the .BIN file will then be deprecated to the point of irrelevance.”
The notoriety of the .BIN file still exists today and is still disruptive.
Indeed, Soikki, we are listening. And as a result of listening, VMAX today can allocate and map/mask 1TB of usable capacity in less than 10 minutes, while the USP-V takes significantly longer (I’ll leave the actual number for Hu to supply). The upcoming software update will reduce that significantly, arguably to be the fastest in the industry.
Vinod, it is not true that the BIN file has to be modified to change or add a RAID group – like Hu, you are working off of ancient information. Today all storage allocation, forming of RAID groups, configuration of Virtual Provisioning pools and mapping devices to hosts is handled dynamically using standard customer management utilities…these have not been “hand crafted” since the introduction of the first DMX in 2003, as a matter of fact.
The negative claims about separate paths for data and metadata are meaningless as well; in fact, the inter-processor communications operate independently of data transfers and have done so since the first DMX as well. That they are stored in the same memory alongside the data cache is actually a plus, and although Hu and HDS will argue that point, they won’t be able to for much longer. When they switch to an Intel-based platform, there are only TWO paths to get data into the CPU: the PCI Express bus and the memory bus. Either way, metadata is going to share paths with data, even if on separate lanes. At that point the argument will be moot, just as it is truly meaningless today.
But I will agree on the need for a better standard for defining the presentation of LUNs to hosts, and EMC stands ready to work with our competitors to develop a heterogeneous standard that eliminates the hard link of LUN personality to the proprietary implementation/personality of the various arrays. Anyone interested in joining us, please send me an email at:
barry DOT burke AT emc DOT com
1. Mapping 1TB to a host from a USPV takes just under 5 minutes if you know what you are doing. I guess what you are really talking about is installing 1 TB of disk drive capacity into the USPV, formatting it, and carving out LDEVs that can then be presented to the host.
2. You are partly right about the BIN file changes prior to 2003 and the way they are handled now. Let me explain how things work in the USPV world. When you install drives into slots in a frame on the USPV, an ALPA and a loop number are involved. All the CE has to do is specify the RAID level, the drive model, and the drive emulation, and the storage is formatted and made available to the customer. On the DMX4, the customer needs the EMC Solutions Enabler license, and then the CE can run symconfigure scripts to make the additional storage available to the customer. These scripts take static text files as input and commit the configuration specified in those files to the array’s BIN file. This is indeed a change from before 2003, but the text files are still hand crafted, leaving room for errors.
This is my first-ever post, but I want to comment on the time it takes to actually allocate a LUN. Using HDS Device Manager does not take 5 minutes to allocate anything. I just put a new USPV into our environment, and just getting anywhere in the Java interface is impossible. For a novice Device Manager user, it’s about 20+ minutes. We just got the latest VMAX into our environment as well: the first time, it took less than 2 minutes, and at EMC World I heard the record was 22 seconds to allocate a disk to a host and bring it up.
Any heterogeneous out-of-band storage management utility is going to be slow. Did you use EMC ControlCenter or the command line to map LUNs to the VMAX? Maybe you should try Storage Navigator or the HDVM in-band CLI.
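The point about hand-crafted input files is easiest to see with a pre-flight check. The Python sketch below validates a change file in an invented “action key=value ...;” format. It is not symconfigure’s grammar, and the allowed actions, keys and values are hypothetical; it only shows the kind of typo a validator catches before anything is committed.

```python
import re

# Hypothetical rules for an invented change-file format (NOT symconfigure syntax).
REQUIRED = {"create_dev": {"count", "size_cyl", "emulation", "protection"}}
ALLOWED_VALUES = {"emulation": {"FBA", "CKD"}, "protection": {"RAID-1", "RAID-5", "RAID-6"}}
STATEMENT = re.compile(r"^\s*(\w+)((?:\s+\w+=\S+)*)\s*;\s*$")


def validate(text: str) -> list[str]:
    """Return human-readable errors; an empty list means the file passed."""
    errors = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue                             # skip blank lines and comments
        m = STATEMENT.match(line)
        if not m:
            errors.append(f"line {lineno}: cannot parse {line.strip()!r}")
            continue
        action = m.group(1)
        args = dict(kv.split("=", 1) for kv in m.group(2).split())
        if action not in REQUIRED:
            errors.append(f"line {lineno}: unknown action {action!r}")
            continue
        missing = REQUIRED[action] - set(args)
        if missing:
            errors.append(f"line {lineno}: missing {', '.join(sorted(missing))}")
        for key, value in args.items():
            if key in ALLOWED_VALUES and value not in ALLOWED_VALUES[key]:
                errors.append(f"line {lineno}: bad {key} {value!r}")
        if "count" in args and (not args["count"].isdigit() or int(args["count"]) == 0):
            errors.append(f"line {lineno}: count must be a positive integer")
    return errors
```

A check like this reduces, but does not remove, the risk of hand-edited input; generating the change file from the array’s own inventory would remove the hand-crafting altogether.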
I feel your pain, on so many levels. We just upgraded to VMAX, and its performance is wonderful; it’s a great array. But ECC has never really performed, on any version I’ve seen so far, the way it was presented. It would get out of sync with the database (housed on UNIX) at times. The newer version stays more in sync, but it just plain doesn’t work right. The nightmare of upgrading was bad enough, but I’ve now been on a support call for days, and it is clear they do not know what to do. I just keep finding issues with the software. I was trying to create a meta with 10 disks. Well, you can create the meta with maybe 3 disks, but it won’t allow you to add more than 3. After that, it’s hit and miss whether you get to add 3 more or just one. Or, like yesterday, it choked and wouldn’t let you add another disk to the meta, period. Thank goodness I kept my command-line skills so I could add them via scripts. Poof, done! Create another one from the command line with all the disks I want to add: done!
This whole configuration is a nightmare. You use SMC to add your IG/SG/MV or whatever, and then there are all the other software pieces, each requiring its own server (what a waste). So far, none seem to work well together. Oh well. GUIs were created so folks who have no clue how to work from a command line and write a script can be SAN people! You know: just highlight and click, and the software will do all the real work for you. I bet a lot of companies with folks who only know how to click and play are in some trouble, because the GUI here is just plain not working right! If this keeps up, a nice niche industry will emerge where folks who know how NOT to use a GUI will have tons of work.
And sadly, EMC is not the only vendor with this kind of problem. I’ve watched software from respected vendors take total nose-dives in quality. It has gotten so bad that I cannot afford to upgrade: the older version is the only one you can rely on. At one company I know, the product was handed to the outsourced/global part of the company, and now the product is not “berry, berry good”! And the six figures this company pays for support is unbelievable for the support we get, compared to how good its support was just 5 years ago! So sad! Keep up your scripting skills, folks; you will need them.