Hitachi NAS SiliconFS Object-based File System
by Ken Wood on May 21, 2012
Today, I get to introduce my first guest blogger, Matthew O’Keefe, PhD. In a multi-part series, my colleague Matt will discuss hardware acceleration of the various components of NAS systems, specifically Hitachi NAS (HNAS, aka BlueArc Mercury). Matt’s expertise is in scale-out file systems and kernel development, so you might want to read this thoroughly. Since I am a performance nut (with a passion for efficiency as well), this post seems like a good way to give you some insight into the thinking and architecture behind the design of HNAS.
Accelerating NAS via Hardware
The Foundation of NAS Technology: NFS and CIFS protocols
Network-attached storage (NAS) first became widespread in the mid-1980s with the advent of LANs and workstations, and later, PCs. Client machines on the network shared data via a file server using one of two protocols. For Windows-based systems, Microsoft adapted IBM’s Server Message Block (SMB) protocol, added features to evolve it beyond NetBIOS/NetBEUI, and renamed it the Common Internet File System (CIFS), while Sun popularized NFS in the UNIX world. The basic idea behind both protocols was to implement file operations (e.g., create/open/read/write/close/truncate/delete/mkdir/rmdir/link/unlink/stat/fsync) over the network via client requests to a server. Compared with roll-your-own file servers, specialized network file server appliances became popular because they overcame protocol performance hurdles (such as synchronous writes, which could be accelerated via NVRAM), simplified storage hardware deployment and volume management, ran operating systems tuned specifically for file serving, and simplified system management.
Implementing each NFS and CIFS operation generally involves a series of three sub-operations (network, file and storage): determining what operation a client is requesting, transferring the necessary data, and then sending any data and return codes back across the network from server to client. Each sub-operation stage can be broken down further into micro-operations (such as translating a file byte address to the appropriate block address, performing a lookup of a file name in a directory, etc.). Traditionally, these micro-operations have been performed in software sharing a single memory space, using traditional operating system support for network and file system operations.
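One of the micro-operations mentioned above, translating a file byte address to a block address, can be sketched in a few lines. This is only an illustration: the function name and the fixed 4096-byte block size are assumptions, and a real file system would go on to map these logical block numbers to physical locations on storage, which the sketch leaves out.

```python
def byte_range_to_blocks(offset: int, length: int, block_size: int = 4096) -> range:
    """Return the logical block numbers covering bytes [offset, offset + length).

    block_size is a hypothetical value chosen for illustration only.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if length == 0:
        return range(0)  # an empty read touches no blocks
    first = offset // block_size                 # block holding the first byte
    last = (offset + length - 1) // block_size   # block holding the last byte
    return range(first, last + 1)


if __name__ == "__main__":
    # A 2-byte read that straddles a block boundary touches two blocks.
    print(list(byte_range_to_blocks(4095, 2)))  # [0, 1]
```

Each call like this is small and independent of other requests, which is what makes it a natural candidate for a dedicated pipeline stage.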
The Basics of HNAS Pipelined FPGA Architecture
Pipelining is a classic technique to speed up processes consisting of a series of operations, including assembly lines (which are inherently pipelined) and computer central processing units (CPUs), which have been pipelined since the 1960s. Instead of processing one operation completely, then starting and completing the next operation sequentially thereafter, and so on, pipelined operations are broken into n sub-operations; each operation is completed by going through the n sub-operations implemented by the pipeline. Several good things result from pipelining operations:
- At any point in time, n operations are being performed in parallel;
- After the first operation gets through the pipeline, each following operation completes at the pipeline rate, which for efficient pipelines is n times the rate of doing each operation sequentially;
- Each pipeline stage can use its own local memory, providing n times the bandwidth of a single main memory and removing memory conflicts between requests in different pipeline stages.
For example, an NFS read request could be implemented roughly as a sequence of 4 steps:
- The request is encapsulated as a network packet and sent from the client to the server;
- The server interprets the read request and determines the blocks associated with the file offset and length requested;
- The server requests these blocks from the storage devices;
- The server returns the file data obtained from the blocks, along with a return code indicating that the operation completed successfully.
If a series of 100 such read requests is sent to a non-pipelined NFS server implemented in software, each request is completely executed before the next request is started, so the total execution time is (4)*(time per step)*(100) or (400)*(time per step). In a pipelined NFS server with 4 independent stages, 4 operations are occurring simultaneously and once the pipeline fills, a read request is fulfilled every (time per step). The first request takes 4 steps to get through the pipeline, and each of the remaining 99 completes one step later, so the time to complete the 100 read requests is (103)*(time per step), or roughly ¼ the time required by the non-pipelined server.
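The arithmetic for this example can be checked with a short sketch. The function names are made up for illustration, and the model assumes every stage takes exactly one equal-length step with no stalls. It counts the steps needed to fill the pipeline explicitly, so the pipelined total comes out at 103 steps rather than a round 100.

```python
def sequential_time(requests: int, stages: int, step_time: float = 1.0) -> float:
    """Total time when each request runs all stages before the next one starts."""
    return requests * stages * step_time


def pipelined_time(requests: int, stages: int, step_time: float = 1.0) -> float:
    """Total time for an ideal pipeline: fill once, then one completion per step."""
    if requests == 0:
        return 0.0
    # The first request needs `stages` steps; each later one finishes 1 step after.
    return (stages + requests - 1) * step_time


if __name__ == "__main__":
    seq = sequential_time(100, 4)   # 400.0 steps
    pipe = pipelined_time(100, 4)   # 103.0 steps
    print(seq, pipe, seq / pipe)    # the ratio is a little under 4
```

As the number of requests grows, the fill cost is amortized and the ratio approaches the number of stages.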
HNAS SiliconFS technology implements precisely this kind of pipelining, but with much deeper pipelines, more parallelism, and multiple memory modules to remove bottlenecks. In fact, the NFS and CIFS protocols are so amenable to pipelining that the pipeline depth and the resulting clock rate can be increased as necessary to achieve the targeted performance. Moreover, by avoiding resource conflicts over memory ports and other hardware, performance scales predictably across different loads, and can sustain itself consistently even under very heavy, difficult (e.g., random small file write) workloads.
The Performance Potential of Pipelined NAS Hardware versus Traditional Software Designs
In modern multi-core processors, CPU pipelines are limited to about 6 to 7 stages and require complex interlock logic to delay certain operations from moving forward in the pipeline until earlier operations in the pipeline complete. Due to the sequential nature of most software, this interlock logic is activated quite often, creating stalls in the pipeline and reducing the speedup from pipelining to less than the number of stages n. This effect is so deleterious that in the mid-2000s, processor vendors like Intel and Sun reduced the pipeline depths of their processors from 14 to 6 stages (Sun UltraSparc) and from 31 to 14 stages (Intel), and reduced clock rates by over 50%.
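The cost of stalls can be approximated with the standard textbook relation: speedup = pipeline depth / (1 + stall cycles per operation). The sketch below applies it; the function name and the stall figures are illustrative assumptions, not measurements of any real processor.

```python
def pipeline_speedup(depth: int, stall_cycles_per_op: float) -> float:
    """Idealized speedup of a pipeline of the given depth over unpipelined execution.

    Assumes perfectly balanced stages; stalls are averaged per operation.
    """
    if depth < 1 or stall_cycles_per_op < 0:
        raise ValueError("depth must be >= 1 and stalls must be non-negative")
    return depth / (1.0 + stall_cycles_per_op)


if __name__ == "__main__":
    print(pipeline_speedup(14, 0.0))  # 14.0: no stalls, full benefit of 14 stages
    print(pipeline_speedup(14, 1.0))  # 7.0: one stall cycle per op halves the gain
```

Under this model, a deep pipeline that stalls often delivers no more than a shallow one that rarely stalls, which is consistent with vendors backing away from very deep designs.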
In contrast, NFS and CIFS server operations require little interlock logic and generally execute completely independently of each other. This means that pipeline depth can be increased as necessary to match the network and storage hardware speeds available at the time, so that pipelines for implementing NFS and CIFS can be designed for very specific performance targets and can be guaranteed to reach them.
However, if these file server operations are implemented sequentially in software, then the pipeline speedup potential inherent in the NFS and CIFS protocols is lost. Today’s multi-core processors have significant contention for the limited bandwidth between processors and to off-chip memory. Networking, file system and storage operations implemented in parallel across multiple cores contend for the bandwidth into the single off-chip, main memory, creating bottlenecks, contention and erratic performance, especially under heavy load. What’s needed is a pipelined implementation of the network, file system and block storage operations with separate, parallel memories per pipeline stage.
In my next blog post, I’ll explain exactly how HNAS implements this kind of pipelining, and the performance it can achieve.