Now that Isilon is a full member of the EMC storage family, there's a lot to like.
For storage aficionados like me, Isilon's OneFS technology is so simple and effective for storing big data, it's almost magical.
If you're willing to dive into the details, be prepared to potentially unlearn many of your previous assumptions about how things work in a NAS environment.
The way that OneFS does things might be academically interesting in more modest environments -- say, a handful of terabytes.
But as capacity under management grows to hundreds -- or, more typically, thousands -- of terabytes, the elegance of the OneFS approach starts to become downright compelling.
Today, Isilon announced a slew of enhancements and upgrades at the NAB (National Association of Broadcasters) show. Not surprisingly, storage at NAB is all about video and content at scale.
And there's a lot to like with today's announcements.
The Simple Magic of OneFS
In a OneFS cluster, every node is a peer -- there are no control or master nodes. Every node has enough of a complete map of all resources to satisfy any incoming request.
New nodes are simply interconnected, and the cluster is then instructed to incorporate the new CPU, memory and storage into the complex. During the process, data is transparently moved to the new node to avoid it becoming an unwanted hot spot for newly written data.
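To make the rebalancing idea concrete, here's a toy sketch of what "evening out" a cluster looks like when an empty node joins. It is not how OneFS actually places data; the function name and the assumption of equal-capacity nodes are mine, purely for illustration.

```python
def rebalance_deltas(used_tb):
    """Toy model: add one empty node and even out the fill across all nodes.

    used_tb: data currently held by each existing node, in TB.
    Assumes (hypothetically) that all nodes have equal capacity.
    Returns (target, deltas), where a positive delta is data a node hands
    off and a negative delta is data a node receives.
    """
    nodes = list(used_tb) + [0.0]          # the newly added, empty node
    target = sum(nodes) / len(nodes)       # even share per node afterwards
    deltas = [used - target for used in nodes]
    return target, deltas


target, deltas = rebalance_deltas([30.0, 30.0, 30.0])
print(f"target per node: {target} TB")
print(f"per-node deltas: {deltas}")
```

The point is simply that the new node starts with its fair share of existing data, so freshly written data has no reason to pile up on it.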
You've got to see it working to believe how simple it is.
Both back-end data stores and front-end processing are transparently and dynamically load-balanced across the available resources. Redundancy levels (up to N+4) are selectable on a per-file or per-directory basis.
If one or more nodes fail, surviving nodes team up to rebuild the affected data with surprising levels of performance; in-flight file reads or writes made to a failing node are simply redirected transparently to surviving nodes.
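If the "N+4" notation is new to you, the arithmetic behind a generic N+M protection scheme is easy to sketch. The stripe width of 8 data units below is an assumption I picked for illustration, not a OneFS default, and the function names are hypothetical.

```python
def protection_overhead(data_units, protection_units):
    """Fraction of raw capacity spent on protection in a generic N+M stripe."""
    return protection_units / (data_units + protection_units)


def survives(protection_units, failed_units):
    """A generic N+M stripe tolerates up to M simultaneous unit failures."""
    return failed_units <= protection_units


# Assumed stripe of 8 data units, with protection from +1 up to +4.
for m in (1, 2, 3, 4):
    print(f"8+{m}: overhead {protection_overhead(8, m):.1%}, "
          f"survives {m} failures: {survives(m, m)}")
```

The tradeoff is the usual one: more protection units mean more failures tolerated, at the price of more raw capacity, which is why being able to choose per file or per directory matters.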
There are no underlying concepts of LUNs, volumes, aggregates, path management, RAID groups, or even multiple file systems -- everything is a single filesystem under OneFS.
It's a big, intelligent pool of resources -- period. Although OneFS is understandably different, it does bear a passing architectural resemblance to products such as EMC's Atmos as well as Centera.
Isilon offers three flavors of nodes: one optimized for IOPS (the S series), one optimized for bandwidth (the X series) and one optimized for capacity (the NL series). All can be freely mixed and matched in a single OneFS cluster. A cluster can be as small as 3 nodes and as large as 144 nodes.
Administrators can control capacity utilization through a relatively sophisticated manager, and can control performance by automatically moving files or directories between different node types.
There are powerful snap capabilities, as you'd expect, as well as asynchronous replication. Administrators also get powerful storage analytic tools to figure out how the data is being used, and to adjust things if needed.
The OneFS approach stands in sharp contrast to traditional NAS approaches like EMC's VNX and NetApp. While the relative merits of each might be debatable at modest scale, it's hard to argue against the attractiveness of OneFS in far larger big data environments.
As all resources are pooled, utilization and performance are typically much higher than what you get by simply trying to aggregate multiple, standalone NAS devices. And all of this scalability is achieved without having to scale administrative effort: whether it's 100 terabytes or 100 petabytes, the administrative effort is roughly similar.
My apologies for dragging you through the details, but it's necessary to have some context to fully appreciate some of the new goodies here. If you're intrigued, you'll find more information here.
Storage Is Thought Of Differently In Big Data Environments
If the technology is decidedly different in big data storage environments, so are the perspectives. Information is seen as valuable -- the more, the better.
The entire model is based on aggregating enormous amounts of information, usually from multiple sources.
It's stored, of course, and then leveraged (distributed, analyzed, etc.) in a variety of ways. These activities drive value, which inevitably drives demand to aggregate and store even more.
There's almost a virtuous cycle where technology advances mean that the users of these environments can do more than they did before.
Any efficiencies in utilization, performance, administrative effort, etc. are usually and promptly re-invested in -- wait for it -- storing more data.
Contrast this with the more traditional IT viewpoint of trying to minimize the amount of data on hand to reduce storage and management costs. The motivations can be very different in big data environments.
Lest you think that big data storage use cases are confined to a few, isolated niches in selected verticals, the trend appears to be exactly the opposite.
In addition to familiar use cases such as energy exploration, media and entertainment, biotech and pharma, there's also growing application in the public sector, online retail and several other interesting industries.
Before we get into the new capabilities, it's fair to point out that all of this was well in process prior to Isilon's acquisition by EMC. Imagine what we'll be able to do once we've had a bit of time to cross-pollinate!
The New S200 -- Extreme Performance
You'll hear the phrase "transactional workflows" in big data environments. We're not talking traditional OLTP here; we're talking ginormous data sets that have to be sequentially processed to derive value.
For many of these use cases, time is money, and faster is better.
The new S200 offers levels of performance that were only previously achievable with exotic, proprietary storage devices -- if at all.
The new levels of node-level performance come from the expected sources -- faster processors, large amounts of globally coherent cache -- and gobs of SSDs.
SSDs can play a key role in performance acceleration -- not only in accelerating primary data, but also by accelerating metadata handling.
The key metrics for this use are simple: how many IOPS, and what's the cost per IOPS? A full cluster of 144 S200s is spec'd at ~1.4 million IOPS (twice as much as before), with a cost per IOPS approximately half that of its predecessor.
Those specs are not typos.
A fully-configured S200 cluster sports 13.8 terabytes of globally coherent cache, and can deliver a whopping 85 gigabytes per second throughput.
Sometimes big data needs big performance :)
The customers who've been trying out the new S200 are duly impressed. Not only much more performance, but at 50% of the cost.
Of course, due to the Isilon architecture, all this is achieved in bite-sized, scalable chunks as needed.
The New X200 -- Balanced Throughput And Capacity
The workhorse X series has gotten an update as well -- new processors, memory, disks, etc.
The resulting specs are more modest, as you'd expect -- but still considerable.
For example, a full-boat 144-node cluster would deliver ~309K IOs per second, and ~35 GB per second of throughput. Not shabby.
For these use cases, the key metric is cost per MB/sec of throughput.
Not only is total bandwidth doubled as compared with its predecessor, but the cost per unit of throughput has dropped by approximately 40%.
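Since Isilon scales in node-sized chunks, it can help to see what the full-cluster figures quoted in this post work out to per node. This is back-of-the-envelope arithmetic only: it assumes the approximate 144-node numbers divide evenly across nodes, which is my simplification and not a published per-node spec.

```python
NODES = 144  # maximum cluster size quoted in this post

# Approximate full-cluster figures as quoted in this post.
clusters = {
    "S200": {"iops": 1_400_000, "throughput_gb_s": 85.0, "cache_tb": 13.8},
    "X200": {"iops": 309_000, "throughput_gb_s": 35.0},
}


def per_node(figures, nodes=NODES):
    """Divide each cluster-wide figure evenly across nodes (an assumption)."""
    return {name: value / nodes for name, value in figures.items()}


for model, figures in clusters.items():
    shares = {name: round(value, 3) for name, value in per_node(figures).items()}
    print(model, shares)
```

Treat the results as rough sizing hints rather than guarantees; real per-node numbers will depend on the workload and the mix of nodes in the cluster.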
In these environments, there's often a mix of different requirements -- performance, protection, etc. Administrators want to understand how the data is being used, and then adjust the Isilon cluster appropriately.
"Storage management" under OneFS has exactly two major components -- understand what's needed, and specify accordingly.
With this release, Isilon introduces version 1.5 of its InsightIQ storage analytics.
To be clear, this isn't simplistic performance and capacity reporting; it's a sophisticated environment that provides deep insight into how many petabytes of data are being used by potentially thousands of clients.
The other half of the administrative equation is SmartPools.
Remember, a OneFS cluster can be comprised of any combination of extreme performance S models, balanced throughput X models, and capacity-oriented NL models. As mentioned before, protection levels can independently vary as well.
Performance can be optimized for the I/O profile (sequential or random). And, of course, protection levels can vary from very cost-effective to incredibly bulletproof as needed.
The administrator makes a few selections, and OneFS does the rest -- transparently, in the background, and with no disruption.
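To give a feel for what "a few selections" might look like, here's a purely hypothetical sketch of policy-driven placement. None of this is SmartPools syntax or its API: the rule structure, paths, thresholds and protection choices are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FileInfo:
    path: str
    days_since_access: int


@dataclass
class Rule:
    matches: Callable[[FileInfo], bool]
    node_type: str    # "S", "X" or "NL" (the node families named in this post)
    protection: str   # e.g. "N+2"; the levels chosen here are illustrative


# Hypothetical policy: hot project data on S nodes, stale data on NL nodes.
RULES = [
    Rule(lambda f: f.path.startswith("/render/active/"), "S", "N+2"),
    Rule(lambda f: f.days_since_access > 90, "NL", "N+4"),
]
DEFAULT = Rule(lambda f: True, "X", "N+2")


def place(f: FileInfo) -> Rule:
    """Return the first matching rule, falling back to the default."""
    for rule in RULES:
        if rule.matches(f):
            return rule
    return DEFAULT


for f in (FileInfo("/render/active/shot1.exr", 0),
          FileInfo("/archive/old.mov", 400),
          FileInfo("/home/notes.txt", 3)):
    rule = place(f)
    print(f.path, "->", rule.node_type, rule.protection)
```

The administrator expresses intent as a handful of rules, and the system is left to do the actual data movement in the background.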
Those of you familiar with EMC's FAST (fully automated storage tiering) may wonder -- shouldn't this be fully automated at some point? I think the answer is that -- yes -- for some parts of the environment, that might make sense -- but, generally speaking, in these environments, having a smart set of eyes make some choices seems to be preferred.
There's More, Of Course
The core OneFS operating system now sports enhanced authentication, native CIFS and NFS 4.0, among other enhancements. The choice of protocol support turns out to be more than arbitrary -- many of these environments use NFS for "upstream" data processing, and CIFS for knowledge workers on desktop PCs doing the high-level analysis and collaboration -- so both are needed.
The Isilon SyncIQ distance replication software also got a big performance and functionality bump. Just like other forms of data, big data has to be occasionally replicated as well :)
What Does All Of This Mean?
You're likely already familiar with how big data storage is different from the bread-and-butter type used in so many enterprise environments.
But maybe -- just maybe -- you're trying to solve a big data problem with more traditional NAS storage.
Maybe you've got a farm of many filers with people busy running around trying to configure, optimize and balance things.
Maybe you're despairing that you can't get anywhere near the theoretical performance and efficiency of your hardware no matter how hard you try. And maybe, just maybe, you'd like to spend more time using your data than managing storage.
If that sounds like you, I'd encourage you to take a closer look at what Isilon and OneFS can do.
It's pretty magical stuff ...