An excellent example of this "unlearning storage" was buried in an Atmos announcement from EMC World. In particular, Atmos storage now runs nicely in a VM. Not just for eval, for production.
And there are some interesting implications as a result.
I doubt myself these days, and I've been playing with it for over 20 years.
Let's make a short list of default assumptions that are up-for-grabs now.
Storage is spinning disk, right?
Not when you consider the advent of technologies like flash and FAST. Storage is becoming a semiconductor device, and less rotating rust.
Lots of spindles means lots of performance, right?
Nope, all the old-school performance tricks like spreading the IOPS load, or using the sweet spot of a physical disk -- all those are starting to become much less relevant given the new technology.
Logical and physical capacity are pretty much the same thing, right?
No, widespread use of compression, deduplication, thin provisioning, spin-down, etc. means that the gap between "what we see" and "what's actually there" will continue to widen.
Storage lives in one location, right?
Not always, not when technologies like distributed cache coherence and global federation (e.g. VPLEX) are fully considered. Storage lives where it needs to, including potentially multiple places at the same time.
Storage administration is done by storage administrators, right?
Well, that's changing too. More and more of what we've always considered "storage administration" is moving to other places in the stack -- the VMware administrator, the application administrator, or the unified infrastructure administrator.
And storage, well, that's about hardware, right?
Maybe yes, maybe no. After reading this next bit, you can be the judge of that.
Storage As A Virtual Machine?
Deconstruct just about any storage array, and you'll find familiar components: storage media, processors, memory, motherboards, power supplies, etc. Indeed, most storage arrays are built out of largely the same parts bin that the server guys use.
Much of the differentiation from storage comes from its software. And anything that can run in software can be virtualized.
Which brings up the question: will the future bring more "storage solutions" that are nothing more than VMs running on a pool of virtualized hardware resources?
Before we get started, when we toss up the word "storage", we're actually talking about an incredibly broad set of use cases. For some of these use cases, the approach of "storage software in a VM is your storage" is very attractive. For others, it's a bit more difficult.
Consider the Celerra VSA
Many of you are familiar with the Celerra Virtual Storage Appliance. It's a virtual machine that turns arbitrary block storage into a fully-featured unified storage environment. It's not HA like the physical Celerra, nor is it as performant as the physical Celerra, which is why it's an eval-only offering, and not promoted as a production solution.
But what if you had another approach to HA that didn't involve hot-failover of controller blades to shared RAID? Or you had another way of offering the required performance that didn't require dedicated physical hardware?
Or, perhaps, your interest is more along the lines of EMC FMA (File Management Appliance) that does policy-based tiering, movement, archiving, etc. -- now available as a VM for production. The virtual edition does everything the physical version does, only it does it using virtualized (vs. dedicated) resources.
Now it gets interesting, doesn't it?
Enter Atmos Virtual Edition
Ask people about Atmos storage, and most people will envision big pools of commodity hardware that are geographically dispersed. Well, that's not entirely accurate.
If you're not familiar with Atmos, think in terms of a global, distributed repository of content objects accessed using familiar RESTful protocols. Policy associated with the metadata dictates how the information is stored, protected and secured: how many copies, disks spun up or down, compressed or not, multiple locations or not, encrypted or not, audit and compliance trails, and so on.
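To make the "policy follows metadata" idea concrete, here is a minimal sketch in Python. It is not the Atmos API: the `StoragePolicy` fields, the `RULES` table, the `class` metadata key and the `select_policy` function are all names I made up for illustration. The only point is that the object's metadata, not its location or its owner, decides how it gets stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical policy record; field names are illustrative only."""
    copies: int        # how many replicas to keep
    multi_site: bool   # spread replicas across locations?
    compressed: bool
    encrypted: bool


# Hypothetical rule table: the first rule whose predicate matches wins.
RULES = [
    (lambda md: md.get("class") == "financial",
     StoragePolicy(copies=3, multi_site=True, compressed=False, encrypted=True)),
    (lambda md: md.get("class") == "archive",
     StoragePolicy(copies=2, multi_site=True, compressed=True, encrypted=False)),
]

# Fallback when no rule matches the object's metadata.
DEFAULT = StoragePolicy(copies=2, multi_site=False, compressed=False, encrypted=False)


def select_policy(metadata: dict) -> StoragePolicy:
    """Return the policy for an object based purely on its metadata."""
    for matches, policy in RULES:
        if matches(metadata):
            return policy
    return DEFAULT
```

Calling `select_policy({"class": "archive"})` returns the compressed, multi-site policy, while an object with no recognized metadata falls through to `DEFAULT`.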
Early this year, Atmos added the GeoProtect function, essentially bringing parity RAID concepts to geographically dispersed data. As one result, we can argue that the cloud is actually more resilient against multiple failures than traditional approaches.
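If "parity RAID across geography" sounds abstract, the simplest possible version looks like the sketch below. Fair warning: this is plain single-parity XOR, the same idea as RAID 5, and I am not claiming it is how GeoProtect is actually implemented. The function names are mine. Each returned fragment would live at a different site, and any one lost fragment can be rebuilt from the rest.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> list:
    """Split data into k equal fragments plus one XOR parity fragment.

    The data is zero-padded so that all fragments have the same length.
    """
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, frags)
    return frags + [parity]


def rebuild(fragments: list, lost: int) -> bytes:
    """Recover the fragment at index `lost` by XOR-ing all the survivors."""
    survivors = [f for i, f in enumerate(fragments) if i != lost]
    return reduce(xor_bytes, survivors)
```

With `k=3`, an object is spread over four sites at roughly 1.33x the raw capacity, versus 2x or 3x for full copies. Single parity only survives the loss of one site; surviving multiple simultaneous losses takes a stronger erasure code with more than one redundant fragment, which this sketch does not attempt.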
Atmos Virtual Edition is nothing more than the Atmos software running in a virtual machine. As a result, it can use existing server and storage resources (including EMC storage arrays) to do its thing.
It overcomes the HA challenge mentioned above partly by using VMware's HA model, and externally reachable storage. And, as mentioned above, it doesn't have to use traditional RAID to protect against storage media failures, using instead a geographically dispersed approach.
So, how does that sort of thinking change the equation?
Cost Of Entry vs. Cost At Scale
If we were simply looking at cost-to-serve at significant scale, I could make a strong argument that a purpose-built Atmos storage cloud could be more cost-effective than using Atmos Virtual Edition.
But that presumes that you've got an Atmos environment that demands scale. And, when new technologies are entering the market, cost of entry can be more important than cost at scale. Things have a way of wanting to start small, and then get big if they're popular.
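A back-of-the-envelope way to see the trade-off is to treat each option as a fixed entry cost plus a per-terabyte cost, and ask where the two lines cross. The dollar figures below are invented purely for illustration (they are not EMC pricing), and the function names are my own.

```python
def total_cost(fixed, per_tb, tb):
    """Simple linear cost model: entry cost plus cost per terabyte served."""
    return fixed + per_tb * tb


def break_even_tb(fixed_a, per_tb_a, fixed_b, per_tb_b):
    """Capacity (in TB) at which options A and B cost the same.

    Returns None if the two cost lines never cross at a positive capacity.
    """
    if per_tb_a == per_tb_b:
        return None
    tb = (fixed_b - fixed_a) / (per_tb_a - per_tb_b)
    return tb if tb > 0 else None


# Made-up numbers, for illustration only: (entry cost, cost per TB).
VIRTUAL = (10_000, 500)     # cheap to start, pricier per TB
DEDICATED = (200_000, 300)  # expensive to start, cheaper per TB

crossover = break_even_tb(*VIRTUAL, *DEDICATED)
```

With these invented inputs the crossover lands at 950 TB: below that, the virtualized option is cheaper in total, and above it the purpose-built option wins. The real numbers will differ, but the shape of the argument is the same.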
With Atmos VE, we now have an attractive cost-of-entry proposition. Use available virtualized resources (server, storage, etc.) in two or more locations by simply firing up Atmos virtual machines. Offer the service to your users or clients. See how it goes.
If you get lucky and need more performance, simply throw more resources at existing VMs, simply fire up additional VMs, or use faster storage. Want to get more information closer to your users? Fire up an instance close to them. Want more geographically removed data protection? Fire up an instance that's very far away.
Offering Atmos-style cloud storage services now becomes a completely different economic proposition on the cost-of-entry side.
If you're an enterprise that owns multiple data centers (or even IT closets!), you now can start offering an internal Atmos-style cloud storage service for not much effort. If you're a service provider, you now can offer the same without requiring dedicated infrastructure -- runs nicely as yet-another-task on a Vblock, for example.
Either way, get in cheap and easy, see if it grows, continue to add virtualized resources if you want to, or start adding in dedicated infrastructure alongside virtualized infrastructure.
And that's the power of offering functionality (including certain forms of storage) as a virtual storage appliance -- it tends to lessen the friction associated with new technology adoption.
Stepping Back A Bit
I'm sure that many in our industry will eventually start debating whether or not storage functionality implemented as virtual machines on pooled assets is "better" or "worse" than traditional array-based approaches.
Give it a while, and it's inevitable. And you know that our industry loves a good, spirited debate.
That discussion -- when it happens -- will miss the point as far as I'm concerned. Storage software running on a VM using pooled resources will be just another option to consider when offering storage services. Decide what functionality you want, and then decide whether you want a virtualized or physical resource approach -- or some combination of the two.
I think everyone knows that EMC's entire storage-related portfolio is now entirely Intel-based. And anything that is Intel-based can theoretically be virtualized, no?
So, anyone want to hazard a guess as to how many of these "storage as virtual machine" options we're going to see before too long?
For me, it's just another step along the journey of unlearning everything I know about storage.

Chuck,
it becomes more and more obvious that the smarts of any storage device is going to be software; yes, the hardware is important but many of the hardware advances such as faster switching, faster backplanes, improved redundancy for availability etc actually have applications to improve the whole infrastructure, not just storage.
And yes it is obvious that 'storage as virtual machine' has a very strong value proposition but....I wonder if some of your competitors feel entirely comfortable putting their storage appliances on top of the current leading virtualisation technology? Sure they'll work with it but run on it?
Posted by: Martin G | June 01, 2010 at 05:46 PM
As much as I agree with you on VMs & their placement in the cloud, there are applications which are not cloud applicable. Such as high I/O apps or large data sets.
Sure the chip will replace the HDD as the primary data storage, though similar to core 2 duo's vs duo cores, having multiple paths & storage devices has to be faster than having singular large SSDs.
Posted by: twitter.com/needcaffeine | June 01, 2010 at 09:40 PM
Storage lives in one location, right?
Not always, not when technologies like distributed cache coherence and global federation (e.g. VPLEX) are fully considered. Storage lives where it needs to, including potentially multiple places at the same time.
...
Piggy-backing on caffeine dude.
I'm not buying it for similar reasons. For those report
runs that typically do 8 million back-end IOs (who knows
how many satisfied in SGA) I'd prefer average IOs in the
2-3 ms range so the reports run in 4-6 hours. Average IO
of 6 ms stretches run times out to 12+ hours. Introduce
hops (and yes additional block coherency) you risk adding
to average IO time. I'm sure more than a few will be shocked when they move certain apps to the cloud. The
IO best be close in many use cases. Sure, email is already slow. But again, these are cases where people are
knocking on the door waiting on report runs. There are
a lot of those people and runs.
Posted by: Rob | June 02, 2010 at 12:57 AM
Hi Rob
I understand where you're coming from, but I think you're missing a few key points.
First, we all know that storage supports a wide range of use cases, including running mongo report runs.
Second, 2-3ms is now old school, the target with the newer flash drives is <1ms, so you need to raise the bar a bit.
Third, no one is proposing that increasing the latency between an application and its data is a good thing. Applications generally want to be close to their data -- that's true whether we're talking desktops, data centers or the proverbial cloud.
Report runs -- in particular -- are great use cases for fully virtualized resource pools -- call them clouds or call them whatever. CPU and IOs spike nicely during the peak, and those resources can be used for other purposes.
Now, if you can imagine pooling resources between two of your data centers in such a way where you can move both information and applications to an alternate location while not taking the app down -- well, that's the use case that's interesting so many people.
Thanks for the comment.
Posted by: Chuck Hollis | June 02, 2010 at 08:15 AM
@needcaffeine
While I would agree with you on the general premise that there are certain apps that aren't a good fit for a cloud, I think your examples are poor.
Google, for example, deals with extremely high I/O rates and large data sets. Most people would call that a "cloud". I work with a number of service providers in the credit card industry who provide a "cloud service" to financial institutions that have -- yes -- high I/O rates and large data sets.
What I think both you and Rob are getting at is that it's generally a bad thing to increase latency between application and data set. No argument there!
-- Chuck
Posted by: Chuck Hollis | June 02, 2010 at 08:17 AM
Martin
Your point is valid -- who supports the end-to-end stack? We saw this arise in the last round of old-school storage virtualization -- vendor A would put their virtualization thingie in front of vendor B's storage, and the responsibility for end-to-end support would move to vendor A, not vendor B, as a result.
Having learned from this experience, we realize that when EMC is vendor A, we need to be prepared to offer end-to-end support for vendors B, C, D and E. I think you saw that with the VPLEX announcement, as an example.
Thanks for the comment.
Posted by: Chuck Hollis | June 02, 2010 at 08:25 AM
We at HP have many storage products that are essentially VMs. The most visible is VSA - the virtual edition of SANIQ - and we have a large and successful partnership with both VMware and Microsoft. As well as enabling the use of commodity architectures, it also provides the storage sharing mechanism needed to exploit the benefits of server virtualisation. It's fully featured, and I haven't met anyone yet who bemoans the performance hit for this flexibility. This makes me think that it's virtualisation that's provided the key catalyst for this trend, and it will keep us vendors' feet firmly on the ground: migration between different stacks becomes easier because you aren't tied to a proprietary hardware platform. I also believe there is enough healthy competition in the hypervisor market to keep everyone honest.
Posted by: Andy Sparkes | June 03, 2010 at 11:41 AM
"Second, 2-3ms is now old school, the target with the newer flash drives is <1ms, so you need to raise the bar a bit."
Sure. Maybe in a year or two? Today the prices don't
work. So let's say everything is flash at some point.
The customer run times go from 4-6 hours to 45 minutes
and they are ecstatic. Now move those flash drives to a
cloud. Suddenly, their report run is 8 hours. The IOs
have to be nearby for many folks (not email). I don't
see that changing.
Posted by: Rob | June 03, 2010 at 01:05 PM
Rob
I think you and I are thinking about things very differently.
If the report (and its associated resources like CPU, memory, storage, etc.) live in a "cloud" vs the "data center", they should run at roughly the same speed ....
Perhaps the problem is your definition of "cloud".
As far as "prices not working", have you asked the ecstatic business owners if they see value in getting their reports run in 45 minutes vs. 4-6 hours?
You might be surprised at the answer :-)
-- Chuck
Posted by: Chuck Hollis | June 03, 2010 at 01:38 PM