You hear a lot these days about artificial intelligence (AI) and machine learning. Magazine articles and TV news spots drool over the transformational potential of these technologies. But watch out: AI and machine learning have an almost insatiable appetite for data storage. They will consume vast quantities of capacity while demanding insane levels of throughput.
With storage revenues flagging, according to the most recent report from International Data Corp. (IDC), that appetite is good news for enterprise storage vendors seeking to boost sales. But industry capabilities are likely to be stretched to the limit as analytics engines demand that storage repositories feed them data at the rate they desire.
“The adoption of machine learning can quickly tax underlying data access and management infrastructure,” said Laura Shepard, Senior Director, Product Marketing, DataDirect Networks. “Prototypes and generation-one machine learning infrastructure are typically built on existing enterprise storage, or the team building it decides to roll their own with white-box hardware and a mix of open source, homegrown and commercial tools and applications.”
As a result, it’s common for even the most successful machine learning programs to run into problems with scale. In general, when it comes to AI, the more data that can be incorporated, the better the results. That pushes machine learning projects to grow and grow.
“When this happens, we see the generation-one infrastructure start to stress,” said Shepard. Scaling failures begin to show up: the inability to deliver data access at the required speed, the inability to scale the amount of transformation performed on the data to improve findings, and the inability to scale data storage in a footprint that is easy or cost-effective to manage. Any one of these failures can derail the overall program, she said, because if you can’t grow your inputs or increase the depth of your deep learning network, you can’t scale your outputs.
Opportunity Knocks
But one man’s challenge is another’s opportunity. As adoption of AI and machine learning grows, it is attracting – and will continue to attract – a growing legion of startups eager to solve the many issues involved.
“Managing data center infrastructure has been a process of being proactive and staying ahead of the requirements of the business,” said Frank Berry, Senior Analyst, IT Brand Pulse. “The promise of machine learning is improved storage performance, higher availability service levels and greater efficiency (fewer admins per storage unit) through automation.”
Kevin Liebl, vice president of marketing, Zadara Storage, expanded upon that theme. He believes that AI will make data storage far more self-managing – think self-driving data centers, just like self-driving cars.
“Automation will dramatically increase the number of servers an administrator can manage – from an estimated 500 servers today in a best-in-class VMware environment, to perhaps 20,000 servers per admin in the future, when servers are fully instrumented with analytics and automated server management software,” said Liebl. “That will make storage management easier, less time consuming and more efficient.”
He added that storage lies at the heart of the self-driving data center because all this automation requires a record of various activities, which, of course, generates data. Data will be generated in ever greater volumes by the rise of cloud computing, mobility, the Internet of Things (IoT), social media and analytics. That’s why overall data storage volumes will continue to double every two years.
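To see what that doubling rate implies, here is a quick back-of-the-envelope projection in Python. The two-year doubling period comes from Liebl’s comment; the 10 PB starting capacity is purely an illustrative assumption.

```python
# Back-of-envelope projection of the "doubling every two years" claim.
# The 10 PB starting capacity is an illustrative assumption, not a quoted figure.
start_petabytes = 10.0
doubling_period_years = 2

for year in range(0, 11, 2):
    volume = start_petabytes * 2 ** (year / doubling_period_years)
    print(f"Year {year:2d}: {volume:8.1f} PB")
```

Under those assumptions, a 10 PB estate grows 32-fold to 320 PB within a decade – the kind of compounding that makes automated management less a luxury than a necessity.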
“AI’s greatest demand on the storage industry is likely to be the increased demand for storage management capabilities that allow systems to handle the deluge,” said Liebl.
It could well be that the rise of AI and machine learning will influence the storage industry in much the same way that personal computers reshaped the business world. Just as PCs advanced from personal productivity applications to large-scale enterprise databases and automation programs, AI and machine learning are likely to evolve from consumer-like functions to full-scale, data-driven programs that drive global enterprises.
“Over the course of the next 20 years, companies will evolve to AI-assisted organizations,” said Michael Tso, CEO, Cloudian. “It will be a world in which data enables collaboration, with machines gathering information, learning and helping people to make decisions real-time to match customer needs.”
Examples already exist. Recommendation engines on shopping sites like Amazon use this technology, and ad feed systems are getting good at serving up ads based on website visits. Cloudian, too, is involved in digital billboards that match advertising to individual drivers and their cars.
“For the storage industry, this means companies will have to retain massive volumes of unstructured data to ‘train’ machines,” said Tso. “Once machines can learn for themselves, they will collect and generate a new deluge of data to be stored, intelligently tagged and analyzed.”
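Cloudian’s platform, like much of the unstructured-data world, speaks the S3 object API, so one plausible way to picture that “intelligent tagging” is ordinary object tagging that an ML pipeline can filter on later. The following sketch uses boto3; the endpoint, bucket, key and tag values are all hypothetical.

```python
import boto3

# Illustrative sketch: tag an object in an S3-compatible store so ML pipelines
# can later select training data by label. The endpoint, bucket, key and tag
# values are all hypothetical.
s3 = boto3.client("s3", endpoint_url="https://object-store.example.com")

s3.put_object_tagging(
    Bucket="training-data",
    Key="images/cam0/frame-000123.jpg",
    Tagging={
        "TagSet": [
            {"Key": "label", "Value": "pedestrian"},
            {"Key": "source", "Value": "front-camera"},
            {"Key": "reviewed", "Value": "false"},
        ]
    },
)

# A training job could then gather only the objects it needs, for example by
# listing keys and filtering on get_object_tagging() results.
```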
Many of the experts we interviewed made reference to self-driving vehicles. Autonomous cars use a large number of sensors to “read” the environment, and those readings are compared against accurate map data. From there, decisions are made on how to steer, brake and accelerate. The storage complexity involved is significant. Data from sensors such as cameras and radars comes in at tens of gigabytes per second, all of which has to be compressed and processed. The camera-and-radar view of where the car is on the road is then compared to High Definition (HD) map data, an essential step in deriving an accurate vehicle position. These HD maps are layered on top of standard map data and contain additional information such as lane markings, curbs and signs; they can consume tens of gigabytes of additional storage. Multiply that by the hours each car spends in motion and the amount of traffic on the road, and the mind starts to boggle.
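A rough calculation makes the scale concrete. The sensor rate below is the low end of the “tens of gigabytes per second” figure above; the compression ratio and daily drive time are invented assumptions for illustration.

```python
# Rough sizing of sensor data for one autonomous car.
# Sensor rate comes from the article; compression ratio and
# daily drive time are illustrative assumptions.
sensor_rate_gb_per_sec = 10        # low end of "tens of GB per second"
compression_ratio = 100            # assumed effective reduction after processing
drive_hours_per_day = 2            # assumed typical daily usage

raw_gb_per_day = sensor_rate_gb_per_sec * 3600 * drive_hours_per_day
stored_gb_per_day = raw_gb_per_day / compression_ratio

print(f"Raw sensor data per car per day:    {raw_gb_per_day / 1000:,.1f} TB")
print(f"Retained after compression per day: {stored_gb_per_day / 1000:,.2f} TB")
```

Even at a 100:1 reduction, a single car retains on the order of 0.7 TB per day under these assumptions.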
Additionally, each car has to record some of the driving data and keep it for days or months, depending on OEM and regulatory requirements. This matters because even if the data is uploaded to the cloud, a local copy will almost certainly have to be kept. The quantity of data involved – within each car and in the systems that keep traffic running safely and efficiently – is just the beginning. All kinds of AI and machine learning systems will be accessing it to turn information into actionable intelligence. That means storage systems must evolve to store, move and process data at the required velocity.
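Where that recorded driving data lands in cloud object storage, a “days or months” retention requirement maps naturally onto a lifecycle rule. A minimal sketch, assuming an S3-compatible store, a hypothetical bucket and a 90-day window chosen purely for illustration:

```python
import boto3

# Minimal sketch: expire uploaded drive logs after a retention window.
# The bucket name and the 90-day figure are hypothetical; actual retention
# would depend on OEM and regulatory requirements.
s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="vehicle-drive-logs",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-drive-logs",
                "Filter": {"Prefix": "drive-logs/"},
                "Status": "Enabled",
                "Expiration": {"Days": 90},
            }
        ]
    },
)
```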
“AI could also lead to untapped hidden or unknown value in existing data that has no or little perceived value,” said Greg Schulz, an analyst at StorageIO Group.
Storage Enhancements
But it isn’t just a one-way street. It’s not only about how storage needs to hold more data, process it faster and feed it more rapidly to analytics engines. There is also the reciprocal impact – how AI and machine learning will return the favor and enhance storage technology.
“There is the scenario where AI and other algorithms enabling analytics can be used to help manage data, storage along with associated data infrastructure resources,” said Schulz. “This means moving beyond basic analytics and insight awareness reporting as well as traditional policy based system or software management.”
He said to watch for increased CPU and memory requirements to support AI and analytics, as well as tools that transform data into information.
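As a miniature illustration of what moving beyond fixed, policy-based thresholds might look like, the toy sketch below flags storage volumes whose latest read latency deviates sharply from their own recent history, rather than applying one static limit to every volume. All metric values are invented.

```python
import statistics

# Toy sketch of analytics-driven storage management: flag volumes whose
# latest read latency deviates sharply from their own recent history,
# rather than applying one fixed, policy-based threshold to every volume.
# All metric values here are invented for illustration.

def is_anomalous(history_ms, latest_ms, sigmas=3.0):
    """Return True if the latest latency exceeds the mean of this volume's
    recent samples by more than `sigmas` standard deviations."""
    mean = statistics.mean(history_ms)
    stdev = statistics.pstdev(history_ms)
    return latest_ms > mean + sigmas * stdev

volumes = {
    "vol-db01": ([2.1, 2.3, 2.0, 2.4, 2.2], 2.5),   # within normal range
    "vol-logs": ([5.0, 5.2, 4.9, 5.1, 5.0], 9.8),   # spike -> flagged
}

for name, (history, latest) in volumes.items():
    if is_anomalous(history, latest):
        print(f"{name}: latency anomaly ({latest} ms) - investigate or rebalance")
```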