The recent partnership between Cloudera Inc. and Nvidia Corp., announced in April, is worth watching as it represents an important step in creating the architectures of the future to streamline data science and machine learning pipelines.
This DataOps trend, described by research firm Gartner as a key development for 2021, is driven by growing frustration in enterprise IT with the challenges of building data workflows today. Training and iterating on models takes time, large-scale CPU infrastructure is expensive for big data operations, and frustration grows as refactoring and handoffs add cycle time.
In the past, using analytics in big data operations often involved multiple organizational teams, which required customizing GPU integrations for different use cases. Cloudera and Nvidia are jointly pursuing a solution that integrates GPUs into workflows to feed data preparation and analysis.
The integration of Cloudera Data Platform with Apache Spark 3.0 libraries from Nvidia creates architectures without requiring GPU customization. It’s a solution that fits well with the scale of data operations today and customer interest in applying a GPU-driven model.
“We run on over 400,000 servers and have over five exabytes of data under management,” said Sushil Thomas, vice president of machine learning at Cloudera, during an online news briefing last week. “Customers are clamoring for GPUs.”
Accelerate business workloads
The collaboration between Cloudera and Nvidia uses RAPIDS, an open-source accelerator designed to run analytics pipelines on GPUs across hybrid platforms. RAPIDS, which is licensed under Apache 2.0, speeds up data analysis and extract, transform and load (ETL) processing on GPUs within Cloudera Data Platform to accelerate enterprise data science.
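As a rough illustration of what "no GPU customization" can look like in practice, the sketch below builds the kind of Spark settings typically used to switch on Nvidia's RAPIDS Accelerator for Apache Spark. The property keys are standard Spark 3 and RAPIDS Accelerator settings, but the helper names (rapids_spark_conf, to_submit_args) and the default values are assumptions made for this example. The article does not say which settings Cloudera Data Platform applies on a customer's behalf.

```python
from typing import Dict, List


def rapids_spark_conf(gpus_per_executor: int = 1, tasks_per_gpu: int = 4) -> Dict[str, str]:
    """Build illustrative Spark properties that enable GPU-accelerated SQL/ETL.

    Hypothetical helper: the values here are placeholders, not tuned
    recommendations from Cloudera or Nvidia.
    """
    if gpus_per_executor < 1 or tasks_per_gpu < 1:
        raise ValueError("gpus_per_executor and tasks_per_gpu must be >= 1")
    return {
        # Load the RAPIDS Accelerator plugin into Spark 3.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Allow supported SQL/DataFrame operators to run on the GPU.
        "spark.rapids.sql.enabled": "true",
        # Spark 3 resource scheduling: GPUs requested per executor.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Fraction of a GPU each task claims, so several tasks share one GPU.
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render the properties as spark-submit style '--conf key=value' arguments."""
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(rapids_spark_conf())))
```

The point of the sketch is that existing Spark SQL and DataFrame jobs are not rewritten. Acceleration is requested through configuration, and operators the plugin does not support continue to run on the CPU.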
“We focused a lot on deep learning, where the power of the GPU really shines,” said Manuvir Das, head of enterprise IT at Nvidia, during theCUBE’s broadcast of “Transform Innovative Ideas Into Data-Driven Insights” with Cloudera on August 5. “But as we moved forward, we discovered that GPUs can accelerate a variety of different workloads, from machine learning to inference. Computing AI or machine learning needs to respond to the customer where the data is.”
Another key factor driving GPU adoption and the partnership between the two companies is the ability to deliver a cloud experience for processing data in a hybrid model. The “contactless, self-service” cloud experience is what many organizations are looking for when it comes to data management and analytics.
Integrating Nvidia’s solution into the Cloudera platform is a key step toward facilitating the processing of analytics workflows.
“We’re seeing a strong demand to simplify the whole experience,” Scott McClellan, senior director of the Data Science Product Group at Nvidia, said at the press conference. “We want to bring much of that same experience to businesses in a hybrid model, which is a key focus of the Cloudera Data Platform.”
Harnessing the power of the hybrid
The partnership represents another key step for the two companies in a big bet on the hybrid cloud. Nvidia has been particularly active in the hybrid computing space over the past year with a series of high-profile announcements.
In August last year, Nvidia partnered with Google LLC to integrate its GPU operator development environment with the Anthos hybrid and multicloud platform. A month later, Nvidia extended its partnership with VMware Inc. by integrating its NGC software hub to support GPU-accelerated AI applications in hybrid solutions.
Nvidia unveiled a new set of hybrid cloud and AI edge services for AI workloads in June, and this month announced the first Nvidia-powered hybrid cloud offering available through its AI LaunchPad partner program.
“When it comes to using your data, you want to use it in different ways with a powerful platform, which of course you’ve built over time,” Das said. “Believe in the power of the hybrid that the data exists and that the compute must follow the data.”
For Cloudera, a hybrid strategy is based on the belief that not all applications are created equal and that companies will need to understand the lineage of machine learning algorithms. This will require tools that can fully track data across multiple operating environments.
“The key is to develop a hybrid data strategy,” said Mick Hollison, president of Cloudera, in an interview with theCUBE this week. “Hybrid will play a more important role in the conduct of work.”
IRS Compliance and Fraud Solution
The partnership between the two companies has already resulted in an important customer use case within a government agency that is certainly not lacking in data. The U.S. Internal Revenue Service needed to analyze its vast stores of information to deal with the thorny issues of taxpayer compliance and fraud.
The agency turned to Cloudera and Nvidia for help using data-driven insights to fuel critical use cases.
“Our data sets are only getting bigger, and that requires that we actually do something to get more value added,” said Joe Ansaldi, technical branch head, Research Applied Analytics and Statistics Division at the IRS, in an interview as part of theCUBE’s event on August 5. “Our biggest challenge is the infrastructure to support all the ideas that the subject matter experts come up with in terms of all the algorithms they would like to create.”
The IRS decided to test the Cloudera/Nvidia solution using a fraud detection algorithm on a four-terabyte data set, according to Ansaldi. The agency was looking for speed.
“Our expectation was that we were definitely going to see some acceleration in calculation processing times,” Ansaldi said. “If I remember correctly, we had a 22- to 48-fold acceleration after we started tweaking the original algorithm. Now it’s as if the chains are off and we can just run at our heart’s desire, wherever our imaginations lead our subject matter experts to actually develop solutions.”
Documented performance improvement
The IRS example shows significant acceleration for one use case, but executives at Cloudera and Nvidia were more cautious in their own estimates of the gains, citing improved speed at a lower cost. Either way, that was the promise of a move to GPUs all along, and the evidence to date has supported the companies’ claims.
“With documented 3-fold performance improvements and customer references suggesting up to 10-fold speed improvements at 50% of the current cost for such workflows, it’s hard not to think that the Nvidia-Cloudera partnership has significant potential to increase revenue and customer earnings for both companies,” said Daniel Newman, senior analyst at Futurum Research and CEO of Broadsuite Media Group, in an interview with theCUBE. “I think this partnership reflects the importance of leveraging both hardware and software infrastructure to enable enterprises and other large organizations to fully realize the potential of machine learning deployed at scale.”
The collaboration between a software company such as Cloudera and a large semiconductor designer such as Nvidia addresses the essential nature of AI and machine learning for a data-centric strategy. Businesses want to make better decisions faster from virtually unlimited amounts of information, which will require an architecture designed for speed and hybrid cloud.
“The reason we talk about speed and why speed is paramount in a hybrid world and hyper-competitive climate is that the faster we get information from all of our data, the faster we grow and the more competitive we are,” Cloudera CEO Rob Bearden said in an interview on theCUBE. “This is why the partnership between Cloudera and Nvidia is so important. We’ve boosted the enterprise data cloud to enable our customers to work faster and better, and to make the integration of AI approaches a reality for businesses of all sizes.”