Databricks has announced the launch of a new open source project called Delta Sharing, an open protocol for securely sharing data between organizations in real time, completely independent of the platform on which the data resides.
Delta Sharing is included in the open source Delta Lake project and supported by Databricks and a wide range of data providers, including Nasdaq, ICE, S&P, Precisely, FactSet, Foursquare, and SafeGraph, and software vendors like Amazon Web Services (AWS), Google Cloud, and Tableau. This is the fifth major open source project launched by Databricks, after Apache Spark, Delta Lake, MLflow, and Koalas, and is being donated to the Linux Foundation.
Data sharing has become essential for the digital economy, as businesses want to easily and securely exchange data with their customers, partners and suppliers, such as a retailer sharing timely inventory data with each of the brands it sells. However, data sharing solutions have always been tied to a single vendor or commercial product, which ties data access to proprietary systems and limits collaboration between organizations that use different platforms.
“The main challenge for data providers today is to make their data easily and widely usable. Managing dozens of different data delivery solutions to reach all user platforms is untenable. An open and interoperable standard for real-time data sharing will dramatically improve the experience for data providers and data users,” said Matei Zaharia, chief technologist and co-founder of Databricks. “Delta Sharing will standardize the way data is securely exchanged between businesses, no matter what storage or computing platform they use, and we’re excited to make this innovation open source.”
Because Delta Sharing eliminates vendor lock-in, it enables a much broader and more diverse set of use cases than ever before.
For example, an academic institution and a hospital system teaming up for vaccine research would have a standard and simple way to securely share research data and collaborate on their findings, without being limited by proprietary data formats or different applications and tools, and without requiring complex configuration such as installing the same data warehouse software in both organizations. Or an aircraft engine manufacturer would have a standard way to access engine performance data from all the different airlines it serves, even though each airline uses a different set of systems to store and manage that data.
Delta Sharing extends the applicability of the Lakehouse architecture that organizations are rapidly adopting today, as it enables an open, simple and collaborative approach to data and AI within and now between organizations.
A new open standard for secure data sharing between organizations
Backed by Delta Lake 1.0 and boasting a vendor-independent governance model supported by the Linux Foundation, Delta Sharing establishes a common standard for sharing all types of data with an open protocol that can be used in SQL, visual analysis tools and programming languages such as Python and R.
Delta Sharing also allows organizations to seamlessly share existing large-scale datasets in Apache Parquet and Delta Lake formats in real time without copying them, and can be easily implemented in existing software that supports Parquet.
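As a rough illustration of what consuming a shared table from Python might look like, the sketch below assumes the open source delta-sharing Python connector and its documented conventions: a JSON profile file holding the provider's endpoint and bearer token, and a table address of the form profile#share.schema.table. The endpoint, token, and the share, schema and table names are hypothetical placeholders, not values from the announcement.

```python
import json
import tempfile
from pathlib import Path


def write_profile(directory: Path, endpoint: str, bearer_token: str) -> Path:
    """Write a profile file: the JSON credentials a data provider issues to a recipient."""
    profile = {
        "shareCredentialsVersion": 1,
        "endpoint": endpoint,
        "bearerToken": bearer_token,
    }
    path = directory / "example.share"
    path.write_text(json.dumps(profile))
    return path


def table_url(profile_path: Path, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address used by the connector."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(url: str):
    """Read a shared table into pandas.

    Requires `pip install delta-sharing` and a reachable sharing server,
    so the import is deferred until this function is actually called.
    """
    import delta_sharing
    return delta_sharing.load_as_pandas(url)


# Hypothetical values, for illustration only.
workdir = Path(tempfile.mkdtemp())
profile_path = write_profile(
    workdir, "https://sharing.example.com/delta-sharing/", "<token>"
)
url = table_url(profile_path, "retail_share", "inventory", "daily_stock")
```

With a real profile file from a provider, calling load_shared_table(url) would be expected to return the shared table as a pandas DataFrame, read in place rather than from a copy.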
The introduction of Delta Sharing marks the latest advancement in Databricks’ pursuit of fostering an open and democratized data and artificial intelligence ecosystem. Recognizing that innovation thrives through collaboration, not isolation, Delta Sharing builds on Databricks’ long-standing commitment to the open source community and adds to a rich catalog of open source projects, including the widely adopted Delta Lake, Apache Spark, MLflow, and Koalas projects, which are downloaded over 15 million times per month by data teams around the world.
Vendor-neutral flexibility to consume, analyze and visualize shared data with tools of choice
Delta Sharing provides built-in security controls and easy-to-manage permissions that help ensure privacy and compliance needs are met when data assets are securely shared between organizations.
Delta Sharing also allows organizations to confidently share data between vendors and partners while giving each of these data teams the flexibility to query, visualize and enrich this shared data with their tools of choice, including Azure Purview, GCP BigQuery, AtScale, Collibra, Dremio, Immuta, Looker, Privacera, Qlik, Power BI, and Tableau.
“The ability to easily access, analyze and share data is critical to driving innovation and building truly data-driven organizations,” said François Ajenstat, Chief Product Officer at Tableau. “Establishing a new open standard for data sharing aligns with Tableau’s mission to democratize data and empower anyone to make faster, smarter decisions. We look forward to supporting the future of Delta Sharing and helping our customers enjoy the flexibility of an open and collaborative data ecosystem.”
Delta Sharing, an open protocol for sharing data securely between organizations, is supported by the Delta Lake open source project, Databricks and business partners:
“We support Delta Sharing and its vision of an open protocol that will simplify secure data sharing and collaboration between organizations. Delta Sharing will improve the way we work with our partners, reduce operational costs and allow more users to access Nasdaq’s full data suite to discover insights and develop financial strategies,” said Bill Dague, Head of Alternative Data, Nasdaq.
“Our investment in Azure Data Share reflects the vision we share with Databricks: data sharing must be open. We believe that Delta Sharing fits this vision well. We are excited to move forward with Databricks in our shared goals of supporting an open data ecosystem,” said Mike Flasko, associate director, program management at Microsoft.
“Google Cloud and Databricks share a common vision of making data accessible, actionable and open to help businesses make informed decisions in today’s rapidly changing environment,” said Sudhir Hasbe, director of product management at Google Cloud. “We’re excited to bring Databricks to Google Cloud and support data accessibility and portability through solutions like BigQuery to ensure organizations can securely share data and discover new and unique information.”