Dremio Darts software turns data lakes into data warehouses – Blocks and Files

The latest version of Dremio's data lake analytics software processes SQL expressions more than five times faster than its June release.

Dremio provides in-memory software, powered by Apache Arrow, for analyzing data stores with SQL. Its Dart initiative aims to make it possible to run SQL workloads directly on source data lakes rather than on data warehouses loaded by extract, transform and load (ETL) procedures. It has been shipping software updates to help achieve that goal.

Tomer Shiran, founder and chief product officer at Dremio, said: “We want to push the boundaries of what is possible in the data lakehouse and deliver the best [business intelligence] experience for our customers. To that end, the Dart initiative has reduced the zone of confusion between data lakes and warehouses in critical areas such as query performance and acceleration, SQL coverage, and transactionality.”

The “zone of confusion” is a Gartner phrase for the overlap between analytical processing on data lakes and on data warehouses. You could say that Shiran's “data lakehouse” concept adds to the verbal confusion. The purpose of running ETL procedures is to get more structured data into a data warehouse so that analytical routines can deliver results faster.

This latest version provides near real-time metadata refresh for datasets by refactoring metadata processing into a parallel, executor-based process, with the metadata stored and managed in Apache Iceberg tables. This makes metadata refresh up to 20 times faster than in previous versions, and the advantage grows as the dataset gets larger. This graphic illustrates the point:
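As a rough illustration of why an executor-based design helps, the sketch below fans per-file statistics gathering out to a pool of workers and then merges the results on a coordinator, instead of scanning every file in one sequential pass. It is a minimal sketch under stated assumptions: the file names, the `FileStats` record and the `refresh_metadata` helper are hypothetical, the "files" are in-memory lists rather than real Parquet files, and nothing here reflects Dremio's or Iceberg's actual code. Real engines spread this work across executor nodes; Python threads only illustrate the fan-out/merge shape.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    # Hypothetical per-file metadata entry (loosely like a manifest entry).
    path: str
    row_count: int
    min_value: int
    max_value: int


def collect_file_stats(path, rows):
    # One "executor" task: scan a single data file and summarize it.
    return FileStats(path, len(rows), min(rows), max(rows))


def refresh_metadata(files, max_workers=4):
    # Fan the per-file work out to a pool of workers instead of scanning
    # every file sequentially on a single coordinator.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(collect_file_stats, p, r) for p, r in files.items()]
        per_file = [f.result() for f in futures]
    # Coordinator step: merge per-file entries into table-level metadata.
    return {
        "files": sorted(per_file, key=lambda s: s.path),
        "total_rows": sum(s.row_count for s in per_file),
        "min_value": min(s.min_value for s in per_file),
        "max_value": max(s.max_value for s in per_file),
    }


# Hypothetical dataset: three "files" of integer values.
files = {
    "part-0.parquet": [5, 3, 9],
    "part-1.parquet": [12, 7],
    "part-2.parquet": [1, 4, 8, 2],
}
meta = refresh_metadata(files)
print(meta["total_rows"], meta["min_value"], meta["max_value"])  # 9 1 12
```

Because each file is summarized independently, adding workers shortens the wall-clock time of a refresh, which is consistent with the gap widening as datasets (and file counts) grow.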

Dremio says that the Arrow Gandiva component is an LLVM-based toolkit that enables vectorized execution directly on in-memory Arrow buffers, generating code to evaluate SQL expressions that takes full advantage of the pipelining and SIMD capabilities of modern processors. With this latest Dart initiative release, Dremio has sped up expression processing by more than five times, and in some cases by 30 times, as another graphic shows:
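Gandiva compiles expressions to native machine code with LLVM, which Python cannot demonstrate. The sketch below only shows the underlying idea: interpreting an expression tree once per row versus walking it once to build a kernel that processes whole columns held in contiguous typed buffers. The tuple-based expression format and the `eval_row` and `compile_expr` functions are hypothetical and are not Gandiva's API.

```python
import operator
from array import array

# Hypothetical mini expression tree:
#   ("col", name) | ("lit", value) | (op, left, right)
OPS = {"+": operator.add, "*": operator.mul, ">": operator.gt}


def eval_row(expr, row):
    """Row-at-a-time interpretation: re-walks the tree for every row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "lit":
        return expr[1]
    return OPS[kind](eval_row(expr[1], row), eval_row(expr[2], row))


def compile_expr(expr):
    """Walk the tree once, returning a kernel that processes whole columns."""
    kind = expr[0]
    if kind == "col":
        name = expr[1]
        return lambda cols, n: cols[name]
    if kind == "lit":
        value = expr[1]
        return lambda cols, n: [value] * n
    op = OPS[kind]
    left, right = compile_expr(expr[1]), compile_expr(expr[2])
    # Each node consumes and produces full column vectors, not single values.
    return lambda cols, n: list(map(op, left(cols, n), right(cols, n)))


# Columns held in contiguous typed buffers, loosely mimicking Arrow's layout.
columns = {
    "price": array("d", [10.0, 25.0, 4.0]),
    "qty": array("d", [20.0, 3.0, 30.0]),
}
n = len(columns["price"])
# SQL-style predicate: price * qty > 100
expr = (">", ("*", ("col", "price"), ("col", "qty")), ("lit", 100.0))

kernel = compile_expr(expr)
vectorized = kernel(columns, n)
row_wise = [eval_row(expr, {k: v[i] for k, v in columns.items()}) for i in range(n)]
print(vectorized)  # [True, False, True]
```

Both paths give the same answer; the column-wise kernel simply avoids per-row tree walking, which is the overhead that generated, SIMD-friendly code in an engine like Gandiva is designed to remove.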

The latest version also adds PIVOT/UNPIVOT functions and filtered aggregates. Dremio says use cases that can benefit include risk analysis in insurance, revenue maximization in travel and transportation, improving clinical trials in the pharmaceutical industry, and credit risk assessment in banking.
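The article does not show Dremio's own PIVOT/UNPIVOT syntax, so the sketch below illustrates the concepts in standard SQL using SQLite through Python's built-in `sqlite3` module instead. It assumes SQLite 3.30 or later (needed for the `FILTER (WHERE ...)` clause on aggregates). Filtered aggregates emulate a pivot of quarters into columns, and `UNION ALL` emulates an unpivot back into rows. The `claims` and `wide` tables and their data are made up, with an insurance flavor to match the use cases above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (region TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [("north", "Q1", 100.0), ("north", "Q2", 150.0),
     ("south", "Q1", 80.0), ("south", "Q1", 20.0), ("south", "Q2", 60.0)],
)

# Filtered aggregates: each SUM only sees rows matching its FILTER clause.
# Used this way, they also emulate a PIVOT of quarters into columns.
pivoted = conn.execute(
    """
    SELECT region,
           SUM(amount) FILTER (WHERE quarter = 'Q1') AS q1,
           SUM(amount) FILTER (WHERE quarter = 'Q2') AS q2
    FROM claims
    GROUP BY region
    ORDER BY region
    """
).fetchall()
print(pivoted)  # [('north', 100.0, 150.0), ('south', 100.0, 60.0)]

# Store the wide (pivoted) result, then emulate UNPIVOT with UNION ALL,
# turning the q1/q2 columns back into (quarter, amount) rows.
conn.execute("CREATE TABLE wide (region TEXT, q1 REAL, q2 REAL)")
conn.executemany("INSERT INTO wide VALUES (?, ?, ?)", pivoted)
unpivoted = conn.execute(
    """
    SELECT region, 'Q1' AS quarter, q1 AS amount FROM wide
    UNION ALL
    SELECT region, 'Q2', q2 FROM wide
    ORDER BY region, quarter
    """
).fetchall()
print(unpivoted)
conn.close()
```

Native PIVOT/UNPIVOT operators express the same reshaping more compactly; the filtered-aggregate form is shown here because it is portable standard SQL.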


Steven L. Nielsen
