Unifying table storage with Delta Lake’s new feature: UniForm – Part 1

By the Blueprint Team

Delta Lake is the leading open-source storage framework that empowers users to build a comprehensive Lakehouse. With integrations for popular compute engines like Spark, PrestoDB, Flink, Trino, and Hive, and APIs for Scala, Java, Rust, Ruby, and Python, Delta Lake has paved the way for data solutions that are both flexible and powerful.
One of Delta Lake’s significant strengths is its simplicity. It seamlessly integrates your ETL processes, data warehousing, and machine learning tasks within your Lakehouse, simplifying operations and enhancing efficiency. With its commitment to open source, it’s no surprise that Delta Lake has been battle-tested in 10,000+ production environments, showcasing its reliability and production-readiness.
 
Delta Lake is known for features like ACID transactions, scalable metadata handling, time travel capabilities, unified batch/streaming, schema evolution/enforcement, and comprehensive audit history. Plus, it provides SQL, Scala/Java, and Python APIs to merge, update, and delete datasets, cementing its reputation as a versatile and user-friendly data lakehouse solution.
 
With the 2023 Data and AI Summit announcement, we’re excited to share our experiences with the private preview of Delta Lake’s latest feature: UniForm. UniForm unifies table storage formats by adding support for Apache Iceberg and Apache Hudi. In this post, we will share our initial impressions of UniForm, outline the testing approach we used to validate its features, and discuss what this new feature means for the future of data lake architectures.

The Need for UniForm

Organizations make strategic decisions and investments in the lakehouse table storage formats that best suit their needs. We’ve seen this firsthand with organizations of all sizes across various industries, including retail, insurance, oil & gas, and technology. These decisions, while critical at the time, can later pose challenges as technologies evolve and new solutions emerge.

One of these challenges is migrating to a different table storage format. Such a migration can be daunting, fraught with risks and complexities. The process demands significant time, resources, and expertise, and it can disrupt ongoing data operations. Moreover, there’s the risk of data loss or corruption during the migration, and the potential for compatibility issues with existing tools and workflows.

Enter Delta Lake’s new feature, UniForm. Recognizing the need for a smoother transition between different storage formats, UniForm unifies these formats, thereby reducing the barriers to adopting new technologies. By adding support for Apache Iceberg and Apache Hudi, UniForm allows organizations to leverage their existing investments in table storage formats while benefiting from the added flexibility and capabilities offered by Delta Lake.

With UniForm, transitioning to new Lakehouse table storage formats no longer needs to be disruptive. Instead, it becomes an opportunity to enhance data operations and drive innovation. It’s also remarkably quick to get started with.

Testing UniForm

Our initial test focused on AWS and Apache Iceberg, partly because many of our customers are intimately familiar with S3. We also tested UniForm on Azure Data Lake Storage (ADLS), but we will save that for Part 2.

For those who want to follow along but may be unfamiliar with setting up an Apache Iceberg lakehouse, we recommend checking out Project Nessie and this great notebook from Dremio.

Some may notice that this testing setup leans toward a Snowflake environment. The promise of UniForm is unifying table storage formats, in the spirit of “a rising tide lifts all boats.” We are opinionated but, for the benefit of our customers, maintain a platform-agnostic view of the Lakehouse.

With UniForm, data stored in Delta Lake can be read as if it were natively stored in Iceberg. Before we continue, it’s worth noting that if you are a Snowflake customer, support for Apache Iceberg tables must be enabled on your account.

How easy is UniForm to enable? It’s simply a property on your table.
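As a minimal sketch (the catalog, table, and column names below are our own placeholders; the property names follow the UniForm preview documentation), creating a Delta table with Iceberg support enabled looks like this:

    -- Create a Delta table that also maintains Iceberg metadata.
    -- Column mapping by name is a prerequisite for UniForm.
    CREATE TABLE main.default.orders (
      order_id BIGINT,
      order_ts TIMESTAMP,
      amount   DOUBLE
    )
    TBLPROPERTIES (
      'delta.columnMapping.mode' = 'name',
      'delta.universalFormat.enabledFormats' = 'iceberg'
    );

With those properties set, each commit to the Delta table also generates Iceberg metadata over the same underlying Parquet files, so Iceberg readers can query the table without any data being copied.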

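On the Snowflake side, once Iceberg table support is enabled on the account, you can register the UniForm table by pointing Snowflake at the Iceberg metadata that UniForm produces. A hedged sketch, assuming an external volume and an object-store catalog integration that we have named ourselves, and a metadata path that will vary with your table and version:

    -- Register the UniForm table's Iceberg metadata as a Snowflake Iceberg table.
    -- 'delta_s3_volume' and 'object_store_catalog' are hypothetical names.
    CREATE ICEBERG TABLE orders_iceberg
      EXTERNAL_VOLUME = 'delta_s3_volume'
      CATALOG = 'object_store_catalog'
      METADATA_FILE_PATH = 'orders/metadata/v2.metadata.json';

    -- Query it like any other table.
    SELECT COUNT(*) FROM orders_iceberg;

The exact registration syntax depends on the Iceberg preview features enabled on your Snowflake account, but the key point stands: no copies and no migration, just the same files read through Iceberg metadata.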

Part 1: Initial Impressions

The Lakehouse architecture promises the best of the data warehouse and data lake paradigms. It transcends vendors and unlocks the potential to turn an organization’s data into new data products. For example, our Large Language Model (LLM) Center of Excellence recommends a Lakehouse as a critical first step in a cohesive data and AI strategy. Organizations shouldn’t be penalized for something as low-level as a metadata format.

As we began exploring UniForm, its simplicity stood out immediately. Unifying different table formats under one umbrella is a game-changer for data lake architectures. The added support for Apache Iceberg and Apache Hudi is significant, and it opens up new possibilities for organizations looking to optimize their data operations.

Stay tuned for Part 2, where we’ll walk through our UniForm testing in Azure.

Databricks Center of Excellence 

Your One-Stop Databricks Partner

Contact us to get started now.
