Mastering data migration from Cloudera to Databricks
In collaboration with a data-driven real estate investment firm, Blueprint joined its customer on a journey to streamline their data operations. With a focus on migrating from Cloudera to Databricks within a tight time frame, the Blueprint team executed a seamless transition while addressing critical challenges surrounding data silos and heavy storage costs. Through meticulous planning, a robust strategy, and hands-on mentorship, Blueprint not only facilitated a successful migration but also empowered its customer with enhanced data governance, automation capabilities, and insightful analytics.
Client Snapshot
Who:
A real estate investment firm
Industry:
Real Estate Investment and Development
Stakeholders:
Head of Engineering and Director of Engineering
Work Summary
What we did:
- Migrated more than 100TB of 20 companies’ complex data from Cloudera to Databricks on AWS
- Addressed critical Geographic Information System (GIS) requirements
- Implemented a robust CI/CD pipeline
- Finalized a security catalog, facilitated an onboarding process, and ensured compatibility with external application dependencies
- Implemented the Lakehouse Optimizer powered by Blueprint to enable cost transparency and offer insights into what they can improve within their Databricks platform
Client background
Our customer is a real estate investment and development company that focuses on the untapped pockets of real estate in the United States. With over $18B in assets under management, they focus on creating innovative, technology-driven solutions to provide the best service for their investors. Combining data, technology, and analytics allows them to make time-sensitive decisions that can positively impact their customers’ investments. The technology they use also allows them to see the value in properties, supporting better selection and risk management strategies.
The Blueprint way
The customer’s goal was to migrate from Cloudera to Databricks in approximately 3 months to reduce the significant costs they were incurring. The Blueprint team stepped in to make that transition smooth and to help ensure that no data was lost in the process. The result was a full transition to Databricks within the allotted time frame. During the same 3-month window, the Blueprint team trained and mentored the customer’s engineers so they could sustain that success once the migration was complete.
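One common way to check that no data is lost during a migration is to reconcile per-table statistics between the source and target systems. The case study does not describe how the team verified this, so the following is only a minimal sketch under stated assumptions: the table names, the `TableStats` structure, and the `reconcile` function are hypothetical, and the statistics are assumed to have been collected separately from each platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Summary of one table, collected separately on each platform (hypothetical)."""
    row_count: int
    checksum: str


def reconcile(source: dict[str, TableStats], target: dict[str, TableStats]) -> dict[str, str]:
    """Return {table_name: problem} for every source table that does not match the target."""
    problems = {}
    for name, src in source.items():
        tgt = target.get(name)
        if tgt is None:
            problems[name] = "missing in target"
        elif tgt.row_count != src.row_count:
            problems[name] = f"row count {src.row_count} -> {tgt.row_count}"
        elif tgt.checksum != src.checksum:
            problems[name] = "checksum mismatch"
    return problems


# Illustrative values only; none of these tables or numbers come from the project.
source_stats = {
    "leases": TableStats(1000, "a1"),
    "parcels": TableStats(50, "b2"),
    "tax_lots": TableStats(7, "c3"),
}
target_stats = {
    "leases": TableStats(1000, "a1"),
    "parcels": TableStats(49, "b2"),
}
issues = reconcile(source_stats, target_stats)
```

An empty result would indicate that every source table has a matching counterpart in the target; any entry flags a table to investigate before the source system is retired.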
The challenge
Their biggest challenge was not being able to see all the information they needed, in the right place, at the right time. Most of their data was siloed, which caused inefficiencies across the business and pulled valuable time away from their data teams. They were facing heavy costs for data storage on Cloudera and were charged an additional annual licensing fee. Their on-premises server also imposed constraints.
The solution
The Blueprint team developed a strategy in which the names of the main data pipelines matched those used in the workstream approach. This kept naming consistent across the organization, and the key pipelines were grouped into Data Products that business stakeholders could easily recognize and understand. The team also learned, built on, and enhanced the customer’s orchestration tool so that code could be deployed automatically. Blueprint used a ‘move-and-improve’ technique to expedite the migration and retain critical business logic, including a complex Java application that calculates tax impacts for real estate investors. Beyond migrating the customer’s data to Databricks, we segmented areas for security and privacy reasons, giving the customer better control over their data governance procedures. At each stand-up, Blueprint shared a daily tracker with leadership to show how the development and testing phases were progressing.
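The naming strategy described above can be illustrated with a small sketch. It assumes a hypothetical convention in which each pipeline name begins with its workstream, followed by a double underscore; the actual names and convention used on the project are not given in this case study, and `group_by_data_product` is an illustrative helper rather than part of any tool mentioned here.

```python
from collections import defaultdict


def group_by_data_product(pipeline_names, separator="__"):
    """Group pipeline names into data products using the prefix before `separator`.

    A name without the separator is treated as its own data product.
    """
    products = defaultdict(list)
    for name in pipeline_names:
        product, _, _ = name.partition(separator)
        products[product].append(name)
    return {product: sorted(names) for product, names in products.items()}


# Hypothetical pipeline names, for illustration only.
pipelines = ["valuation__score", "tax__impact", "valuation__ingest"]
data_products = group_by_data_product(pipelines)
```

Because the grouping is derived from the names themselves, a consistent prefix is enough for engineers and business stakeholders to see which pipelines belong to the same Data Product.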
Impact
- Blueprint's engineering team ensured that data processing for each pipeline ceased completely in Cloudera once that pipeline was promoted to the production Databricks environment.
- The Lakehouse Optimizer was deployed, allowing the customer to potentially save money and reduce waste within their Databricks platform.
- The Blueprint team introduced the customer to CI/CD, automating deployment processes and thereby reducing the risk of deployment errors.
- The customer can access the right data, in the right place, at the right time, enabling them to make quick decisions and stay ahead of their competition.