The business case for Apache Beam

By Gary Nakanelua

You’ve just learned about a new streaming data processing technology that would solve many of the technical challenges you are experiencing within your organization today. Unfortunately, it would require significant time and budget to integrate and operationalize within your current solution.

Enter Apache Beam.

According to the project’s website, “Apache Beam provides an advanced unified programming model, allowing you to implement batch and streaming data processing jobs that can run on any execution engine.” It’s analogous to a general contractor: they rely on specialized subcontractors to perform the work, yet you only have to interact with the general contractor. If you need a new roof because a previous subcontractor did a sub-par job, you still only work with the general contractor. They don’t rebuild the entire house; they simply hire a new subcontractor to put on a new roof.
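To make that concrete, here is a minimal sketch of what a Beam pipeline looks like in the Python SDK. It assumes the apache-beam package is installed; the file names are illustrative placeholders, and the word-count transforms are an example, not anything from a production solution.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def run():
        # The runner plays the "subcontractor" role: DirectRunner executes the
        # pipeline locally, and it can be swapped for Dataflow, Spark, Flink, etc.
        # without touching the transforms below.
        options = PipelineOptions(runner="DirectRunner")
        with beam.Pipeline(options=options) as p:
            (
                p
                | "Read" >> beam.io.ReadFromText("input.txt")        # placeholder input file
                | "Split" >> beam.FlatMap(lambda line: line.split())
                | "PairWithOne" >> beam.Map(lambda word: (word, 1))
                | "Count" >> beam.CombinePerKey(sum)
                | "Format" >> beam.Map(lambda kv: f"{kv[0]}: {kv[1]}")
                | "Write" >> beam.io.WriteToText("counts")            # placeholder output prefix
            )

    if __name__ == "__main__":
        run()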

Dealing with “Out of Scope”

Today’s agile sprint teams are driven by their solution backlog. This backlog is filled with bugs, feature requests and spikes written to address needs the current solution should deliver. Yet how often does a feature get requested, only to have the technical team dismiss it as “out of scope”? They note that the original specification document never mentioned any need for stateful computations, event-time windowing or some other fancy set of words used to describe the technical approach your request would require. “If only you had made it part of the original requirements,” they say, “then we could have accounted for it in our architecture and approach.”

So another project team is started, one tasked with creating the “v-next” version of the original solution that will include all current functionality plus the newly requested features. It will be leaner, meaner and built on the latest technology so as to avoid the mistakes of the past. “It will scale with all your needs,” the super-motivated project team touts. Product backlogs are created. Releases are made. The world rejoices until an “out of scope” feature is requested. Then the cycle repeats itself. As a decision maker, how do you break this cycle?

Enter Apache Beam.

Beam gives you a unified, portable and extensible foundation from which to make your top-level streaming architecture decisions. I’ve had the pleasure of meeting and talking with Andrew Psaltis, author of “Streaming Data: Understanding the Real-Time Pipeline,” on several occasions. In his Apache Beam presentation at QCon in 2016, he noted:

“You can switch to whatever is more performant, more scalable, maybe something that requires a smaller footprint. Whatever your requirements are, it becomes easy to switch”.

You can view his presentation in its entirety at https://www.infoq.com/presentations/apache-beam.

Encouraging The “New Hotness”

Engineers and developers love working with new frameworks, libraries and APIs. Whether it’s for performance, ease of development, speed of deployment or just intellectual curiosity, the desire to utilize < insert new technology here /> will always be a topic of conversation within technical teams.

Consider stream processing computation engines. In the last six years, we’ve seen Storm, Spark, Flink and Apex grow in popularity (to name a few). Each is, or was, the “new hotness,” and all promise scalable, performant and fault-tolerant solutions to today’s streaming data problems. In practice, they all have their pros and cons when used within a solution for any given organization. How do you enable a technical team to stay relevant, curious and motivated to experiment with the next big thing without draining your budget?

Enter Apache Beam.

Admittedly, my interest in Apache Beam grew from a conversation I had with another engineer, Ryan Harris, at a local Apache Spark meetup. I’ve spent a lot of time with Spark and wanted to see what his excitement was all about.

I ran through the Python quick start at https://cloud.google.com/dataflow/docs/quickstarts/quickstart-python with a local runner. Next, I gave it a go with Google’s Cloud Dataflow runner. Finally, I ran it using the Spark runner. Aside from a few local development environment configuration adjustments (those were my own fault), Apache Beam let me experiment with capabilities from a few different technologies quickly.
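The switch between those runners happens through pipeline options rather than pipeline code. Below is a rough sketch of how one script can be pointed at different runners from the command line; the project, bucket and region values are placeholders, not settings from my environment.

    import sys
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def run(argv=None):
        # PipelineOptions parses runner-specific flags from the command line, so
        #   python pipeline.py --runner=DirectRunner
        #   python pipeline.py --runner=DataflowRunner --project=my-project \
        #       --region=us-central1 --temp_location=gs://my-bucket/tmp
        #   python pipeline.py --runner=SparkRunner
        # all execute the identical transforms below.
        options = PipelineOptions(argv)
        with beam.Pipeline(options=options) as p:
            (
                p
                | beam.Create(["hello beam", "hello runner"])
                | beam.FlatMap(str.split)
                | beam.combiners.Count.PerElement()
                | beam.Map(print)
            )

    if __name__ == "__main__":
        run(sys.argv[1:])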

You can check out the current Apache Beam capability matrix at https://beam.apache.org/documentation/runners/capability-matrix/. Don’t see the latest technology listed? Apache Beam is open source and has well-documented SDKs, so new runners can be created. Plus, Apache Beam is a core component of Google’s Cloud Dataflow service, so look for new additions to Apache Beam on a regular basis.

Conclusion

As a decision maker, you want the peace of mind that a technical solution can scale with future business needs and enable innovation within your organization through technology experimentation. Apache Beam is a worthwhile addition to a streaming data architecture to give you that peace of mind.
