Talk

Using Open Source Tech to Swap Out Components of Your Data Pipeline

Conference
Big Data & Machine Learning

A few years ago, moving data between applications and datastores involved expensive, monolithic stacks from large software vendors that offered little flexibility. Now, with frameworks such as Apache Beam and Apache Airflow, we can schedule and run data processing jobs for both streaming and batch with the same underlying code. This presentation demonstrates how these frameworks can glue your applications together, and shows how a data pipeline running from Apache Kafka through Apache Flink to Hive can be moved to Pub/Sub, Dataflow, and BigQuery by changing a few lines of Java in our Apache Beam code. We will also look at how this can be deployed on different clouds, such as Oracle Cloud or any other cloud provider.
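The core idea, keeping the transformation logic fixed while swapping only the endpoints, can be sketched in plain Java without Beam itself. In this minimal sketch, `Source`, `Sink`, `InMemorySource`, `ListSink`, and `runPipeline` are hypothetical stand-ins for Beam's I/O connectors (for example `KafkaIO` versus `PubsubIO`, or `HCatalogIO` versus `BigQueryIO`). A real Beam pipeline would express the same split with `PTransform`s and would typically pick the runner (Flink or Dataflow) through pipeline options rather than code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class SwappablePipeline {

    // Hypothetical input connector abstraction (think KafkaIO or PubsubIO).
    interface Source {
        List<String> read();
    }

    // Hypothetical output connector abstraction (think HCatalogIO or BigQueryIO).
    interface Sink {
        void write(List<String> records);
    }

    // Stand-in source that serves a fixed list of records.
    static final class InMemorySource implements Source {
        private final List<String> records;

        InMemorySource(List<String> records) {
            this.records = records;
        }

        public List<String> read() {
            return new ArrayList<>(records);
        }
    }

    // Stand-in sink that collects written records so they can be inspected.
    static final class ListSink implements Sink {
        final List<String> written = new ArrayList<>();

        public void write(List<String> records) {
            written.addAll(records);
        }
    }

    // The business logic: identical no matter which source and sink are plugged in.
    static final Function<String, String> TRANSFORM =
            s -> s.trim().toUpperCase(Locale.ROOT);

    // Wires source -> transform -> sink; only the arguments change between environments.
    static void runPipeline(Source source, Sink sink) {
        List<String> out = new ArrayList<>();
        for (String record : source.read()) {
            out.add(TRANSFORM.apply(record));
        }
        sink.write(out);
    }

    public static void main(String[] args) {
        // "On-premises" wiring (stand-in for Kafka -> Hive).
        ListSink onPrem = new ListSink();
        runPipeline(new InMemorySource(List.of(" order-1 ", "order-2")), onPrem);

        // "Cloud" wiring (stand-in for Pub/Sub -> BigQuery): same transform, new endpoints.
        ListSink cloud = new ListSink();
        runPipeline(new InMemorySource(List.of("order-3 ")), cloud);

        System.out.println(onPrem.written); // prints [ORDER-1, ORDER-2]
        System.out.println(cloud.written);  // prints [ORDER-3]
    }
}
```

Because `TRANSFORM` never references a concrete connector, moving between environments only touches the lines that construct the source and sink, which mirrors the "few lines of Java" claim above.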

Cloud
Data Streaming
Open Source
Serverless

Rustam Mehmandarov

Computas

Passionate computer scientist. Leader of JavaZone and former leader of javaBin, the Norwegian JUG. A Java Champion, Google Developers Expert (GDE) for Cloud, and JavaOne Rockstar. Public speaker.
