Gunnar Morling is a software engineer and open-source enthusiast at heart, currently working at Decodable on real-time ETL based on Apache Flink. In his prior role as a software engineer at Red Hat, he led the Debezium project, a distributed platform for change data capture. He is a Java Champion and has founded multiple open-source projects such as JfrUnit, kcctl, and MapStruct. Gunnar is an avid blogger (morling.dev) and has spoken at various conferences such as QCon, JavaOne, and Devoxx. He lives in Hamburg, Germany.
Your mission, should you decide to accept it, is the following: aggregate temperature values from a CSV file and group them by weather station name. There’s only one caveat: the file has 1,000,000,000 rows!
This is the task of the “One Billion Row Challenge”, which went viral within the Java community earlier this year. Join me for this talk, where I’ll dive into some of the tricks employed by the fastest solutions for processing the challenge’s 13 GB input file in less than two seconds. Parallelization and efficient memory access, optimized parsing routines using SIMD and SWAR, and custom map implementations are just some of the topics we are going to discuss.
I will also share some of the personal experiences and lessons learned from running this challenge for and with the community.
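To give a flavor of the kind of trick the abstract alludes to, here is a minimal sketch of a SWAR (SIMD Within A Register) technique commonly seen in fast 1BRC solutions: locating the `;` separator in eight input bytes at once by treating them as a single `long`. This is an illustrative example, not code from any particular submission; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a SWAR trick used by fast 1BRC solutions:
// scan 8 bytes per step for the ';' separator using plain long arithmetic.
public class SwarDemo {

    // Returns the index (0-7) of the first ';' byte in the little-endian
    // word, or 8 if the word contains no ';'.
    static int firstSemicolon(long word) {
        long semiPattern = 0x3B3B3B3B3B3B3B3BL;   // ';' (0x3B) in every byte
        long x = word ^ semiPattern;               // matching bytes become 0x00
        // Classic "has zero byte" trick: sets the high bit of the first zero byte.
        long match = (x - 0x0101010101010101L) & ~x & 0x8080808080808080L;
        return Long.numberOfTrailingZeros(match) >>> 3; // 64 -> 8 when absent
    }

    public static void main(String[] args) {
        // Pack "Hamburg;" into a long, little-endian (byte 0 = 'H').
        byte[] bytes = "Hamburg;".getBytes(StandardCharsets.US_ASCII);
        long word = 0;
        for (int i = 7; i >= 0; i--) {
            word = (word << 8) | (bytes[i] & 0xFFL);
        }
        System.out.println(firstSemicolon(word)); // prints 7
    }
}
```

Compared with a byte-by-byte loop, this processes eight bytes per iteration without branching on each byte, which is one of the reasons the top solutions get through 13 GB so quickly.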
As the saying goes: nothing is older than yesterday’s news, uhm, data. Join us for an immersive hands-on lab to explore real-time ETL using the triumphant trio Apache Flink, Debezium, and LangChain4j.
Participants will gain practical experience in setting up different end-to-end real-time data pipelines, streaming data from an operational database to an analytics data store—continuously, efficiently, and with a very low latency—enabling use cases such as full-text search and live dashboarding, enriched with LLM-derived metadata.
In the lab, you will learn how to:
- Build a real-time data pipeline from Postgres to OpenSearch, based on Apache Flink and Debezium for change data capture (CDC)
- Use Flink's connector capabilities to set up seamless real-time ETL pipelines between various data sources and sinks
- Implement data transformations, filtering, and aggregations on top of CDC streams in real time with the help of streaming SQL
- Integrate a large language model (LLM) for sentiment analysis based on LangChain4j, enabling deeper insights into the processed data
Join this lab to advance your skills in working with real-time data and learn how robust and leading open-source technologies support your business-critical stream processing workloads.
Please pull the following Docker images onto your laptop ahead of time. This will save some time and network bandwidth on the day of the event:
docker image pull quay.io/debezium/example-postgres:2.7.3.Final
docker image pull quay.io/debezium/tooling:latest
docker image pull docker.io/opensearchproject/opensearch:1.3.19
docker image pull docker.io/flink:1.19.1-scala_2.12-java17
docker image pull docker.io/hpgrahsl/hol-devoxxbe-model-serving-app:1.0.0
docker image pull docker.io/hpgrahsl/hol-devoxxbe-review-app:1.0.1
docker image pull docker.io/hpgrahsl/data-generator:1.1.4