Roy van Rijn is a director at OpenValue Rotterdam and a Java Champion. He has worked on numerous projects all over the Netherlands as a developer, architect and agile coach. He speaks at conferences all around the world, including CodeOne (Rockstar), Devoxx BE, UK and PL, GOTO, Joy of Coding, and local JUG events. You can read more on his blog (http://www.royvanrijn.com) or follow him on Twitter/X (@royvanrijn).
Your mission, should you decide to accept it, is the following: aggregate temperature values from a CSV file and group them by weather station name. There’s only one caveat: the file has 1,000,000,000 rows!
This is the task of the “One Billion Row Challenge”, which went viral within the Java community earlier this year. Come and join me for this talk, where I’ll dive into some of the tricks employed by the fastest solutions to process the challenge’s 13 GB input file in less than two seconds. Parallelization and efficient memory access, optimized parsing routines using SIMD and SWAR, and custom map implementations are just some of the topics we are going to discuss.
I will also share some of the personal experiences and lessons I learned while running this challenge for and with the community.
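The aggregation task described above can be sketched as a deliberately naive baseline. This is only an illustration, not the challenge's reference implementation: it assumes each row looks like `Hamburg;12.0` (station name, a semicolon, then the temperature) and that the aggregates of interest are minimum, mean and maximum per station. The class and method names (`BaselineAggregator`, `Stats`, `aggregate`) are made up for this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BaselineAggregator {

    // Running statistics for a single weather station.
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0;
        long count = 0;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        double mean() {
            return sum / count;
        }
    }

    // Groups rows by station name; assumes the "name;temperature" row format.
    static Map<String, Stats> aggregate(Iterable<String> lines) {
        Map<String, Stats> result = new TreeMap<>(); // sorted by station name
        for (String line : lines) {
            int sep = line.indexOf(';');
            String station = line.substring(0, sep);
            double value = Double.parseDouble(line.substring(sep + 1));
            result.computeIfAbsent(station, k -> new Stats()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("Hamburg;12.0", "Rotterdam;8.5", "Hamburg;-3.5");
        aggregate(sample).forEach((name, s) ->
                System.out.printf("%s=%.1f/%.1f/%.1f%n", name, s.min, s.mean(), s.max));
    }
}
```

A straightforward version like this allocates a `String` per row and parses every value with `Double.parseDouble`, which is the kind of overhead the faster solutions work to avoid.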
Last January a challenge was posted online by Gunnar Morling:
How fast can you parse a file with one billion rows of weather data using Java?
Little did we know this deceptively simple question would lead us down a path that covered: parallelism, memory-mapped files, SWAR techniques (SIMD within a register), bit twiddling, branchless code, mechanical sympathy, Graal native compilation and finally... turning to the dark side: using sun.misc.Unsafe.
Join Thomas and Roy during this deep dive where they explain all the improvements and tricks that enabled them to go from a >4 minute reference implementation to processing a billion rows in under two seconds.
Who knew Java could be _this_ fast?
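To give a flavour of the SWAR and branchless techniques mentioned above, here is a minimal sketch of finding a delimiter in eight bytes at once using ordinary `long` arithmetic. It is not taken from any particular submission. It assumes the delimiter is `;` and that the eight bytes are loaded little-endian (first byte in the lowest bits); `SwarDelimiter`, `firstSemicolon` and `pack` are hypothetical names used only for this example.

```java
public class SwarDelimiter {

    private static final long SEMICOLONS = 0x3B3B3B3B3B3B3B3BL; // ';' repeated in every byte
    private static final long LOW_BITS   = 0x0101010101010101L;
    private static final long HIGH_BITS  = 0x8080808080808080L;

    // Returns the index (0..7) of the first ';' byte in the word, or 8 if there is none.
    static int firstSemicolon(long word) {
        long x = word ^ SEMICOLONS;                   // bytes equal to ';' become 0x00
        long match = (x - LOW_BITS) & ~x & HIGH_BITS; // lowest set bit marks the first zero byte
        return Long.numberOfTrailingZeros(match) >>> 3; // bit position -> byte index, 64 -> 8
    }

    // Helper for the example: packs up to 8 ASCII chars, first char in the lowest byte.
    static long pack(String s) {
        long word = 0;
        for (int i = 0; i < Math.min(8, s.length()); i++) {
            word |= (long) (s.charAt(i) & 0xFF) << (8 * i);
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(firstSemicolon(pack("Ham;12.0")));
        System.out.println(firstSemicolon(pack("Rotterda")));
    }
}
```

The point of the trick is that the search contains no loop and no branch per byte: one XOR, one subtraction, two ANDs and a trailing-zero count cover all eight bytes.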