Industry researcher, member of the GraalVM team, working on Espresso: a meta-circular Java bytecode interpreter.
Large Language Models (LLMs) have become essential in many applications, but integrating them effectively into Java environments can still be challenging. This session will explore practical approaches to implementing local LLM inference using modern Java.
We'll demonstrate how to leverage the latest Java features to implement local inference for a variety of open-source LLMs, starting with Llama 2 and Llama 3 (Meta). Importantly, we'll show how the same approach can easily be extended to run other popular open-source models on standard CPUs without the need for specialized hardware.
Key topics we'll cover:
- Implementing efficient LLM inference engines in modern Java for local execution
- Utilizing Java 21+ features for optimized CPU-based performance
- Creating a flexible framework adaptable to multiple LLM architectures
- Maximizing standard CPU utilization for inference without GPU dependencies
- Integrating with LangChain4j for streamlined local inference execution
- Optimizing performance with the Java Vector API for accelerated matrix operations, and leveraging GraalVM to reduce latency and memory consumption
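To make the matrix-operations topic concrete, here is a minimal sketch of the matrix-vector product that dominates CPU inference time in transformer models. It is a plain scalar baseline, not the session's implementation: the class name `MatVec`, the method `matmul`, and the row-major weight layout are assumptions made for illustration. Rows are spread across cores with a parallel stream; the inner loop is the part a Vector API version would replace with SIMD lanes.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical helper for illustration; names and layout are assumptions.
final class MatVec {
    private MatVec() {}

    /**
     * Computes out[i] = sum over j of w[i * cols + j] * x[j].
     * The weight matrix w is stored row-major with shape rows x cols.
     * Each row is independent, so rows are processed in parallel.
     */
    static void matmul(float[] out, float[] x, float[] w, int rows, int cols) {
        IntStream.range(0, rows).parallel().forEach(i -> {
            float sum = 0f;
            int base = i * cols;
            // Scalar inner loop: the candidate for SIMD vectorization.
            for (int j = 0; j < cols; j++) {
                sum += w[base + j] * x[j];
            }
            out[i] = sum;
        });
    }

    public static void main(String[] args) {
        float[] w = {1f, 2f, 3f, 4f, 5f, 6f}; // 2 x 3 matrix, row-major
        float[] x = {1f, 0f, -1f};
        float[] out = new float[2];
        matmul(out, x, w, 2, 3);
        System.out.println(Arrays.toString(out));
    }
}
```

Parallelizing over rows keeps each output element owned by a single task, so no synchronization is needed on `out`.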
Join us to learn about implementing and optimizing local LLM inference for open-source models in your Java projects and creating fast and efficient AI applications using the latest Java technologies.