In the rapidly evolving landscape of AI, Java developers deserve native building blocks for creating innovative AI applications. While Python has enabled rapid development and is the backbone of model training, production-grade services and enterprises running on Java require access to local models and AI tools.
Enter JLama, a modern inference engine designed to bring the power of large language models directly into the Java ecosystem without requiring GPUs. JLama supports popular open models such as Llama, Gemma, and Mixtral, and leverages the incubating Vector API in Java 21 for faster inference.
Key features include:
  • Broad model support and tokenizer compatibility
  • Implementations of recent techniques such as Flash Attention, Mixture of Experts, and Grouped-Query Attention
  • Support for standard HuggingFace model formats and quantization (illustrated in the sketch after this list)
  • Distributed inference capabilities
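To make this concrete, here is a minimal sketch of loading a quantized model and generating text with JLama's core API. It follows the project's README at the time of writing; the package names, method signatures, and model path below are assumptions that may differ between releases, and the JVM typically needs to be started with --add-modules jdk.incubator.vector to enable the Vector API.

    import com.github.tjake.jlama.model.AbstractModel;
    import com.github.tjake.jlama.model.ModelSupport;
    import com.github.tjake.jlama.safetensors.DType;
    import com.github.tjake.jlama.safetensors.prompt.PromptContext;

    import java.io.File;
    import java.util.UUID;

    public class JlamaSketch {
        public static void main(String[] args) {
            // Assumes the model was already fetched from HuggingFace into ./models
            // (the path and model are illustrative).
            File modelPath = new File("./models/Llama-3.2-1B-Instruct-JQ4");

            // Load the model; the two DType arguments select the working-memory
            // and quantized-memory types (here: float32 activations, int8 weights).
            AbstractModel model = ModelSupport.loadModel(modelPath, DType.F32, DType.I8);

            // Use the model's chat template if it ships one, otherwise raw text.
            PromptContext ctx = model.promptSupport().isPresent()
                    ? model.promptSupport().get().builder()
                        .addUserMessage("Explain the JVM in one sentence.")
                        .build()
                    : PromptContext.of("Explain the JVM in one sentence.");

            // Generate up to 256 tokens at temperature 0 and print the result.
            var response = model.generate(UUID.randomUUID(), ctx, 0.0f, 256, (token, time) -> {});
            System.out.println(response.responseText);
        }
    }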
JLama is integrated into the LangChain4j project and, combined with the Java-native vector search capabilities of JVector, forms a comprehensive AI stack for Java.
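As a sketch of that integration: with the langchain4j-jlama module on the classpath, a JLama-backed model plugs into LangChain4j's standard chat model abstraction. The builder options and model name below are illustrative, and the exact API surface varies across LangChain4j versions.

    import dev.langchain4j.model.chat.ChatLanguageModel;
    import dev.langchain4j.model.jlama.JlamaChatModel;

    public class JlamaChatSketch {
        public static void main(String[] args) {
            // The model is pulled from HuggingFace on first use and cached locally.
            ChatLanguageModel model = JlamaChatModel.builder()
                    .modelName("tjake/Llama-3.2-1B-Instruct-JQ4")  // example model id
                    .temperature(0.3f)
                    .build();

            // Same ChatLanguageModel interface as any other LangChain4j provider.
            System.out.println(model.generate("What does JVector add to this stack?"));
        }
    }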
This talk will delve into JLama's technical intricacies and practical applications, including a live demo. Discover how JLama revolutionizes Java-AI integration, paving the way for innovative applications that harness the full potential of large language models.
Jake Luciani
DataStax
Jake Luciani is a seasoned professional with over 20 years of experience in distributed systems, finance, and manufacturing. Currently, he serves as Chief Architect at DataStax, where he oversees the Astra serverless DBaaS. Jake is an active member of the Apache Software Foundation and serves on the project management committees of Apache Cassandra, Arrow, and Thrift. He holds a B.S. in Computer Science with a minor in Cognitive Science from Lehigh University.