
Kohei Tokunaga, NTT, Inc.
Web browsers offer portable execution environments such as WebAssembly and WebGPU, making them convenient platforms for running LLMs. However, a browser cannot run a model that exceeds its memory capacity, which limits the range of models it can handle.
In this talk, Kohei will introduce LLMlet, a llama.cpp-based in-browser model runner that supports distributed LLM inference across browsers. By integrating llama.cpp’s distributed inference feature (RPC) with WebRTC, LLMlet can split a model and execute it across multiple P2P-connected browsers. The talk will provide a deep technical dive, explore potential use cases, and share the current status of integration with community tools.
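For context, the llama.cpp RPC feature that LLMlet bridges over WebRTC is normally used between native hosts roughly as follows. This is an illustrative sketch, not LLMlet itself: the hostnames, ports, and model path are placeholders, and exact flag names may differ across llama.cpp builds (the RPC backend must be enabled at build time with `-DGGML_RPC=ON`).

```shell
# On each worker machine, start an RPC server that will hold part of the model
# (rpc-server is built as part of llama.cpp when GGML_RPC is enabled).
rpc-server --host 0.0.0.0 --port 50052

# On the driver machine, list the workers via --rpc; llama.cpp distributes
# the model's layers across the listed backends. Placeholders throughout.
llama-cli -m model.gguf --rpc "worker1:50052,worker2:50052" -p "Hello"
```

LLMlet's contribution, per the abstract, is replacing the plain TCP transport between these RPC endpoints with WebRTC data channels, so that each "worker" can be another browser tab connected peer-to-peer.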
Conference Ticket — WASM I/O 26
2-Day Conference • AXA Convention Center, Barcelona • Mar 19–20, 2026

Early Bird — until December 4th
Standard — after December 4th, until February 19th
Late Bird — after February 19th (24 Feb 26 – 18 Mar 26)