On-Device AI Engineer. Building at RunAnywhere (YC W26).
I run large language models on phones — no server, no API, just native C++ and ARM silicon. Creator of ToolNeuron.
A technical deep dive into running large language models on Android using native C++ inference, JNI bindings, GGML, and llama.cpp — from model loading to token generation.