Siddhesh Sonar

On-Device AI Engineer. Building at RunAnywhere (YC W26).

I run large language models on phones — no server, no API, just native C++ and ARM silicon. Creator of ToolNeuron.


writing

How ToolNeuron Runs LLMs on Android: Architecture Deep Dive

A technical walkthrough of running large language models on Android with native C++ inference, JNI bindings, GGML, and llama.cpp, from model loading to token generation.