turned my laptop into a personal cloud inference node today. the stack: engine: lm studio / quantized 4-bit gguf. ingress: ngrok for the public https tunnel. frontend: custom html/js utilizing the streams api for real-time token delivery. basically built a private, zero-cost api. i can hit my local cpu from my phone or any remote device now. shared the static frontend with @beeta (Noywrit) to test concurrency is stable and latency is minimal via the local relay.