When it comes to large language models on edge devices, there’s arguably one metric that matters the most: time to first token, or TTFT. This is the time lag or latency between when you submit a ...