
SpeakClean: a Typeless clone with a local model

3 min read

Overview

SpeakClean is a macOS menu bar app inspired by Typeless, scoped down to the one workflow I actually use: hold a hotkey to dictate, release to paste cleaned text into the active app. Mic audio is transcribed on-device with Apple’s SpeechAnalyzer, and the transcript is rewritten by a locally served Gemma 4 E2B (via Ollama) to strip filler words. That’s the whole app: no feature parity with Typeless intended, no requests leaving the machine, no API keys, no subscription.

Architecture

Things I wanted but didn’t ship

Learnings

1. Prompt caching still helps when the model is resident

Ollama keeps the Gemma weights in memory across requests, but that alone didn’t give me the latency I wanted: a resident model still has to process every prompt token on each request. Reusing a stable cached prefix (system prompt + dictionary) on each request noticeably cut latency on top of that. “Model loaded” and “prompt cached” aren’t the same thing.
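A minimal sketch of what "stable prefix" can look like in Swift, under stated assumptions. The type names (ChatMessage, ChatRequest, CleanupPromptBuilder), the prompt wording, the "30m" keep-alive value, and the model tag are hypothetical and not taken from SpeakClean's source. The JSON field names (model, messages, stream, keep_alive) follow Ollama's /api/chat request format. The idea is that the first message is byte-identical on every request, so only the transcript at the end changes.

```swift
import Foundation

// One chat message in the shape Ollama's /api/chat endpoint accepts.
struct ChatMessage: Codable, Equatable {
    let role: String
    let content: String
}

// Request body for /api/chat. keep_alive asks Ollama to keep the model loaded.
struct ChatRequest: Codable {
    let model: String
    let messages: [ChatMessage]
    let stream: Bool
    let keepAlive: String

    enum CodingKeys: String, CodingKey {
        case model, messages, stream
        case keepAlive = "keep_alive"
    }
}

// Hypothetical helper: builds requests whose leading message never changes.
struct CleanupPromptBuilder {
    let model: String          // placeholder tag, not the real Gemma tag
    let systemPrompt: String   // assumed wording
    let dictionary: [String]   // user-specific terms to preserve

    // The stable prefix. Terms are sorted so that reordering the
    // dictionary in memory does not change the text of the prefix.
    var prefix: ChatMessage {
        let terms = dictionary.sorted().joined(separator: "\n")
        return ChatMessage(
            role: "system",
            content: systemPrompt + "\n\nDictionary:\n" + terms
        )
    }

    // Only the trailing user message varies between requests.
    func request(for transcript: String) -> ChatRequest {
        ChatRequest(
            model: model,
            messages: [prefix, ChatMessage(role: "user", content: transcript)],
            stream: false,
            keepAlive: "30m"
        )
    }

    // JSON body ready to send to the local server.
    func body(for transcript: String) throws -> Data {
        try JSONEncoder().encode(request(for: transcript))
    }
}
```

The resulting body would be sent as a POST to /api/chat on the local Ollama server (http://localhost:11434 by default). Anything that varies per request, such as a timestamp or the transcript itself, belongs after the prefix; putting it in the system message would change the prefix and likely defeat the reuse.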

2. Claude Code is weak on macOS / Swift API domain knowledge

It often confidently asserts the wrong thing for AppKit-level specifics and then writes code against its (wrong) mental model.

The clearest example in this project: listening for a global shortcut on macOS needs Accessibility permission, not Input Monitoring. The agent kept reaching for Input Monitoring and burned a long debugging loop around a permission that wasn’t the actual problem. Web search didn’t surface the fix either. I had to work it out by hand.
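For reference, a hedged sketch of checking Accessibility trust before installing a global key monitor. It assumes the shortcut is observed with NSEvent.addGlobalMonitorForEvents, which the post does not confirm; other mechanisms (for example a listen-only CGEventTap) fall under different permission rules. HotkeyMonitor and hasAccessibilityPermission are hypothetical names. AXIsProcessTrustedWithOptions and kAXTrustedCheckOptionPrompt are the real ApplicationServices symbols.

```swift
import AppKit
import ApplicationServices

// Returns true when this process is trusted for Accessibility.
// With prompt == true, macOS shows its own dialog pointing the user
// to System Settings when the process is not yet trusted.
func hasAccessibilityPermission(prompt: Bool) -> Bool {
    let key = kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String
    let options = [key: prompt] as CFDictionary
    return AXIsProcessTrustedWithOptions(options)
}

// Hypothetical wrapper around a global key-event monitor.
final class HotkeyMonitor {
    private var monitor: Any?

    // Installs the monitor only if Accessibility trust is in place.
    // Returns false when the permission is missing or the monitor
    // could not be installed.
    func start(onEvent: @escaping (NSEvent) -> Void) -> Bool {
        guard hasAccessibilityPermission(prompt: true) else { return false }
        monitor = NSEvent.addGlobalMonitorForEvents(
            matching: [.keyDown, .keyUp, .flagsChanged],
            handler: onEvent
        )
        return monitor != nil
    }

    // Removes the monitor if one is installed.
    func stop() {
        if let installed = monitor {
            NSEvent.removeMonitor(installed)
        }
        monitor = nil
    }
}
```

Checking trust up front turns a silent failure (the handler never fires) into an explicit false from start, which is easier to debug than chasing the wrong permission. Under Swift 6 strict concurrency, reading kAXTrustedCheckOptionPrompt may need extra annotation; this sketch assumes Swift 5 language mode.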

3. Agent coding can’t automate OS-interaction testing yet

A lot of the bugs only surfaced through real interaction — pressing the shortcut from different foreground apps, recording for different durations, checking that the pasted text actually lands in the right field. The agent can run swift test, but it can’t hold down a key, speak into a mic, or verify what ended up in Notes. Manual testing was the tight feedback loop for this project, and that shapes how the work divides between me and the agent.