
Why Echo Keeps Your Voice on Your Device

By Xiang · April 16, 2026 · 5 min read

Two months ago I started building a voice keyboard. Before writing a single line of code, I spent a week mapping out what every existing voice tool did with your audio.

Here's what I found:

| Tool | Where your voice goes | Recording stored |
| --- | --- | --- |
| Wispr Flow | Their servers | Yes (with opt-in "privacy mode") |
| Otter.ai | Their cloud | Yes, all of it |
| Google Voice Typing | Google servers | Yes |
| Apple Dictation | On-device | No (not saved anywhere) |
| Echo | Cloud for recognition, then discarded | Yes, locally only |

Apple Dictation was the closest to my ideal — but it's slow, imprecise, and you can't edit mistakes or use AI to polish your rambling thoughts.

So I built Echo differently.

What "local" actually means in Echo

Three specific promises:

1. Your audio is discarded immediately after recognition.

When you speak, the audio is sent to our ASR service (Doubao Seed-ASR 2.0 by default). The service transcribes it and returns the text. The audio is never saved, logged, or retained anywhere, and I, as the developer, have no access to any user audio.

2. Your transcripts stay in your device's local storage.

When Echo saves your recording history so you can review past transcriptions, that file lives in your iPhone or Mac's local sandbox. It's not uploaded to any cloud. It's not synced to our servers. If you delete the app, the data goes with it.

3. No training. Ever.

Your transcripts, your audio, your usage patterns — none of it is used to train any AI model. Not mine, not the ASR provider's, not any third party's.

What about the account?

Echo requires you to sign in if you want Pro (unlimited usage). But the account stores nothing about what you say: no transcripts, no audio, no usage patterns.

The free tier (5,000 words/week) works without any account at all.
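Because the free tier works without an account, the weekly word limit has to be tracked entirely on the device. Here is a minimal sketch of how such a local counter could work — written in Python purely for illustration (the app itself is Swift), with hypothetical names; this is not Echo's actual implementation:

```python
from datetime import date

FREE_WEEKLY_WORDS = 5000  # the free-tier limit from the post

class WeeklyQuota:
    """Local word counter that resets at each ISO-week boundary."""

    def __init__(self, limit=FREE_WEEKLY_WORDS):
        self.limit = limit
        self.week = None   # (ISO year, ISO week) the counter belongs to
        self.used = 0

    def record(self, transcript, today=None):
        """Count the transcript's words; return False if it would exceed quota."""
        today = today or date.today()
        week = tuple(today.isocalendar()[:2])
        if week != self.week:          # a new week starts: reset the counter
            self.week, self.used = week, 0
        words = len(transcript.split())
        if self.used + words > self.limit:
            return False               # over quota: reject, don't count
        self.used += words
        return True
```

Nothing here ever leaves the device — the counter is just a number in local storage, which is the point.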

Why this matters

Most voice input tools treat your voice as fuel for their product. Your speech trains their models. Your transcripts inform their roadmap. Your patterns feed their analytics.

I wanted a tool where your voice is just for you.

Not because I'm a privacy absolutist (I'm not — I use ChatGPT, I use Google Maps). But because voice input is deeply personal. You use it for messages to family, private notes, work emails, half-formed thoughts you're trying to get out of your head. That stuff deserves to stay yours.

Technical specifics (for the curious)

The ASR layer uses Volcano Engine's Doubao Seed-ASR 2.0 — a state-of-the-art Chinese + multilingual model (13 languages, +20% keyword recall over the previous generation). When you speak:

  1. Audio captured locally via iOS AVAudioEngine
  2. Streamed (or batched) to Volcano's servers via WebSocket / HTTPS
  3. Server returns transcription text
  4. Audio data is not retained on their side per the service agreement
  5. Text returned to Echo, saved to local SQLite on your device
  6. AI polish (OpenAI or Volcano Doubao) runs on the text, not audio
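The six steps above can be sketched as a single function. This is an illustrative mock in Python rather than the app's Swift code: `fake_asr` stands in for the cloud round-trip, and the table name is hypothetical. The point is the shape of the flow — audio in, text out, audio reference dropped, text kept locally:

```python
import sqlite3

def fake_asr(audio):
    """Stand-in for the cloud ASR round-trip (steps 2-3: audio in, text out)."""
    return "hello world"

def transcribe_and_store(audio, db):
    text = fake_asr(audio)     # steps 2-3: audio sent, transcription returned
    del audio                  # step 4: drop the local reference; per the
                               # service agreement, nothing is kept server-side
    db.execute("CREATE TABLE IF NOT EXISTS history (text TEXT)")
    db.execute("INSERT INTO history VALUES (?)", (text,))  # step 5: local SQLite only
    db.commit()
    return text                # step 6: AI polish runs on this text, never audio

db = sqlite3.connect(":memory:")   # the real app uses a file in the app sandbox
print(transcribe_and_store(b"\x00\x01", db))   # prints "hello world"
```

Everything durable lives in the local SQLite database; the audio exists only for the duration of the call.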

API keys are stored in Apple Keychain (biometric-locked, same as your password manager).

The trade-off with full on-device ASR

You might ask: why not run ASR fully on-device like Whisper?

I considered it. Whisper Small (500MB) or SenseVoice (900MB) can run on iPhone via CoreML. It's on the roadmap for Pro users who want zero-cloud operation.

The trade-off today: cloud ASR is still more accurate for Chinese and multilingual content. Seed-ASR 2.0 beats Whisper Large on Mandarin benchmarks. On-device Whisper Small would be a noticeable accuracy drop.

My current compromise: Send audio to cloud for recognition (discarded immediately), but keep everything else — history, analytics, AI polish context — on your device. This gives 95% of the privacy benefit with 100% of the accuracy.

What Echo doesn't do

Being honest about what's not pure: your audio does leave your device for recognition, and the no-retention guarantee depends on the ASR provider honoring its service agreement rather than on anything I can enforce.

For maximum privacy, wait for the on-device Whisper option (planned for later this year).

Try it

Free to download, works without an account:

Download Echo (iOS) →

Questions or concerns? DM me on X @EchoVoiceApp or email hello@echovoice.me.

— Xiang, solo maker of Echo