What is a semantic voice keyboard?
A semantic voice keyboard is not just speech-to-text. It is a full input system where voice, visible text, correction, personal vocabulary, and keyboard editing work together.
Most dictation tools are built around the recording: you speak, wait, receive a transcript, and only then find out whether the tool understood you. That works for short commands, but it breaks down when you are writing a real message, a long email, a product note, or a bilingual reply.
Echo starts from a different assumption: voice input should feel closer to typing. Text should appear while you speak. It should stay in the current text field. If the output is almost right, you should be able to fix it without switching tools. If the raw transcript is too rough, you should be able to polish it without leaving the keyboard.
The three-layer loop
| Layer | What it does | Why it matters |
|---|---|---|
| Stream | Shows words while you speak. | You do not wait in a blank loading state. |
| Finalize | Repairs homophones, punctuation, missing words, and boundaries after you pause. | The live draft becomes more reliable without losing speed. |
| Auto Edit | Turns speech into a cleaner message, email, list, translation, or rewrite. | AI is available when you want it, not forced into every input. |
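
To make the division of labor concrete, here is a minimal sketch of the three layers as plain Python functions. It is an illustration, not Echo's implementation: the names `Draft`, `stream`, `finalize`, and `auto_edit` are hypothetical, and the dictionary lookup is a toy stand-in for real homophone and punctuation repair.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Draft:
    """Text state for one utterance (hypothetical structure)."""
    live_words: List[str] = field(default_factory=list)
    final_text: Optional[str] = None


def stream(draft: Draft, word: str) -> str:
    """Layer 1: append a recognized word and return the visible live text."""
    draft.live_words.append(word)
    return " ".join(draft.live_words)


def finalize(draft: Draft, corrections: Dict[str, str]) -> str:
    """Layer 2: after a pause, repair known mistakes and tidy punctuation.

    `corrections` is a toy lookup table; a real system would use context.
    """
    words = [corrections.get(w.lower(), w) for w in draft.live_words]
    text = " ".join(words)
    if text:
        text = text[0].upper() + text[1:]
        if text[-1] not in ".?!":
            text += "."
    draft.final_text = text
    return text


def auto_edit(text: str, rewrite: Optional[Callable[[str], str]] = None) -> str:
    """Layer 3: optional. Without a rewrite function the text is unchanged."""
    return rewrite(text) if rewrite else text


draft = Draft()
for w in ["meet", "me", "their", "at", "noon"]:
    live = stream(draft, w)          # live text grows word by word
final = finalize(draft, {"their": "there"})
polished = auto_edit(final)          # no rewrite requested, so same text
```

The point of the sketch is the ordering: the live text is always visible, the repair pass runs only after a pause, and the rewrite step does nothing unless the caller asks for it.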
Why the keyboard matters
Dictation often fails at the final ten percent. One name is wrong. A technical term is split. Chinese punctuation is off. A sentence should be shorter. If voice input lives in one tool and editing lives in another, you pay a context-switching tax.
Echo puts the loop inside a keyboard on iOS and a quick input path on macOS. That means the output lands where you are already writing: Messages, Mail, Notes, Slack, ChatGPT, Cursor, Gmail, Notion, WeChat, or any normal text field.
Who it is for
- People who write many short replies and do not want to type every sentence.
- Bilingual users who switch between Chinese and English inside the same thought.
- Founders, engineers, PMs, salespeople, and creators who turn rough thoughts into daily written output.
- Anyone who likes voice input but hates cleaning up transcripts afterward.
The core promise is simple: speak naturally, see editable text quickly, and keep control of the final wording.