Text to Speech
Free text to speech with natural AI voices. Adjustable rate & pitch, word highlighting for proofreading, accessibility reader. Private, offline, no signup.
About the Text to Speech Tool
This free text-to-speech reader uses the Web Speech API built into modern browsers, so text is spoken through the voices your device provides, including natural and neural (AI) voices where the system offers them. With a locally installed voice, no text is uploaded, nothing is stored on a server, and the tool works offline once the page is loaded; no signup is required. Choose any voice your operating system provides, tune the speaking rate, pitch and volume, and follow the live word highlight while it speaks. A sentence-chunking engine reads long documents end-to-end without Chrome's usual cut-off, and the progress bar tracks the real reading position. It is ideal for proofreading drafts aloud, learning pronunciation in a foreign language, recording quick voice-overs, studying with dyslexia or ADHD, and making content accessible in the way a screen reader does. To save the speech as MP3, play it while capturing system audio with your OS recorder (Windows Game Bar, or QuickTime Player on macOS, which usually needs a loopback audio device to pick up system audio) or a virtual audio cable into Audacity.
How does this text-to-speech tool work?
The tool calls the browser's built-in window.speechSynthesis interface, part of the W3C Web Speech API. When you click Speak, your text is handed to the speech engine behind the selected voice, normally the operating system's own: for example, the Windows speech engine (SAPI and OneCore voices) on Windows, AVSpeechSynthesizer on macOS and iOS, Google Text-to-Speech on Android and Chromebooks, or eSpeak NG on many Linux distributions. The engine generates audio waveforms locally and plays them through your speakers. With a locally installed voice no data leaves your device, which is why the tool is private and works without an internet connection once the page is loaded. Most voices you see come from the operating system, so the list of voices changes depending on which device and OS you are using.
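As a rough illustration of the call described above, the sketch below hands a string to the browser's speechSynthesis object. The function name speakText and its options argument are hypothetical and are not taken from the tool's source; the feature check makes the snippet return false instead of throwing where the API is missing.

```typescript
// Hypothetical helper; the tool's real source is not shown on this page.
interface SpeakOptions {
  rate?: number;   // 1.0 is the voice's natural speed
  pitch?: number;  // 1.0 is the voice's natural pitch
  volume?: number; // 0 to 1
}

function speakText(text: string, options: SpeakOptions = {}): boolean {
  const g = globalThis as any;
  // Feature-detect: the Web Speech API exists only where the browser implements it.
  if (!g.speechSynthesis || typeof g.SpeechSynthesisUtterance !== "function") {
    return false;
  }
  const utterance = new g.SpeechSynthesisUtterance(text);
  utterance.rate = options.rate ?? 1;
  utterance.pitch = options.pitch ?? 1;
  utterance.volume = options.volume ?? 1;
  // Queue the utterance; the browser passes it on to the speech engine.
  g.speechSynthesis.speak(utterance);
  return true;
}
```

In a browser console, speakText("Hello there", { rate: 1.1 }) should start playback with the default voice and return true.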
Why do I see different voices on different devices?
Voices are not bundled with the website; they are bundled with your operating system, browser, and any extra language packs you install. A fresh Windows 11 machine typically ships with Microsoft David and Zira in English plus one default voice per installed display language. macOS includes Siri voices and dozens of older system voices like Samantha, Daniel and Karen. Android devices use the Google Text-to-Speech engine, which can download additional high-quality voices on demand. Chromebooks add Google natural voices over the network. To get more voices, open your OS settings, look for a Speech, Spoken Content or Language pack option, and install the languages or voice qualities you want. They will appear in this dropdown the next time you load the page.
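To make the per-device voice list more concrete, here is a small sketch that buckets a voice list by primary language. VoiceInfo mirrors three real fields of the browser's SpeechSynthesisVoice object (name, lang, localService); the helper name groupByLanguage is an assumption made for illustration.

```typescript
// Structural subset of the browser's SpeechSynthesisVoice object.
interface VoiceInfo {
  name: string;
  lang: string;          // language tag such as "en-US"
  localService: boolean; // true when the voice is synthesized on the device
}

// Hypothetical helper: bucket voices by primary language ("en", "fr", ...).
function groupByLanguage(voices: readonly VoiceInfo[]): Record<string, VoiceInfo[]> {
  const groups: Record<string, VoiceInfo[]> = {};
  for (const voice of voices) {
    // Accept both "en-US" and "en_US" style tags.
    const primary = voice.lang.split(/[-_]/)[0].toLowerCase();
    const bucket = groups[primary] || (groups[primary] = []);
    bucket.push(voice);
  }
  return groups;
}

// In a browser the list can be empty until the voices have loaded, so a page
// would typically read it again when the voiceschanged event fires:
//   speechSynthesis.onvoiceschanged = () => groupByLanguage(speechSynthesis.getVoices());
```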
What do the rate, pitch and volume sliders do?
Rate controls speaking speed, ranging from 0.5x (half speed) to 2.0x (double speed). A rate of 1.0 is the voice's natural cadence, around 150 to 180 words per minute for most English voices. Pitch shifts the fundamental frequency of the voice: 0 sounds very low and growly, 1.0 is the natural pitch, and 2.0 is a high cartoon-like tone. Volume scales playback from silence (0) to maximum (1.0); this is independent of your system volume, so set both for the final level. Try a few combinations to find a voice you can listen to comfortably for long periods — many listeners prefer 1.1x rate with a slightly lower pitch for sustained reading.
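The slider ranges above can be written down as a small settings helper. This is only a sketch: normalizeSettings and estimateMinutes are hypothetical names, the limits are the ones quoted in the paragraph, and the reading-time estimate assumes that words per minute scale linearly with rate from a base of 165 wpm (the middle of the 150 to 180 range), which real voices only approximate.

```typescript
interface VoiceSettings {
  rate: number;   // 0.5 to 2.0 on this tool's slider
  pitch: number;  // 0 to 2.0
  volume: number; // 0 to 1.0
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Fill in missing values with the neutral default (1) and keep each in range.
function normalizeSettings(input: Partial<VoiceSettings>): VoiceSettings {
  return {
    rate: clamp(input.rate ?? 1, 0.5, 2),
    pitch: clamp(input.pitch ?? 1, 0, 2),
    volume: clamp(input.volume ?? 1, 0, 1),
  };
}

// Assumption: speed in words per minute is baseWpm multiplied by rate.
function estimateMinutes(wordCount: number, rate: number, baseWpm: number = 165): number {
  return wordCount / (baseWpm * rate);
}
```

Under that assumption a 1,650-word draft takes about 10 minutes at 1.0x and about 5 minutes at 2.0x.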
Can I save the spoken audio as an MP3 or WAV file?
Not directly. The Web Speech API exposes only playback; it does not return the raw waveform to JavaScript, so the page has no way to encode the speech into an audio file. The API defines no capture or export interface, and browsers do not add one. To capture audio, use your operating system's built-in screen recorder (Windows Game Bar, macOS QuickTime Player, Chromebook Screen Capture) or a virtual audio cable plus any free audio recorder while the tool is playing; on macOS, recording system audio usually requires a loopback audio device as well. For an automated file export, you would need a cloud TTS service such as Amazon Polly, Google Cloud TTS, or Microsoft Azure Speech. These return MP3 or WAV files but are billed by usage beyond any free tier.
Why does speech cut off or stop unexpectedly in Chrome?
Chrome has a known limit of around 15 seconds per utterance and may silently stop long passages. The tool mitigates this by splitting long text into sentence-sized utterances (the 'Auto-split long text' switch, on by default) and by issuing a resume() nudge right after speak(), which keeps the engine awake on most recent Chrome versions. If you still hit truncation, for example with the switch turned off, split long passages into shorter paragraphs and click Speak once per paragraph, or switch to Microsoft Edge, whose higher-quality Azure voices do not appear to have this limit. Firefox and Safari generally handle long utterances reliably. Pausing and resuming repeatedly can also cause Chrome to drop the queue; a single Stop followed by Speak is the safest recovery.
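The resume() nudge and the Stop-then-Speak recovery mentioned above could look roughly like this. SynthLike is a hypothetical interface covering only the three real SpeechSynthesis methods used (speak, resume, cancel), so the logic can be tried with a stand-in object; the helper names are assumptions, not the tool's own code.

```typescript
// Minimal shape of the parts of window.speechSynthesis used below.
interface SynthLike<U> {
  speak(utterance: U): void;
  resume(): void;
  cancel(): void;
}

// Queue the utterance, then call resume() straight away as a nudge.
function speakWithNudge<U>(synth: SynthLike<U>, utterance: U): void {
  synth.speak(utterance);
  synth.resume();
}

// Recovery path: clear the queue (Stop), then speak again from scratch.
function restartSpeech<U>(synth: SynthLike<U>, utterance: U): void {
  synth.cancel();
  speakWithNudge(synth, utterance);
}
```

In a browser the first argument would be window.speechSynthesis and the second a SpeechSynthesisUtterance.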

How can I control pronunciation and pauses?
The Web Speech API does not accept SSML markup in most browsers, so pacing has to be done through punctuation. Commas insert a short pause (roughly 150 ms with many voices, though the exact length depends on the engine), semicolons and dashes give a mid-length pause, and periods and question marks add a longer stop with falling or rising intonation. For a longer silence, try an ellipsis or a row of dots on its own line; how long the gap lasts varies by voice, and some engines may ignore it. For pronunciation, you can phonetically respell tricky words, for example writing 'Vietnam' as 'vee-et-nam' or 'IPv6' as 'I P V six'. Acronyms in all caps are usually read letter by letter, while mixed-case words are usually read as a word. Test different spellings and pick the one that sounds best with your chosen voice.
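If the same tricky words come up often, the respelling trick can be automated with a small lookup table applied before the text is spoken. The sketch below is an assumption about how one might do that, not a feature of the tool; applyRespellings and the table contents are illustrative, using the two examples from the paragraph above.

```typescript
// Example table: written form on the left, spoken respelling on the right.
const RESPELLINGS: Record<string, string> = {
  "Vietnam": "vee-et-nam",
  "IPv6": "I P V six",
};

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(source: string): string {
  return source.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace whole-word, case-sensitive matches of each table key.
function applyRespellings(text: string, table: Record<string, string>): string {
  let result = text;
  for (const word of Object.keys(table)) {
    const pattern = new RegExp("\\b" + escapeRegExp(word) + "\\b", "g");
    // A function replacer avoids "$" in the respelling being treated specially.
    result = result.replace(pattern, () => table[word]);
  }
  return result;
}
```

Only the string passed to the speech engine would be respelled; the text shown on screen can stay as written.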
Is this tool really private?
Yes, provided you use a locally installed voice. All processing then happens inside the browser tab using your operating system's local speech engine. The text you type never leaves your computer; we do not send it to our server, to any analytics platform, or to any third-party TTS provider. You can verify this by opening your browser's developer tools, switching to the Network panel and clicking Speak: the page makes no outgoing requests. The exception is network voices, which the browser reports as non-local. Chromebook 'natural' voices are delivered by Google over the network and say 'natural' in the voice name, and the Microsoft 'Online (Natural)' voices in Edge are likewise synthesized online. If privacy is critical, deselect those and choose a voice marked as local-only or system-default.
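One way to stay on local voices is to filter on the localService flag that each SpeechSynthesisVoice carries. The following is a hedged sketch: VoiceLike lists just the fields used, pickLocalVoice is a hypothetical helper, and the flag is only as accurate as what the browser reports.

```typescript
// Subset of SpeechSynthesisVoice needed for the check.
interface VoiceLike {
  name: string;
  lang: string;
  localService: boolean; // false means the voice is served over the network
}

// Hypothetical helper: choose an on-device voice for the requested language.
function pickLocalVoice(voices: readonly VoiceLike[], lang: string): VoiceLike | undefined {
  const wanted = lang.toLowerCase();
  const primary = wanted.split(/[-_]/)[0];
  const local = voices.filter((v) => v.localService);
  // Prefer an exact tag match, then fall back to the same primary language.
  return (
    local.find((v) => v.lang.toLowerCase() === wanted) ??
    local.find((v) => v.lang.toLowerCase().split(/[-_]/)[0] === primary)
  );
}
```

A result of undefined means no on-device voice is installed for that language, in which case installing a language pack is the private option.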
What are the best settings for recording a voice-over?
For clean placeholder narration, start with a natural or neural voice: on Edge look for the Microsoft 'Online (Natural)' voices, on macOS the Siri voices, on Android the Google network voices, which all sound far more human than legacy eSpeak voices. Set the rate between 0.95x and 1.1x: slightly under 1.0 reads more deliberately and is easier to edit, while a touch over 1.0 keeps energy up for explainer videos. Keep pitch at 1.0 unless you want a deliberately deeper or brighter character, and set volume to 100% so your recorder captures the strongest signal, then trim levels afterwards. Because the Web Speech API cannot export audio directly, route the playback into your OS recorder: on Windows use the Game Bar or a virtual audio cable into Audacity; on macOS use QuickTime audio recording, which usually needs a loopback audio device to pick up system audio. Punctuate carefully, because commas and periods control your pacing and breaths, and use the live word highlight to follow along and catch any mispronounced terms before you hit record.
How do I read very long documents without the audio cutting off?
Leave the 'Auto-split long text' switch on (it is enabled by default). Instead of sending your whole document as one request — which Chromium silently stops after about 15 seconds — the tool breaks the text into sentence-sized chunks using the browser's sentence segmenter and speaks them back-to-back, re-applying your chosen voice, rate, pitch and volume to every chunk. Very long sentences are further wrapped at the nearest comma or space so no single chunk hits the engine limit. The progress bar and the in-text highlight are driven by the real boundary position reported by the speech engine, not a clock estimate, so they stay accurate at any speed, voice or language — including non-English voices where word counting is unreliable. This means a 5,000-character article, a chapter, or a full script reads from start to finish on Chrome, Edge, Firefox and Safari without manual paragraph-by-paragraph clicking. If you ever need the legacy single-utterance behaviour, simply turn the switch off.
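The chunking described above can be sketched as follows. This is an illustration under stated assumptions rather than the tool's actual code: the function names, the 200-character limit and the regular-expression fallback are all hypothetical, while Intl.Segmenter with sentence granularity is the standard browser segmenter.

```typescript
// Split text into trimmed sentences, using Intl.Segmenter where available.
function splitSentences(text: string, locale: string = "en"): string[] {
  const SegmenterCtor = (Intl as any).Segmenter;
  let pieces: string[];
  if (typeof SegmenterCtor === "function") {
    const segmenter = new SegmenterCtor(locale, { granularity: "sentence" });
    const segments = segmenter.segment(text) as Iterable<{ segment: string }>;
    pieces = Array.from(segments, (s) => s.segment);
  } else {
    // Fallback for older runtimes: break after ".", "!" or "?".
    pieces = text.match(/[^.!?]+[.!?]*/g) || [];
  }
  return pieces.map((p) => p.trim()).filter((p) => p.length > 0);
}

// Wrap one over-long sentence at the last comma, else the last space, within the limit.
function wrapLong(sentence: string, maxLen: number): string[] {
  const limit = Math.max(1, Math.floor(maxLen));
  const parts: string[] = [];
  let rest = sentence.trim();
  while (rest.length > limit) {
    const head = rest.slice(0, limit);
    let cut = head.lastIndexOf(",") + 1;       // break just after a comma
    if (cut <= 0) cut = head.lastIndexOf(" "); // otherwise at a space
    if (cut <= 0) cut = limit;                 // otherwise a hard cut
    const part = rest.slice(0, cut).trim();
    if (part.length > 0) parts.push(part);
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) parts.push(rest);
  return parts;
}

// Full pipeline: sentences first, then wrapping of any that exceed maxLen.
function chunkText(text: string, maxLen: number = 200, locale: string = "en"): string[] {
  const chunks: string[] = [];
  for (const sentence of splitSentences(text, locale)) {
    chunks.push(...wrapLong(sentence, maxLen));
  }
  return chunks;
}
```

In a browser, each returned chunk would become its own SpeechSynthesisUtterance with the chosen voice, rate, pitch and volume re-applied, and the chunks would be queued in order with speechSynthesis.speak().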
Who benefits the most from text-to-speech?
Writers use it to proofread drafts, because the ear catches awkward phrasing and dropped words that the eye glides over. Language learners use it to hear native pronunciation of vocabulary lists. People with dyslexia, ADHD or low vision use it as an assistive reading tool. Podcasters and YouTubers generate quick voice-overs for placeholder narration. Teachers turn handouts into audio versions for accessibility. Developers test interfaces with screen-reader-like output. Drivers and commuters convert articles into hands-free audio. The tool is intentionally lightweight and free so anyone — including users with slow connections or older hardware — can use it without signup, without payment, and without installing anything.
