AI Translator
Free, private AI translator that runs NLLB-200 in your browser. Translate between 15 languages with WebGPU acceleration. No upload, no API key, no usage limits.
About AI Translator
AI Translator is a fully client-side neural machine translation tool that runs Meta's NLLB-200 distilled 600M model directly in your browser using Transformers.js and WebGPU. Unlike Google Translate, DeepL, or ChatGPT, it never sends your text off your device: the model weights are fetched once from a public CDN, and all translation then runs locally on your CPU or GPU. Lawyers, doctors, journalists, and developers can therefore translate confidential drafts, source code comments, NDAs, and internal documentation without sending them to a third party. The tool exposes 15 commonly used languages out of the 200 that NLLB-200 covers: English, Vietnamese, Spanish, Portuguese, French, German, Italian, Russian, Simplified Chinese, Japanese, Korean, Arabic, Hindi, Thai, and Indonesian. The first translation triggers a one-time download of about 600 megabytes of quantized weights, which the browser then caches until you clear your browser data. After that, translations cost no further bandwidth, and a paragraph completes in under two seconds on a modern laptop with WebGPU (the CPU fallback is slower). Long texts are split at sentence boundaries so that each piece fits the model's 1,024-token context, and the translated pieces are then stitched back together. The tool is best suited to short and medium passages (under 5,000 characters); for book-length documents, translate a few pages at a time, or spread batches across several browser tabs if your machine has memory to spare, since each tab loads its own copy of the model.
Which translation model does this tool use and how good is it?
We use NLLB-200 distilled 600M, a compact version of Meta's open No Language Left Behind (NLLB) model, which was trained on 200 languages including dozens of low-resource ones. On standard benchmarks like FLORES-200, NLLB-200 distilled 600M scores within a few BLEU points of much larger commercial systems for high-resource pairs (English/Spanish, English/French, English/Chinese) and meaningfully outperforms older open models for low-resource pairs like English/Vietnamese, English/Khmer, or English/Yoruba. The distilled version is a smaller, faster student model trained on the outputs of the 54B-parameter mixture-of-experts teacher; it trades some of the teacher's quality for a size small enough to run locally. For dense technical jargon, idioms, or stylistically demanding literary content you may still prefer DeepL or a fine-tuned LLM, but for everyday emails, web pages, legal boilerplate, and source-code comments NLLB-200 is more than sufficient, with the added advantage of being fully private.
Why does the first translation take so long?
The first time you click Translate, the browser must download the NLLB-200 weights (roughly 600 MB in quantized form) from a public CDN and load them into a WebGPU or WebAssembly inference session. Expect anywhere from 30 seconds to a few minutes, depending on your connection. Once cached, the weights stay in your browser's storage (IndexedDB / Cache API) and load in under a second on subsequent visits, so later translations of a paragraph take 1-3 seconds with WebGPU (longer on the WASM fallback). If you clear your browser data, the model must be downloaded again. To verify caching, open DevTools and go to Application > Cache Storage; after the first run you should see entries for huggingface.co or jsdelivr URLs.
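If you prefer to check the cache from code rather than by clicking through DevTools, the sketch below walks the Cache Storage API and lists cached URLs whose address contains huggingface.co or jsdelivr. The function name, the host list, and the structural interfaces are illustrative assumptions: the cache names this tool actually uses are not documented here, and anything stored in IndexedDB will not show up.

```typescript
// Hypothetical structural types so the sketch compiles without the DOM typings;
// the browser's global `caches` object satisfies CacheStorageLike.
interface CachedRequestLike {
  url: string;
}
interface CacheLike {
  keys(): Promise<readonly CachedRequestLike[]>;
}
interface CacheStorageLike {
  keys(): Promise<string[]>;
  open(cacheName: string): Promise<CacheLike>;
}

// Assumption: model files come from these hosts, as the FAQ above describes.
const MODEL_HOSTS = ["huggingface.co", "jsdelivr"];

// Walks every Cache Storage bucket and collects cached URLs from model hosts.
async function listCachedModelFiles(storage: CacheStorageLike): Promise<string[]> {
  const found: string[] = [];
  for (const cacheName of await storage.keys()) {
    const cache = await storage.open(cacheName);
    for (const request of await cache.keys()) {
      if (MODEL_HOSTS.some((host) => request.url.includes(host))) {
        found.push(request.url);
      }
    }
  }
  return found;
}
```

With the type annotations removed, the function can be pasted into the DevTools console and run as `listCachedModelFiles(caches).then(console.table)`. An empty result after a completed first translation suggests the weights were stored elsewhere (for example IndexedDB) or were not cached.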
Is my text really private? Where does it go?
Yes. Your text is processed entirely inside your browser tab by JavaScript and WebGPU. There is no server-side translation API, and no network request ever includes your text. You can verify this by opening the DevTools Network tab before clicking Translate: you will see model weights being fetched from a CDN, but never your input text being sent anywhere. This makes the tool well suited to confidential legal or medical translation, internal corporate drafts, journalism source material, NDAs, and any other content that must not be sent to Google or OpenAI. The model weights themselves were openly released by Meta in 2022; they are plain data files and contain no telemetry or call-home code.
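The Network tab is the primary check. As a complementary sketch, the Resource Timing API (`performance.getEntriesByType("resource")`) records every URL the page has requested, so grouping those entries by hostname makes an unexpected destination easy to spot. The function name `summarizeHosts` and the `ResourceEntryLike` type are illustrative assumptions. Resource Timing records URLs, not request bodies, and its buffer keeps only 250 entries by default, so this does not replace inspecting requests in DevTools.

```typescript
// Hypothetical structural stand-in for PerformanceResourceTiming:
// `name` holds the requested URL, `initiatorType` says what triggered it.
interface ResourceEntryLike {
  name: string;
  initiatorType: string;
}

// Groups resource entries by hostname, listing the distinct initiator types
// seen for each host, so unexpected destinations stand out.
function summarizeHosts(entries: readonly ResourceEntryLike[]): Map<string, string[]> {
  const byHost = new Map<string, string[]>();
  for (const entry of entries) {
    let host: string;
    try {
      host = new URL(entry.name).hostname;
    } catch {
      continue; // skip entries whose name is not an absolute URL
    }
    const types = byHost.get(host) ?? [];
    if (!types.includes(entry.initiatorType)) types.push(entry.initiatorType);
    byHost.set(host, types);
  }
  return byHost;
}
```

After a translation, `summarizeHosts(performance.getEntriesByType("resource") as PerformanceResourceTiming[])` should show only the page's own origin and the CDN hosts that served the model.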

Why is WebGPU faster than WASM and how do I enable it?
WebGPU is a modern browser API that lets JavaScript run computations directly on your graphics card. Transformer inference is dominated by large matrix multiplications, which a GPU executes in parallel across many cores, so for a 600M-parameter model like NLLB-200, WebGPU is typically 5-20x faster than the WebAssembly (CPU) backend. The tool auto-detects WebGPU support and uses it when available; if so, a green 'WebGPU' badge appears at the top of the page. WebGPU ships enabled in Chrome 113+, Edge 113+, and recent Opera and Brave releases. In Firefox, availability depends on version and platform; where it is not on by default, you can enable it in about:config via dom.webgpu.enabled. Safari supports it in Technology Preview and recent stable builds. With WebGPU, a paragraph translates in about 1-2 seconds on a laptop iGPU; on pure WASM the same paragraph takes 8-15 seconds. If you see the yellow 'WASM' badge, your browser either does not support WebGPU or could not provide a GPU adapter, and the tool has automatically fallen back to the CPU.
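One common way to implement the auto-detection described above is to check that `navigator.gpu` exists and that `requestAdapter()` resolves to a non-null adapter. The sketch below assumes this approach (the tool's actual check is not documented here) and uses hypothetical structural types `GpuLike` and `NavigatorLike` so it compiles without the `@webgpu/types` package.

```typescript
// Hypothetical structural stand-ins for navigator.gpu and GPUAdapter, so the
// sketch compiles without the @webgpu/types package.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type Backend = "webgpu" | "wasm";

// Prefer WebGPU only if the API exists AND an adapter is actually available.
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) {
    return "wasm"; // API not exposed at all (unsupported browser or flag off)
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null ? "webgpu" : "wasm"; // null: no usable GPU adapter
  } catch {
    return "wasm"; // treat any adapter error as "no GPU"
  }
}
```

In a page, call `pickBackend(navigator as unknown as NavigatorLike)`. If the tool is built on Transformers.js v3, the result could be passed as the `device` option of `pipeline()`, which accepts `"webgpu"` and `"wasm"`; treat that wiring as an assumption rather than a description of this tool's code.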
What languages and language pairs are supported?
The full NLLB-200 model supports 200 languages and translation in any direction between them; for UX simplicity, this build exposes 15 of the most-requested ones: English, Vietnamese, Spanish, Portuguese (Brazilian and European Portuguese share a single NLLB code), French, German, Italian, Russian, Simplified Chinese, Japanese, Korean, Arabic, Hindi, Thai, and Indonesian. You can translate in either direction between any two of these, so 15 x 14 = 210 directed pairs are available with no extra download once the model is cached. Under the hood, we pass NLLB language codes (eng_Latn, vie_Latn, zho_Hans, etc.) to the model; if you need a language we have not exposed in the dropdown (Swahili, Tagalog, Bengali, etc.), open an issue and we will add it to a future build.
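The mapping from dropdown labels to NLLB codes and the 15 x 14 pair count can be made concrete as below. Only eng_Latn, vie_Latn, and zho_Hans are named above; the other entries are the standard FLORES-200 codes that NLLB-200 uses, but which variant this build picks for each language (for example arb_Arab, Modern Standard Arabic, for Arabic) is an assumption, as are the names `NLLB_CODES` and `directedPairs`.

```typescript
// Assumed label -> NLLB (FLORES-200) code mapping for the 15 exposed languages.
const NLLB_CODES: Record<string, string> = {
  English: "eng_Latn",
  Vietnamese: "vie_Latn",
  Spanish: "spa_Latn",
  Portuguese: "por_Latn",
  French: "fra_Latn",
  German: "deu_Latn",
  Italian: "ita_Latn",
  Russian: "rus_Cyrl",
  "Simplified Chinese": "zho_Hans",
  Japanese: "jpn_Jpan",
  Korean: "kor_Hang",
  Arabic: "arb_Arab",
  Hindi: "hin_Deva",
  Thai: "tha_Thai",
  Indonesian: "ind_Latn",
};

// Every ordered (source, target) pair with source !== target.
function directedPairs(codes: string[]): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (const src of codes) {
    for (const tgt of codes) {
      if (src !== tgt) pairs.push([src, tgt]);
    }
  }
  return pairs;
}

console.log(directedPairs(Object.values(NLLB_CODES)).length); // 210
```

Because direction matters (eng_Latn to vie_Latn is a different task from vie_Latn to eng_Latn), the count is n x (n - 1) rather than n x (n - 1) / 2.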
What is the character limit and how are long documents handled?
The UI accepts up to 5,000 characters per Translate click, roughly 700-1,000 English words. Internally, the tool splits the input at sentence boundaries (periods, question marks, and full-width CJK punctuation) into chunks of about 500 characters, runs each chunk through the model within its 1,024-token context window, and stitches the outputs back together. Because splitting happens only between sentences, a single sentence longer than about 500 characters cannot be divided and may be truncated, so split such sentences manually. For book-length documents, we recommend translating in batches of a few pages each and copying the results into a master file. We chose 5,000 characters as the per-run limit because wall-clock time grows roughly linearly with input length, and longer runs significantly increase the chance of the tab being reloaded or running into memory pressure on lower-end hardware.
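The splitting and stitching described above can be sketched as follows. This is a minimal, naive illustration, not the tool's actual code: the terminator set mirrors the punctuation listed above, character counts stand in for token counts, and `translateChunk` is a hypothetical callback standing in for the model call. A splitter this simple also breaks on abbreviations and decimals ("e.g.", "3.14").

```typescript
// Sentence = run of non-terminators followed by terminators (ASCII . ? ! or
// full-width 。？！ used in CJK text), or trailing text with no terminator.
const SENTENCE_PATTERN = /[^.?!。？！]*[.?!。？！]+|[^.?!。？！]+$/g;

// Greedily packs whole sentences into chunks of at most maxChars characters.
// A single sentence longer than maxChars becomes its own oversized chunk
// instead of being cut mid-sentence.
function chunkBySentence(text: string, maxChars = 500): string[] {
  const sentences = text.match(SENTENCE_PATTERN) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim().length > 0) chunks.push(current.trim());
  return chunks;
}

// Translates chunk by chunk and stitches the results back together.
// `translateChunk` is a hypothetical stand-in for the model call.
async function translateLongText(
  text: string,
  translateChunk: (chunk: string) => Promise<string>,
  maxChars = 500,
  separator = " ", // assumption: pass "" for Chinese or Japanese output
): Promise<string> {
  const outputs: string[] = [];
  for (const chunk of chunkBySentence(text, maxChars)) {
    outputs.push(await translateChunk(chunk)); // one model call at a time
  }
  return outputs.join(separator);
}
```

Processing chunks sequentially keeps peak memory close to a single model call, which matters more on low-end hardware than the small speedup concurrency might offer with one shared model.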
