Language Detector

Instant on-device language detector. Identify the language of any text with ISO 639-3 codes, ranked match scores, and JSON/CSV export. No upload, fully private.


About the Language Detector

This Language Detector identifies the language of any text using franc, a lightweight statistical detector based on character trigrams that runs entirely in your browser. It compares the distribution of three-character sequences (trigrams) in your text against trigram profiles for more than 80 languages and returns a ranked list with normalized match scores.

There is no neural network, no model download, and no server request — franc is a ~40KB pure-JavaScript library that loads once with the page and runs instantly and offline. The tool returns standard ISO 639-3 codes (plus ISO 639-1 where available), shows a confidence bar for each candidate, and lets you copy the ranked results as JSON or download them as CSV to feed into localization pipelines and other tooling.

How does this language detector work under the hood?

It uses franc, a statistical detector based on character trigrams (n-grams of length three). The text is broken into overlapping three-character sequences, and the resulting frequency profile is compared to precomputed trigram profiles for each supported language. The closest profile wins. This is a purely statistical, dictionary-free method: fast, tiny, and language-agnostic. It is not a neural network or AI model. Everything runs synchronously in your browser with no model download and no WebGPU/WASM dependency.
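The trigram comparison can be illustrated with a small sketch. This is not franc's source code: the function names, the toy profiles, and the choice of a rank-based "out-of-place" distance with a fixed penalty of 300 are all assumptions made for demonstration.

```javascript
// Simplified illustration of trigram profiling; not franc's actual source.

// Count overlapping three-character sequences in lowercased, space-padded text.
function trigramCounts(text) {
  const clean = ' ' + text.toLowerCase().replace(/\s+/g, ' ').trim() + ' ';
  const counts = new Map();
  for (let i = 0; i + 3 <= clean.length; i++) {
    const tri = clean.slice(i, i + 3);
    counts.set(tri, (counts.get(tri) || 0) + 1);
  }
  return counts;
}

// Order trigrams from most to least frequent (ties broken alphabetically).
function rankedTrigrams(counts) {
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : 1))
    .map(([tri]) => tri);
}

// "Out-of-place" distance: sum of rank differences against a language profile.
// Trigrams missing from the profile receive a fixed penalty (assumed value).
function profileDistance(textRanks, profileRanks, maxPenalty = 300) {
  const position = new Map(profileRanks.map((tri, i) => [tri, i]));
  let distance = 0;
  textRanks.forEach((tri, i) => {
    distance += position.has(tri) ? Math.abs(position.get(tri) - i) : maxPenalty;
  });
  return distance;
}

// Hypothetical toy profiles built from tiny samples, for demonstration only.
const profiles = {
  eng: rankedTrigrams(trigramCounts('the quick brown fox jumps over the lazy dog and the cat')),
  spa: rankedTrigrams(trigramCounts('el rápido zorro marrón salta sobre el perro perezoso y el gato')),
};

// Return [code, distance] pairs, smallest distance (closest profile) first.
function closestProfile(text) {
  const ranks = rankedTrigrams(trigramCounts(text));
  return Object.entries(profiles)
    .map(([code, profile]) => [code, profileDistance(ranks, profile)])
    .sort((a, b) => a[1] - b[1]);
}
```

Calling closestProfile('the dog and the fox') ranks the toy English profile first, because most of the input's trigrams appear in it. Real profiles are built from far larger samples than these one-sentence stand-ins.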

Does this detector send my text to any server?

No. franc is a ~40KB pure-JavaScript library that loads once with the page and runs locally; there is no server call and no model download at detection time. You can verify it by opening DevTools, switching to the Network tab, and confirming that clicking Detect fires no outbound requests. This makes the tool safe for confidential emails, drafts, legal evidence, or any private content where you only need to know which language it is written in.

What does the match score / confidence percentage actually mean?

franc returns a normalized score in the range 0 to 1 for each candidate, where 1 is the best possible match and the highest value is the most likely language. The tool renders each score as a percentage with a bar, so higher is better and the top result is the most probable language. When the top two scores are close (within roughly 10 percentage points), treat the result as ambiguous; this commonly happens with related languages such as Spanish vs Portuguese, Norwegian vs Danish, or Indonesian vs Malay, and with very short input.
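The "close scores" rule can be sketched as a small check. The function names, the input shape of [code, score] pairs, and the sample values in the usage note are assumptions for illustration, not part of the tool's published code.

```javascript
// ranked: array of [iso639_3, score] pairs, best first, scores in the range 0..1.
// The pair shape is an assumption made for this sketch.

// Convert a 0..1 score to a whole-number percentage.
function toPercent(score) {
  return Math.round(score * 100);
}

// True when the top two candidates are within marginPoints percentage points.
function isAmbiguous(ranked, marginPoints = 10) {
  if (ranked.length < 2) return false; // a single candidate has no rival
  const gap = toPercent(ranked[0][1]) - toPercent(ranked[1][1]);
  return gap <= marginPoints;
}
```

With hypothetical scores, isAmbiguous([['spa', 1], ['por', 0.95]]) returns true, while isAmbiguous([['eng', 1], ['deu', 0.6]]) returns false.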

How short can my text be and still get an accurate detection?

Trigram detection needs enough characters to build a stable profile. franc ignores input shorter than its minimum length and returns an 'undetermined' result (the code 'und'), which this tool shows as a clear notice rather than a falsely confident guess. For reliable results, paste at least one full sentence (roughly 30 to 100 characters or more). Very short strings, proper nouns, or single words are genuinely ambiguous even to humans and may be reported as undetermined or with low, closely ranked scores, so watch the ranked list, not just the single top guess.
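A minimum-length guard of this kind might look like the sketch below. The wrapper name, the return shape, and the default threshold of 10 characters are assumptions; detectFn is a stand-in for whatever detector produces the ranked [code, score] pairs.

```javascript
// 'und' is the ISO 639-3 code for an undetermined language.
const UNDETERMINED = 'und';

// detectFn is a hypothetical stand-in returning [code, score] pairs, best first.
// minLength = 10 is an assumed default for this sketch.
function detectWithMinimum(text, detectFn, minLength = 10) {
  const trimmed = text.trim();
  if (trimmed.length < minLength) {
    // Too short for a stable trigram profile: report undetermined, not a guess.
    return { code: UNDETERMINED, determined: false, candidates: [] };
  }
  const candidates = detectFn(trimmed);
  if (candidates.length === 0) {
    return { code: UNDETERMINED, determined: false, candidates };
  }
  return { code: candidates[0][0], determined: true, candidates };
}
```

Returning the full candidates array alongside the top code keeps the ranked list available, which matters when the leading scores are close.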


Why do results use ISO 639-3 three-letter codes like 'eng' and 'cmn'?

franc identifies languages using ISO 639-3, the three-letter standard that can name far more languages than the two-letter ISO 639-1 set. English is 'eng', Mandarin Chinese is 'cmn', Vietnamese is 'vie'. Where a two-letter ISO 639-1 equivalent exists (en, zh, vi) the tool shows it as well, so you can pick whichever code your i18n framework or database expects. Note that 'zh' covers Chinese as a whole rather than Mandarin specifically. The JSON export includes both iso639_3 and iso639_1 plus the human-readable name, so no manual mapping is needed downstream.
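The code mapping can be pictured as a simple lookup. The table below is a tiny illustrative subset and the function name is hypothetical; the tool's real table is not shown in this document.

```javascript
// Illustrative subset only; a real table would cover every supported language.
const ISO_CODES = new Map([
  ['eng', { iso639_1: 'en', name: 'English' }],
  ['cmn', { iso639_1: 'zh', name: 'Mandarin Chinese' }],
  ['vie', { iso639_1: 'vi', name: 'Vietnamese' }],
  ['ceb', { iso639_1: null, name: 'Cebuano' }], // no two-letter code exists
]);

// Return both codes plus the name; iso639_1 is null when there is no equivalent.
function describeCode(iso639_3) {
  const entry = ISO_CODES.get(iso639_3);
  if (!entry) return { iso639_3, iso639_1: null, name: null };
  return { iso639_3, iso639_1: entry.iso639_1, name: entry.name };
}
```

Keeping iso639_1 as null rather than omitting the field gives downstream code a stable shape to read.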

How many languages does franc-min support here?

This tool loads franc-min, the compact build covering a little over 80 of the most widely spoken languages (the largest build, franc-all, supports 400+). It handles all widely used European languages, CJK (Chinese, Japanese, Korean), Arabic, Hindi, Bengali, Tamil, Telugu, Thai, Vietnamese, Indonesian, Turkish, Persian, Hebrew, and many regional languages. Each candidate is returned with its ISO code, name, and normalized match score, so you can resolve ambiguous or mixed-language cases yourself.

Can I export the ranked results for a pipeline or spreadsheet?

Yes — that is the main pro feature. After detection the tool shows the full ranked breakdown with confidence bars, then offers Copy JSON and Download CSV. The JSON object includes input_length, word_count, a generated_at ISO timestamp, and a detected array of {rank, iso639_3, iso639_1, name, score, confidence_pct}. The CSV uses the header rank,iso639_3,iso639_1,name,confidence_pct. Both are produced entirely in-browser via a Blob download, so nothing is uploaded.
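The export shapes described above can be sketched as two pure functions. The function names, the input shape of the ranked candidates, and the word-counting rule (splitting on whitespace) are assumptions; only the field names and the CSV header come from the description.

```javascript
// ranked: array of { iso639_3, iso639_1, name, score }, best first (assumed shape).
function buildExport(text, ranked, now = new Date()) {
  const trimmed = text.trim();
  return {
    input_length: text.length,
    // Assumed rule: words are runs of non-whitespace characters.
    word_count: trimmed === '' ? 0 : trimmed.split(/\s+/).length,
    generated_at: now.toISOString(),
    detected: ranked.map((c, i) => ({
      rank: i + 1,
      iso639_3: c.iso639_3,
      iso639_1: c.iso639_1 ?? null,
      name: c.name,
      score: c.score,
      confidence_pct: Math.round(c.score * 100),
    })),
  };
}

// Quote a CSV cell only when it contains a comma, quote, or newline.
function csvCell(value) {
  const s = value == null ? '' : String(value);
  return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Build the CSV text using the header given in the description.
function toCsv(report) {
  const header = 'rank,iso639_3,iso639_1,name,confidence_pct';
  const rows = report.detected.map((d) =>
    [d.rank, d.iso639_3, d.iso639_1, d.name, d.confidence_pct].map(csvCell).join(','));
  return [header, ...rows].join('\n');
}
```

In a browser, the resulting string could be wrapped in new Blob([csv], { type: 'text/csv' }) and offered through URL.createObjectURL, which keeps the whole export local.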

Why does it sometimes confuse Chinese, Japanese, and Korean?

CJK detection is tricky because Japanese kanji and Chinese hanzi share thousands of characters, and a short Japanese sentence written only in kanji can statistically resemble Chinese. Hiragana, katakana, and hangul are unique to one language each, so even a single such character pushes franc decisively toward Japanese or Korean. Longer, mixed-script input is almost always resolved correctly. For very short kanji-only strings, check whether the top two candidates (cmn vs jpn) are close in score before trusting the single best guess.
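The role of script in separating CJK text can be illustrated with Unicode script properties. This is a standalone sketch, not franc's internal logic; the function name and the 'han-only' and 'none' labels are made up for the example.

```javascript
// Unicode script tests; the u flag enables \p{Script=...} property escapes.
const KANA = /[\p{Script=Hiragana}\p{Script=Katakana}]/u;
const HANGUL = /\p{Script=Hangul}/u;
const HAN = /\p{Script=Han}/u;

// Return a coarse hint based only on which scripts appear in the text.
function cjkScriptHint(text) {
  if (KANA.test(text)) return 'jpn';     // kana appears only in Japanese
  if (HANGUL.test(text)) return 'kor';   // hangul appears only in Korean
  if (HAN.test(text)) return 'han-only'; // could be Chinese or kanji-only Japanese
  return 'none';
}
```

A 'han-only' hint marks exactly the case where the cmn and jpn scores deserve a second look before trusting the top result.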