Face Similarity Meter
Compare two face photos on-device with a 128-d FaceNet embedding. Get the raw L2/cosine distance, a tunable threshold and JSON export. No upload, fully private.
About the Face Similarity Meter
The Face Similarity Meter compares two photos and reports how visually similar the faces look as a percentage from 0% to 100%. It runs entirely in your browser via @vladmandic/face-api (a maintained fork of face-api.js). Both images are decoded locally, faces are detected with a MobileNet-style detector, each face is encoded as a 128-dimensional descriptor by a FaceNet-inspired network, and the Euclidean (L2) distance between the two descriptors is converted to a similarity percentage. No image bytes ever leave your device — there is no server-side processing, no upload, no logging.
Use it for casual exploration: 'do my two cousins really look alike?', 'does that selfie still look like me five years later?', 'do these two unrelated celebrities look similar?'. It is fun, it is fast, and it gives you a number. We have intentionally chosen a permissive operating mode (single best face per image, no liveness check, no quality filtering) so that small or low-quality photos still produce a result instead of an error. That makes the tool friendly but it also means the percentages are not calibrated for biometric authentication. Treat the score as a relative indicator of visual similarity, not as identity proof.
Best results require front-facing, well-lit photos at decent resolution where the face occupies a meaningful portion of the frame. Photos with sunglasses, masks, heavy makeup, hats covering the eyebrows, side angles, or low resolution will distort the descriptor and lower the score, even when the same person is in both photos. Conversely, identical twins, parents and children, and even unrelated people of similar age, ethnicity, and hairstyle can produce surprisingly high scores. Plastic surgery, significant weight change, aging by ten or more years, or strong filters / face-tune software also matter — the model was not trained to be invariant to those.
We deliberately designed this tool as a curiosity rather than a verification system. It is not suitable for biometric identity verification, KYC (know-your-customer), border control, surveillance, employment screening, dating-app verification, or any situation where the result is used to allow or deny a person something. For those use cases you need a vendor system that has been benchmarked under NIST FRVT 1:1, that supports liveness detection (anti-spoofing), demographic-fairness evaluation, secure enrolment, and revocation. A free in-browser demo cannot satisfy those operational, legal, or audit requirements.
Privacy is by design. The model weights are downloaded to your browser once (about 6 MB, cached for future visits) and the entire comparison runs locally as JavaScript. There is no upload step, no temporary server-side cache, no API call. The page itself uses standard analytics for traffic counts only. Closing the tab clears all state. We don't store, log, sell, or share the photos you compare.
How the comparison works
The pipeline has three stages — detection, alignment + encoding, and distance scoring — and each stage uses an established open-source neural network. The detector is an SSD MobileNetV1 trained on WIDER FACE. It returns one or more bounding boxes per image with confidence scores; we keep the single most confident face per photo for comparison. If you want a multi-face workflow, see our Age & Gender Predictor, which iterates over every detection.
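Keeping only the most confident detection per photo can be sketched as below. This is a minimal Python illustration, not the tool's JavaScript code; the `Detection` type, its field names, and the sample scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output: a bounding box plus a confidence score."""
    x: float
    y: float
    width: float
    height: float
    score: float  # detector confidence in [0, 1]


def most_confident_face(detections: Sequence[Detection]) -> Optional[Detection]:
    """Return the detection with the highest score, or None if nothing was found."""
    if not detections:
        return None
    return max(detections, key=lambda d: d.score)


# Made-up detections for a group photo with three faces.
faces = [
    Detection(10, 20, 80, 80, score=0.71),
    Detection(200, 40, 120, 120, score=0.98),
    Detection(400, 60, 60, 60, score=0.55),
]
best = most_confident_face(faces)
print(best)
```

Returning `None` for an empty list mirrors the 'no face detected' case, which the caller has to handle before any comparison is attempted.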
Each detected face is then aligned. A 68-point landmark detector (a small ConvNet trained on the iBUG 300-W dataset) predicts landmark coordinates: outer and inner eye corners, nose bridge, nose tip, mouth corners, and jawline. The face is rotated and cropped so the eyes are horizontal and the inter-pupillary distance is normalised. Alignment matters: the encoding network was trained on aligned faces and will produce inconsistent descriptors on non-aligned input.
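The geometric part of alignment can be illustrated with a small Python sketch. It assumes the two eye centres have already been derived from the landmarks and only transforms point coordinates; a real pipeline would also warp the pixels. The function names and the target eye distance of 1.0 are illustrative choices, not the tool's actual parameters.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def eye_alignment(left_eye: Point, right_eye: Point,
                  target_distance: float = 1.0) -> Tuple[float, float]:
    """Return (tilt angle in radians, scale) needed to level and normalise the eyes."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    distance = math.hypot(dx, dy)  # current distance between the eye centres
    if distance == 0:
        raise ValueError("eye centres coincide")
    return math.atan2(dy, dx), target_distance / distance


def align_points(points: List[Point], left_eye: Point, right_eye: Point,
                 target_distance: float = 1.0) -> List[Point]:
    """Rotate points by -angle about the eye midpoint, then apply the scale."""
    angle, scale = eye_alignment(left_eye, right_eye, target_distance)
    cx = (left_eye[0] + right_eye[0]) / 2
    cy = (left_eye[1] + right_eye[1]) / 2
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    aligned = []
    for x, y in points:
        tx, ty = x - cx, y - cy  # move the eye midpoint to the origin
        rx = tx * cos_a - ty * sin_a
        ry = tx * sin_a + ty * cos_a
        aligned.append((rx * scale, ry * scale))
    return aligned


# A tilted face: the right eye sits 30 px lower than the left eye.
left, right = (100.0, 120.0), (160.0, 150.0)
aligned_eyes = align_points([left, right], left, right)
print(aligned_eyes)
```

After the transform both eyes share the same vertical coordinate and sit exactly `target_distance` apart, which is the consistency the encoder relies on.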
The aligned crop is fed through a face-encoding network — in face-api.js this is a ResNet-34 architecture inspired by FaceNet (Schroff, Kalenichenko & Philbin, 2015) and trained with a triplet loss to produce 128-dimensional unit-length vectors that cluster tightly within identity and spread across identities. ArcFace (Deng et al., 2019) is a more recent improvement that uses an additive angular-margin loss; vladmandic's fork supports newer ArcFace-style backbones for higher accuracy when needed. We use the default ResNet-34 model for browser compatibility and footprint.
The two 128-d descriptors are compared with the Euclidean (L2) distance that face-api computes directly: distance = sqrt(Σ (aᵢ − bᵢ)²). A smaller distance means more similar faces. The standard face-api.js threshold for 'same person' is approximately 0.6. We map the L2 distance to a similarity percentage using a smooth anchored curve: 0.0 → 100%, 0.4 → ~70%, 0.6 → ~50%, 1.0 → 0%, so the 0.6 same-person threshold lands at about 50%, the lower edge of the 50–70% band. For advanced users we also expose the raw L2 distance, the cosine distance (1 − dot(a,b)/(|a||b|)), and the descriptor norms in the Technical-details panel, plus an adjustable threshold and a JSON export. The percentage is empirical and friendly rather than calibrated against a biometric standard, so a score of 85% should be read as 'very similar', not as a probability of being the same person.
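The two distance formulas and the anchored mapping can be sketched in a few lines of Python. The distance functions follow the formulas quoted above. The percentage mapping is an assumption: it interpolates linearly between the four published anchor points, whereas the tool describes its own curve as smooth, so intermediate values may differ slightly.

```python
import math
from typing import Sequence

# Anchor points quoted in the text: (L2 distance, similarity percent).
ANCHORS = [(0.0, 100.0), (0.4, 70.0), (0.6, 50.0), (1.0, 0.0)]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: sqrt(sum((a_i - b_i)^2))."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def similarity_percent(distance: float) -> float:
    """Piecewise-linear interpolation through ANCHORS (assumed curve shape)."""
    if distance <= ANCHORS[0][0]:
        return ANCHORS[0][1]
    if distance >= ANCHORS[-1][0]:
        return ANCHORS[-1][1]
    for (d0, p0), (d1, p1) in zip(ANCHORS, ANCHORS[1:]):
        if d0 <= distance <= d1:
            t = (distance - d0) / (d1 - d0)
            return p0 + t * (p1 - p0)
    return ANCHORS[-1][1]  # not reached; kept for completeness


for d in (0.0, 0.4, 0.5, 0.6, 1.0):
    print(d, round(similarity_percent(d), 1))
```

Any distance of 1.0 or more is clamped to 0%, which is why very different faces all show the same floor value.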
All weights are stored as 32-bit float for the in-browser TensorFlow.js runtime; total download is ~6 MB, and only the detector that is actually used is downloaded on the common path. Inference uses the first backend that initialises in the order WebGL → WebAssembly → CPU (you can see the active backend logged in the browser console); there is no WebGPU path in this build. End-to-end comparison of two faces typically takes 200 ms to 1 s on a modern laptop, longer on mobile. The UI shows a confidence-style bar and one of five qualitative bands (very similar, similar, somewhat similar, not similar, very different) chosen for friendliness, not for biometric rigour.
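The 'first backend that initialises' rule is a plain ordered fallback. The Python sketch below shows the pattern only; the initialiser callables are hypothetical stand-ins, and the real tool performs this step through the TensorFlow.js runtime in JavaScript.

```python
from typing import Callable, Dict, Optional, Sequence

# Preference order described in the text.
BACKEND_ORDER = ("webgl", "wasm", "cpu")


def first_working_backend(initialisers: Dict[str, Callable[[], bool]],
                          order: Sequence[str] = BACKEND_ORDER) -> Optional[str]:
    """Try each backend in order; return the first whose initialiser succeeds."""
    for name in order:
        init = initialisers.get(name)
        if init is None:
            continue  # backend not shipped in this build
        try:
            if init():
                return name
        except Exception:
            continue  # initialisation failed; fall through to the next backend
    return None


def webgl_blocked() -> bool:
    """Stand-in for a machine where WebGL is disabled."""
    raise RuntimeError("WebGL context unavailable")


active = first_working_backend({
    "webgl": webgl_blocked,
    "wasm": lambda: True,
    "cpu": lambda: True,
})
print(active)
```

In this example the WebGL initialiser fails, so the WebAssembly backend is selected and the plain CPU backend is never tried.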
Accuracy, thresholds, and where this tool fails
On the LFW (Labeled Faces in the Wild) academic benchmark, well-trained 128-d face encoders reach ~99% accuracy on the pair-verification task (deciding whether two photos show the same person). That number is not the accuracy you should expect on arbitrary internet photos. LFW pairs are pre-selected for image quality and frontal pose; in-the-wild performance is much noisier. NIST FRVT 1:1, which evaluates dozens of commercial vendors on hundreds of thousands of operational photos, shows that even leading systems have FAR (False Acceptance Rate) and FRR (False Rejection Rate) values that vary by an order of magnitude with demographic group, age, and image quality. Our open-source backbone is older and smaller than the leaders on FRVT.
Concrete failure modes you will encounter: identical twins almost always score above 80% — the encoder cannot reliably distinguish them. Parents and adult children, siblings, or unrelated people of the same ethnicity and similar age and hairstyle can all score 70–85%. The same person photographed ten years apart can drop to 50% if facial features have changed. Heavy filters (FaceTune, Snapchat, beauty filters) effectively edit a different face into the photo and will lower the score significantly. Glasses, masks, beards, hijabs, hats, and partial occlusion all reduce accuracy because they hide informative landmarks.
Demographic fairness is a known limitation. Buolamwini & Gebru (2018) found large accuracy gaps by skin tone and gender in commercial gender classifiers, and NIST FRVT (2019, 2024) and many other audits have shown that face-recognition models trained predominantly on lighter-skinned subjects produce higher error rates for darker-skinned faces, women, and children. The face-api.js descriptor used here inherits those biases. Treat any single comparison cautiously, especially when one or both subjects are from groups that are under-represented in standard public training sets.
Do not use this tool as a biometric authenticator, identity-proofing system, fraud-prevention check, surveillance match, employment-screening filter, or dating-app verifier. For those uses you need an audited commercial system with liveness detection (so a printed photo or a deep-fake video doesn't pass), a revocation pipeline, and a documented bias assessment. We have no such guarantees and we explicitly tell you not to deploy this in production. It is a curiosity tool, and the score is a guideline, not a verdict.
- Identical twins typically score 85–95% — the model cannot reliably tell them apart.
- Same person aged 10+ years apart may drop to 50–70% similarity due to natural ageing.
- Sunglasses, masks, beards, hats, or other occlusions block landmarks and reduce score.
- Strong filters (FaceTune, beauty filters, Snapchat lenses) effectively edit the face and distort the descriptor.
- Demographic fairness is uneven: darker skin tones, women, and children have higher error rates due to training-set imbalance.
- The tool reports the single best face per image; group photos must be cropped to one face first.
- There is no liveness detection — a printed photo of someone's face will produce the same descriptor as a live capture.
- Not suitable for biometric identity verification, KYC, border control, employment, or dating-app safety checks.
- Mapping from the Euclidean (L2) distance to a percentage is friendly and not calibrated against a biometric standard.
Glossary
- Face embedding (descriptor): A fixed-length numeric vector (here, 128 floating-point numbers) produced by a neural network that encodes the visual identity of a face. Photos of the same person should have similar embeddings; photos of different people should have dissimilar ones.
- Euclidean (L2) distance: The straight-line distance between two descriptor vectors: sqrt(Σ (aᵢ − bᵢ)²). This is the canonical metric this tool reports and the one the ~0.6 'same person' threshold refers to. Smaller = more similar.
- Cosine distance: An angle-based measure of how aligned two vectors are: cosine distance = 1 − dot(a,b)/(|a||b|). Exposed alongside the L2 distance in the Technical-details panel for users who prefer an angle-based metric.
- Threshold: The Euclidean (L2) distance below which two descriptors are declared a match. face-api.js uses ~0.6 as a default; this corresponds to roughly 50% on our friendly UI scale. The Technical-details panel lets you slide this threshold and see the verdict update live. Lowering it makes the tool stricter (fewer false matches, more missed matches).
- FAR (False Acceptance Rate): The rate at which a face-matcher incorrectly says two different people are the same. Critical for security systems; a high FAR means impostors get through.
- FRR (False Rejection Rate): The rate at which a face-matcher incorrectly says photos of the same person are different. A high FRR means genuine users are inconvenienced.
- FaceNet: A landmark 2015 paper by Schroff, Kalenichenko & Philbin (Google) that introduced the triplet-loss training scheme to produce 128-d face embeddings on a unit hypersphere.
- ArcFace: A 2019 face-recognition loss function (Deng et al., InsightFace) that uses an additive angular margin to push descriptor classes further apart on the hypersphere. It reported state-of-the-art results on academic benchmarks such as LFW and IJB-B when it was published.
- LFW / NIST FRVT: Academic and government benchmarks for face-recognition systems. LFW (Labeled Faces in the Wild, 2007) is small and high-quality. NIST FRVT (Face Recognition Vendor Test) is the gold-standard government evaluation, with hundreds of thousands of operational photos and ongoing publication.
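The FAR and FRR definitions in the glossary can be made concrete with a short Python sketch. The distance values below are made up purely for illustration and are not measurements from this tool; the match rule (distance strictly below the threshold) follows the Threshold entry.

```python
from typing import Sequence, Tuple


def far_frr(genuine: Sequence[float], impostor: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when a pair is accepted if its L2 distance < threshold.

    genuine:  distances between photos of the same person
    impostor: distances between photos of different people
    """
    if not genuine or not impostor:
        raise ValueError("both lists must be non-empty")
    false_accepts = sum(1 for d in impostor if d < threshold)
    false_rejects = sum(1 for d in genuine if d >= threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


# Illustrative, invented distances.
genuine_pairs = [0.31, 0.42, 0.48, 0.55, 0.63]
impostor_pairs = [0.52, 0.66, 0.71, 0.80, 0.93]

default_rates = far_frr(genuine_pairs, impostor_pairs, threshold=0.6)
strict_rates = far_frr(genuine_pairs, impostor_pairs, threshold=0.5)
print("threshold 0.6:", default_rates)
print("threshold 0.5:", strict_rates)
```

On this toy data, lowering the threshold from 0.6 to 0.5 removes the false accept but doubles the false rejects, which is the trade-off the Threshold entry describes.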
Frequently Asked Questions
How does the AI compare two faces?
It detects one face per image, aligns each face using 68 facial landmarks, encodes each aligned face as a 128-dimensional vector with a FaceNet-style ResNet, and computes the Euclidean (L2) distance between the two vectors. That distance is mapped to a friendly 0–100% similarity score, and the raw L2 distance (plus cosine distance) is shown in the Technical-details panel. All inference is JavaScript in your browser via @vladmandic/face-api on TensorFlow.js.
What does the percentage actually mean?
It is a smooth mapping of the Euclidean (L2) distance to a 0–100 scale. In rough terms: 90–100% = visually nearly identical (same person, twins, or an extreme lookalike); 70–90% (distance ≤ ~0.4) = same person likely, or close relatives; 50–70% (distance ~0.4–0.6) = some shared features, around the same-person threshold; below 50% (distance > 0.6) = different people. It is NOT a probability and it is NOT a biometric verification result.
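The rough ranges above translate into a simple lookup. This Python sketch uses the four ranges from this answer; treating each lower bound as inclusive is an assumption, and the labels are paraphrased rather than the tool's exact UI strings.

```python
def describe_score(percent: float) -> str:
    """Map a 0-100 similarity score to the rough reading given in the text."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    if percent >= 90:
        return "visually nearly identical"
    if percent >= 70:
        return "same person likely, or close relatives"
    if percent >= 50:
        return "some shared features, around the same-person threshold"
    return "different people"


for score in (95, 82, 61, 33):
    print(score, "->", describe_score(score))
```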
Why do unrelated people score 60%?
Because the descriptor encodes the broad shape of the face — eye spacing, nose width, jaw angle, ethnicity, age — and many unrelated people share enough of those features to land in a similar region of the embedding space. This is a fundamental property of 128-d face descriptors, not a bug.
Why did my own photo score only 75% against another photo of me?
Common causes: (1) different lighting or camera angle; (2) different age (more than a few years can shift descriptors meaningfully); (3) glasses or facial hair in only one of the photos; (4) heavy filters or FaceTune in one photo; (5) one photo is much lower resolution; (6) you are wearing makeup in one and not the other. Try another pair of photos with similar conditions.
Are my photos uploaded?
No. All face detection, encoding, and comparison happen locally in your browser via TensorFlow.js. The model weights are downloaded once (about 6 MB, cached) and the inference runs on the images you select. Photo bytes never leave the device. We don't store, log, or share photos.
Can I use this for identity verification?
No. Do not use this tool to verify someone's identity, gate access to a service, prevent fraud, or screen employees / dates. It has no liveness detection, no calibrated threshold, no demographic fairness audit, and uses a smaller open-source model than commercial systems. For identity verification you need a vendor evaluated under NIST FRVT with documented FAR/FRR and operational guarantees.
Why are identical twins not distinguishable?
The face descriptor is trained to be invariant to lighting, expression, and pose, and it is not designed to capture the micro-features that people rely on to tell twins apart (a small mole, a slight asymmetry). Twin discrimination is an active research area; standard face encoders generally fail at it.
Can it tell parents and children apart?
Often, but not always. Parents and adult children share many genetic facial features and may score 60–80%. The model is trained to put the same identity close together, but it has no notion of 'family resemblance' versus 'identity', so high scores between relatives are common and expected.
Does it support multiple faces in one photo?
Currently it picks the single most confident face per image and compares those two. If your photos are group shots, crop each one to a single face first. For multi-face workflows we have a dedicated Age & Gender Predictor that iterates over every detected face in an image.
What if no face is detected?
The tool reports 'no face detected' for that image. Common causes: face is too small (below detector minimum size), face is at an extreme angle, lighting is too dark or too bright, image is heavily blurred, or the image is not actually a photo of a face. Try a clearer, larger, frontal photo.

How do I see the raw distance and pick my own threshold?
Open the Technical-details panel under the result. It shows the raw Euclidean (L2) distance, the cosine distance, and both 128-d descriptor norms — the exact numbers a genealogy, dedup, or dataset-cleaning workflow needs. The threshold slider lets you set your own L2 decision point (default 0.6) and the Match / Borderline / No-match verdict updates live without re-running inference. To choose a threshold for your own dataset, label a few dozen known same-person and known different-person pairs, look at where their distances cluster, and set the threshold between the two clusters to trade off false accepts against false rejects.
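The labelling procedure described above can be sketched as a threshold sweep in Python. Two assumptions are made here: candidate thresholds are the midpoints between neighbouring observed distances, and false accepts and false rejects are weighted equally, which you may want to change for your own workflow. The sample distances are invented for illustration.

```python
from typing import Sequence, Tuple


def choose_threshold(same_person: Sequence[float],
                     different_person: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, FAR + FRR) minimising the summed error rates.

    A pair counts as a match when its L2 distance is below the threshold.
    """
    distances = sorted(set(same_person) | set(different_person))
    if not same_person or not different_person or len(distances) < 2:
        raise ValueError("need labelled pairs of both kinds with distinct distances")
    # Candidate thresholds: midpoints between neighbouring observed distances.
    candidates = [(lo + hi) / 2 for lo, hi in zip(distances, distances[1:])]
    best_threshold, best_cost = candidates[0], float("inf")
    for t in candidates:
        far = sum(d < t for d in different_person) / len(different_person)
        frr = sum(d >= t for d in same_person) / len(same_person)
        if far + frr < best_cost:
            best_threshold, best_cost = t, far + frr
    return best_threshold, best_cost


# Invented distances from a small hand-labelled set.
same = [0.30, 0.38, 0.45, 0.52]
different = [0.64, 0.72, 0.81, 0.95]
threshold, cost = choose_threshold(same, different)
print(round(threshold, 3), cost)
```

Here the two clusters do not overlap, so the sweep settles between them with zero errors; on real data the clusters usually overlap and the minimum cost is above zero.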
Can I export or copy the result for record-keeping?
Yes. The Copy JSON button in the Technical-details panel copies a structured object — metric, raw L2 distance, cosine distance, similarity percent, the threshold you used, the verdict, and an ISO timestamp — to your clipboard, with a note that it is an on-device estimate and not identity proof. Because everything is computed locally, the JSON never touches a server, so it is safe to paste into a private evidence or batch log.
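A record with the fields listed above might look like the output of this Python sketch. The key names, rounding, and note wording are hypothetical, since the exact schema of the tool's export is not documented here; the verdict is passed in because the rule that separates Match from Borderline is not specified.

```python
import json
from datetime import datetime, timezone


def build_export(l2: float, cosine: float, percent: float,
                 threshold: float, verdict: str) -> dict:
    """Assemble an export record; all key names are illustrative."""
    return {
        "metric": "euclidean_l2",
        "l2_distance": round(l2, 4),
        "cosine_distance": round(cosine, 4),
        "similarity_percent": round(percent, 1),
        "threshold": threshold,
        "verdict": verdict,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601, UTC
        "note": "On-device estimate; not identity proof.",
    }


# Example values are invented.
record = build_export(0.4821, 0.1163, 61.8, 0.6, "Match")
text = json.dumps(record, indent=2)
print(text)
```

Storing the threshold next to the verdict keeps each record interpretable later, even if you change the slider between comparisons.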
References & academic sources
- Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A Unified Embedding for Face Recognition and Clustering. IEEE CVPR.
- Deng, J., Guo, J., Xue, N., & Zafeiriou, S. (2019). ArcFace: Additive Angular Margin Loss for Deep Face Recognition. IEEE CVPR (InsightFace project).
- Huang, G. B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. University of Massachusetts Amherst Technical Report.
- Grother, P., Ngan, M., & Hanaoka, K. (2024). NIST Face Recognition Vendor Test (FRVT): ongoing evaluation. U.S. National Institute of Standards and Technology.
- Buolamwini, J., & Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Proceedings of Machine Learning Research.
- Mandic, V. (2024). @vladmandic/face-api: maintained TypeScript fork of face-api.js. Open-source project, MIT licence.
Reviewed by WuTools AI Ethics & Engineering Team
Frequently Asked Questions
Does face comparison happen in my browser or are my photos uploaded?
Everything runs inside your browser. The face-detection and face-embedding models (FaceNet-style ResNet via @vladmandic/face-api on TensorFlow.js) are downloaded once, and every comparison is then computed locally using WebGL, WebAssembly, or plain CPU. Your face photos never leave your device: there is no upload, no server-side processing, and no biometric template stored anywhere outside your browser. This matters for face data because GDPR treats biometric data used to identify a person as a special category and Illinois BIPA regulates face-geometry scans as biometric identifiers, and many enterprise security policies explicitly forbid uploading such data to third-party APIs. After the model download, the only network traffic is the static page assets and the page's traffic-count analytics; no photo or descriptor data is sent.
What image formats and conditions give the best comparison results?
Accepted formats: JPEG, PNG, WebP, AVIF, GIF (first frame), BMP, and HEIC on supported browsers. For accurate comparison, both faces should be at least 160x160 pixels in the cropped face region, frontal or near-frontal (yaw within ±30°), evenly lit without harsh shadows, in focus, and unobstructed by glasses glare, masks, or heavy hair coverage. Profile shots, extreme lighting, motion blur, and faces smaller than 80 pixels across degrade the embedding quality. If multiple faces appear in either image, the tool uses the single most confident detection; for photos with several people, crop manually first.
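The size and pose guidance above can be turned into a simple pre-check. This Python sketch is illustrative only: the function name and warning texts are hypothetical, the tool does not necessarily run such a check, and the face size and yaw estimate are assumed to come from a detector and landmark model.

```python
from typing import List


def photo_warnings(face_width_px: float, face_height_px: float,
                   yaw_degrees: float) -> List[str]:
    """Return human-readable warnings based on the thresholds quoted in the text."""
    warnings = []
    smallest_side = min(face_width_px, face_height_px)
    if smallest_side < 80:
        warnings.append("face is smaller than 80 px: embedding quality degrades")
    elif smallest_side < 160:
        warnings.append("face region is below the recommended 160x160 px")
    if abs(yaw_degrees) > 30:
        warnings.append("head yaw is outside the recommended ±30° range")
    return warnings


print(photo_warnings(200, 220, 10))   # good photo, no warnings
print(photo_warnings(60, 70, 45))     # small and turned away
```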
What does the similarity percentage actually represent?
It is a friendly remapping of the Euclidean (L2) distance between two 128-dimensional face embeddings produced by the FaceNet-style network. Each face is encoded as a vector in 128D space; the L2 distance is then mapped to a 0-100% similarity score along an anchored curve (0.0 → 100%, 0.4 → ~70%, 0.6 → ~50%, 1.0 → 0%). At or above ~70% (distance ≤ 0.4) typically means "same person" on good-quality images, although twins and close relatives can score this high too; ~50-70% (distance 0.4-0.6) is the borderline region around the same-person threshold, where lighting, age, or expression can tip the result either way; below ~50% (distance > 0.6) leans "different person." The percentage is not a literal probability. If you need a hard decision for dedup, genealogy, or dataset cleaning, open the Technical-details panel and use the raw L2 distance (the cosine distance is shown too) with a threshold you have validated on your own dataset; the tool is not suitable for legal or security decisions.
Why does the model sometimes say my own old and recent photos are not the same person?
Face embeddings are highly sensitive to age changes (a gap of more than 5 years can lower similarity by 10-20%), facial hair, glasses, weight changes, hairstyle, makeup, and lighting color temperature. A FaceNet-style model trained on web photos learns to discriminate many identities under typical conditions; changes that fall well outside those conditions reduce the score even for the same person. The reverse error also occurs: twins and close relatives can score highly although they are different people. For genealogy-style comparisons across decades, expect 50-70% scores between genuine matches. If you want a stricter match, for example when deduplicating a photo collection, requiring ≥75% with both images captured under similar conditions is a reasonable starting point; the tool should not be used for security or unlock decisions.
Which compute backend does the tool use, and why is WebGL fastest here?
The face-detection pass (TinyFaceDetector / SSD-MobileNet) and the 128D embedding pass (ResNet) are convolution-heavy networks that benefit from GPU parallelism. At startup the tool initialises the first backend that is available in the order WebGL → WebAssembly → CPU, and logs the active one to the browser console. WebGL is GPU-accelerated and is the fast path on most machines (a full detect-align-embed-compare cycle in roughly 100-300 ms per pair); WebAssembly is the CPU fallback when WebGL is blocked or unavailable, and plain CPU is the last resort. This build does not use WebGPU — the face-api TensorFlow.js bundle here ships WebGL and WASM kernels, so WebGL is the GPU option.
Can the tool be fooled by a printed photo, a photo of a screen, or a deepfake?
This tool is a face-recognition similarity meter, not a presentation-attack detector. It will happily compare a photo of a printed photo of you to a real photo of you, and the embedding will match. It does not check for liveness, depth, screen reflections, or AI-generated artifacts. For payment verification, identity proofing, or unlock applications, you need a separate liveness-detection layer (active challenges like blink or head turn, or passive depth analysis from a true-depth camera) on top of the similarity check. Browser-only liveness detection is possible (MediaPipe Face Landmarker can detect head pose changes) but is not part of this tool. For casual or genealogy use, the absence of liveness checks does not matter.
Which neural architecture is doing the heavy lifting — FaceNet, ArcFace, or DeepFace?
The default pipeline uses face-api.js / @vladmandic/face-api, which combines an SSD-MobileNet v1 face detector, a 68-point landmark regressor (for alignment), and a FaceNet-style ResNet-34 embedding network that outputs a 128-dimensional descriptor trained with triplet loss. ArcFace (2019) and CosFace (2018) are newer training approaches that use angular-margin losses, typically with 512-dimensional embeddings; ArcFace reports higher accuracy on LFW (99.83% vs FaceNet's 99.63%), but such models are larger and need slightly different alignment. They are available as advanced options if you need state-of-the-art accuracy. For everyday comparison, the default FaceNet pipeline is fast, well-tested, and good enough.
What is the difference between FP32 and INT8 for face embeddings, and does it matter for accuracy?
FP32 stores each network weight as a 32-bit float; INT8 stores it as an 8-bit integer, shrinking the model ~4x and speeding up CPU inference 2-3x. For face embeddings, INT8 typically reduces LFW accuracy by 0.1-0.3%, which is invisible to a human comparing scores but measurable on LFW's 6,000-pair benchmark. More importantly, INT8 embeddings are slightly noisier, which can push borderline pairs (around the 0.6 distance threshold) into the wrong bucket. This tool loads the standard @vladmandic/face-api weights and runs them in the in-browser TensorFlow.js runtime at 32-bit float, so the embedding precision is not the limiting factor here; image quality, pose, and lighting matter far more.
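To see where the INT8 noise comes from, the Python sketch below quantises a handful of weights and measures the round-trip error. It assumes a symmetric per-tensor scheme, chosen for simplicity; real INT8 builds may use per-channel scales or zero points, and the weight values are invented.

```python
from typing import List, Tuple


def quantise_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric quantisation: map [-max|v|, +max|v|] onto integers in [-127, 127]."""
    largest = max(abs(v) for v in values)
    scale = largest / 127.0 if largest > 0 else 1.0
    quantised = [max(-127, min(127, round(v / scale))) for v in values]
    return quantised, scale


def dequantise(quantised: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantised]


weights = [0.5, -0.25, 0.1234, -1.0, 0.8]   # invented FP32 weights
quantised, scale = quantise_int8(weights)
restored = dequantise(quantised, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("stored integers:", quantised)
print("max round-trip error:", max_error)
print("bytes FP32 vs INT8:", 4 * len(weights), "vs", 1 * len(weights))
```

Each weight moves by at most half a quantisation step (`scale / 2`); summed over a whole network, those small shifts are what nudge a descriptor and, occasionally, a borderline distance.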
