Face Similarity Meter
Free AI face similarity comparison. Upload two photos and see how similar the faces are — great for finding lookalikes. Works in-browser, no upload, no signup.
About the Face Similarity Meter
The Face Similarity Meter compares two photos and reports how visually similar the faces look as a percentage from 0% to 100%. It runs entirely in your browser via @vladmandic/face-api (a maintained fork of face-api.js). Both images are decoded locally, faces are detected with a MobileNet-style detector, each face is encoded as a 128-dimensional descriptor by a FaceNet-inspired network, and the cosine distance between the two descriptors is converted to a similarity percentage. No image bytes ever leave your device — there is no server-side processing, no upload, no logging.
Use it for casual exploration: 'do my two cousins really look alike?', 'does that selfie still look like me five years later?', 'do these two unrelated celebrities look similar?'. It is fun, it is fast, and it gives you a number. We have intentionally chosen a permissive operating mode (single best face per image, no liveness check, no quality filtering) so that small or low-quality photos still produce a result instead of an error. That makes the tool friendly but it also means the percentages are not calibrated for biometric authentication. Treat the score as a relative indicator of visual similarity, not as identity proof.
Best results require front-facing, well-lit photos at decent resolution where the face occupies a meaningful portion of the frame. Photos with sunglasses, masks, heavy makeup, hats covering the eyebrows, side angles, or low resolution will distort the descriptor and lower the score, even when the same person is in both photos. Conversely, identical twins, parents and children, and even unrelated people of similar age, ethnicity, and hairstyle can produce surprisingly high scores. Plastic surgery, significant weight change, aging by ten or more years, or strong filters / face-tune software also matter — the model was not trained to be invariant to those.
We deliberately designed this tool as a curiosity rather than a verification system. It is not suitable for biometric identity verification, KYC (know-your-customer), border control, surveillance, employment screening, dating-app verification, or any situation where the result is used to allow or deny a person something. For those use cases you need a vendor system that has been benchmarked under NIST FRVT 1:1, that supports liveness detection (anti-spoofing), demographic-fairness evaluation, secure enrolment, and revocation. A free in-browser demo cannot satisfy those operational, legal, or audit requirements.
Privacy is by design. The model weights are downloaded to your browser once (about 6 MB, cached for future visits) and the entire comparison runs locally as JavaScript. There is no upload step, no temporary server-side cache, no API call. The page itself uses standard analytics for traffic counts only. Closing the tab clears all state. We don't store, log, sell, or share the photos you compare.
How the comparison works
The pipeline has three stages — detection, alignment + encoding, and distance scoring — and each stage uses an established open-source neural network. The detector is an SSD MobileNetV1 trained on WIDER FACE. It returns one or more bounding boxes per image with confidence scores; we keep the single most confident face per photo for comparison. If you want a multi-face workflow, see our Age & Gender Predictor, which iterates over every detection.
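The "keep the single most confident face" step can be sketched as follows. This is a minimal illustration, not the tool's actual source: the `Detection` shape and the `pickMostConfident` name are hypothetical stand-ins for whatever the detector returns.

```typescript
// Hypothetical shape of one detector result; real libraries return richer objects.
interface Detection {
  score: number; // detector confidence in [0, 1]
  box: { x: number; y: number; width: number; height: number };
}

// Return the highest-scoring detection, or undefined when the list is empty.
function pickMostConfident(detections: Detection[]): Detection | undefined {
  let best: Detection | undefined = undefined;
  for (const d of detections) {
    if (best === undefined || d.score > best.score) best = d;
  }
  return best;
}

// Toy input: two faces found in one photo (values invented for illustration).
const groupPhoto: Detection[] = [
  { score: 0.71, box: { x: 10, y: 20, width: 80, height: 80 } },
  { score: 0.98, box: { x: 200, y: 40, width: 120, height: 120 } },
];
const chosen = pickMostConfident(groupPhoto);
console.log(chosen ? chosen.score : "no face detected");
```

Returning `undefined` for an empty list gives the caller a natural place to show a "no face detected" message.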
Each detected face is then aligned. A 68-point landmark detector (a small ConvNet trained on the iBUG 300-W dataset) predicts landmark coordinates: outer and inner eye corners, nose bridge, nose tip, mouth corners, and jawline. The face is rotated and cropped so the eyes are horizontal and the inter-pupillary distance is normalised. Alignment matters: the encoding network was trained on aligned faces and will produce inconsistent descriptors on non-aligned input.
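The geometry behind "rotate so the eyes are horizontal" can be sketched in a few lines. The sketch assumes the common 68-point layout in which indices 36-41 and 42-47 outline the two eyes; the helper names are hypothetical and the library's own alignment code may differ.

```typescript
interface Point { x: number; y: number; }

// Mean position of a set of points.
function centroid(points: Point[]): Point {
  let sx = 0, sy = 0;
  for (const p of points) { sx += p.x; sy += p.y; }
  return { x: sx / points.length, y: sy / points.length };
}

// Tilt of the eye line (radians) and the distance between the eye centres.
// Assumption: indices 36-41 and 42-47 are the two eyes (68-point layout).
function eyeAlignment(landmarks: Point[]): { angle: number; eyeDistance: number } {
  const a = centroid(landmarks.slice(36, 42));
  const b = centroid(landmarks.slice(42, 48));
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  return { angle: Math.atan2(dy, dx), eyeDistance: Math.sqrt(dx * dx + dy * dy) };
}

// Rotate a point about a centre by -angle, which levels the eye line.
function rotateAbout(p: Point, centre: Point, angle: number): Point {
  const cos = Math.cos(-angle), sin = Math.sin(-angle);
  const dx = p.x - centre.x, dy = p.y - centre.y;
  return { x: centre.x + dx * cos - dy * sin, y: centre.y + dx * sin + dy * cos };
}

// Synthetic landmarks: one eye at (100, 100), the other at (200, 200).
const landmarks: Point[] = [];
for (let i = 0; i < 68; i++) {
  if (i >= 36 && i < 42) landmarks.push({ x: 100, y: 100 });
  else if (i >= 42 && i < 48) landmarks.push({ x: 200, y: 200 });
  else landmarks.push({ x: 0, y: 0 });
}
const { angle, eyeDistance } = eyeAlignment(landmarks);
const levelled = rotateAbout({ x: 200, y: 200 }, { x: 100, y: 100 }, angle);
console.log(angle, eyeDistance, levelled);
```

Dividing a target eye spacing by `eyeDistance` would give the scale factor for the normalisation step; the target value depends on the encoder's expected input and is not specified here.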
The aligned crop is fed through a face-encoding network — in face-api.js this is a ResNet-34 architecture inspired by FaceNet (Schroff, Kalenichenko & Philbin, 2015) and trained with a triplet loss to produce 128-dimensional unit-length vectors that cluster tightly within identity and spread across identities. ArcFace (Deng et al., 2019) is a more recent improvement that uses an additive angular-margin loss; vladmandic's fork supports newer ArcFace-style backbones for higher accuracy when needed. We use the default ResNet-34 model for browser compatibility and footprint.
Two 128-d descriptors are compared with cosine distance: distance = 1 − (dot(a,b) / (|a||b|)). A smaller distance means more similar faces. The standard face-api.js threshold for 'same person' is approximately 0.6 (the library defines it for Euclidean distance between descriptors, so reusing it for cosine distance is an approximation); on our scale a distance of 0.6 displays as roughly 50%. We map distance to a similarity percentage using a smooth curve through these anchor points: 0.0 → 100%, 0.4 → ~70%, 0.6 → ~50%, 1.0 → 0%. This mapping is empirical and friendly rather than calibrated against a biometric standard, so a score of 85% should be read as 'very similar' but not as a probability of being the same person.
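As a rough sketch of the scoring step, the code below computes cosine distance and converts it to a percentage. The exact curve the tool uses is not published here, so the sketch assumes straight-line interpolation between the four anchor points quoted above; `ANCHORS` and `similarityPercent` are hypothetical names.

```typescript
// Cosine distance: 1 - dot(a, b) / (|a| |b|).
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("descriptors must have equal length");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("zero-length descriptor");
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Anchor points (distance, percent) taken from the text above.
const ANCHORS: ReadonlyArray<readonly [number, number]> = [
  [0.0, 100],
  [0.4, 70],
  [0.6, 50],
  [1.0, 0],
];

// Assumption: linear interpolation between anchors, clamped to 0..100.
function similarityPercent(distance: number): number {
  if (distance <= 0) return 100;
  if (distance >= 1) return 0;
  for (let i = 1; i < ANCHORS.length; i++) {
    const [d0, p0] = ANCHORS[i - 1];
    const [d1, p1] = ANCHORS[i];
    if (distance <= d1) {
      return p0 + ((distance - d0) / (d1 - d0)) * (p1 - p0);
    }
  }
  return 0;
}

// Orthogonal vectors have distance 1, which displays as 0%.
console.log(similarityPercent(cosineDistance([1, 0], [0, 1])));
// Halfway between the 0.4 and 0.6 anchors lands near 60%.
console.log(similarityPercent(0.5));
```

Cosine distance can reach 2 for opposite vectors; clamping everything at or beyond 1.0 to 0% keeps the display within the stated range.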
Weights are handled as 32-bit floats by the in-browser TensorFlow.js runtime; total download is ~6 MB. Inference runs on WebGL when available (GPU-accelerated) and falls back to CPU via WebAssembly. End-to-end comparison of two faces typically takes 200 ms to 1 s on a modern laptop, longer on mobile. The UI shows a confidence-style bar and one of five qualitative bands (very similar, similar, somewhat similar, not similar, very different) chosen for friendliness, not for biometric rigour.
Accuracy, thresholds, and where this tool fails
On the LFW (Labeled Faces in the Wild) academic benchmark, well-trained 128-d face encoders reach ~99% verification accuracy across the benchmark's same-person and different-person pairs. That number is not the accuracy you should expect on arbitrary internet photos. LFW images skew toward good quality and near-frontal pose; in-the-wild performance is much noisier. NIST FRVT 1:1, which evaluates dozens of commercial vendors on hundreds of thousands of operational photos, shows that even leading systems have FAR (False Acceptance Rate) and FRR (False Rejection Rate) values that vary by an order of magnitude with demographic, age, and image quality. Our open-source backbone is older and smaller than the leaders on FRVT.
Concrete failure modes you will encounter: identical twins almost always score above 80% — the encoder cannot reliably distinguish them. Parents and adult children, siblings, or unrelated people of the same ethnicity and similar age and hairstyle can all score 70–85%. The same person photographed ten years apart can drop to 50% if facial features have changed. Heavy filters (FaceTune, Snapchat, beauty filters) effectively edit a different face into the photo and will lower the score significantly. Glasses, masks, beards, hijabs, hats, and partial occlusion all reduce accuracy because they hide informative landmarks.
Demographic fairness is a known limitation. Buolamwini & Gebru (2018), NIST FRVT (2019, 2024), and many other audits have shown that face-recognition models trained predominantly on lighter-skinned subjects produce higher error rates for darker-skinned faces, women, and children. The face-api.js descriptor used here inherits those biases. Treat any single comparison cautiously, especially when one or both subjects are from groups that are under-represented in standard public training sets.
Do not use this tool as a biometric authenticator, identity-proofing system, fraud-prevention check, surveillance match, employment-screening filter, or dating-app verifier. For those uses you need an audited commercial system with liveness detection (so a printed photo or a deep-fake video doesn't pass), a revocation pipeline, and a documented bias assessment. We have no such guarantees and we explicitly tell you not to deploy this in production. It is a curiosity tool, and the score is a guideline, not a verdict.
- Identical twins typically score 85–95% — the model cannot reliably tell them apart.
- Same person aged 10+ years apart may drop to 50–70% similarity due to natural ageing.
- Sunglasses, masks, beards, hats, or other occlusions block landmarks and reduce score.
- Strong filters (FaceTune, beauty filters, Snapchat lenses) effectively edit the face and distort the descriptor.
- Demographic fairness is uneven: darker skin tones, women, and children have higher error rates due to training-set imbalance.
- The tool reports the single best face per image; group photos must be cropped to one face first.
- There is no liveness detection — a printed photo of someone's face will produce the same descriptor as a live capture.
- Not suitable for biometric identity verification, KYC, border control, employment, or dating-app safety checks.
- Mapping from cosine distance to percentage is friendly and not calibrated against a biometric standard.
Glossary
- Face embedding (descriptor)
- A fixed-length numeric vector — here, 128 floating-point numbers — produced by a neural network that encodes the visual identity of a face. Photos of the same person should have similar embeddings; photos of different people should have dissimilar ones.
- Cosine similarity / cosine distance
- A geometric measure of how aligned two vectors are. Cosine similarity = dot(a,b) / (|a||b|), range [-1, 1]; cosine distance = 1 − cosine similarity. Used because descriptors live on a high-dimensional sphere.
- Threshold
- The cosine distance below which two descriptors are declared a match. face-api.js uses ~0.6 as a default; this corresponds to roughly 50% in our friendly UI scale. Lowering it makes the tool stricter (fewer false matches, more missed matches).
- FAR (False Acceptance Rate)
- The rate at which a face-matcher incorrectly says two different people are the same. Critical for security systems — a high FAR means impostors get through.
- FRR (False Rejection Rate)
- The rate at which a face-matcher incorrectly says photos of the same person are different. A high FRR means genuine users are inconvenienced.
- FaceNet
- A landmark 2015 paper by Schroff, Kalenichenko & Philbin (Google) that introduced the triplet-loss training scheme to produce 128-d face embeddings on a unit hypersphere.
- ArcFace
- A 2019 face-recognition loss function (Deng et al., InsightFace) that uses an additive angular margin to push descriptor classes further apart on the hypersphere. State-of-the-art on academic benchmarks like LFW and IJB-B.
- LFW / NIST FRVT
- Academic and government benchmarks for face-recognition systems. LFW (Labeled Faces in the Wild, 2007) is small and high-quality. NIST FRVT (Face Recognition Vendor Test) is the gold-standard government evaluation, with hundreds of thousands of operational photos and ongoing publication.
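To make the Threshold, FAR, and FRR entries concrete, here is a small sketch that counts both error types at a given distance threshold. The `ScoredPair` type and the distances are invented for illustration; they are not measurements from this tool.

```typescript
// One labelled comparison: the distance produced and the ground truth.
interface ScoredPair {
  distance: number;
  samePerson: boolean;
}

// A pair is accepted as a match when its distance is below the threshold.
function errorRates(pairs: ScoredPair[], threshold: number): { far: number; frr: number } {
  let impostors = 0, falseAccepts = 0, genuines = 0, falseRejects = 0;
  for (const p of pairs) {
    const accepted = p.distance < threshold;
    if (p.samePerson) {
      genuines++;
      if (!accepted) falseRejects++;
    } else {
      impostors++;
      if (accepted) falseAccepts++;
    }
  }
  return {
    far: impostors === 0 ? 0 : falseAccepts / impostors,
    frr: genuines === 0 ? 0 : falseRejects / genuines,
  };
}

// Toy data (made up): four genuine pairs and four impostor pairs.
const toyPairs: ScoredPair[] = [
  { distance: 0.2, samePerson: true },
  { distance: 0.35, samePerson: true },
  { distance: 0.55, samePerson: true },
  { distance: 0.7, samePerson: true },
  { distance: 0.5, samePerson: false },
  { distance: 0.65, samePerson: false },
  { distance: 0.8, samePerson: false },
  { distance: 0.9, samePerson: false },
];

console.log(errorRates(toyPairs, 0.6)); // one false accept, one false reject
console.log(errorRates(toyPairs, 0.4)); // stricter: no false accepts, two false rejects
```

On this toy data, lowering the threshold from 0.6 to 0.4 removes the false match but rejects one more genuine pair, which is the trade-off described in the Threshold entry.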
Frequently Asked Questions
How does the AI compare two faces?
It detects one face per image, aligns each face using 68 facial landmarks, encodes each aligned face as a 128-dimensional vector with a FaceNet-style ResNet, and computes cosine distance between the two vectors. Cosine distance is mapped to a friendly 0–100% similarity score. All inference is JavaScript in your browser via @vladmandic/face-api on TensorFlow.js.
What does the percentage actually mean?
It is a smooth mapping of cosine distance to a 0–100 scale. In rough terms: 90–100% = visually nearly identical (same person, twins, or an extreme lookalike); 70–90% = same person likely, or close relatives; 50–70% = some shared features but not necessarily the same person; below 50% = different people. It is NOT a probability and it is NOT a biometric verification result.
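The rough ranges above can be written as a small lookup. This is only a sketch: `describeScore` is a hypothetical name, and treating each lower bound as inclusive is an assumption, since the text does not say which side of a boundary a score falls on.

```typescript
// Map a 0-100 similarity score to the rough reading given above.
// Assumption: each band includes its lower bound (e.g. exactly 70 reads as 70-90).
function describeScore(percent: number): string {
  if (percent >= 90) return "visually nearly identical";
  if (percent >= 70) return "same person likely, or close relatives";
  if (percent >= 50) return "some shared features, not necessarily the same person";
  return "different people";
}

console.log(describeScore(85));
console.log(describeScore(42));
```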
Why do unrelated people score 60%?
Because the descriptor encodes the broad shape of the face — eye spacing, nose width, jaw angle, ethnicity, age — and many unrelated people share enough of those features to land in a similar region of the embedding space. This is a fundamental property of 128-d face descriptors, not a bug.
Why did my own photo score only 75% against another photo of me?
Common causes: (1) different lighting or camera angle; (2) different age (more than a few years can shift descriptors meaningfully); (3) glasses or facial hair in only one of the photos; (4) heavy filters or FaceTune in one photo; (5) one photo is much lower resolution; (6) you are wearing makeup in one and not the other. Try another pair of photos with similar conditions.
Are my photos uploaded?
No. All face detection, encoding, and comparison happen locally in your browser via TensorFlow.js. The model weights are downloaded once (about 6 MB, cached) and the inference runs on the images you select. Photo bytes never leave the device. We don't store, log, or share photos.
Can I use this for identity verification?
No. Do not use this tool to verify someone's identity, gate access to a service, prevent fraud, or screen employees / dates. It has no liveness detection, no calibrated threshold, no demographic fairness audit, and uses a smaller open-source model than commercial systems. For identity verification you need a vendor evaluated under NIST FRVT with documented FAR/FRR and operational guarantees.
Why are identical twins not distinguishable?
The face descriptor is trained to be invariant to lighting, expression, and pose, but is not designed to capture micro-features that even humans use to tell twins apart (small mole, slight asymmetry). Twin-discrimination is an active research area; standard face encoders generally fail at it.
Can it tell parents and children apart?
Often, but not always. Parents and adult children share many genetic facial features and may score 60–80%. The model is trained to put the same identity close together, but it has no notion of 'family resemblance' versus 'identity', so high scores between relatives are common and expected.
Does it support multiple faces in one photo?
Currently it picks the single most confident face per image and compares those two. If your photos are group shots, crop each one to a single face first. For multi-face workflows we have a dedicated Age & Gender Predictor that iterates over every detected face in an image.

What if no face is detected?
The tool reports 'no face detected' for that image. Common causes: face is too small (below detector minimum size), face is at an extreme angle, lighting is too dark or too bright, image is heavily blurred, or the image is not actually a photo of a face. Try a clearer, larger, frontal photo.
References & academic sources
- Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A Unified Embedding for Face Recognition and Clustering. IEEE CVPR.
- Deng, J., Guo, J., Xue, N., & Zafeiriou, S. (2019). ArcFace: Additive Angular Margin Loss for Deep Face Recognition. IEEE CVPR (InsightFace project).
- Huang, G. B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. University of Massachusetts Amherst Technical Report.
- Grother, P., Ngan, M., & Hanaoka, K. (2024). NIST Face Recognition Vendor Test (FRVT), ongoing evaluation. U.S. National Institute of Standards and Technology.
- Buolamwini, J., & Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Proceedings of Machine Learning Research.
- Mandic, V. (2024). @vladmandic/face-api, maintained TypeScript fork of face-api.js. Open-source project, MIT licence.
Last reviewed: · Reviewed by WuTools AI Ethics & Engineering Team
Frequently Asked Questions
Does face comparison happen in my browser or are my photos uploaded?
Everything runs inside your browser. The face-detection and face-embedding models (FaceNet-style ResNet via @vladmandic/face-api on TensorFlow.js) are downloaded once and then every comparison is computed locally using WebGL, WebGPU, or WebAssembly. Your face photos never leave your device: there is no upload, no server-side processing, and no biometric template stored anywhere outside your browser. This matters for face data because laws such as GDPR and Illinois BIPA treat face embeddings used for identification as sensitive biometric data, and many enterprise security policies explicitly forbid uploading them to third-party APIs. After the model download, the only network traffic is static page assets and the page's standard traffic analytics.
What image formats and conditions give the best comparison results?
Accepted formats: JPEG, PNG, WebP, AVIF, GIF (first frame), BMP and HEIC on supported browsers. For accurate comparison, both faces should be: at least 160x160 pixels in the cropped face region, frontal or near-frontal (yaw within ±30°), evenly lit without harsh shadows, in focus, and unobstructed by glasses glare, masks, or heavy hair coverage. Profile shots, extreme lighting, motion blur, and faces smaller than 80 pixels degrade the embedding quality. If multiple faces appear in either image, the tool uses the single most confident detected face; for portraits with several people, crop manually first.
What does the similarity percentage actually represent?
It is a friendly remapping of the cosine distance between two 128-dimensional face embeddings produced by the FaceNet-style network. Internally, each face is encoded as a unit vector in 128D space; the cosine distance (1 − the cosine of the angle between the two vectors) is then mapped to a 0-100% similarity score. On good-quality images, above 70% typically suggests the same person or a close relative. 50-70% is a borderline region affected by lighting, age, or expression differences. Below 50% usually means different people. The percentage is not a literal probability and is not calibrated against a biometric standard; it is a convenience metric. This tool is not suitable for legal or security applications; systems built for those purposes work from raw distances with a threshold validated on their own data.
Why does the model sometimes say my own old and recent photos are not the same person?
Face embeddings are highly sensitive to age changes (>5 years can drop similarity 10-20%), facial hair, glasses, weight changes, hairstyle, makeup, and lighting color temperature. A FaceNet-style model trained on web photos learns to discriminate many identities under typical conditions; significantly out-of-distribution changes will reduce the score even for the same person. Twins and close relatives can also fool it (high similarity but technically different people). For genealogy-style comparisons across decades, expect 50-70% scores between genuine matches, and compare images captured under similar conditions where you can. Do not rely on this score for security or unlock applications.
Is WebGPU faster than WebAssembly for face comparison?
Yes, significantly. The face-detection pass (SSD-MobileNet) and the 128D embedding pass (ResNet) are both convolution-heavy networks that benefit from GPU parallelism. On a typical laptop, WebGPU completes a full detect-align-embed-compare cycle in 100-300 ms per face pair; WebAssembly with SIMD takes 500-2000 ms, and WebGL (an older GPU backend) is somewhere in between. For batch comparisons (one query face vs many references), the GPU advantage compounds. The tool autodetects the best backend at startup: WebGPU > WebGL > WebAssembly-SIMD > WebAssembly fallback. You can see which backend is active in the browser console.
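The preference order described above amounts to "take the first backend in the list that this browser supports". A minimal sketch follows; the `Backend` labels and `chooseBackend` are illustrative names, not TensorFlow.js identifiers, and real capability detection is left out.

```typescript
// Illustrative labels for the four options named above (not library identifiers).
type Backend = "webgpu" | "webgl" | "wasm-simd" | "wasm";

// Fastest first, matching the order stated in the text.
const PREFERENCE: readonly Backend[] = ["webgpu", "webgl", "wasm-simd", "wasm"];

// Pick the first preferred backend that appears in the available list.
// Assumption: plain WebAssembly is always usable as the last resort.
function chooseBackend(available: readonly Backend[]): Backend {
  for (const candidate of PREFERENCE) {
    if (available.indexOf(candidate) !== -1) return candidate;
  }
  return "wasm";
}

// Example: a browser without WebGPU but with WebGL and WebAssembly.
console.log(chooseBackend(["wasm", "webgl"]));
```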
Can the tool be fooled by a printed photo, a photo of a screen, or a deepfake?
This tool is a face-recognition similarity meter, not a presentation-attack detector. It will happily compare a photo of a printed photo of you to a real photo of you, and the embedding will match. It does not check for liveness, depth, screen reflections, or AI-generated artifacts. For payment verification, identity proofing, or unlock applications, you need a separate liveness-detection layer (active challenges like blink or head turn, or passive depth analysis from a true-depth camera) on top of the similarity check. Browser-only liveness detection is possible (MediaPipe Face Landmarker can detect head pose changes) but is not part of this tool. For casual or genealogy use, the absence of liveness checks does not matter.
Which neural architecture is doing the heavy lifting — FaceNet, ArcFace, or DeepFace?
The default pipeline uses face-api.js / @vladmandic/face-api, which combines an SSD-MobileNet v1 face detector, a 68-point landmark regressor (for alignment), and a FaceNet-style ResNet-34 embedding network that outputs a 128-dimensional descriptor trained with triplet loss. ArcFace (2019) and CosFace (2018) are newer training approaches that use margin-based losses, typically with 512-dimensional embeddings; ArcFace reports higher accuracy on LFW (99.83% vs FaceNet's 99.63%) but needs larger models and slightly different alignment. They are available as advanced options if you need state-of-the-art accuracy. For everyday comparison, the default FaceNet pipeline is fast, well-tested, and good enough.
What is the difference between FP32 and INT8 for face embeddings, and does it matter for accuracy?
FP32 stores each network weight as a 32-bit float; INT8 stores it as an 8-bit integer, shrinking the model 4x and speeding up CPU inference 2-3x. For face embeddings, INT8 typically reduces the LFW accuracy by 0.1-0.3%, which is invisible to a human comparing scores but measurable on the benchmark's 6,000 test pairs. More importantly, INT8 embeddings are slightly noisier, which can push borderline pairs (around 70% similarity) into the wrong bucket. The tool ships an FP32 default for the embedding network because face data deserves the precision; the detector and landmark networks use INT8 because their output (bounding box, 68 points) is robust to quantization noise.
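The size and noise trade-off can be seen in a small round-trip sketch. It assumes simple min/max affine quantisation to unsigned 8-bit values, which is one common scheme and not necessarily the one used for this tool's models; `quantize` and `dequantize` are hypothetical helpers.

```typescript
// Affine (min/max) quantisation of float weights to 8-bit integers.
function quantize(weights: Float32Array): { q: Uint8Array; scale: number; min: number } {
  if (weights.length === 0) return { q: new Uint8Array(0), scale: 1, min: 0 };
  let min = weights[0], max = weights[0];
  for (let i = 1; i < weights.length; i++) {
    if (weights[i] < min) min = weights[i];
    if (weights[i] > max) max = weights[i];
  }
  const range = max - min;
  const scale = range === 0 ? 1 : range / 255; // step size between integer levels
  const q = new Uint8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    const level = Math.round((weights[i] - min) / scale);
    q[i] = Math.min(255, Math.max(0, level));
  }
  return { q, scale, min };
}

// Reverse the mapping; the result differs from the input by at most about scale / 2.
function dequantize(q: Uint8Array, scale: number, min: number): Float32Array {
  const out = new Float32Array(q.length);
  for (let i = 0; i < q.length; i++) out[i] = q[i] * scale + min;
  return out;
}

// Toy weights (invented values) to show the 4x size saving and the rounding noise.
const weights = new Float32Array([-1, -0.5, 0, 0.25, 1]);
const packed = quantize(weights);
const restored = dequantize(packed.q, packed.scale, packed.min);

let maxError = 0;
for (let i = 0; i < weights.length; i++) {
  maxError = Math.max(maxError, Math.abs(weights[i] - restored[i]));
}
console.log(weights.byteLength, packed.q.byteLength, maxError);
```

The per-weight error is bounded by half a quantisation step; across many layers such small errors add up to the slightly noisier embeddings mentioned above.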
