Face Similarity Meter

Free AI-powered face similarity comparison tool. Upload two photos and let AI calculate how similar the faces are. Perfect for finding lookalikes. Works offline in browser.

About the Face Similarity Meter

The Face Similarity Meter compares two photos and reports how visually similar the faces look as a percentage from 0% to 100%. It runs entirely in your browser via @vladmandic/face-api (a maintained fork of face-api.js). Both images are decoded locally, faces are detected with a MobileNet-style detector, each face is encoded as a 128-dimensional descriptor by a FaceNet-inspired network, and the cosine distance between the two descriptors is converted to a similarity percentage. No image bytes ever leave your device — there is no server-side processing, no upload, no logging.

Use it for casual exploration: 'do my two cousins really look alike?', 'does that selfie still look like me five years later?', 'do these two unrelated celebrities look similar?'. It is fun, it is fast, and it gives you a number. We have intentionally chosen a permissive operating mode (single best face per image, no liveness check, no quality filtering) so that small or low-quality photos still produce a result instead of an error. That makes the tool friendly but it also means the percentages are not calibrated for biometric authentication. Treat the score as a relative indicator of visual similarity, not as identity proof.

Best results require front-facing, well-lit photos at decent resolution where the face occupies a meaningful portion of the frame. Photos with sunglasses, masks, heavy makeup, hats covering the eyebrows, side angles, or low resolution will distort the descriptor and lower the score, even when the same person is in both photos. Conversely, identical twins, parents and children, and even unrelated people of similar age, ethnicity, and hairstyle can produce surprisingly high scores. Plastic surgery, significant weight change, aging by ten or more years, or strong filters and FaceTune-style retouching also matter — the model was not trained to be invariant to those.

We deliberately designed this tool as a curiosity rather than a verification system. It is not suitable for biometric identity verification, KYC (know-your-customer), border control, surveillance, employment screening, dating-app verification, or any situation where the result is used to allow or deny a person something. For those use cases you need a vendor system that has been benchmarked under NIST FRVT 1:1, that supports liveness detection (anti-spoofing), demographic-fairness evaluation, secure enrolment, and revocation. A free in-browser demo cannot satisfy those operational, legal, or audit requirements.

Privacy is by design. The model weights are downloaded to your browser once (about 6 MB, cached for future visits) and the entire comparison runs locally as JavaScript. There is no upload step, no temporary server-side cache, no API call. The page itself uses standard analytics for traffic counts only. Closing the tab clears all state. We don't store, log, sell, or share the photos you compare.

How the comparison works

The pipeline has three stages — detection, alignment + encoding, and distance scoring — and each stage uses an established open-source neural network. The detector is an SSD MobileNetV1 trained on WIDER FACE. It returns one or more bounding boxes per image with confidence scores; we keep the single most confident face per photo for comparison. If you want a multi-face workflow, see our Age & Gender Predictor, which iterates over every detection.
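
To make the detection stage concrete, here is a minimal sketch written against the face-api.js-style API that @vladmandic/face-api exposes. The '/models' path and the minConfidence value are placeholders, not the tool's exact configuration.

```ts
import * as faceapi from '@vladmandic/face-api';

// Load the SSD MobileNetV1 weights once; '/models' is a placeholder for
// wherever the weight shards are hosted.
async function loadDetector(): Promise<void> {
  await faceapi.nets.ssdMobilenetv1.loadFromUri('/models');
}

// Detect faces and keep the single most confident one, mirroring the tool's
// "best face per photo" rule. Returns undefined when no face is found.
async function bestFace(
  input: HTMLImageElement | HTMLCanvasElement,
): Promise<faceapi.FaceDetection | undefined> {
  const options = new faceapi.SsdMobilenetv1Options({ minConfidence: 0.5 });
  const detections = await faceapi.detectAllFaces(input, options);
  if (detections.length === 0) return undefined;
  return detections.reduce((best, d) => (d.score > best.score ? d : best));
}
```

In practice, faceapi.detectSingleFace(input, options) performs the same highest-score selection in a single call.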

Each detected face is then aligned. A 68-point landmark detector (a small ConvNet trained on the iBUG 300-W dataset) predicts landmark coordinates: outer and inner eye corners, nose bridge, nose tip, mouth corners, and jawline. The face is rotated and cropped so the eyes are horizontal and the inter-pupillary distance is normalised. Alignment matters: the encoding network was trained on aligned faces and will produce inconsistent descriptors on non-aligned input.
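
The sketch below (same assumptions as above) shows what the landmark stage exposes: it reads the 68-point result and derives the eye-line roll angle that alignment has to cancel. The library handles the actual rotation and cropping internally before encoding, so the geometry here is purely illustrative.

```ts
import * as faceapi from '@vladmandic/face-api';

// Assumes the landmark net is loaded alongside the detector, e.g.
// await faceapi.nets.faceLandmark68Net.loadFromUri('/models');
async function eyeLineAngle(input: HTMLImageElement): Promise<number | undefined> {
  const result = await faceapi.detectSingleFace(input).withFaceLandmarks();
  if (!result) return undefined;

  // Average the six contour points of each eye to approximate the pupil centres.
  const centre = (pts: faceapi.Point[]) => ({
    x: pts.reduce((s, p) => s + p.x, 0) / pts.length,
    y: pts.reduce((s, p) => s + p.y, 0) / pts.length,
  });
  const left = centre(result.landmarks.getLeftEye());
  const right = centre(result.landmarks.getRightEye());

  // Roll of the eye line in radians; alignment effectively rotates the crop by
  // -angle so the eyes sit on a horizontal line before encoding.
  return Math.atan2(right.y - left.y, right.x - left.x);
}
```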

The aligned crop is fed through a face-encoding network — in face-api.js this is a ResNet-34 architecture inspired by FaceNet (Schroff, Kalenichenko & Philbin, 2015) and trained with a triplet loss to produce 128-dimensional unit-length vectors that cluster tightly within identity and spread across identities. ArcFace (Deng et al., 2019) is a more recent improvement that uses an additive angular-margin loss; vladmandic's fork supports newer ArcFace-style backbones for higher accuracy when needed. We use the default ResNet-34 model for browser compatibility and footprint.
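
As a hedged sketch of the encoding step, chaining landmark detection and descriptor extraction yields the 128-value Float32Array that the comparison operates on; the function name and the assumption that all three nets are already loaded are ours, not the tool's.

```ts
import * as faceapi from '@vladmandic/face-api';

// Assumes ssdMobilenetv1, faceLandmark68Net, and faceRecognitionNet are loaded.
// Returns the 128-value descriptor for the most confident face, or undefined.
async function describe(input: HTMLImageElement): Promise<Float32Array | undefined> {
  const result = await faceapi
    .detectSingleFace(input)
    .withFaceLandmarks()     // alignment input
    .withFaceDescriptor();   // ResNet-34 encoder produces Float32Array(128)
  return result?.descriptor;
}
```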

Two 128-d descriptors are compared with cosine distance: distance = 1 − (dot(a,b) / (|a||b|)). A smaller distance means the faces are more similar. face-api.js's default 'same person' threshold is approximately 0.6, which maps to roughly 50% on our display scale. We map distance to a similarity percentage using a smooth curve: 0.0 → 100%, 0.4 → ~70%, 0.6 → ~50%, 1.0 → 0%. This mapping is empirical and friendly rather than calibrated against a biometric standard, so a score of 85% should be read as 'very similar' but not as a probability of being the same person.
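
In code, the distance and the display mapping could look like the sketch below. cosineDistance implements the formula above directly; toSimilarityPercent is an illustrative piecewise-linear stand-in that passes through the published anchor points, not the exact smooth curve the tool ships.

```ts
// Cosine distance between two 128-d descriptors: 1 - dot(a,b) / (|a||b|).
function cosineDistance(a: Float32Array, b: Float32Array): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Illustrative piecewise-linear approximation of the display curve, passing
// through the anchors 0.0 → 100%, 0.4 → 70%, 0.6 → 50%, 1.0 → 0%.
function toSimilarityPercent(distance: number): number {
  const d = Math.min(Math.max(distance, 0), 1);
  if (d <= 0.4) return 100 - 75 * d;            // 100% down to 70%
  if (d <= 0.6) return 70 - 100 * (d - 0.4);    // 70% down to 50%
  return 50 - 125 * (d - 0.6);                  // 50% down to 0%
}
```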

The model weights are quantised for the in-browser TensorFlow.js runtime; the total download is ~6 MB. Inference runs on WebGL when available (GPU-accelerated) and falls back to CPU via WebAssembly. End-to-end comparison of two faces typically takes 200 ms to 1 s on a modern laptop, longer on mobile. The UI shows a confidence-style bar and one of five qualitative bands (very similar, similar, somewhat similar, not similar, very different) chosen for friendliness, not for biometric rigour.
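
Backend selection can be handled with the standard TensorFlow.js API, which the @vladmandic/face-api fork also re-exports as faceapi.tf. The sketch below is an assumption about how a preference order might be implemented; the 'wasm' entry only succeeds when the separate @tensorflow/tfjs-backend-wasm package has been registered.

```ts
import * as tf from '@tensorflow/tfjs';

// Try backends in order of preference: GPU (WebGL), then WASM, then plain CPU.
async function initBackend(): Promise<string> {
  for (const backend of ['webgl', 'wasm', 'cpu']) {
    try {
      if (await tf.setBackend(backend)) {
        await tf.ready();
        return tf.getBackend();
      }
    } catch {
      // Backend not registered or failed to initialise; try the next one.
    }
  }
  throw new Error('No TensorFlow.js backend could be initialised');
}
```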

Accuracy, thresholds, and where this tool fails

On the LFW (Labelled Faces in the Wild) academic benchmark, well-trained 128-d face encoders reach ~99% verification accuracy on matched pairs. That number is not the accuracy you should expect on arbitrary internet photos. LFW pairs are pre-selected for image quality and frontal pose; in-the-wild performance is much noisier. NIST FRVT 1:1 — which evaluates dozens of commercial vendors on hundreds of thousands of operational photos — shows that even leading systems have FAR (False Acceptance Rate) and FRR (False Rejection Rate) values that vary by an order of magnitude with demographic, age, and image quality. Our open-source backbone is older and smaller than the leaders on FRVT.

Concrete failure modes you will encounter: identical twins almost always score above 80% — the encoder cannot reliably distinguish them. Parents and adult children, siblings, or unrelated people of the same ethnicity and similar age and hairstyle can all score 70–85%. The same person photographed ten years apart can drop to 50% if facial features have changed. Heavy filters (FaceTune, Snapchat, beauty filters) effectively edit a different face into the photo and will lower the score significantly. Glasses, masks, beards, hijabs, hats, and partial occlusion all reduce accuracy because they hide informative landmarks.

Demographic fairness is a known limitation. Buolamwini & Gebru (2018), NIST FRVT (2019, 2024), and many other audits have shown that face-recognition models trained predominantly on lighter-skinned subjects produce higher error rates for darker-skinned faces, women, and children. The face-api.js descriptor used here inherits those biases. Treat any single comparison cautiously, especially when one or both subjects are from groups that are under-represented in standard public training sets.

Do not use this tool as a biometric authenticator, identity-proofing system, fraud-prevention check, surveillance match, employment-screening filter, or dating-app verifier. For those uses you need an audited commercial system with liveness detection (so a printed photo or a deep-fake video doesn't pass), a revocation pipeline, and a documented bias assessment. We have no such guarantees and we explicitly tell you not to deploy this in production. It is a curiosity tool, and the score is a guideline, not a verdict.

  • Identical twins typically score 85–95% — the model cannot reliably tell them apart.
  • Same person aged 10+ years apart may drop to 50–70% similarity due to natural ageing.
  • Sunglasses, masks, beards, hats, or other occlusions block landmarks and reduce score.
  • Strong filters (FaceTune, beauty filters, Snapchat lenses) effectively edit the face and distort the descriptor.
  • Demographic fairness is uneven: darker skin tones, women, and children have higher error rates due to training-set imbalance.
  • The tool reports the single best face per image; group photos must be cropped to one face first.
  • There is no liveness detection — a printed photo of someone's face will produce the same descriptor as a live capture.
  • Not suitable for biometric identity verification, KYC, border control, employment, or dating-app safety checks.
  • Mapping from cosine distance to percentage is friendly and not calibrated against a biometric standard.

Glossary

Face embedding (descriptor)
A fixed-length numeric vector — here, 128 floating-point numbers — produced by a neural network that encodes the visual identity of a face. Photos of the same person should have similar embeddings; photos of different people should have dissimilar ones.
Cosine similarity / cosine distance
A geometric measure of how aligned two vectors are. Cosine similarity = dot(a,b) / (|a||b|), range [-1, 1]; cosine distance = 1 − cosine similarity. Used because descriptors live on a high-dimensional sphere.
Threshold
The cosine distance below which two descriptors are declared a match. face-api.js uses ~0.6 as a default; this corresponds to roughly 50% in our friendly UI scale. Lowering it makes the tool stricter (fewer false matches, more missed matches).
FAR (False Acceptance Rate)
The rate at which a face-matcher incorrectly says two different people are the same. Critical for security systems — a high FAR means impostors get through.
FRR (False Rejection Rate)
The rate at which a face-matcher incorrectly says photos of the same person are different. A high FRR means genuine users are inconvenienced.
FaceNet
A landmark 2015 paper by Schroff, Kalenichenko & Philbin (Google) that introduced the triplet-loss training scheme to produce 128-d face embeddings on a unit hypersphere.
ArcFace
A 2019 face-recognition loss function (Deng et al., InsightFace) that uses an additive angular margin to push descriptor classes further apart on the hypersphere. State-of-the-art on academic benchmarks like LFW and IJB-B.
LFW / NIST FRVT
Academic and government benchmarks for face-recognition systems. LFW (Labelled Faces in the Wild, 2007) is small and high-quality. NIST FRVT (Face Recognition Vendor Test) is the gold standard government evaluation, with hundreds of thousands of operational photos and ongoing publication.

Frequently Asked Questions

How does the AI compare two faces?

It detects one face per image, aligns each face using 68 facial landmarks, encodes each aligned face as a 128-dimensional vector with a FaceNet-style ResNet, and computes cosine distance between the two vectors. Cosine distance is mapped to a friendly 0–100% similarity score. All inference is JavaScript in your browser via @vladmandic/face-api on TensorFlow.js.
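
Putting that answer into code, an end-to-end comparison might look like the hedged sketch below; it assumes the three nets are already loaded and reuses the cosineDistance and toSimilarityPercent helpers sketched in the 'How the comparison works' section.

```ts
import * as faceapi from '@vladmandic/face-api';

// End-to-end sketch: returns a 0-100 similarity score, or undefined when one
// of the photos has no detectable face. Helpers are defined in earlier sketches.
async function compareFaces(
  a: HTMLImageElement,
  b: HTMLImageElement,
): Promise<number | undefined> {
  const [ra, rb] = await Promise.all([
    faceapi.detectSingleFace(a).withFaceLandmarks().withFaceDescriptor(),
    faceapi.detectSingleFace(b).withFaceLandmarks().withFaceDescriptor(),
  ]);
  if (!ra || !rb) return undefined;   // no face in one of the photos
  return toSimilarityPercent(cosineDistance(ra.descriptor, rb.descriptor));
}
```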

What does the percentage actually mean?

It is a smooth mapping of cosine distance to a 0–100 scale. In rough terms: 90–100% = visually nearly identical (same person, twins, or an extreme lookalike); 70–90% = same person likely, or close relatives; 50–70% = some shared features but not necessarily the same person; below 50% = different people. It is NOT a probability and it is NOT a biometric verification result.

Why do two unrelated people sometimes score 60% or more?

Because the descriptor encodes the broad shape of the face — eye spacing, nose width, jaw angle, ethnicity, age — and many unrelated people share enough of those features to land in a similar region of the embedding space. This is a fundamental property of 128-d face descriptors, not a bug.

Why did my own photo score only 75% against another photo of me?

Common causes: (1) different lighting or camera angle; (2) different age (more than a few years can shift descriptors meaningfully); (3) glasses or facial hair in only one of the photos; (4) heavy filters or FaceTune in one photo; (5) one photo is much lower resolution; (6) you are wearing makeup in one and not the other. Try another pair of photos with similar conditions.

Are my photos uploaded?

No. All face detection, encoding, and comparison happens locally in your browser via TensorFlow.js. The model weights are downloaded once (about 6 MB, cached) and the inference runs on the image files you select. Photo bytes never leave the device. We don't store, log, or share photos.

Can I use this for identity verification?

No. Do not use this tool to verify someone's identity, gate access to a service, prevent fraud, or screen employees / dates. It has no liveness detection, no calibrated threshold, no demographic fairness audit, and uses a smaller open-source model than commercial systems. For identity verification you need a vendor evaluated under NIST FRVT with documented FAR/FRR and operational guarantees.

Why are identical twins not distinguishable?

The face descriptor is trained to be invariant to lighting, expression, and pose, but is not designed to capture micro-features that even humans use to tell twins apart (small mole, slight asymmetry). Twin-discrimination is an active research area; standard face encoders generally fail at it.

Can it tell parents and children apart?

Often, but not always. Parents and adult children share many genetic facial features and may score 60–80%. The model is trained to put the same identity close together, but it has no notion of 'family resemblance' versus 'identity', so high scores between relatives are common and expected.

Does it support multiple faces in one photo?

Currently it picks the single most confident face per image and compares those two. If your photos are group shots, crop each one to a single face first. For multi-face workflows we have a dedicated Age & Gender Predictor that iterates over every detected face in an image.

What if no face is detected?

The tool reports 'no face detected' for that image. Common causes: face is too small (below detector minimum size), face is at an extreme angle, lighting is too dark or too bright, image is heavily blurred, or the image is not actually a photo of a face. Try a clearer, larger, frontal photo.

References & academic sources

  1. Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A Unified Embedding for Face Recognition and Clustering. IEEE CVPR.
  2. Deng, J., Guo, J., Xue, N., & Zafeiriou, S. (2019). ArcFace: Additive Angular Margin Loss for Deep Face Recognition. IEEE CVPR (InsightFace project).
  3. Huang, G. B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. University of Massachusetts Amherst Technical Report.
  4. Grother, P., Ngan, M., & Hanaoka, K. (2024). NIST Face Recognition Vendor Test (FRVT), ongoing evaluation. U.S. National Institute of Standards and Technology.
  5. Buolamwini, J., & Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Proceedings of Machine Learning Research.
  6. Mandic, V. (2024). @vladmandic/face-api: maintained TypeScript fork of face-api.js. Open-source project, MIT licence.

Last reviewed: · Reviewed by WuTools AI Ethics & Engineering Team