AI Object Detector

Detect and identify 80 types of objects in images using the COCO-SSD AI model. Real-time webcam detection, bounding boxes, and confidence scores. Free online tool.

Uses the COCO-SSD model to detect 80 object types. The model (~6 MB) downloads automatically the first time you run a detection.

About AI Object Detector

AI Object Detector uses COCO-SSD, a pre-trained object detection model that can identify 80 different object categories including people, vehicles, animals, furniture, electronics, and more. All processing happens directly in your browser using TensorFlow.js - no images are uploaded to any server.

Does object detection happen in my browser, or are my images uploaded?

All detection runs locally in your browser. The YOLO/MobileNet weights are downloaded once via Transformers.js or TensorFlow.js, then every inference happens client-side using WebGPU or WebAssembly. Your photos never leave your device: there is no upload, no server-side processing, and no logging. This matters for surveillance footage, medical imagery, ID photos, or any picture covered by GDPR, HIPAA, or workplace confidentiality rules. Apart from the one-time model download and the page's own static HTML/CSS/JS, no network traffic is needed; you can verify this by opening DevTools > Network, dropping an image, and confirming that no request carrying the image (for example a POST) is sent.

Which image formats can I drop into the detector?

The tool accepts any format your browser can decode: JPEG, PNG, WebP, AVIF, GIF (first frame), BMP, and SVG (after rasterization). It also handles screenshots pasted from the clipboard and frames captured live from your webcam. Internally, the image is rendered to a hidden canvas, resized to the model's expected input (typically 640x640 for YOLO, 300x300 for SSD-MobileNet, 320x320 for EfficientDet-Lite), normalized to floats in 0 to 1 or -1 to 1 depending on the model's preprocessing spec, and fed to the model as a tensor. HEIC photos from an iPhone decode natively in recent versions of Safari; most other browsers, including Chrome, cannot decode HEIC, so export to JPEG first.
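As a rough illustration of the normalization step described above, the sketch below converts RGBA bytes (the layout a canvas `getImageData()` call returns) into floats in either range. The helper name `normalizePixels`, the `PixelRange` type, and the HWC output layout are assumptions made for this example; the tool's actual preprocessing code is not shown on this page, and some models expect CHW layout instead.

```typescript
// Hypothetical helper, not the tool's real code.
// Input: RGBA bytes, 4 per pixel, as found in ImageData.data.
// Output: RGB floats in HWC order (alpha is dropped).
type PixelRange = "zeroToOne" | "minusOneToOne";

function normalizePixels(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
  range: PixelRange,
): Float32Array {
  if (rgba.length !== width * height * 4) {
    throw new Error("rgba length does not match width * height * 4");
  }
  const out = new Float32Array(width * height * 3);
  for (let px = 0; px < width * height; px++) {
    for (let c = 0; c < 3; c++) {
      const v = rgba[px * 4 + c];
      // 0..255 maps to 0..1, or to -1..1 for models that expect a signed range.
      out[px * 3 + c] = range === "zeroToOne" ? v / 255 : v / 127.5 - 1;
    }
  }
  return out;
}
```

In a browser, the `rgba` argument would typically come from drawing the image onto a canvas sized to the model input and reading it back with `getImageData()`.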

How many object classes can the model recognize?

The default checkpoint is trained on COCO, which contains 80 everyday categories, including: person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, dog, cat, bird, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, baseball bat/glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, fruits and food items, chair, couch, bed, dining table, toilet, TV, laptop, mouse, remote, keyboard, cell phone, microwave, oven, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair drier, toothbrush. For specialized domains (medical, retail, manufacturing, wildlife species) you need a model fine-tuned on a larger-vocabulary dataset such as Open Images or LVIS, or on a private domain corpus.

Why does the detector miss small or partially hidden objects?

Small-object detection is a long-standing weakness of single-shot detectors like YOLO and SSD. The image is downsampled to a fixed input size (640x640 for YOLOv8), so a 30-pixel face in a 4K photo (3840 pixels wide) becomes about 5 pixels after resize, too small for the network to resolve. Occlusion (objects hidden behind others) is also hard because the convolutional features of neighboring objects blend together. Workarounds: use a higher-resolution input (a model run at 1280x1280 helps but roughly quadruples compute, because the pixel count grows fourfold), crop and re-detect on regions of interest, run "tiled inference" by splitting the image into overlapping 640x640 tiles, or switch to a two-stage detector like Faster R-CNN, which is often more accurate but much slower and rarely available in browsers.
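The resize arithmetic and the tiling workaround can be sketched as follows. This is a minimal illustration, not the tool's implementation: the names `scaledSize`, `tileOffsets`, and `makeTiles` and the 128-pixel default overlap are assumptions chosen for the example.

```typescript
interface Tile { x: number; y: number; width: number; height: number; }

// Size in pixels of an object after the image's long side is resized to inputSize.
function scaledSize(objectPx: number, imageLongSide: number, inputSize: number): number {
  return (objectPx * inputSize) / imageLongSide;
}

// Start offsets along one axis so that tiles of `tile` pixels cover `length`
// pixels, with neighbouring tiles sharing at least `overlap` pixels.
function tileOffsets(length: number, tile: number, overlap: number): number[] {
  if (overlap < 0 || overlap >= tile) {
    throw new Error("overlap must be in [0, tile)");
  }
  if (length <= tile) return [0]; // the image fits in a single tile
  const stride = tile - overlap;
  const offsets: number[] = [];
  for (let start = 0; start + tile < length; start += stride) {
    offsets.push(start);
  }
  offsets.push(length - tile); // final tile sits flush with the far edge
  return offsets;
}

// All tiles for an image; tiles are clipped to the image when it is smaller than one tile.
function makeTiles(imgW: number, imgH: number, tile = 640, overlap = 128): Tile[] {
  const tiles: Tile[] = [];
  for (const y of tileOffsets(imgH, tile, overlap)) {
    for (const x of tileOffsets(imgW, tile, overlap)) {
      tiles.push({ x, y, width: Math.min(tile, imgW), height: Math.min(tile, imgH) });
    }
  }
  return tiles;
}

console.log(scaledSize(30, 3840, 640));   // 5
console.log(makeTiles(3840, 2160).length); // 32
```

Each tile is then run through the detector on its own. Boxes found in a tile have to be shifted by that tile's `x` and `y` to land in full-image coordinates, and duplicates from the overlap regions merged, for example with Non-Maximum Suppression.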

What do the confidence score and IoU threshold actually mean?

Two numbers control which detections you see. Confidence (0-1) is the model's estimated probability that an object of the predicted class exists at the predicted box. The default threshold of about 0.25 keeps detections the model is at least somewhat sure about; raise it to 0.5 for cleaner output, or lower it to 0.1 to catch hard cases at the cost of more false positives. IoU (Intersection-over-Union) is the overlap area of two boxes divided by the area of their union, and it controls Non-Maximum Suppression: when the model proposes two overlapping boxes for the same object, NMS keeps the higher-confidence one and discards the other if their IoU exceeds the threshold (default ~0.45). Lowering the IoU threshold is more aggressive (fewer duplicates); raising it lets more overlapping detections through, which is useful for crowds where people physically overlap.
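The two thresholds can be illustrated with a small greedy NMS sketch. It assumes corner-format boxes and suppresses only within the same class; the `Detection` shape and the function names are hypothetical, and a real runtime may implement NMS differently (for example class-agnostic, or inside the model graph).

```typescript
// Hypothetical detection record: box corners in pixels plus score and class.
interface Detection {
  x1: number; y1: number; x2: number; y2: number;
  score: number;
  label: string;
}

// Intersection-over-Union of two boxes: overlap area divided by union area.
function iou(a: Detection, b: Detection): number {
  const iw = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const ih = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const inter = iw * ih;
  const union =
    (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy NMS: drop low-confidence boxes, then walk the rest from highest to
// lowest score, keeping a box only if it does not overlap an already kept box
// of the same class by more than iouThreshold.
function nms(
  detections: Detection[],
  confThreshold = 0.25,
  iouThreshold = 0.45,
): Detection[] {
  const candidates = detections
    .filter((d) => d.score >= confThreshold)
    .sort((a, b) => b.score - a.score);
  const kept: Detection[] = [];
  for (const d of candidates) {
    const isDuplicate = kept.some(
      (k) => k.label === d.label && iou(k, d) > iouThreshold,
    );
    if (!isDuplicate) kept.push(d);
  }
  return kept;
}
```

With two "person" boxes offset by 10 pixels on a 100x100 object (IoU about 0.68), the default threshold of 0.45 keeps only the higher-scoring box, while a threshold of 0.7 keeps both.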

How accurate is browser-side YOLO compared to the server version?

Nearly identical for a given quantization level. The browser uses the same ONNX or TensorFlow.js export of the official Ultralytics or PyTorch weights, so a YOLOv8n quantized to INT8 should produce the same bounding boxes and confidence scores, up to small floating-point differences between backends, whether it runs in Chrome, Node.js, or a Python server. What changes is throughput. As rough figures, a server with an NVIDIA A100 can reach 1000+ FPS at 640x640, WebGPU on an M2 MacBook hits 30-60 FPS, and WebAssembly on a 5-year-old laptop drops to 2-5 FPS. For real-time webcam detection, prefer a small "n" or "s" YOLO variant on WebGPU; for batch processing of single images, accuracy-first variants like YOLOv8m or YOLOv8l are practical even on CPU.

Which detection architecture is used — YOLO, SSD, EfficientDet, or DETR?

The default is YOLOv8 (nano or small) in ONNX format, a single-shot anchor-free CNN detector that predicts class probabilities and box coordinates in one forward pass per image. YOLO trades a little accuracy for a large gain in speed, which is essential in the browser. SSD-MobileNet is available as a lighter fallback (lower mAP, faster on low-end mobile). EfficientDet-Lite is a TensorFlow.js option with a better accuracy/compute trade-off on COCO. DETR (DEtection TRansformer) is research-grade and not yet practical in-browser due to model size and inference latency. For most browser-side use cases, YOLOv8n at 640x640 with WebGPU is the sweet spot: about 3 MB at INT8 (roughly 12 MB at FP32), real-time, and 37+ mAP on COCO.

What is INT8 quantization for a detector and does it affect accuracy?

Quantization converts model weights from 32-bit floats to 8-bit integers, shrinking the file about 4x (a YOLOv8n drops from about 12 MB FP32 to roughly 3 MB INT8) and often speeding up CPU inference by up to about 2x. For COCO detection, dynamic INT8 typically loses 0.5-1.5 mAP, which is invisible on everyday images but measurable on benchmark suites. Per-channel INT8 with calibration loses even less. INT8 can also enable WebNN/NPU acceleration on supported devices (recent Snapdragon, Apple Neural Engine via Core ML web bridge). The ONNX Runtime Web wasm-simd backend handles INT8 dequantization at runtime, so you get the size and speed benefits without writing any low-level code yourself.
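To make the 4x size reduction and the small accuracy loss concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization. It is a simplification chosen for illustration: the names `quantizeInt8` and `dequantize` are hypothetical, and production toolchains typically add zero points, per-channel scales, and calibration data rather than using this exact scheme.

```typescript
// Hypothetical container: 8-bit values plus the one scale shared by the tensor.
interface QuantizedTensor {
  values: Int8Array;
  scale: number;
}

// Symmetric per-tensor quantization: map [-maxAbs, maxAbs] onto [-127, 127].
function quantizeInt8(weights: Float32Array): QuantizedTensor {
  let maxAbs = 0;
  for (let i = 0; i < weights.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(weights[i]));
  }
  const scale = maxAbs > 0 ? maxAbs / 127 : 1; // avoid dividing by zero
  const values = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    const q = Math.round(weights[i] / scale);
    values[i] = Math.max(-127, Math.min(127, q));
  }
  return { values, scale };
}

// Recover approximate floats; each value is off by at most about scale / 2.
function dequantize(q: QuantizedTensor): Float32Array {
  const out = new Float32Array(q.values.length);
  for (let i = 0; i < q.values.length; i++) {
    out[i] = q.values[i] * q.scale;
  }
  return out;
}

const weights = new Float32Array([-1, 0, 0.25, 1]);
const quantized = quantizeInt8(weights);
console.log(weights.byteLength, quantized.values.byteLength); // 16 4
```

The rounding error per weight is bounded by half the scale, which is why a tensor with a few large outliers quantizes worse than one with evenly spread values, and why per-channel scales tend to lose less accuracy.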