AI Object Detector
Detect 80 object types in images with the COCO-SSD AI model, fully on-device. Real-time webcam, bounding box JSON/CSV export, and confidence scores.
About AI Object Detector
AI Object Detector uses COCO-SSD, a pre-trained object detection model that identifies 80 object categories, including people, vehicles, animals, furniture, and electronics. All processing happens directly in your browser using TensorFlow.js; no images are uploaded to any server.
Does object detection happen in my browser, or are my images uploaded?
All detection runs locally in your browser. The COCO-SSD model (SSD-MobileNet v2 weights, about 6 MB) is downloaded once from a CDN via TensorFlow.js and cached by the browser. Every inference then happens client-side on the WebGL (GPU) backend, falling back to WebAssembly or CPU if WebGL is unavailable. Your photos never leave your device: there is no upload, no server-side processing, and no logging. This matters for surveillance footage, ID photos, internal documents, or any picture covered by GDPR or workplace confidentiality rules. The only network traffic is the static HTML/CSS/JS for the page and the one-time model download. You can verify this in DevTools > Network: you will see coco-ssd and tfjs load, and no POST request is sent when you run a detection.
Which model and architecture does this tool actually run?
It runs COCO-SSD as published by the TensorFlow.js team: a Single-Shot Detector (SSD) with a MobileNet v2 backbone, trained on the COCO dataset. A single-shot detector predicts class probabilities and box coordinates in one forward pass, which is why it is fast enough for real-time webcam use on modest hardware. This tool does not use YOLO, ONNX Runtime, DETR, or EfficientDet. If you inspect DevTools > Network you will see @tensorflow-models/coco-ssd and @tensorflow/tfjs, and no other model or runtime. The model is roughly 6 MB and is cached after the first load, so subsequent visits skip the download.
Which image formats can I use, and how is the image fed to the model?
The tool accepts any format your browser can decode: JPEG, PNG, WebP, AVIF, GIF (first frame), BMP, and many others. You can upload a file, load an image by URL, or capture a frame live from your webcam. Internally the image is drawn to a canvas and passed directly to the COCO-SSD detect() call; SSD-MobileNet v2 resizes it to its own fixed input internally, so you do not need to pre-resize. HEIC from iPhone generally decodes in recent Safari; most other browsers, including Chrome, do not decode it natively, so you may need to export to JPEG first.
How many object classes can the model recognize?
COCO-SSD recognizes the 80 COCO categories: person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, bed, dining table, toilet, TV, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair drier, and toothbrush. This tool covers only those 80 everyday classes; for specialized domains (medical, retail, manufacturing, wildlife species) you would need a model fine-tuned on a domain dataset.

Why does the detector miss small or partially hidden objects?
Small-object detection is a well-known weakness of single-shot detectors like SSD. SSD-MobileNet v2 works on a relatively small internal feature map, so a tiny object in a high-resolution photo (a distant person, for example) can fall below the resolution the network can resolve. Occlusion is also hard: when one object is partly hidden behind another, too little of it may be visible, and the convolutional features of the overlapping objects blend together. Practical workarounds: crop and re-detect on the region of interest, lower the confidence threshold to surface borderline cases (at the cost of false positives), or photograph the subject larger in frame. For demanding small-object or specialized work, a larger server-side detector would be more accurate but is not what this in-browser tool targets.
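If you crop a region and run detection on the crop, the returned boxes are relative to the crop, not the original photo. The sketch below shows one way to translate them back. The Detection shape matches what COCO-SSD's detect() returns (bbox as [x, y, width, height]); CropRegion, mapCropDetectionsToOriginal, and the scale parameter are hypothetical names used for illustration and are not part of this tool or the library.

```typescript
// Shape of one result from COCO-SSD's detect(): bbox is [x, y, width, height] in pixels.
interface Detection {
  bbox: [number, number, number, number];
  class: string;
  score: number;
}

// Hypothetical helper type: the crop rectangle, expressed in original-image pixels.
interface CropRegion {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Translate boxes detected inside a crop back into original-image coordinates.
// `scale` is how much the crop was enlarged before detection (1 = not resized).
function mapCropDetectionsToOriginal(
  detections: Detection[],
  crop: CropRegion,
  scale: number = 1,
): Detection[] {
  return detections.map((d): Detection => {
    const [x, y, w, h] = d.bbox;
    return {
      ...d,
      // Undo the enlargement first, then shift by the crop's top-left corner.
      bbox: [crop.x + x / scale, crop.y + y / scale, w / scale, h / scale],
    };
  });
}
```

For example, a box at (10, 20) inside a crop that starts at (100, 200) and was enlarged 2x maps back to (105, 210) in the original image.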
What does the confidence score mean and how should I set the threshold?
Each detection carries a confidence score from 0 to 1: the model's estimated probability that the object exists at the predicted box. The confidence-threshold slider filters the results — raise it (for example to 0.6) for cleaner, high-precision output, or lower it (to 0.2) to catch borderline cases at the cost of more false positives. COCO-SSD already applies non-maximum suppression internally to remove duplicate overlapping boxes, so you control the result purely through the confidence threshold and the max-detections cap. Confidence is an estimate, not ground truth.
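A minimal sketch of what the threshold and the max-detections cap do to a result list, assuming detections in the shape COCO-SSD returns. The filterDetections function is a hypothetical illustration, not this tool's code; the library's detect() call also accepts a maximum box count and a minimum score directly, so a tool could equally apply both limits at inference time.

```typescript
// Shape of one result from COCO-SSD's detect(): bbox is [x, y, width, height] in pixels.
interface Detection {
  bbox: [number, number, number, number];
  class: string;
  score: number;
}

// Hypothetical helper: keep detections at or above minScore, highest score first,
// and return at most maxDetections of them. The input array is not modified.
function filterDetections(
  detections: Detection[],
  minScore: number,
  maxDetections: number,
): Detection[] {
  return detections
    .filter((d) => d.score >= minScore) // filter() returns a new array
    .sort((a, b) => b.score - a.score) // so sorting here leaves the input untouched
    .slice(0, maxDetections);
}
```

With scores of 0.9, 0.7, 0.45, and 0.15, a threshold of 0.6 keeps two detections and a threshold of 0.2 keeps three; the cap then trims whatever remains from the low-score end.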
What are the accuracy caveats — can I rely on this for critical decisions?
Treat every result as an estimate, not a verified fact. COCO-SSD is a general-purpose detector limited to 80 everyday classes; it is not an identity-verification, medical, legal, or safety system, and it does not recognize specific people, brands, text, or fine-grained species. It can miss small or occluded objects, mislabel visually similar classes, and produce false positives at low thresholds. Use it for triage, tagging, dataset bootstrapping, QA, and integration prototyping, and always have a human verify the result before any decision that matters.
What is the export schema for the bounding boxes (JSON/CSV)?
Download JSON, Copy JSON, and Download CSV all export the same data, reflecting the currently visible (class-filtered) detections. Coordinates are in the original image's pixel space with a top-left origin: x and y are the top-left corner of the box, width and height are its size in pixels. JSON gives an array of detections, each with class (string), confidence (0-1, rounded to 3 decimals), and boundingBox { x, y, width, height }, plus a coordinateSystem note and an ISO timestamp. CSV uses the columns index, class, confidence, x, y, width, height. The on-screen Detection Table shows the same fields (confidence as a percentage) so you can scan, sort, or paste results straight into code or a spreadsheet without a download round-trip.
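The schema described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual export code: the per-detection fields and CSV columns follow the description, but the top-level key names (detections, timestamp), the wording of the coordinateSystem note, the zero-based CSV index, and the function names are all assumptions.

```typescript
// Shape of one result from COCO-SSD's detect(): bbox is [x, y, width, height] in pixels.
interface Detection {
  bbox: [number, number, number, number];
  class: string;
  score: number;
}

// One exported detection, following the fields named in the schema description.
interface ExportedDetection {
  class: string;
  confidence: number; // 0-1, rounded to 3 decimals
  boundingBox: { x: number; y: number; width: number; height: number };
}

function toExportRows(detections: Detection[]): ExportedDetection[] {
  return detections.map((d): ExportedDetection => {
    const [x, y, width, height] = d.bbox;
    return {
      class: d.class,
      confidence: Math.round(d.score * 1000) / 1000,
      boundingBox: { x, y, width, height },
    };
  });
}

// Assumed wrapper layout: the top-level key names are guesses for illustration.
function toJson(detections: Detection[], now: Date = new Date()): string {
  return JSON.stringify(
    {
      coordinateSystem: "pixels, origin top-left", // assumed wording of the note
      timestamp: now.toISOString(),
      detections: toExportRows(detections),
    },
    null,
    2,
  );
}

// CSV with the documented columns; the index is assumed to start at 0.
// COCO class names contain spaces but no commas, so no quoting is applied here.
function toCsv(detections: Detection[]): string {
  const header = "index,class,confidence,x,y,width,height";
  const lines = toExportRows(detections).map((r, i) => {
    const b = r.boundingBox;
    return [i, r.class, r.confidence, b.x, b.y, b.width, b.height].join(",");
  });
  return [header, ...lines].join("\n");
}
```

Because x and y are the top-left corner in original-image pixels, a consumer can draw a box directly with a canvas call such as strokeRect(x, y, width, height) without any conversion.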
