AI Depth Estimator
Free on-device depth map generator. MiDaS AI estimates relative depth from any photo in your browser. Colormaps, histogram, 16-bit PNG export, no upload.
About AI Depth Estimator
AI Depth Estimator uses MiDaS, a deep learning model for monocular depth estimation that predicts depth from a single 2D image. It produces a depth map showing the relative distances of objects in the scene. All processing happens directly in your browser; no images are uploaded to any server.
Are my photos uploaded when I run depth estimation?
No. The AI Depth Estimator runs entirely in your browser using ONNX Runtime Web (no transformers.js, no backend). Your photo is decoded into an in-memory Canvas, resized to 256x256, and passed to the MiDaS model running on your CPU (WebAssembly) or GPU (WebGPU). The resulting depth map is rendered locally and never leaves the device; you can confirm in the DevTools Network tab that no request carries your image bytes. This matters for personal photos, sensitive scenes, or any visual content you do not want to send to a cloud API. The ~66MB model weights are cached by the browser on first run, so subsequent estimations start faster and work offline.
Which depth model and input resolution does this tool use?
It uses MiDaS v2.1 small (midas_v21_small_256.onnx), Intel's well-established monocular depth network trained across many datasets to generalize over indoor and outdoor scenes. The model takes a fixed 256x256 RGB input that is ImageNet-normalized (subtract the per-channel mean, divide by the per-channel standard deviation) before inference; the predicted map is then rescaled back to your image's dimensions for display and export. It is a single, fixed model: there is no model picker, resolution slider, or webcam mode. The trade-off is speed and a small download in exchange for a modest fixed resolution, so fine details at object edges can be soft.
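The normalization step described above can be sketched as follows. This is a minimal illustration, not the tool's actual source: the function name is hypothetical, the mean and standard deviation are the commonly used ImageNet RGB statistics, and the planar [1, 3, H, W] channel layout is an assumption about what the ONNX model expects.

```typescript
// Commonly used ImageNet channel statistics for RGB values scaled to [0, 1].
const IMAGENET_MEAN = [0.485, 0.456, 0.406];
const IMAGENET_STD = [0.229, 0.224, 0.225];

/**
 * Convert interleaved RGBA bytes (the layout returned by a canvas
 * getImageData call) into a planar, ImageNet-normalized Float32Array.
 * Output order is all R values, then all G, then all B (assumed NCHW).
 * The alpha channel is ignored.
 */
function rgbaToNormalizedChw(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
): Float32Array {
  const pixels = width * height;
  if (rgba.length !== pixels * 4) {
    throw new Error("rgba length does not match width * height * 4");
  }
  const out = new Float32Array(3 * pixels);
  for (let i = 0; i < pixels; i++) {
    for (let c = 0; c < 3; c++) {
      const value = rgba[i * 4 + c] / 255; // scale byte to [0, 1]
      out[c * pixels + i] = (value - IMAGENET_MEAN[c]) / IMAGENET_STD[c];
    }
  }
  return out;
}
```

For the model described here, the input would be the pixel data of a 256x256 canvas, giving a tensor of 3 x 256 x 256 values.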
Is the output relative depth or metric (real-world) distance?
Relative depth only. MiDaS predicts inverse depth (disparity), so the tool tells you which pixels are closer or farther than others, but not absolute distances in meters. After inference the values are min-max normalized to [0,1] with the convention that 1.0 = nearest (foreground) and 0.0 = farthest (background). True metric depth would need a stereo camera, LiDAR, or a metric-fine-tuned model, and even such a model is reliable only when the scene resembles its training data. Treat the Near/Mid/Far zone percentages and the histogram as relative estimates, not measurements.
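A minimal sketch of the min-max normalization and the zone split, assuming the raw model output is a flat Float32Array of inverse-depth values. The function names are hypothetical, and the zone thresholds (thirds of the normalized range) are an assumption; the tool's actual cut-offs are not stated here.

```typescript
/** Min-max normalize raw inverse depth to [0, 1]; 1.0 = nearest. */
function normalizeDepth(raw: Float32Array): Float32Array {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < raw.length; i++) {
    if (raw[i] < min) min = raw[i];
    if (raw[i] > max) max = raw[i];
  }
  const out = new Float32Array(raw.length);
  const range = max - min;
  if (!(range > 0)) return out; // empty or flat map: leave everything at 0
  for (let i = 0; i < raw.length; i++) {
    out[i] = (raw[i] - min) / range;
  }
  return out;
}

/**
 * Share of pixels per zone, in percent. Thresholds are assumed:
 * far < farMax <= mid < nearMin <= near.
 */
function zonePercentages(
  depth: Float32Array,
  farMax: number = 1 / 3,
  nearMin: number = 2 / 3,
): { near: number; mid: number; far: number } {
  if (depth.length === 0) return { near: 0, mid: 0, far: 0 };
  let near = 0;
  let far = 0;
  for (let i = 0; i < depth.length; i++) {
    if (depth[i] >= nearMin) near++;
    else if (depth[i] < farMax) far++;
  }
  const mid = depth.length - near - far;
  const toPercent = (count: number) => (100 * count) / depth.length;
  return { near: toPercent(near), mid: toPercent(mid), far: toPercent(far) };
}
```

Because the normalization is per image, the same normalized value in two different photos does not correspond to the same physical distance.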

How do the colormaps, histogram, and side-by-side/overlay views help?
You can render the depth map with six colormaps (inferno, viridis, plasma, magma, grayscale, turbo) to read structure at a glance, and switch between Depth Map Only, Side by Side with the original, or a semi-transparent Overlay with adjustable opacity. The Invert toggle flips the brightness so Near=dark if you prefer that convention. The statistics panel reports min/mean/max normalized depth, splits the scene into Near/Mid/Far zones, and draws a 32-bin histogram of the depth distribution, which is useful for checking foreground/background separation before using the map for bokeh or compositing.
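The histogram and the Invert toggle are simple to express in code. The sketch below assumes a depth map already normalized to [0, 1]; the function names are hypothetical and the binning rule (equal-width bins, with 1.0 falling into the last bin) is an assumption about how a 32-bin histogram would typically be built.

```typescript
/** Count normalized depth values into equal-width bins over [0, 1]. */
function depthHistogram(depth: Float32Array, bins: number = 32): Uint32Array {
  const counts = new Uint32Array(bins);
  for (let i = 0; i < depth.length; i++) {
    const clamped = Math.min(1, Math.max(0, depth[i]));
    // A value of exactly 1.0 would index one past the end, so cap it.
    const bin = Math.min(bins - 1, Math.floor(clamped * bins));
    counts[bin]++;
  }
  return counts;
}

/** Flip the convention so that near pixels become dark (0) instead of bright. */
function invertDepth(depth: Float32Array): Float32Array {
  const out = new Float32Array(depth.length);
  for (let i = 0; i < depth.length; i++) {
    out[i] = 1 - depth[i];
  }
  return out;
}
```

Two well-separated peaks in the histogram suggest a clean foreground/background split; a single broad hump suggests a scene with gradual depth change.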
Can I export a 16-bit PNG and JSON for Blender, Photoshop, or Nuke?
Yes. Besides the colored depth-map PNG and the 8-bit grayscale PNG, you can export a true 16-bit grayscale PNG. An 8-bit map has only 256 levels, which causes visible banding on smooth gradients (ground planes, sky); 16-bit gives 65,536 levels, which makes it the better choice for displacement mapping in Blender, depth blur in Photoshop, and DOF/parallax compositing in Nuke. Photoshop, GIMP, Blender, and Krita all load 16-bit PNG natively. A JSON sidecar is also exported with the model name, 256x256 input resolution, colormap, invert flag, min/mean/max stats, and Near/Mid/Far zone percentages, so your results stay reproducible and auditable.
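To illustrate why bit depth matters, the sketch below quantizes a normalized depth map to 8 or 16 bits and packs 16-bit samples in the big-endian byte order that PNG uses for them. The function names are hypothetical, and the sketch covers only quantization and byte packing, not PNG chunk encoding or compression.

```typescript
/** Quantize normalized depth in [0, 1] to integer levels at the given bit depth. */
function quantizeDepth(depth: Float32Array, bits: 8 | 16): Uint16Array {
  const maxLevel = bits === 16 ? 65535 : 255;
  const out = new Uint16Array(depth.length);
  for (let i = 0; i < depth.length; i++) {
    const clamped = Math.min(1, Math.max(0, depth[i]));
    out[i] = Math.round(clamped * maxLevel);
  }
  return out;
}

/** Pack 16-bit samples most-significant byte first, as PNG stores them. */
function toBigEndianBytes(samples: Uint16Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    bytes[i * 2] = samples[i] >> 8; // high byte
    bytes[i * 2 + 1] = samples[i] & 0xff; // low byte
  }
  return bytes;
}

/** Number of distinct quantized levels actually used by a map. */
function countDistinctLevels(samples: Uint16Array): number {
  return new Set(Array.from(samples)).size;
}
```

On a smooth 1000-step gradient, the 8-bit version collapses to at most 256 distinct levels while the 16-bit version keeps every step distinct, which is the banding difference described above.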
Why does the tool show a WebGPU or WASM badge, and which is faster?
On load the tool tries the WebGPU execution provider first and falls back to WebAssembly (with SIMD and up to 4 threads) if WebGPU is unavailable, then shows a badge indicating the active backend. WebGPU offloads the matrix math to your GPU and is typically several times faster than CPU-only WASM. Because the model input is always 256x256, inference time does not depend on the size of your photo. On browsers without WebGPU (older Safari, some mobile devices) the tool automatically uses WASM, so it still works everywhere; only the speed differs, not the result.
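The try-WebGPU-then-WASM behaviour can be sketched as a small fallback helper. This is an illustration under assumptions, not the tool's code: the helper names are hypothetical, and the session factory is passed in as a callback so the sketch stays independent of any library. With ONNX Runtime Web, the callback would plausibly wrap `ort.InferenceSession.create(modelUrl, { executionProviders: [backend] })`, with `ort.env.wasm.numThreads` set beforehand for the WASM path.

```typescript
type Backend = "webgpu" | "wasm";

/**
 * Preferred provider order. In a browser, hasWebGpu could be derived
 * from a feature check such as `"gpu" in navigator`.
 */
function providerOrder(hasWebGpu: boolean): Backend[] {
  return hasWebGpu ? ["webgpu", "wasm"] : ["wasm"];
}

/**
 * Try each backend in order and return the first session that initializes,
 * together with the backend name so the UI can show a badge.
 */
async function createWithFallback<T>(
  order: Backend[],
  create: (backend: Backend) => Promise<T>,
): Promise<{ backend: Backend; session: T }> {
  let lastError: unknown = undefined;
  for (const backend of order) {
    try {
      const session = await create(backend);
      return { backend, session };
    } catch (err) {
      lastError = err; // remember why this backend failed, then try the next
    }
  }
  throw new Error("No execution provider could be initialized: " + String(lastError));
}
```

Catching the failure at session creation, rather than relying on the feature check alone, also covers browsers that expose WebGPU but fail to acquire a usable adapter.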
Why are object edges sometimes soft in the depth map?
Because MiDaS v2.1 small runs at a fixed 256x256 input, fine structures like hair, fences, wires, glass, and reflections can blur into the foreground or background, and the upscaled map inherits that softness. This is expected for a small, fast monocular model. For parallax and shallow depth-of-field effects the result is usually consistent enough; for crisper boundaries you can post-process the exported 16-bit map with edge-preserving (guided/bilateral) filtering in your 3D or compositing software.
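For readers who want to see what edge-preserving filtering does, here is a minimal bilateral filter over a normalized depth map. It is a sketch under assumptions: the function name and default parameters are illustrative, and it filters the depth map against itself, whereas a guided or joint filter in compositing software would typically use the original photo as the guide.

```typescript
/**
 * Bilateral filter: each output pixel is a weighted mean of its neighbours,
 * weighted by spatial distance and by similarity in depth. Neighbours across
 * a strong depth edge get near-zero weight, so the edge is preserved.
 */
function bilateralFilterDepth(
  depth: Float32Array,
  width: number,
  height: number,
  radius: number = 2,
  sigmaSpatial: number = 2,
  sigmaRange: number = 0.1,
): Float32Array {
  if (depth.length !== width * height) {
    throw new Error("depth length does not match width * height");
  }
  const out = new Float32Array(depth.length);
  const spatialDenom = 2 * sigmaSpatial * sigmaSpatial;
  const rangeDenom = 2 * sigmaRange * sigmaRange;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const center = depth[y * width + x];
      let sum = 0;
      let weightSum = 0;
      for (let dy = -radius; dy <= radius; dy++) {
        const ny = y + dy;
        if (ny < 0 || ny >= height) continue; // skip rows outside the image
        for (let dx = -radius; dx <= radius; dx++) {
          const nx = x + dx;
          if (nx < 0 || nx >= width) continue; // skip columns outside the image
          const neighbor = depth[ny * width + nx];
          const diff = neighbor - center;
          const weight = Math.exp(
            -(dx * dx + dy * dy) / spatialDenom - (diff * diff) / rangeDenom,
          );
          sum += weight * neighbor;
          weightSum += weight;
        }
      }
      // weightSum is at least 1 because the centre pixel has weight 1.
      out[y * width + x] = sum / weightSum;
    }
  }
  return out;
}
```

A smaller sigmaRange keeps edges sharper but smooths less noise within each surface; filtering cannot recover detail that the 256x256 prediction never captured.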
