AI Depth Estimator

Create depth maps from 2D images using the MiDaS AI model. Estimate distance and depth information from any photo. Free online monocular depth estimation tool.

Uses the MiDaS AI model for monocular depth estimation. The model downloads automatically the first time you estimate depth (~66 MB).
Supported image formats: JPG, PNG, WebP.

About AI Depth Estimator

AI Depth Estimator uses MiDaS, a deep learning model for monocular depth estimation that predicts depth from a single 2D image. It creates a depth map showing the relative distances of objects in the scene. All processing happens directly in your browser; no images are uploaded to any server.

Are my photos uploaded when I run depth estimation?

No. The AI Depth Estimator processes images entirely in your browser using Transformers.js with ONNX Runtime Web. Your photo is decoded into a Canvas in memory, passed to a depth model running on your CPU or GPU, and the resulting depth map is displayed without ever leaving the device. There is no upload, no logging, and no backend call; you can confirm in the DevTools Network tab that no XHR or fetch request carries your image bytes. This matters for personal photos, sensitive scenes, surveillance frames, medical imaging tests, or any visual content where you want monocular depth without trusting a cloud API. The model weights are cached on first load, so subsequent estimations work fully offline.

Which depth-estimation model is used by default?

The tool uses Intel MiDaS-small or Depth-Anything-small, two monocular depth models distilled and quantized for browser inference. MiDaS was introduced by Intel researchers in 2019-2020 and trained on a mix of 10+ datasets to generalize across indoor and outdoor scenes. Depth-Anything (Yang et al., 2024) uses a DINOv2-pretrained ViT backbone and was trained on 1.5M labeled and 62M unlabeled images; it reported state-of-the-art zero-shot results on monocular depth benchmarks at publication. The small variants are around 25-40 MB after INT8 quantization and run at roughly 5-15 frames per second on WebGPU, depending on hardware and input resolution. The output is a relative depth map, in which brighter pixels are closer and darker pixels are farther, not metric depth in meters.

What is the difference between relative depth and metric depth?

Relative depth tells you which pixels are closer or farther than others, but does not give absolute distances in meters. The output is typically normalized to [0, 1] or rescaled to fill a grayscale range. Metric depth requires the model to output actual distances, which depend on the camera's focal length and sensor size; this is much harder from a single image because of the scale ambiguity inherent to monocular vision. Models such as Depth-Anything V2 Metric or ZoeDepth can produce approximate metric depth, but accuracy depends on whether the input scene resembles the training distribution. This tool returns relative depth by default; for absolute distances you would need a stereo camera, LiDAR, or a metric-fine-tuned model.
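The normalization step mentioned above can be sketched as follows. This is a minimal illustration, not the tool's actual code: `normalizeDepth` and `toGray8` are hypothetical helper names, and the sketch assumes the model output is a flat array with one value per pixel in which larger values mean closer.

```typescript
// Min-max normalize raw relative depth values into [0, 1].
// Assumption: `raw` holds one model output per pixel, larger = closer.
function normalizeDepth(raw: Float32Array): Float32Array {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < raw.length; i++) {
    if (raw[i] < min) min = raw[i];
    if (raw[i] > max) max = raw[i];
  }
  const range = max - min;
  const out = new Float32Array(raw.length);
  // A constant (or empty) map has no usable range; return zeros.
  if (range === 0 || !Number.isFinite(range)) return out;
  for (let i = 0; i < raw.length; i++) {
    out[i] = (raw[i] - min) / range;
  }
  return out;
}

// Rescale normalized depth to an 8-bit grayscale range (0 = far, 255 = near).
function toGray8(norm: Float32Array): Uint8ClampedArray {
  const out = new Uint8ClampedArray(norm.length);
  for (let i = 0; i < norm.length; i++) {
    out[i] = Math.round(norm[i] * 255);
  }
  return out;
}
```

Because the minimum and maximum are taken per image, the same gray value in two different photos does not correspond to the same physical distance, which is the practical meaning of "relative" here.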

Can I use the depth map to create a 3D effect or parallax animation?

Yes. Relative depth maps are well suited to 2.5D parallax effects, fake 3D photos (the kind Facebook popularized in 2018), and generative 3D-aware editing. The standard pipeline is: feed the RGB image and the depth map into a fragment shader that displaces texture coordinates by depth, then animate the camera position. WebGL or Three.js can do this in real time. For higher-quality 3D meshes, you can lift the depth map into a point cloud (each pixel becomes a 3D vertex at depth z) and reconstruct a textured mesh. The depth from this tool is consistent enough for parallax and shallow-DOF effects but may produce flat or warped regions on textureless surfaces such as blue sky or white walls.
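As a rough sketch of the point-cloud step, the function below turns a normalized depth map into a flat list of x, y, z vertices. It is an illustration under stated assumptions: `depthToPointCloud` and `zScale` are hypothetical names, depth values are assumed to lie in [0, 1] with larger meaning closer, and pixels are placed on a regular grid (an orthographic layout) because relative depth carries no camera intrinsics for a true perspective unprojection.

```typescript
// Lift a normalized depth map into a point cloud.
// Returns [x0, y0, z0, x1, y1, z1, ...] in row-major pixel order.
function depthToPointCloud(
  depth: Float32Array,
  width: number,
  height: number,
  zScale = 1 // hypothetical knob: how strongly depth pushes points out
): Float32Array {
  if (depth.length !== width * height) {
    throw new Error("depth length must equal width * height");
  }
  const points = new Float32Array(width * height * 3);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const i = y * width + x;
      // Map pixel centers to [-1, 1]; flip y so +y points up.
      points[i * 3] = ((x + 0.5) / width) * 2 - 1;
      points[i * 3 + 1] = 1 - ((y + 0.5) / height) * 2;
      // Closer pixels (larger depth value) get a larger z.
      points[i * 3 + 2] = depth[i] * zScale;
    }
  }
  return points;
}
```

The returned array has the layout that a Three.js `BufferGeometry` position attribute expects (three floats per vertex), so it could be rendered as points or triangulated into a mesh from there.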

Why are the edges of my objects sometimes blurry in the depth map?

Depth models struggle at object boundaries because the network's effective receptive field blurs sharp depth discontinuities. Hair, fences, glass, water reflections, and thin structures like wires often get smoothed into the background or foreground depth. Depth-Anything, whose student model is trained on teacher-generated pseudo-labels for 62M unlabeled images, is generally better than MiDaS on fine structures. For maximum edge fidelity, run the input at higher resolution (518x518 or 1036x1036) and apply edge-preserving guided filtering as a post-process. The tool exposes a resolution slider; higher resolution typically gives crisper boundaries at the cost of slower inference.

How does inference speed compare on WebGPU vs WebAssembly?

MiDaS-small at 384x384 input takes roughly 150-250 ms on WebAssembly with SIMD on a mid-range laptop CPU (4 cores), or about 4-7 fps. On WebGPU with a recent integrated GPU (Intel Iris Xe or Apple M-series) the same model runs in 25-50 ms (20-40 fps), a 5-8x speedup. Depth-Anything-small is heavier (vision transformer) and benefits even more from WebGPU because attention layers are matrix-multiply-bound. For real-time webcam depth at 30 fps, WebGPU is effectively required. The tool auto-selects the backend; check the badge in the toolbar to see which one is active. On Safari versions where WebGPU is not enabled by default (it became the default in Safari 26), you may need to enable it via Develop → Feature Flags (called Experimental Features in older releases).
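Backend auto-selection of this kind can be approximated with standard WebGPU feature detection. The sketch below is an assumption about how such a check might look, not the tool's actual logic: `pickBackend`, `NavigatorLike`, and `GpuLike` are hypothetical names, and the small structural types exist only so the example type-checks without WebGPU type definitions.

```typescript
type Backend = "webgpu" | "wasm";

// Minimal structural types standing in for the browser's navigator.gpu.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Prefer WebGPU when an adapter is actually available; otherwise fall back
// to WebAssembly. The presence of `gpu` alone is not enough, because
// requestAdapter() can still resolve to null on unsupported hardware.
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) return "wasm";
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm";
  }
}
```

In a browser this could be called as `pickBackend(navigator as unknown as NavigatorLike)`, with the result used to choose the execution provider before loading the model.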

Should I prefer a CNN-based model (MiDaS) or a transformer (Depth-Anything)?

Both architectures have merits. MiDaS-small uses an EfficientNet/MobileNet backbone with a multi-scale CNN decoder, which makes it extremely fast on CPU and small enough for memory-constrained devices (around 25 MB INT8). Depth-Anything uses a DINOv2 ViT backbone, which gives substantially better zero-shot accuracy on novel scenes (the original paper reports better zero-shot results than MiDaS on benchmarks including indoor NYU and outdoor KITTI), but the ViT is heavier and slower on CPU. Rule of thumb: use MiDaS-small for CPU-only browsers, mobile, or webcam streams under 480p; use Depth-Anything-small for higher-quality single-image processing on a desktop browser with WebGPU.

Can I export the depth map as a 16-bit PNG for use in Blender or Photoshop?

Yes. 8-bit grayscale PNG is convenient for previewing but only gives 256 depth levels, which causes visible banding in shallow gradients (smooth ground planes, sky). 16-bit PNG gives 65536 levels, which is enough for high-quality 3D effects, displacement mapping in Blender, and Photoshop depth-blur filters. The tool offers both export formats: 8-bit for quick sharing, 16-bit when you intend to use the map in 3D software. Most image applications (Photoshop, GIMP, Blender, Krita) load 16-bit PNG natively. For even more precision you can export as 32-bit float EXR, but that requires a separate codec; most depth pipelines do fine with 16-bit PNG.
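The banding difference comes down to how many integer levels a gradient is spread over. The sketch below quantizes normalized depth at either bit depth and counts the distinct levels that result; `quantizeDepth` and `countLevels` are hypothetical helper names, and the input is assumed to be depth already normalized to [0, 1].

```typescript
// Quantize normalized depth ([0, 1]) to 8-bit or 16-bit integer levels.
function quantizeDepth(
  norm: Float32Array,
  bits: 8 | 16
): Uint8Array | Uint16Array {
  const maxLevel = (1 << bits) - 1; // 255 or 65535
  const out =
    bits === 8 ? new Uint8Array(norm.length) : new Uint16Array(norm.length);
  for (let i = 0; i < norm.length; i++) {
    // Clamp first so out-of-range inputs do not wrap around.
    const clamped = Math.min(1, Math.max(0, norm[i]));
    out[i] = Math.round(clamped * maxLevel);
  }
  return out;
}

// Count how many distinct levels a quantized map actually uses.
function countLevels(q: Uint8Array | Uint16Array): number {
  const seen = new Set<number>();
  for (let i = 0; i < q.length; i++) seen.add(q[i]);
  return seen.size;
}
```

A smooth gradient that spans only 5% of the depth range has about 0.05 × 255 ≈ 13 levels available at 8 bits but about 0.05 × 65535 ≈ 3277 at 16 bits, which is why the 16-bit export avoids visible steps. Note that this sketch covers only the quantization; browser canvas export is typically 8 bits per channel, so writing an actual 16-bit PNG would likely need a separate encoder.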