AI Pose Estimator
Detect human body poses in images using MoveNet AI. Identify 17 keypoints including joints and facial features. Free online pose estimation tool.
About AI Pose Estimator
AI Pose Estimator uses MoveNet, a state-of-the-art pose detection model from TensorFlow, to identify human body poses in images. It detects 17 keypoints, including facial features (eyes, ears, nose) and body joints (shoulders, elbows, wrists, hips, knees, ankles). All processing happens directly in your browser; no images are uploaded to any server.
Does this pose estimator send my webcam or photo to a server?
No. The AI Pose Estimator runs the entire pose-detection pipeline in your browser, using either MediaPipe Pose or MoveNet (via TensorFlow.js). Your webcam stream or uploaded photo is decoded into a Canvas or VideoFrame in memory, the keypoint detector runs on your CPU or GPU, and the resulting 33 (MediaPipe) or 17 (MoveNet) body landmarks are drawn back onto the canvas, all without any image data leaving your device. There is no upload, no telemetry, and no cloud inference. This matters for fitness apps that should not stream your training videos to a third party, for medical posture screening where patient privacy is regulated, and for AR/VR experiences that need low, predictable latency (tens of milliseconds), which is hard to guarantee when every frame makes a network round trip.
Which pose model is used and how many body landmarks does it detect?
The default is Google MediaPipe Pose (Pose Landmarker), which detects 33 3D landmarks covering the full body: facial points (nose, eyes, ears, mouth corners), shoulders, elbows, wrists, hips, knees, and ankles, plus a few hand (pinky, index, thumb) and foot (heel, toe) keypoints. The lite, full, and heavy variants trade speed for accuracy. The figures here are approximate and depend on the model release and your device: lite (~6 MB, 60+ fps) suits mobile, full (~25 MB) gives standard accuracy at about 30 fps on most laptops, and heavy (~50 MB) is meant for offline fitness analysis. As an alternative, the tool supports MoveNet (Lightning or Thunder) from TensorFlow.js, which detects 17 COCO-format keypoints; Lightning is typically even faster than MediaPipe lite on CPU. Both models also output a per-keypoint confidence score, so you can filter out low-confidence joints.
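Filtering on the per-keypoint confidence score can be sketched as follows. This is a minimal illustration, not the tool's actual code: the Keypoint shape, the field name score, and the 0.3 default threshold are assumptions (MediaPipe, for example, exposes a visibility value rather than a field called score).

```typescript
// Hypothetical keypoint shape: a name, a position, and a 0..1 confidence score.
interface Keypoint {
  name: string;
  x: number;
  y: number;
  score: number;
}

// Keep only keypoints whose confidence is at or above the threshold.
// The 0.3 default is an assumed starting point, not a value from any library.
function filterConfident(keypoints: readonly Keypoint[], minScore = 0.3): Keypoint[] {
  return keypoints.filter((kp) => kp.score >= minScore);
}
```

A higher threshold hides jittery, occluded joints at the cost of gaps in the drawn skeleton, so the right value depends on the application.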
How accurate is browser-based pose estimation for fitness or physical-therapy tracking?
MediaPipe Pose full achieves a Percentage of Correct Keypoints (PCK) score of around 92% on the standard COCO val set, which is on par with cloud-based pose APIs from 2022. For most fitness use cases (counting reps of squats, push-ups, and lunges, timing plank holds, posture alerts) the accuracy is more than sufficient. For physical therapy, joint-angle measurement (knee flexion, shoulder abduction) is reliable to within about ±5 degrees in good lighting with the camera at hip height. The main limitations are: depth perception is approximate (2.5D, not true 3D); occlusion of joints behind the body or limbs cuts accuracy significantly; and side views are harder than front views because the left and right hip and shoulder keypoints overlap in 2D.
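A joint angle such as knee flexion can be computed from three keypoints (for the knee: hip, knee, ankle). The sketch below is one common way to do it and is not taken from the tool itself; the Point2D type and the jointAngle name are illustrative. It assumes the points are in pixel coordinates, since normalized coordinates distort angles when the frame is not square.

```typescript
// Illustrative 2D point in pixel coordinates.
interface Point2D {
  x: number;
  y: number;
}

// Angle at vertex b between the segments b->a and b->c, in degrees (0..180).
// atan2 of the cross and dot products stays numerically stable near 0 and 180.
function jointAngle(a: Point2D, b: Point2D, c: Point2D): number {
  const v1x = a.x - b.x;
  const v1y = a.y - b.y;
  const v2x = c.x - b.x;
  const v2y = c.y - b.y;
  const dot = v1x * v2x + v1y * v2y;
  const cross = v1x * v2y - v1y * v2x;
  return (Math.abs(Math.atan2(cross, dot)) * 180) / Math.PI;
}
```

With this convention a fully extended leg reads close to 180 degrees and a deep squat reads well below 90, so a rep counter could, for instance, watch for the angle crossing two thresholds in sequence.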
Can it track multiple people in the same frame at once?
MediaPipe Pose tracks a single person per frame by default: it is optimized for the most prominent body in view and offers extremely low latency. For multi-person tracking you can switch to MoveNet MultiPose (also TensorFlow.js, ~12 MB), which detects up to 6 people simultaneously in a single forward pass, predicting each person's center and regressing that person's keypoints from it rather than running a separate detector per person. The trade-off is that MoveNet MultiPose runs at ~15 fps on a typical laptop instead of 60+ fps for single-person models, and the keypoint accuracy for each person drops slightly. For dance studios, group fitness, or sports analysis with multiple athletes, MultiPose is the right choice; for solo workouts or yoga apps, stick with MediaPipe.
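For readers working with the raw MoveNet MultiPose output, the sketch below decodes it into per-person objects. It assumes the published output layout of shape [1, 6, 56]: for each of the 6 slots, 17 keypoints stored as (y, x, score) triples, followed by a bounding box (ymin, xmin, ymax, xmax) and an overall person score. The type names, the function name, and the 0.25 score threshold are illustrative choices, not part of any library.

```typescript
interface DecodedKeypoint { x: number; y: number; score: number; }
interface DecodedPerson {
  keypoints: DecodedKeypoint[];
  box: { yMin: number; xMin: number; yMax: number; xMax: number };
  score: number;
}

const MAX_PEOPLE = 6;          // slots in the model output
const NUM_KEYPOINTS = 17;      // COCO keypoints per person
const VALUES_PER_PERSON = 56;  // 17 * 3 keypoint values + 4 box values + 1 score

// Decode a flattened [1, 6, 56] output into people, dropping empty or weak slots.
function decodeMultiPose(flat: ArrayLike<number>, minScore = 0.25): DecodedPerson[] {
  if (flat.length !== MAX_PEOPLE * VALUES_PER_PERSON) {
    throw new Error(`expected ${MAX_PEOPLE * VALUES_PER_PERSON} values, got ${flat.length}`);
  }
  const people: DecodedPerson[] = [];
  for (let p = 0; p < MAX_PEOPLE; p++) {
    const base = p * VALUES_PER_PERSON;
    const score = flat[base + 55];
    if (score < minScore) continue; // unused slots carry low scores
    const keypoints: DecodedKeypoint[] = [];
    for (let k = 0; k < NUM_KEYPOINTS; k++) {
      // Note the (y, x, score) ordering within each triple.
      keypoints.push({
        y: flat[base + 3 * k],
        x: flat[base + 3 * k + 1],
        score: flat[base + 3 * k + 2],
      });
    }
    people.push({
      keypoints,
      box: {
        yMin: flat[base + 51],
        xMin: flat[base + 52],
        yMax: flat[base + 53],
        xMax: flat[base + 54],
      },
      score,
    });
  }
  return people;
}
```

Higher-level wrappers usually perform this decoding for you; the sketch is mainly useful when running the model directly and reading the output tensor yourself.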

How does MediaPipe Pose differ from OpenPose or YOLOv8-Pose?
OpenPose (CMU, 2017) is the historical pioneer of multi-person pose estimation. It uses a bottom-up approach based on part affinity fields, but the model is large (~200 MB) and slow without a CUDA GPU, which makes it impractical for browser deployment. YOLOv8-Pose is a unified detection-plus-keypoint model that runs well on a GPU and gives strong multi-person results in the 17-keypoint COCO format. MediaPipe Pose uses a two-stage top-down approach: a person detector localizes the body, then a landmark model regresses 33 landmarks in 3D inside that region. The two-stage design is fast on CPU (it was built mobile-first), and tracking is smooth over time because, on video, the region of interest for each frame is derived from the previous frame's landmarks, so the detector only reruns when tracking is lost. For browsers, MediaPipe is the practical choice; YOLOv8-Pose wins when you have a GPU server.
Can I run pose estimation in real time from my webcam?
Yes, this is exactly what MediaPipe Pose is designed for. The tool uses navigator.mediaDevices.getUserMedia to request webcam access (the browser asks for your permission), feeds frames to the pose model, which runs on the GPU via WebGL or WebGPU, and overlays the skeleton in real time. On a 2020 or newer laptop with an integrated GPU you can expect roughly 30-60 fps for MediaPipe lite single-person, or 15-25 fps for MoveNet MultiPose. The webcam feed never leaves your computer. To minimize latency, the tool uses requestVideoFrameCallback when available (Chrome 83+). Its callback fires once for each new video frame that is presented, so the model runs once per new frame rather than polling at the display refresh rate and re-processing duplicates. This matters for live applications such as AR mirror filters, sign-language interpreting, or motion capture for indie game development.
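The frame-scheduling idea can be sketched as below: prefer requestVideoFrameCallback when the video element has it, and otherwise fall back to a requestAnimationFrame-style loop that skips frames whose currentTime has not changed. This is an assumption-laden illustration, not the tool's source. VideoLike, RafLike, and startFrameLoop are hypothetical names, and the small structural interfaces stand in for the real DOM types so the logic can be exercised outside a browser.

```typescript
// Minimal structural stand-ins for the DOM types (hypothetical names).
interface FrameMeta { mediaTime: number; }
type VideoFrameCb = (now: number, meta: FrameMeta) => void;
interface VideoLike {
  currentTime: number;
  requestVideoFrameCallback?: (cb: VideoFrameCb) => number;
}
type RafLike = (cb: (now: number) => void) => number;

// Calls onFrame once per new video frame; returns a function that stops the loop.
function startFrameLoop(
  video: VideoLike,
  onFrame: (mediaTime: number) => void,
  raf: RafLike,
): () => void {
  let running = true;
  const rvfc = video.requestVideoFrameCallback;
  if (typeof rvfc === "function") {
    // Preferred path: the browser invokes us when a new frame is presented.
    const tick: VideoFrameCb = (_now, meta) => {
      if (!running) return;
      onFrame(meta.mediaTime);
      rvfc.call(video, tick); // the callback is one-shot, so re-register
    };
    rvfc.call(video, tick);
  } else {
    // Fallback: poll on each animation frame and skip unchanged frames.
    let lastTime = -1;
    const tick = (): void => {
      if (!running) return;
      if (video.currentTime !== lastTime) {
        lastTime = video.currentTime;
        onFrame(lastTime);
      }
      raf(tick);
    };
    raf(tick);
  }
  return () => { running = false; };
}
```

In a browser, the video argument would be the element playing the getUserMedia stream, onFrame would run the pose model and redraw the overlay, and raf could be passed as (cb) => requestAnimationFrame(cb).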
Does the tool support 3D pose or only 2D?
MediaPipe Pose outputs both 2D image landmarks (x and y normalized to the range 0 to 1 by image width and height, so multiply by the frame size to get pixels) and approximate 3D world coordinates (x, y, z in meters, with the origin at the midpoint between the hips). The z-coordinate is inferred by the model from a single 2D image; it is reasonably accurate when the camera is roughly perpendicular to the subject, but it is not metric-quality 3D. For genuine 3D you would need a depth sensor (LiDAR, Kinect, structured light) or multi-view triangulation with two or more synchronized cameras. The 3D output from MediaPipe is sufficient for AR effects, basic motion analysis, and animating 3D avatars (this is what VTuber tools use), but not for research-grade biomechanical measurement.
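The two coordinate spaces are used differently in practice. As a small sketch (the type and function names here are illustrative, not library APIs), normalized image landmarks are scaled by the frame size before drawing, while world landmarks can be compared directly to estimate distances in meters:

```typescript
// Illustrative landmark shape shared by both coordinate spaces.
interface Landmark3D {
  x: number;
  y: number;
  z: number;
}

// Convert a normalized image landmark (x, y in 0..1) to pixel coordinates.
function toPixels(lm: Landmark3D, width: number, height: number): { x: number; y: number } {
  return { x: lm.x * width, y: lm.y * height };
}

// Euclidean distance between two world landmarks, in the same unit as the
// inputs (meters for MediaPipe world landmarks). Treat the result as an
// estimate, since the depth component is approximate.
function worldDistance(a: Landmark3D, b: Landmark3D): number {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}
```

For example, worldDistance applied to the two shoulder landmarks gives a rough shoulder width that can be used to scale an avatar.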
What is the difference between MediaPipe (BlazePose) and MoveNet — which should I pick?
BlazePose, the model behind MediaPipe Pose, was developed by Google Research for AR and fitness, with 33 landmarks and built-in 3D estimation. It is implemented in C++ with WebAssembly bindings and is tightly integrated with the MediaPipe Solutions JS API. MoveNet was also developed at Google for fitness use cases and outputs 17 COCO keypoints, in two sizes: Lightning (fastest, aimed at mobile) and Thunder (more accurate). MoveNet ships as TFJS and TFLite models, which makes it easy to integrate into the standard TensorFlow stack. Rule of thumb: use MediaPipe Pose for AR or avatar applications that need 33-point coverage including the face and feet; use MoveNet when you only need the standard 17 COCO joints and want maximum CPU speed.
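Code written against the 17 COCO joints can still consume MediaPipe output by selecting the matching subset of the 33 landmarks. The sketch below assumes the standard BlazePose landmark ordering (0 nose, 2 and 5 eye centers, 7 and 8 ears, 11 to 16 shoulders, elbows, and wrists, 23 to 28 hips, knees, and ankles) and the standard COCO ordering; the constant and function names are illustrative.

```typescript
// BlazePose index for each COCO keypoint, in COCO order:
// nose, left/right eye, left/right ear, left/right shoulder, left/right elbow,
// left/right wrist, left/right hip, left/right knee, left/right ankle.
const BLAZEPOSE_TO_COCO17: readonly number[] = [
  0, 2, 5, 7, 8, 11, 12, 13, 14, 15, 16, 23, 24, 25, 26, 27, 28,
];

// Pick the 17 COCO-equivalent landmarks out of a 33-landmark BlazePose result.
function toCoco17<T>(landmarks: readonly T[]): T[] {
  if (landmarks.length !== 33) {
    throw new Error(`expected 33 landmarks, got ${landmarks.length}`);
  }
  return BLAZEPOSE_TO_COCO17.map((i) => landmarks[i]);
}
```

The reverse direction is not possible without inventing data, since MoveNet does not predict the extra hand, foot, and mouth points.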
