Extract Images from PDF

Free online tool to extract all images from a PDF file. Choose PNG, JPEG, or WebP output, set minimum size, deduplicate, and download as ZIP. No upload.

About Extract Images from PDF

Extract Images from PDF is a browser-only tool that pulls every embedded raster image out of a PDF document and lets you save them individually or as a single ZIP archive. The extractor uses Mozilla's PDF.js to parse each page, walks the page's operator list to find paintImageXObject and paintJpegXObject commands, resolves those references against the page's object store, and redraws each image on an HTML canvas for export as PNG, JPEG, or WebP, whichever you choose. Because the entire pipeline runs locally in your browser tab, sensitive PDFs (contracts, scanned IDs, internal reports, medical records) never leave your device: no upload is performed and no server stores your file. A minimum-size slider lets you ignore tiny decorative icons and headers; a deduplication switch hashes each image with SHA-256 so that the same logo repeated on every page is saved only once. Output formats trade off quality and file size: PNG preserves transparency and lossless quality (good for screenshots, diagrams, line art); JPEG produces 2-5x smaller files for photos at the cost of some quality; WebP often produces smaller files than either at similar quality. Most users get clean, full-resolution extractions out of standard PDFs. Extraction fails mainly in two cases: when the PDF requires a password to open, and when its images are encoded with JBIG2 or proprietary stream filters that PDF.js cannot decode, which is rare in everyday documents.

How does this tool find images inside a PDF?

PDFs store images as XObject streams referenced from each page's content stream. We use PDF.js to parse the document and call page.getOperatorList(), which gives us the sequence of drawing commands. We scan that list for paintImageXObject, paintImageXObjectRepeat, paintJpegXObject, and paintInlineImageXObject opcodes. The XObject opcodes carry the name of an image object, which we look up in page.objs; the lookup returns either an already-decoded ImageBitmap or a raw pixel buffer with a colorspace tag. Inline images carry their pixel data in the operator's arguments instead of a name. In both cases the pixels are painted onto an HTML canvas at the image's native resolution and exported via canvas.toBlob() to PNG, JPEG, or WebP. This approach catches every standard inline image and XObject image used by a PDF; it does not catch vector graphics drawn with path operators, which are not raster images at all.
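The scanning step described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: collectImageNames, OperatorListLike, and ImageOps are hypothetical names, and the opcode numbers are passed in as a parameter (in a real page they would come from pdfjsLib.OPS). Inline images are left out of the sketch, and paintJpegXObject is treated as optional on the assumption that not every PDF.js build defines it.

```typescript
// Shape of the value returned by page.getOperatorList() in PDF.js:
// two parallel arrays holding opcodes and their arguments.
interface OperatorListLike {
  fnArray: number[];
  argsArray: (unknown[] | null)[];
}

// Opcode numbers for the image-painting operators we care about.
// Assumption: the caller copies these from pdfjsLib.OPS.
interface ImageOps {
  paintImageXObject: number;
  paintImageXObjectRepeat: number;
  paintJpegXObject?: number;
}

// Hypothetical helper: return the distinct image object names referenced
// by image-painting operators, in first-seen order.
function collectImageNames(list: OperatorListLike, ops: ImageOps): string[] {
  const wanted = new Set<number>();
  wanted.add(ops.paintImageXObject);
  wanted.add(ops.paintImageXObjectRepeat);
  if (typeof ops.paintJpegXObject === "number") {
    wanted.add(ops.paintJpegXObject);
  }

  const seen = new Set<string>();
  for (let i = 0; i < list.fnArray.length; i++) {
    if (!wanted.has(list.fnArray[i])) continue;
    const args = list.argsArray[i];
    const name = args ? args[0] : undefined; // first argument is the object name
    if (typeof name === "string") seen.add(name);
  }
  return Array.from(seen);
}
```

Each returned name would then be resolved with page.objs.get(name) before the pixels are drawn to a canvas.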

Are the extracted images at full original resolution?

Yes. We decode each image at its native pixel dimensions as embedded in the PDF, not at the on-page rendered size. A 3000x2000 photograph squeezed onto a quarter of an A4 page in the PDF is therefore still extracted at the full 3000x2000 pixels. If a JPEG was stored at quality 70 inside the PDF, you cannot recover detail that was already lost to compression, but you do get every pixel the PDF carried. For PDFs where the same photo is stored at multiple resolutions (for example, a thumbnail and a full-page version), the tool extracts every variant, so you may see two or three near-duplicates. The dedupe switch collapses images whose SHA-256 hashes match, meaning they are byte-identical, but not images that differ in resolution.
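To see why hash-based deduplication only collapses byte-identical images, here is a small sketch using the Web Crypto API. The names sha256Hex, dedupeByHash, and ExtractedImage are hypothetical, and which bytes the tool actually hashes (decoded pixels or the encoded output file) is an assumption left open here; the sketch hashes whatever buffer it is given.

```typescript
// Hex-encode the SHA-256 digest of a byte buffer via Web Crypto.
async function sha256Hex(bytes: ArrayBuffer): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// Hypothetical record for one extracted image.
interface ExtractedImage {
  name: string;
  bytes: ArrayBuffer;
}

// Keep the first image seen for each distinct digest, preserving order.
async function dedupeByHash(images: ExtractedImage[]): Promise<ExtractedImage[]> {
  const seen = new Set<string>();
  const unique: ExtractedImage[] = [];
  for (const image of images) {
    const key = await sha256Hex(image.bytes);
    if (seen.has(key)) continue; // identical bytes were already kept
    seen.add(key);
    unique.push(image);
  }
  return unique;
}
```

Two copies of the same logo hash to the same value and collapse into one entry, while a thumbnail and its full-size original have different bytes, hence different digests, and both survive.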

Why does the tool skip some images and how does the minimum size work?

The minimum-size slider lets you ignore raster images below a width or height threshold (default 32 pixels). This is useful because PDFs are full of tiny decorative graphics: bullet markers, page borders, font subset rasters, watermarks. Setting the threshold to 100 or 200 typically filters out everything that is not a real photo, diagram, chart, or scanned page. Set it to 0 if you want absolutely every image including invisible spacers and 1x1 anti-aliasing pixels. The size check uses the image's native dimensions, not its display dimensions on the page, so a logo embedded at 400x400 in the PDF will pass even if it is rendered tiny in a corner.
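The size check can be pictured as a one-line predicate. This is only a sketch: passesMinSize and ImageSize are hypothetical names, and it assumes an image is skipped when either native dimension falls below the threshold, which is one plausible reading of the behavior described above.

```typescript
// Native pixel dimensions of an embedded image (not its on-page size).
interface ImageSize {
  width: number;
  height: number;
}

// Assumption: an image is kept only if BOTH dimensions reach the threshold.
// The default of 32 matches the slider default mentioned above.
function passesMinSize(img: ImageSize, minPx: number = 32): boolean {
  return img.width >= minPx && img.height >= minPx;
}
```

With a threshold of 100, a 400x400 logo passes and a 16x16 bullet marker is dropped; with a threshold of 0, every image passes, including 1x1 spacers.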

Does this work on encrypted or password-protected PDFs?

For owner-password-protected PDFs (which restrict editing but allow viewing), yes: PDF.js opens them transparently. For user-password-protected PDFs that require a password to view, the document will fail to parse and you will see an error; first unlock the PDF using the WuTools Unlock PDF tool with the correct password. Scanned PDFs generally work too, because many of them contain real raster XObjects that we can extract. The exception is JBIG2-encoded scans (common in compressed archival scans): they may render correctly in viewers, but PDF.js cannot always decode JBIG2 to a pixel buffer, in which case those particular images will be missed. Modern documents from Word, LibreOffice, InDesign, browser print-to-PDF, and most scanning apps use formats we handle fully.

Is my PDF uploaded to a server?

No. The PDF is read into a JavaScript ArrayBuffer in your browser tab and parsed entirely client-side by PDF.js. Image decoding, canvas export, hashing for deduplication, and ZIP packaging all run on your CPU. The only network traffic is fetching the PDF.js library and worker file from a public CDN on first load, after which they are cached. You can verify this by opening the Network tab in DevTools before clicking Extract: no upload request will be made. This makes the tool safe for confidential PDFs (financial reports, NDAs, legal contracts, medical records, internal presentations) where uploading to a third-party SaaS would be unacceptable.

What is the maximum PDF size I can process?

Practically, you can extract from PDFs up to about 200 MB on a modern desktop with 8 GB of RAM, and up to 50 MB on most phones. The bottleneck is browser memory rather than disk: PDF.js needs to hold the parsed document plus decoded pixel buffers for each image as it processes pages. If you have a very large PDF (e.g. a 1 GB image archive), split it first with the Split PDF tool, run extraction on each chunk, and combine the resulting ZIPs. The tool processes pages strictly in order and frees per-page memory as it goes, so peak memory usage is roughly proportional to the largest single image rather than the total document size.
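As a rough guide to the memory a single decoded image needs, the arithmetic below assumes 4 bytes per pixel (RGBA), which is how canvas ImageData stores pixels; the buffers PDF.js holds internally may be smaller or larger. The names decodedBytes and toMiB are hypothetical helpers for illustration.

```typescript
// Bytes needed to hold one decoded image at 4 bytes per pixel (RGBA).
function decodedBytes(width: number, height: number): number {
  return width * height * 4;
}

// Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
function toMiB(bytes: number): number {
  return bytes / (1024 * 1024);
}
```

For example, a 3000x2000 photo decodes to 3000 * 2000 * 4 = 24,000,000 bytes, or about 22.9 MiB, regardless of how small the compressed stream is inside the PDF.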