
File Type Detector

Detect a file's real type by its magic number, not its extension. Catch spoofed extensions, mismatched MIME types, and executables disguised as images.

About File Type Detector

This tool detects the actual file type by analyzing magic bytes (file signatures) rather than relying on file extensions. Upload any file to discover its true MIME type and recommended extension. Useful for verifying file authenticity, detecting renamed files, or identifying unknown files. All processing happens in your browser for complete privacy.

How does a file type detector work without the extension?

Real file type detectors read the file's first few bytes, called the magic number or file signature, rather than trusting the extension. Most binary formats begin with a fixed byte pattern: PNG starts with 89 50 4E 47 0D 0A 1A 0A, JPEG with FF D8 FF, PDF with 25 50 44 46 (ASCII for %PDF), and ZIP with 50 4B 03 04 (PK followed by the local file header marker 03 04). The detector compares these bytes against a database of known signatures and returns the best match. This approach is much more reliable than extension-based detection because users can rename file.exe to file.txt, but they cannot easily change the embedded magic bytes without corrupting the file. RFC 2046 defines the media types that detectors report; the signatures themselves come from each format's specification and from signature databases such as the one behind the Unix file(1) command.
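The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not this tool's implementation: the SIGNATURES table and the detect and detect_file names are hypothetical, and the table holds only the four formats named in the answer.

```python
# Illustrative signature table: (magic bytes, format name, MIME type).
# Only the four formats named above; real databases hold far more entries.
SIGNATURES = [
    (bytes.fromhex("89504E470D0A1A0A"), "PNG", "image/png"),
    (bytes.fromhex("FFD8FF"), "JPEG", "image/jpeg"),
    (bytes.fromhex("25504446"), "PDF", "application/pdf"),
    (bytes.fromhex("504B0304"), "ZIP", "application/zip"),
]


def detect(header: bytes):
    """Return (name, mime) for the longest matching signature, or None."""
    best = None
    for magic, name, mime in SIGNATURES:
        if header.startswith(magic) and (best is None or len(magic) > len(best[0])):
            best = (magic, name, mime)
    return None if best is None else (best[1], best[2])


def detect_file(path: str):
    """Read only the first 16 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return detect(f.read(16))
```

Preferring the longest match is one simple way to pick a "best" result when signatures share a prefix; other detectors may rank candidates differently.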

Why does my file have the wrong extension but still opens correctly?

Many applications ignore the extension and inspect the actual content. When you double-click a file, the operating system uses the extension to pick an application, but the application itself often reads the magic bytes to decide how to parse the content. So a JPEG renamed to photo.png will still open in most image viewers, because the viewer detects the FF D8 FF JPEG header and switches its parser accordingly. The extension matters mainly for the OS file-association layer and for users browsing folders. This is also why an attachment that looks like a PDF can actually be an executable: if Windows is configured to hide known extensions, a file named invoice.pdf.exe is displayed as invoice.pdf, and double-clicking it runs it as a program instead of opening a PDF reader. Always trust content-based detection over the extension for security decisions.

What are MIME types and how do they relate to file types?

MIME types (Multipurpose Internet Mail Extensions, defined in RFC 2045–2049 and registered via RFC 6838) are standardized strings like image/png, application/pdf, or text/html that describe a file's format for use in HTTP headers, email attachments, and web APIs. Each MIME type has a top-level category (text, image, audio, video, application, multipart, message, model, font) and a subtype. File type detectors usually return both the human-readable format name and the MIME type so the result can be used directly in code: setting a Content-Type header on a web response, choosing a file icon, or routing the file to the correct processor. IANA maintains the official registry of MIME types at iana.org/assignments/media-types — over 2000 are registered, but only a few hundred are common in practice.
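To use a MIME type in code, it usually has to be split into its top-level type, subtype, and optional parameters. The sketch below shows one way to do that; parse_media_type is a hypothetical helper, and it deliberately ignores edge cases such as semicolons inside quoted parameter values.

```python
def parse_media_type(value: str):
    """Split 'type/subtype; key=value' into (type, subtype, params)."""
    essence, *raw_params = value.split(";")
    top, sep, sub = essence.strip().lower().partition("/")
    if not sep or not top or not sub:
        raise ValueError(f"not a media type: {value!r}")
    params = {}
    for item in raw_params:
        key, eq, val = item.strip().partition("=")
        if eq:
            # Parameter names are case-insensitive; values keep their case.
            params[key.lower()] = val.strip('"')
    return top, sub, params
```

For example, parse_media_type("text/html; charset=UTF-8") yields the top-level type text, the subtype html, and a charset parameter.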

When should I detect file type instead of trusting the extension?

Always detect by content when handling user uploads, processing email attachments, scanning for malware, or building any system where security matters. Extensions are user-controlled metadata and can be wrong by accident (Windows hides extensions by default, so users rename casually) or by malicious intent (attackers disguise executables as images or documents). Detect by extension only for low-stakes UI hints: picking an icon in a file browser, sorting a folder, or guessing a starting application. Web applications that accept user uploads should reject files whose detected MIME type does not match the claimed extension, or store the detected type and serve files with the correct Content-Type and Content-Disposition headers. OWASP's guidance on unrestricted file uploads treats trusting the extension or the client-supplied Content-Type as a common upload weakness.
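The "reject on mismatch" rule for uploads can be sketched as follows. The ALLOWED table and the check_upload function are assumptions made for illustration; detected_mime is expected to come from a content-based detector, not from the client.

```python
from pathlib import PurePath

# Hypothetical allow-list: extension -> MIME types accepted for it.
ALLOWED = {
    ".png": {"image/png"},
    ".jpg": {"image/jpeg"},
    ".jpeg": {"image/jpeg"},
    ".pdf": {"application/pdf"},
}


def check_upload(filename: str, detected_mime: str) -> bool:
    """Accept only if the final extension is allowed and matches the content."""
    ext = PurePath(filename).suffix.lower()  # "invoice.pdf.exe" -> ".exe"
    return detected_mime in ALLOWED.get(ext, set())
```

Using an allow-list means unknown extensions are rejected by default, which is usually the safer failure mode for uploads.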

What file types are hardest to detect reliably?

Plain text formats have the weakest signatures because they consist mostly of ASCII characters with no fixed header. Distinguishing between CSV, TSV, JSON, YAML, XML, Markdown, and source code often requires statistical analysis or schema sniffing: checking for common delimiters, balanced braces, or YAML's indentation rules. UTF-8 files may begin with a Byte Order Mark (EF BB BF), but this is optional. Container formats like ZIP, OOXML (.docx, .xlsx), JAR, and EPUB all share the same PK signature because OOXML and friends are technically ZIP archives with a specific internal layout. Detectors must read the central directory of the ZIP to find files like [Content_Types].xml or META-INF/MANIFEST.MF to refine the classification. Encrypted data without a plaintext header looks like random noise by design and cannot be classified beyond "high entropy."
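Refining a PK match by looking inside the archive might look like the sketch below, which uses Python's standard zipfile module to list member names. The refine_zip function and its return labels are hypothetical, and the rules cover only the container types mentioned above.

```python
import io
import zipfile


def refine_zip(data: bytes) -> str:
    """Guess the specific format of a ZIP-based file from its member names."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())  # read from the central directory
    except zipfile.BadZipFile:
        return "not a readable ZIP"
    if "[Content_Types].xml" in names:
        if any(n.startswith("word/") for n in names):
            return "OOXML document (.docx)"
        if any(n.startswith("xl/") for n in names):
            return "OOXML spreadsheet (.xlsx)"
        return "OOXML package"
    if "mimetype" in names and "META-INF/container.xml" in names:
        return "EPUB"
    if "META-INF/MANIFEST.MF" in names:
        return "JAR"
    return "generic ZIP"
```

The order of checks matters in this sketch: the more specific markers are tested first so that a package is not reported as a plain ZIP.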

Can a file have multiple valid file types or be a hybrid?

Yes — polyglot files are deliberately crafted to be valid in two or more formats simultaneously. A classic example is GIFAR (GIF+JAR), a file that loads as an image in a browser but executes as a Java archive in a JVM, used in early web attacks. PDF/JPEG and PDF/ZIP polyglots also exist because PDF tolerates trailing data while ZIP scans backward from the end of the file. These are not bugs in any single format but rather exploits of overlapping parser tolerances. Beyond polyglots, container formats like Matroska (MKV) and ISO BMFF (MP4) can hold many codecs, so the file type only narrows the wrapper — the actual audio and video streams require deeper inspection. For most everyday files, a single best-match classification is fine, but security-conscious systems should also flag any file that matches more than one signature.
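Flagging files that match more than one signature can be sketched like this. The LEADING table and both function names are hypothetical. The ZIP check relies on the fact that ZIP readers locate the archive through an end-of-central-directory record (marker 50 4B 05 06) near the end of the file. A hit is only a candidate for closer inspection, since those four bytes can also appear by chance in unrelated data.

```python
# Signatures expected at offset 0 (illustrative subset).
LEADING = {
    b"GIF8": "GIF",
    b"%PDF": "PDF",
    b"\xff\xd8\xff": "JPEG",
    b"PK\x03\x04": "ZIP",
}
EOCD = b"PK\x05\x06"  # ZIP end-of-central-directory marker
EOCD_WINDOW = 22 + 65535  # fixed record size plus maximum comment length


def matching_formats(data: bytes) -> list:
    """List every format whose signature appears where its parser looks."""
    found = [name for magic, name in LEADING.items() if data.startswith(magic)]
    # ZIP is located from the end, so it can hide behind another header.
    if "ZIP" not in found and EOCD in data[-EOCD_WINDOW:]:
        found.append("ZIP")
    return found


def is_polyglot_candidate(data: bytes) -> bool:
    return len(matching_formats(data)) > 1
```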

How accurate is MIME sniffing in browsers, and what is the security risk?

Browsers historically performed aggressive MIME sniffing, overriding the server's Content-Type header by inspecting the leading bytes of the response. This was helpful when servers sent misconfigured headers but became a security disaster: an HTML file served with image/png could still be interpreted as HTML and execute scripts, enabling content-injection attacks. The fix is the X-Content-Type-Options: nosniff response header (defined in the WHATWG Fetch Standard), which tells browsers to honor the declared Content-Type. With nosniff set, browsers block script responses that lack a JavaScript MIME type such as text/javascript, and stylesheet responses that are not text/css. Server-side file type detectors complement this by ensuring uploaded content actually matches its declared type before storage, preventing users from uploading HTML disguised as PNG and tricking other users into running it.
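On the serving side, the headers discussed above could be assembled as in this sketch. The safe_response_headers function is hypothetical and framework-neutral; it returns a plain dictionary that a web framework would then attach to the response. The filename filter is a deliberately strict assumption, not a complete implementation of Content-Disposition encoding.

```python
def safe_response_headers(detected_mime: str, filename: str) -> dict:
    """Build headers for serving a stored upload under its detected type."""
    # Keep only characters that cannot break out of the quoted filename.
    safe_name = "".join(c for c in filename if c.isalnum() or c in "._-") or "download"
    return {
        "Content-Type": detected_mime,
        "X-Content-Type-Options": "nosniff",
        "Content-Disposition": f'attachment; filename="{safe_name}"',
    }
```

Serving with attachment rather than inline asks the browser to download the file instead of rendering it, which limits the damage if a type check was wrong.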

What is the difference between file format, container, and codec?

A file format is the on-disk layout (byte order, headers, metadata sections, payload locations). A container is a specific kind of format designed to wrap streams of media data without dictating how those streams are encoded; examples include MP4, MKV, Ogg, WebM, and AVI. A codec is the algorithm that compresses and decompresses the actual audio or video samples inside the container; examples include H.264, H.265, VP9, and AV1 for video and AAC, MP3, Opus, and FLAC for audio. The same container can hold many codecs, and the same codec can live in many containers. File-type detection identifies the container reliably from magic bytes but usually needs to parse internal metadata (the moov atom in MP4, the EBML header in MKV) to enumerate the codecs and tracks inside.
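As a small illustration of container-level detection, the sketch below reads the leading ftyp box of an ISO BMFF (MP4-family) file and returns its brands. The read_ftyp name is hypothetical. The sketch stops at the wrapper: listing codecs would require walking the moov box, which is not shown, and it does not handle the special box sizes 0 and 1.

```python
import struct


def read_ftyp(data: bytes):
    """Return (major_brand, compatible_brands) from a leading 'ftyp' box, else None."""
    if len(data) < 16:
        return None
    # A box starts with a 4-byte big-endian size and a 4-byte type code.
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp" or size < 16 or size > len(data):
        return None
    major = data[8:12].decode("ascii", "replace")
    # Bytes 12..16 hold the minor version; compatible brands follow, 4 bytes each.
    brands = [
        data[i:i + 4].decode("ascii", "replace")
        for i in range(16, size - 3, 4)
    ]
    return major, brands
```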

Which disguised file types are actually dangerous, and how do I tell a benign mismatch from a real threat?

Not every mismatch is an attack. A photo saved as photo.jpeg when the library calls it jpg, or a browser reporting image/jpg instead of image/jpeg, is a harmless naming variance — this tool grades those as SAFE because the extension and MIME are valid aliases of the real type. A genuine MISMATCH (amber) means the extension is simply wrong but the real content is still a passive data format, like a PNG saved with a .jpg name: annoying, rarely hostile. The dangerous pattern is HIGH RISK (red): the magic bytes reveal an executable or active-content format — a Windows PE/EXE (MZ, hex 4D 5A), a Linux ELF (hex 7F 45 4C 46), a Mach-O binary, a shell/batch script, a JAR, or a WebAssembly module — while the file wears a passive extension such as .jpg, .pdf, .png, or .docx. That is the classic upload/email malware disguise OWASP warns about: a victim trusts the image or document icon and runs code instead. When you see HIGH RISK, do not open the file; quarantine it and inspect it in a sandbox. Archives (ZIP) and HTML hidden under image or document extensions deserve the same suspicion because they can carry scripts or trigger MIME-sniffing attacks.
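The three grades described above can be approximated with a short rule set. This sketch is an assumption about how such grading might work, not this tool's actual logic; the tables, the kind names, and the grade function are all hypothetical, and it omits the extra suspicion the answer recommends for archives and HTML.

```python
# Hypothetical tables: detected kind -> extensions that are valid aliases for it.
ALIASES = {
    "jpeg": {".jpg", ".jpeg"},
    "png": {".png"},
    "pdf": {".pdf"},
    "pe": {".exe", ".dll"},
}
EXECUTABLE_KINDS = {"pe", "elf", "macho", "script", "jar", "wasm"}
PASSIVE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf", ".docx"}


def grade(detected_kind: str, extension: str) -> str:
    """Return 'SAFE', 'MISMATCH', or 'HIGH RISK' for a kind/extension pair."""
    ext = extension.lower()
    if ext in ALIASES.get(detected_kind, set()):
        return "SAFE"  # extension is a valid alias of the real type
    if detected_kind in EXECUTABLE_KINDS and ext in PASSIVE_EXTENSIONS:
        return "HIGH RISK"  # executable content behind a passive extension
    return "MISMATCH"  # wrong extension on a file that is not known to be active
```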

What do the hex magic-byte signature values mean?

The Magic Byte Signature field shows the file's first 16 bytes in hexadecimal — the same raw header the Unix file(1) command and antivirus engines read to identify content. Each pair (two hex digits, 00–FF) is one byte. Known formats begin with a fixed pattern you can verify by eye against a signature reference: JPEG is FF D8 FF, PNG is 89 50 4E 47 0D 0A 1A 0A, PDF is 25 50 44 46 (ASCII %PDF), GIF is 47 49 46 38 (GIF8), ZIP and Office files are 50 4B 03 04 (PK..), a Windows executable is 4D 5A (MZ), and an ELF binary is 7F 45 4C 46. If a file named image.jpg shows 4D 5A instead of FF D8 FF, the hex alone proves it is really an executable. Reading the header by eye is standard practice in incident response, lets you confirm the tool's verdict, and helps classify rare formats the library does not recognize.
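Producing the same kind of hex view yourself takes only a few lines. In this sketch, magic_hex and printable are hypothetical helper names; bytes.hex with a separator argument requires Python 3.8 or later.

```python
def magic_hex(data: bytes, count: int = 16) -> str:
    """Format the first `count` bytes as space-separated uppercase hex pairs."""
    return data[:count].hex(" ").upper()


def printable(data: bytes, count: int = 16) -> str:
    """Show printable ASCII as-is and everything else as a dot."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data[:count])
```

Applied to a ZIP header, printable gives PK.., matching the notation used in the answer above.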