Parquet Viewer

Browser-native Apache Parquet viewer. Preview columnar data, inspect schema, export to CSV or JSON. Powered by WebAssembly — files stay on device.


About Parquet Viewer

Apache Parquet is the de facto columnar storage format for big data. It was created by engineers at Twitter and Cloudera, first released in 2013, and later donated to the Apache Software Foundation. Today it is the data file format underneath Delta Lake tables on Databricks and a common choice for AWS S3 data lakes, Snowflake stages, Google BigQuery exports, and the warehouses that dbt models run against. Its columnar layout, Snappy/ZSTD compression, and rich schema metadata often make files 3-10x smaller than the equivalent uncompressed CSV, while supporting predicate pushdown and column pruning for fast analytical queries. But unlike CSV, you cannot open a Parquet file in Excel or a text editor: the binary layout requires specialized tooling. This viewer uses the parquet-wasm library (a Rust implementation compiled to WebAssembly) to read your file entirely in the browser, render a spreadsheet preview, and export to CSV or JSON without ever uploading bytes to a server.

What is a Parquet file?

Apache Parquet is a columnar storage file format optimized for use with big data processing frameworks. It provides efficient data compression and encoding schemes, making it popular for data analytics, data lakes, and machine learning pipelines. Parquet files are widely used with tools like Apache Spark, Hadoop, and Amazon Athena.

Does my data leave my device?

No. All Parquet parsing and processing happens locally in your browser using WebAssembly (parquet-wasm). Your data never leaves your machine, ensuring complete privacy for sensitive datasets like customer data, financial records, or confidential analytics.

Can I edit Parquet data?

This tool is read-only for viewing Parquet files. You can preview the data and export it to CSV or JSON formats. If you need to edit the data, export to CSV first and use our CSV Viewer & Editor tool.

What file size can I view?

The practical limit is the memory available to your browser tab rather than a fixed file size. For very large files (>100 MB), limit the number of displayed rows so that fewer rows are decoded and rendered and the preview stays responsive. Parsing runs in WebAssembly, which keeps decoding fast.

Can I export to different formats?

Yes. You can export your Parquet data as CSV (comma-separated values) or as JSON. This makes it easy to use the data in spreadsheet applications, databases, or web applications.
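
To make the difference between the two export formats concrete, here is a minimal Python sketch using only the standard library. It is not this viewer's export code; the column names, the sample rows, and the helper names rows_to_csv and rows_to_json are hypothetical, and the sketch assumes the Parquet data has already been decoded into rows.

```python
import csv
import io
import json

def rows_to_csv(columns, rows):
    """Serialize rows (a list of tuples) to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()

def rows_to_json(columns, rows):
    """Serialize rows to a JSON array of objects keyed by column name."""
    records = [dict(zip(columns, row)) for row in rows]
    return json.dumps(records)

# Hypothetical decoded preview data (assumption, not real viewer output).
columns = ["id", "city", "price"]
rows = [(1, "Oslo", 9.5), (2, "Lima, Peru", None)]

csv_text = rows_to_csv(columns, rows)
json_text = rows_to_json(columns, rows)
print(csv_text)
print(json_text)
```

Note the trade-off: CSV quotes the value that contains a comma and writes the missing price as an empty field, so types and nulls are flattened to text, while JSON keeps the null and the number as distinct values.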


Why use Parquet format?

Parquet is ideal for big data and analytics because it stores data in columns rather than rows. This provides better compression, faster query performance for analytical workloads, and efficient encoding schemes. It's widely used in data engineering, data science, and cloud data warehouses.

How does Parquet compare to CSV, JSON, and Avro?

CSV is row-oriented, uncompressed, untyped, and human-readable: perfect for small handoffs but slow and bloated for analytics. JSON is row-oriented with full nesting and a handful of basic types, but verbose. Avro is a row-oriented binary format with embedded schema, good for streaming data (Kafka) where you write once and replay sequentially. Parquet is columnar with embedded schema and aggressive compression; files are often 30-50% the size of the equivalent gzipped CSV, depending on the data. The advantage shows up in analytical queries: SELECT avg(price) over a 10M-row table reads only the price column from disk (column pruning), and predicate pushdown skips entire row groups whose statistics show that no row can satisfy the WHERE clause. For interactive analytics on >1M rows, Parquet is commonly 5-50x faster than CSV, with the gain depending on how many columns and row groups the query can skip. For row-by-row inserts or single-record lookups, a row-oriented format or a database is still better.
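
The column pruning idea above can be illustrated with a small Python sketch. It is a toy in-memory model, not a Parquet reader: the sample records and the function names are hypothetical, and "values touched" stands in for bytes read from disk.

```python
# Row-oriented layout: every record carries every field.
rows = [
    {"id": 1, "city": "Oslo", "price": 10.0},
    {"id": 2, "city": "Lima", "price": 20.0},
    {"id": 3, "city": "Pune", "price": 60.0},
]

# Column-oriented layout: one contiguous list per field.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def avg_price_row_layout(rows):
    """Average price when whole records must be read to reach one field."""
    touched = 0
    total = 0.0
    for row in rows:
        touched += len(row)  # id, city, and price are all read
        total += row["price"]
    return total / len(rows), touched

def avg_price_column_layout(columns):
    """Average price when only the price column is read."""
    prices = columns["price"]
    return sum(prices) / len(prices), len(prices)

print(avg_price_row_layout(rows))        # (average, values touched)
print(avg_price_column_layout(columns))  # same average, fewer values touched
```

Both functions return the same average, but the columnar version touches one value per row instead of one value per field per row. On wide tables with dozens of columns, that gap is where most of the speedup comes from.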

What compression codecs does Parquet support?

The Parquet format defines the following compression codecs, applied per column chunk: SNAPPY (the default in many writers, including Spark and PyArrow; fast compress/decompress, modest ratio), GZIP (smaller files but slower), ZSTD (modern, typically faster than GZIP at similar ratios, available since format version 2.4 and a good choice for new pipelines), LZ4_RAW (very fast, lower ratio; it replaces the deprecated LZ4 codec), BROTLI (strong ratios on text-heavy columns, slower compress), LZO (rarely used), and UNCOMPRESSED. Because compression is applied per column, a single file can mix codecs based on what works best for each column type. SNAPPY became the usual writer default because it offers a good balance of speed and size for typical analytical workloads; ZSTD is now a common recommendation for cold storage where disk savings matter more than speed (level 3 is the library default, and higher levels trade write speed for smaller files). To check what codec a file uses, this viewer shows it in the schema metadata; from the CLI, run 'parquet-tools meta file.parquet' (the exact subcommand depends on which parquet-tools distribution you have installed).
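
Why per-column compression pays off can be seen in a short Python sketch. This is an illustration under stated assumptions, not a Parquet writer: it uses zlib (DEFLATE, the algorithm behind the GZIP codec) because the other codecs are not in the Python standard library, and the two sample columns and the helper name compression_ratio are hypothetical.

```python
import random
import zlib

random.seed(0)  # fixed seed so the sample data is reproducible
ROWS = 10_000

# A low-cardinality column: three distinct values repeated many times.
status_column = "\n".join(
    random.choice(["ok", "error", "retry"]) for _ in range(ROWS)
).encode()

# A high-entropy column: random 64-bit integers written as text.
id_column = "\n".join(
    str(random.getrandbits(64)) for _ in range(ROWS)
).encode()

def compression_ratio(raw: bytes, level: int = 6) -> float:
    """Return the uncompressed size divided by the zlib-compressed size."""
    return len(raw) / len(zlib.compress(raw, level))

for name, raw in [("status", status_column), ("id", id_column)]:
    print(f"{name}: {len(raw)} bytes raw, ratio {compression_ratio(raw):.1f}x")
```

The repetitive status column compresses far better than the random id column. Storing each column separately lets a writer exploit that difference, and it is also why picking a codec per column can be worthwhile.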

What is the typical structure of a Parquet file?

A Parquet file is organized hierarchically: file -> row groups -> column chunks -> pages. Row groups are horizontal partitions (typically 128 MB or 100k-1M rows) that allow parallel reading. Within each row group, data for each column is stored together in column chunks, which are split into pages (usually 1 MB). Statistics (min, max, null count) are kept per column chunk in the footer and, optionally, per page. They enable predicate pushdown: a query like 'price > 100' can skip any row group or page where max(price) <= 100 without reading the actual data. The file footer contains the schema, row-group offsets, and metadata. The last 8 bytes of the file hold the footer length and the magic bytes PAR1, so readers seek to the end first to learn the structure, then read only the relevant column chunks. This design is why Parquet is so efficient on cloud storage: many queries fetch only a few MB of a multi-GB file.
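
The statistics-based skipping described above can be sketched in a few lines of Python. This is a toy model rather than a real reader: ColumnChunk, write_chunk, and scan_greater_than are hypothetical names, and each chunk here holds a plain list of prices instead of encoded pages.

```python
from dataclasses import dataclass

@dataclass
class ColumnChunk:
    """Toy stand-in for the 'price' column chunk of one row group."""
    values: list
    min_value: float  # statistics are computed once, at write time
    max_value: float

def write_chunk(values):
    """Build a chunk and record its min/max statistics."""
    return ColumnChunk(values, min(values), max(values))

def scan_greater_than(chunks, threshold):
    """Return values > threshold, skipping chunks that cannot match."""
    matches = []
    skipped = 0
    for chunk in chunks:
        # If the largest value is not above the threshold, no row can
        # match, so the chunk's data is never looked at.
        if chunk.max_value <= threshold:
            skipped += 1
            continue
        matches.extend(v for v in chunk.values if v > threshold)
    return matches, skipped

chunks = [
    write_chunk([5, 40, 100]),
    write_chunk([90, 150, 300]),
    write_chunk([1, 2, 3]),
]
matches, skipped = scan_greater_than(chunks, 100)
print(matches, skipped)
```

For the predicate 'price > 100', the first and third chunks are skipped on their max statistic alone, and only the second chunk is scanned. Skipping works best when the data is sorted or clustered on the filtered column, because then each chunk covers a narrow value range.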

Can I view nested or complex Parquet schemas (structs, maps, lists)?

Yes. Parquet has its own type system, which maps closely onto Apache Arrow's: primitives (int32, double, string, timestamp, decimal), structs (nested records), lists (variable-length arrays), maps (key-value pairs), and arbitrary combinations. This viewer flattens nested fields using dot notation in the column headers (user.address.city) and renders lists/maps as JSON in their cells. Complex schemas are common in Spark/Databricks output, Snowflake VARIANT data, and event-streaming exports. If your file uses logical types like TIMESTAMP_MILLIS, DECIMAL(18,2), or UUID, the viewer respects them and renders human-readable values. For schema-only inspection without seeing data (e.g. when auditing third-party Parquet), use 'parquet-tools schema file.parquet' or the DuckDB DESCRIBE statement: DESCRIBE SELECT * FROM 'file.parquet' LIMIT 0.
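
The dot-notation flattening described above can be approximated with a short recursive function. This is a sketch of the general technique, not this viewer's implementation: the function name flatten and the sample record are hypothetical, and because plain Python values carry no schema, the sketch treats every dict as a struct, whereas a real reader would use the Parquet schema to tell structs from maps.

```python
import json

def flatten(record, prefix=""):
    """Flatten nested dicts into dot-separated keys; render lists as JSON."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Assumption: a dict is a struct, so recurse into its fields.
            flat.update(flatten(value, name))
        elif isinstance(value, list):
            # Lists stay in a single cell, serialized as JSON text.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat

# Hypothetical decoded row with a nested struct and a list.
row = {
    "user": {"name": "Ada", "address": {"city": "Paris"}},
    "tags": ["a", "b"],
}
print(flatten(row))
```

Each leaf field becomes its own column (user.name, user.address.city), while the list stays in one cell as JSON text, so the result fits a flat spreadsheet grid.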

How does this tool handle large Parquet files efficiently?

Three optimizations: (1) the parquet-wasm library reads only the footer metadata first (typically <100 KB), so opening a 1 GB file takes milliseconds before any row data is requested; (2) the 'Max rows to display' control limits how many rows are decoded for preview, which is useful for files with millions of rows where you only need to verify schema and sample values; (3) thanks to the columnar layout, columns are decoded on demand as you scroll to them rather than all up front. For files larger than browser memory (roughly >500 MB on mobile, >2 GB on desktop), consider tools like DuckDB-WASM, which can stream Parquet from URLs without ever loading the full file. For production analytical workloads, query Parquet directly from cloud storage using Polars (scan_parquet), Spark, BigQuery external tables, or Athena. With Pandas read_parquet, pass the columns argument so that only the columns you need are loaded, rather than loading the whole file into memory just to query it.
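
The footer-first read in step (1) relies on the fixed tail of every Parquet file: a 4-byte little-endian footer length followed by the magic bytes PAR1. The Python sketch below shows how a reader can locate the footer from those last 8 bytes. It is not the parquet-wasm code path. The function name locate_footer is hypothetical, and the test blob is synthetic: its "footer" is placeholder bytes, not real Thrift-encoded metadata.

```python
import io
import struct

MAGIC = b"PAR1"

def locate_footer(f):
    """Return (footer_start, footer_length) using only the file's tail."""
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    if file_size < 12:  # leading magic + length field + trailing magic
        raise ValueError("file too small to be Parquet")
    f.seek(file_size - 8)
    # 4-byte little-endian footer length, then the 4-byte magic.
    footer_length, magic = struct.unpack("<I4s", f.read(8))
    if magic != MAGIC:
        raise ValueError("missing PAR1 magic; not a Parquet file")
    footer_start = file_size - 8 - footer_length
    if footer_start < 4:
        raise ValueError("footer length exceeds file size")
    return footer_start, footer_length

# Synthetic layout (assumption): magic, 1000 data bytes, 37 placeholder
# footer bytes, footer length, magic.
fake_footer = b"\x00" * 37
blob = MAGIC + b"x" * 1000 + fake_footer + struct.pack("<I", len(fake_footer)) + MAGIC

print(locate_footer(io.BytesIO(blob)))
```

Only 8 bytes are read before the footer's position is known, no matter how large the file is. Over HTTP the same two steps become two small range requests, which is how URL-based readers avoid downloading the whole file.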