Image to Prompt Generator
Runs 100% in your browser; your image is never uploaded. Reverse any image into a structured prompt for Midjourney v7, Flux, SDXL, ComfyUI and DALL-E 3, complete with a color palette and a negative prompt.
About the Image to Prompt Generator
Reverse-engineering a great AI image prompt usually takes 10-30 minutes of trial and error: extract dominant colors by hand, eyeball the lighting and mood, find the right Midjourney parameter syntax, write a clean negative prompt, then paste and iterate. This tool does the mechanical work in under a second. Drop in a reference photo or AI-generated image and the browser extracts the dominant color palette via k-means clustering on a downsampled grid, measures mean luminance, detects the aspect ratio (snapping to 1:1, 16:9, 9:16, 4:3, 3:2 or 21:9), and assembles a structured prompt in the exact dialect your target engine expects (Midjourney v6 / v7 with --ar --v --stylize, SDXL weighted tags, Flux natural sentences, ComfyUI JSON with sampler/scheduler, or DALL-E 3 plain English).
Layer in style, lighting, mood and camera chips with one click each and the prompt rewrites in real time. Everything runs in your browser — image never uploads, palette extraction is instant, no AI captioning model required.
Is my image private and does the tool work offline?
Yes to both. The entire pipeline — file reading, palette extraction, aspect detection, brightness, prompt assembly and JSON export — runs 100% in your browser via standard JavaScript and Canvas APIs. The image bytes never reach our servers, a CDN, or any third-party AI API. You can disconnect your network after the page loads and the tool keeps working. For commercial photo work, NDA reference boards, or unreleased product imagery this is safe to use. Important honesty note: the palette and brightness are heuristic estimates (k-means color clustering and weighted Rec.709 luminance), not semantic recognition — the tool does not 'understand' the subject the way a vision model would, so write or chip-pick the subject yourself for best results.
Can I export the prompt, palette and settings as JSON for my pipeline?
Yes — that's the Prompt pack (JSON) block. One click copies or downloads a structured bundle containing the positive prompt, the negative prompt, the source dimensions, the detected and snapped aspect ratio, the frequency-weighted whole-image brightness, the full dominant palette (hex + nearest color name + frequency %), every selected style/lighting/mood/camera chip, the target engine, and the ComfyUI sampler/scheduler/steps/CFG defaults. Because it is plain parseable JSON you can version-control it, diff two runs, feed it into a ComfyUI node or an automation script, and re-derive identical prompts later — reproducibility the copy-the-textarea workflow loses. The ComfyUI engine export now also embeds the negative prompt directly, so it drops complete into both CLIP-Text-Encode nodes with no hand-merge.
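As a rough illustration of what such a bundle could look like in a script, here is a minimal TypeScript sketch. The field names (positive, aspect.snapped, comfy and so on) and the sample values are assumptions made for this example, not the tool's actual export schema; only the ComfyUI defaults are taken from this page. The diffPacks helper is likewise hypothetical and simply shows why parseable JSON makes two runs easy to compare.

```typescript
// Hypothetical shape of a prompt pack; key names are illustrative only.
interface PaletteEntry { hex: string; name: string; frequency: number; }

interface PromptPack {
  engine: string;
  positive: string;
  negative: string;
  source: { width: number; height: number };
  aspect: { detected: number; snapped: string };
  brightness: number;
  palette: PaletteEntry[];
  chips: { style: string[]; lighting: string[]; mood: string[]; camera: string[] };
  comfy: { sampler: string; scheduler: string; steps: number; cfg: number };
}

// Pretty-printed JSON is friendlier to version control and line diffs.
function serializePack(pack: PromptPack): string {
  return JSON.stringify(pack, null, 2);
}

// Returns the top-level keys whose values differ between two runs.
function diffPacks(a: PromptPack, b: PromptPack): string[] {
  const keys = Object.keys(a) as Array<keyof PromptPack>;
  return keys.filter(k => JSON.stringify(a[k]) !== JSON.stringify(b[k]));
}

// Made-up sample values for demonstration.
const examplePack: PromptPack = {
  engine: "comfyui",
  positive: "a red fox, golden hour, 85mm portrait",
  negative: "blurry, low quality",
  source: { width: 1920, height: 1080 },
  aspect: { detected: 1920 / 1080, snapped: "16:9" },
  brightness: 132,
  palette: [{ hex: "#c2571a", name: "orange", frequency: 0.41 }],
  chips: { style: [], lighting: ["golden hour"], mood: [], camera: ["85mm portrait"] },
  comfy: { sampler: "dpmpp_2m", scheduler: "karras", steps: 28, cfg: 4.5 },
};
```

Calling diffPacks(examplePack, otherPack) on two saved bundles lists only the keys that changed, which is usually enough to see why two runs produced different prompts.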
Why a heuristic prompt builder instead of CLIP / BLIP image captioning?
True image-to-text captioning needs a 200-700 MB neural model (BLIP-2, BLIP-3, LLaVA, MoonDream) loaded into the browser via transformers.js, plus a WebGPU-capable device, plus 10-40 seconds of first-load and 2-5 seconds per image. That is technically possible and we may ship it as an opt-in Web Worker upgrade, but in practice 80% of the prompt quality comes from accurate aspect ratio + palette + style/lighting/mood/camera tags, all of which we extract in under 100ms with zero model download. The chip palette lets you add the high-information words a vision model would have guessed, and you usually know your subject better than CLIP does anyway.
How are the dominant colors extracted?
Standard k-means clustering with k=5. We downsample the image to a 64-pixel-wide grid (for example 64×36 for a 16:9 image or 64×85 for a 3:4 portrait), drop fully transparent pixels, then iteratively cluster the remaining RGB triples into 5 groups for 8 rounds. The centroid of each cluster is a dominant color and the cluster size is its frequency. We snap each centroid to the nearest named color (red, orange, yellow, green, teal, blue, purple, pink, brown, black, white, gray, beige) for the prompt and show the raw hex value in the swatch row. The whole pass fits in a single requestAnimationFrame tick on a modern phone.
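The clustering step described above can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the tool's source: the function names are hypothetical, the input is assumed to be flat RGBA bytes (the layout of ImageData.data from a canvas), and the seeding from evenly spaced pixels is an assumption chosen to keep the example deterministic.

```typescript
type RGB = [number, number, number];

interface Swatch {
  hex: string;
  frequency: number; // share of clustered pixels, 0..1
}

// Squared Euclidean distance in RGB space.
function dist2(a: RGB, b: RGB): number {
  const dr = a[0] - b[0], dg = a[1] - b[1], db = a[2] - b[2];
  return dr * dr + dg * dg + db * db;
}

function toHex(c: RGB): string {
  return "#" + c.map(v => ("0" + Math.round(v).toString(16)).slice(-2)).join("");
}

// rgba: flat RGBA bytes from an already downsampled image.
function extractPalette(rgba: ArrayLike<number>, k = 5, rounds = 8): Swatch[] {
  const pixels: RGB[] = [];
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    if (rgba[i + 3] === 0) continue; // drop fully transparent pixels
    pixels.push([rgba[i], rgba[i + 1], rgba[i + 2]]);
  }
  if (pixels.length === 0) return [];
  const n = Math.min(k, pixels.length);

  // Assumption: seed centroids from evenly spaced pixels (no randomness).
  let centroids: RGB[] = [];
  for (let j = 0; j < n; j++) {
    const p = pixels[Math.floor((j * pixels.length) / n)];
    centroids.push([p[0], p[1], p[2]]);
  }

  let counts: number[] = [];
  for (let r = 0; r < rounds; r++) {
    const sums: RGB[] = centroids.map((): RGB => [0, 0, 0]);
    counts = centroids.map(() => 0);
    // Assignment step: each pixel joins its nearest centroid.
    for (const p of pixels) {
      let best = 0;
      for (let j = 1; j < n; j++) {
        if (dist2(p, centroids[j]) < dist2(p, centroids[best])) best = j;
      }
      sums[best][0] += p[0];
      sums[best][1] += p[1];
      sums[best][2] += p[2];
      counts[best]++;
    }
    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((c, j): RGB =>
      counts[j] === 0
        ? c
        : [sums[j][0] / counts[j], sums[j][1] / counts[j], sums[j][2] / counts[j]]);
  }

  return centroids
    .map((c, j) => ({ hex: toHex(c), frequency: counts[j] / pixels.length }))
    .filter(s => s.frequency > 0)
    .sort((a, b) => b.frequency - a.frequency);
}
```

In a browser, the input would typically come from drawing the image onto a small canvas and reading getImageData(...).data. Evenly spaced seeding can start two centroids on the same color; a production version might prefer k-means++ style seeding.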
Why does the prompt format change by engine?
Each text-to-image system has its own syntax that materially affects quality. Midjourney v6/v7 uses parameter flags (--ar 16:9 --v 6 --style raw --stylize 250) and treats commas as soft separators. SDXL and SD 1.5 respond to weighted parens (masterpiece:1.2) and prefer comma-separated tags. Flux Dev/Pro is trained on natural-language captions and prefers full sentences with periods, not tags. ComfyUI is a node graph, so we export a JSON snippet that drops into the CLIP-Text-Encode node with sensible defaults for sampler (dpmpp_2m), scheduler (karras), steps (28) and CFG (4.5). DALL-E 3 prefers plain conversational English. Pick the engine in the selector before building and you skip the syntax-translation step.
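To make the per-engine differences concrete, here is a small TypeScript sketch of a formatter that renders the same subject and chips in each dialect. The type and function names, the exact sentence templates for Flux and DALL-E 3, and the JSON key names for ComfyUI are all assumptions for illustration; the flags and default values are the ones quoted above.

```typescript
type Engine = "midjourney" | "sdxl" | "flux" | "comfyui" | "dalle3";

interface PromptParts {
  subject: string;
  tags: string[];   // style / lighting / mood / camera chips
  aspect: string;   // snapped ratio, e.g. "16:9"
  negative: string;
}

function formatPrompt(engine: Engine, p: PromptParts): string {
  const tagList = p.tags.join(", ");
  switch (engine) {
    case "midjourney":
      // Comma-separated phrases followed by parameter flags.
      return `${[p.subject, ...p.tags].join(", ")} --ar ${p.aspect} --v 7 --stylize 250`;
    case "sdxl":
      // Comma-separated tags with a weighted quality tag in front.
      return ["(masterpiece:1.2)", p.subject, ...p.tags].join(", ");
    case "flux":
      // Full sentences with periods (template wording is an assumption).
      return `An image of ${p.subject}. The scene has ${tagList}.`;
    case "comfyui":
      // JSON snippet with the defaults listed above (key names are hypothetical).
      return JSON.stringify({
        positive: [p.subject, ...p.tags].join(", "),
        negative: p.negative,
        sampler: "dpmpp_2m",
        scheduler: "karras",
        steps: 28,
        cfg: 4.5,
      });
    default:
      // DALL-E 3: plain conversational English (template wording is an assumption).
      return `Create an image of ${p.subject} with ${tagList}, framed at ${p.aspect}.`;
  }
}
```

For example, formatPrompt("midjourney", { subject: "a red fox", tags: ["golden hour"], aspect: "16:9", negative: "blurry" }) yields "a red fox, golden hour --ar 16:9 --v 7 --stylize 250", while the same input for "sdxl" yields "(masterpiece:1.2), a red fox, golden hour".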

What does the brightness measurement tell me?
Frequency-weighted mean luminance using the Rec.709 formula (0.2126·R + 0.7152·G + 0.0722·B) averaged across all dominant color clusters by their pixel count — so it reflects whole-image brightness, not just the single most-dominant swatch (a dark background behind a bright subject no longer mislabels the image). Below 60 it labels 'low-key / dark' (Caravaggio, film noir, horror). 60-110 'moody'. 110-160 'balanced' (typical daylight). 160-200 'bright' (clean product photography, beach). Above 200 'high-key / overexposed' (fashion editorials, wedding). If you have not picked a lighting chip, the tool auto-adds a real lighting phrase mapped from this bucket (e.g. 'low-key dramatic lighting', 'soft natural daylight', 'bright high-key lighting') — a usable cue, not a bare label — which you can overwrite with a specific chip like 'golden hour' for stronger steering.
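The weighting and bucketing can be sketched in TypeScript as below. The names are hypothetical, channel values are assumed to be 0-255, and the handling of values that fall exactly on a bucket boundary (60, 110, 160, 200) is an assumption, since the ranges above do not say which side is inclusive.

```typescript
interface Cluster {
  r: number;
  g: number;
  b: number;
  count: number; // number of pixels assigned to this cluster
}

// Rec.709 weighting applied to 0-255 channel values.
function luminance(r: number, g: number, b: number): number {
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Mean luminance across clusters, weighted by pixel count.
function weightedBrightness(clusters: Cluster[]): number {
  let sum = 0;
  let total = 0;
  for (const c of clusters) {
    sum += luminance(c.r, c.g, c.b) * c.count;
    total += c.count;
  }
  return total === 0 ? 0 : sum / total;
}

// Bucket labels from the ranges above; boundary sides are an assumption.
function brightnessLabel(y: number): string {
  if (y < 60) return "low-key / dark";
  if (y < 110) return "moody";
  if (y < 160) return "balanced";
  if (y <= 200) return "bright";
  return "high-key / overexposed";
}
```

With a black cluster of 3 pixels and a white cluster of 1 pixel, the weighted mean is about 63.75 ("moody"), whereas looking only at the most dominant swatch would give 0 ("low-key / dark").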
Why does aspect ratio matter so much in prompts?
Diffusion models bake aspect ratio into their training: a 9:16 prompt gets phone-portrait composition (single subject, tight head shot, background falls off), 16:9 gets cinematic landscape (wide subject, environmental detail, distant horizon), 1:1 gets centered product shots, 21:9 gets extreme cinemascope. A mismatch between the composition a prompt describes and the canvas the sampler renders (for example, a portrait-style prompt rendered on a default 512×512 square) can produce stretched faces or cropped subjects. The tool auto-detects the aspect of your reference image and snaps to the nearest standard ratio supported by your target engine; override the snap if you want to recompose.
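Snapping to the nearest standard ratio can be sketched in TypeScript like this. The function name is hypothetical, and comparing ratios in log space (so that being twice as wide counts the same as being twice as tall) is an assumption about how "nearest" is measured; the candidate list is the one given on this page.

```typescript
// Standard ratios the tool snaps to, as label and width/height value.
const STANDARD_RATIOS: Array<[string, number]> = [
  ["1:1", 1],
  ["16:9", 16 / 9],
  ["9:16", 9 / 16],
  ["4:3", 4 / 3],
  ["3:2", 3 / 2],
  ["21:9", 21 / 9],
];

function snapAspect(width: number, height: number): string {
  const ratio = width / height;
  let best = STANDARD_RATIOS[0];
  let bestErr = Infinity;
  for (const cand of STANDARD_RATIOS) {
    // Log-space distance treats wide and tall deviations symmetrically.
    const err = Math.abs(Math.log(ratio / cand[1]));
    if (err < bestErr) {
      best = cand;
      bestErr = err;
    }
  }
  return best[0];
}
```

For instance, snapAspect(1920, 1080) returns "16:9" and snapAspect(3440, 1440) returns "21:9". A real implementation would also filter the candidate list by what the selected engine supports.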
Can I use this for upscaling or img2img workflows?
Indirectly. The generated prompt is the textual input for an img2img run: take this prompt and send it alongside your reference image to Midjourney with --iw (image weight), or to SDXL/Flux with the same image as init_image at 0.4-0.7 denoise strength. The palette extraction is especially useful for upscaling: pasting the palette into the prompt during a tile-upscale pass helps keep the upscaler from color-drifting toward generic warm tones. For ComfyUI img2img workflows, the JSON snippet drops directly into the CLIP-Text-Encode node that sits alongside your VAEEncode stack (or VAEEncodeForInpaint if you are inpainting).
Does the tool support EXIF camera and lens metadata?
Not yet in this release — EXIF parsing is a stretch goal. When added, the tool will extract focal length (auto-suggesting '85mm portrait' or '24mm wide' camera chip), aperture (suggesting 'shallow depth of field' for f/1.4-f/2.8 or 'deep focus' for f/8+), ISO (suggesting 'film grain' for ISO 1600+), and camera make/model (some prompts respond to 'shot on Hasselblad' or 'Leica Q3' as a quality booster). For now you can read EXIF in your camera app or any EXIF viewer tool on this site, then click the matching chip manually.
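Since this feature is not shipped, the following TypeScript is only a sketch of how the aperture and ISO rules described above might map to chips once EXIF values are available. The ExifSummary shape and the suggestChips name are hypothetical, and treating every aperture at or below f/2.8 (including lenses faster than f/1.4) as shallow is an assumption. Focal length is left out because the text above names example chips but no cut-off values.

```typescript
// Hypothetical, already-parsed EXIF values; parsing itself is out of scope here.
interface ExifSummary {
  fNumber?: number; // aperture as an f-number, e.g. 1.8 for f/1.8
  iso?: number;
}

function suggestChips(exif: ExifSummary): string[] {
  const chips: string[] = [];
  if (exif.fNumber !== undefined) {
    if (exif.fNumber <= 2.8) {
      chips.push("shallow depth of field"); // f/1.4-f/2.8 range from the text
    } else if (exif.fNumber >= 8) {
      chips.push("deep focus"); // f/8 and narrower
    }
  }
  if (exif.iso !== undefined && exif.iso >= 1600) {
    chips.push("film grain"); // ISO 1600 and above
  }
  return chips;
}
```

A photo shot at f/1.8 and ISO 3200 would get both "shallow depth of field" and "film grain"; one shot at f/4 and ISO 100 would get no suggestions.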
Is my image uploaded anywhere?
No. The image bytes never reach our servers, a CDN, or a third-party AI API; the full pipeline runs in your browser, as described in the privacy answer above. The only network call after page load is standard page analytics, which respects do-not-track.
