Build with gpt-image-2
Everything you need to prompt, edit, and ship images through fal. openai/gpt-image-2 is live, and fal-ai/gpt-image-1.5 stays around for comparison.
What gpt-image-2 is and how this site is laid out
gpt-image-2 is the next image model from OpenAI. It is expected to improve on four fronts: photorealism, text rendering inside the image, instruction following, and control over composition, style, and detail.
You will see the upgrade most clearly on jobs that mix a subject, a scene, and on-image typography. Posters, product shots, editorial spreads, diagrams, UI mocks, photo edits that hold identity, and reference-driven style work are all zones where prior-generation models tend to smear. gpt-image-2 is built to sit closer to what you asked for with less re-rolling.
This site has three jobs. It hosts a prompt library with working examples you can copy. It runs a playground so you can ship a render in under a minute. It documents the fal API so you can wire gpt-image-2 into your own product.
Every tool on the site defaults to openai/gpt-image-2 and lets you flip back to fal-ai/gpt-image-1.5 from the Model dropdown for a side-by-side. Swap the endpoint string in your own code and the API shape stays the same.
Photorealism. Skin, fabric, specular metal, rain on pavement, volumetric light. Fewer plastic faces and fewer painted hands.
Text rendering. Short quoted strings render as drawn type, not smudges. Good for posters, covers, signage, and product mocks.
Instruction following. Long prompts with constraints land closer to the spec. Negative instructions are honored more often.
Control. Style references, input fidelity on edits, aspect and quality knobs. You steer the model at each step.
From zero to your first render in three steps
You need a fal account, a key, and a prompt. Two minutes if the fal tab is already open.
- 1. Grab a fal API key
Head to fal.ai/dashboard/keys, create a key, and copy it. Keys start with fal-. Treat them like passwords. You can revoke and rotate them from the same page.
- 2. Paste it into Settings
Open Settings, paste the key, and save. Your key lives in your browser and is never sent to a backend we control. Every render goes straight from your browser to fal.
- 3. Send your first prompt
Open the playground, drop in the sample below, press run. You should see an image in about three to eight seconds depending on quality.
A 35mm film photo of a tired golden retriever sleeping under a kitchen table at noon. Soft window light, dust in the air, shallow depth of field, warm color grade. No text, no watermark.
Prompt structure, aspect, quality, and typography
A good text to image prompt names the subject, the scene, the light, the style, and any on-image text. In that order. Everything else is refinement.
The five-layer prompt
You can write this as one paragraph, one sentence per layer, or a loose list. The model reads all three. What matters is that every layer is present.
- Subject. What or who is in the frame, and what they are doing.
- Scene. Where the subject sits, and what else is visible.
- Light. Source, direction, hardness, color temperature.
- Style. Photograph, illustration, 3D render, poster, oil.
- Typography and negatives. Any text on the image, and what you do not want.
A brass pocket watch lying on wet cobblestones at dusk. Blue hour, soft rain, reflections on the stones, 35mm film photograph, shallow depth of field. Text reading "TIMEKEEPERS" engraved on the case. No watermark, no caption bar.
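The five layers can be assembled mechanically. Here is a minimal sketch, assuming a hypothetical helper called build_prompt (it is not part of fal or any SDK) that joins the layers in the order listed above and refuses to run if one is empty.

```python
from dataclasses import dataclass


@dataclass
class PromptLayers:
    """The five layers described above. All names are illustrative."""
    subject: str
    scene: str
    light: str
    style: str
    typography_and_negatives: str


def build_prompt(layers: PromptLayers) -> str:
    """Join the layers in order, one sentence per layer."""
    parts = [
        layers.subject,
        layers.scene,
        layers.light,
        layers.style,
        layers.typography_and_negatives,
    ]
    missing = [i for i, part in enumerate(parts, start=1) if not part.strip()]
    if missing:
        raise ValueError(f"empty prompt layer(s): {missing}")
    # Normalise every layer so it ends with exactly one period.
    return " ".join(part.strip().rstrip(".") + "." for part in parts)


prompt = build_prompt(PromptLayers(
    subject="A brass pocket watch lying on wet cobblestones",
    scene="A narrow street at dusk with reflections on the stones",
    light="Blue hour, soft rain, cool ambient light",
    style="35mm film photograph, shallow depth of field",
    typography_and_negatives='Text reading "TIMEKEEPERS" engraved on the case. No watermark',
))
print(prompt)
```

Keeping the layers as separate fields makes it easy to swap only the light or only the style between renders.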
Aspect ratio choices
The model supports three canonical sizes plus auto. Pick by final use, not by taste.
Square. Profile shots, album covers, social tiles, product on a plate.
Landscape. Hero art, banners, cinematic frames, cover images.
Portrait. Posters, book covers, phone wallpaper, fashion stills.
Auto. Let the model choose based on prompt cues. Safe default when you do not know the crop yet.
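If you pick sizes by final use, a small lookup keeps the choice out of your prompt code. This is a sketch only: the size strings are the image_size values the endpoint documents, while the use-case keys and the image_size_for helper are made up for illustration.

```python
from typing import Optional

# Documented image_size values, keyed by shape.
SIZE_BY_SHAPE = {
    "square": "1024x1024",
    "landscape": "1536x1024",
    "portrait": "1024x1536",
}

# Hypothetical use-case names mapped to a shape.
SHAPE_BY_USE = {
    "profile": "square",
    "album_cover": "square",
    "social_tile": "square",
    "hero_banner": "landscape",
    "cinematic_frame": "landscape",
    "poster": "portrait",
    "book_cover": "portrait",
    "phone_wallpaper": "portrait",
}


def image_size_for(use: Optional[str]) -> str:
    """Return a size for a known use, or "auto" when the crop is unknown."""
    shape = SHAPE_BY_USE.get(use or "")
    return SIZE_BY_SHAPE.get(shape, "auto")


print(image_size_for("poster"))
print(image_size_for(None))
```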
Quality knob
Quality is a cost and latency dial. Use low to iterate, bump to medium once the prompt holds, and send high for the final. You will usually want to lock the seed before the final bump so the image upgrades instead of re-rolling.
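The low, medium, high ladder is easy to encode so the rest of the request never changes between passes. A rough sketch, with hypothetical helper names:

```python
QUALITY_LADDER = ["low", "medium", "high"]


def next_quality(current: str) -> str:
    """Step one rung up the ladder and stay on "high" once there."""
    index = QUALITY_LADDER.index(current)
    return QUALITY_LADDER[min(index + 1, len(QUALITY_LADDER) - 1)]


def with_quality(arguments: dict, quality: str) -> dict:
    """Copy the request arguments and change only the quality field."""
    return {**arguments, "quality": quality}


draft = {"prompt": "A brass pocket watch on wet cobblestones.", "quality": "low"}
final = with_quality(draft, next_quality(next_quality(draft["quality"])))
print(final["quality"])
```

Copying the arguments instead of mutating them leaves the draft request intact, so you can compare the cheap and the final render later.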
Typography tips
- Put the exact text in quotes. Short strings render cleaner than paragraphs.
- Call out the font family if it matters. Sans serif, bold condensed, uppercase, hand-lettered. The model picks up on these cues.
- State the position. Top center, bottom third, on the product, on a sign behind the subject.
- If text is not the point, say "no text, no captions, no watermarks" at the end.
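The typography tips reduce to one clause at the end of the prompt. The sketch below assumes a hypothetical typography_clause helper. It quotes the exact text, appends font and position when given, and falls back to the no-text line otherwise.

```python
from typing import Optional


def typography_clause(text: Optional[str], font: str = "", position: str = "") -> str:
    """Build the on-image text clause, or the no-text fallback."""
    if not text:
        return "No text, no captions, no watermarks."
    clause = f'Text reading "{text}"'
    if font:
        clause += f" in {font}"
    if position:
        clause += f", {position}"
    return clause + "."


print(typography_clause("TIMEKEEPERS", "bold condensed sans serif", "top center"))
print(typography_clause(None))
```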
Realism tips
- Name a camera or film. 35mm film photograph, iPhone snapshot, medium format still life. The model picks grain and compression from the reference.
- Lock the light. Rim light from behind, overcast north window, single tungsten bulb overhead. Avoid vague words like dramatic.
- Leave room for imperfection. Dust, skin texture, slight vignetting, tiny motion blur. These are the tells that make an image read as real.
Edits that change one thing and hold everything else
Editing is not regenerating. Good edits name the change and name what must stay the same. The preserve list does most of the work.
The preserve-rules pattern
Split an edit prompt into two halves. The first half is what changes. The second half is what you pin. Pinning matters because any region you do not mention is fair game for the model to rework.
Change: replace the sky with a soft pink sunset. Preserve: the subject's face, pose, hair, clothing, the exact position of the hands, the foreground rocks, the camera angle, and the overall exposure on the subject. Do not alter identity. Do not change the focal length.
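The two halves are easy to build from data. This sketch assumes a hypothetical build_edit_prompt helper that refuses to produce a prompt with an empty preserve list, since that is the half that does most of the work.

```python
from typing import Sequence


def join_list(items: Sequence[str]) -> str:
    """Join items as "a, b, and c"."""
    cleaned = [item.strip() for item in items if item.strip()]
    if len(cleaned) <= 2:
        return " and ".join(cleaned)
    return ", ".join(cleaned[:-1]) + ", and " + cleaned[-1]


def build_edit_prompt(change: str, preserve: Sequence[str],
                      rules: Sequence[str] = ()) -> str:
    """Compose "Change: ... Preserve: ..." plus any closing rules."""
    pinned = join_list(preserve)
    if not pinned:
        raise ValueError("name at least one thing to preserve")
    parts = [f"Change: {change.strip().rstrip('.')}.", f"Preserve: {pinned}."]
    parts += [rule.strip().rstrip(".") + "." for rule in rules if rule.strip()]
    return " ".join(parts)


print(build_edit_prompt(
    "replace the sky with a soft pink sunset",
    ["the subject's face", "pose", "the foreground rocks"],
    ["Do not alter identity"],
))
```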
Input fidelity
The edit endpoint takes an input_fidelity parameter that controls how closely the output matches the source. Three settings:
- high. Stick as close as possible. Good for logo swaps, text changes, small object insertions.
- low. Let the model restyle the frame. Good for relighting, medium changes, mood shifts.
- auto. Let the model pick based on the prompt. Safe default.
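One way to keep the setting consistent across a team is to choose it from the kind of edit. A sketch, assuming hypothetical edit-kind names; only the three input_fidelity values come from the endpoint.

```python
# Edit kinds are illustrative labels, grouped as described above.
FIDELITY_BY_EDIT = {
    "logo_swap": "high",
    "text_change": "high",
    "object_insertion": "high",
    "relight": "low",
    "medium_change": "low",
    "mood_shift": "low",
}


def input_fidelity_for(edit_kind: str) -> str:
    """Return the fidelity for a known edit kind, or "auto" otherwise."""
    return FIDELITY_BY_EDIT.get(edit_kind, "auto")


print(input_fidelity_for("logo_swap"))
print(input_fidelity_for("something_else"))
```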
An end-to-end example
You have a product photo on a white background. You want the same bottle on wet cobblestones at dusk, same angle, same label, same glass.
Edit this product photo. Change: place the bottle on wet cobblestones at dusk with soft blue hour light and faint reflections on the stones. Preserve: the bottle's exact silhouette, the label text and artwork, the glass color, the specular highlights on the neck, the camera angle, and the relative scale. Input fidelity: high. No new logos, no added text, no watermark.
Lock a look with one or more reference images
Style reference lets you hand the model a visual mood board. The prompt describes the subject. The reference decides the vibe.
What a good reference looks like
- Consistent palette. If your reference has five colors the output will favor those five.
- One dominant texture. Grain, halftone, oil impasto, flat gouache, glossy 3D render. Mix textures only if you know you want a collage.
- Clear lighting signature. A single specular hit or a broad soft wrap will read across generations better than a chaotic scene.
- Clean edges. Avoid references with heavy watermarks, screenshots with UI chrome, or busy collages.
Blending multiple references
You can pass more than one reference. Two is usually the sweet spot. One for palette and one for texture, or one for pose and one for light. Above three you tend to get muddy output. If you want a specific blend weight, write it into the prompt: match reference 1 for palette, reference 2 for brushwork.
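Writing the blend into the prompt can be automated. The sketch below assumes a hypothetical reference_clause helper that numbers the references in the order you pass them and warns once you go above three.

```python
import warnings
from typing import Sequence


def reference_clause(roles: Sequence[str]) -> str:
    """Turn ["palette", "brushwork"] into a "match reference N for ..." line."""
    if not roles:
        raise ValueError("pass at least one reference role")
    if len(roles) > 3:
        warnings.warn("more than three references tends to give muddy output")
    pairs = [f"reference {n} for {role}" for n, role in enumerate(roles, start=1)]
    return "Match " + ", ".join(pairs) + "."


print(reference_clause(["palette", "brushwork"]))
```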
Write the prompt like the reference does not exist
Counterintuitive but reliable. Describe the subject and the scene in plain language, then let the reference pull the style. The common mistake is over-describing the style in the prompt, which fights the reference and produces a blended mess.
Prompt: a small red fox drinking from a forest stream at dawn. Reference: a moody low-key oil painting with cool shadows and a single warm rim light. Expected output: the fox and stream in that painterly, low-key treatment, with the reference's color discipline carried cleanly.
When to regenerate from a reference instead of editing
Image to image takes a source image, applies a transform, and returns a new composition. Editing preserves. Image to image remakes.
How to decide between edit and image to image
Use edit when:
- Identity must carry across frames.
- The layout is already right.
- You are changing one or two elements.
- You want to keep the exact camera and pose.
Use image to image when:
- You are restyling or re-staging.
- Pose, crop, or camera can change.
- You want a loose interpretation of the source.
- You are iterating toward a new concept.
Strength semantics
Strength is how much the model is allowed to diverge from the source. Low strength hugs the input. High strength treats the input as a rough hint. A rough mental model:
- 0.2 to 0.35. Color grade, atmosphere, and small retouch work. The subject survives.
- 0.4 to 0.6. Restyle. The composition carries but the execution changes.
- 0.65 to 0.85. Re-interpretation. Only the rough layout remains.
- 0.9 and above. Near text to image. You barely see the source.
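The bands above translate into a small classifier you can use to label or sanity-check a value before sending it. This is a sketch that assumes strength is a number from 0 to 1. The band names and the strength_band helper are illustrative, and values that fall between the listed ranges are reported as such instead of being forced into a band.

```python
BANDS = [
    (0.2, 0.35, "grade and retouch"),
    (0.4, 0.6, "restyle"),
    (0.65, 0.85, "re-interpretation"),
    (0.9, 1.0, "near text to image"),
]


def strength_band(strength: float) -> str:
    """Name the band a strength value falls into."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength is assumed to lie between 0 and 1")
    for low, high, name in BANDS:
        if low <= strength <= high:
            return name
    return "outside the listed bands"


for value in (0.3, 0.5, 0.7, 0.95, 0.37):
    print(value, strength_band(value))
```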
Hook first, then expand. Deadpan beats dramatic.
The best prompts on this site follow a flowersslop-inspired pattern. You write a short punchy hook, then an expanded production prompt that keeps the tone but spells out the mechanics.
The hook-first pattern
Think of the hook as the line you would tweet. One clause. Memorable. Slightly uncomfortable or deadpan funny. The expanded prompt below it is the stage direction.
Hook: A random photo that makes you feel pain. Expanded: Create an accidental smartphone photo of a bare heel about to come down on a tiny sharp-edged plastic toy brick on a wooden floor at night. Harsh flash, crooked framing, realistic skin, dust on the floor, slight motion blur, zero stylization. No blood, no injury, no text.
Plausible impossible
The strongest results sit at a specific angle of the prompt space: plausible at a glance, impossible on a second look. A CCTV still of a house cat applying to college. A waterproof disposable camera photo of a diver meeting a jellyfish the size of a van. Hold the realism knob all the way up, even when the subject is ridiculous. The mismatch is the joke.
Casual realism
Phone realism, overhead fluorescent light, slightly crooked framing, one item out of focus. These cues sell the scene more than perfect composition ever will. You are aiming for the I took this on my way to the fridge look.
Deadpan concepts
Describe the absurd as if it were a municipal notice. Flat tone, no exclamation marks, no adjectives reaching for laughs. The model will not soften the image. The tone does the work.
Negative constraints
Sometimes it is easier to tell the model what not to do. End the prompt with a short comma-separated list.
No text, no watermark, no caption bar, no cinematic color grade, no blood, no staged studio light.
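If you keep negatives as a list in your app, a tiny helper can append them in this shape. A sketch, with a hypothetical with_negatives function:

```python
from typing import Sequence


def with_negatives(prompt: str, negatives: Sequence[str]) -> str:
    """Append "No a, no b, no c." to the end of a prompt."""
    items = [item.strip() for item in negatives if item.strip()]
    if not items:
        return prompt.strip()
    return f"{prompt.strip()} No " + ", no ".join(items) + "."


print(with_negatives("A cat on a sofa.", ["text", "watermark", "caption bar"]))
```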
When not to overprompt
If you can describe the picture in two sentences, do that. The model answers short prompts well. Pile on rules only when the output keeps drifting. The sweet spot is usually forty to one hundred and twenty words. Above that you are fighting yourself.
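A word count check is a cheap guard against overprompting. The sketch below uses the forty and one hundred and twenty word marks from the paragraph above. The prompt_length_advice helper and its return labels are illustrative.

```python
def prompt_length_advice(prompt: str) -> str:
    """Classify a prompt by whitespace-separated word count."""
    words = len(prompt.split())
    if words > 120:
        return "trim"   # past the sweet spot, rules start to compete
    if words < 40:
        return "short"  # fine when two sentences describe the picture
    return "ok"


print(prompt_length_advice("A tired golden retriever sleeping under a kitchen table."))
```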
Calling the model from your own code
The fal client handles auth, submission, and polling. Below are full working snippets for TypeScript and Python, followed by a parameter reference for the text to image and edit endpoints.
Install and configure
# Node (pnpm, yarn, or npm)
pnpm add @fal-ai/client
# Python
pip install fal-client
TypeScript, text to image
import { fal } from "@fal-ai/client";
// Read the key from the environment instead of hard-coding it.
fal.config({ credentials: process.env.FAL_KEY });
// subscribe submits the job and waits for the result.
const r = await fal.subscribe("fal-ai/gpt-image-1.5", {
  input: {
    prompt:
      "A 35mm film photo of a brass pocket watch on wet cobblestones at dusk, blue hour, soft rain, no text.",
    image_size: "1024x1024",
    quality: "high",
    num_images: 1,
    output_format: "png",
    background: "auto",
  },
  logs: true,
});
console.log(r.data.images[0].url);
Python, text to image
import os
import fal_client
# fal_client reads FAL_KEY from the environment. Fail early if it is not set.
if not os.environ.get("FAL_KEY"):
    raise RuntimeError("Set the FAL_KEY environment variable first.")
# subscribe submits the job and blocks until the result is ready.
result = fal_client.subscribe(
    "fal-ai/gpt-image-1.5",
    arguments={
        "prompt": "A 35mm film photo of a brass pocket watch on wet cobblestones at dusk, blue hour, soft rain, no text.",
        "image_size": "1024x1024",
        "quality": "high",
        "num_images": 1,
        "output_format": "png",
        "background": "auto",
    },
    with_logs=True,
)
print(result["images"][0]["url"])
TypeScript, image editing
import { fal } from "@fal-ai/client";
fal.config({ credentials: process.env.FAL_KEY });
// The edit endpoint takes the source image URLs alongside the prompt.
const r = await fal.subscribe("fal-ai/gpt-image-1.5/edit", {
  input: {
    prompt:
      "Change the sky to a soft pink sunset. Preserve the subject, pose, and foreground rocks.",
    image_urls: ["https://example.com/source.png"],
    input_fidelity: "high",
    image_size: "1536x1024",
    quality: "high",
    num_images: 1,
    output_format: "png",
  },
  logs: true,
});
console.log(r.data.images[0].url);
Queue submit, for long running jobs
Use fal.queue.submit when you want to fire a job and poll for results from another process, for example from a worker or a webhook handler.
import { fal } from "@fal-ai/client";
fal.config({ credentials: process.env.FAL_KEY });
// submit returns right away with a request id you can hand to another process.
const { request_id } = await fal.queue.submit("fal-ai/gpt-image-1.5", {
  input: { prompt: "A moody low-key still life of citrus on a slate plate." },
});
// This checks once. A worker would repeat the call until the job completes.
const status = await fal.queue.status("fal-ai/gpt-image-1.5", {
  requestId: request_id,
  logs: true,
});
if (status.status === "COMPLETED") {
  const r = await fal.queue.result("fal-ai/gpt-image-1.5", {
    requestId: request_id,
  });
  console.log(r.data.images[0].url);
}
Text to image parameters
| Parameter | Type or values | Description |
|---|---|---|
| prompt (required) | string | The full text prompt. No length cap you will hit in practice, but forty to one hundred and twenty words is the sweet spot. |
| image_size | 1024x1024, 1536x1024, 1024x1536, auto | Output dimensions. Auto lets the model pick based on the prompt. |
| quality | low, medium, high, auto | Cost and latency dial. Iterate on low, finalize on high. |
| num_images | integer | How many images to generate per request. Useful for side by side picks. |
| output_format | png, jpeg | PNG preserves alpha and hard edges. JPEG is smaller for photos. |
| background | auto, transparent, opaque | Transparent works on subjects you plan to composite. Requires PNG output. |
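Checking a request against this table before you send it saves a round trip. The sketch below is a client-side check only, built from the values listed above. The validate_arguments helper is hypothetical, and the endpoint stays the final authority on what it accepts.

```python
from typing import List

ALLOWED = {
    "image_size": {"1024x1024", "1536x1024", "1024x1536", "auto"},
    "quality": {"low", "medium", "high", "auto"},
    "output_format": {"png", "jpeg"},
    "background": {"auto", "transparent", "opaque"},
}


def validate_arguments(args: dict) -> List[str]:
    """Return a list of problems. An empty list means the shape looks right."""
    problems = []
    if not str(args.get("prompt", "")).strip():
        problems.append("prompt is required")
    for field, allowed in ALLOWED.items():
        if field in args and args[field] not in allowed:
            problems.append(f"{field}: unsupported value {args[field]!r}")
    count = args.get("num_images", 1)
    if not isinstance(count, int) or count < 1:
        problems.append("num_images must be a positive integer")
    # Only flag a format that was set explicitly to something other than png.
    if args.get("background") == "transparent" and args.get("output_format") not in (None, "png"):
        problems.append("transparent background requires png output")
    return problems


print(validate_arguments({"prompt": "a cat", "image_size": "512x512"}))
```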
Edit parameters
The edit endpoint accepts everything from the text to image endpoint plus two more fields.
| Parameter | Type or values | Description |
|---|---|---|
| image_urls (required) | string[] | One or more URLs to source images. You can also upload via fal.storage and pass the returned URL. |
| input_fidelity | low, high, auto | How tightly the output hugs the source. High for logo and text swaps, low for restyles. |
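A small builder can enforce the two edit-specific fields and pass everything else through. This is a sketch with a hypothetical build_edit_arguments helper. The URL below is a placeholder.

```python
from typing import Sequence

FIDELITY_VALUES = {"low", "high", "auto"}


def build_edit_arguments(prompt: str, image_urls: Sequence[str],
                         input_fidelity: str = "auto", **extra) -> dict:
    """Assemble arguments for the edit endpoint from the fields above."""
    urls = [url for url in image_urls if url and url.strip()]
    if not urls:
        raise ValueError("image_urls needs at least one source image URL")
    if input_fidelity not in FIDELITY_VALUES:
        raise ValueError(f"unsupported input_fidelity: {input_fidelity!r}")
    # Extra keyword arguments carry the shared text to image fields.
    return {"prompt": prompt, "image_urls": urls,
            "input_fidelity": input_fidelity, **extra}


arguments = build_edit_arguments(
    "Change the sky to a soft pink sunset. Preserve the subject and pose.",
    ["https://example.com/source.png"],
    input_fidelity="high",
    quality="low",
)
print(arguments)
```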
You pay fal directly, per image
Every render is billed to the fal key you provided in Settings. This site does not mark up or proxy anything.
Image generation on fal is priced per image and varies by endpoint, resolution, and quality. The current OpenAI image endpoint on fal follows the standard pricing table. When gpt-image-2 launches, fal publishes its own row on the same page.
For the exact, current numbers go to fal.ai/pricing. We do not list cents in these docs because the authoritative price lives there and nowhere else.
Two rules of thumb for budget planning. First, quality high typically costs a few times what low costs, so iterate cheap and finalize expensive. Second, landscape and portrait sizes are often the same rate as the square size, but check the pricing page if you are running large volume.
Status codes you will actually see
Three error codes cover more than ninety percent of what goes wrong. Here is how to recognize each and what to do about it.
Authentication error. Your fal key is missing, malformed, or revoked. Re-paste it in Settings or generate a new one at fal.ai/dashboard/keys.
Validation error (422). The request shape is wrong. The most common causes: an unsupported image_size value, a prompt field left empty, or a content policy rejection. Read the detail field on the response; it names the offending field.
Rate limit. You crossed the burst or sustained rate limit on your fal account. Back off with exponential delay, or move to fal.queue.submit and let the queue absorb the spike.
Server error. Transient and retryable. Retry once after a short delay. If it persists, check status.fal.ai before digging further.
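Backing off with exponential delay can be sketched without any SDK. Everything named here is an assumption for illustration: ApiError stands in for whatever error your client raises, and the retryable status codes are a guess at the rate limit and transient cases, not a list published by fal. Validation and authentication failures are raised immediately because retrying them cannot help.

```python
import random
import time


class ApiError(Exception):
    """Hypothetical error type that carries an HTTP status code."""

    def __init__(self, status: int, detail: str = ""):
        super().__init__(f"{status}: {detail}")
        self.status = status


# Assumed retryable codes: rate limiting plus common transient server errors.
RETRYABLE = {429, 500, 502, 503, 504}


def call_with_backoff(call, max_attempts: int = 4, base_delay: float = 1.0,
                      sleep=time.sleep):
    """Run call(), retrying retryable errors with exponential delay and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ApiError as err:
            is_last = attempt == max_attempts - 1
            if err.status not in RETRYABLE or is_last:
                raise
            # The delay doubles each attempt, with jitter to spread retries out.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Passing sleep as a parameter keeps the function testable: hand it a list's append method and no real waiting happens.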
Batching, retries, caching, and the cost dial
These tips save money and wall-clock time in production. None of them require a rewrite of your prompt flow.
If you need four variations of the same prompt, pass num_images: 4. One request, one round trip. Cheaper and faster than four separate calls.
Lock the prompt on quality: low or medium. When the composition is right, bump to high for the one you ship.
If your product shows the same image across users, cache the rendered URL by a hash of the prompt plus parameters. You will see cache hit rates over ninety percent on popular prompts.
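A hash of the prompt plus parameters needs a canonical form so that key order does not change the result. A minimal sketch, assuming a hypothetical cache_key helper and JSON-serializable arguments:

```python
import hashlib
import json


def cache_key(model: str, arguments: dict) -> str:
    """Hash the model id and arguments in a key-order-independent way."""
    canonical = json.dumps(
        {"model": model, "arguments": arguments},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key = cache_key("fal-ai/gpt-image-1.5", {"prompt": "a cat", "quality": "low"})
print(key)
```

Including the model id in the hash keeps renders from different endpoints from colliding when you flip between them.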
On user-triggered jobs, cap concurrent in-flight requests per user. Three is a good starting number. Beyond that you are paying to sit in the queue.
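A per-user cap can be sketched with one semaphore per user. This assumes an asyncio-based server. The run_capped helper and the fake job in the demo are illustrative, with a short sleep standing in for the real render call.

```python
import asyncio
from collections import defaultdict

MAX_IN_FLIGHT_PER_USER = 3

# One semaphore per user id, created the first time that user submits a job.
_user_slots = defaultdict(lambda: asyncio.Semaphore(MAX_IN_FLIGHT_PER_USER))


async def run_capped(user_id: str, job):
    """Run job() once the user has a free slot."""
    async with _user_slots[user_id]:
        return await job()


async def _demo():
    active = 0
    peak = 0

    async def fake_render():
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for the real request
        active -= 1
        return "done"

    results = await asyncio.gather(
        *(run_capped("user-1", fake_render) for _ in range(6))
    )
    return peak, results


peak, results = asyncio.run(_demo())
print(peak, len(results))
```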
Every time you persist a render, persist the full prompt, seed if available, and parameters. Future you will thank you when you need to re-generate with a tweak.
fal returns a request id on every job. Log it. It is the fastest way to debug a user-reported issue against the fal dashboard.
The questions that come up the most
How do I call gpt-image-2 on fal?
Use openai/gpt-image-2 for text to image and openai/gpt-image-2/edit for edits. Both take the same schema the previous generation used, plus a few new knobs.
Is this site run by OpenAI or by fal?
Do I need to be a developer to use the playground?
Where is my fal key stored?
Can I use these prompts commercially?
How many images can I run per minute?
Can I pass a reference image in the playground?
Why does the model sometimes ignore part of my prompt?
Can I get transparent backgrounds?
Set background: transparent and output_format: png. Works best on subjects with clear edges and no background interaction in the prompt.
What size should I generate for social posts?
Can I use the same prompt across different models?
How do I report a bad output?
Is there a safe-for-work filter?
Blocked requests return a 422 with a content policy detail. Rewrite the prompt and try again.
Where can I see example prompts I can just copy?
Two ways to keep building
Grab a prompt from the library and ship a render in the playground, or wire openai/gpt-image-2 into your own product with the API reference.