Docs

Build with gpt-image-2

Everything you need to prompt, edit, and ship images through openai/gpt-image-2 on fal is live. The older fal-ai/gpt-image-1.5 endpoint stays around for comparison.

01 Overview

What gpt-image-2 is and how this site is laid out

gpt-image-2 is the next image model from OpenAI. It is expected to improve on four fronts: photorealism, text rendering inside the image, instruction following, and control over composition, style, and detail.

You will see the upgrade most clearly on jobs that mix a subject, a scene, and on-image typography. Posters, product shots, editorial spreads, diagrams, UI mocks, photo edits that hold identity, and reference-driven style work are all areas where prior-generation models tend to smear. gpt-image-2 is built to sit closer to what you asked for with less re-rolling.

This site has three jobs. It hosts a prompt library with working examples you can copy. It runs a playground so you can ship a render in under a minute. It documents the fal API so you can wire gpt-image-2 into your own product.

Every tool on the site defaults to openai/gpt-image-2 and lets you flip back to fal-ai/gpt-image-1.5 from the Model dropdown for a side-by-side. Swap the endpoint string in your own code and the API shape stays the same.

Photorealism

Skin, fabric, specular metal, rain on pavement, volumetric light. Fewer plastic faces and fewer painted hands.

Text rendering

Short quoted strings render as drawn type, not smudges. Good for posters, covers, signage, and product mocks.

Instruction following

Long prompts with constraints land closer to the spec. Negative instructions are honored more often.

Control surfaces

Style references, input fidelity on edits, aspect and quality knobs. You steer the model at each step.

02 Quick start

From zero to your first render in three steps

You need a fal account, a key, and a prompt. Two minutes if the fal tab is already open.

  1. Grab a fal API key

    Head to fal.ai/dashboard/keys, create a key, and copy it. Keys start with fal-. Treat them like passwords. You can revoke and rotate them from the same page.

  2. Paste it into Settings

    Open Settings, paste the key, and save. Your key lives in your browser and never leaves for a backend we control. Every render goes straight from your browser to fal.

  3. Send your first prompt

    Open the playground, drop in the sample below, press run. You should see an image in about three to eight seconds depending on quality.

    A 35mm film photo of a tired golden retriever
    sleeping under a kitchen table at noon.
    Soft window light, dust in the air,
    shallow depth of field, warm color grade.
    No text, no watermark.
03 Text to image

Prompt structure, aspect, quality, and typography

A good text to image prompt names the subject, the scene, the light, the style, and any on-image text. In that order. Everything else is refinement.

The five-layer prompt

You can write this as one paragraph, one sentence per layer, or a loose list. The model reads all three. What matters is that every layer is present.

  1. Subject. What or who is in the frame, and what they are doing.
  2. Scene. Where the subject sits, and what else is visible.
  3. Light. Source, direction, hardness, color temperature.
  4. Style. Photograph, illustration, 3D render, poster, oil.
  5. Typography and negatives. Any text on the image, and what you do not want.
A brass pocket watch lying on wet cobblestones at dusk.
Blue hour, soft rain, reflections on the stones,
35mm film photograph, shallow depth of field.
Text reading "TIMEKEEPERS" engraved on the case.
No watermark, no caption bar.
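If you generate prompts from code, the five layers fit a small helper. A minimal sketch in Python, assuming you keep each layer as a separate string; `build_prompt` is a hypothetical name, not part of the fal client. It keeps the recommended order and refuses to build a prompt with an empty core layer.

```python
from typing import Dict


def build_prompt(subject: str, scene: str, light: str, style: str,
                 typography: str = "", negatives: str = "") -> str:
    """Join the five layers in order: subject, scene, light, style, then
    typography and negatives. Hypothetical helper, not part of any SDK."""
    core: Dict[str, str] = {"subject": subject, "scene": scene, "light": light, "style": style}
    # Every core layer must be present; typography and negatives are optional.
    missing = [name for name, value in core.items() if not value.strip()]
    if missing:
        raise ValueError("missing prompt layers: " + ", ".join(missing))
    layers = [subject, scene, light, style, typography, negatives]
    # One sentence per layer, each ending in exactly one period.
    return " ".join(layer.strip().rstrip(".") + "." for layer in layers if layer.strip())


prompt = build_prompt(
    subject="A brass pocket watch lying on wet cobblestones at dusk",
    scene="Soft rain, reflections on the stones",
    light="Blue hour",
    style="35mm film photograph, shallow depth of field",
    typography='Text reading "TIMEKEEPERS" engraved on the case',
    negatives="No watermark, no caption bar",
)
print(prompt)
```

The result reads as one paragraph, which the model handles as well as a list.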

Aspect ratio choices

The model supports three canonical sizes plus auto. Pick by final use, not by taste.

1024x1024

Square. Profile shots, album covers, social tiles, product on a plate.

1536x1024

Landscape. Hero art, banners, cinematic frames, cover images.

1024x1536

Portrait. Posters, book covers, phone wallpaper, fashion stills.

auto

Let the model choose based on prompt cues. Safe default when you do not know the crop yet.
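One way to make "pick by final use" mechanical is a lookup table. This sketch assumes your app labels each render with a use such as "poster" or "hero"; the labels and `pick_size` are hypothetical. Unknown uses fall back to auto, matching the advice above.

```python
from typing import Dict, Optional

# The four values image_size accepts, per the API reference.
IMAGE_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}

# Hypothetical mapping from final use to size, following the list above.
SIZE_BY_USE: Dict[str, str] = {
    "profile": "1024x1024",
    "album_cover": "1024x1024",
    "social_tile": "1024x1024",
    "hero": "1536x1024",
    "banner": "1536x1024",
    "cinematic": "1536x1024",
    "poster": "1024x1536",
    "book_cover": "1024x1536",
    "wallpaper": "1024x1536",
}


def pick_size(final_use: Optional[str]) -> str:
    """Pick image_size by final use; fall back to auto when the crop is unknown."""
    if final_use is None:
        return "auto"
    return SIZE_BY_USE.get(final_use, "auto")


assert set(SIZE_BY_USE.values()) <= IMAGE_SIZES  # every mapped size is a real option
```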

Quality knob

Quality is a cost and latency dial. Use low to iterate, bump to medium once the prompt holds, and send high for the final. If the endpoint you call exposes a seed, lock it before the final bump so the image upgrades instead of re-rolling. The parameters listed in the API reference below do not include one.

Typography tips

  • Put the exact text in quotes. Short strings render cleaner than paragraphs.
  • Call out the font family if it matters. Sans serif, bold condensed, uppercase, hand-lettered. The model picks these up.
  • State the position. Top center, bottom third, on the product, on a sign behind the subject.
  • If text is not the point, say no text, no captions, no watermarks at the end.
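The typography tips fold into one sentence builder. A sketch, assuming a six-word cutoff for "short" (an illustrative threshold, not a documented model limit); `typography_clause` is hypothetical. Passing no text returns the no-text tail instead.

```python
from typing import Optional

MAX_WORDS = 6  # illustrative cutoff for a "short" string, not a model limit


def typography_clause(text: Optional[str], font: str = "", position: str = "") -> str:
    """Quote the exact text, then add font and position if given.
    With text=None, return the no-text tail instead."""
    if text is None:
        return "No text, no captions, no watermarks."
    if len(text.split()) > MAX_WORDS:
        raise ValueError("keep on-image text short; long strings render less cleanly")
    parts = [f'Text reading "{text}"']
    if font:
        parts.append(f"in {font} lettering")
    if position:
        parts.append(position)
    return ", ".join(parts) + "."


print(typography_clause("SUMMER SALE", "bold condensed uppercase sans serif", "top center"))
```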

Realism tips

  • Name a camera or film. 35mm film photograph, iPhone snapshot, medium format still life. The model picks grain and compression from the named camera or film.
  • Lock the light. Rim light from behind, overcast north window, single tungsten bulb overhead. Avoid vague words like "dramatic".
  • Leave room for imperfection. Dust, skin texture, slight vignetting, tiny motion blur. These are the tells that make an image read as real.
04 Image editing

Edits that change one thing and hold everything else

Editing is not regenerating. Good edits name the change and name what must stay the same. The preserve list does most of the work.

The preserve-rules pattern

Split an edit prompt into two halves. The first half is what changes. The second half is what you pin. Pinning matters because any region you do not mention is fair game for the model to rework.

Change: replace the sky with a soft pink sunset.
Preserve: the subject's face, pose, hair, clothing,
the exact position of the hands, the foreground rocks,
the camera angle, and the overall exposure on the subject.
Do not alter identity. Do not change the focal length.
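The preserve-rules pattern is easy to template so the pin list never gets forgotten. A minimal sketch; `edit_prompt` is a hypothetical helper that rebuilds a trimmed version of the sky example above and rejects an empty preserve list, since anything unpinned is fair game.

```python
from typing import Sequence


def edit_prompt(change: str, preserve: Sequence[str], rules: Sequence[str] = ()) -> str:
    """Build a two-half edit prompt: what changes, then what is pinned."""
    if not preserve:
        # Any region you do not mention is fair game, so an empty pin list is a bug.
        raise ValueError("list at least one thing to preserve")
    if len(preserve) == 1:
        pinned = preserve[0]
    else:
        pinned = ", ".join(preserve[:-1]) + ", and " + preserve[-1]
    lines = ["Change: " + change.rstrip(".") + ".", "Preserve: " + pinned + "."]
    lines += [rule.rstrip(".") + "." for rule in rules]
    return "\n".join(lines)


print(edit_prompt(
    "replace the sky with a soft pink sunset",
    ["the subject's face, pose, hair, clothing", "the foreground rocks", "the camera angle"],
    ["Do not alter identity", "Do not change the focal length"],
))
```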

Input fidelity

The edit endpoint takes an input_fidelity parameter that controls how closely the output matches the source. Three settings:

  • high. Stick as close as possible. Good for logo swaps, text changes, small object insertions.
  • low. Let the model restyle the frame. Good for relighting, medium changes, mood shifts.
  • auto. Let the model pick based on the prompt. Safe default.
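On the API side, input_fidelity travels as a request field next to the prompt. This sketch builds that input dict using the field names from the edit parameter reference; `edit_request` is hypothetical, and you would pass its result as `arguments` to `fal_client.subscribe` with the edit endpoint.

```python
from typing import Any, Dict, Sequence

INPUT_FIDELITY = {"low", "high", "auto"}


def edit_request(prompt: str, image_urls: Sequence[str],
                 input_fidelity: str = "auto") -> Dict[str, Any]:
    """Build the input for an edit call; field names follow the edit parameter table."""
    if input_fidelity not in INPUT_FIDELITY:
        raise ValueError(f"input_fidelity must be one of {sorted(INPUT_FIDELITY)}")
    if not image_urls:
        raise ValueError("image_urls needs at least one source image")
    return {"prompt": prompt, "image_urls": list(image_urls), "input_fidelity": input_fidelity}


# A logo swap should hug the source, so it asks for high fidelity.
req = edit_request("Replace the logo on the mug with a plain red circle.",
                   ["https://example.com/mug.png"], input_fidelity="high")
```

Defaulting to auto mirrors the "safe default" advice; callers opt into high or low explicitly.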

An end-to-end example

You have a product photo on a white background. You want the same bottle on wet cobblestones at dusk, same angle, same label, same glass.

Edit this product photo.
Change: place the bottle on wet cobblestones
at dusk with soft blue hour light and faint
reflections on the stones.
Preserve: the bottle's exact silhouette, the
label text and artwork, the glass color, the
specular highlights on the neck, the camera
angle, and the relative scale.
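No new logos, no added text, no watermark.

Send the prompt with input_fidelity set to high on the request. It is an API parameter, so writing it into the prompt text does not set it.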
05 Style reference

Lock a look with one or more reference images

Style reference lets you hand the model a visual mood board. The prompt describes the subject. The reference decides the vibe.

What a good reference looks like

  • Consistent palette. If your reference has five colors the output will favor those five.
  • One dominant texture. Grain, halftone, oil impasto, flat gouache, glossy 3D render. Mix textures only if you know you want a collage.
  • Clear lighting signature. A single specular hit or a broad soft wrap will read across generations better than a chaotic scene.
  • Clean edges. Avoid references with heavy watermarks, screenshots with UI chrome, or busy collages.

Blending multiple references

You can pass more than one reference. Two is usually the sweet spot. One for palette and one for texture, or one for pose and one for light. Above three you tend to get muddy output. If you want a specific blend weight, write it into the prompt: match reference 1 for palette, reference 2 for brushwork.
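Writing blend weights by hand gets repetitive once roles are fixed. A sketch that turns an ordered list of reference roles into the prompt line suggested above and stops at three references; `blend_instruction` is hypothetical, and it assumes references are numbered in the order you pass them.

```python
from typing import Sequence

MAX_REFERENCES = 3  # above three, output tends to get muddy


def blend_instruction(roles: Sequence[str]) -> str:
    """Turn ordered reference roles into a prompt line; reference 1 is roles[0]."""
    if not roles:
        return ""
    if len(roles) > MAX_REFERENCES:
        raise ValueError("more than three references tends to muddy the output")
    clauses = [f"reference {i} for {role}" for i, role in enumerate(roles, start=1)]
    return "Match " + ", ".join(clauses) + "."


print(blend_instruction(["palette", "brushwork"]))
# Match reference 1 for palette, reference 2 for brushwork.
```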

Write the prompt like the reference does not exist

Counterintuitive but reliable. Describe the subject and the scene in plain language, then let the reference pull the style. The common mistake is over-describing the style in the prompt, which fights the reference and produces a blended mess.

Prompt: a small red fox drinking from a forest stream at dawn.

Reference: a moody low-key oil painting with cool shadows
and a single warm rim light.

Expected output: the fox and stream in that painterly,
low-key treatment, with the reference's color discipline
carried cleanly.
06 Image to image

When to regenerate from a reference instead of editing

Image to image takes a source image, applies a transform, and returns a new composition. Editing preserves. Image to image remakes.

How to decide between edit and image to image

Use edit when
  • Identity must carry across frames.
  • The layout is already right.
  • You are changing one or two elements.
  • You want to keep the exact camera and pose.
Use image to image when
  • You are restyling or re-staging.
  • Pose, crop, or camera can change.
  • You want a loose interpretation of the source.
  • You are iterating toward a new concept.

Strength semantics

Strength is how much the model is allowed to diverge from the source. Low strength hugs the input. High strength treats the input as a rough hint. A rough mental model:

  • 0.2 to 0.35. Color grade, atmosphere, and small retouch work. The subject survives.
  • 0.4 to 0.6. Restyle. The composition carries but the execution changes.
  • 0.65 to 0.85. Re-interpretation. Only the rough layout remains.
  • 0.9 and above. Near text to image. You barely see the source.
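The API reference on this site does not list a strength field, so treat this as a way to reason about the bands rather than a call you can make. A sketch that maps a value to the bands above; the cut points at 0.35, 0.6, and 0.9 are an assumption that closes the gaps between the listed ranges, and `strength_band` is hypothetical.

```python
def strength_band(strength: float) -> str:
    """Map a strength value in [0, 1] to the rough bands above.
    Cut points (0.35, 0.6, 0.9) are an assumption, not documented limits."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength is expected in [0, 1]")
    if strength <= 0.35:
        return "grade and retouch"   # subject survives
    if strength <= 0.6:
        return "restyle"             # composition carries, execution changes
    if strength < 0.9:
        return "re-interpretation"   # only the rough layout remains
    return "near text to image"      # you barely see the source
```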
07 Prompt engineering

Hook first, then expand. Deadpan beats dramatic.

The best prompts on this site follow a flowersslop-inspired pattern. You write a short punchy hook, then an expanded production prompt that keeps the tone but spells out the mechanics.

The hook-first pattern

Think of the hook as the line you would tweet. One clause. Memorable. Slightly uncomfortable or deadpan funny. The expanded prompt below it is the stage direction.

Hook:
A random photo that makes you feel pain.

Expanded:
Create an accidental smartphone photo of a bare heel
about to come down on a tiny sharp-edged plastic toy
brick on a wooden floor at night. Harsh flash, crooked
framing, realistic skin, dust on the floor, slight motion
blur, zero stylization. No blood, no injury, no text.

Plausible impossible

The strongest results sit in a specific corner of the prompt space: plausible at a glance, impossible on a second look. A CCTV still of a house cat applying to college. A waterproof disposable camera photo of a diver meeting a jellyfish the size of a van. Hold the realism knob all the way up, even when the subject is ridiculous. The mismatch is the joke.

Casual realism

Phone realism, overhead fluorescent light, slightly crooked framing, one item out of focus. These cues sell the scene more than perfect composition ever will. You are aiming for the "I took this on my way to the fridge" look.

Deadpan concepts

Describe the absurd as if it were a municipal notice. Flat tone, no exclamation marks, no adjectives reaching for laughs. The model will not soften the image. The tone does the work.

Negative constraints

Sometimes it is easier to tell the model what not to do. End the prompt with a short comma-separated list.

No text, no watermark, no caption bar,
no cinematic color grade, no blood,
no staged studio light.
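A negative list reads best as one flat sentence. This hypothetical `negatives_tail` helper formats a list of things to avoid the same way as the example above, dropping blanks and repeats.

```python
from typing import Sequence


def negatives_tail(items: Sequence[str]) -> str:
    """Format negatives as one sentence: 'No a, no b, no c.'"""
    cleaned = [item.strip().rstrip(".") for item in items if item.strip()]
    unique = list(dict.fromkeys(cleaned))  # drop repeats, keep the original order
    if not unique:
        return ""
    return "No " + ", no ".join(unique) + "."


print(negatives_tail(["text", "watermark", "caption bar", "cinematic color grade",
                      "blood", "staged studio light"]))
```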

When not to overprompt

If you can describe the picture in two sentences, do that. The model answers short prompts well. Add rules only when the output keeps drifting. The sweet spot is usually forty to one hundred and twenty words. Above that you are fighting yourself.
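To keep yourself honest about length, a word count is enough. A sketch using the forty to one hundred and twenty word range above; `length_check` is hypothetical and splits on whitespace, so quoted on-image text counts word by word.

```python
from typing import Tuple

SWEET_SPOT: Tuple[int, int] = (40, 120)  # word range suggested above


def length_check(prompt: str) -> str:
    """Label a prompt by word count: short, sweet spot, or long."""
    words = len(prompt.split())
    low, high = SWEET_SPOT
    if words < low:
        return "short"       # fine if two sentences describe the picture
    if words > high:
        return "long"        # likely fighting itself; trim competing rules
    return "sweet spot"
```

A "short" label is not a problem on its own; it only means you have room to add layers if the output drifts.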

08 API reference

Calling the model from your own code

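The fal client handles auth, submission, and polling. Below are full working snippets for TypeScript and Python, followed by a parameter reference for the text to image and edit endpoints. The snippets target fal-ai/gpt-image-1.5; swap in openai/gpt-image-2 or openai/gpt-image-2/edit to call the new model.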

Install and configure

# Node (pnpm, yarn, or npm)
pnpm add @fal-ai/client

# Python
pip install fal-client

TypeScript, text to image

import { fal } from "@fal-ai/client";

fal.config({ credentials: process.env.FAL_KEY });

const r = await fal.subscribe("fal-ai/gpt-image-1.5", {
  input: {
    prompt:
      "A 35mm film photo of a brass pocket watch on wet cobblestones at dusk, blue hour, soft rain, no text.",
    image_size: "1024x1024",
    quality: "high",
    num_images: 1,
    output_format: "png",
    background: "auto",
  },
  logs: true,
});

console.log(r.data.images[0].url);

Python, text to image

import fal_client

# fal_client reads the key from the FAL_KEY environment variable.
# Export it in your shell first: export FAL_KEY="..."

result = fal_client.subscribe(
    "fal-ai/gpt-image-1.5",
    arguments={
        "prompt": "A 35mm film photo of a brass pocket watch on wet cobblestones at dusk, blue hour, soft rain, no text.",
        "image_size": "1024x1024",
        "quality": "high",
        "num_images": 1,
        "output_format": "png",
        "background": "auto",
    },
    with_logs=True,
)

print(result["images"][0]["url"])

TypeScript, image editing

import { fal } from "@fal-ai/client";

fal.config({ credentials: process.env.FAL_KEY });

const r = await fal.subscribe("fal-ai/gpt-image-1.5/edit", {
  input: {
    prompt:
      "Change the sky to a soft pink sunset. Preserve the subject, pose, and foreground rocks.",
    image_urls: ["https://example.com/source.png"],
    input_fidelity: "high",
    image_size: "1536x1024",
    quality: "high",
    num_images: 1,
    output_format: "png",
  },
  logs: true,
});

console.log(r.data.images[0].url);

Queue submit, for long running jobs

Use fal.queue.submit when you want to fire a job and poll for results from another process, for example from a worker or a webhook handler.

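import { fal } from "@fal-ai/client";

fal.config({ credentials: process.env.FAL_KEY });

const endpoint = "fal-ai/gpt-image-1.5";

// Fire the job and keep the request id. Persist it if another process will poll.
const { request_id } = await fal.queue.submit(endpoint, {
  input: { prompt: "A moody low-key still life of citrus on a slate plate." },
});

// A fresh job is usually still IN_QUEUE or IN_PROGRESS, so poll until it completes.
let status = await fal.queue.status(endpoint, { requestId: request_id, logs: true });
while (status.status !== "COMPLETED") {
  await new Promise((resolve) => setTimeout(resolve, 1000)); // wait one second between checks
  status = await fal.queue.status(endpoint, { requestId: request_id, logs: true });
}

const r = await fal.queue.result(endpoint, { requestId: request_id });
console.log(r.data.images[0].url);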

Text to image parameters

  • prompt (required, string). The full text prompt. No length cap you will hit in practice, but forty to one hundred and twenty words is the sweet spot.
  • image_size (1024x1024 | 1536x1024 | 1024x1536 | auto). Output dimensions. Auto lets the model pick based on the prompt.
  • quality (low | medium | high | auto). Cost and latency dial. Iterate on low, finalize on high.
  • num_images (integer). How many images to generate per request. Useful for side-by-side picks.
  • output_format (png | jpeg). PNG preserves alpha and hard edges. JPEG is smaller for photos.
  • background (auto | transparent | opaque). Transparent works on subjects you plan to composite. Requires PNG output.
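Because unsupported values and empty prompts come back as 422s, a local check can catch them before a round trip. A sketch against the parameter list above; `validate_t2i` is hypothetical, and it mirrors only the values listed here, so update it if fal adds options.

```python
from typing import Any, Dict

# Allowed values, copied from the text to image parameter list.
ALLOWED = {
    "image_size": {"1024x1024", "1536x1024", "1024x1536", "auto"},
    "quality": {"low", "medium", "high", "auto"},
    "output_format": {"png", "jpeg"},
    "background": {"auto", "transparent", "opaque"},
}


def validate_t2i(args: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError on inputs the parameter list says will not work."""
    if not str(args.get("prompt", "")).strip():
        raise ValueError("prompt is required and must not be empty")
    for name, allowed in ALLOWED.items():
        if name in args and args[name] not in allowed:
            raise ValueError(f"{name}={args[name]!r} is not one of {sorted(allowed)}")
    num_images = args.get("num_images", 1)
    if not isinstance(num_images, int) or num_images < 1:
        raise ValueError("num_images must be a positive integer")
    if args.get("background") == "transparent" and args.get("output_format") != "png":
        raise ValueError("background='transparent' needs output_format='png'")
    return args
```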

Edit parameters

The edit endpoint accepts everything from the text to image endpoint plus two more fields.

  • image_urls (required, string[]). One or more URLs to source images. You can also upload via fal.storage and pass the returned URL.
  • input_fidelity (low | high | auto). How tightly the output hugs the source. High for logo and text swaps, low for restyles.
09 Pricing

You pay fal directly, per image

Every render is billed to the fal key you provided in Settings. This site does not mark up or proxy anything.

Image generation on fal is priced per image and varies by endpoint, resolution, and quality. The OpenAI image endpoints on fal follow the standard pricing table, and fal lists gpt-image-2 in its own row on the same page.

For the exact, current numbers go to fal.ai/pricing. We do not list cents in these docs because the authoritative price lives there and nowhere else.

Two rules of thumb for budget planning. First, quality high typically costs many times what low costs, so iterate cheap and finalize expensive. Second, landscape and portrait sizes are not always priced at the same rate as the square size, so check the pricing page before running large volume.

10 Rate limits and errors

Status codes you will actually see

Four status codes, counting 5xx as one class, cover more than ninety percent of what goes wrong. Here is how to recognize each and what to do about it.

401 Unauthorized

Your fal key is missing, malformed, or revoked. Re-paste it in Settings or generate a new one at fal.ai/dashboard/keys.

422 Unprocessable Entity

fal could not process the request as sent. The most common causes: an unsupported image_size value, an empty prompt field, or a content policy rejection. Read the detail field on the response; it names the offending field.

429 Too Many Requests

You crossed the burst or sustained rate limit on your fal account. Back off with exponential delay, or move to fal.queue.submit and let the queue absorb the spike.

5xx Server errors

Transient and retryable. Retry once after a short delay. If the error persists, check status.fal.ai before digging further.
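The retry advice above can be encoded as a small policy. A sketch, assuming you already have the HTTP status from whatever error your client raises; `max_retries` and `backoff_delays` are hypothetical, the budget of five 429 retries is illustrative, and full jitter is one common backoff scheme among several.

```python
import random
from typing import Iterator, Optional


def max_retries(status: int) -> int:
    """Retry budget per HTTP status, following the guidance above."""
    if status == 429:
        return 5   # rate limited: back off exponentially (the count is an assumption)
    if 500 <= status <= 599:
        return 1   # transient server error: retry once after a short delay
    return 0       # 401, 422, and anything else: fix the key or the request instead


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Exponential backoff with full jitter: the nth delay (n from 0) is drawn
    uniformly from [0, min(cap, base * 2**n)] seconds."""
    rng = rng or random.Random()
    for n in range(retries):
        yield rng.uniform(0, min(cap, base * 2 ** n))


# Example: the sleep schedule for a 429, seeded so it is reproducible.
print([round(d, 2) for d in backoff_delays(max_retries(429), rng=random.Random(0))])
```

Jitter spreads retries from many clients apart, so they do not all hit the rate limit again at the same moment.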

11 Best practices

Batching, retries, caching, and the cost dial

These tips save money and wall-clock time in production. None of them require a rewrite of your prompt flow.

Batch with num_images

If you need four variations of the same prompt, pass num_images: 4. One request, one round trip. That is faster and simpler than four separate calls, though billing is still per image.

Iterate cheap, finalize expensive

Lock the prompt on quality: low or medium. When the composition is right, bump to high for the one you ship.

Cache prompt lookups

If your product shows the same image across users, cache the rendered URL by a hash of the prompt plus parameters. Popular prompts can then be served almost entirely from cache.
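A cache key needs to change whenever anything that affects the render changes, including the endpoint. A sketch using SHA-256 over canonical JSON; `cache_key` is hypothetical and assumes your parameters are JSON-serializable.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(endpoint: str, args: Dict[str, Any]) -> str:
    """Stable key for one render: SHA-256 of the endpoint plus every input field."""
    # sort_keys makes the key independent of dict order; compact separators
    # keep formatting from leaking into the hash.
    canonical = json.dumps({"endpoint": endpoint, "args": args},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = cache_key("fal-ai/gpt-image-1.5", {"prompt": "citrus on slate", "quality": "high"})
b = cache_key("fal-ai/gpt-image-1.5", {"quality": "high", "prompt": "citrus on slate"})
print(a == b)  # True: same inputs, different order
```

Including the endpoint keeps gpt-image-1.5 and gpt-image-2 renders of the same prompt from colliding.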

Respect the back pressure

On user-triggered jobs, cap concurrent in-flight requests per user. Three is a good starting number. Beyond that, extra requests mostly wait in the queue instead of finishing sooner.
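A per-user cap can live in your own backend as one semaphore per user. A sketch with asyncio, assuming your render call is an async function; `PerUserLimiter` is hypothetical and `fake_render` stands in for the real fal request.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, DefaultDict, TypeVar

T = TypeVar("T")


class PerUserLimiter:
    """Caps concurrent in-flight jobs per user; extra jobs wait locally."""

    def __init__(self, limit: int = 3) -> None:  # three, as suggested above
        # Semaphores are created lazily, inside the running event loop.
        self._sems: DefaultDict[str, asyncio.Semaphore] = defaultdict(
            lambda: asyncio.Semaphore(limit)
        )

    async def run(self, user_id: str, job: Callable[[], Awaitable[T]]) -> T:
        async with self._sems[user_id]:
            return await job()


async def demo() -> int:
    """Fire ten jobs for one user and report the peak concurrency."""
    limiter = PerUserLimiter()
    active = 0
    peak = 0

    async def fake_render() -> None:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for the fal request
        active -= 1

    await asyncio.gather(*(limiter.run("user-1", fake_render) for _ in range(10)))
    return peak


print(asyncio.run(demo()))  # prints 3
```

Waiting jobs queue in your process rather than on fal, so cancelling a user's backlog stays cheap.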

Store the prompt with the image

Every time you persist a render, persist the full prompt, seed if available, and parameters. Future you will thank you when you need to re-generate with a tweak.

Log the request id

fal returns a request id on every job. Log it. It is the fastest way to debug a user-reported issue against the fal dashboard.
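Storing the prompt, parameters, and request id together turns both re-generation and debugging into a lookup. A sketch of one such record; `RenderRecord` and its field names are hypothetical, and the seed field stays empty unless your endpoint exposes one.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class RenderRecord:
    """One persisted render: enough to re-generate it or trace it on fal."""
    endpoint: str
    prompt: str
    params: Dict[str, Any] = field(default_factory=dict)
    request_id: Optional[str] = None  # e.g. request_id from fal.queue.submit
    seed: Optional[int] = None        # only if the endpoint exposes one
    image_url: Optional[str] = None

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "RenderRecord":
        return cls(**json.loads(raw))


record = RenderRecord(
    endpoint="fal-ai/gpt-image-1.5",
    prompt="A moody low-key still life of citrus on a slate plate.",
    params={"image_size": "1024x1024", "quality": "high"},
    request_id="hypothetical-request-id",
)
print(record.to_json())
```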

12 FAQ

The questions that come up the most

How do I call gpt-image-2 on fal?
The endpoint slug is openai/gpt-image-2 for text-to-image and openai/gpt-image-2/edit for edits. Both take the same schema the previous generation used, plus a few new knobs.
Is this site run by OpenAI or by fal?
Neither. This is an independent prompt library and docs site built on top of fal. We are not affiliated with OpenAI.
Do I need to be a developer to use the playground?
No. Paste a fal key in Settings, open the playground, type a prompt, press run. The rest of these docs assume you want to go beyond that into your own app, but the playground works fine on its own.
Where is my fal key stored?
In your browser, locally. We do not store your key on our servers. Every request you make from this site goes straight to fal from your browser with your key in the header. Clearing site data removes it.
Can I use these prompts commercially?
Generated images are subject to fal and OpenAI terms. Check the fal terms page. The prompt text itself in our library is free for you to copy and adapt.
How many images can I run per minute?
Depends on your fal plan. Free and low tiers have burst limits in the low double digits per minute. Higher tiers go well beyond that. See fal.ai/pricing for the exact quotas.
Can I pass a reference image in the playground?
Yes. Use the Style reference tool for visual style, or the Image editing tool when you want to preserve the source.
Why does the model sometimes ignore part of my prompt?
Usually the prompt has too many competing instructions, or two of them contradict each other. Try dropping the least important third of your prompt. You will often find the remaining two thirds lands perfectly.
Can I get transparent backgrounds?
Yes. Set background: transparent and output_format: png. Works best on subjects with clear edges and no background interaction in the prompt.
What size should I generate for social posts?
For feed, 1024x1024 is safe and reads well on every platform. For stories and vertical video, use 1024x1536. For hero banners on a blog or landing page, 1536x1024 holds up after resize.
Can I use the same prompt across different models?
Usually yes, with minor tweaks. Most of the subject-scene-light-style-negatives structure translates well. Typography and text rendering are where models differ the most.
How do I report a bad output?
File an issue on the site repo, or email us. Include the prompt, the request id if you have it, and what you expected versus what you got. Pattern reports help us tune the guides.
Is there a safe-for-work filter?
Yes, run by fal and the underlying model. Requests that trip the filter return a 422 with a content policy detail. Rewrite the prompt and try again.
Where can I see example prompts I can just copy?
Start with the prompt library on the home page and the trending page for what the community is running today.
What next

Two ways to keep building

Grab a prompt from the library and ship a render in the playground, or wire openai/gpt-image-2 into your own product with the API reference.