# Recipe D — Vision-critic refinement loop

# Recipe D — Vision-critic refinement loop

**Outcome:** After N rounds, a final image that's been iteratively refined
based on per-round critiques from a vision-capable LLM. Useful when one-shot
prompts don't capture intent.

**Approx cost:** ~$0.05 × N (per round: image + LLM critique).

## Script

```bash
#!/bin/bash
set -euo pipefail

PROMPT="${1:-a cyberpunk cat reading a newspaper}"
N="${2:-3}"

# Round 0: initial generation
CURRENT_IMG=$(visa-cli generate image "$PROMPT" --json --yes \
    | jq -r '.urls[0] // .filePath')
echo "Round 0: $CURRENT_IMG"

for i in $(seq 1 "$N"); do
  # Vision LLM critique using --from-stdin to bind the prior URL
  CRITIQUE=$(visa-cli generate image "$PROMPT" --json --yes \
    | visa-cli run-llm --model or-gemini-nano-banana-pro --json --yes \
        --image-url --from-stdin '.urls[0]' \
        "Critique this image for: '$PROMPT'. Be specific and concise. One short paragraph." \
    | jq -r '.text')

  # Re-prompt with the critique
  CURRENT_IMG=$(echo "$PROMPT. Refinement note: $CRITIQUE" \
    | visa-cli generate image - --json --yes \
    | jq -r '.urls[0] // .filePath')

  echo "Round $i: $CURRENT_IMG"
  echo "  critique: $CRITIQUE"
done

echo "Final image: $CURRENT_IMG"
```

## Cost breakdown (per round)

| Step | Tool | Approx |
|---|---|---|
| Image generation | `fal-flux-pro` | $0.04 |
| Vision critique | `or-gemini-nano-banana-pro` | $0.01 |
| **Per round** | | **~$0.05** |

3-round loop = ~$0.15. 10-round loop = ~$0.50.

## Notes

- The `--from-stdin .urls[0]` flag binds the upstream JSON envelope's first
  URL into `--image-url` — no `jq` round-trip needed.
- Stop the loop earlier by exiting when the critique includes a "looks
  good" sentinel (left as an exercise; pipe the critique back through
  `run-llm --json` with a yes/no system prompt).
