

Recipe D — Vision-critic refinement loop

Outcome: After N rounds, a final image that's been iteratively refined based on per-round critiques from a vision-capable LLM. Useful when one-shot prompts don't capture intent.

Approx cost: ~$0.04 + ~$0.05 × N (one initial image, then per round: one LLM critique + one regenerated image).

Script

#!/bin/bash
set -euo pipefail

PROMPT="${1:-a cyberpunk cat reading a newspaper}"
N="${2:-3}"

# Round 0: initial generation. Keep the full JSON envelope so each round
# can pipe it straight into run-llm.
CURRENT_JSON=$(visa-cli generate image "$PROMPT" --json --yes)
CURRENT_IMG=$(jq -r '.urls[0] // .filePath' <<< "$CURRENT_JSON")
echo "Round 0: $CURRENT_IMG"

for i in $(seq 1 "$N"); do
  # Vision LLM critique of the *current* image: --from-stdin binds .urls[0]
  # from the piped envelope into --image-url
  CRITIQUE=$(printf '%s\n' "$CURRENT_JSON" \
    | visa-cli run-llm --model or-gemini-nano-banana-pro --json --yes \
        --image-url --from-stdin '.urls[0]' \
        "Critique this image for: '$PROMPT'. Be specific and concise. One short paragraph." \
    | jq -r '.text')

  # Re-prompt with the critique ("-" takes the prompt from stdin)
  CURRENT_JSON=$(echo "$PROMPT. Refinement note: $CRITIQUE" \
    | visa-cli generate image - --json --yes)
  CURRENT_IMG=$(jq -r '.urls[0] // .filePath' <<< "$CURRENT_JSON")

  echo "Round $i: $CURRENT_IMG"
  echo "  critique: $CRITIQUE"
done

echo "Final image: $CURRENT_IMG"

Cost breakdown (per round)

| Step | Tool | Approx. cost (USD) |
|---|---|---|
| Image generation | `fal-flux-pro` | $0.04 |
| Vision critique | `or-gemini-nano-banana-pro` | $0.01 |
| Per round | | ~$0.05 |

Including the one-time round-0 image (~$0.04), a 3-round loop costs about $0.19 and a 10-round loop about $0.54.
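To estimate the total for other values of N, here is a small sketch that counts in integer cents, since bash arithmetic is integer-only. The prices are the approximate per-call figures from the table and will drift. The formula assumes the script's structure: one image in round 0, then one critique plus one regenerated image per round. `estimate_cost` is a hypothetical helper, not part of visa-cli.

```shell
#!/bin/bash
# Hypothetical cost estimator for an N-round critic loop.
# Works in integer cents because bash arithmetic has no floating point.
IMAGE_CENTS=4     # one image generation (fal-flux-pro, ~$0.04, approximate)
CRITIQUE_CENTS=1  # one vision critique (or-gemini-nano-banana-pro, ~$0.01, approximate)

estimate_cost() {
  local n="$1"
  # Round 0 generates one image; each later round adds a critique
  # plus a regenerated image.
  local cents=$(( IMAGE_CENTS + n * (IMAGE_CENTS + CRITIQUE_CENTS) ))
  # Format as dollars with two decimal places, e.g. 19 -> ~$0.19
  printf '~$%d.%02d\n' $(( cents / 100 )) $(( cents % 100 ))
}

estimate_cost 3   # ~$0.19
estimate_cost 10  # ~$0.54
```

Keeping prices as named variables at the top makes it easy to update them when the per-call rates change.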

Notes

`--from-stdin '.urls[0]'` binds the first URL in the JSON envelope piped on stdin to `--image-url`, so no jq round-trip is needed to extract it first.
To stop early, break out of the loop once the critique contains a "looks good" sentinel. This is left as an exercise; one approach is to pipe the critique back through `run-llm --json` with a yes/no system prompt.
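A minimal sketch of the sentinel check, assuming you extend the critique prompt with an instruction such as "If no changes are needed, reply with exactly: looks good". The helper name `critique_says_done` is hypothetical, and this is a plain text match rather than the `run-llm` yes/no classifier suggested above. It requires a whole line to be the sentinel, so a critique like "the lighting looks good, but..." does not end the loop.

```shell
#!/bin/bash
# Hypothetical helper: exits 0 when some line of the critique is exactly the
# "looks good" sentinel (case-insensitive, optional trailing . or !,
# surrounding whitespace ignored). Assumes the critique prompt asked the
# model to reply that way when no further changes are needed.
critique_says_done() {
  local critique="$1"
  # -q: status only; -i: ignore case; -x: match whole lines; -E: extended regex
  printf '%s\n' "$critique" \
    | grep -qixE '[[:space:]]*looks good[.!]?[[:space:]]*'
}

# Usage inside the loop, right after CRITIQUE is computed:
#   if critique_says_done "$CRITIQUE"; then
#     echo "Round $i: critic is satisfied, stopping early"
#     break
#   fi
```

Matching whole lines trades recall for precision: the model must follow the reply format, but partial praise inside a longer critique will not stop the loop prematurely.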