Media Generation

Use AI to generate images, videos, speech, and text, and to remove image backgrounds. Supports DALL-E, Imagen, Gemini, Veo, Sora, ElevenLabs, Claude, and GPT-4o.

Overview

The Media Generation API provides a unified interface for creating AI-generated content across multiple modalities: images, videos, voice, and text. Every generation is stored in your media library and can be linked directly to products in your catalog.

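Most generation endpoints accept a model parameter so you can choose the best provider for each task (background removal, for example, takes only an image URL). Image, video, audio, and background-removal results are returned as hosted URLs ready for use in your storefront, emails, or ad campaigns; text generation returns its output inline.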

Category	Models	Output Formats
Image	DALL-E 3, Imagen 3, Gemini	PNG, JPEG, WebP
Video	Veo 3, Sora	MP4, WebM
Voice	ElevenLabs Turbo v2, ElevenLabs Multilingual v2	MP3, WAV, OGG
Text	Claude Sonnet, Claude Opus, GPT-4o	Plain text, Markdown
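If you validate requests on the client before calling the API, the capability table can be mirrored as a typed lookup. The sketch below is hypothetical: SUPPORTED_FORMATS and isSupportedFormat are not part of the SDK, and the lowercase format identifiers (such as 'png', 'mp4', or 'markdown') are assumptions about how formats are spelled in requests.

```typescript
// Hypothetical client-side mirror of the capability table.
// ASSUMPTION: format identifiers are lowercase names like 'png' or 'mp3'.
type Category = 'image' | 'video' | 'voice' | 'text'

const SUPPORTED_FORMATS: Record<Category, readonly string[]> = {
  image: ['png', 'jpeg', 'webp'],
  video: ['mp4', 'webm'],
  voice: ['mp3', 'wav', 'ogg'],
  text: ['plain', 'markdown'],
}

// Returns true when the category lists the given output format.
// The comparison is case-insensitive, so 'WebP' and 'webp' are treated alike.
function isSupportedFormat(category: Category, format: string): boolean {
  return SUPPORTED_FORMATS[category].includes(format.toLowerCase())
}
```

Keeping this list in sync with the table is your responsibility; the API remains the source of truth for what it accepts.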

Image Generation

Generate images from text prompts using DALL-E 3, Imagen 3, or Gemini. Control the output with size, quality, and style (as in the DALL-E 3 example) or with aspect_ratio (as in the Imagen example). Generated images are automatically uploaded to your media library.

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Generate an image
const image = await whale.generate.image({
  prompt: 'A minimalist product photo of a ceramic coffee mug on a marble surface',
  model: 'dall-e-3',
  size: '1024x1024',
  quality: 'hd',
  style: 'natural'
})

// Response
{
  "id": "gen_img_550e8400",
  "url": "https://cdn.whaletools.dev/media/gen_img_550e8400.png",
  "model": "dall-e-3",
  "size": "1024x1024",
  "created_at": "2026-03-10T12:00:00Z"
}

// Generate with Imagen
const imagen = await whale.generate.image({
  prompt: 'Flat lay of summer clothing collection on white background',
  model: 'imagen-3',
  size: '1024x1024',
  aspect_ratio: '1:1'
})
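The Imagen example passes both size and aspect_ratio. If you build these parameters programmatically, a small check that the two agree can catch mismatched requests before they are sent. The helpers below are hypothetical (not part of the SDK) and assume size is always written as WIDTHxHEIGHT in pixels and aspect_ratio as W:H; how the API resolves a conflict between the two is not documented here.

```typescript
// Hypothetical helpers.
// ASSUMPTION: sizes look like '1024x1024' and aspect ratios like '16:9'.
interface Dimensions {
  width: number
  height: number
}

// Parses 'WIDTHxHEIGHT' into positive integers, rejecting malformed input.
function parseSize(size: string): Dimensions {
  const match = /^([1-9]\d*)x([1-9]\d*)$/.exec(size)
  if (!match) throw new Error(`Invalid size "${size}", expected WIDTHxHEIGHT`)
  return { width: Number(match[1]), height: Number(match[2]) }
}

// Checks whether a pixel size matches an aspect ratio such as '1:1' or '16:9'.
function sizeMatchesAspectRatio(size: string, aspectRatio: string): boolean {
  const { width, height } = parseSize(size)
  const match = /^([1-9]\d*):([1-9]\d*)$/.exec(aspectRatio)
  if (!match) throw new Error(`Invalid aspect ratio "${aspectRatio}", expected W:H`)
  const ratioW = Number(match[1])
  const ratioH = Number(match[2])
  // Cross-multiplication compares width/height to ratioW/ratioH
  // without floating-point division.
  return width * ratioH === height * ratioW
}
```

For example, '1920x1080' matches '16:9', while '1024x1024' does not.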

Video Generation

Create short-form video content from text prompts. Ideal for product demos, social media clips, and ad creatives. Videos are generated asynchronously: poll the status endpoint or use a webhook to be notified when generation completes.

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Generate a video
const video = await whale.generate.video({
  prompt: 'A smooth 360-degree rotation of a pair of white sneakers on a dark background',
  duration: 5,
  model: 'veo-3',
  resolution: '1080p',
  aspect_ratio: '16:9'
})

// Response (generation is async)
{
  "id": "gen_vid_a1b2c3d4",
  "status": "processing",
  "estimated_seconds": 45,
  "poll_url": "/v1/generate/video/gen_vid_a1b2c3d4"
}

// Poll for completion: returns the current record; repeat until status is "completed"
const result = await whale.generate.video.get(video.id)

// When complete
{
  "id": "gen_vid_a1b2c3d4",
  "status": "completed",
  "url": "https://cdn.whaletools.dev/media/gen_vid_a1b2c3d4.mp4",
  "duration": 5,
  "model": "veo-3",
  "created_at": "2026-03-10T12:01:00Z"
}
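Polling can be wrapped in a loop that waits between requests, backs off, and gives up after a deadline. In this sketch the status fetcher and the sleep function are injected, so the loop can run against whale.generate.video.get or a test double. The 'failed' status and the default delay values are assumptions; only 'processing' and 'completed' appear in the responses above. If you already receive webhooks, you do not need to poll.

```typescript
// Minimal shape of a video generation record, based on the responses above.
// ASSUMPTION: 'failed' is a possible status; only 'processing' and 'completed' are documented.
interface VideoGeneration {
  id: string
  status: 'processing' | 'completed' | 'failed'
  url?: string
}

interface PollOptions {
  initialDelayMs?: number // first wait between polls
  maxDelayMs?: number     // upper bound for the growing delay
  timeoutMs?: number      // stop after this much total waiting
  sleep?: (ms: number) => Promise<void>
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

// Polls until the generation completes. The delay doubles after each
// 'processing' response (capped at maxDelayMs); throws on failure or timeout.
async function waitForVideo(
  id: string,
  getStatus: (id: string) => Promise<VideoGeneration>,
  {
    initialDelayMs = 2000,
    maxDelayMs = 15000,
    timeoutMs = 300000,
    sleep = defaultSleep,
  }: PollOptions = {},
): Promise<VideoGeneration> {
  let delay = initialDelayMs
  let waited = 0
  while (true) {
    const generation = await getStatus(id)
    if (generation.status === 'completed') return generation
    if (generation.status === 'failed') throw new Error(`Video generation ${id} failed`)
    if (waited >= timeoutMs) throw new Error(`Timed out waiting for video generation ${id}`)
    await sleep(delay)
    waited += delay
    delay = Math.min(delay * 2, maxDelayMs)
  }
}
```

A typical call would be waitForVideo(video.id, (id) => whale.generate.video.get(id)), which resolves with the completed record including its url. The estimated_seconds field from the initial response could also inform initialDelayMs.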

Voice Synthesis

Convert text to natural-sounding speech using ElevenLabs. Choose from a library of voices or clone a custom voice for your brand. Use it for product descriptions, IVR menus, podcast intros, and more.

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Synthesize speech
const audio = await whale.generate.voice({
  text: 'Welcome to our store. We have new arrivals every week.',
  voice_id: 'voice_rachel',
  model: 'eleven_turbo_v2',
  output_format: 'mp3'
})

// Response
{
  "id": "gen_voice_e5f6g7h8",
  "url": "https://cdn.whaletools.dev/media/gen_voice_e5f6g7h8.mp3",
  "duration_seconds": 3.2,
  "model": "eleven_turbo_v2",
  "voice_id": "voice_rachel",
  "created_at": "2026-03-10T12:02:00Z"
}

// List available voices
const voices = await whale.generate.voice.list()
// Returns: [{ id: "voice_rachel", name: "Rachel", preview_url: "..." }, ...]
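voice.list() returns records with id, name, and preview_url. When your configuration stores a human-readable voice name, a small lookup can resolve it to the voice_id that generate.voice expects. The helper below is hypothetical and assumes voice names are unique within the list.

```typescript
// Shape of one entry returned by whale.generate.voice.list(), per the comment above.
interface Voice {
  id: string
  name: string
  preview_url: string
}

// Resolves a display name (case-insensitive, surrounding whitespace ignored)
// to a voice_id. ASSUMPTION: names are unique; throws if nothing matches.
function resolveVoiceId(voices: Voice[], name: string): string {
  const wanted = name.trim().toLowerCase()
  const voice = voices.find((v) => v.name.toLowerCase() === wanted)
  if (!voice) {
    const available = voices.map((v) => v.name).join(', ')
    throw new Error(`No voice named "${name}". Available: ${available}`)
  }
  return voice.id
}
```

Failing loudly with the list of available names makes a typo in configuration easy to spot, rather than silently falling back to a default voice.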

Background Removal

Remove backgrounds from product photos with a single request. Returns a transparent PNG ready for use on any background color or composite image. Works with complex edges, hair, and transparent objects.

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Remove background from an image
const result = await whale.generate.removeBackground({
  image_url: 'https://cdn.whaletools.dev/media/product_photo.jpg'
})

// Response
{
  "id": "gen_rbg_i9j0k1l2",
  "url": "https://cdn.whaletools.dev/media/gen_rbg_i9j0k1l2.png",
  "format": "png",
  "transparent": true,
  "original_url": "https://cdn.whaletools.dev/media/product_photo.jpg",
  "created_at": "2026-03-10T12:03:00Z"
}

// Batch remove backgrounds
const batch = await whale.generate.removeBackground.batch({
  image_urls: [
    'https://cdn.whaletools.dev/media/photo_1.jpg',
    'https://cdn.whaletools.dev/media/photo_2.jpg',
    'https://cdn.whaletools.dev/media/photo_3.jpg'
  ]
})
// Returns: [{ id, url, transparent }, ...]
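For large catalogs you may want to split the URL list across several batch calls, for example to keep requests small or to retry one chunk on its own. The maximum batch size is not documented here, so chunkSize below is a hypothetical parameter. The batch function is injected, so you can pass (urls) => whale.generate.removeBackground.batch({ image_urls: urls }).

```typescript
// Result shape per the batch comment above.
interface RemovedBackground {
  id: string
  url: string
  transparent: boolean
}

// Splits an array into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new Error('chunk size must be a positive integer')
  }
  const chunks: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// Sends each chunk sequentially and concatenates results in input order.
// ASSUMPTION: chunkSize is an illustrative limit, not a documented API constraint.
async function removeBackgroundsInChunks(
  imageUrls: string[],
  runBatch: (urls: string[]) => Promise<RemovedBackground[]>,
  chunkSize = 10,
): Promise<RemovedBackground[]> {
  const results: RemovedBackground[] = []
  for (const urls of chunk(imageUrls, chunkSize)) {
    results.push(...(await runBatch(urls)))
  }
  return results
}
```

Sending chunks one after another keeps load predictable; running them concurrently with Promise.all would be faster but could hit rate limits you would need to account for.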

Text Generation

Generate product descriptions, marketing copy, email content, and more using large language models. Supports streaming responses for real-time display. Combine with prompt templates for consistent, on-brand output.

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Generate text
const completion = await whale.generate.text({
  prompt: 'Write a compelling product description for a handmade leather wallet.',
  model: 'claude-sonnet-4-6',
  max_tokens: 1000,
  temperature: 0.7
})

// Response
{
  "id": "gen_txt_m3n4o5p6",
  "text": "Crafted from full-grain leather that develops a rich patina...",
  "model": "claude-sonnet-4-6",
  "usage": {
    "input_tokens": 18,
    "output_tokens": 156
  },
  "created_at": "2026-03-10T12:04:00Z"
}

// Stream a response
const stream = await whale.generate.text({
  prompt: 'Write 5 Instagram captions for a new sneaker drop.',
  model: 'claude-sonnet-4-6',
  max_tokens: 500,
  stream: true
})

for await (const chunk of stream) {
  process.stdout.write(chunk.text)
}
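If you also need the complete text once streaming ends (for example, to save it as a product description), you can collect the chunks while displaying them. This sketch assumes only what the loop above shows: the stream is async-iterable and each chunk carries a text string. The onChunk callback is a hypothetical hook for live display.

```typescript
// Minimal chunk shape, matching chunk.text in the streaming example above.
interface TextChunk {
  text: string
}

// Consumes an async-iterable stream, forwarding each piece to onChunk
// (e.g. for live display) and returning the concatenated text at the end.
async function collectStream(
  stream: AsyncIterable<TextChunk>,
  onChunk: (text: string) => void = () => {},
): Promise<string> {
  const parts: string[] = []
  for await (const chunk of stream) {
    onChunk(chunk.text)
    parts.push(chunk.text)
  }
  return parts.join('')
}
```

In Node.js, collectStream(stream, (t) => process.stdout.write(t)) prints the response as it arrives and resolves with the full caption text.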

Media Library

All generated media is automatically stored in your media library. You can also upload files directly, organize with tags, and link media to products or variants. The library supports images, videos, audio, and documents.

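import { readFile } from 'node:fs/promises'

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Upload a file (in Node.js, read its contents into a Buffer first)
const buffer = await readFile('./hero-banner.jpg')
const file = await whale.media.upload({
  file: buffer,
  filename: 'hero-banner.jpg',
  content_type: 'image/jpeg',
  tags: ['banner', 'homepage']
})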

// List media with filters
const media = await whale.media.list({
  type: 'image',
  tags: ['product'],
  limit: 20,
  sort: 'created_at:desc'
})

// Link media to a product
await whale.media.link({
  media_id: 'med_q7r8s9t0',
  product_id: 'prod_abc123',
  position: 0  // primary image
})

// Link media to a variant
await whale.media.link({
  media_id: 'med_u1v2w3x4',
  variant_id: 'var_def456',
  position: 1
})
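Because position 0 marks the primary image, setting a product's full gallery order amounts to linking each media item with its index as the position. The sketch below builds those payloads from an ordered list of media IDs and sends them one at a time through an injected link function (for example, (p) => whale.media.link(p)). Whether re-linking an already-linked item updates its position or creates a second link is not documented here, so treat this as a starting point.

```typescript
// Payload shape mirroring the whale.media.link examples above.
interface LinkPayload {
  media_id: string
  product_id?: string
  variant_id?: string
  position: number
}

// Builds one payload per media item; the array index becomes the position,
// so the first ID becomes the primary image (position 0).
function buildGalleryLinks(productId: string, mediaIds: string[]): LinkPayload[] {
  return mediaIds.map((mediaId, index) => ({
    media_id: mediaId,
    product_id: productId,
    position: index,
  }))
}

// Sends the links sequentially so positions are applied in order.
async function setProductGallery(
  productId: string,
  mediaIds: string[],
  link: (payload: LinkPayload) => Promise<unknown>,
): Promise<void> {
  for (const payload of buildGalleryLinks(productId, mediaIds)) {
    await link(payload)
  }
}
```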

API Reference

All media generation and library endpoints. Requests are authenticated with your API key and scoped to your store.

Method	Endpoint	Description
POST	/v1/generate/image	Generate an image from a text prompt.
POST	/v1/generate/video	Generate a video from a text prompt.
GET	/v1/generate/video/:id	Get the status and result of a video generation.
POST	/v1/generate/voice	Synthesize speech from text.
POST	/v1/generate/remove-background	Remove background from an image.
POST	/v1/generate/text	Generate text with an LLM.
GET	/v1/media	List all media files.
POST	/v1/media	Upload a media file.
GET	/v1/media/:id	Get a single media file.
DELETE	/v1/media/:id	Delete a media file.
POST	/v1/media/:id/link	Link media to a product or variant.
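If you call the REST endpoints without the SDK, the route table can drive a small request builder. The base URL and the "Authorization: Bearer" header scheme below are assumptions (neither is documented in this section), and buildRequest and fillPath are hypothetical helpers. They only construct the URL and fetch options, substituting :id path parameters, and leave sending the request to you.

```typescript
// Hypothetical request builder for the endpoints in the table above.
// ASSUMPTIONS: the caller supplies baseUrl, and auth uses a Bearer token.
type Method = 'GET' | 'POST' | 'DELETE'

interface BuiltRequest {
  url: string
  init: { method: Method; headers: Record<string, string>; body?: string }
}

// Replaces ':name' segments in a path template with URL-encoded values.
function fillPath(template: string, params: Record<string, string>): string {
  return template.replace(/:([A-Za-z_]+)/g, (_match: string, key: string) => {
    const value = params[key]
    if (value === undefined) throw new Error(`Missing path parameter "${key}"`)
    return encodeURIComponent(value)
  })
}

function buildRequest(
  baseUrl: string,
  apiKey: string,
  method: Method,
  pathTemplate: string,
  params: Record<string, string> = {},
  body?: unknown,
): BuiltRequest {
  const headers: Record<string, string> = { Authorization: `Bearer ${apiKey}` }
  const init: BuiltRequest['init'] = { method, headers }
  if (body !== undefined) {
    // JSON body for POST endpoints such as /v1/generate/image.
    headers['Content-Type'] = 'application/json'
    init.body = JSON.stringify(body)
  }
  // Strip one trailing slash so 'https://host/' + '/v1/...' does not double up.
  return { url: baseUrl.replace(/\/$/, '') + fillPath(pathTemplate, params), init }
}
```

The result can be passed straight to fetch, e.g. const { url, init } = buildRequest(apiBase, apiKey, 'DELETE', '/v1/media/:id', { id: 'med_q7r8s9t0' }) followed by await fetch(url, init), where apiBase is your API's base URL. File uploads to POST /v1/media likely need multipart form data instead of JSON, which this sketch does not cover.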