AI Platform

Embeddings, prompt templates, agent memory, evaluation framework, and safety rules.

Overview

The AI Platform is the infrastructure layer that powers every AI agent in your store. It provides vector embeddings for semantic search, versioned prompt templates, persistent agent memory, an evaluation framework for testing agent quality, and safety rules to keep responses on-brand and compliant.

All AI Platform endpoints are scoped to your store and authenticated with your API key. Agents automatically use these systems — you can also access them directly for custom integrations.

Embeddings & Semantic Search

Create vector embeddings from text and search by semantic similarity. Use this to power product discovery, FAQ matching, and knowledge base retrieval. Embeddings are stored per-store and can be scoped to a namespace.

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Create an embedding
const embedding = await whale.ai.embeddings.create({
  input: 'Organic cotton crew neck t-shirt in navy blue',
  namespace: 'products',
  metadata: {
    product_id: 'prod_abc123',
    category: 'apparel'
  }
})

// Search by similarity
const results = await whale.ai.embeddings.search({
  query: 'comfortable blue shirt',
  namespace: 'products',
  limit: 10,
  threshold: 0.75
})

// Response
{
  "results": [
    {
      "id": "emb_550e8400",
      "score": 0.92,
      "metadata": { "product_id": "prod_abc123", "category": "apparel" },
      "text": "Organic cotton crew neck t-shirt in navy blue"
    }
  ]
}

// Index all products in bulk
await whale.ai.embeddings.index({
  source: 'products',
  fields: ['name', 'description', 'tags'],
  namespace: 'products'
})
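Under the hood, a similarity search ranks stored vectors against the query vector and keeps only results above `threshold`. The metric Whale uses isn't specified here; as an illustration only, a cosine-similarity scorer (a common choice) with the same `limit` and `threshold` semantics can be sketched in plain JavaScript:

```javascript
// Illustrative only: cosine similarity between two equal-length vectors.
// 1 means identical direction, 0 means orthogonal.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Score, filter by threshold, sort best-first, truncate to limit —
// mirroring the search parameters above.
function rank(queryVec, entries, { limit, threshold }) {
  return entries
    .map(e => ({ ...e, score: cosineSimilarity(queryVec, e.vector) }))
    .filter(e => e.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
}
```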

Prompt Templates

Versioned prompt templates with change tracking. Templates support variable interpolation and can be pinned to a specific version or set to always use the latest. Every edit creates a new version — roll back instantly if a change degrades quality.

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Create a prompt template
const template = await whale.ai.templates.create({
  name: 'product_recommendation',
  content: `You are a shopping assistant for {{store_name}}.
The customer is looking for: {{query}}

Available products:
{{#products}}
- {{name}} ({{price}}): {{description}}
{{/products}}

Recommend the best match and explain why.`,
  variables: ['store_name', 'query', 'products'],
  description: 'Product recommendation prompt with inventory context'
})

// Use a specific version
const v2 = await whale.ai.templates.get('product_recommendation', {
  version: 2
})

// List all versions with change history
const history = await whale.ai.templates.versions('product_recommendation')
// Returns: [{ version: 3, created_at, diff_summary }, ...]
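The `{{variable}}` and `{{#section}}` syntax above is Mustache-style. Whale resolves templates server-side; as a rough sketch of the substitution logic only (simple variables plus one level of array sections, no assumptions about the real renderer):

```javascript
// Illustrative sketch of Mustache-style interpolation: expands
// {{#key}}...{{/key}} blocks once per array element, then replaces
// plain {{key}} variables. Not the production renderer.
function renderTemplate(template, vars) {
  // Expand array sections, e.g. {{#products}} ... {{/products}}
  const expanded = template.replace(
    /\{\{#(\w+)\}\}([\s\S]*?)\{\{\/\1\}\}/g,
    (_, key, body) =>
      (vars[key] || []).map(item => renderTemplate(body, item)).join('')
  )
  // Substitute simple variables, e.g. {{store_name}}
  return expanded.replace(/\{\{(\w+)\}\}/g, (_, key) =>
    key in vars ? String(vars[key]) : ''
  )
}
```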

Agent Memory

Agents store and retrieve memory across three scopes. Memory is automatically managed during conversations, but you can also read and write memory directly for custom workflows.

short_term

Conversation-scoped memory that expires after the session ends. Default TTL: 1 hour.

long_term

Persistent memory stored across sessions. Used for customer preferences and history.

entity

Structured memory about specific entities — customers, products, orders. Auto-linked by ID.

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Store a long-term memory
await whale.ai.memory.store({
  agent_id: 'agent_abc123',
  type: 'long_term',
  key: 'customer_preference',
  value: {
    preferred_size: 'M',
    favorite_colors: ['navy', 'black'],
    allergies: ['latex']
  },
  customer_id: 'cust_xyz789',
  ttl: null  // null = no expiration
})

// Store a short-term memory with TTL
await whale.ai.memory.store({
  agent_id: 'agent_abc123',
  type: 'short_term',
  key: 'current_cart_context',
  value: { items: 3, total: 89.50 },
  ttl: 3600  // expires in 1 hour
})

// Retrieve memory for a customer
const memories = await whale.ai.memory.recall({
  agent_id: 'agent_abc123',
  customer_id: 'cust_xyz789',
  types: ['long_term', 'entity']
})
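The expiry semantics of the `ttl` values above can be modeled with a simple in-memory store (hypothetical, not Whale's implementation): each entry records its write time, a read treats the entry as gone once the clock passes `stored_at + ttl`, and a `null` ttl never expires.

```javascript
// Illustrative TTL store: ttl in seconds, null = never expires.
class MemoryStore {
  constructor(now = () => Date.now()) {
    this.now = now           // injectable clock, handy for testing
    this.entries = new Map()
  }
  store(key, value, ttl = null) {
    this.entries.set(key, { value, ttl, storedAt: this.now() })
  }
  recall(key) {
    const e = this.entries.get(key)
    if (!e) return undefined
    // An expired entry behaves as if it were never written.
    if (e.ttl !== null && this.now() > e.storedAt + e.ttl * 1000) {
      this.entries.delete(key)
      return undefined
    }
    return e.value
  }
}
```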

Evaluation Framework

Test agent quality systematically with datasets, test cases, and automated scoring. Run evaluations after prompt changes to catch regressions before they reach customers. Supports exact match, semantic similarity, and LLM judge scoring.

dataset_id (string)

Reference to the evaluation dataset containing test cases.

test_cases (array)

Array of input/expected_output pairs for scoring.

scoring_method (string)

Scoring strategy: exact_match, semantic_similarity, or llm_judge.

judge_model (string)

Model used for LLM judge scoring (e.g., claude-sonnet-4-20250514).

pass_threshold (number)

Minimum score (0-1) for a test case to pass.
const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Create a dataset
const dataset = await whale.ai.eval.datasets.create({
  name: 'product_qa_v2',
  test_cases: [
    {
      input: 'Do you have this in red?',
      expected_output: 'Let me check our available colors for you.',
      tags: ['color_query']
    },
    {
      input: 'What is your return policy?',
      expected_output: 'You can return any item within 30 days.',
      tags: ['policy']
    }
  ]
})

// Run an evaluation
const run = await whale.ai.eval.run({
  agent_id: 'agent_abc123',
  dataset_id: dataset.id,
  scoring_method: 'llm_judge',
  judge_model: 'claude-sonnet-4-20250514',
  pass_threshold: 0.8
})

// Response
{
  "id": "eval_run_001",
  "status": "completed",
  "summary": {
    "total": 2,
    "passed": 2,
    "failed": 0,
    "avg_score": 0.91
  },
  "results": [
    { "test_case_id": 0, "score": 0.88, "passed": true, "judge_feedback": "..." },
    { "test_case_id": 1, "score": 0.94, "passed": true, "judge_feedback": "..." }
  ]
}
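The pass/fail bookkeeping in the response above is straightforward: a test case passes when its score meets pass_threshold, and the summary aggregates counts plus the mean score. A sketch of that aggregation (exact_match shown for simplicity; the other scoring methods would plug a different scorer into the same shape):

```javascript
// Illustrative summary computation for an evaluation run.
// The score here is an exact_match stand-in: 1 for a literal match, else 0.
function summarize(testCases, outputs, passThreshold) {
  const results = testCases.map((tc, i) => {
    const score = outputs[i] === tc.expected_output ? 1 : 0
    return { test_case_id: i, score, passed: score >= passThreshold }
  })
  const passed = results.filter(r => r.passed).length
  const avg = results.reduce((sum, r) => sum + r.score, 0) / results.length
  return {
    summary: {
      total: results.length,
      passed,
      failed: results.length - passed,
      avg_score: Number(avg.toFixed(2))
    },
    results
  }
}
```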

Safety Rules

Safety rules run on every agent response before it reaches the customer. Blocking rules prevent the response from being sent and return a fallback message. Warning rules flag the response for review but still deliver it.

content_filter (blocking)

Blocks responses containing prohibited content categories.

pii_detection (warning)

Flags responses that may contain personally identifiable information.

topic_restriction (blocking)

Prevents the agent from discussing off-topic subjects.

tone_check (warning)

Warns when response tone deviates from brand guidelines.

hallucination_guard (blocking)

Blocks responses that reference products or policies not in the knowledge base.
const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Add a safety rule
await whale.ai.safety.create({
  agent_id: 'agent_abc123',
  rule_type: 'topic_restriction',
  mode: 'blocking',
  config: {
    blocked_topics: ['competitors', 'politics', 'religion'],
    fallback_message: "I can only help with questions about our products and services."
  }
})

// List active rules for an agent
const rules = await whale.ai.safety.list({ agent_id: 'agent_abc123' })
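The blocking/warning split can be sketched as a pre-send pipeline (hypothetical rule checkers, not Whale's internals): each rule inspects the draft response; the first failing blocking rule short-circuits to its fallback message, while failing warning rules only accumulate flags and the response is still delivered.

```javascript
// Illustrative safety pipeline. Each rule supplies check(text) -> boolean
// (true = violation), a mode, and, for blocking rules, a fallback message.
function applySafetyRules(response, rules) {
  const warnings = []
  for (const rule of rules) {
    if (!rule.check(response)) continue
    if (rule.mode === 'blocking') {
      // Blocking rule: replace the response entirely.
      return { text: rule.fallback, blocked: true, warnings }
    }
    // Warning rule: flag for review but still deliver.
    warnings.push(rule.type)
  }
  return { text: response, blocked: false, warnings }
}
```

A blocking `topic_restriction` rule in this sketch would be `{ type: 'topic_restriction', mode: 'blocking', check: t => /politics/i.test(t), fallback: '...' }` — the `check` predicate is a stand-in for whatever classifier actually enforces the rule.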

Cost Budgets

Set spending limits on AI usage to prevent runaway costs. Budgets can be configured at the store level or per-agent. When a budget is exceeded, the agent returns a graceful fallback message instead of making additional LLM calls.

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Set a store-wide budget
await whale.ai.budgets.set({
  scope: 'store',
  limits: {
    daily: 25.00,    // $25/day
    weekly: 150.00,  // $150/week
    monthly: 500.00  // $500/month
  },
  alert_threshold: 0.8,  // alert at 80% usage
  fallback_message: "Our AI assistant is temporarily unavailable. Please contact support."
})

// Set a per-agent budget
await whale.ai.budgets.set({
  scope: 'agent',
  agent_id: 'agent_abc123',
  limits: {
    daily: 10.00,
    monthly: 200.00
  }
})

// Check current usage
const usage = await whale.ai.budgets.usage()
// { daily_spent: 8.42, daily_limit: 25.00, daily_remaining: 16.58, ... }
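The enforcement described above reduces to a gate before each LLM call: if any window's spend has reached its limit, skip the call and return the fallback; crossing the alert threshold fires an alert but allows the call. A sketch under those assumptions (hypothetical shapes, mirroring the usage fields above):

```javascript
// Illustrative budget gate. `usage` maps window -> amount spent,
// `limits` maps window -> cap; alertThreshold is a 0-1 fraction of the cap.
function checkBudget(usage, limits, alertThreshold) {
  const alerts = []
  for (const [window, limit] of Object.entries(limits)) {
    const spent = usage[window] || 0
    if (spent >= limit) {
      // Exceeded: the agent should return its fallback message
      // instead of making another LLM call.
      return { allowed: false, alerts, exceeded: window }
    }
    if (spent >= limit * alertThreshold) alerts.push(window)
  }
  return { allowed: true, alerts, exceeded: null }
}
```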