Observability

Telemetry spans, error tracking, event bus, and fraud detection powered by ClickHouse.

Overview

WhaleTools Observability is built on ClickHouse for high-throughput analytics. The system ingests telemetry spans, error events, and operational metrics across all services. It currently tracks 88K+ spans and 14K+ errors with sub-second query performance.

All observability data flows through the Gateway API. Ingest spans via POST /v1/native/telemetry and query traces via POST /v1/native/clickhouse/query. Both endpoints require JWT authentication.
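
As a rough sketch of what a raw call to these endpoints might look like without the client library, the helper below assembles the request for either path. Only the two paths and the JWT requirement come from the text above. The base URL, the Authorization: Bearer header scheme, the JSON body shapes, and the buildGatewayRequest name are assumptions made for illustration.

```typescript
// Hypothetical helper: describes a Gateway API request without sending it.
type GatewayPath = "/v1/native/telemetry" | "/v1/native/clickhouse/query";

interface GatewayRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildGatewayRequest(
  baseUrl: string, // assumed; substitute your real gateway host
  path: GatewayPath,
  jwt: string,
  payload: unknown
): GatewayRequest {
  return {
    // Strip trailing slashes so the path is joined cleanly.
    url: baseUrl.replace(/\/+$/, "") + path,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Assumed header scheme for presenting the JWT.
      Authorization: `Bearer ${jwt}`,
    },
    body: JSON.stringify(payload),
  };
}

const req = buildGatewayRequest(
  "https://gateway.example.com/", // placeholder host
  "/v1/native/telemetry",
  "<your-jwt>",
  { spans: [{ trace_id: "trace_abc123", span_id: "span_001", operation_name: "api.request", duration_ms: 12 }] }
);
console.log(req.url);
```

The returned object can be handed to any HTTP client, for example fetch(req.url, { method: req.method, headers: req.headers, body: req.body }).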

Ingest Spans

Send telemetry spans to track operations across your system. Each span records an operation name, duration, and optional metadata like model name and token counts for LLM calls.

Field	Type	Description
operation_name	string	Name of the operation (e.g., "llm.chat", "tool.execute", "api.request").
duration_ms	number	How long the operation took in milliseconds.
service_name	string	Service that produced the span (e.g., "whale-gateway", "whale-agent").
model_name	string	LLM model used, if applicable (e.g., "claude-opus-4-20250514").
input_tokens	number	Number of input tokens consumed by the LLM call.
output_tokens	number	Number of output tokens generated by the LLM call.
status	string	Span status: "ok", "error", or "timeout".
trace_id	string	Groups related spans into a single trace.
parent_span_id	string	Links child spans to their parent for tree visualization.

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Ingest a span
await whale.telemetry.ingest({
  spans: [
    {
      trace_id: 'trace_abc123',
      span_id: 'span_001',
      operation_name: 'llm.chat',
      service_name: 'product-assistant',
      duration_ms: 1842,
      model_name: 'claude-sonnet-4-20250514',
      input_tokens: 1250,
      output_tokens: 340,
      status: 'ok',
      attributes: {
        agent_id: 'agent_abc123',
        conversation_id: 'conv_xyz789'
      }
    }
  ]
})

// Response
{ "accepted": 1, "rejected": 0 }
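
To illustrate how trace_id and parent_span_id relate spans to one another, here is a minimal sketch that rebuilds a span tree on the client side. It uses the field names from the table and example above; the Span and SpanNode types and the buildTraceTree and renderTree helpers are hypothetical and not part of the WhaleTools client.

```typescript
// Minimal span shape using field names from the table above.
interface Span {
  trace_id: string;
  span_id: string;
  parent_span_id?: string;
  operation_name: string;
  duration_ms: number;
}

interface SpanNode {
  span: Span;
  children: SpanNode[];
}

// Builds one tree per root span. Spans whose parent is missing are treated as roots.
function buildTraceTree(spans: Span[]): SpanNode[] {
  const nodes = new Map<string, SpanNode>();
  for (const span of spans) {
    nodes.set(span.span_id, { span, children: [] });
  }
  const roots: SpanNode[] = [];
  for (const span of spans) {
    const node = nodes.get(span.span_id);
    // Skip earlier duplicates of a span_id; the last one wins.
    if (!node || node.span !== span) continue;
    const parent = span.parent_span_id ? nodes.get(span.parent_span_id) : undefined;
    if (parent && parent !== node) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
  }
  return roots;
}

// Renders the tree as indented lines, one span per line.
function renderTree(nodes: SpanNode[], depth = 0, out: string[] = []): string[] {
  for (const n of nodes) {
    out.push(`${"  ".repeat(depth)}${n.span.operation_name} (${n.span.duration_ms} ms)`);
    renderTree(n.children, depth + 1, out);
  }
  return out;
}

const exampleSpans: Span[] = [
  { trace_id: "trace_abc123", span_id: "span_001", operation_name: "api.request", duration_ms: 2100 },
  { trace_id: "trace_abc123", span_id: "span_002", parent_span_id: "span_001", operation_name: "llm.chat", duration_ms: 1842 },
  { trace_id: "trace_abc123", span_id: "span_003", parent_span_id: "span_001", operation_name: "tool.execute", duration_ms: 120 },
];
console.log(renderTree(buildTraceTree(exampleSpans)).join("\n"));
```

The sketch assumes all spans passed in share one trace_id and does not guard against cycles in parent links.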

Query Traces

Query stored traces using ClickHouse SQL. Filter by service, operation, duration, time range, and custom attributes. The query endpoint only allows SELECT statements — write operations are rejected.

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Find slow LLM calls in the last hour
const traces = await whale.telemetry.query({
  sql: `
    SELECT
      operation_name,
      duration_ms,
      model_name,
      input_tokens + output_tokens AS total_tokens
    FROM spans
    WHERE service_name = 'product-assistant'
      AND duration_ms > 3000
      AND timestamp > now() - INTERVAL 1 HOUR
    ORDER BY duration_ms DESC
    LIMIT 20
  `
})

// Token usage by model over the last 7 days
const usage = await whale.telemetry.query({
  sql: `
    SELECT
      model_name,
      sum(input_tokens) AS total_input,
      sum(output_tokens) AS total_output,
      count() AS call_count,
      avg(duration_ms) AS avg_latency_ms
    FROM spans
    WHERE model_name != ''
      AND timestamp > now() - INTERVAL 7 DAY
    GROUP BY model_name
    ORDER BY total_input + total_output DESC
  `
})
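
When the filter values come from user input, it can help to build the SQL string with explicit escaping rather than raw interpolation. The sketch below parameterizes the slow-span query shown above. The quoteLiteral and slowSpansSql helpers are hypothetical; the escaping follows ClickHouse's backslash rules for single-quoted string literals. If the query endpoint supports bound parameters, those would be preferable, but the text above does not say whether it does.

```typescript
// Escapes a value for use inside a single-quoted ClickHouse string literal.
function quoteLiteral(value: string): string {
  return "'" + value.replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

// Hypothetical builder for the "slow spans" query.
function slowSpansSql(
  serviceName: string,
  minDurationMs: number,
  hours: number,
  limit = 20
): string {
  // Numbers are validated rather than escaped so they cannot carry SQL text.
  for (const n of [minDurationMs, hours, limit]) {
    if (!Number.isInteger(n) || n < 0) {
      throw new RangeError(`expected a non-negative integer, got ${n}`);
    }
  }
  return [
    "SELECT operation_name, duration_ms, model_name,",
    "  input_tokens + output_tokens AS total_tokens",
    "FROM spans",
    `WHERE service_name = ${quoteLiteral(serviceName)}`,
    `  AND duration_ms > ${minDurationMs}`,
    `  AND timestamp > now() - INTERVAL ${hours} HOUR`,
    "ORDER BY duration_ms DESC",
    `LIMIT ${limit}`,
  ].join("\n");
}

console.log(slowSpansSql("product-assistant", 3000, 1));
```

The resulting string can be passed as the sql field of whale.telemetry.query.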

Error Events

Errors are deduplicated using SHA-256 fingerprinting — identical errors are grouped automatically. Each error includes a severity level, stack trace, and contextual metadata.

critical

System-breaking errors that require immediate attention. Pages operators.

error

Request failures, unhandled exceptions, and integration errors.

warning

Degraded performance, approaching limits, recoverable issues.

info

Notable events that are not errors — config changes, deploys, scaling events.
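
The fingerprinting mentioned at the start of this section could work roughly as follows. This is a sketch under stated assumptions, not the service's actual algorithm: the choice of inputs (service name, message, first stack line), the digit normalization, and the ReportedError and fingerprint names are all hypothetical. Only the use of SHA-256 comes from the text.

```typescript
import { createHash } from "node:crypto";

interface ReportedError {
  message: string;
  service_name: string;
  stack_trace?: string;
}

// Assumed normalization: collapse digit runs so that "after 30s" and
// "after 45s" produce the same text and therefore the same fingerprint.
function normalize(text: string): string {
  return text.replace(/\d+/g, "<n>").trim().toLowerCase();
}

// Returns a 64-character hex SHA-256 digest that identifies an error group.
function fingerprint(err: ReportedError): string {
  // Only the first stack line is used, as a stable anchor for grouping.
  const firstStackLine = (err.stack_trace ?? "").split("\n")[0] ?? "";
  const key = [
    err.service_name,
    normalize(err.message),
    normalize(firstStackLine),
  ].join("|");
  return createHash("sha256").update(key, "utf8").digest("hex");
}

const fpA = fingerprint({ message: "Payment gateway timeout after 30s", service_name: "checkout" });
const fpB = fingerprint({ message: "Payment gateway timeout after 45s", service_name: "checkout" });
console.log(fpA === fpB); // true: both normalize to the same key
```

Normalizing more aggressively groups more errors together at the cost of hiding distinctions, so the right level depends on how noisy the messages are.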

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Report an error
await whale.telemetry.error({
  message: 'Payment gateway timeout after 30s',
  severity: 'error',
  service_name: 'checkout',
  stack_trace: 'TimeoutError: Request timed out\n  at PaymentClient.charge (payment.ts:42)',
  attributes: {
    order_id: 'ord_abc123',
    gateway: 'stripe',
    amount: 4999
  }
})

// Query error trends
const errors = await whale.telemetry.query({
  sql: `
    SELECT
      fingerprint,
      message,
      severity,
      count() AS occurrences,
      max(timestamp) AS last_seen
    FROM errors
    WHERE timestamp > now() - INTERVAL 24 HOUR
    GROUP BY fingerprint, message, severity
    ORDER BY occurrences DESC
    LIMIT 10
  `
})

Event Bus

Publish events with guaranteed delivery. Events support idempotency keys to prevent duplicate processing, configurable retry with exponential backoff, and a dead letter queue for events that exceed max_attempts.
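
To make the retry behavior concrete, here is a minimal sketch of an exponential backoff schedule. The 1-second base delay, the 60-second cap, and the backoffDelayMs and retrySchedule names are assumptions for illustration; the text above does not state the actual delays the event bus uses.

```typescript
// Delay before retry number `attempt` (1 = first retry).
// The delay doubles on each retry until it reaches the cap.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60000): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}

// Full list of delays for a given number of retries.
function retrySchedule(retries: number, baseMs = 1000, capMs = 60000): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= retries; attempt++) {
    delays.push(backoffDelayMs(attempt, baseMs, capMs));
  }
  return delays;
}

console.log(retrySchedule(5)); // 1000, 2000, 4000, 8000, 16000 (milliseconds)
```

Real systems often add random jitter to each delay so that many failing events do not retry at the same instant.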

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Publish an event
await whale.events.publish({
  event_type: 'order.completed',
  idempotency_key: 'ord_abc123_completed',
  payload: {
    order_id: 'ord_abc123',
    customer_id: 'cust_xyz789',
    total: 4999,
    items: 3
  },
  max_attempts: 5  // retry up to 5 times on handler failure
})

// Subscribe to events (webhook)
await whale.events.subscribe({
  event_type: 'order.completed',
  webhook_url: 'https://your-app.com/webhooks/order-completed',
  secret: 'whsec_your_signing_secret'
})

// List dead letter events (exceeded max_attempts)
const deadLetters = await whale.events.deadLetter.list({
  event_type: 'order.completed',
  since: '2026-03-01T00:00:00Z'
})

// Retry a dead letter event
await whale.events.deadLetter.retry({ event_id: 'evt_failed_001' })
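
Because failed deliveries are retried, a webhook handler may see the same event more than once. The sketch below shows one way a consumer might use the idempotency key to skip duplicates. It assumes the delivered webhook body carries the idempotency_key that was set at publish time, which the text above does not confirm. The DeliveredEvent type and makeIdempotentHandler helper are hypothetical, and the in-memory Set stands in for a durable store.

```typescript
interface DeliveredEvent {
  event_type: string;
  idempotency_key: string; // assumed to be present in the webhook body
  payload: Record<string, unknown>;
}

// Wraps a handler so that a redelivered event with a known key is skipped.
function makeIdempotentHandler(
  handle: (delivered: DeliveredEvent) => void,
  seen: Set<string> = new Set<string>()
): (delivered: DeliveredEvent) => "processed" | "duplicate" {
  return (delivered) => {
    if (seen.has(delivered.idempotency_key)) {
      return "duplicate";
    }
    // If the handler throws, the key is not recorded, so a retry can succeed.
    handle(delivered);
    seen.add(delivered.idempotency_key);
    return "processed";
  };
}

let shipments = 0;
const onOrderCompleted = makeIdempotentHandler(() => {
  shipments += 1; // side effect that must happen once per order
});

const delivery: DeliveredEvent = {
  event_type: "order.completed",
  idempotency_key: "ord_abc123_completed",
  payload: { order_id: "ord_abc123" },
};
console.log(onOrderCompleted(delivery)); // processed
console.log(onOrderCompleted(delivery)); // duplicate
```

In a multi-instance deployment the seen-key check and the side effect would need to share a transactional store; a per-process Set only deduplicates within one process.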

Fraud Detection

Real-time fraud scoring on orders and transactions. Each order receives a risk score from 0 (no risk) to 100 (confirmed fraud). Scores at or above your configured review threshold are held for manual review, and scores at or above the reject threshold are rejected automatically. The system analyzes multiple signals:

velocity_check

Too many orders from the same IP, device, or payment method in a short window.

address_mismatch

Billing and shipping addresses are in different countries or distant regions.

statistical_outlier

Order amount deviates significantly from the customer's historical average.

new_account_high_value

High-value order from an account created within the last 24 hours.

proxy_vpn_detection

Order placed through a known proxy, VPN, or Tor exit node.
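
As one illustration of how a signal like statistical_outlier might be computed, the sketch below uses a z-score against the customer's past order amounts. The z-score method, the threshold of 3, the population standard deviation, and the scoreAmountOutlier name are all assumptions; the text above only says the amount "deviates significantly" from the historical average.

```typescript
interface OutlierResult {
  zScore: number | null; // null when a z-score cannot be computed
  isOutlier: boolean;
}

// Compares an order amount with the customer's past order amounts.
function scoreAmountOutlier(
  pastAmounts: number[],
  amount: number,
  zThreshold = 3
): OutlierResult {
  // With fewer than two past orders there is no meaningful spread to compare against.
  if (pastAmounts.length < 2) {
    return { zScore: null, isOutlier: false };
  }
  const mean = pastAmounts.reduce((sum, x) => sum + x, 0) / pastAmounts.length;
  const variance =
    pastAmounts.reduce((sum, x) => sum + (x - mean) * (x - mean), 0) / pastAmounts.length;
  const std = Math.sqrt(variance);
  if (std === 0) {
    // All past orders were identical, so any different amount counts as an outlier.
    return { zScore: null, isOutlier: amount !== mean };
  }
  const zScore = (amount - mean) / std;
  return { zScore, isOutlier: Math.abs(zScore) > zThreshold };
}

const pastOrders = [50, 55, 45, 60, 40];
console.log(scoreAmountOutlier(pastOrders, 300).isOutlier); // true
console.log(scoreAmountOutlier(pastOrders, 58).isOutlier);  // false
```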

const whale = new WhaleClient({ apiKey: 'wk_live_...' })

// Score an order
const score = await whale.fraud.score({
  order_id: 'ord_abc123',
  customer_id: 'cust_xyz789',
  amount: 29900,
  ip_address: '203.0.113.42',
  billing_country: 'US',
  shipping_country: 'US',
  payment_method: 'card_ending_4242',
  email: 'customer@example.com'
})

// Response
{
  "order_id": "ord_abc123",
  "risk_score": 23,
  "decision": "approve",    // "approve" | "review" | "reject"
  "signals": [
    { "type": "velocity_check", "score": 0, "detail": "1 order in 24h — normal" },
    { "type": "address_mismatch", "score": 0, "detail": "Same country" },
    { "type": "new_account_high_value", "score": 23, "detail": "Account 18 hours old, order is 2x avg" }
  ]
}

// Configure thresholds
await whale.fraud.configure({
  review_threshold: 40,   // score >= 40 triggers manual review
  reject_threshold: 75    // score >= 75 auto-rejects
})
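
The mapping from risk score to decision implied by these thresholds can be mirrored locally, for example in tests. This is a minimal sketch based on the threshold comments above; the decide helper and FraudThresholds type are hypothetical, and the service remains the source of truth for the returned decision.

```typescript
type Decision = "approve" | "review" | "reject";

interface FraudThresholds {
  review_threshold: number;
  reject_threshold: number;
}

// Maps a 0-100 risk score to a decision using "at or above" comparisons.
function decide(riskScore: number, t: FraudThresholds): Decision {
  if (t.review_threshold > t.reject_threshold) {
    throw new RangeError("review_threshold must not exceed reject_threshold");
  }
  if (riskScore >= t.reject_threshold) return "reject";
  if (riskScore >= t.review_threshold) return "review";
  return "approve";
}

const thresholds: FraudThresholds = { review_threshold: 40, reject_threshold: 75 };
console.log(decide(23, thresholds)); // approve
console.log(decide(40, thresholds)); // review
console.log(decide(75, thresholds)); // reject
```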