BUILDING WITH AI — Phase 2: Integration
Phase 02 — AI Integration
Choose your provider, design a prompt system that scales, wire up streaming responses, and add the guardrails that make an AI feature production-safe.
| Series | Building with AI |
|---|---|
| Phase | Phase 02 — AI Integration |
| Skill level | Intermediate |
| Est. time | 90-120 minutes |
| Deliverable | Working AI endpoint |
| AI prompts included | 5 |
| Prerequisite | Phase 01 — Planning & Architecture |
Why AI integration fails in production
Most AI features fail not because the model is weak, but because the prompt is vague, the architecture is brittle, and nobody planned for the model to be wrong.
AI integration is the phase that separates products that feel magical from products that feel like demos. The difference is rarely the model itself. It is the system around the model: prompt structure, failure handling, cost control, and user protection.
This tutorial covers a full production-minded setup: provider selection, API key management, a reusable prompt template system, streaming responses, error handling, usage monitoring, and content moderation.
1. Choosing your AI provider
Your choice of provider is not mainly a technical decision. It is a product decision. The right provider depends on the job the feature performs, your latency target, your cost tolerance, and how much behavioural control you need.
Three providers cover most production use cases:
| Provider | Models | Best for | Why |
|---|---|---|---|
| Anthropic Claude | claude-sonnet-4-5, claude-haiku-4-5 | Reasoning | Strong instruction-following, long context analysis, and structured output |
| OpenAI | gpt-4o, gpt-4o-mini | Multimodal | Strong ecosystem, familiar API shape, good image understanding |
| Google Gemini | gemini-1.5-pro, gemini-flash | Large context | Very large context windows and strong cost efficiency for simpler tasks |
Practical decision rule
Start with one provider. Wrap provider calls behind a thin abstraction from day one so you can swap later without rewriting your application logic.
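One way to sketch that abstraction. The interface shape, `AIProvider`, and `createProvider` are illustrative names, not any real SDK's API, and the adapter bodies are stubbed where a vendor SDK call would go:

```typescript
// Application code depends on this interface, never on a vendor SDK
// directly. Interface and names are illustrative, not a real SDK's API.
export interface AIProvider {
  complete(opts: { system: string; user: string; maxTokens: number }): Promise<string>
}

// One adapter per vendor. Swapping providers later means adding an
// adapter here, not rewriting every call site.
export function createProvider(name: 'anthropic' | 'openai'): AIProvider {
  if (name === 'anthropic') {
    return {
      async complete({ user, maxTokens }) {
        // A real adapter would call the Anthropic SDK here.
        // Stubbed so the shape of the abstraction stays visible.
        return `anthropic(${maxTokens}): ${user}`
      },
    }
  }
  return {
    async complete({ user, maxTokens }) {
      // A real adapter would call the OpenAI SDK here.
      return `openai(${maxTokens}): ${user}`
    },
  }
}
```

Call sites only ever see `AIProvider`, so a provider swap is a one-file change.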
On model versions
Never hardcode a model string into application logic. Pin the model in an environment variable such as AI_MODEL=claude-sonnet-4-5 so upgrades and deprecations remain configuration changes, not code changes.
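Reading that configuration once, at module load, keeps the rest of the code free of `process.env` lookups. A sketch (the function name and defaults are illustrative):

```typescript
// Resolve model settings from the environment once, at module load.
// Failing fast on a missing AI_MODEL beats a confusing mid-request error.
export function getModelConfig(
  env: Record<string, string | undefined> = process.env,
): { model: string; maxTokens: number } {
  const model = env.AI_MODEL
  if (!model) {
    throw new Error('AI_MODEL is not set - pin the model in your environment')
  }
  const maxTokens = parseInt(env.AI_MAX_TOKENS ?? '1024', 10)
  return { model, maxTokens }
}
```

Upgrading the model is then a deployment-config change with no code diff at all.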
AI prompt: Provider selection
My product: [describe in 2-3 sentences]
My AI feature specifically does: [e.g. "generates product descriptions from a title and 5 bullet points"]
My requirements:
- Expected calls per day: [number]
- Max acceptable latency: [e.g. "2 seconds"]
- Budget per 1,000 AI calls: [$X]
- Input size: [e.g. "~500 tokens per call"]
- Output size: [e.g. "~300 tokens per call"]
Compare Anthropic Claude Haiku, OpenAI gpt-4o-mini, and Gemini Flash for this use case.
Show a cost estimate per 1,000 calls for each. Recommend one with a clear reason.
2. Securing API keys
An exposed API key is not a minor inconvenience. It is a billing emergency. A leaked key in a public repository can be abused within minutes.
There is one rule: API keys belong in environment variables only. Never in source code, client-side JavaScript, comments, or logs.
Step 1: Store keys in .env.local
Create a .env.local file in the project root. Make sure it is ignored before your first commit.
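A quick way to verify this from the terminal, assuming the project is already a git repository:

```shell
# Ensure .env.local is listed in .gitignore (append only if missing)
grep -qx '.env.local' .gitignore 2>/dev/null || echo '.env.local' >> .gitignore

# Sanity check: git check-ignore prints the path if git will ignore it.
# (Requires an initialised git repo; warn loudly otherwise.)
touch .env.local
git check-ignore .env.local 2>/dev/null || echo 'WARNING: .env.local is NOT ignored by git'
```

Run the check before your first commit; once a key is in history, rotation is the only fix.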
Step 2: Keep AI keys server-side only
In Next.js, variables prefixed with NEXT_PUBLIC_ are exposed to the browser. AI provider keys must never use that prefix.
Step 3: Add keys to your deployment platform
Use the environment variable UI in your hosting platform. Never commit production keys.
Step 4: Rotate keys on any suspected exposure
If a key ever touches git history, revoke it immediately. Removing the commit is not enough.
# AI Provider Keys - server-side only, never NEXT_PUBLIC_
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
# Model config - change without code deploy
AI_MODEL=claude-haiku-4-5-20251001
AI_MAX_TOKENS=1024
# Rate limiting
AI_RATE_LIMIT_PER_USER_PER_HOUR=20
Common mistake: calling the AI API from the client
Never call the provider directly from browser code. All AI calls must go through your own server-side route, which holds the key and proxies the request.
3. Designing your prompt system
A prompt system is not a single string. It is a reusable structure that separates:
- the system instruction
- contextual data
- the actual user message
The biggest prompt mistake is embedding one giant instruction string directly inside the API call. That makes prompts hard to test, version, or reuse.
The three-layer model
- System prompt: defines role, tone, output format, and hard constraints
- Context injection: validated server-side data about the user, session, or document
- User message: the actual request, sanitised and never trusted to override system rules
// Centralised prompt registry - all prompts live here, not scattered in routes
export const prompts = {
productDescription: (ctx: {
productName: string
bulletPoints: string[]
tone: 'professional' | 'playful' | 'luxury'
}) => `You are a product copywriter specialising in e-commerce listings.
Tone: ${ctx.tone}. Output format: 2-3 sentences, no bullet points, no markdown.
Hard limit: 80 words. Never invent specifications not in the bullet points.`,
documentSummary: (ctx: { maxWords: number; audience: string }) =>
`You are a document analyst.
Summarise the provided document in plain language for ${ctx.audience}.
Maximum ${ctx.maxWords} words. Use present tense. Output only the summary - no preamble.`,
chatAssistant: (ctx: { userName: string; planTier: string }) =>
`You are a helpful assistant for [Product Name].
The user's name is ${ctx.userName} and they are on the ${ctx.planTier} plan.
Answer concisely. If you don't know something, say so - never fabricate.
Never discuss competitors. Escalate billing questions to support@example.com.`,
}
Version prompts like code
Store prompt templates in a dedicated file such as lib/prompts.ts and commit them. Prompt changes should be diffable, reviewable, and reversible.
AI prompt: System prompt generator
I need a system prompt for an AI feature in my app.
Feature description: [what the AI does in one sentence]
Output format: [e.g. JSON, plain text, markdown, specific structure]
Tone: [e.g. professional, friendly, concise]
Hard constraints: [things the AI must never do]
User context available: [e.g. user's name, subscription tier, document they uploaded]
Write a system prompt following this structure:
1. Role definition (one sentence)
2. Behaviour rules (bullet list)
3. Output format specification (exact format with example)
4. Hard constraints (things it must never do)
Keep it under 200 words. Be specific, not vague.
Exercise 3.1: Design your prompt template
Deliverable: A first system prompt draft for your core AI feature.
Include:
- what the feature does
- the system prompt draft
- the dynamic context you plan to inject
4. Building a streaming endpoint
Streaming is not a nice-to-have. It is what makes an AI feature feel alive instead of broken. Without it, users stare at a spinner. With it, they see progress in real time.
In Next.js App Router, streaming usually means a server-side ReadableStream and a client-side reader. The Vercel AI SDK can simplify this, but the core mechanics are still worth understanding.
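The route below imports a `rateLimit` helper from `@/lib/rate-limit`; a minimal in-memory version could look like this. It is a fixed-window sketch and per-process only; production needs a shared store such as Redis so limits hold across instances and deploys:

```typescript
// Fixed-window rate limiter, in-memory. Sketch only: per-process state
// resets on redeploy and is not shared across server instances.
const windows = new Map<string, { count: number; resetAt: number }>()

export async function rateLimit(
  userId: string,
  limit = 20,                 // e.g. AI_RATE_LIMIT_PER_USER_PER_HOUR
  windowMs = 60 * 60 * 1000,  // one hour
): Promise<boolean> {
  const now = Date.now()
  const w = windows.get(userId)
  if (!w || now >= w.resetAt) {
    // First call in a fresh window
    windows.set(userId, { count: 1, resetAt: now + windowMs })
    return false // not limited
  }
  w.count++
  return w.count > limit // true means "limited", matching the route's check
}
```

The boolean return matches the route's `if (limited) return 429` check.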
import Anthropic from '@anthropic-ai/sdk'
import { NextRequest } from 'next/server'
import { prompts } from '@/lib/prompts'
import { rateLimit } from '@/lib/rate-limit'
import { getServerSession } from 'next-auth'
const client = new Anthropic()
export async function POST(req: NextRequest) {
const session = await getServerSession()
if (!session) return new Response('Unauthorized', { status: 401 })
const limited = await rateLimit(session.user.id)
if (limited) return new Response('Rate limit exceeded', { status: 429 })
const { userMessage } = await req.json()
if (!userMessage || typeof userMessage !== 'string') {
return new Response('Invalid input', { status: 400 })
}
const systemPrompt = prompts.chatAssistant({
userName: session.user.name,
planTier: session.user.plan,
})
const stream = new ReadableStream({
async start(controller) {
try {
const response = await client.messages.stream({
model: process.env.AI_MODEL!,
max_tokens: parseInt(process.env.AI_MAX_TOKENS || '1024'),
system: systemPrompt,
messages: [{ role: 'user', content: userMessage }],
})
for await (const chunk of response) {
if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
controller.enqueue(new TextEncoder().encode(chunk.delta.text))
}
}
controller.close()
} catch (error) {
controller.error(error)
}
},
})
return new Response(stream, {
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Transfer-Encoding': 'chunked',
'X-Content-Type-Options': 'nosniff',
},
})
}
'use client'
import { useState } from 'react'
export function useAIStream() {
const [output, setOutput] = useState('')
const [loading, setLoading] = useState(false)
const [error, setError] = useState<string | null>(null)
const generate = async (userMessage: string) => {
setOutput('')
setLoading(true)
setError(null)
try {
const res = await fetch('/api/ai/generate', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ userMessage }),
})
if (!res.ok) throw new Error(await res.text())
const reader = res.body!.getReader()
const decoder = new TextDecoder()
while (true) {
const { done, value } = await reader.read()
if (done) break
setOutput(prev => prev + decoder.decode(value, { stream: true }))
}
} catch (err) {
setError('Something went wrong. Please try again.')
} finally {
setLoading(false)
}
}
return { output, loading, error, generate }
}
5. Error handling and fallbacks
AI APIs fail. Models time out. Rate limits get hit. The network drops. Failure planning is part of the feature, not an optional extra.
| Error type | HTTP status | Correct response |
|---|---|---|
| Network timeout | — | Retry once after 1 second, then show a user-friendly error |
| Rate limited by provider | 429 | Exponential backoff, max 3 retries |
| Invalid API key | 401 | Alert ops immediately, do not expose details to users |
| Model overloaded | 529 | Queue the request or show a high-demand message |
| Max tokens exceeded | 400 | Truncate input or ask the user to shorten it |
| Content filtered | 400 | Explain that the request cannot be processed |
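The retry rows in the table can be folded into one helper. This is a sketch around a generic async call, with the retry predicate and delays left to the caller, not a specific SDK's retry API:

```typescript
// Generic retry with exponential backoff for transient AI errors.
// `isRetryable` decides per-error; the delay doubles each attempt.
export async function withRetries<T>(
  call: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call()
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err
      // 1s, 2s, 4s... implements "exponential backoff, max 3 retries"
      const delay = baseDelayMs * 2 ** attempt
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }
}
```

For example, pass a predicate that retries 429 and 529 responses but rethrows 401s immediately so a bad key alerts you instead of looping.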
Golden rule of AI error messages
Never show raw provider errors to users. Translate every error class into a clear, actionable user message and log the raw detail server-side.
import Anthropic from '@anthropic-ai/sdk'
export function classifyAIError(error: unknown): {
userMessage: string
shouldRetry: boolean
retryAfterMs: number
} {
if (error instanceof Anthropic.APIError) {
switch (error.status) {
case 429:
return {
userMessage: 'Our AI is busy right now. Retrying...',
shouldRetry: true,
retryAfterMs: parseInt(error.headers?.['retry-after'] || '2') * 1000,
}
case 529:
return {
userMessage: 'High demand right now. Please try again in a moment.',
shouldRetry: true,
retryAfterMs: 5000,
}
default:
return {
userMessage: 'Something went wrong with the AI. Please try again.',
shouldRetry: false,
retryAfterMs: 0,
}
}
}
return {
userMessage: 'An unexpected error occurred. Please try again.',
shouldRetry: false,
retryAfterMs: 0,
}
}
6. Monitoring usage and costs
AI costs are variable and can spike unexpectedly. A single abusive input pattern or retry bug can generate a surprising bill overnight. Visibility has to exist before the problem happens.
Log every AI call with enough metadata to answer:
- who triggered it
- how many tokens were used
- how long it took
- whether it succeeded
interface AICallLog {
userId: string
feature: string
model: string
inputTokens: number
outputTokens: number
durationMs: number
success: boolean
errorCode?: string
}
// `db` is your database client (e.g. a Prisma instance imported from lib/db)
export async function logAICall(log: AICallLog) {
await db.aiUsage.create({ data: { ...log, timestamp: new Date() } })
const totalTokens = log.inputTokens + log.outputTokens
if (totalTokens > 4000) {
console.warn(`High token usage: ${totalTokens} tokens for user ${log.userId}`)
}
}
export function estimateCostUSD(model: string, input: number, output: number): number {
const rates: Record<string, { in: number; out: number }> = {
'claude-haiku-4-5-20251001': { in: 0.80, out: 4.00 },
'claude-sonnet-4-5': { in: 3.00, out: 15.00 },
'gpt-4o-mini': { in: 0.15, out: 0.60 },
}
const r = rates[model] || { in: 0, out: 0 }
return (input * r.in + output * r.out) / 1_000_000
}
Set a hard spend limit immediately
Every major AI provider supports monthly spend caps. Set one before you have live traffic. A modest MVP limit is better than discovering an uncontrolled bill after the fact.
7. Content moderation and guardrails
If users can submit free-form text, they will eventually send something they should not. A system prompt alone is not enough protection. Guardrails must be independent of the model.
Step 1: Input sanitisation
Strip HTML, cap character length, and validate input shape before anything reaches the model.
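A minimal version of that sanitisation step. The length cap and tag-stripping regex are illustrative; tune both to your feature:

```typescript
const MAX_INPUT_CHARS = 4000 // illustrative cap; tune per feature

// Strip HTML tags, collapse whitespace, and enforce a hard length cap
// before the text ever reaches the model.
export function sanitiseInput(raw: unknown): string {
  if (typeof raw !== 'string') throw new Error('Invalid input: expected a string')
  const stripped = raw.replace(/<[^>]*>/g, ' ')        // drop HTML tags
  const collapsed = stripped.replace(/\s+/g, ' ').trim() // normalise whitespace
  if (collapsed.length === 0) throw new Error('Invalid input: empty after sanitisation')
  return collapsed.slice(0, MAX_INPUT_CHARS)             // hard length cap
}
```

Call it in the route right after parsing the request body, before building the prompt.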
Step 2: Prompt injection defence
Never concatenate raw user input into your system prompt. Keep system instructions and user content structurally separate.
Step 3: Output validation
If you expect JSON or another structured format, validate it before using it. Models will occasionally produce malformed output.
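If the feature expects JSON, a defensive parse like this catches malformed output before it reaches application code. The `SummaryResult` shape is a placeholder; the field checks would mirror your real schema:

```typescript
interface SummaryResult {
  title: string
  bullets: string[]
}

// Parse model output defensively: models sometimes wrap JSON in prose
// or code fences, or return invalid JSON entirely.
export function parseModelJSON(raw: string): SummaryResult | null {
  // Extract the first {...} span in case the model added surrounding text
  const match = raw.match(/\{[\s\S]*\}/)
  if (!match) return null
  try {
    const parsed = JSON.parse(match[0])
    // Validate the expected shape, not just that it parsed
    if (typeof parsed.title !== 'string' || !Array.isArray(parsed.bullets)) return null
    return parsed as SummaryResult
  } catch {
    return null // malformed JSON: caller decides whether to retry or fail
  }
}
```

Returning `null` instead of throwing lets the caller choose between a retry and a user-facing error.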
Step 4: PII filtering
Before logging or storing outputs, scan for personally identifiable information and redact where appropriate.
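A first-pass redactor might look like this. The regexes cover emails and simple phone-number shapes only; dependable PII detection needs a dedicated library or service:

```typescript
// Redact obvious PII patterns before logging or persisting output.
// Pattern-based redaction is a baseline, not a guarantee.
export function redactPII(text: string): string {
  return text
    // email addresses
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[REDACTED_EMAIL]')
    // simple phone shapes, e.g. +1 555-123-4567
    .replace(/\+?\d[\d\s().-]{7,}\d/g, '[REDACTED_PHONE]')
}
```

Apply it in `logAICall` before the write, so raw PII never lands in the usage table.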
AI Assist: Guardrail review
Once your system prompt and route code exist, ask an AI to review them like a security engineer. Specifically ask:
- how could a user manipulate the prompt?
- what sensitive data could leak?
- how could someone trigger unusually expensive calls?
AI prompt: Security audit your AI feature
Here is my AI feature's system prompt and API route code:
[SYSTEM PROMPT]
---
[paste your system prompt]
---
[API ROUTE CODE]
---
[paste your route.ts]
---
Act as a security engineer. Identify:
1. Prompt injection vectors - how could a user override my system prompt?
2. Data leakage risks - what sensitive data could appear in outputs?
3. Cost abuse vectors - how could a user trigger unusually expensive calls?
4. Missing input validation - what inputs are not being checked?
5. Missing output validation - what unexpected output formats could break my app?
For each issue found, suggest a specific fix in TypeScript.
8. Phase checklist
Before moving on, make sure the following are complete:
- Chosen an AI provider based on task type, latency, and cost
- Pinned the AI model in an environment variable instead of hardcoding it
- Stored API keys in .env.local only and verified they are absent from source control
- Routed all AI calls through a server-side endpoint
- Centralised prompts in a reusable lib/prompts.ts file
- Implemented streaming so users see output progressively
- Added an error classifier with user-friendly messages
- Added per-user rate limiting
- Logged usage with tokens, duration, model, and success state
- Set a monthly spend limit in the provider dashboard
- Added input sanitisation and max-length validation
- Reviewed prompt and route security with an audit prompt
9. What you should have at the end of Phase 02
- A provider choice with a documented cost rationale
- API keys secured and ready for deployment
- A reusable and versioned prompt template system
- A live streaming endpoint in Next.js App Router
- A client-side stream reader with loading and error states
- Typed AI error classification
- Per-user rate limiting and spend monitoring
- Guardrails against injection, PII leakage, and cost abuse
Up next: Phase 03 — Security
The next phase is tightening authentication, permissions, request validation, and system boundaries so the AI feature is not just useful, but safe to operate.