
Building with AI: Phase 02 — AI Integration

Choose your provider, design a prompt system that scales, wire up streaming responses, and add the guardrails that make an AI feature production-safe.

Updated 2026-04-01

#ai #anthropic #streaming #next.js #security #prompt-engineering


Series: Building with AI
Phase: Phase 02 — AI Integration
Skill level: Intermediate
Est. time: 90-120 minutes
Deliverable: Working AI endpoint
AI prompts included: 5
Prerequisite: Phase 01 — Planning & Architecture

Why AI integration fails in production

Most AI features fail not because the model is weak, but because the prompt is vague, the architecture is brittle, and nobody planned for the model to be wrong.

AI integration is the phase that separates products that feel magical from products that feel like demos. The difference is rarely the model itself. It is the system around the model: prompt structure, failure handling, cost control, and user protection.

This tutorial covers a full production-minded setup: provider selection, API key management, a reusable prompt template system, streaming responses, error handling, usage monitoring, and content moderation.

1. Choosing your AI provider

Your choice of provider is not mainly a technical decision. It is a product decision. The right provider depends on the job the feature performs, your latency target, your cost tolerance, and how much behavioural control you need.

Three providers cover most production use cases:

| Provider | Models | Best for | Why |
| --- | --- | --- | --- |
| Anthropic Claude | claude-sonnet-4-5, claude-haiku-4-5 | Reasoning | Strong instruction-following, long-context analysis, and structured output |
| OpenAI | gpt-4o, gpt-4o-mini | Multimodal | Strong ecosystem, familiar API shape, good image understanding |
| Google Gemini | gemini-1.5-pro, gemini-flash | Large context | Very large context windows and strong cost efficiency for simpler tasks |

Practical decision rule

Start with one provider. Wrap provider calls behind a thin abstraction from day one so you can swap later without rewriting your application logic.
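
A minimal sketch of such an abstraction. The names here (`AIProvider`, `registerProvider`, `AI_PROVIDER`) are illustrative, not from any SDK; each provider's SDK gets a small adapter registered at startup.

```typescript
// Application code depends on this interface, never on a provider SDK.
export interface AIProvider {
  complete(opts: { system: string; prompt: string; maxTokens: number }): Promise<string>
}

const providers = new Map<string, AIProvider>()

// Each SDK adapter registers itself once at startup.
export function registerProvider(name: string, provider: AIProvider) {
  providers.set(name, provider)
}

// Routes resolve the active provider from config, so swapping providers
// is a registration plus an env change, not an application rewrite.
export function getProvider(name = process.env.AI_PROVIDER ?? 'anthropic'): AIProvider {
  const provider = providers.get(name)
  if (!provider) throw new Error(`Unknown AI provider: ${name}`)
  return provider
}
```

Route code then calls `getProvider().complete(...)` and never imports an SDK directly.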

On model versions

Never hardcode a model string into application logic. Pin the model in an environment variable such as AI_MODEL=claude-sonnet-4-5 so upgrades and deprecations remain configuration changes, not code changes.

AI prompt: Provider selection

text
My product: [describe in 2-3 sentences]

My AI feature specifically does: [e.g. "generates product descriptions from a title and 5 bullet points"]

My requirements:
- Expected calls per day: [number]
- Max acceptable latency: [e.g. "2 seconds"]
- Budget per 1,000 AI calls: [$X]
- Input size: [e.g. "~500 tokens per call"]
- Output size: [e.g. "~300 tokens per call"]

Compare Anthropic Claude Haiku, OpenAI gpt-4o-mini, and Gemini Flash for this use case.
Show a cost estimate per 1,000 calls for each. Recommend one with a clear reason.

2. Securing API keys

An exposed API key is not a minor inconvenience. It is a billing emergency. A leaked key in a public repository can be abused within minutes.

There is one rule: API keys belong in environment variables only. Never in source code, client-side JavaScript, comments, or logs.

Step 1: Store keys in .env.local

Create a .env.local file in the project root, and confirm your .gitignore lists it before your first commit.

Step 2: Keep AI keys server-side only

In Next.js, variables prefixed with NEXT_PUBLIC_ are exposed to the browser. AI provider keys must never use that prefix.

Step 3: Add keys to your deployment platform

Use the environment variable UI in your hosting platform. Never commit production keys.

Step 4: Rotate keys on any suspected exposure

If a key ever touches git history, revoke it immediately. Removing the commit is not enough.

env
# AI Provider Keys - server-side only, never NEXT_PUBLIC_
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...

# Model config - change without code deploy
AI_MODEL=claude-haiku-4-5-20251001
AI_MAX_TOKENS=1024

# Rate limiting
AI_RATE_LIMIT_PER_USER_PER_HOUR=20

Common mistake: calling the AI API from the client

Never call the provider directly from browser code. All AI calls must go through your own server-side route, which holds the key and proxies the request.

3. Designing your prompt system

A prompt system is not a single string. It is a reusable structure that separates:

  • the system instruction
  • contextual data
  • the actual user message

The biggest prompt mistake is embedding one giant instruction string directly inside the API call. That makes prompts hard to test, version, or reuse.

The three-layer model

  • System prompt: defines role, tone, output format, and hard constraints
  • Context injection: validated server-side data about the user, session, or document
  • User message: the actual request, sanitised and never trusted to override system rules
typescript
// Centralised prompt registry - all prompts live here, not scattered in routes
export const prompts = {
  productDescription: (ctx: {
    productName: string
    bulletPoints: string[]
    tone: 'professional' | 'playful' | 'luxury'
  }) => `You are a product copywriter specialising in e-commerce listings.
Tone: ${ctx.tone}. Output format: 2-3 sentences, no bullet points, no markdown.
Hard limit: 80 words. Never invent specifications not in the bullet points.`,

  documentSummary: (ctx: { maxWords: number; audience: string }) =>
    `You are a document analyst.
Summarise the provided document in plain language for ${ctx.audience}.
Maximum ${ctx.maxWords} words. Use present tense. Output only the summary - no preamble.`,

  chatAssistant: (ctx: { userName: string; planTier: string }) =>
    `You are a helpful assistant for [Product Name].
The user's name is ${ctx.userName} and they are on the ${ctx.planTier} plan.
Answer concisely. If you don't know something, say so - never fabricate.
Never discuss competitors. Escalate billing questions to support@example.com.`,
}

Version prompts like code

Store prompt templates in a dedicated file such as lib/prompts.ts and commit them. Prompt changes should be diffable, reviewable, and reversible.

AI prompt: System prompt generator

text
I need a system prompt for an AI feature in my app.

Feature description: [what the AI does in one sentence]
Output format: [e.g. JSON, plain text, markdown, specific structure]
Tone: [e.g. professional, friendly, concise]
Hard constraints: [things the AI must never do]
User context available: [e.g. user's name, subscription tier, document they uploaded]

Write a system prompt following this structure:
1. Role definition (one sentence)
2. Behaviour rules (bullet list)
3. Output format specification (exact format with example)
4. Hard constraints (things it must never do)

Keep it under 200 words. Be specific, not vague.

Exercise 3.1: Design your prompt template

Deliverable: A first system prompt draft for your core AI feature.

Include:

  • what the feature does
  • the system prompt draft
  • the dynamic context you plan to inject

4. Building a streaming endpoint

Streaming is not a nice-to-have. It is what makes an AI feature feel alive instead of broken. Without it, users stare at a spinner. With it, they see progress in real time.

In Next.js App Router, streaming usually means a server-side ReadableStream and a client-side reader. The Vercel AI SDK can simplify this, but the core mechanics are still worth understanding.

typescript
import Anthropic from '@anthropic-ai/sdk'
import { NextRequest } from 'next/server'
import { prompts } from '@/lib/prompts'
import { rateLimit } from '@/lib/rate-limit'
import { getServerSession } from 'next-auth'

const client = new Anthropic()

export async function POST(req: NextRequest) {
  const session = await getServerSession()
  if (!session) return new Response('Unauthorized', { status: 401 })

  const limited = await rateLimit(session.user.id)
  if (limited) return new Response('Rate limit exceeded', { status: 429 })

  let userMessage: string
  try {
    const body = await req.json()
    userMessage = body.userMessage
  } catch {
    return new Response('Invalid JSON', { status: 400 })
  }
  if (typeof userMessage !== 'string' || !userMessage || userMessage.length > 4000) {
    return new Response('Invalid input', { status: 400 })
  }

  const systemPrompt = prompts.chatAssistant({
    userName: session.user.name,
    planTier: session.user.plan,
  })

  const stream = new ReadableStream({
    async start(controller) {
      try {
        const response = await client.messages.stream({
          model: process.env.AI_MODEL!,
          max_tokens: parseInt(process.env.AI_MAX_TOKENS || '1024'),
          system: systemPrompt,
          messages: [{ role: 'user', content: userMessage }],
        })

        for await (const chunk of response) {
          if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
            controller.enqueue(new TextEncoder().encode(chunk.delta.text))
          }
        }
        controller.close()
      } catch (error) {
        controller.error(error)
      }
    },
  })

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
      'Transfer-Encoding': 'chunked',
      'X-Content-Type-Options': 'nosniff',
    },
  })
}
typescript
'use client'
import { useState } from 'react'

export function useAIStream() {
  const [output, setOutput] = useState('')
  const [loading, setLoading] = useState(false)
  const [error, setError] = useState<string | null>(null)

  const generate = async (userMessage: string) => {
    setOutput('')
    setLoading(true)
    setError(null)

    try {
      const res = await fetch('/api/ai/generate', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ userMessage }),
      })

      if (!res.ok) throw new Error(await res.text())

      const reader = res.body!.getReader()
      const decoder = new TextDecoder()

      while (true) {
        const { done, value } = await reader.read()
        if (done) break
        setOutput(prev => prev + decoder.decode(value, { stream: true }))
      }
    } catch (err) {
      setError('Something went wrong. Please try again.')
    } finally {
      setLoading(false)
    }
  }

  return { output, loading, error, generate }
}
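
The route above imports a `rateLimit` helper from `@/lib/rate-limit` without showing it. Here is a minimal fixed-window sketch. Being in-memory, it only works for a single server instance; multi-instance deployments need a shared store such as Redis.

```typescript
// In-memory fixed-window rate limiter. Returns true when the user
// has exhausted their hourly allowance.
const windows = new Map<string, { count: number; resetAt: number }>()
const WINDOW_MS = 60 * 60 * 1000
const LIMIT = parseInt(process.env.AI_RATE_LIMIT_PER_USER_PER_HOUR || '20')

export async function rateLimit(userId: string): Promise<boolean> {
  const now = Date.now()
  const entry = windows.get(userId)
  if (!entry || now >= entry.resetAt) {
    // New window: record the first call and allow it.
    windows.set(userId, { count: 1, resetAt: now + WINDOW_MS })
    return false
  }
  entry.count++
  return entry.count > LIMIT
}
```

A fixed window resets all counts at once; a sliding window or token bucket smooths bursts better, but this is enough for an MVP.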

5. Error handling and fallbacks

AI APIs fail. Models time out. Rate limits get hit. The network drops. Failure planning is part of the feature, not an optional extra.

| Error type | HTTP status | Correct response |
| --- | --- | --- |
| Network timeout | — | Retry once after 1 second, then show a user-friendly error |
| Rate limited by provider | 429 | Exponential backoff, max 3 retries |
| Invalid API key | 401 | Alert ops immediately, do not expose details to users |
| Model overloaded | 529 | Queue the request or show a high-demand message |
| Max tokens exceeded | 400 | Truncate input or ask the user to shorten it |
| Content filtered | 400 | Explain that the request cannot be processed |

Golden rule of AI error messages

Never show raw provider errors to users. Translate every error class into a clear, actionable user message and log the raw detail server-side.

typescript
import Anthropic from '@anthropic-ai/sdk'

export function classifyAIError(error: unknown): {
  userMessage: string
  shouldRetry: boolean
  retryAfterMs: number
} {
  if (error instanceof Anthropic.APIError) {
    switch (error.status) {
      case 429:
        return {
          userMessage: 'Our AI is busy right now. Retrying...',
          shouldRetry: true,
          retryAfterMs: parseInt(error.headers?.['retry-after'] || '2') * 1000,
        }
      case 529:
        return {
          userMessage: 'High demand right now. Please try again in a moment.',
          shouldRetry: true,
          retryAfterMs: 5000,
        }
      default:
        return {
          userMessage: 'Something went wrong with the AI. Please try again.',
          shouldRetry: false,
          retryAfterMs: 0,
        }
    }
  }
  return {
    userMessage: 'An unexpected error occurred. Please try again.',
    shouldRetry: false,
    retryAfterMs: 0,
  }
}
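
The classifier can drive a generic retry wrapper. A sketch (`withRetry` is an illustrative name; it multiplies the classifier's delay hint by the attempt number for a simple exponential backoff):

```typescript
type ErrorClassification = {
  userMessage: string
  shouldRetry: boolean
  retryAfterMs: number
}

// Runs fn, retrying only when the classifier says the error is transient.
export async function withRetry<T>(
  fn: () => Promise<T>,
  classify: (err: unknown) => ErrorClassification,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastError = err
      const { shouldRetry, retryAfterMs } = classify(err)
      if (!shouldRetry || attempt === maxAttempts) break
      // Backoff grows with each attempt on top of the provider's hint.
      await new Promise(resolve => setTimeout(resolve, retryAfterMs * attempt))
    }
  }
  throw lastError
}
```

Usage: `await withRetry(() => client.messages.create(...), classifyAIError)`.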

6. Monitoring usage and costs

AI costs are variable and can spike unexpectedly. A single abusive input pattern or retry bug can generate a surprising bill overnight. Visibility has to exist before the problem happens.

Log every AI call with enough metadata to answer:

  • who triggered it
  • how many tokens were used
  • how long it took
  • whether it succeeded
typescript
import { db } from '@/lib/db' // your database client

interface AICallLog {
  userId: string
  feature: string
  model: string
  inputTokens: number
  outputTokens: number
  durationMs: number
  success: boolean
  errorCode?: string
}

export async function logAICall(log: AICallLog) {
  await db.aiUsage.create({ data: { ...log, timestamp: new Date() } })

  const totalTokens = log.inputTokens + log.outputTokens
  if (totalTokens > 4000) {
    console.warn(`High token usage: ${totalTokens} tokens for user ${log.userId}`)
  }
}

export function estimateCostUSD(model: string, input: number, output: number): number {
  // Illustrative USD rates per million tokens - verify against current provider pricing
  const rates: Record<string, { in: number; out: number }> = {
    'claude-haiku-4-5-20251001': { in: 0.80, out: 4.00 },
    'claude-sonnet-4-5': { in: 3.00, out: 15.00 },
    'gpt-4o-mini': { in: 0.15, out: 0.60 },
  }
  const r = rates[model] || { in: 0, out: 0 }
  return (input * r.in + output * r.out) / 1_000_000
}

Set a hard spend limit immediately

Every major AI provider supports monthly spend caps. Set one before you have live traffic. A modest MVP limit is better than discovering an uncontrolled bill after the fact.

7. Content moderation and guardrails

If users can submit free-form text, they will eventually send something they should not. A system prompt alone is not enough protection. Guardrails must be independent of the model.

Step 1: Input sanitisation

Strip HTML, cap character length, and validate input shape before anything reaches the model.
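
A minimal sketch of those three checks (`sanitizeUserInput` and the 4,000-character cap are illustrative assumptions; tune the limit per feature):

```typescript
const MAX_INPUT_CHARS = 4000 // assumption: adjust per feature

export function sanitizeUserInput(raw: unknown): string {
  // Validate shape first - reject anything that is not a string.
  if (typeof raw !== 'string') throw new Error('Invalid input')
  // Strip HTML tags, then trim surrounding whitespace.
  const stripped = raw.replace(/<[^>]*>/g, '').trim()
  if (stripped.length === 0) throw new Error('Empty input')
  // Cap length so a single request cannot balloon token costs.
  return stripped.slice(0, MAX_INPUT_CHARS)
}
```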

Step 2: Prompt injection defence

Never concatenate raw user input into your system prompt. Keep system instructions and user content structurally separate.

Step 3: Output validation

If you expect JSON or another structured format, validate it before using it. Models will occasionally produce malformed output.
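
A sketch for a feature that expects a `{ summary, keywords }` object (the shape is illustrative). Returning `null` on any failure gives the caller one clean branch for retry or fallback:

```typescript
interface SummaryResult {
  summary: string
  keywords: string[]
}

// Models occasionally wrap JSON in prose or markdown fences, so locate
// the outermost object before parsing, then validate the shape.
export function parseModelJSON(raw: string): SummaryResult | null {
  const start = raw.indexOf('{')
  const end = raw.lastIndexOf('}')
  if (start === -1 || end <= start) return null
  try {
    const parsed = JSON.parse(raw.slice(start, end + 1))
    if (typeof parsed.summary !== 'string') return null
    if (!Array.isArray(parsed.keywords)) return null
    if (!parsed.keywords.every((k: unknown) => typeof k === 'string')) return null
    return { summary: parsed.summary, keywords: parsed.keywords }
  } catch {
    return null // malformed JSON - caller retries or falls back
  }
}
```

A schema library such as zod does the same job with less hand-written checking; the manual version just makes the principle visible.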

Step 4: PII filtering

Before logging or storing outputs, scan for personally identifiable information and redact where appropriate.
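
A naive regex-based redaction sketch. This is a starting point, not a complete PII solution - real deployments often need dedicated detection services for names, addresses, and IDs:

```typescript
// Loose patterns for the two most common PII leaks in logs.
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.-]+/g
const PHONE_RE = /\+?\d[\d\s().-]{8,}\d/g

// Replace matches with placeholders before anything is logged or stored.
export function redactPII(text: string): string {
  return text.replace(EMAIL_RE, '[email]').replace(PHONE_RE, '[phone]')
}
```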

AI Assist: Guardrail review

Once your system prompt and route code exist, ask an AI to review them like a security engineer. Specifically ask:

  • how could a user manipulate the prompt?
  • what sensitive data could leak?
  • how could someone trigger unusually expensive calls?

AI prompt: Security audit your AI feature

text
Here is my AI feature's system prompt and API route code:

[SYSTEM PROMPT]
---
[paste your system prompt]
---

[API ROUTE CODE]
---
[paste your route.ts]
---

Act as a security engineer. Identify:
1. Prompt injection vectors - how could a user override my system prompt?
2. Data leakage risks - what sensitive data could appear in outputs?
3. Cost abuse vectors - how could a user trigger unusually expensive calls?
4. Missing input validation - what inputs are not being checked?
5. Missing output validation - what unexpected output formats could break my app?

For each issue found, suggest a specific fix in TypeScript.

8. Phase checklist

Before moving on, make sure the following are complete:

  • Chosen an AI provider based on task type, latency, and cost
  • Pinned the AI model in an environment variable instead of hardcoding it
  • Stored API keys in .env.local only and verified they are absent from source control
  • Routed all AI calls through a server-side endpoint
  • Centralised prompts in a reusable lib/prompts.ts file
  • Implemented streaming so users see output progressively
  • Added an error classifier with user-friendly messages
  • Added per-user rate limiting
  • Logged usage with tokens, duration, model, and success state
  • Set a monthly spend limit in the provider dashboard
  • Added input sanitisation and max-length validation
  • Reviewed prompt and route security with an audit prompt

9. What you should have at the end of Phase 02

  • A provider choice with a documented cost rationale
  • API keys secured and ready for deployment
  • A reusable and versioned prompt template system
  • A live streaming endpoint in Next.js App Router
  • A client-side stream reader with loading and error states
  • Typed AI error classification
  • Per-user rate limiting and spend monitoring
  • Guardrails against injection, PII leakage, and cost abuse

Up next: Phase 03 — Security

The next phase is tightening authentication, permissions, request validation, and system boundaries so the AI feature is not just useful, but safe to operate.
