BUILDING WITH AI — Phase 2: Integration
Phase 02 — AI Integration
Choose your provider, design a prompt system that scales, wire up streaming responses, and add the guardrails that make an AI feature production-safe.
| Series | Building with AI |
|---|---|
| Phase | Phase 02 — AI Integration |
| Skill level | Intermediate |
| Est. time | 90-120 minutes |
| Deliverable | Working AI endpoint |
| AI prompts included | 5 |
| Prerequisite | Phase 01 — Planning & Architecture |
Why AI integration fails in production
Most AI features fail not because the model is weak, but because the prompt is vague, the architecture is brittle, and nobody planned for the model to be wrong.
AI integration is the phase that separates products that feel magical from products that feel like demos. The difference is rarely the model itself. It is the system around the model: prompt structure, failure handling, cost control, and user protection.
This tutorial covers a full production-minded setup: provider selection, API key management, a reusable prompt template system, streaming responses, error handling, usage monitoring, and content moderation.
1. Choosing your AI provider
Your choice of provider is not mainly a technical decision. It is a product decision. The right provider depends on the job the feature performs, your latency target, your cost tolerance, and how much behavioural control you need.
Three providers cover most production use cases:
| Provider | Models | Best for | Why |
|---|---|---|---|
| Anthropic Claude | claude-sonnet-4-5, claude-haiku-4-5 | Reasoning | Strong instruction-following, long context analysis, and structured output |
| OpenAI | gpt-4o, gpt-4o-mini | Multimodal | Strong ecosystem, familiar API shape, good image understanding |
| Google Gemini | gemini-1.5-pro, gemini-flash | Large context | Very large context windows and strong cost efficiency for simpler tasks |
Practical decision rule
Start with one provider. Wrap provider calls behind a thin abstraction from day one so you can swap later without rewriting your application logic.
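One way to sketch that abstraction. The interface shape, `AIProvider`, and `createProvider` are illustrative names, not any real SDK's API, and the adapter bodies are stubbed where a vendor SDK call would go:

```typescript
// Application code depends on this interface, never on a vendor SDK
// directly. Interface and names are illustrative, not a real SDK's API.
export interface AIProvider {
  complete(opts: { system: string; user: string; maxTokens: number }): Promise<string>
}

// One adapter per vendor. Swapping providers later means adding an
// adapter here, not rewriting every call site.
export function createProvider(name: 'anthropic' | 'openai'): AIProvider {
  if (name === 'anthropic') {
    return {
      async complete({ user, maxTokens }) {
        // A real adapter would call the Anthropic SDK here.
        // Stubbed so the shape of the abstraction stays visible.
        return `anthropic(${maxTokens}): ${user}`
      },
    }
  }
  return {
    async complete({ user, maxTokens }) {
      // A real adapter would call the OpenAI SDK here.
      return `openai(${maxTokens}): ${user}`
    },
  }
}
```

Call sites only ever see `AIProvider`, so a provider swap is a one-file change.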
On model versions
Never hardcode a model string into application logic. Pin the model in an environment variable such as AI_MODEL=claude-sonnet-4-5 so upgrades and deprecations remain configuration changes, not code changes.
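Reading that configuration once, at module load, keeps the rest of the code free of `process.env` lookups. A sketch (the function name and defaults are illustrative):

```typescript
// Resolve model settings from the environment once, at module load.
// Failing fast on a missing AI_MODEL beats a confusing mid-request error.
export function getModelConfig(
  env: Record<string, string | undefined> = process.env,
): { model: string; maxTokens: number } {
  const model = env.AI_MODEL
  if (!model) {
    throw new Error('AI_MODEL is not set - pin the model in your environment')
  }
  const maxTokens = parseInt(env.AI_MAX_TOKENS ?? '1024', 10)
  return { model, maxTokens }
}
```

Upgrading the model is then a deployment-config change with no code diff at all.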
AI prompt: Provider selection
My product: [describe in 2-3 sentences]
My AI feature specifically does: [e.g. "generates product descriptions from a title and 5 bullet points"]
My requirements:
- Expected calls per day: [number]
- Max acceptable latency: [e.g. "2 seconds"]
- Budget per 1,000 AI calls: [$X]
- Input size: [e.g. "~500 tokens per call"]
- Output size: [e.g. "~300 tokens per call"]
Compare Anthropic Claude Haiku, OpenAI gpt-4o-mini, and Gemini Flash for this use case.
Show a cost estimate per 1,000 calls for each. Recommend one with a clear reason.
2. Securing API keys
An exposed API key is not a minor inconvenience. It is a billing emergency. A leaked key in a public repository can be abused within minutes.
There is one rule: API keys belong in environment variables only. Never in source code, client-side JavaScript, comments, or logs.
Step 1: Store keys in .env.local
Create a .env.local file in the project root. Make sure it is ignored before your first commit.
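A quick way to verify this from the terminal, assuming the project is already a git repository:

```shell
# Ensure .env.local is listed in .gitignore (append only if missing)
grep -qx '.env.local' .gitignore 2>/dev/null || echo '.env.local' >> .gitignore

# Sanity check: git check-ignore prints the path if git will ignore it.
# (Requires an initialised git repo; warn loudly otherwise.)
touch .env.local
git check-ignore .env.local 2>/dev/null || echo 'WARNING: .env.local is NOT ignored by git'
```

Run the check before your first commit; once a key is in history, rotation is the only fix.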
Step 2: Keep AI keys server-side only
In Next.js, variables prefixed with NEXT_PUBLIC_ are exposed to the browser. AI provider keys must never use that prefix.
Step 3: Add keys to your deployment platform
Use the environment variable UI in your hosting platform. Never commit production keys.
Step 4: Rotate keys on any suspected exposure
If a key ever touches git history, revoke it immediately. Removing the commit is not enough.
# AI Provider Keys - server-side only, never NEXT_PUBLIC_
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
# Model config - change without code deploy
AI_MODEL=claude-haiku-4-5-20251001
AI_MAX_TOKENS=1024
# Rate limiting
AI_RATE_LIMIT_PER_USER_PER_HOUR=20
Common mistake: calling the AI API from the client
Never call the provider directly from browser code. All AI calls must go through your own server-side route, which holds the key and proxies the request.
3. Designing your prompt system
A prompt system is not a single string. It is a reusable structure that separates:
- the system instruction
- contextual data
- the actual user message
The biggest prompt mistake is embedding one giant instruction string directly inside the API call. That makes prompts hard to test, version, or reuse.
The three-layer model
- System prompt: defines role, tone, output format, and hard constraints
- Context injection: validated server-side data about the user, session, or document
- User message: the actual request, sanitised and never trusted to override system rules
// Centralised prompt registry - all prompts live here, not scattered in routes
export const prompts = {
productDescription: (ctx: {
productName: string
bulletPoints: string[]
tone: 'professional' | 'playful' | 'luxury'
}) => `You are a product copywriter specialising in e-commerce listings.
Tone: ${ctx.tone}. Output format: 2-3 sentences, no bullet points, no markdown.
Hard limit: 80 words. Never invent specifications not in the bullet points.`,
documentSummary: (ctx: { maxWords: number; audience: string }) =>
`You are a document analyst.
Summarise the provided document in plain language for ${ctx.audience}.
Maximum ${ctx.maxWords} words. Use present tense. Output only the summary - no preamble.`,
chatAssistant: (ctx: { userName: string; planTier: string }) =>
`You are a helpful assistant for [Product Name].
The user's name is ${ctx.userName} and they are on the ${ctx.planTier} plan.
Answer concisely. If you don't know something, say so - never fabricate.
Never discuss competitors. Escalate billing questions to support@example.com.`,
}
Version prompts like code
Store prompt templates in a dedicated file such as lib/prompts.ts and commit them. Prompt changes should be diffable, reviewable, and reversible.
AI prompt: System prompt generator
I need a system prompt for an AI feature in my app.
Feature description: [what the AI does in one sentence]
Output format: [e.g. JSON, plain text, markdown, specific structure]
Tone: [e.g. professional, friendly, concise]
Hard constraints: [things the AI must never do]
User context available: [e.g. user's name, subscription tier, document they uploaded]
Write a system prompt following this structure:
1. Role definition (one sentence)
2. Behaviour rules (bullet list)
3. Output format specification (exact format with example)
4. Hard constraints (things it must never do)
Keep it under 200 words. Be specific, not vague.
Exercise 3.1: Design your prompt template
Deliverable: A first system prompt draft for your core AI feature.
Include:
- what the feature does
- the system prompt draft
- the dynamic context you plan to inject
4. Building a streaming endpoint
Streaming is not a nice-to-have. It is what makes an AI feature feel alive instead of broken. Without it, users stare at a spinner. With it, they see progress in real time.
In Next.js App Router, streaming usually means a server-side ReadableStream and a client-side reader. The Vercel AI SDK can simplify this, but the core mechanics are still worth understanding.
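The route below imports a `rateLimit` helper from `@/lib/rate-limit`; a minimal in-memory version could look like this. It is a fixed-window sketch and per-process only; production needs a shared store such as Redis so limits hold across instances and deploys:

```typescript
// Fixed-window rate limiter, in-memory. Sketch only: per-process state
// resets on redeploy and is not shared across server instances.
const windows = new Map<string, { count: number; resetAt: number }>()

export async function rateLimit(
  userId: string,
  limit = 20,                 // e.g. AI_RATE_LIMIT_PER_USER_PER_HOUR
  windowMs = 60 * 60 * 1000,  // one hour
): Promise<boolean> {
  const now = Date.now()
  const w = windows.get(userId)
  if (!w || now >= w.resetAt) {
    // First call in a fresh window
    windows.set(userId, { count: 1, resetAt: now + windowMs })
    return false // not limited
  }
  w.count++
  return w.count > limit // true means "limited", matching the route's check
}
```

The boolean return matches the route's `if (limited) return 429` check.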
import Anthropic from '@anthropic-ai/sdk'
import { NextRequest } from 'next/server'
import { prompts } from '@/lib/prompts'
import { rateLimit } from '@/lib/rate-limit'
import { getServerSession } from 'next-auth'
const client = new Anthropic()
export async function POST(req: NextRequest) {
const session = await getServerSession()
if (!session) return new Response('Unauthorized', { status: 401 })
const limited = await rateLimit(session.user.id)
if (limited) return new Response('Rate limit exceeded', { status: 429 })
const { userMessage } = await req.json()
if (!userMessage || typeof userMessage !== 'string') {
return new Response('Invalid input', { status: 400 })
}
const systemPrompt = prompts.chatAssistant({
userName: session.user.name,
planTier: session.user.plan,
})
const stream = new ReadableStream({
async start(controller) {
try {
const response = await client.messages.stream({
model: process.env.AI_MODEL!,
max_tokens: parseInt(process.env.AI_MAX_TOKENS || '1024'),
system: systemPrompt,
messages: [{ role: 'user', content: userMessage }],
})
for await (const chunk of response) {
if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
controller.enqueue(new TextEncoder().encode(chunk.delta.text))
}
}
controller.close()
} catch (error) {
controller.error(error)
}
},
})
return new Response(stream, {
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Transfer-Encoding': 'chunked',
'X-Content-Type-Options': 'nosniff',
},
})
}
'use client'
import { useState } from 'react'
export function useAIStream() {
const [output, setOutput] = useState('')
const [loading, setLoading] = useState(false)
const [error, setError] = useState<string | null>(null)
const generate = async (userMessage: string) => {
setOutput('')
setLoading(true)
setError(null)
try {
const res = await fetch('/api/ai/generate', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ userMessage }),
})
if (!res.ok) throw new Error(await res.text())
const reader = res.body!.getReader()
const decoder = new TextDecoder()
while (true) {
const { done, value } = await reader.read()
if (done) break
setOutput(prev => prev + decoder.decode(value, { stream: true }))
}
} catch (err) {
setError('Something went wrong. Please try again.')
} finally {
setLoading(false)
}
}
return { output, loading, error, generate }
}
5. Error handling and fallbacks
AI APIs fail. Models time out. Rate limits get hit. The network drops. Failure planning is part of the feature, not an optional extra.
| Error type | HTTP status | Correct response |
|---|---|---|
| Network timeout | — | Retry once after 1 second, then show a user-friendly error |
| Rate limited by provider | 429 | Exponential backoff, max 3 retries |
| Invalid API key | 401 | Alert ops immediately, do not expose details to users |
| Model overloaded | 529 | Queue the request or show a high-demand message |
| Max tokens exceeded | 400 | Truncate input or ask the user to shorten it |
| Content filtered | 400 | Explain that the request cannot be processed |
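The retry rows in the table can be folded into one helper. This is a sketch around a generic async call, with the retry predicate and delays left to the caller, not a specific SDK's retry API:

```typescript
// Generic retry with exponential backoff for transient AI errors.
// `isRetryable` decides per-error; the delay doubles each attempt.
export async function withRetries<T>(
  call: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call()
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err
      // 1s, 2s, 4s... implements "exponential backoff, max 3 retries"
      const delay = baseDelayMs * 2 ** attempt
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }
}
```

For example, pass a predicate that retries 429 and 529 responses but rethrows 401s immediately so a bad key alerts you instead of looping.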
Golden rule of AI error messages
Never show raw provider errors to users. Translate every error class into a clear, actionable user message and log the raw detail server-side.
import Anthropic from '@anthropic-ai/sdk'
export function classifyAIError(error: unknown): {
userMessage: string
shouldRetry: boolean
retryAfterMs: number
} {
if (error instanceof Anthropic.APIError) {
switch (error.status) {
case 429:
return {
userMessage: 'Our AI is busy right now. Retrying...',
shouldRetry: true,
retryAfterMs: parseInt(error.headers?.['retry-after'] || '2') * 1000,
}
case 529:
return {
userMessage: 'High demand right now. Please try again in a moment.',
shouldRetry: true,
retryAfterMs: 5000,
}
default:
return {
userMessage: 'Something went wrong with the AI. Please try again.',
shouldRetry: false,
retryAfterMs: 0,
}
}
}
return {
userMessage: 'An unexpected error occurred. Please try again.',
shouldRetry: false,
retryAfterMs: 0,
}
}
6. Monitoring usage and costs
AI costs are variable and can spike unexpectedly. A single abusive input pattern or retry bug can generate a surprising bill overnight. Visibility has to exist before the problem happens.
Log every AI call with enough metadata to answer:
- who triggered it
- how many tokens were used
- how long it took
- whether it succeeded
interface AICallLog {
userId: string
feature: string
model: string
inputTokens: number
outputTokens: number
durationMs: number
success: boolean
errorCode?: string
}
// `db` is your database client (e.g. a Prisma instance imported from lib/db)
export async function logAICall(log: AICallLog) {
await db.aiUsage.create({ data: { ...log, timestamp: new Date() } })
const totalTokens = log.inputTokens + log.outputTokens
if (totalTokens > 4000) {
console.warn(`High token usage: ${totalTokens} tokens for user ${log.userId}`)
}
}
export function estimateCostUSD(model: string, input: number, output: number): number {
const rates: Record<string, { in: number; out: number }> = {
'claude-haiku-4-5-20251001': { in: 0.80, out: 4.00 },
'claude-sonnet-4-5': { in: 3.00, out: 15.00 },
'gpt-4o-mini': { in: 0.15, out: 0.60 },
}
const r = rates[model] || { in: 0, out: 0 }
return (input * r.in + output * r.out) / 1_000_000
}
Set a hard spend limit immediately
Every major AI provider supports monthly spend caps. Set one before you have live traffic. A modest MVP limit is better than discovering an uncontrolled bill after the fact.
7. Content moderation and guardrails
If users can submit free-form text, they will eventually send something they should not. A system prompt alone is not enough protection. Guardrails must be independent of the model.
Step 1: Input sanitisation
Strip HTML, cap character length, and validate input shape before anything reaches the model.
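A minimal version of that sanitisation step. The length cap and tag-stripping regex are illustrative; tune both to your feature:

```typescript
const MAX_INPUT_CHARS = 4000 // illustrative cap; tune per feature

// Strip HTML tags, collapse whitespace, and enforce a hard length cap
// before the text ever reaches the model.
export function sanitiseInput(raw: unknown): string {
  if (typeof raw !== 'string') throw new Error('Invalid input: expected a string')
  const stripped = raw.replace(/<[^>]*>/g, ' ')        // drop HTML tags
  const collapsed = stripped.replace(/\s+/g, ' ').trim() // normalise whitespace
  if (collapsed.length === 0) throw new Error('Invalid input: empty after sanitisation')
  return collapsed.slice(0, MAX_INPUT_CHARS)             // hard length cap
}
```

Call it in the route right after parsing the request body, before building the prompt.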
Step 2: Prompt injection defence
Never concatenate raw user input into your system prompt. Keep system instructions and user content structurally separate.
Step 3: Output validation
If you expect JSON or another structured format, validate it before using it. Models will occasionally produce malformed output.
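If the feature expects JSON, a defensive parse like this catches malformed output before it reaches application code. The `SummaryResult` shape is a placeholder; the field checks would mirror your real schema:

```typescript
interface SummaryResult {
  title: string
  bullets: string[]
}

// Parse model output defensively: models sometimes wrap JSON in prose
// or code fences, or return invalid JSON entirely.
export function parseModelJSON(raw: string): SummaryResult | null {
  // Extract the first {...} span in case the model added surrounding text
  const match = raw.match(/\{[\s\S]*\}/)
  if (!match) return null
  try {
    const parsed = JSON.parse(match[0])
    // Validate the expected shape, not just that it parsed
    if (typeof parsed.title !== 'string' || !Array.isArray(parsed.bullets)) return null
    return parsed as SummaryResult
  } catch {
    return null // malformed JSON: caller decides whether to retry or fail
  }
}
```

Returning `null` instead of throwing lets the caller choose between a retry and a user-facing error.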
Step 4: PII filtering
Before logging or storing outputs, scan for personally identifiable information and redact where appropriate.
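A first-pass redactor might look like this. The regexes cover emails and simple phone-number shapes only; dependable PII detection needs a dedicated library or service:

```typescript
// Redact obvious PII patterns before logging or persisting output.
// Pattern-based redaction is a baseline, not a guarantee.
export function redactPII(text: string): string {
  return text
    // email addresses
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[REDACTED_EMAIL]')
    // simple phone shapes, e.g. +1 555-123-4567
    .replace(/\+?\d[\d\s().-]{7,}\d/g, '[REDACTED_PHONE]')
}
```

Apply it in `logAICall` before the write, so raw PII never lands in the usage table.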
AI Assist: Guardrail review
Once your system prompt and route code exist, ask an AI to review them like a security engineer. Specifically ask:
- how could a user manipulate the prompt?
- what sensitive data could leak?
- how could someone trigger unusually expensive calls?
AI prompt: Security audit your AI feature
Here is my AI feature's system prompt and API route code:
[SYSTEM PROMPT]
---
[paste your system prompt]
---
[API ROUTE CODE]
---
[paste your route.ts]
---
Act as a security engineer. Identify:
1. Prompt injection vectors - how could a user override my system prompt?
2. Data leakage risks - what sensitive data could appear in outputs?
3. Cost abuse vectors - how could a user trigger unusually expensive calls?
4. Missing input validation - what inputs are not being checked?
5. Missing output validation - what unexpected output formats could break my app?
For each issue found, suggest a specific fix in TypeScript.
8. Phase checklist
Before moving on, make sure the following are complete:
- Chosen an AI provider based on task type, latency, and cost
- Pinned the AI model in an environment variable instead of hardcoding it
- Stored API keys in .env.local only and verified they are absent from source control
- Routed all AI calls through a server-side endpoint
- Centralised prompts in a reusable lib/prompts.ts file
- Implemented streaming so users see output progressively
- Added an error classifier with user-friendly messages
- Added per-user rate limiting
- Logged usage with tokens, duration, model, and success state
- Set a monthly spend limit in the provider dashboard
- Added input sanitisation and max-length validation
- Reviewed prompt and route security with an audit prompt
9. What you should have at the end of Phase 02
- A provider choice with a documented cost rationale
- API keys secured and ready for deployment
- A reusable and versioned prompt template system
- A live streaming endpoint in Next.js App Router
- A client-side stream reader with loading and error states
- Typed AI error classification
- Per-user rate limiting and spend monitoring
- Guardrails against injection, PII leakage, and cost abuse
Up next: Phase 03 — Security
The next phase is tightening authentication, permissions, request validation, and system boundaries so the AI feature is not just useful, but safe to operate.