AI Engineer Interview Questions (2026): RAG, Agents & System Design
Real AI engineer interview questions from OpenAI, Anthropic, Google, and Apple — covering RAG, agents, LLM system design, and AI PM cases — plus what strong answers actually demonstrate.

AI engineer interviews in 2026 are roughly 75% generative AI — RAG, evals, production prompting, and agent design — and only 25% classic ML. The questions stopped asking "what is X" and started asking "what would you do when X breaks." Here are real questions from top companies and what strong answers demonstrate.
What's the interview structure?
Most AI engineer loops run 4–6 rounds over 2–4 weeks:
- Recruiter screen.
- Technical phone screen — LLM fundamentals + coding.
- One or two deep technical rounds — RAG, system design.
- Behavioral / culture round.
- Sometimes a take-home: build an AI feature or RAG app (increasingly AI-allowed).
At Anthropic specifically, the loop opens with a 90-minute, two-problem online assessment, with a reported pass band around a 590–600 CodeSignal score.
Real LLM system design questions (by company)
These are pulled from documented 2026 interview catalogs:
| Company | Question |
|---|---|
| OpenAI | "Design ChatGPT." / "Design a scalable system for training an LLM." |
| Anthropic | "Design our Claude chat service." / "How would you minimize harmful outputs while keeping the model useful and expressive?" / "Review a junior dev's design for an inference batching system." |
| "Design a small language model that runs on a phone while staying polite." | |
| Apple | "What is a KV cache and how does it help inference?" / "Walk me through a RAG project you've built." |
| Cohere | "Design a model that solves math problems — data collection, SFT, post-training, evaluation." |
| Salesforce | "Architect an AI agent system: the agent loop, tool interfaces, memory design, orchestration, and safety." |
Notice the pattern: every one is about decisions under constraints, not definitions.
The operational questions everyone asks
Beyond design, expect scale-and-reliability questions:
- "How would you handle traffic spikes without overwhelming the model provider?"
- "How do you think about cost and capacity planning for an LLM app at scale?"
- "How would you design the UX for an assistant that is often slow?"
- "How do you surface model errors to users without breaking trust?"
RAG questions and the tradeoffs that matter
RAG is the single most-tested system. Be ready to design one end-to-end — ingestion, chunking, embeddings, retrieval, generation, evals, tracing, guardrails — and to name the core tradeoffs:
- Latency vs. accuracy
- Chunk size vs. context
- Cost vs. quality
A strong answer doesn't just list components; it explains which knob you'd turn for this use case and why.
Agent system design
Agents moved from research to production fast, and interviewers want candidates who understand what breaks. Expect: "design a multi-step agent that does X — the loop, tool use, memory, failure handling, and guardrails." The signal they're listening for: you've shipped one and can talk about what went wrong, not just what agents "could" do.
AI PM questions are a different bank
If you're interviewing for AI product management, the mix is distinct. One coach who placed 47 people into $300K+ AI PM roles breaks it down as: AI Product Sense 20%, AI Execution 15%, Technical Depth 15%, Behavioral 35%, Presentation 10%. Sample questions:
- "How would you prioritize making the current model cheaper vs. investing in the next one?" (product sense)
- "Your chatbot's response quality dropped 15% last week. How do you debug it?" (execution)
- "Explain embeddings to a product designer." (technical depth)
- "Tell me about a time you shipped an AI feature that backfired." (behavioral)
What "good" looks like
Across roles, interviewers reward the same four things:
- Problem decomposition — turning a vague prompt into crisp, measurable goals.
- LLM-aware architecture — knowing when to use a model and when a deterministic system is better.
- Designing for non-determinism — UX and guardrails that assume the model is sometimes wrong.
- Back-of-the-envelope math — requests per second, latency budgets, token costs.
In short: frameworks, not memorized answers. The fastest way to internalize them is to practice answering out loud and get feedback — which is exactly what landed. does.
landed. drills these exact questions in realistic mock interviews and tells you where your answers fall short — the feedback no chatbot will give you. Run a mock interview →
Sources: igotanoffer (Generative AI System Design; Anthropic interview process); Aakash Gupta (47 AI PM placements); DataCamp RAG questions; Medium / Adil Shamim (100+ AI engineer interviews); Towards AI.
Ready to land it?
Landed scores your readiness against real AI-native roles and drills the interview until you walk in ready.
See where you stand