all articles
interviewsai-engineerrag

AI Engineer Interview Questions (2026): RAG, Agents & System Design

Real AI engineer interview questions from OpenAI, Anthropic, Google, and Apple — covering RAG, agents, LLM system design, and AI PM cases — plus what strong answers actually demonstrate.

The landed. team·Jun 19, 2026·3 min read

AI engineer interviews in 2026 are roughly 75% generative AI — RAG, evals, production prompting, and agent design — and only 25% classic ML. The questions stopped asking "what is X" and started asking "what would you do when X breaks." Here are real questions from top companies and what strong answers demonstrate.

What's the interview structure?

Most AI engineer loops run 4–6 rounds over 2–4 weeks:

  1. Recruiter screen.
  2. Technical phone screen — LLM fundamentals + coding.
  3. One or two deep technical rounds — RAG, system design.
  4. Behavioral / culture round.
  5. Sometimes a take-home: build an AI feature or RAG app (increasingly AI-allowed).

At Anthropic specifically, the loop opens with a 90-minute, two-problem online assessment, with a reported pass band around a 590–600 CodeSignal score.

Real LLM system design questions (by company)

These are pulled from documented 2026 interview catalogs:

CompanyQuestion
OpenAI"Design ChatGPT." / "Design a scalable system for training an LLM."
Anthropic"Design our Claude chat service." / "How would you minimize harmful outputs while keeping the model useful and expressive?" / "Review a junior dev's design for an inference batching system."
Google"Design a small language model that runs on a phone while staying polite."
Apple"What is a KV cache and how does it help inference?" / "Walk me through a RAG project you've built."
Cohere"Design a model that solves math problems — data collection, SFT, post-training, evaluation."
Salesforce"Architect an AI agent system: the agent loop, tool interfaces, memory design, orchestration, and safety."

Notice the pattern: every one is about decisions under constraints, not definitions.

The operational questions everyone asks

Beyond design, expect scale-and-reliability questions:

  • "How would you handle traffic spikes without overwhelming the model provider?"
  • "How do you think about cost and capacity planning for an LLM app at scale?"
  • "How would you design the UX for an assistant that is often slow?"
  • "How do you surface model errors to users without breaking trust?"

RAG questions and the tradeoffs that matter

RAG is the single most-tested system. Be ready to design one end-to-end — ingestion, chunking, embeddings, retrieval, generation, evals, tracing, guardrails — and to name the core tradeoffs:

  • Latency vs. accuracy
  • Chunk size vs. context
  • Cost vs. quality

A strong answer doesn't just list components; it explains which knob you'd turn for this use case and why.

Agent system design

Agents moved from research to production fast, and interviewers want candidates who understand what breaks. Expect: "design a multi-step agent that does X — the loop, tool use, memory, failure handling, and guardrails." The signal they're listening for: you've shipped one and can talk about what went wrong, not just what agents "could" do.

AI PM questions are a different bank

If you're interviewing for AI product management, the mix is distinct. One coach who placed 47 people into $300K+ AI PM roles breaks it down as: AI Product Sense 20%, AI Execution 15%, Technical Depth 15%, Behavioral 35%, Presentation 10%. Sample questions:

  • "How would you prioritize making the current model cheaper vs. investing in the next one?" (product sense)
  • "Your chatbot's response quality dropped 15% last week. How do you debug it?" (execution)
  • "Explain embeddings to a product designer." (technical depth)
  • "Tell me about a time you shipped an AI feature that backfired." (behavioral)

What "good" looks like

Across roles, interviewers reward the same four things:

  1. Problem decomposition — turning a vague prompt into crisp, measurable goals.
  2. LLM-aware architecture — knowing when to use a model and when a deterministic system is better.
  3. Designing for non-determinism — UX and guardrails that assume the model is sometimes wrong.
  4. Back-of-the-envelope math — requests per second, latency budgets, token costs.

In short: frameworks, not memorized answers. The fastest way to internalize them is to practice answering out loud and get feedback — which is exactly what landed. does.


landed. drills these exact questions in realistic mock interviews and tells you where your answers fall short — the feedback no chatbot will give you. Run a mock interview →

Sources: igotanoffer (Generative AI System Design; Anthropic interview process); Aakash Gupta (47 AI PM placements); DataCamp RAG questions; Medium / Adil Shamim (100+ AI engineer interviews); Towards AI.

Ready to land it?

Landed scores your readiness against real AI-native roles and drills the interview until you walk in ready.

See where you stand