Devoured - April 23, 2026
The AI-native interview (6 minute read)

Sierra redesigned its engineering interviews to let candidates actually build software with AI coding assistants during the interview, replacing traditional whiteboard coding tests.

What: Sierra replaced traditional coding and algorithms interviews with an "AI-native onsite" where candidates spend two hours building a real product using AI tools of their choice, bookended by planning and demo sessions. They also swapped coding phone screens for system design interviews and are piloting a debugging interview using existing codebases.
Why it matters: As coding agents handle more implementation details, engineering leverage now comes from product thinking, scope definition, and technical judgment rather than syntax recall. Traditional interviews that test typing code into editors without AI assistance no longer reflect day-to-day work where a single engineer can build across the stack with AI assistance.
Takeaway: If interviewing at companies adopting this approach, practice scoping projects that can be completed in 2 hours, get comfortable pivoting mid-build, and focus on demonstrating product thinking and technical judgment over perfect implementation.
Deep dive
  • Sierra's traditional interview process (two coding rounds, algorithms, system design, culture fit) started feeling disconnected from actual engineering work in the age of AI coding agents
  • The new AI-native onsite has three phases: Plan (working session to define what to build), Build (2 uninterrupted hours using any AI tools and frameworks), and Review (demo, code review, and discussion of technical choices)
  • Candidates receive evaluation criteria in advance, including guidance that it's acceptable to cut scope, skip boilerplate like CRUD and auth, and pivot when stuck
  • The phone screen shifted from coding without AI to system design interviews, since "vibe-coding an app is easy" but getting it to production scalably is the harder, more relevant problem
  • A new debugging interview is being piloted where candidates review and improve a draft PR in a medium-sized codebase, though the role of AI in this interview is still being determined as models improve
  • The process shifts hiring decisions from "absence of weakness" to "hiring for strengths," producing richer signal about where candidates excel and where they need support
  • Candidate feedback has been positive, with one person building an AI-powered flow-state trivia game and another creating a headless simulation tool with an agent-driven markdown demo
  • Challenges include difficulty standardizing open-ended interviews, mitigated by framework-agnostic evaluation criteria and conducting interviews in pairs for better calibration
  • The approach applies to infrastructure roles too, as infrastructure engineers increasingly build full-stack tools and agents with direct customer integration
Decoder
  • Coding agents: AI assistants like Codex and Claude Code that write code based on natural language instructions or pair with developers to implement features
  • 0->1 vs 1->N: Product development phases where 0->1 means building something from scratch, while 1->N means scaling an existing feature to handle more users, edge cases, or complexity
  • Vibe-coding: Quickly building a prototype or demo app, often with heavy AI assistance, focused more on getting something working than production-ready code
Original article

Coding agents like Codex and Claude Code are upending software engineering as we know it. The role is shifting from building the machine to designing and honing it. Much like engineers stopped worrying about how a compiler translates code into machine instructions, we now need to focus less on the precise lines of code that are written and more on whether the system produces the right outcomes over time.

This shifts what we should evaluate in interviews. When a single engineer can build across the stack, leverage comes from combining technical ability with product thinking and business context. They don't just write code. They define scope, make tradeoffs, and iterate with customers to deliver impact. We've redesigned our engineering interview process from the ground up to reflect this new reality.

Framing the problem

Sierra's engineering interview process had been fairly standard: two coding interviews plus interviews for algorithms, system design, and culture fit, followed by reference checks. It's a well-understood, scalable approach, and for a long time it worked.

But recently, something started to feel off. Much of the signal we got from this process was about mechanics: typing syntax into an editor, remembering algorithm details, stitching frameworks together. This felt increasingly dissonant with the new reality of our work. The gap showed up most clearly in debriefs. In the absence of clear interview signals, hiring managers leaned more heavily on referrals and prior experience.

We started building an AI-native interview process with three key attributes:

  • Representative: Reflects the work engineers actually do day to day, capturing initiative, ownership, judgment, system understanding, and product thinking.
  • High signal: Gives us clarity about where a candidate could excel, and where they may need support.
  • Positive experience: Feels engaging and authentic for candidates, so that when we make an offer, they're excited to say yes.

Introducing the AI-native onsite

We removed our coding and algorithms interviews and replaced them with an AI-native onsite:

  • Plan: A working session with the candidate to define a product to build. The candidate drives ideation, while interviewers ask questions to strengthen it. We focus on an idea in the candidate's domain so we see their product thinking in action.
  • Build: The interviewer steps out and the candidate brings the idea to life over 2 hours, using the AI tooling and frameworks of their choice. They have complete freedom to pivot or adjust scope as they go.
  • Review: The candidate demos what they've built. We debate the key product flows and choices they made; review the code to understand their technical judgment (data model, abstractions, extensibility, etc.); and discuss the path to production. We also dig into how they used AI along the way.

We've found this new format to be much more effective. Because candidates can actually build during the onsite, rather than just talk about what they might build, it's more representative of the work and produces higher signal. It's much easier to gauge their agency (do they pivot when they get stuck?) and judgment (how do they scope what to build within the time constraints?).

It's also more engaging, even if candidates are nervous at the start. To set expectations, we share evaluation criteria and advice ahead of time. For example, it's OK to cut your scope as you build, and to skip boilerplate (CRUD, auth) to focus on what's unique. As Paul Buchheit, the creator of Gmail, put it: if it's great, it doesn't have to be good.

Rounding out the rest of the process

As we've honed the AI-native onsite, we've also rethought the rest of the interview process. Our coding phone screen still required candidates to write code in an online editor without AI assistance. But vibe-coding an app is easy. The harder, more relevant problem is getting it into production in a scalable way. So we replaced the phone screen with a system design interview to better reflect that.

While the AI-native onsite tests for product sense and building 0->1, it doesn't capture taking a feature from 1->N in an existing, messy codebase. To address this, we're piloting a debugging interview. Candidates are given a medium-sized codebase and a draft PR from a colleague that introduces a cross-cutting feature. Their job is to review and improve it — pulling down the code, inspecting the output, and iterating with coding agents to make it better. The level of AI used in this interview is still TBD, as new models can zero-shot many fixes.

What did we learn?

We're hiring for strengths, not just an absence of weakness. This approach gives us much richer signal about a candidate's spikes and gaps. For example, some people excel at product strategy and initiative but have holes in their system understanding. Our debriefs have shifted from "should we hire this person?" to "where would this person thrive, and how do we support them?"

We ask every candidate for feedback, and many have said this was the most fun they've had in an interview. One trivia enthusiast built an AI-powered game intended to keep the user in a state of flow — the demo just involved the interviewer playing it. In another case, a backend engineer built a headless simulation tool and used an agent with a markdown file to walk through the demo.

This format isn't without challenges. It's open-ended, which makes it harder to standardize. To mitigate this, we've developed a set of evaluation criteria that are agnostic to what the candidate builds, and we run interviews in pairs to improve calibration. We also debated whether this approach applies for infrastructure, and concluded that it does — many infrastructure engineers now build full-stack tools or agents and work closely with product to vertically integrate with what customers need. That said, we've amended the interview slightly to better capture the signal we need for infrastructure.

The emergence of highly proficient coding agents is forcing us to reimagine Sierra from the ground up — from how we build (using agents to build and optimize agents with Ghostwriter), to how we hire. Given the pace of change, this is just the beginning. And yes, we're still hiring. So if you're interested in helping build this with us, learn more here: sierra.ai/careers.