What an AI Product Engineer Actually Does
Last month I turned down a PM candidate who had 7 years of experience at top-tier tech companies. Great communicator. Sharp strategy thinker. Flawless case study presentation.
I passed because when I asked how they'd evaluate whether our AI feature was actually working, they said, "I'd look at user engagement metrics and gather qualitative feedback."
That's not wrong. It's just not enough anymore. Not even close.
The PM role as you know it is splitting in two. There's the traditional PM (still valuable, still needed) who manages roadmaps, aligns stakeholders, and writes specs. And then there's the AI Product Engineer: someone who does all of that and can build, evaluate, and ship AI features with their own hands.
I'm that second person. I run an AI platform at a $7B+ SaaS company that saves our customers over 800,000 hours. I hire AI PMs. I build with code daily. And I'm telling you: the gap between these two roles is the single biggest career opportunity in product right now.
Let me explain what the job actually looks like.
The Old Model Is Dying
Here's how PM used to work: You talk to customers. You identify problems. You write a PRD. You hand it to engineering. You manage the process. You ship. You measure.
That model assumed deterministic software. You spec a button, engineering builds the button, the button works the way you specced it. Done.
AI doesn't work like that.
When I shipped our first large-scale AI feature, a system that automatically generates personalized content for millions of end users, there was no spec I could write that would guarantee the output was good. The model might hallucinate. It might be technically accurate but tonally wrong. It might work beautifully for one segment and completely fail for another.
The PM who just writes a spec and hands it off is now the PM who ships broken AI features. Because they can't evaluate whether the output is actually good. They can't tell engineering what good looks like in a way that's testable. They're flying blind.
The AI Product Engineer doesn't fly blind. They build the instruments.
What the Job Actually Looks Like
A typical week for me involves:
Monday: I'm reviewing eval results from a model migration we're testing. We're comparing GPT-4o against Claude on a specific task. I'm not waiting for engineering to tell me which is better: I wrote the eval criteria, I'm reading the outputs, and I'm flagging where one model handles edge cases the other misses.
Tuesday: I'm prototyping a new feature in code. Not production code, just a quick prototype that proves the concept works before we invest engineering cycles. I can spin up an API call, chain a few prompts together, and show my team a working demo in hours instead of waiting weeks for a spec-to-build cycle.
Wednesday: I'm in a cost review. Our AI features process millions of requests. I know the per-token economics of every model we use. I'm making the call on whether we use a frontier model for complex tasks and a smaller model for simple ones, and I can articulate exactly where the quality threshold sits.
Thursday: I'm writing eval specs for a feature launching next month. Not a PRD, an eval spec. Acceptance criteria defined as test cases. "Given this input, the output must satisfy these criteria." Hundreds of test cases, covering edge cases that would take a traditional PM weeks to even think of.
Friday: I'm reviewing candidates for an AI PM role on my team. I'm looking for the same skills I use every day.
This is the job. It's not project management with an AI label slapped on it. It's a fundamentally different way of building product.
The 4 Skills That Define an AI Product Engineer
After hiring multiple AI PMs and building an AI platform from scratch, I've distilled the role down to four core skills. Miss any one of them and you're just a traditional PM who happens to work on AI features.
1. Evals
This is the skill that matters most, and almost nobody has it.
An eval is how you measure whether your AI feature is actually working. Not "users seem to like it," but rigorous, repeatable, quantitative measurement of output quality.
When we launched a feature that generates strategic recommendations for business users, I didn't wait until after launch to see if it worked. I built an eval suite before we started development. I defined what a good recommendation looks like. I created test cases, hundreds of them, covering different industries, company sizes, and data profiles.
That eval suite became the spec. Engineering didn't build to a PRD. They built until the evals passed.
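To make that concrete, here's a minimal sketch of what an eval suite as spec can look like: acceptance criteria expressed as runnable test cases. Everything here is illustrative; `generate_recommendation` is a stand-in for the real model call, and real suites mix cheap programmatic checks like these with LLM-judged rubrics.

```python
# Minimal eval-suite sketch: acceptance criteria expressed as test cases.
# `generate_recommendation` is a placeholder for the real model call.

def generate_recommendation(profile: dict) -> str:
    # Stand-in for an LLM call; returns a canned recommendation.
    return (f"Focus on retention plays for {profile['industry']} "
            f"companies at {profile['size']} scale.")

def check_output(output: str, criteria: dict) -> bool:
    # Each criterion is a cheap programmatic check; real suites mix
    # checks like these with LLM-judged rubrics.
    if len(output) > criteria.get("max_chars", 10_000):
        return False
    return all(term.lower() in output.lower()
               for term in criteria.get("must_mention", []))

TEST_CASES = [
    {"input": {"industry": "retail", "size": "SMB"},
     "criteria": {"must_mention": ["retail"], "max_chars": 500}},
    {"input": {"industry": "fintech", "size": "enterprise"},
     "criteria": {"must_mention": ["fintech"], "max_chars": 500}},
]

def run_suite() -> float:
    # Returns the pass rate; engineering builds until this hits the bar.
    results = [check_output(generate_recommendation(c["input"]), c["criteria"])
               for c in TEST_CASES]
    return sum(results) / len(results)
```

The pass rate from `run_suite` is the number the team builds against, in place of a prose acceptance section in a PRD.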
If you can't write an eval, you can't ship AI. Period.
2. Prototyping
You don't need to be a software engineer. You need to be dangerous enough to prove an idea works.
I use Claude Code almost daily. I prototype features, test prompt strategies, build quick demos. Not because I want to do engineering's job, but because the feedback loop between idea and validation needs to be hours, not weeks.
Last quarter I had a hypothesis that we could use a smaller, cheaper model for a specific subtask in our pipeline. Instead of writing a spec and waiting for engineering to test it, I prototyped it in an afternoon. Ran it against our eval suite. Found it performed within 3% of the larger model at 1/10th the cost.
That prototype turned into a production optimization that saves significant infrastructure spend monthly. It happened because I could build it myself.
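That afternoon-prototype pattern fits in a few dozen lines: run the same cases through two model configurations and compare quality against cost. The `call_model` stub, the prices, and the toy scoring function below are all illustrative stand-ins, not real APIs or real numbers.

```python
# Prototype sketch: run the same eval cases through a large and a small
# model configuration, then compare quality vs. cost. `call_model`, the
# prices, and the scoring are illustrative stand-ins.

PRICE_PER_1K_TOKENS = {"large": 0.010, "small": 0.001}  # made-up prices

def call_model(model: str, prompt: str) -> str:
    # Stub for an API call; the small model answers more tersely.
    answer = f"summary of: {prompt}"
    return answer if model == "large" else answer[:60]

def score(output: str, expected_keyword: str) -> float:
    # Toy quality metric: did the key concept survive?
    return 1.0 if expected_keyword in output else 0.0

def compare(cases: list[tuple[str, str]]) -> dict:
    results = {}
    for model in ("large", "small"):
        outputs = [call_model(model, prompt) for prompt, _ in cases]
        quality = sum(score(o, kw)
                      for o, (_, kw) in zip(outputs, cases)) / len(cases)
        tokens = sum(len(o.split()) for o in outputs)
        results[model] = {"quality": quality,
                          "cost": tokens / 1000 * PRICE_PER_1K_TOKENS[model]}
    return results

CASES = [("quarterly churn report", "churn"),
         ("pipeline health check", "pipeline")]
```

With real API calls and the real eval suite plugged in, the same loop answers "is the cheaper model good enough?" in an afternoon.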
3. Model Selection
Knowing which model to use for which task is not a nice-to-have. It's a core product decision that directly impacts quality, cost, and speed.
Most PMs treat model selection as a technical decision they defer to engineering. Wrong. The PM needs to understand the tradeoffs deeply enough to make the call, or at least to challenge engineering's recommendation with informed questions.
I maintain a mental model (and actual benchmarks) of how different models perform on our specific tasks. Not generic benchmarks from the internet: benchmarks from our data, our use cases, our quality bar.
When Anthropic drops a new model, I don't wait for the blog post summary. I run it against our evals that day. That's the difference.
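A same-day regression check like that can be as simple as diffing pass/fail results between the current model and the candidate. The `passes` function and its hard-coded judgments below are stubs for whatever verdict the eval suite actually produces.

```python
# Regression diff on a model swap: rerun the eval cases with the new
# model and flag where results diverge. JUDGMENTS stands in for real
# eval-suite verdicts; the case names are invented.

JUDGMENTS = {
    ("current", "plain question"): True,
    ("current", "ambiguous pronoun"): False,
    ("candidate", "plain question"): True,
    ("candidate", "ambiguous pronoun"): True,
}

def passes(model: str, case: str) -> bool:
    # In practice: run `model` on `case` and score the output.
    return JUDGMENTS[(model, case)]

def diff_models(cases: list[str]) -> tuple[list[str], list[str]]:
    regressions, improvements = [], []
    for case in cases:
        cur, cand = passes("current", case), passes("candidate", case)
        if cur and not cand:
            regressions.append(case)   # new model broke something
        elif cand and not cur:
            improvements.append(case)  # new model fixed an edge case
    return regressions, improvements
```

The output you care about is the two lists: an empty regressions list plus a non-empty improvements list is the signal to start a migration conversation.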
4. Cost Engineering
AI features have marginal costs that traditional software doesn't. Every API call costs money. Every token costs money. And at scale, these costs compound fast.
The AI Product Engineer thinks about cost per output as naturally as they think about user experience.
I've made product decisions that look bizarre from a traditional PM perspective, like deliberately degrading output quality by 5% to cut costs by 60%. But when you're processing millions of requests, that 5% quality trade (invisible to users in testing) saves hundreds of thousands of dollars per year.
This isn't engineering's decision. It's a product decision. And you can only make it if you understand both the quality and cost curves intimately.
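The arithmetic behind that kind of call fits on a napkin. The request volume and per-request prices below are invented; only the shape of the calculation matters.

```python
# Napkin math for a quality-for-cost trade. All figures are invented
# for illustration; plug in your own volumes and model prices.

REQUESTS_PER_MONTH = 10_000_000
COST_PER_REQUEST = {"frontier": 0.004, "optimized": 0.0016}  # ~60% cheaper

def annual_cost(config: str) -> float:
    return REQUESTS_PER_MONTH * 12 * COST_PER_REQUEST[config]

savings = annual_cost("frontier") - annual_cost("optimized")
# A ~5% quality dip that users can't detect, in exchange for six-figure
# annual savings, is a product call, not an engineering one.
```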
The Story Nobody's Telling You
Here's what the AI PM influencers on LinkedIn won't say: the skills that got you here won't get you there.
Your ability to run a great sprint planning session doesn't matter in AI product. Your beautifully formatted PRD doesn't matter. Your stakeholder management skills? Still important, but they're table stakes; everyone has them.
The differentiator is whether you can sit down, look at an AI system's output, and know (with rigor, not vibes) whether it's good enough to ship.
I've watched brilliant PMs struggle in AI roles because they couldn't make the shift from "managing the build" to "understanding the build." They kept trying to spec their way to quality. You can't spec your way to quality when the output is non-deterministic.
You have to eval your way to quality. And that requires a fundamentally different skillset.
The Market Is Pricing This In
The salary data is unambiguous. Traditional senior PM roles at top companies pay $200K-$300K. AI PM roles at the same companies, or at well-funded AI startups, are paying $350K-$500K+.
That's not a bubble. That's the market telling you exactly how scarce these skills are.
Every company is trying to ship AI features right now. Very few have PMs who actually know how to build them. The ones who do are getting paid accordingly.
Try This Week
Pick an AI feature you use regularly: ChatGPT, Copilot, whatever. Write 10 test cases for it. Define what a "good" output looks like for each one. Run them. Score the results.
Congratulations: you just wrote your first eval.
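If you want a slightly more structured version of the exercise, score each output against a small rubric and roll the scores up per dimension. The cases, dimensions, and scores below are invented placeholders for your own hand-scored sheet.

```python
# Bare-bones scoring sheet for the exercise: each output hand-scored
# 0-2 per rubric dimension. Cases and scores are invented placeholders.

RUBRIC = ("accuracy", "tone")
MAX_SCORE = 2

scores = {
    "summarize a meeting":   {"accuracy": 2, "tone": 2},
    "draft a cold email":    {"accuracy": 2, "tone": 1},
    "explain a stack trace": {"accuracy": 1, "tone": 2},
    # ...seven more cases in a real 10-case run
}

def report(scores: dict) -> dict:
    # Fraction of the maximum possible score achieved on each dimension.
    return {dim: sum(s[dim] for s in scores.values()) / (MAX_SCORE * len(scores))
            for dim in RUBRIC}
```

The per-dimension fractions tell you where the feature is weak, which is exactly the conversation an eval is supposed to start.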
Now imagine doing that for every AI feature you ship, before you ship it, with hundreds of test cases, covering every edge case.
That's the job.
The PMs getting $500K+ offers aren't better at prioritization. They're better at building.
They write evals, not just specs. They prototype, not just wireframe. They understand models, not just users. They think in cost curves, not just roadmaps.
The AI Product Engineer isn't a fancy title. It's a completely different job. And right now, the window to become one is wide open.
The question is whether you'll walk through it.
Free Tool
How strong are your AI PM skills?
8 real production scenarios. LLM-judged across 5 dimensions. Takes ~15 minutes. See exactly where your gaps are.
PM the Builder
Practical AI product management, backed by PM leaders who build AI products, hire AI PMs, and ship every day. Building what we wish existed when we started.
Benchmark your AI PM skills
8 production scenarios. Free. LLM-judged. See where you stand.
Go deeper with the full toolkit
Playbooks, interview prep, prompt libraries, and production frameworks, built by the teams who hire AI PMs.
Free: 68-page AI PM Prompt Library
Production-ready prompts for evals, architecture reviews, stakeholder comms, and shipping. Enter your email, get the PDF.
Want more like this?
Get weekly tactics for AI product managers.