Model Selection Isn't a Technical Decision
Why Model Selection Is a PM Concern
Let me show you why this matters:
Cost Structure
Different models have vastly different costs:
| Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) |
|---|---|---|
| GPT-4 Turbo | $10 | $30 |
| Claude 3.5 Sonnet | $3 | $15 |
| GPT-3.5 Turbo | $0.50 | $1.50 |
| Llama 3.1 (self-hosted) | ~$1-2 | ~$1-2 |
At scale, these differences are millions of dollars annually. That's a product economics question, not a technical one.
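To make the economics concrete, here's a back-of-envelope calculation using the illustrative per-1M-token prices from the table above (always check current provider pricing; these change often):

```python
# Illustrative per-1M-token prices from the table above: (input $, output $).
PRICING = {
    "gpt-4-turbo": (10.00, 30.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gpt-3.5-turbo": (0.50, 1.50),
}

def monthly_cost(model, requests_per_month, input_tokens, output_tokens):
    """Estimated monthly spend for a given traffic profile."""
    in_price, out_price = PRICING[model]
    per_request = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_request * requests_per_month

# 10M requests/month, ~1,500 input and ~500 output tokens per request:
for model in PRICING:
    print(f"{model}: ${monthly_cost(model, 10_000_000, 1500, 500):,.0f}/month")
```

At that traffic profile, the gap between the cheapest and most expensive option is hundreds of thousands of dollars per month. That's the conversation PM needs to be in.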
Quality Tradeoffs
Models excel at different things:
- Claude: Following complex instructions, nuanced analysis, safety
- GPT-4o: Multi-modal, general reasoning, code
- Gemini 1.5: Long context, video understanding
- Llama: Self-hosting, customization, cost
Which tradeoffs matter depends on your use case. That's a product question.
Latency
Model latency affects user experience:
- Fast models (GPT-4o, Claude Sonnet): 500ms-1s typical
- Slower models (GPT-4, Claude Opus): 2-5s typical
- Self-hosted: Variable based on infrastructure
For real-time features, latency determines UX. For batch processing, it doesn't matter. Product decides.
Privacy
Data handling varies by provider:
- Do they train on your data?
- Where is data processed geographically?
- What's their retention policy?
- Can you get a HIPAA BAA?
For healthcare, finance, or privacy-sensitive products, this determines which models are even legal to use.
Lock-in
Switching models has costs:
- Prompts may not transfer cleanly
- Output formats differ
- Fine-tuning is model-specific
- Integration points vary
The choice of model creates dependencies. PM should understand these.
The PM's Model Selection Framework
Here's how I approach model selection:
Step 1: Define Requirements (PM-Led)
Before any model comparison, define what you need:
Quality requirements:
- What task is the model doing?
- What does "good enough" look like?
- What's unacceptable?
Performance requirements:
- Target latency (p50, p95)
- Throughput needs
- Availability requirements
Cost constraints:
- Budget per request
- Monthly budget cap
- Cost at projected scale
Privacy/compliance requirements:
- Data sensitivity level
- Regulatory requirements
- Acceptable data processing locations
Strategic requirements:
- How important is avoiding vendor lock-in?
- Do we need fine-tuning capability?
- Do we need multi-modal?
Write these down BEFORE comparing models. Otherwise you'll optimize for the wrong things.
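One way to force this discipline is to capture the requirements in a structured artifact engineering can test against. Here's a minimal sketch; the fields and example values are hypothetical, not a standard:

```python
from dataclasses import dataclass, field

@dataclass
class ModelRequirements:
    """Written down before any model comparison. All values are examples."""
    task: str
    min_quality_score: float        # fraction of eval cases that must pass
    p50_latency_ms: int
    p95_latency_ms: int
    max_cost_per_request: float     # USD
    monthly_budget_cap: float       # USD
    data_sensitivity: str           # e.g. "PII", "PHI", "none"
    required_certifications: list = field(default_factory=list)
    needs_fine_tuning: bool = False
    needs_multimodal: bool = False

reqs = ModelRequirements(
    task="support ticket triage",
    min_quality_score=0.90,
    p50_latency_ms=1000,
    p95_latency_ms=3000,
    max_cost_per_request=0.02,
    monthly_budget_cap=50_000,
    data_sensitivity="PII",
    required_certifications=["SOC 2"],
)
```

The format matters less than the act of committing to numbers. "Fast enough" is not a requirement; "p50 under 1 second" is.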
Step 2: Shortlist Models (Joint PM/Eng)
Based on requirements, identify candidate models.
The usual suspects:
- OpenAI: GPT-4o, GPT-4 Turbo, GPT-3.5
- Anthropic: Claude 3 Opus, Claude 3.5 Sonnet
- Google: Gemini 1.5 Pro, Gemini 1.5 Flash
- Meta: Llama 3.1 (various sizes)
- Mistral: Mistral Large, Mistral Small, Mixtral
Don't limit yourself to one provider. Compare across vendors.
Step 3: Run Comparative Evals (Eng-Led, PM-Designed)
Here's where engineering does the work, but PM designs the test:
Create an eval set:
- 50-100 real examples from your use case
- Cover edge cases and adversarial inputs
- Define clear scoring criteria
Run each candidate:
- Same prompts, same examples
- Measure quality scores
- Measure latency
- Calculate cost
Compare results:
| Model | Quality Score | p50 Latency | Cost/Request |
|---|---|---|---|
| Model A | 87% | 800ms | $0.02 |
| Model B | 91% | 1200ms | $0.05 |
| Model C | 85% | 600ms | $0.01 |
Now you have data.
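The comparison step can be as simple as filtering eval results against your hard requirements, then ranking the survivors. A sketch, using the numbers from the table above:

```python
# Eval results mirroring the comparison table above.
results = [
    {"model": "Model A", "quality": 0.87, "p50_ms": 800,  "cost": 0.02},
    {"model": "Model B", "quality": 0.91, "p50_ms": 1200, "cost": 0.05},
    {"model": "Model C", "quality": 0.85, "p50_ms": 600,  "cost": 0.01},
]

def shortlist(results, min_quality, max_p50_ms, max_cost):
    """Drop candidates that miss any hard requirement; rank the rest cheapest first."""
    passing = [r for r in results
               if r["quality"] >= min_quality
               and r["p50_ms"] <= max_p50_ms
               and r["cost"] <= max_cost]
    return sorted(passing, key=lambda r: r["cost"])

# Require >=85% quality, sub-second p50, and <=$0.03/request:
print(shortlist(results, 0.85, 1000, 0.03))
```

Notice what happens: the highest-quality model (B) gets eliminated by latency and cost constraints. That's exactly the kind of tradeoff the requirements doc exists to surface before anyone gets attached to a model.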
Step 4: Make the Decision (PM-Led)
With data in hand, PM makes the call:
If quality differences are small: Optimize for cost or latency
If quality differences are large: Pay for quality if business case supports it
If privacy constraints exist: Eliminate non-compliant options
If lock-in matters: Favor standards-based or open-source options
Document the decision and the reasoning. You'll revisit this.
The Model Selection Checklist
Use this for any model selection decision:
Requirements Clarity:
- Quality bar defined with specific criteria
- Latency requirements specified
- Cost budget established
- Privacy/compliance needs documented
- Strategic considerations (lock-in, customization) identified
Evaluation Rigor:
- Multiple models compared
- Real use case examples in eval set
- Quality scoring methodology defined
- Latency measured under realistic conditions
- Cost calculated at projected scale
Decision Quality:
- Data-driven comparison completed
- Tradeoffs explicitly acknowledged
- Decision documented with reasoning
- Fallback/migration plan considered
- Review trigger defined (when to reconsider)
Multi-Model Strategies
Here's where it gets interesting: you don't have to pick one.
Model Routing: Use different models for different request types.
- Simple queries → cheap/fast model (GPT-3.5, Haiku)
- Complex queries → premium model (GPT-4, Opus)
Route based on query complexity. Reduce cost without sacrificing quality where it matters.
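A minimal routing sketch, assuming a crude keyword-and-length heuristic as the classifier (production routers typically use a small classifier model or embeddings instead):

```python
def classify(query: str) -> str:
    """Toy complexity heuristic; a stand-in for a real classifier."""
    hard_signals = ["compare", "analyze", "explain why", "step by step"]
    if len(query.split()) > 50 or any(s in query.lower() for s in hard_signals):
        return "complex"
    return "simple"

# Route table: which model handles which tier.
ROUTES = {"simple": "gpt-3.5-turbo", "complex": "gpt-4o"}

def route(query: str) -> str:
    return ROUTES[classify(query)]

print(route("What are your support hours?"))        # short, no hard signals
print(route("Compare these two contract clauses"))  # hard signal -> premium
```

Even a crude router like this can shift a large share of traffic to the cheap tier; the eval set from Step 3 tells you whether quality holds up.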
Fallback Chains: Primary model fails or is slow → fall back to an alternative.
Improves reliability. Reduces dependency on single provider.
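The fallback pattern is a simple loop over providers in priority order. A sketch with stand-in callables in place of real API clients:

```python
def with_fallback(providers, prompt):
    """Return (provider_name, response) from the first provider that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as e:  # in production, narrow to timeout/API errors
            errors.append((name, e))
    raise RuntimeError(f"All providers failed: {errors}")

# Stand-ins for real provider clients:
def flaky_primary(prompt):
    raise TimeoutError("primary exceeded latency budget")

def stable_backup(prompt):
    return f"answer to: {prompt}"

name, answer = with_fallback(
    [("primary", flaky_primary), ("backup", stable_backup)], "hello")
print(name)  # the backup handled the request
```

One design note: fallbacks only work if your prompts produce acceptable output on both models, which is itself an argument for keeping prompts portable.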
A/B Testing: Run different models for different user segments.
Learn which model performs better for your specific use case.
Ensemble: Multiple models vote or verify each other.
Improves quality for high-stakes decisions.
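For classification-style decisions, the simplest ensemble is a majority vote across models. A sketch with hard-coded stand-ins for model outputs:

```python
from collections import Counter

def majority_vote(labels):
    """Most common label across model outputs (ties go to the first seen)."""
    return Counter(labels).most_common(1)[0][0]

# One element per model's verdict on the same input:
votes = ["approve", "approve", "reject"]
print(majority_vote(votes))
```

The cost multiplies by the number of models, so reserve this for decisions where an error is expensive.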
PM should push for multi-model architecture when it makes sense. Single-model dependency is a risk.
The "But Engineering Said X" Conversation
What do you do when engineering has already picked a model?
Don't: Challenge the decision confrontationally.
Do: Ask good questions.
"Help me understand the model selection. I want to make sure I can defend it to stakeholders."
- What options did we consider?
- What were the quality scores on our use case?
- What's the cost trajectory as we scale?
- What are the lock-in implications?
- What would trigger us to reconsider?
If the answers are solid, great. If the answers are "it's what we know" or "it's the best," dig deeper.
When to Reconsider Model Selection
Model selection isn't permanent. Revisit when:
Cost changes: Your costs spike, or a provider changes pricing.
Quality changes: New models are released (it happens constantly), and your model is no longer best-in-class.
Requirements change: You need longer context, multi-modal, or different capabilities.
Scale changes: Volume justifies self-hosting what you're buying via API.
Provider issues: Reliability problems, deprecation announcements, policy changes.
Build in regular model reviews (quarterly) to avoid complacency.
The Strategic View
Model selection is product strategy.
Commodity AI: Use APIs, optimize for cost, accept some vendor dependency. AI is a feature, not the differentiator.
Competitive AI: Customize models, prioritize quality, invest in differentiation. AI is core to the product.
Regulated AI: Prioritize compliance, accept cost premiums, prefer self-hosted or compliant providers. Constraints dominate.
Know which category you're in. Let that drive model selection philosophy.
The Conversation with Engineering
When you engage engineering on model selection:
Come prepared with:
- Requirements document (quality, latency, cost, privacy)
- Business context (why these requirements matter)
- Questions, not demands
Ask for:
- Comparative eval data
- Cost projections at scale
- Lock-in assessment
- Maintenance implications
Collaborate on:
- Eval set design (you know the use cases)
- Tradeoff decisions (you own the business case)
- Review cadence (you'll know when requirements change)
Own:
- The final decision (within your scope)
- Communicating rationale to stakeholders
- Revisiting when circumstances change
Model selection is a joint effort, but PM should drive the process, not just receive the output.
Key Takeaways
Model selection has product implications: cost, quality, latency, privacy, and lock-in are all PM concerns
Define requirements before comparing: know what you need, then evaluate; don't let engineering optimize for the wrong criteria
Consider multi-model strategies: routing, fallbacks, and A/B testing can optimize better than any single model choice