Global Sales Went Up — But Could Regional Analysis Show It Went Down?
Your stakeholder shows you two findings:
- "Global sales conversion rate increased from 35% to 50% after the launch."
- But when they break it out by subregion, each region's conversion rate actually decreased.
Task: Is this possible? Why would this happen? What framework explains it, and how would you decide which number to report?
Second variant:
- "A new VP at Google Express claims that Google Express's shampoo profit margin is below the market average." You have (1) the brand distribution of shampoo sold on Express vs. the general market, and (2) each brand's profit margin. How do you verify the claim?
▶Simpson's Paradox — The Canonical Example
Yes, it's possible — this is Simpson's paradox. It happens when the mix of subgroups changes between the two time periods (or between the groups being compared), and the mix change dominates the aggregate.
Concrete illustrative numbers (two subregions; each cell shows the conversion rate, with the traffic share in parentheses):

| | Total | Subregion 1 | Subregion 2 |
|---|---|---|---|
| Before | 35% | 60% (50% of traffic) | 10% (50% of traffic) |
| After | 50% | 55% (90% of traffic) | 5% (10% of traffic) |

What happened here:
- Before: traffic was split evenly, and the low-converting Subregion 2 pulled the weighted average down: 0.5 × 60% + 0.5 × 10% = 35%.
- After: both subregions converted worse (60% → 55%, 10% → 5%), but 90% of traffic now came from the higher-converting Subregion 1, which pulled the average up: 0.9 × 55% + 0.1 × 5% = 50%.

The paradox: every subgroup got worse, but the aggregate got better, because the mix changed.
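The mechanics are easy to verify in a few lines. A minimal sketch with hypothetical numbers: two regions whose conversion rates both fall, while the traffic mix shifts toward the higher-converting region.

```python
def aggregate(rates, shares):
    """Weighted average: each subgroup's rate times its traffic share."""
    assert abs(sum(shares) - 1.0) < 1e-9, "traffic shares must sum to 1"
    return sum(r * s for r, s in zip(rates, shares))

# Hypothetical numbers: every region's rate drops after the launch,
# but traffic shifts toward the higher-converting region.
before = aggregate(rates=[0.60, 0.10], shares=[0.5, 0.5])  # 0.35
after = aggregate(rates=[0.55, 0.05], shares=[0.9, 0.1])   # 0.50

# Every subgroup rate fell, yet the aggregate rose: Simpson's paradox.
print(f"before={before:.2f}, after={after:.2f}")
```

Any weighted-average metric (conversion, margin, CTR) is exposed to this; the weights are doing as much work as the rates.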
▶Why It Happens
Simpson's paradox appears when:
- There's a confounding variable that differs across subgroups.
- The weights of those subgroups change between the two measurements being compared.
- The subgroup effects and the weighting effect work in opposite directions.
Common real-world settings:
- Weekend vs. weekday mix changed after a launch → conversion rate shifts just because users are different
- New vs. experienced user mix changed → a metric that's stable within each group still moves in aggregate
- Demographic mix (age, gender, country) shifted due to a marketing change
- Platform mix (iOS vs. Android) — iOS often has higher conversion; if launch drew in more Android users, aggregate drops even if nothing else changed
▶How to Diagnose It
1. Always segment. Before reporting an aggregate change, check the major subgroup cuts. If the aggregate direction disagrees with the subgroup directions, stop and investigate.
2. Compute the mix effect explicitly. Decompose the aggregate change into:
   - Within-group effect: what happens if subgroup rates change but weights stay fixed
   - Between-group effect: what happens if weights change but rates stay fixed
   If the between-group effect dominates, the aggregate is misleading.
3. Compute a reweighted aggregate — apply the old weights to the new rates (or vice versa). If that number disagrees with the naive aggregate, you've found the mix shift.
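The within/between decomposition is a few lines of arithmetic. A sketch with hypothetical numbers (two subgroups, rates and traffic shares made up for illustration): holding the old weights fixed isolates the within-group effect, and holding the new rates fixed isolates the between-group (mix) effect.

```python
def decompose(rates_old, shares_old, rates_new, shares_new):
    """Split the aggregate change into within-group and between-group parts.

    within:  rate changes weighted by the OLD mix (mix held fixed)
    between: mix changes weighted by the NEW rates (rates held fixed)
    The two parts sum exactly to the total aggregate change.
    """
    within = sum(w * (rn - ro)
                 for w, rn, ro in zip(shares_old, rates_new, rates_old))
    between = sum(rn * (wn - wo)
                  for rn, wn, wo in zip(rates_new, shares_new, shares_old))
    return within, between

# Hypothetical: every subgroup's rate falls, but the mix shifts
# toward the higher-converting subgroup.
within, between = decompose(
    rates_old=[0.60, 0.10], shares_old=[0.5, 0.5],
    rates_new=[0.55, 0.05], shares_new=[0.9, 0.1],
)
print(f"within={within:+.2f}, between={between:+.2f}, "
      f"total={within + between:+.2f}")
# within is negative (performance worsened in every subgroup); between is
# positive and larger, so the aggregate change is pure mix shift.
```

If the between-group term carries the sign of the aggregate change while the within-group term points the other way, the headline number should not be acted on as-is.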
▶Which Number to Report
The segmented story, always. The right way to communicate is:
"Global conversion appears to have increased from 35% to 50%. However, this is entirely driven by a shift in regional traffic mix. Every individual region saw conversion decline. The aggregate lift is mechanical — we're simply seeing more traffic from regions with historically higher conversion rates. The launch appears to have hurt conversion in every region; it's only the traffic redistribution that looks like a win at the top level."
Reporting the headline number without the decomposition is a career-limiting mistake. It sends the organization in the wrong direction.
▶The Shampoo Variant (Google Express)
The VP's claim: "Express's shampoo profit margin is below the market average."
Two possible stories — one of them Simpson's paradox:
Story A (claim is correct, no paradox): Express genuinely has lower margins on each brand it sells.
Story B (Simpson's paradox): Each individual brand on Express has the same or higher margin as in the general market, but Express's mix skews toward low-margin brands.
To check:
- For each brand, compare Express's margin to the market margin. If Express is lower on each brand, the claim is directly correct (Story A).
- If Express is equal or higher on each brand but the aggregate is lower, it's Simpson's paradox (Story B): Express sells disproportionately more low-margin brands.
- The right comparison for Story B is margin brand by brand, not the aggregate — or an aggregate of Express's margins reweighted to the market's brand mix.
Statistically, run a one-sample t-test (or z-test, depending on sample size) on Express's brand-margin distribution against the market population mean.
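The reweighting check is mechanical. A sketch with entirely made-up inputs (brand names, margins, and mix shares are all hypothetical): per brand, Express matches or beats the market, yet its aggregate margin is lower because its sales skew toward the low-margin brand; reweighting Express's margins to the market's brand mix exposes the paradox.

```python
# Hypothetical per-brand profit margins and sales-mix shares.
express_margin = {"BrandA": 0.30, "BrandB": 0.10}
market_margin = {"BrandA": 0.28, "BrandB": 0.09}
express_mix = {"BrandA": 0.2, "BrandB": 0.8}   # Express skews low-margin
market_mix = {"BrandA": 0.7, "BrandB": 0.3}

brands = express_margin.keys()
express_agg = sum(express_margin[b] * express_mix[b] for b in brands)  # 0.14
market_agg = sum(market_margin[b] * market_mix[b] for b in brands)     # ~0.22
# Express's per-brand margins reweighted to the MARKET's brand mix:
reweighted = sum(express_margin[b] * market_mix[b] for b in brands)    # 0.24

# The aggregate says Express is worse, yet Express wins on every brand;
# at the market's mix it would beat the market aggregate.
assert express_agg < market_agg
assert all(express_margin[b] >= market_margin[b] for b in brands)
assert reweighted > market_agg
```

With real data the same three comparisons (aggregate, per-brand, reweighted) tell you whether the VP is looking at a performance problem or an assortment decision.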
▶The Right Framing in an Interview
"The aggregate metric answers a different question than the per-subgroup metric, and when those answers point different directions, the aggregate is usually wrong to act on. The right thing to report is the segmented view plus a statement about what the aggregate is picking up — usually a mix shift — so the decision-maker isn't misled."
▶Sample Mock Interview Transcript
You see that global conversion went up from 35% to 50%. But your colleague breaks it down by region and finds every region's conversion actually dropped. Is that possible?
Yes — this is Simpson's paradox. It happens when the aggregate number is a weighted average, and the weights — the subgroup mix — shift between the two time periods.
A concrete example: if traffic used to come mostly from low-converting regions and now comes mostly from high-converting regions, the average conversion rate can go up even if every region got worse on its own. The mix change dominates the within-group change.
Show me some numbers that would do that.
Sure. Suppose before the launch, 90% of traffic was Region 1 with 10% conversion, 10% was Region 2 with 50% conversion. Aggregate is 0.9 × 10% + 0.1 × 50% = 14%.
After the launch, traffic is 10% Region 1 now at 8% conversion, 90% Region 2 at 45% conversion. Aggregate is 0.1 × 8% + 0.9 × 45% = 41.3%.
Aggregate went from 14% to 41%, but Region 1's rate dropped from 10% to 8%, and Region 2's rate dropped from 50% to 45%. Every region got worse. The aggregate looks like a massive win because the mix shifted toward the historically higher-converting region.
How would you diagnose that this is what's happening, before reporting to the team?
Three things I'd always do before trusting an aggregate:
Segment. Always break down by the obvious cuts — region, platform, new vs. returning, acquisition channel. If the aggregate direction disagrees with the subgroup directions, that's a red flag.
Decompose the change. The total change can be written as a within-group effect (rate changes within each subgroup) plus a between-group effect (mix shift with rates held constant). If the between-group effect is big and has the opposite sign, the aggregate is misleading.
Compute a reweighted aggregate. Apply the old weights to the new rates — what would the global rate be if the mix hadn't shifted? If that reweighted number looks very different from the naive post-launch rate, you've confirmed the mix is doing the work.
Which number do you actually report?
Always the segmented view with the interpretation. Something like:
"Global conversion increased from 35% to 50%. But this is entirely driven by a traffic mix shift — more traffic now comes from regions that have historically higher conversion. Conversion declined in every individual region. The apparent lift is mechanical; if we'd held the regional mix constant, the aggregate would have dropped."
Reporting just "conversion is up 15 points" without the decomposition would be a serious mistake — the team would ship the launch thinking it succeeded, when it actually hurt every region.
Here's a related scenario. A new VP at Google Express thinks shampoo profit margins on Express are below the market average. You have the brand mix on Express vs. the general market, and the margin of each brand. How do you check?
Same framework, applied to a different decomposition.
Two possibilities.
Express is actually lower margin on each brand. If so, the claim is directly correct — Express is selling the same brands at worse unit economics.
Simpson's paradox. Each individual brand's margin on Express equals or exceeds the market's margin for that brand — but Express's brand mix is skewed toward low-margin brands. The aggregate looks bad but per-brand performance is fine or better.
To distinguish: compare Express's margin to the market's margin brand by brand. If Express wins or ties on each brand but loses in aggregate, the claim is misleading — Express is just a retailer with a different product mix, not a worse performer.
Statistically, I'd run a one-sample test — either a z-test or t-test depending on sample size — of Express's per-brand margin distribution against the market average. But the decomposition is what actually answers the VP's question. The statistical test is a secondary detail.
What would you tell the VP?
"Your claim is right in aggregate but potentially misleading. Per brand, Express's margins are at or above the market level. The aggregate is lower because Express sells more of the low-margin brands — that's a product-assortment decision, not a pricing or operations problem. If the goal is to raise aggregate margin, the lever is changing the assortment, not fixing per-brand pricing."
That's a completely different action plan than what they'd probably do if we just told them "yes, margins are low."
Great.
This question has a debrief tool attached. Practice it aloud with a voice-mode AI interviewer, paste the transcript, and get a graded debrief against the reference answer.
How to do a mock interview
1. Copy this question and paste it as your first message:
   > Your stakeholder shows you two findings: - "**Global** sales conversion rate **increased** from 35% to 50% after the launch." - But when they break it out by subregion, each region's conversion rate actually **decreased**. **Task**: Is this possible? Why would this happen? What framework explains it, and how would you decide which number to report? Second variant: - "A new VP at Google Express claims that Google Express's shampoo **profit margin is below the market average**." You have (1) the brand distribution of shampoo sold on Express vs. the general market, and (2) each brand's profit margin. How do you verify the claim?
2. Switch to voice mode (mic icon in the chat input). Speak through each follow-up — aim for 4–6 turns.
3. When the interviewer says "thank you, that's all I had", type or speak this:
   > Print the full transcript of our conversation as alternating "Interviewer:" and "Candidate:" lines. Include every exchange verbatim. Do not paraphrase, summarize, or skip turns. Do not add commentary.
4. Copy ChatGPT's response, paste it below, and run the debrief.