
Can You Game the Reading Comprehension?

We tested four popular answer-choice heuristics against 10,340 real LSAT answer choices. The exam won.

310 passages · 2,068 questions · 4 analyses · March 2026

LSAT students trade test-taking "tricks" like folk remedies: pick the longest answer. Avoid extreme language. The correct answer paraphrases; traps copy verbatim. These heuristics feel plausible. Some tutors teach them as legitimate strategies.

We ran four empirical analyses across every Reading Comprehension question in our database to find out if any of them actually work.

The short answer: no. But the details are interesting.

Analysis I

"Pick the longest answer"

The most persistent myth in standardized testing. Does it hold up?

Verdict: Myth (not supported)
Correct mean length: 15.9 words
Incorrect mean length: 15.7 words
Effect size: d = 0.02, negligible (small-effect threshold: 0.20)
Significance: p = 0.42, not significant

Correct and incorrect answers are virtually identical in length. The difference is 0.2 words. The LSAT's item writers clearly control for answer length.

Accuracy by length quintile, if you always pick the longest answer:

Longest: 20.4%
2nd: 20.1%
3rd: 22.1%
4th: 19.6%
Shortest: 17.8%

The shortest answer is slightly less likely to be correct (17.8% vs 20% expected). But the longest? Dead average. This heuristic gives you nothing.
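Evaluating a heuristic like this amounts to a simple backtest. A minimal sketch, assuming each question is represented as a list of (choice_text, is_correct) pairs; this data structure is illustrative, not our actual schema:

```python
def longest_answer_accuracy(questions):
    """Fraction of questions where the longest choice (by word count) is correct.

    `questions` is a list of questions; each question is a list of
    (choice_text, is_correct) tuples, one per answer choice.
    """
    hits = 0
    for choices in questions:
        # Pick the choice with the most words; ties go to the first one seen.
        longest = max(choices, key=lambda c: len(c[0].split()))
        hits += longest[1]
    return hits / len(questions)

# Toy example: the longest choice is correct in 1 of 2 questions.
demo = [
    [("Short wrong", False), ("A much longer correct answer choice", True)],
    [("The single longest but wrong choice here", False), ("Right", True)],
]
```

Running the same scorer with `max` swapped for `min` gives the "always pick the shortest" baseline.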

Analysis II

"Pick the hedged, conservative answer"

Correct answers use "some" and "may." Wrong ones say "all" and "never." Right?

Verdict: Real but negligible
Correct answers containing a hedge: 22.1%
Incorrect answers containing a hedge: 17.9%
Effect size: d = 0.15, below the "small" threshold
Significance: p < 0.001 (significant, but tiny)

This one is technically true: correct answers do use slightly more hedging language. But the effect is so small it's useless in practice. Picking the "most conservative" answer gives you 23.1% accuracy — barely above the 20% random baseline.

One sub-type stands out: Local Inference questions show d = 0.25 (p < 0.001) — the only question type where the signal clears the "small effect" threshold.

Analysis III

"Correct answers paraphrase; traps copy verbatim"

Perhaps the most widely taught LSAT heuristic: if a choice lifts exact words from the passage, it's a trap.

Verdict: Not confirmed
Correct word overlap: 47.4% of choice words found in the passage
Incorrect word overlap: 46.6%
Effect size: d = 0.03, negligible
Significance: p = 0.17, not significant

The paraphrase heuristic is not supported by the data. Correct and incorrect answers share almost exactly the same proportion of words with the passage. The direction is opposite to the myth: correct answers have marginally more overlap, not less.

Analysis IV

Semantic similarity to the passage

Beyond raw words: do correct answers live in a distinct semantic neighborhood?

Negligible overall, one exception

We built TF-IDF vectors for all 10,650 documents (310 passages + 10,340 choices) and computed cosine similarity between each answer choice and its passage.

Correct cosine similarity: 0.182
Incorrect cosine similarity: 0.173
Effect size: d = 0.06, negligible
Clustering: none (correct answers are not semantically distinct from the distractors)

Overall: no signal. But broken down by question type, one standout emerges.

Effect size by question type (Cohen's d):

Main Point: 0.37 ***
Purpose Ref.: 0.18
Function: 0.09
Inference: 0.04
Detail: 0.03
Prim. Purpose: 0.01

The one real finding

Global Main Point questions (n = 252) show a small-to-medium effect: correct answers are significantly more semantically similar to the passage (d = 0.37, p < 0.001). This makes intuitive sense — the main point should closely mirror the passage's vocabulary. But this is one question type out of twelve.

The full picture

Heuristic | Cohen's d | p-value | Verdict
"Pick the longest" | 0.02 | 0.42 | Myth
"Pick the hedged/conservative" | 0.15 | < 0.001 | Negligible
"Paraphrase = correct" | 0.03 | 0.17 | Not confirmed
"Most similar to passage" | 0.06 | 0.01 | Negligible

The LSAT's answer-choice engineering is remarkably good at neutralizing surface-level shortcuts.

None of the four heuristics produces an effect size above the 0.20 threshold for a "small" effect. The best performer, conservativeness, delivers 23.1% accuracy, barely clearing the 20% random-guess rate.

The only reliable approach is understanding the passage.

Methodology

How we measured this

Dataset: 310 LSAT RC passages containing 2,068 questions with 5 answer choices each (10,340 total choices).

Conservativeness score: (hedge word frequency − extreme word frequency) / total words. Hedge words include some, may, might, could, often, generally, tends. Extreme words include all, every, always, never, none, only, must.
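The formula above translates directly into a short function. A minimal sketch, assuming whitespace tokenization with no punctuation stripping; the real pipeline's tokenizer may differ:

```python
HEDGES = {"some", "may", "might", "could", "often", "generally", "tends"}
EXTREMES = {"all", "every", "always", "never", "none", "only", "must"}

def conservativeness(text):
    """(hedge word count - extreme word count) / total words."""
    words = text.lower().split()
    if not words:
        return 0.0
    hedges = sum(w in HEDGES for w in words)
    extremes = sum(w in EXTREMES for w in words)
    return (hedges - extremes) / len(words)
```

Positive scores mark hedged choices, negative scores mark extreme ones; "pick the most conservative" means choosing the choice with the highest score.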

Word overlap: Overlap ratio = (choice content words found in passage) / (total choice content words). Jaccard similarity also measured. Stop words excluded.
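Both overlap metrics reduce to set operations over content words. A minimal sketch with a deliberately abbreviated stop-word list (the full list used in the analysis is longer):

```python
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "that"}  # abbreviated

def content_words(text):
    """Lowercased whitespace tokens with stop words removed."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}

def overlap_ratio(choice, passage):
    """Share of the choice's content words that also appear in the passage."""
    c, p = content_words(choice), content_words(passage)
    return len(c & p) / len(c) if c else 0.0

def jaccard(choice, passage):
    """Intersection over union of the two content-word sets."""
    c, p = content_words(choice), content_words(passage)
    return len(c & p) / len(c | p) if c | p else 0.0
```

The overlap ratio is asymmetric (normalized by the choice, not the passage), which is what the "verbatim copying" myth cares about: how much of the choice was lifted from the passage.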

TF-IDF similarity: Custom TF-IDF vectors across all 10,650 documents, L2-normalized, cosine similarity computed per choice-passage pair.
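The custom vectorizer follows the standard TF-IDF recipe. An illustrative pure-Python sketch using sparse dict vectors and smoothed IDF; the production implementation may weight and smooth differently:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build L2-normalized TF-IDF vectors (as term->weight dicts) over a corpus."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency of each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    # Smoothed IDF so terms appearing in every document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors

def cosine(u, v):
    """Dot product of two L2-normalized sparse vectors = cosine similarity."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())
```

With all 10,650 documents vectorized in one pass, each choice-passage similarity is a single sparse dot product.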

Statistical tests: Welch's t-test, Cohen's d. Thresholds: negligible (< 0.20), small (0.20–0.50), medium (0.50–0.80), large (> 0.80).
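Both statistics have closed forms. A minimal sketch using only the standard library; the p-value lookup (a t-distribution CDF, e.g. via scipy) is omitted here:

```python
import math
from statistics import mean, variance  # sample variance (n - 1 denominator)

def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def welch_t(a, b):
    """Welch's t statistic (no equal-variance assumption)."""
    na, nb = len(a), len(b)
    se = math.sqrt(variance(a) / na + variance(b) / nb)
    return (mean(a) - mean(b)) / se
```

The thresholds quoted throughout (negligible < 0.20, small 0.20-0.50, medium 0.50-0.80, large > 0.80) are the conventional Cohen's d bands.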