Measure What Matters

EXCERPT: An expert on school reform assesses the state of assessment in public education.

John Merrow ’63 | Nov - Dec 2017

The maxim “Measure What Matters” could be a bumper sticker, because it’s unambiguous: First, figure out what we care about in education and then develop ways of measuring those skills. While that sounds simple, it might not be easy: It requires us to define just what ‘success’ means when it comes to schooling. What do we want our children and grandchildren, and other people’s children and grandchildren, to be good at? It would be short-sighted to think only of those we share blood ties with because other people’s children and grandchildren will, as adults, be fixing the gas leak at your neighbor’s home, monitoring your IV drip in the hospital, and maintaining the jet engines on the planes you fly on.

Right now most school systems do the opposite of “Measure What Matters,” valuing instead what’s easy (and inexpensive) to measure. But it may be about more than saving money, because the strong supporters of ‘School Reform’ tend to endorse and support what I call the “test and punish” approach, with teachers as the target. For these folks, cheap tests are just fine. This policy of judging teachers based almost solely on student test scores has poisoned learning by turning it into a “gotcha” game.

A healthier approach would be “assess to improve,” with assessment being a tool to help both students and teachers get better. The contrast between “assess to improve” and “test and punish” could not be starker.

Whenever anyone talks seriously about redesigning schools, opponents of change drag out a familiar straw man, “Maybe everyone hates these tests, but how will we measure academic progress if we abandon testing?” No serious proponent of change is opposed to evaluating learning. While academic learning must not be understood as the only goal, its importance cannot be overlooked or dismissed. Yes, the unwarranted emphasis on costly and time-consuming standardized testing distorts the process of teaching and learning in schools, but the public and parents have the right to know what students are learning. We must have measures of learning—that is, tests—but they must be valid, reliable measures of genuine learning.[i]

The multiple-choice questions on standardized, machine-scored tests are not designed to measure diligence, honesty, tolerance, fairness, and compassion, which are the values and attitudes that parents repeatedly say they want their children to possess. Parents want their kids to be well rounded; to develop the skills they need to continue learning on their own; and to become good citizens, productive workers, and fulfilled human beings. Most employers would probably agree. So, if those attributes, skills and behaviors matter, we must determine how to measure them.

My argument is that a new approach must seek to find out how each child is intelligent, replacing the current question, “How smart are you?” Today’s system uses tests (and race and parental income) to sort children into groups (which are not but might as well be called ‘winners’ and ‘losers’). We need to determine what matters most and then—and only then—develop measures to assess progress toward those outcomes. Or, as some advocates express it, we need to measure what we value, instead of valuing what we measure.

Too Much Money . . . and Too Little
Paradoxically, public education spends too much money overall and too little on testing and assessment. Basically, our school districts and states buy lots of cheap tests and administer them to every student in grades 3 through 10, and sometimes in the grades above and below those. The bill for this comes to tens of billions of dollars a year, according to FairTest. How much varies widely from state to state. “We find that the 45 states from which we obtained data spend a combined $669 million per year on their primary assessment contracts, or $27 per pupil in grades 3-9, with six testing vendors accounting for 89 percent of this total. Per pupil spending varies significantly across states, with Oregon ($13 per student), Georgia ($14), and California ($16) among the lowest-spending states, and Massachusetts ($64), Delaware ($73), and Hawaii ($105) among the highest spending,” wrote Matthew Chingos in a 2012 report.[ii]

But to put the highest number, Hawaii’s $105, in perspective, consider how much you might spend having your car “assessed” every year. I drive a 2002 Toyota 4Runner (which I bought used in 2010 for $12,000), and the annual tune-up costs about $200. Ironically, I bought the car in California, which spends less than $20 per child on state assessments—not even 10 percent of what I spend assessing my car’s condition!

Testing companies, which know that there’s no good reason to test every child every year, must be laughing all the way to the bank. Sampling works in education just as it works in politics and marketing, and school districts could save billions by testing stratified random samples of students every year, then perhaps testing all children every three or four years. Teacher judgments are reliable and valid, and we used to trust them; now we no longer do (another consequence of our addiction to school reform). If we can learn to trust sampling and spend our money on good assessments, we will have better data, teachers and students will have more time for learning, and our students will no longer be the most-tested kids in the world.
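For readers curious what a “stratified random sample” looks like in practice, here is a minimal sketch. It assumes a made-up roster of 300 students in each of grades 3 through 8 and a 10 percent sampling rate; the roster, the rate, and the function name `stratified_sample` are illustrative assumptions, not figures or methods from any district.

```python
import random
from collections import defaultdict

def stratified_sample(students, stratum_of, fraction, seed=0):
    """Draw the same fraction of students at random from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    strata = defaultdict(list)
    for student in students:
        strata[stratum_of(student)].append(student)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical roster: 300 students per grade in grades 3 through 8.
roster = [
    {"id": f"g{grade}-{n}", "grade": grade}
    for grade in range(3, 9)
    for n in range(300)
]

# Stratify by grade so every grade is represented in proportion.
tested = stratified_sample(roster, lambda s: s["grade"], fraction=0.10)
print(len(tested), "of", len(roster), "students tested")
```

In this sketch grade is the only stratum. A real design would presumably also stratify by school and by student characteristics, which is a choice for statisticians rather than something this example settles.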

Questionable Questions
Experiencing the vapidity of some test questions may help you grasp the problems with testing and understand why American students score lower than their counterparts in most other advanced nations. The first sample problem was offered by the University of Wisconsin–Oshkosh to high school math teachers and was designed to help “close the math achievement gap.”

Jack shot a deer that weighed 321 pounds. Tom shot a deer that weighed 289 pounds. How much more did Jack’s deer weigh than Tom’s deer?

Are you kidding me: Basic subtraction for high school students? My second example came from TeacherVision, part of Pearson, the giant testing company:

Linda is paddling upstream in a canoe. She can travel 2 miles upstream in 45 minutes. After this strenuous exercise she must rest for 15 minutes. While she is resting, the canoe floats downstream ½ mile. How long will it take Linda to travel 8 miles upstream in this manner?

This question’s premise is questionable. Will some students be distracted by Linda’s cluelessness? Won’t they ask themselves how long it will take her to figure out that she should grab hold of a branch while she’s resting in order to keep from floating back down the river? What’s the not-so-subtle subtext? That girls don’t belong in canoes? That girls are dumb?
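Taken at face value, the arithmetic can be sketched in a few lines. The one assumption added here is that the trip ends the moment Linda reaches the 8-mile mark, so she takes no rest (and does not drift back) after her final stint of paddling.

```python
# Linda's trip, stint by stint, using the figures in the question.
PADDLE_MIN, PADDLE_GAIN = 45, 2.0   # 2 miles upstream per 45-minute stint
REST_MIN, REST_DRIFT = 15, 0.5      # drifts back 1/2 mile per 15-minute rest
GOAL_MILES = 8.0

position, minutes = 0.0, 0
while True:
    position += PADDLE_GAIN
    minutes += PADDLE_MIN
    if position >= GOAL_MILES:      # assumption: trip ends on arrival, no final rest
        break
    position -= REST_DRIFT
    minutes += REST_MIN

hours, leftover = divmod(minutes, 60)
print(f"{hours} hours {leftover} minutes")  # 4 hours 45 minutes
```

Each full hour of paddling and resting nets 1.5 miles, so four cycles bring her to 6 miles, and one last 45-minute stint covers the remaining 2.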

I found the sample question below on a high school math test in Oregon:

There are 6 snakes in a certain valley. The population doubles every year. In how many years will there be 96 snakes?
a. 2
b. 3
c. 4
d. 8

This one you can solve by counting on your fingers, but all three high school math problems require simple numeracy at most. With enough practice—note I did not say critical thinking—just about anyone can solve undemanding problems like these and consequently feel confident of their ability.

School is supposed to be preparation for life, but spending time on problems like these three is like trying to become an excellent basketball player by shooting free throws all day long. To be good at basketball, players must work on all aspects of the game: jump shots, dribbling, throwing chest and bounce passes, positioning for rebounds, running the pick-and-roll, and, occasionally, practicing free throws.

Both basketball and life are about rhythm and motion, teamwork and individual play, offense and defense. Like life, the pace of the game can slow down or become frenetic. Basketball requires thinking fast, shifting roles, and having your teammates’ backs. Successful players know when to shoot and when to pass. As in life, failure is part of the game. Even the greatest players miss more than half of their shots, and some (even Michael Jordan!) are cut from their high school teams. And life doesn’t give us many free throw opportunities. If school is supposed to be preparation for life, why are American high school students being asked to count on their fingers? This sort of trivial work is the educational equivalent of shooting free throws.

My fourth example was a Common Core National Standards question for eighth graders in New York State. Keep in mind that the Common Core was supposed to introduce “much-needed rigor” to the curriculum.

Triangle ABC was rotated 90° clockwise. Then it underwent a dilation centered at the origin with a scale factor of 4. Triangle A′B′C′ is the resulting image. What parts of A′B′C′ are congruent to the corresponding parts of the original triangle? Explain your reasoning.

This problem represents the brave new world of education’s Common Core, national standards adopted at one point by nearly every state and the District of Columbia but now very much out of favor. This approach was supposed to expose students to higher and more rigorous standards and to challenge and engage students. Reading that prose, are you feeling engaged? Imagine how eighth graders might feel. If the first three problems are the educational equivalent of practicing free throws, then solving problems like this one is akin to spending basketball practice taking trick shots, like hook shots from midcourt—yet another way not to become good at the sport.

If schools stick with undemanding curricula and boring questions, our kids will be stuck at the free throw line, practicing something they will rarely be called upon to do in real life. If (under the flag of “greater rigor”) we ditch those boring questions in favor of triangles and other lifeless questions, schools will turn off the very kids they are trying to reach: the 99 percent who are not destined to become mathematicians.

My fifth example was given to fifteen-year-olds around the world on a test known as PISA (Programme for International Student Assessment):

Mount Fuji is a famous dormant volcano in Japan. The Gotemba walking trail up Mount Fuji is about 9 kilometers (km) long. Walkers need to return from the 18 km walk by 8 p.m.

Toshi estimates that he can walk up the mountain at 1.5 kilometers per hour on average, and down at twice that speed. These speeds take into account meal breaks and rest times.

Using Toshi’s estimated speeds, what is the latest time he can begin his walk so that he can return by 8 p.m.?

Note that this is not a multiple-choice question. To get the correct answer, students have to perform a number of calculations. The correct answer (11 a.m.) was provided by 55 percent of the Shanghai fifteen-year-olds but just 9 percent of the U.S. students.
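The “number of calculations” can be laid out step by step. This is a sketch that uses only the figures given in the question; the variable names are mine.

```python
# Worked arithmetic for the Mount Fuji question.
TRAIL_KM = 9.0                      # one-way length of the Gotemba trail
UP_SPEED_KMH = 1.5                  # Toshi's estimated uphill speed
DOWN_SPEED_KMH = 2 * UP_SPEED_KMH   # "twice that speed" on the way down
RETURN_HOUR = 20                    # 8 p.m. on a 24-hour clock

hours_up = TRAIL_KM / UP_SPEED_KMH        # 6 hours to climb
hours_down = TRAIL_KM / DOWN_SPEED_KMH    # 3 hours to descend
latest_start_hour = RETURN_HOUR - (hours_up + hours_down)

print(latest_start_hour)  # 11.0, i.e. 11 a.m.
```

Nothing here goes beyond division and subtraction. The difficulty lies in working out which calculations to perform, and in what order.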

Ironically, the PISA results generally reveal that American kids score high in confidence in mathematical ability, despite underperforming their peers in most other countries. I wonder if their misplaced confidence is the result of too many problems like the one about the snakes.

In addition to being more challenging, PISA and other international tests are given to a carefully drawn sample of students. Administering standardized tests to every student in grades 3 through 8 plus grade 10—which is what current U.S. laws require—is unnecessary and wasteful. Ask yourself who benefits when schools test all kids. Not students, not teachers, and not the general public.

For most students, March, April, and May are the cruelest months. That’s testing season, when learning and teaching stop in most schools, and test prep begins in earnest. Actually, in some schools (most of them in low-income areas), it’s test prep pretty much all year long. Those who worship at the altar of “School Reform” are obsessed with measurement and testing rules, believing that test results will help weed out poor teachers. Districts spend tens of billions of dollars a year buying, preparing for, administering, and grading standardized tests.

That obsessive focus must be overthrown, but even those who believe our current system of measuring achievement does far more harm than good must also agree that we cannot simply abandon testing. Instead, we have to find better ways of measuring learning. Good assessments will help students learn more, help teachers do a better job of teaching, and enable systems to make judgments about schools and the adults in them.

There is good news. The test-centric situation shows signs of changing in at least three promising ways:

1) Some school districts have cut back on the number and frequency of standardized tests.

2) A few districts have embraced testing only a stratified random sample of students, recognizing that the results provide an accurate picture of the health of the system.

3) And—the most promising sign of all—many students are simply refusing to take some of the mandated standardized tests. The so-called ‘Opt Out’ movement has resulted in more than 20 percent of all high school students in one state—New York—refusing to sit for certain tests. In some high schools in New York and elsewhere as many as 80 and 90 percent of kids have opted out, this despite a federal rule requiring 95 percent participation. That’s a clear message that the consumers are fed up. It’s a message that adults ignore at their peril.

Adapted from Addicted to Reform: A 12-Step Program to Rescue Public Education (The New Press, 2017), and reprinted with permission. John Merrow spent 41 years covering public education for The PBS NewsHour and NPR. During his career he received two George Foster Peabody Awards, the George Polk Award, and the McGraw Prize. © John Merrow 2017

[i] ‘Validity’ and ‘reliability’ are fundamental building blocks. A valid test measures what it’s supposed to measure, and a reliable test can be counted on to measure it consistently.

[ii] Chingos, “Strength in Numbers.”
