When the tests fail
Even states considered models of accountability are struggling to come up with reliable tests
RALEIGH, N.C. — North Carolina is considered ahead of the curve when it comes to holding schools accountable. So if testing troubles here have officials stymied, it doesn't bode well for other states' efforts at standards-based reform.
The testing program here has been acclaimed by Princeton Review as the best in the nation and was a model for the new federal law that requires states to begin testing students in reading and math this year, with sanctions to follow if schools don't show yearly improvement.
But devising and grading tests accurately can be a difficult process, and it seems unlikely most states will meet the requirements of the No Child Left Behind Act right away, given the problems cropping up in a range of states, including some with years of experience doing statewide testing.
Every testing snafu gives new ammunition to critics who say that reliance on standardized testing is misguided in the first place. But even if their arguments fail to change the direction of education reform, that reform could be delayed as states scramble to establish standards and tests that match up.
Embarrassed North Carolina state school board members acknowledged two weeks ago that the results of pilot writing tests for the fourth and 10th grades had to be thrown in the dumpster because more than half the students failed. This came a year after the state experienced problems with the grading scale on a new math test that resulted in nearly everyone receiving A's.
To some, the high failure rate on the writing test indicated that the wording of the questions was confusing; to others, the results simply showed that students aren't performing as well as they should be.
It was also disconcerting that nearly 30 percent of 10th-graders refused to write answers on this year's writing test, instead scribbling often-colorful versions of 'This doesn't matter, so I'm not taking it' across the top.
The past few months have seen testing problems in other states, whether they use tests created by private companies or homegrown tests such as North Carolina's, developed by experts from state universities. Some examples:
In July, Nevada officials reported that 736 sophomores and juniors had mistakenly been told they had failed the math portion of a test; when tests were rechecked, it turned out the students had passed.
In New Mexico, 70 percent of superintendents recently reported testing errors of various kinds, according to FairTest, a group in Cambridge, Mass., that objects to high-stakes testing.
In Georgia, Harcourt Educational Measurement could not deliver accurate results from last spring's Stanford 9 tests in time for this school year, throwing off students' assignments to gifted and remedial classes. The company called in several experts to help solve the problems with the tests, which were developed specifically for Georgia's third-, fifth-, and eighth-graders. School officials are considering fining the testing company.
"Broad assessments do have real value," says Dick Clifford, a researcher at the Child Development Institute in Chapel Hill, N.C. "But I worry that these mistakes will lead us away from getting the kind of information we need to make good public policy."
Testing proponents warn against overreacting. For Lawrence Feinberg, assistant director of the bipartisan National Assessment Governing Board in Washington, it's "logical" that states will have to make difficult adjustments as they assign more weight to test scores in efforts to improve education for all students.
"Whenever you have a new version of a test, and you're trying to compare it to the previous year, that's very hard to do in a uniform way," says Mr. Feinberg.
But even testing proponents acknowledge that the speed with which states are being asked to implement tests is contributing to problems.
The fact that many new state-specific tests have to be developed is putting a strain on the system, says Chrys Dougherty, director of research at the National Center for Education Accountability in Austin, Texas. "One reason we're seeing these mistakes is that [the demands on] states and testing companies are exceeding the capacity of the existing tests," he says.
But he also believes that many of these testing errors will be smoothed out as both testmakers and test-takers get used to the new routine.
Tying student performance to teacher pay, as some states do, and to school funding, as the federal law does, sometimes complicates the process, too, says Mr. Clifford. "When you put people in a position where [test results] may cost them their jobs or it may affect their pay, that increases the likelihood that there are going to be problems with the way the tests are administered," he says.
In addition to the stress of making tests more "high stakes," the sheer volume of tests that students take has prompted parents, teachers, and students to protest in recent years.
Georgia students, for example, take a battery of state-mandated exams, manufactured by three separate companies.
At the same time, the federal government's National Assessment of Educational Progress (NAEP) tests random samples of students to gather baseline information, so states can see how their students measure up against those in other states. Then there's the SAT for college applications, on the heels of the PSAT.
Today, there are signs that states are pulling away from some tests, at least temporarily.
Partly in reaction to this summer's dilemma, Georgia's state school board voted to make the Stanford 9 optional as an educational assessment tool for local school districts.
The state is looking to other testing mechanisms to fulfill the requirements of the No Child Left Behind Act.
In the wake of problems in North Carolina, officials are considering postponing the implementation of the statewide writing assessment by two years. They expect their reading and math testing to continue, which will satisfy the federal law, but they still foresee the possibility of more bumps in the road.
"Other states like to look to North Carolina for advice on testing, but I can only say that they'll have to figure out a lot of this on their own," says Lou Fabrizio, North Carolina's test czar at the Education Department.
"One thing seems certain here," he says. "Things change whenever tests all of a sudden become part of high-stakes accountability."
Idaho may best be known for the Sawtooth Mountains and spuds, but soon the Gem State will have another notch in its belt: It's about to become the first state in the US to trade in its No. 2 pencils for "smart tests."
Starting this fall, Idaho public schools will rely on a new generation of testing technology. The computerized tests will adapt questions to what a student knows and they'll return the results the next day.
After decades of research, a number of small companies are now producing software that they claim can grade writing tests with more accuracy than a teacher.
"It sounds like 'Star Trek,' but these tests are actually being widely used today," says Scott Elliott, a spokesman for Vantage Learning, an East Coast firm that uses "artificial intelligence" technology not only to grade writing, but also to give pointers to students instantaneously.
While computerized instruction is a controversial topic in American schools, the idea of using computers in assessment is gaining adherents from Los Angeles to Boston.
"It's an extremely high priority right now to improve how well we measure students' abilities," says Chrys Dougherty, research director for the National Center for Education Accountability in Austin, Texas. "Think how important it is for the economy to have accurate business accounting. And look at the catastrophe that occurs when we don't."
In Idaho this fall, students will log onto computerized tests that can pinpoint each one's achievement level almost exactly, by automatically adjusting the difficulty of the questions according to how well each student answers.
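For readers curious about the mechanics, the idea can be illustrated with a toy "staircase" routine in Python. This is only a sketch under simple assumptions, not NWEA's actual method: the function name, the 1-to-10 difficulty scale, and the simulated student are all hypothetical.

```python
def run_adaptive_test(answers_correctly, num_questions=10, start_level=5,
                      min_level=1, max_level=10):
    """Toy staircase test: harder after a correct answer, easier after a miss.

    answers_correctly(level) -> bool is a hypothetical stand-in for a
    student answering one question at the given difficulty level.
    """
    level = start_level
    history = []  # (difficulty asked, whether the answer was correct)
    for _ in range(num_questions):
        correct = answers_correctly(level)
        history.append((level, correct))
        if correct:
            level = min(max_level, level + 1)  # step up, capped at the top
        else:
            level = max(min_level, level - 1)  # step down, floored at the bottom
    return level, history


# Hypothetical student who can handle questions up to difficulty 7.
final_level, history = run_adaptive_test(lambda level: level <= 7)
print(final_level, history)
```

In this sketch the difficulty climbs until the simulated student starts missing questions, then hovers around the boundary between what the student can and cannot answer. That boundary is the rough estimate of achievement level.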
"Like people mark the growth of a child by marking their height on a doorjamb in the garage, we've created a tape measure that allows us to identify how tall a child is academically every year, and to calculate the number of inches of growth in math, in reading, in language, and in science," says Allan Olson, president of the Northwest Evaluation Association (NWEA) in Portland, Ore., which created the Idaho test.
In NWEA's case, the programs are based on nearly two decades of research, using studies done by the military and universities to create software that can instantly gauge a child's achievement level.
Getting test data in a fraction of the time it takes to grade paper tests isn't just manna for states trying to abide by the new testing requirements under the No Child Left Behind Act. It also will let teachers instantly figure out which students are struggling in each class and let principals know which teachers are having trouble in specific areas of instruction.
Instant results, proponents say, can eliminate the "test lag" that occurs when students get tests back long after they've forgotten what they wrote.
"You'd hate to get on a scale in the bathroom and three months later get your weight back," says Mr. Dougherty. "These new tests eliminate those kinds of problems."
Companies that make computerized tests also say the tests tend to be cheaper to administer and grade.