Who's scoring those high-stakes tests? Poorly trained temps.

By Cameron Fortner

September 18, 2001

Growing up in California's public schools, I took more standardized tests than I can remember: Teachers at every grade level stressed their importance. I didn't want to let anyone down, so I approached each test with all the solemnity and effort a child can muster.

I never questioned that obedience. As a child, I imagined my test answers being flown across the country to a room of educated, professional test scorers who possessed a zeal for essays written on such topics as "A moment that changed my life."

My summer as a test scorer disabused me of that notion. As a recent college graduate, I worked at a Boston testing company, and instead of the professionals I'd envisioned painstakingly grading exams, I found a room full of temporary employees who had little respect for - and minimal investment in - their jobs.

It was my first assignment after registering with a temporary-employment agency in June. The agency placed me with a grading company that, for a fee, scores exams and summarizes the results. My job was to score the essay portion of a test taken last spring by eighth-grade students across the United States.

Before I began working, I attended a three-day training seminar during which I studied a scoring rubric, learned a numerical scoring system, and read hundreds of sample essays graded by experts.

Several other temps and I read the same essays, scored them, and compared our results. Often there were wide disparities between our scores, and it surprised me that these differences decreased only slightly as the training progressed. It surprised me even more to learn that those disparities were acceptable. We were told - by trainers who were themselves temps - that our scores needed to be only within one point of the standard score on a five-point scale. This meant that if an essay "should" have gotten a three, giving that student a two or a four was close enough.

All of the scorers - even all the supervisors - were temps, some with poor English skills, and many without a college degree. Perhaps the grading system seemed subjective because most of us had no background in education, were minimally trained, and therefore weren't well qualified to evaluate standardized tests.

It's no wonder our scores differed to such an alarming extent. The turnover rate was high because our pay was low, and many people left after finding higher-paying jobs. In addition, many of my fellow scorers were young college students, some of whom were scoring essays written by students only a few years their junior.

Some scorers had been there for months before I arrived, and the work environment lacked any sense of purpose or professionalism. One of my co-workers typically arrived late. She spent her first few hours sending e-mail and surfing the Web, another hour reading fashion magazines, and finally a few minutes scoring tests before leaving the office for a leisurely lunch.

Another colleague was an aspiring writer. He would typically score a few tests, work on his novel, score a few more tests, and then go back to writing.

The test I scored asked students to write about an experience that taught them something important or somehow changed their lives. Though I occasionally encountered blank booklets or profanity scrawled across the page, most of the essays were honest attempts to write well.

Of the thousands of essays I read, my favorite was by a boy who described his experience swimming across a lake with two friends. The three boys were bored one summer day, he wrote, and one friend had an idea to pass the time by swimming toward town. This eighth-grade author eloquently described his ambivalence - he wasn't athletic and had little confidence in his swimming ability, but hated to give up in front of his friends. Finally, his friends promised to swim beside him and help him out if he got tired, and the three crossed the lake successfully. The boy described the moment when he reached the distant shore - the feel of slimy moss between his toes, the cool, hard rocks beneath him, and the squishy-soft grains of sand on his feet. He'd conquered his fears and gained greater self-confidence.

I was happy to see that this boy had taken the test seriously; I was moved by his earnest, vivid account of triumph, of a mundane moment made memorable.

But I also felt, in some vague way, as if these students had had the wool pulled over their eyes. These essays, which inspired careful penmanship, the pinnacles of eighth-grade vocabulary, and serious reflection on the meaning of life, were being graded by temps who were more invested in the magazines they'd brought with them than in the essays they were paid to read.

After my summer spent grading, I'm wary of using standardized test scores to gauge academic progress. It angers and bewilders me to think of committees using test scores to decide which schools are "failing," how much funding school districts deserve, and how teachers' salaries should be adjusted on the basis of their students' scores.

If most standardized tests are scored in an environment like the one I saw this summer, then I don't put much stock in the results.

Cameron Fortner graduated from Stanford University in 2000 and is currently biking across the United States.
