Can Computerized Grading Replace Traditional Writing Assessments?

by Jill Rooney, Ph.D.

June 27, 2012

Ah, grading: the scourge and delight of the college instructor’s existence. At certain points of the semester, as piles of papers reeking of desperation and panic accumulate on my desk, I often despair of ever seeing daylight again, of once again spending time with my husband, sleeping, or having a weekend free to do things like clean the house, fold laundry, and grocery shop: all the things I neglect when rushing to return papers to students in a reasonable amount of time.

Thankfully, there are always a few papers that teach me new and fascinating information and analyses that promise to set the world of historical knowledge on fire with their ingenuity and insight. In recent years, for example, I learned that the Manhattan Project “was a plan to clean up the city,” that anti-Semitism “was not good for the Jews” in Nazi Germany, and that Elizabeth and Katie Stanton were leaders in the women’s rights movement (Katie Stanton apparently being The Lost Suffragette). How could I give up the opportunity to learn these fascinating and heretofore unknown facts?

Fairly easily, some seem to think, as computerized grading is more and more frequently proposed as a way to improve student assessment. Certainly, computerized grading is not new: where would standardized tests be without bubble forms and computer programs that rely on multiple-choice questions? What’s different this time is that new programs promise to evaluate student responses on essays, which are a completely different animal. Even so, Educational Testing Service (ETS), the enormous education prep and testing company responsible for administering the AP, CLEP, GRE, TOEFL, and Praxis exams for millions of students every year, recently announced that its computers can score 16,000 samples of writing in 20 seconds. That’s MUCH better than I can do even on a good day, when I’ve had a lot of caffeine and neglect to shower or answer the phone.

The Pros of Computer Grading

Computer grading programs were created by software developers who fed thousands of traditionally graded essays into computers, extracted the key elements that human graders had rewarded in high-scoring essays, and wrote programs that look for those same elements in ungraded essays, assigning each one a score based on how many components of a high-scoring essay it contains. Thus, as explained by Molly Bloom on National Public Radio (NPR)’s Morning Edition,

“If human graders give essays with long sentences high marks, for example, the programs will tend to do so, as well. If human graders like big words, the programs will also, say, ‘manifest a tantamount predilection for meretricious vocabulary.’”
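To make that recipe concrete, here is a deliberately tiny sketch in Python. It is not ETS’s software or any vendor’s actual method; the two features (average sentence length and average word length), the nearest-average scoring rule, the function names, and the four sample “essays” are all my own assumptions, chosen only to illustrate the general idea of learning from human-graded essays and then scoring new ones by surface features.

```python
import re
from statistics import mean

def features(essay):
    """Return (average words per sentence, average letters per word)."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[A-Za-z']+", essay)
    return (len(words) / len(sentences), mean(len(w) for w in words))

def train(graded):
    """graded: list of (essay, human_score) pairs.
    Returns {score: average feature values of essays given that score}."""
    by_score = {}
    for essay, score in graded:
        by_score.setdefault(score, []).append(features(essay))
    return {score: tuple(mean(col) for col in zip(*vecs))
            for score, vecs in by_score.items()}

def predict(model, essay):
    """Assign the human score whose average features are closest."""
    f = features(essay)
    return min(model,
               key=lambda s: sum((a - b) ** 2 for a, b in zip(f, model[s])))

# Hypothetical, hand-made training set: (essay text, score from a human grader).
graded = [
    ("The war was bad. It hurt people. Many died.", 2),
    ("War is sad. It is long. We lost.", 2),
    ("The protracted conflict devastated civilian populations throughout "
     "the continent, producing unprecedented humanitarian consequences.", 5),
    ("Industrial mobilization fundamentally transformed domestic economies, "
     "accelerating technological innovation across numerous manufacturing "
     "sectors.", 5),
]
model = train(graded)

plain = "Cats are nice. Dogs are fun."
nonsense = ("Meretricious predilections manifest tantamount obfuscation "
            "notwithstanding incontrovertible circumlocution.")

print(predict(model, plain))     # 2
print(predict(model, nonsense))  # 5
```

Notice that the second test sentence means nothing at all, yet this toy scorer hands it the top mark because it is long and stuffed with big words. The sketch never reads for content, which is exactly the behavior Bloom describes.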

Computerized grading of writing assignments has many advocates, because traditional grading is labor intensive, time consuming, and can be very subjective. Those who believe that computerized grading can revolutionize education argue that it:

  • Costs less than professional faculty because it cuts down on the time required to complete the task.
  • Takes less time and guarantees more consistent standards because they are applied uniformly to all essays.
  • Provides instant and personalized feedback that students can use to improve their work.

The Cons of Computer Grading

However, critics point out that there are significant drawbacks to computerized essay grading that actually undermine the goals of writing assignments. Les Perelman, director of the student writing program at MIT, also told NPR that one of his concerns is rooted in the way that the programs were designed. Savvy students can simply throw in the elements that the computer scores highly, without paying attention to the larger logic or argumentation of the essay. Perelman argued, according to Bloom, that “it’s possible for students to score an A on a computer-graded essay simply by combining all the elements of an essay that would be scored highly by a human grader.”

In addition, Will Fitzhugh, founder of The Concord Review and the National Writing Board, argues, “The new computer-scoring programs don’t waste any time thinking about the content of the work they are evaluating, and, in their rush to do a lot of writing ‘assessments’ real fast and very cheaply, perhaps those promoting those programs don’t spend a lot of time on that part, either.” He believes that the computer-grading process may encourage teachers to assign shorter papers and essays, discouraging the development of extended critical thinking and writing skills. He writes, “Thus, students’ greatest writing efforts in high school could be devoted to their 500-word ‘college essay,’ instead of, for example, the 4,000-word extended essay they would need to write for an International Baccalaureate diploma.” He contrasted the short essay and computer grading process with the importance of spending time thinking, citing the example of the Pulitzer Prize-winning historian David McCullough, who said that people always ask him how long he takes to write a book but fail to ask “how much time he spends thinking.”

Why Traditional Grading is Still Necessary

As a professor with more than 20 years of teaching (and grading) experience, I think we lose focus on the real role that teachers play in a student’s life when we hand over to technology the most personal aspects of education, such as the interpretation and evaluation of students’ thoughts and ideas. The most important reason for traditional grading, as I see it, is that grading student work is absolutely the best way to determine the strengths and weaknesses of students’ learning experiences, their individual voices, and in some cases, important issues that affect their health and well-being. For example, when my student misheard “Elizabeth Cady Stanton” as “Elizabeth and Katie Stanton,” I learned not only that she had not completed her reading assignments (which clearly spell out Stanton’s name), but also that she was having difficulty hearing class lectures and discussions.

Also, having spent a few summers as a Reader scoring Advanced Placement (AP) History exams for high school students, I can honestly report that the grading by hand, while relatively quick (perhaps 10 minutes per question), is very thorough. Sitting at tables with 8 or so experienced educators, we examine and debate the scoring rubrics for each question, share assessment problems that come up, and look for consensus on the answers through discussion and double-scoring, in which each student answer is scored twice by different Readers. When we notice a particular trend or problem in a group of essays, we highlight it, and it is communicated to the schools. I trust this method of evaluation, which is holistic and qualitative and gives credit for more than computerized checklist items, as a more accurate barometer of a student’s academic ability.

Finally, how many students who use their written assignments as cries for academic help will be lost when we turn grading over to the indifferent calculations of computer programs? If I simply examine a printout of computerized scores, and don’t look at the papers in detail, where will I see the little asides and notes, the knowledge gaps and grammatical miscommunications, which tell me where the holes in my instruction are and where the students need extra content reinforcement? And how on earth will we really be able to get a handle on how (not how much) individual students learn, if we don’t actually examine the products of that learning in qualitative as well as quantitative ways?

We won’t, actually. To me, that’s a failure, and I don’t need a computer to explain that to me.

Follow Jill Rooney, Ph.D. and join the conversation on Twitter and Google+.
