Open Access News

News from the open access movement

Saturday, August 04, 2007

Color-coding Wikipedia entries by trustworthiness

Brock Read, Software Weighs Wikipedians' Trustworthiness, Chronicle of Higher Education blog, August 3, 2007. 

The problem with Wikipedia, as most scholars see it, isn’t that the site lacks credible information. There’s plenty of good stuff in the encyclopedia; it’s just that there’s no easy way to separate the wheat from the chaff.

Researchers at the University of California at Santa Cruz are trying to make that process simpler. They’ve designed software that color-codes Wikipedia entries, identifying those portions deemed trustworthy and those that might be taken with a grain of salt.

To determine which passages make the grade, the researchers analyzed Wikipedia’s editing history, tracking material that has remained on the site for a long time and edits that have been quickly overruled. A Wikipedian with a distinguished record of unchanged edits is declared trustworthy, and his or her contributions are left untouched on the Santa Cruz team’s color-coded pages. But a contributor whose posts have frequently been changed or deleted is considered suspect, and his or her content is highlighted in orange. (The darker the orange, the more spurious the content is thought to be.)
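The idea can be sketched in a few lines. This is only an illustration of the survival-based scoring described above, not the UCSC team's actual algorithm; the function names, the simple survival ratio, and the linear orange mapping are all assumptions.

```python
def author_trust(surviving_edits, overruled_edits):
    """Hypothetical trust score: the fraction of an author's past edits
    that survived later revisions rather than being quickly reverted."""
    total = surviving_edits + overruled_edits
    return surviving_edits / total if total else 0.0

def orange_shade(trust):
    """Map low trust to darker orange highlighting:
    0.0 means no highlight, 1.0 means the darkest shade."""
    return round(1.0 - trust, 2)
```

Under this toy model, an author with 8 surviving and 2 overruled edits would score 0.8, so that author's text would get only a faint highlight, while a frequently reverted contributor's text would be shaded nearly full orange.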

The researchers, led by Luca de Alfaro, an associate professor of computer engineering, have posted 1,000 demonstration pages on their Web site, and the samples show that the sorting process is pretty acute. Some articles, like a lengthy entry on the Curtiss P-40, a World War II-era fighter plane, get a nearly clean bill of health. Others, like an article on crochet, fare pretty well. And then there are entries, like a write-up on Polish Christmas traditions, that are drenched in orange.

Because the software assesses the histories of Wikipedia posters without actually fact-checking, it won’t necessarily direct people to Wikipedia’s best, most academically rigorous articles. But the program might be a useful tool for professors who want their students to examine closely how Wikipedia works rather than take it as gospel.

Comment.  Interesting approach. 

  • Some bad entries go uncorrected because few people read them.  Hence, I’d trust an entry more if it had a low rate of overwrites and a high rate of readership.  Could the algorithm take the extra variable into account?
  • I don’t expect (hope or fear) that algorithms will replace human judgment any time soon.  But there’s no doubt that they can supplement human judgment, and already do.  I look forward to seeing how they improve as human helpers.  Contrary to sci-fi fantasies, they don’t have to become infallible to cross a significant threshold; they only have to become roughly as fallible as peer review.  And they can be useful immediately, i.e. long before crossing that threshold.
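The first bullet's suggestion could be folded into a survival-based score fairly directly. The sketch below is purely illustrative (the weighting scheme and the `min_readers` threshold are my own assumptions, not anything proposed by the UCSC team): it discounts a passage's survival-based trust when readership is so low that errors were unlikely to be caught.

```python
def adjusted_trust(survival_rate, readership, min_readers=100):
    """Hypothetical adjustment: weight a survival-based trust score
    by how many readers could plausibly have caught an error.
    The weight grows linearly toward 1.0 until readership reaches
    min_readers, after which survival is taken at face value."""
    weight = min(readership / min_readers, 1.0)
    return survival_rate * weight
```

So an unrevised passage on a page with only 50 readers would earn half the trust of the same unrevised passage on a heavily trafficked page, matching the intuition that low overwrite rates only signal quality when many eyes have passed over the text.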