Open Access News

News from the open access movement

Wednesday, April 02, 2008

Anand Rajaraman, More data usually beats better algorithms, Datawocky, March 24, 2008. (Thanks to John Wilbanks, via Slashdot.)

I teach a class on Data Mining at Stanford. Students in my class are expected to do a project that does some non-trivial data mining. Many students opted to try their hand at the Netflix Challenge: to design a movie recommendations algorithm that does better than the one developed by Netflix. ...

Different student teams in my class adopted different approaches to the problem, using both published algorithms and novel ideas. Of these, the results from two of the teams illustrate a broader point. Team A came up with a very sophisticated algorithm using the Netflix data. Team B used a very simple algorithm, but they added in additional data beyond the Netflix set: information about movie genres from the Internet Movie Database (IMDB). Guess which team did better?

Team B got much better results, close to the best results on the Netflix leaderboard!! I'm really happy for them, and they're going to tune their algorithm and take a crack at the grand prize. But the bigger point is, adding more, independent data usually beats out designing ever-better algorithms to analyze an existing data set. I'm often surprised that many people in the business, and even in academia, don't realize this. ...
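
To make the point concrete, here is a minimal sketch of the general idea, not Team B's actual method, which the post does not describe. It compares a baseline that predicts each user's average rating with an equally simple predictor that also uses a genre lookup. All names and data below (TRAIN, HELD_OUT, GENRES, the users and movie IDs) are hypothetical toy values standing in for the Netflix ratings and the IMDB genre information.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy ratings: (user, movie, rating). Not real Netflix data.
TRAIN = [
    ("alice", "m1", 5), ("alice", "m2", 1),
    ("bob", "m1", 2), ("bob", "m2", 4),
]
HELD_OUT = [
    ("alice", "m3", 5), ("alice", "m4", 1),
    ("bob", "m3", 2), ("bob", "m4", 5),
]

# Hypothetical external data, standing in for genre information from IMDB.
GENRES = {"m1": "comedy", "m2": "drama", "m3": "comedy", "m4": "drama"}


def mean(values):
    return sum(values) / len(values)


def fit_user_means(ratings):
    """Average rating per user, using the ratings alone."""
    by_user = defaultdict(list)
    for user, _movie, rating in ratings:
        by_user[user].append(rating)
    return {user: mean(vals) for user, vals in by_user.items()}


def fit_user_genre_means(ratings, genres):
    """Average rating per (user, genre), using the external genre lookup."""
    by_key = defaultdict(list)
    for user, movie, rating in ratings:
        by_key[(user, genres[movie])].append(rating)
    return {key: mean(vals) for key, vals in by_key.items()}


def rmse(predict, ratings):
    """Root mean squared error of a predict(user, movie) function."""
    return sqrt(mean([(predict(u, m) - r) ** 2 for u, m, r in ratings]))


user_means = fit_user_means(TRAIN)
user_genre_means = fit_user_genre_means(TRAIN, GENRES)


def predict_baseline(user, movie):
    # Same simple rule for every movie: the user's overall average.
    return user_means[user]


def predict_with_genres(user, movie):
    # Same simple rule, but per genre; fall back to the overall average
    # when the user has no training ratings in that genre.
    return user_genre_means.get((user, GENRES[movie]), user_means[user])


baseline_rmse = rmse(predict_baseline, HELD_OUT)
genre_rmse = rmse(predict_with_genres, HELD_OUT)
print(f"baseline RMSE:   {baseline_rmse:.3f}")
print(f"with genre data: {genre_rmse:.3f}")
```

On this toy data the genre-aware predictor has the lower error, but only because the made-up ratings were chosen so that taste follows genre. It illustrates the mechanism and is not evidence for the claim.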

The OA connection, from commenter "Plausible Accuracy" on Wilbanks' blog:

... This is a great example of how "mashups" ... can be used to sort of bootstrap the power of a dataset. In the case of the Stanford teams, the incorporation of data from an external source enabled them to improve their algorithm. In the case of Open Access science, the ability to better combine data from a variety of studies and fields will in turn lead to more discoveries.

Posted by Gavin Baker at 4/02/2008 03:22:00 PM.

The open access movement:
Putting peer-reviewed scientific and scholarly literature on the internet. Making it available free of charge and free of most copyright and licensing restrictions. Removing the barriers to serious research.
