Thursday, April 19, 2012

Visualizing Word Usage (in Science!)


Uncovering historical trends can be a bit of a dark art. But in recent years, engineers and researchers have made it easy for the general public to quickly search through enormous sets of data. Google's Ngram Viewer, for instance, allows users to compare word usage since 1800 in its massive digital library of books.

Ngram is pretty simple: enter some words or phrases that you want to compare (e.g. "the Beatles," "Albert Einstein," and "Elvis Presley"), pick a time range, and press Enter. Google then outputs a nice-looking graph of the relative popularity of these terms in books throughout the chosen time period. If you haven't already, give it a try!
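For the curious, here is a rough Python sketch of the kind of number such a graph plots: how often a phrase appears in a given year, divided by the number of all phrases of the same length from that year. The toy corpus and the function names are made up for illustration; this is not Google's code or data, only a minimal sketch that assumes simple whitespace tokenization and case-sensitive matching.

```python
from collections import Counter

# Hypothetical toy corpus: year -> list of book texts (not Google's data).
TOY_CORPUS = {
    1960: ["Albert Einstein wrote letters", "the physics of Albert Einstein"],
    1970: ["the Beatles and Elvis Presley", "Elvis Presley sang"],
}


def ngrams(text, n):
    """Return every run of n consecutive whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def relative_frequency(phrase, corpus=TOY_CORPUS):
    """Map each year to (count of phrase) / (count of all same-length n-grams)."""
    n = len(phrase.split())
    result = {}
    for year, texts in sorted(corpus.items()):
        # Count every n-gram of the same length across that year's texts.
        counts = Counter(g for text in texts for g in ngrams(text, n))
        total = sum(counts.values())
        # Matching is case-sensitive: "albert einstein" is a different key.
        result[year] = counts[phrase] / total if total else 0.0
    return result


if __name__ == "__main__":
    for term in ("Albert Einstein", "albert einstein", "Elvis Presley"):
        print(term, relative_frequency(term))
```

Because the lookup is case-sensitive, the lowercase "albert einstein" scores zero in every year of this toy corpus, while the capitalized form does not.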

While the Ngram Viewer has been around for over a year, a similar project that tracks trends in scientific papers was recently unveiled. Called Bookworm arXiv, this new tool works just like Ngram, but it searches through preprints of academic papers that scientists have posted on arXiv.org. Now it's much easier for anyone to search for the hottest research trends with a few keystrokes and mouse clicks.

Researchers at Harvard's Cultural Observatory developed the Bookworm application, and they've added a few features beyond what Google's Ngram Viewer offers:
  • You can graph terms within physics subfields such as astrophysics or particle physics. For instance, you can graph how frequently "supernova" appears within the field of astronomy and compare this to the frequency of "graphene" in condensed matter physics.
  • You can compare how frequently certain universities or institutions publish to the arXiv. Now different universities can fight for the bragging rights of having the most influential arXiv authors.
  • You can graph papers by email domain, allowing you to narrow your results to a certain country or language.
Not only can you use all of these features individually, but you can also mix and match as much as you please, leading to a highly targeted search for science trends. Below are a few examples of what you can do.

The LHC vs. Fermilab's Tevatron

The percentage of high energy physics papers that mention the Large Hadron Collider (blue) and Fermilab's Tevatron accelerator (red). More than half of these papers mentioned the LHC by 2011.

Graphene, Supernovae and Neutrinos

The appearance of graphene (blue), supernova (red), and neutrino (green) in all articles on the arXiv.

Ice Cream Battle: Chocolate vs. Vanilla

Physicists clearly prefer vanilla. Case closed.

Although arXiv papers, such as the traffic ticket paper we covered last week, aren't peer reviewed, this tool can still give you a fascinating glimpse into physics research. Submitted papers must come from a respected university or institution, so most of the articles detail legitimate research. Now go out and uncover the next trend in physics research!

Links:

Google's Ngram Viewer

Bookworm arXiv

Update: Replaced the top image with one that uses capitalized names, because Ngram is case-sensitive.

2 comments:

  1. Google's Ngram viewer is case-sensitive. You should search for "Albert Einstein", not "albert einstein".

    The arXiv tool does not appear to be case-sensitive.

  2. Thanks for catching that, Matthew. I've updated the post with a new chart that has capitalized names.
