This is the Future

Because good geeky content matters

Correlation vs Causation: How statistics can be skewed to fool us

The internet has benefited us as human beings in so many ways. We can chat with someone across the world instantaneously, quickly look up the meaning of a word, or translate a website with the click of a button.

While the internet has in many ways made us far more knowledgeable, it has also made us lazy about true research and the scientific method. There are blogs and articles about any and every subject. The problem is that many of them are written from an uneducated view of the subject, and much of what we read on the internet about scientific subjects is wrong.

Most peer-reviewed scientific journals (peer-reviewed meaning that other experts have checked that the methods and data hold up) are only available with a paid subscription. This means that we are reading mostly non-peer-reviewed material, which leaves plenty of room for what we mathematicians like to call proofiness: “the art of using bogus mathematical arguments to prove something that you know in your heart is true, even when it’s not.” -Charles Seife, Proofiness

Proofiness appears everywhere in our society: in elections, blogs, commercials, basically anywhere someone can present information (however false) to get their way. Humans are naturally selfish, so this happens more often than you might suspect.

So let’s talk correlation vs causation. Confusing the two is one of the most common ways proofiness shows up in our society. First, definitions. Correlation is a statistical relationship in which two sets of data show a dependence on each other. And there we have the first issue: we see the word dependence and automatically assume it means “a causes b”. However, mathematical dependence does not imply causality, which is a relationship between two occurrences in which the second is understood as a consequence of the first.

Let’s look at a concrete example: the chart below shows a positive correlation between a society’s Internet use and its life expectancy. Most people can logically deduce that more Internet use does not cause a higher life expectancy. Many more factors go into life expectancy, such as medical advances, healthier eating, and not living in a war-torn country.

From Proofiness, Page 43 by Charles Seife

The problem is that many companies, politicians, and people in general assume that a correlation is enough to state that a causes b. In the mid-1990s, a correlation graph came out implying that the artificial sweetener NutraSweet was causing an alarming rise in brain tumors. Here’s the problem: brain tumors were going up at the time, but so were cable TV and Walkman players. You could just as well have said that Walkman players were causing brain tumors by plotting a similar graph. Deficit spending was also going up, and it tracked brain tumors even more tightly than NutraSweet did! That makes the issue obvious: how could deficit spending cause brain tumors?
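To see how easily two rising trends produce a strong correlation, here is a small Python sketch. The numbers are invented for illustration and are not real sweetener or tumor data, and the names (pearson, yearly_changes, sweetener_use, tumor_diagnoses) are my own. It computes the usual Pearson correlation on the raw yearly values, then again on the year-over-year changes, which is one common way to check whether a shared upward trend is doing all the work.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def yearly_changes(series):
    """Year-over-year differences; a steady trend turns into a constant."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Invented index values for ten consecutive years (NOT real data).
# Both series climb over time, but their ups and downs are unrelated.
sweetener_use = [11, 11, 15, 15, 19, 19, 23, 23, 27, 27]
tumor_diagnoses = [51, 54, 55, 58, 63, 66, 67, 70, 75, 78]

r_levels = pearson(sweetener_use, tumor_diagnoses)
r_changes = pearson(
    yearly_changes(sweetener_use), yearly_changes(tumor_diagnoses)
)

print(f"correlation of raw levels:     {r_levels:.2f}")
print(f"correlation of yearly changes: {r_changes:.2f}")
```

With these made-up numbers the raw levels are very strongly correlated, yet the year-to-year changes are not correlated at all. The only thing the two series share is that both went up over the decade, which is exactly what cable TV, Walkman players, and deficit spending had in common with brain tumors.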

So what did cause the rise in brain tumors during this time? The real answer is much less clear. But looking behind the scenes, we find that MRIs became much more prevalent during this time, and Medicare was approving patients for MRI scans. A good guess, then, is that the rise in brain tumors simply reflects that we became better at diagnosing them and that people had better access to MRI scans.
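The arithmetic behind that guess is simple enough to sketch in a few lines of Python. Again, every number here is invented for illustration (a constant 1000 true cases per year and a detection rate climbing from 50% to 90%); nothing below comes from real tumor or MRI statistics.

```python
# Invented numbers (NOT real data): the true number of cases never changes,
# but the share of cases that get detected climbs as scans become common.
TRUE_CASES_PER_YEAR = 1000
detection_percent_by_year = [50, 60, 70, 80, 90]

# Diagnosed cases = true cases multiplied by the share that gets detected.
diagnosed_by_year = [
    TRUE_CASES_PER_YEAR * pct // 100 for pct in detection_percent_by_year
]

for year, diagnosed in enumerate(diagnosed_by_year, start=1):
    print(
        f"year {year}: true cases = {TRUE_CASES_PER_YEAR}, "
        f"diagnosed = {diagnosed}"
    )

# Relative growth in diagnoses between the first and last year.
increase = diagnosed_by_year[-1] / diagnosed_by_year[0] - 1
print(f"diagnoses rose by {increase:.0%} with no change in true cases")
```

In this toy setup the diagnosed count nearly doubles even though not a single extra person got sick. Anything else that happened to be rising over the same years would correlate nicely with it.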

This is why we must be much more careful when we see a headline that says “such and such causes brain cancer!” (or allergies, or autism, and so on). Sometimes the data changes because of a change in how conditions are diagnosed, just like what happened with the brain tumors. Autism is a similar case: there does seem to have been an increase in the early 2000s, which produced quite a few positive correlations to graph and plenty of scares for parents. The most likely explanation is that doctors began grouping a few different mental conditions under one single name: autism. This happened over the same period that California saw a threefold rise in autism.

This all goes to show that finding out “a causes b” is much harder than simply saying “there is a mathematical relationship between a and b.” As people with so much access to information, we have an even greater responsibility to seek out and expose these bogus statistical analyses and to protect ourselves and others from believing everything we see on the Internet.

Simply put, when you see a friend post a blog or an article, pay close attention to any graphs or charts. Note whether they only have a correlation graph and whether they try to claim “a causes b” from that graph alone. Look for peer-reviewed articles, or anything with .gov or .edu. Because anyone can make a website, .com sites may be less reliable.

For more information on bogus math, read Proofiness by Charles Seife (M.S. in Mathematics), which is what I based this blog post on. I highly recommend this book, as it goes into more detail on how people can use bad math to get their way in elections, medical studies, and advertising.