When things go wrong, in both ways

By Alexis Xie

Statistics can help journalists better understand data and the stories hidden behind it.

I tried to find some journalism pieces that match what we have learned so far, such as the mean, median, standard deviation (SD), experiments and correlation, and I ended up focusing on observational studies. So I will bring two examples to illustrate how statistics were used in data journalism reports.


The first example is a chart that tells you the top 10 percent of American adults, 24 million people, consume an average of 74 drinks per week, or more than 10 drinks per day.

[Screenshot of the chart, taken 2015-11-07, 8:27 PM]

The chart is based on data from Philip Cook, who used 2001-2002 data from NESARC, the National Epidemiologic Survey on Alcohol and Related Conditions, conducted by the National Institute on Alcohol Abuse and Alcoholism.

He “corrected” the data for under-reporting by multiplying the number of drinks by 1.97, so that it would comport with the previous year’s alcohol sales data for the U.S.

Alcohol sales in the U.S. that year were roughly double what NESARC’s respondents reported.
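A quick back-of-the-envelope check makes the size of this correction visible. This is a sketch under two assumptions: that the 74 drinks per week in the chart is the figure after the correction, and that the single 1.97 factor was applied to every respondent. Dividing the factor back out gives the average the top 10 percent actually reported to the survey.

```python
# Back-of-the-envelope check on the chart's headline number.
# Assumptions: 74 drinks/week is the figure AFTER Cook's correction,
# and the same 1.97 factor was applied to every respondent.

CORRECTED_WEEKLY = 74      # drinks per week, as shown in the chart
CORRECTION_FACTOR = 1.97   # Cook's adjustment to match alcohol sales


def per_day(weekly: float) -> float:
    """Convert a weekly drink count to a daily average."""
    return weekly / 7


# Undo the correction to recover what respondents said on average.
reported_weekly = CORRECTED_WEEKLY / CORRECTION_FACTOR

print(f"Corrected:   {CORRECTED_WEEKLY:.1f}/week = {per_day(CORRECTED_WEEKLY):.1f}/day")
print(f"As reported: {reported_weekly:.1f}/week = {per_day(reported_weekly):.1f}/day")
```

Run as is, it prints about 10.6 drinks per day after the correction versus about 5.4 per day as reported (roughly 37.6 per week). In other words, about half of the headline number comes from the adjustment rather than from what people told the survey.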

Even if you were drunk last night, would you really need to double the number of drinks you remember having?

Problems here:

1. Although the reporter double-checked the figures with Cook, he simply took Cook’s word for it. He did not look at the alcohol sales numbers for that year himself or try to explain why the two sets of numbers don’t match. For example, could gifts account for the gap? Who bought the drinks, and what did they do with them? What other variables could change the outcome of this study?

2. The data isn’t complete. People under 18 are not included; although it is not legal for them to drink, some of them do it anyway. So this chart cannot represent drinking across the whole American population.

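3. Simply multiplying the NESARC numbers by one factor is not right, statistically or ethically, because it assumes that every respondent, light drinker or heavy, under-reported by the same proportion.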
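Point 3 can be made concrete with a toy example. Every number below is invented for illustration (it is not NESARC data), and the two groups are assumed to be the same size. The sketch picks one scaling factor so that the overall total matches a hypothetical true total, the way the 1.97 factor was chosen to match sales, and then checks each group separately.

```python
# Toy illustration; ALL numbers are made up, NOT NESARC data.
# A single scaling factor can make the overall total match sales
# while still misstating individual groups.

# group -> (reported drinks/week per person, hypothetical actual drinks/week)
# Assumption: both groups contain the same number of people.
groups = {
    "light drinkers": (2.0, 6.0),    # assume they under-report a lot
    "heavy drinkers": (30.0, 33.0),  # assume they under-report a little
}

reported_total = sum(rep for rep, _ in groups.values())
actual_total = sum(act for _, act in groups.values())

# One factor chosen only so that the totals agree (the role 1.97 plays above).
factor = actual_total / reported_total

for name, (reported, actual) in groups.items():
    scaled = reported * factor
    print(f"{name}: scaled {scaled:.1f} vs actual {actual:.1f}")
```

With these invented numbers the totals match exactly, yet the heavy group is pushed above its true level (about 36.6 vs 33) while the light group stays far below it (about 2.4 vs 6). Whether real under-reporting looks anything like this is exactly what a reporter would need to check before trusting a uniform correction.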

It turns out that Cook is an advocate of high taxes on alcohol, and he argues that such a policy would discourage that top 10 percent from buying so much alcohol.

The result of this observational study is clearly misleading. First, the reporter relied on data with undercoverage, which happens when some group in the population has no chance, or a much smaller chance, of being included in the sample. More importantly, the reporter also trusted a source with a strong opinion on the issue, in this case Mr. Cook, and let that view be over-represented.

This case reminds me that in data journalism, when you see an outrageous number in a chart, you should not get excited. Verify it yourself using common sense and a simple calculation; if the numbers don’t match, something is probably wrong.

Behind most dramatic drops and rises in a chart there is an explanation. Go find it, and don’t let anyone manipulate your article by inserting their ideas into the conclusions of your reporting.

Sadly, the original post gained 38,000 likes on Facebook, while the follow-up report got only 8.

The other example is “The young invincible are primarily men.”


[Screenshot of the chart, taken 2015-11-07, 11:40 PM]

This chart shows that the share of young men who are uninsured is much higher than that of young women, and the gender gap persists well into later age groups.

The tempting hypothesis is that men are more “invincible”: they don’t get sick easily and are in better health. Jumping to that conclusion from the chart alone is what we call bias in an observational study.

The reporter points out that many variables could be behind this gender gap in the data. It turns out that many women receive health insurance through government programs, and that women also tend to incur higher medical costs than men, even without counting the costs of maternity care.

As the article puts it, “young women are also more likely to be custodial parents than men are, and kids of course have lots of health-care needs (starting with vaccinations at very young ages). Some research has found linkages between parents’ and their kids’ consumption of health-care services.”

This is a good example of not just looking at a dataset and assuming the reason for a dramatic difference, but instead searching for the reasons behind it using statistical methods.

U.S. Department of Health and Human Services data

Agency for Healthcare Research and Quality:

Men are 24 percent less likely than women to have visited a doctor within the past year and are 22 percent more likely to have neglected their cholesterol tests.
Men are 28 percent more likely than women to be hospitalized for congestive heart failure.
Men are 32 percent more likely than women to be hospitalized for long-term complications of diabetes and are more than twice as likely as women to have a leg or foot amputated due to complications related to diabetes.
Men are 24 percent more likely than women to be hospitalized for pneumonia that could have been prevented by getting an immunization.

