Saturday, December 11, 2010

Scary Statistics Fail: No Dog Bite Epidemic

Media hype and official government studies to the contrary, there is no epidemic of hospitalizations resulting from dog bites. The data presented are consistent with a level rate of hospitalizations from 1997 to 2008. The fluctuations during 1994 to 1996 suggest that the growth over this time period may be an artifact of the source data and the statistical methods used to estimate national rates.

The New York Times reports, "The number of Americans hospitalized for dog bites almost doubled over a 15-year-period, increasing to 9,500 in 2008 from 5,100 in 1993." Other MSM outlets picked up the story, also uncritically. The Times quotes the study author, Dr. Elixhauser: “It’s really kind of frightening, and unfortunately, we’re at a loss to explain it.”

Adam Ozimek asks "What is going on here?". Jezebel posts a picture of a snarling dog. KC Dog Blog tries to cut through the media hype but misses the statistics angle.

Commenters on various blogs nearly all accept the study findings at face value. A feeding frenzy of pontification on badly-behaved dogs and badly-behaved dog owners ensues. Let's look more closely at the actual study, skipping straight to the charts at the back. I've reproduced Figure 2 from page 11 below.

The first thing you should notice is that there is no discernible trend line from 1997 through 2008. The second thing you should notice is that the rate was highest in 1995. A researcher wishing to tell a different story could easily cherry pick this time series and state that dog bite hospitalizations fell nearly 20% from 1995 to 2007.

The data show a very sharp increase sometime between 1994 and 1997. The increase is too sharp to be a trend. Perhaps this is an anomaly in the data. Let's check out the data set used to produce the research report, the AHRQ Nationwide Inpatient Sample. Using this sample is complex enough that AHRQ provides a 45-page PDF with detailed instructions. One of the problems is the changing composition of the sample over time. In 1993-1994, this sample included data from 17 states, increasing to 19 states in 1995-1996, and 22 in 1997. By 2007 there were 40 states in the sample.

Surely it must be difficult to calculate long-term trends from such a dataset. Perhaps the changing composition of the sample caused the rate fluctuations during 1993 to 1997. From the historical overview (page 3), we learn that in 1995, Missouri and Tennessee were added to the sample. Georgia, Hawaii and Utah were added in 1997. On page 9 we learn that the South and Midwest regions were under-represented in the sample during this time frame. We might have a story here, but the reported rate is highest in the Northeast, where the sample is most complete. Let's keep looking.

The study indicates that their numbers are pulled from HCUPnet, an online querying tool for the Nationwide Inpatient Sample. It's not too hard to navigate, and I show that I can replicate the reported value for 1997. I encourage you to give this a try on your own. (What good is all that government data collection if none of us know how to use it?) What I find striking is the standard error: about 6%. The error terms are large enough to potentially explain variations between 2.7 and 3.1 as noise. But they don't explain 1993 or 1995.

1997 National statistics - all-listed

You have chosen all-listed diagnoses. The only possible measure for all-listed diagnoses is the number of discharges who received the diagnoses you selected. If you want to see statistics on length of stay or charges, go back and select "principal diagnosis."
E906.0 Dog Bite

                              E906.0 Dog Bite    Standard errors
Total number of discharges    7,686              448

Weighted national estimates from HCUP Nationwide Inpatient Sample (NIS), 1997, Agency for Healthcare Research and Quality (AHRQ), based on data collected by individual States and provided to AHRQ by the States. Total number of weighted discharges in the U.S. based on HCUP NIS = 34,678,703. Statistics based on estimates with a relative standard error (standard error / weighted estimate) greater than 0.30 or with standard error = 0 in the nationwide statistics (NIS, NEDS, and KID) are not reliable. These statistics are suppressed and are designated with an asterisk (*). The estimates of standard errors in HCUPnet were calculated using SUDAAN software. These estimates may differ slightly if other software packages are used to calculate variances.
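For anyone who wants to check the "about 6%" figure, here is a rough sketch of the arithmetic in Python. The discharge count and standard error come straight from the HCUPnet output above. Everything else is my own assumption: the normal approximation, the idea that every year's rate carries the same relative standard error as 1997, and the idea that the yearly estimates are independent. The variable names are made up for illustration and this is not how AHRQ or SUDAAN computes anything.

```python
import math

# Values copied from the HCUPnet output above (1997, all-listed E906.0).
discharges = 7686
std_error = 448

# Relative standard error: standard error divided by the weighted estimate.
rse = std_error / discharges

# Approximate 95% interval, ASSUMING the estimate is roughly normal.
z95 = 1.96
ci_low = discharges - z95 * std_error
ci_high = discharges + z95 * std_error

# ASSUMPTION: each year's rate has the same relative standard error as 1997,
# and the yearly estimates are independent of each other.
low_rate, high_rate = 2.7, 3.1
se_diff = math.sqrt((low_rate * rse) ** 2 + (high_rate * rse) ** 2)
z_diff = (high_rate - low_rate) / se_diff

print(f"relative standard error: {rse:.1%}")
print(f"95% interval for 1997 discharges: {ci_low:.0f} to {ci_high:.0f}")
print(f"z-score for 2.7 vs 3.1: {z_diff:.2f}")
```

Under those assumptions the z-score for the gap between 2.7 and 3.1 comes out below 1.96, so the difference would not clear the usual 5% significance bar. That is consistent with reading the post-1997 wiggles as noise.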

What else do you need to know? A bit about medical coding. In 1993 the average inpatient claim had about 4 diagnosis codes on it. By 1997 there were over 4.5 diagnosis codes on an average claim, and 5.7 by 2004. See page 26 of the NIS trends report. It is entirely possible that the capture of "E codes" for causes of injury improved during this time frame.
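To put the coding numbers in perspective, here is a quick back-of-the-envelope calculation in Python. The three averages are the ones quoted above; since "about 4" and "over 4.5" are approximate, treat the percentages as rough. Note that this only shows how much the average claim grew. It says nothing about how many of the extra codes were E codes, which is the part we would actually need to know.

```python
# Average diagnosis codes per inpatient claim, as quoted in the post.
# "about 4" and "over 4.5" are approximate, so the results are rough.
codes_per_claim = {1993: 4.0, 1997: 4.5, 2004: 5.7}

# Growth in codes per claim relative to the 1993 baseline.
base = codes_per_claim[1993]
growth = {year: n / base - 1 for year, n in codes_per_claim.items()}

for year, g in growth.items():
    print(f"{year}: {g:+.1%} vs 1993")
```

If even a modest share of that growth reflects injury causes being written down more often, an all-listed count like the one above could rise without any change in the number of people actually bitten.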


  1. Excellent blog post! I am inclined to agree that most danger (not just animals) is wildly overstated in popular press.

    It does strike me as odd that the NYT, with their political proclivities, would publish such an article.

    In any case, I'm still mystified by the early nineties spike.

    As a simple question...why BlogSpot? I tend to regard it as the most failed blogging tool.

  2. Thanks for the encouragement. Crafting this post in blogspot was painful. I'm already ready to switch platforms.