Opioid deaths: 2017 figures

The day after my earlier post I found that the CDC had released the 2017 files. Here is the updated version of the plot that shows the histories of death rates in the 12 counties with the highest rates. Click to enlarge.

The numbers just keep going up. I know this is a highly selective snapshot (by definition I am looking at the worst hit counties) but the full national picture is sobering. From the CDC's 2017 report:

  • The average death rate in 2017 was 10% higher than in 2016
  • Amongst the synthetic opioids (e.g. fentanyl and tramadol), the increase was 45%

Opioid deaths: rates or counts?

The CDC have just updated their annual report "Drug Overdose Deaths in the United States, 1999-2017". The average age-adjusted rate in 2017 is almost 22 per 100,000, up 10% on the 2016 figure. Some individual states are well above these numbers: the highest death rates are in West Virginia (58 per 100,000), Ohio (46), Pennsylvania (44) and the District of Columbia (44).

What does this look like at the county level? The CDC make the Detailed Mortality Files—on which the above report is based—available through WONDER, the Wide-ranging OnLine Data for Epidemiologic Research. Unfortunately this has not been updated with the 2017 numbers, so a county-level analysis can only go through to 2016.

The counties with the highest death rates are shown below (click to enlarge).

As a New Mexico resident I am not surprised that Rio Arriba is at the top. Its largest town, Española, has been identified as a "drug capital of America" for a decade or more (for example, see this Forbes article from 2009). The West Virginia counties are not surprising either, given the state figures.

But the figures that really shock me are those for the counties in Maryland and Ohio. Yes, the rates are somewhat lower (though still north of 50 per 100,000), but the populations of these counties are much larger than those of the NM and WV counties. The numbers of deaths in Baltimore, Montgomery, Butler, and Clermont Counties are, to my eyes, startlingly high.

This raises the question of how to measure "worst". Is it the death rate (i.e. deaths per 100,000 population), the count of deaths, or some combination? Although using the rate has a mathematical appeal, I think it carries an inherent assumption that may not be correct: namely, that if county A has 10 times the population of county B, the threshold at which local services become overloaded is also 10 times higher in A than in B. I suspect (though I don't have evidence for this) that the threshold in A would be substantially less than 10 times the threshold in B.

Note: The R code for the above plots and a discussion of assumptions is available here.

Percentiles in Tableau

The Tableau Support Communities contain several threads on how to calculate percentiles. Here is one that dates back to 2011 and is still going strong. It seems that historically (i.e. pre version 9), the calculation of percentile required all sorts of homegrown calculated fields that use Tableau's LOOKUP() and WINDOW_*() functions and other abstruse and barely documented features of Tableau's inner workings.

Now that we have the PERCENTILE() function and Level-of-Detail calculations, it seems to be a lot simpler. Here is the code that I use to tercile the items on all the orders in Tableau's "superstore" dataset by sales value:

IF [Sales] > {FIXED : PERCENTILE([Sales], 0.667)}
    THEN "H"
ELSEIF [Sales] > {FIXED : PERCENTILE([Sales], 0.333)}
    THEN "M"
ELSE "L"
END

Dropping this dimension into a crosstab confirms that (i) each tercile contains the same number of items and (ii) the minimum and maximum of each tercile do not overlap.

tercile   minimum sale/$   maximum sale/$   count
H             127.96         22,638.48      3,329
M              25.20            127.95      3,334
L               0.44             25.18      3,331
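The same threshold logic can be sketched outside Tableau. Below is a small Python check on made-up data (a simple 1-to-3000 sequence standing in for the Sales column, not the superstore file), using a hand-rolled linear-interpolation percentile, which may differ in detail from Tableau's PERCENTILE(). It confirms that splitting at the 33.3rd and 66.7th percentiles yields three roughly equal, non-overlapping bins:

```python
from collections import Counter

def percentile(values, p):
    """Percentile with linear interpolation, 0 <= p <= 1
    (illustrative; not necessarily Tableau's exact algorithm)."""
    s = sorted(values)
    k = (len(s) - 1) * p
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def tercile(values):
    # Same threshold logic as the Tableau calculation:
    # > 66.7th percentile -> "H", > 33.3rd -> "M", otherwise "L".
    hi_cut = percentile(values, 0.667)
    mid_cut = percentile(values, 0.333)
    return ["H" if v > hi_cut else "M" if v > mid_cut else "L" for v in values]

sales = list(range(1, 3001))   # stand-in for the superstore Sales column
labels = tercile(sales)
print(Counter(labels))         # three roughly equal bins (999 / 1002 / 999 here)
```

As with the real superstore data above, the bin counts differ by a handful of items because the cut points fall between values.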

Isn't there a term missing from the LOD expression?

Yes. All the documentation I have found suggests that the first of my LOD expressions should look like this:

{FIXED [grain] : PERCENTILE([Sales], 0.667)}

Omitting the dimension causes the percentile to be computed across every individual row in the dataset, yielding a single table-wide threshold. In this case, that is just what I want.
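To make the role of the dimension concrete, here is a hypothetical Python analogue (invented data and names, and MEDIAN as a stand-in aggregate, since the point is the grouping, not the percentile): with a dimension, the aggregate is computed once per group; without one, it is computed once over all rows.

```python
# Hypothetical analogue of Tableau's FIXED LOD semantics (illustrative only).
rows = [
    {"region": "East", "sales": 10},
    {"region": "East", "sales": 20},
    {"region": "West", "sales": 100},
    {"region": "West", "sales": 200},
]

def median(xs):
    s = sorted(xs)
    n = len(s)
    return (s[n // 2 - 1] + s[n // 2]) / 2 if n % 2 == 0 else s[n // 2]

# {FIXED : MEDIAN([Sales])} -- no dimension: one value over every row
overall = median([r["sales"] for r in rows])

# {FIXED [Region] : MEDIAN([Sales])} -- one value per region
by_region = {}
for r in rows:
    by_region.setdefault(r["region"], []).append(r["sales"])
per_region = {k: median(v) for k, v in by_region.items()}

print(overall)      # 60.0: a single table-wide value
print(per_region)   # {'East': 15.0, 'West': 150.0}: one value per group
```

For the tercile calculation above, the table-wide version is the one that ranks every item against the whole dataset.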

Sidebar: Why do I want to tercile anyway?

Splitting a continuous variable into discrete ranges aids communication and non-experts' interpretation of results. But how many discrete ranges should one use? Well, that depends on (i) the question you are trying to answer and (ii) the established practice in that particular discipline. For example, in pharmaceutical sales everything gets split into deciles: the things that a pharma rep does with a decile 10 physician are very different to the things she does with a decile 1 physician.

Personally, I like splitting into an odd number of ranges as it allows some items to be average. That central category contains the peak of the bell-curve and some stuff either side: in many cases I have found that this provides a better mapping of my analysis to the real-world problem that the analysis is attempting to solve. (I suspect that this is the flip-side of the problem in social sciences about whether a Likert scale should contain an odd or even number of terms; see link for discussion.)

Here is more evidence to support the odd-is-better-than-even position: Beyond the median split: Splitting a predictor into 3 parts.