# Improving on 'n-1'

A user-friendly tool for estimating the proportion of TB incidence due to recent transmission

#### Authors: Parastu Kasaie^{1}, Barun Mathema^{2}, Andrew Azman^{1}, Jeff Pennington^{1}, David W. Dowdy^{1}

###### 1 - Department of Epidemiology, Bloomberg School of Public Health, The Johns Hopkins University, Baltimore, MD

###### 2 - Public Health Research Institute Tuberculosis Center, International Center for Public Health, Newark, New Jersey

#### Input your values for the six variables below to estimate the proportion of TB in your area that reflects recent transmission

##### In order to use the model, please provide the following information (in the requested scale), and hit "Go!"

| Description | Input Box | Symbol | Acceptable range |
|---|---|---|---|

### Output

Estimated proportion of active tuberculosis due to recent transmission: %

Linear Regression Model

c = C/SS, n = N/SS

Equation:

### Background

In developing public health responses to tuberculosis (TB) epidemics, it is often important to estimate the proportion of active TB cases that result from recent infection versus reactivation. This is traditionally done with molecular epidemiological data (e.g., DNA fingerprinting), but the traditional ('n-1') method for converting these data into estimates of recent transmission is known to carry substantial bias.
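For readers unfamiliar with the 'n-1' method, the following is a minimal Python sketch of how it is commonly computed: cases sharing a fingerprint form a cluster, one case per cluster is assumed to be the source case, and the remaining clustered cases are attributed to recent transmission. The function name `n_minus_one_estimate` and the example strain labels are illustrative assumptions, not part of the tool described on this page.

```python
from collections import Counter
from typing import Hashable, Iterable


def n_minus_one_estimate(fingerprints: Iterable[Hashable]) -> float:
    """Return the 'n-1' estimate of the proportion of cases due to recent transmission.

    A cluster is any group of two or more cases sharing a fingerprint. One case
    per cluster is treated as the source case, so the estimate is
    (clustered cases - number of clusters) / total genotyped cases.
    """
    counts = Counter(fingerprints)
    total_cases = sum(counts.values())
    if total_cases == 0:
        raise ValueError("at least one genotyped case is required")
    # Keep only fingerprints shared by two or more cases (the clusters).
    cluster_sizes = [size for size in counts.values() if size >= 2]
    clustered_cases = sum(cluster_sizes)
    return (clustered_cases - len(cluster_sizes)) / total_cases


# Hypothetical data: 10 cases, strain A (4 cases), strain B (2 cases), 4 unique strains.
example = ["A", "A", "A", "A", "B", "B", "C", "D", "E", "F"]
print(n_minus_one_estimate(example))  # (6 clustered - 2 clusters) / 10 cases
```

Because the estimate depends only on which sampled cases share a fingerprint, it changes with the fraction of cases genotyped and the length of the sampling window, which is the source of the bias described above.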

### Method

We developed a stochastic, individual-based simulation model of a TB epidemic to capture the long-term dynamics of transmission and strain clustering. Simulations were carried out across a variety of epidemiological settings and study conditions (defined by the population coverage of fingerprint data and the duration of data collection). In each experiment, we compared the clustering estimate from the 'n-1' method with the true level of recent transmission in the model and computed the estimation bias. Using these simulations, we developed a simple regression-based tool to better estimate the proportion of recent transmission as a function of four inputs: TB incidence, the proportion of observed cases that are clustered, the population coverage of DNA fingerprint data, and the duration of time over which fingerprint data were collected.