Some challenges of conducting research using administrative databases
Publish Time: 2013-01-06 05:24

Author: Xuanqian Xie


In recent years, medical research using administrative databases has become popular. Although an administrative database can support many types of research, I think it is most appropriate in the following situations. First, it can be used to investigate the unintended effects of treatment, i.e. safety and drug interactions. Second, a large database allows researchers to examine rare diseases. Usually, we do not need to worry about a small sample size or a small number of events in database research. Third, we can use an administrative database to study health care utilization and practice patterns in routine care. In addition, with an appropriate design, we can use the database to estimate the incidence rate and the prevalence of a disease.

However, conducting a rigorous study using an administrative database is not easy. It requires expertise in several different areas. I would like to briefly discuss some of the issues below.

Understanding the database: We need to understand the database, its variables, and their implications. An administrative database is usually dynamic, since people can join or leave the healthcare plan (insurance) at any time. After people leave the plan or switch to another one, their information is unlikely to be tracked. Sometimes one person has more than one healthcare package, such as a basic plan supported by the government plus a comprehensive plan from a commercial insurer. We need to figure out how many health care services and outcomes were not recorded in the database we are studying.

In practice, we can select a relatively stable population, such as elderly people, who are less likely to move to another region or to change healthcare plans. Also, a given dataset may be appropriate for studies in certain fields, but not all. Some information in administrative data can be imprecise. Compared with prospective clinical trials, administrative datasets contain more errors and missing values. Thus, before using a dataset, we need to check its validity for research. Understanding the implications of each variable is also important. For example, we can use the postal code to distinguish people who live in urban areas from those who live in rural areas, and even to approximate their socioeconomic status according to the region they live in. Furthermore, we can use the postal code to estimate how convenient it is for people to access healthcare services, and so on.
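The point about people joining and leaving a plan can be made concrete with a continuous-enrollment check. The following is only a minimal sketch: the enrollment records, the patient identifiers, the study window, and the 30-day gap tolerance are all made up for illustration, and a real database will define enrollment spans in its own way.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def continuously_enrolled(spans, start, end, max_gap_days=0):
    """Return True if the inclusive (first_day, last_day) spans cover the
    window [start, end] with no uncovered stretch longer than max_gap_days."""
    next_needed = start  # first day of the window not yet covered
    for first, last in sorted(spans):
        if last < next_needed:
            continue  # span adds no new coverage
        if next_needed > end:
            break  # window already fully covered
        # Uncovered days before this span starts, counted inside the window only.
        gap = (min(first, end + ONE_DAY) - next_needed).days
        if gap > max_gap_days:
            return False
        next_needed = max(next_needed, last + ONE_DAY)
    # Uncovered days left at the end of the window (zero or negative if none).
    trailing_gap = (end + ONE_DAY - next_needed).days
    return trailing_gap <= max_gap_days

# Hypothetical enrollment records: patient id -> list of coverage spans.
enrollment = {
    "A": [(date(2010, 1, 1), date(2012, 12, 31))],
    "B": [(date(2010, 1, 1), date(2010, 6, 30)),
          (date(2011, 1, 1), date(2012, 12, 31))],   # six-month gap
    "C": [(date(2010, 1, 1), date(2010, 12, 31)),
          (date(2011, 1, 15), date(2012, 12, 31))],  # 14-day gap
}

WINDOW_START, WINDOW_END = date(2010, 1, 1), date(2011, 12, 31)

cohort = sorted(
    pid for pid, spans in enrollment.items()
    if continuously_enrolled(spans, WINDOW_START, WINDOW_END, max_gap_days=30)
)
print(cohort)  # ['A', 'C']
```

Patient B is excluded because services received during the uncovered months would not appear in the database. Whether a short gap such as patient C's should be tolerated is a design decision for each study.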

The diagnostic code and procedure code: Many databases use ICD-9 or ICD-10 (International Statistical Classification of Diseases and Related Health Problems) to classify diseases. Some databases also use CPT-4 codes (Current Procedural Terminology) to document the procedures and services provided to patients (this information is usually obtained from physicians' bills). Billing data are usually more reliable than diagnostic information. For example, if a patient is recorded as having a disease but we find no records of the corresponding treatments, the patient may have been wrongly diagnosed with this disease. In addition, researchers need to be familiar with medication codes.
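The cross-check between diagnoses and billing records described above might be sketched as follows. All codes here (`DX001`, `PROC10`, `DRUG20`, `PROC99`) are placeholders, not real ICD or CPT codes, and the mapping from a diagnosis to its expected treatments is an assumption that would have to come from clinical experts.

```python
# Hypothetical rule table: diagnosis code -> billing codes that would
# normally accompany a true case. Placeholder codes, not real ICD/CPT.
EXPECTED_TREATMENTS = {"DX001": {"PROC10", "DRUG20"}}

# Hypothetical records as (patient_id, code) pairs.
diagnoses = [("p1", "DX001"), ("p2", "DX001"), ("p3", "DX001")]
billing = [("p1", "PROC10"), ("p2", "PROC99"),
           ("p3", "DRUG20"), ("p3", "PROC99")]

def unsupported_diagnoses(diagnoses, billing, expected):
    """Return (patient_id, diagnosis) pairs with no matching treatment billed."""
    billed = {}
    for pid, code in billing:
        billed.setdefault(pid, set()).add(code)

    flagged = []
    for pid, dx in diagnoses:
        wanted = expected.get(dx)
        if wanted is None:
            continue  # no rule defined for this diagnosis; cannot judge it
        if not (billed.get(pid, set()) & wanted):
            flagged.append((pid, dx))
    return flagged

suspect = unsupported_diagnoses(diagnoses, billing, EXPECTED_TREATMENTS)
print(suspect)  # [('p2', 'DX001')]
```

A flagged record is only a candidate for review. The patient may also have been treated outside the plan, which goes back to the question of what the database does not record.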


The statistical methods: Since patients cannot be randomly assigned to treatment groups in clinical practice, observational studies often use complex statistical models to adjust for selection bias. Common statistical tools include propensity scores (used for matching, stratification, or regression adjustment) and instrumental variables. The analysis can be more complicated for longitudinal data (often with a certain proportion of missing data) and for cost-effectiveness analysis (two outcomes, and usually skewed cost or healthcare service usage data).
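As a small illustration of propensity score matching, the sketch below pairs each treated patient with the nearest untreated patient on the propensity score. It assumes the scores have already been estimated (for example, by a logistic regression of treatment on baseline covariates), and it uses greedy 1:1 nearest-neighbor matching without replacement within a caliper, which is only one of several matching schemes. The identifiers, scores, and caliper of 0.05 are made up.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping patient id -> estimated propensity score.
    Returns a list of (treated_id, control_id) pairs whose scores differ
    by at most `caliper`; treated patients without such a control are dropped.
    """
    available = dict(controls)  # controls not yet used in a pair
    pairs = []
    # Process treated patients in order of increasing score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs

# Hypothetical, pre-estimated propensity scores.
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.90}
controls = {"c1": 0.28, "c2": 0.33, "c3": 0.60, "c4": 0.45}

pairs = greedy_match(treated, controls)
print(pairs)  # [('t1', 'c1'), ('t2', 'c3')]
```

Patient `t3` stays unmatched because no control has a score within the caliper. Dropping such patients changes the population the result applies to, so the number of unmatched patients is worth reporting.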


The data management skills: An administrative database usually consists of several separate datasets covering inpatient services, outpatient services, medication use, patients' characteristics, and so on. To prepare the data for statistical analysis, researchers have to link the datasets, restructure the data, match and join records (one-to-many and many-to-many), and so on. (Actually, I do not think this requires advanced knowledge of data management, but most universities do not teach much data management in statistics courses, so it becomes challenging for junior researchers.)
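A one-to-many link between a patient file and an inpatient file can be sketched as below. The table layouts, the field names (`patient_id`, `admit`, `dx`), and the records are hypothetical. In practice this step is usually done in a statistical package or a database, but the logic is the same.

```python
from collections import defaultdict

# Hypothetical patient file: one row per patient.
patients = [
    {"patient_id": 1, "sex": "F", "birth_year": 1940},
    {"patient_id": 2, "sex": "M", "birth_year": 1935},
]

# Hypothetical inpatient file: zero, one, or many rows per patient.
inpatient = [
    {"patient_id": 1, "admit": "2011-03-02", "dx": "DX001"},
    {"patient_id": 1, "admit": "2011-09-17", "dx": "DX002"},
    {"patient_id": 3, "admit": "2011-05-05", "dx": "DX001"},  # no patient row
]

def index_by(rows, key):
    """Group rows by the value of `key`."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped

def left_join(left_rows, right_rows, key):
    """One-to-many left join: one output row per matching right row, or a
    single row with only the left fields when there is no match."""
    right_index = index_by(right_rows, key)
    joined = []
    for left in left_rows:
        matches = right_index.get(left[key])
        if not matches:
            joined.append(dict(left))
        else:
            joined.extend({**left, **right} for right in matches)
    return joined

linked = left_join(patients, inpatient, "patient_id")

# Records that fail to link are worth counting before any analysis.
known_ids = {p["patient_id"] for p in patients}
orphans = sorted({r["patient_id"] for r in inpatient} - known_ids)

print(len(linked))  # 3
print(orphans)      # [3]
```

Patient 1 contributes two rows, patient 2 contributes one row without admission fields, and the admission for patient 3 cannot be linked at all. Checking row counts before and after each join is a simple way to catch accidental duplication from many-to-many links.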


One researcher may not have all the expertise required for studies using administrative databases. Thus, good cooperation and frequent communication between researchers are important. So far, there are not many guidelines for administrative database studies. Motheral et al. developed a checklist, published in 2003 (Motheral, 2003). It is helpful to read this article before conducting such studies.




Motheral B, Brooks J, Clark MA, et al. A checklist for retrospective database studies: report of the ISPOR Task Force on Retrospective Databases. Value in Health 2003; 6:90-7.

Faries D, Leon AC, Haro JM, Obenchain RL. Analysis of Observational Health Care Data Using SAS. SAS Institute, 2010.