Health Services Research Data Center (HSRDC)
Established in 2005, Penn LDI’s Health Services Research Data Center (HSRDC) provides data services to LDI-affiliated investigators who use highly sensitive patient information in their research. The HSRDC consists of secure high-performance servers within the University of Pennsylvania’s Perelman School of Medicine. These servers have the security protections needed to permit storage and analysis of data containing Protected Health Information by LDI-affiliated investigators and research staff. Please send any questions or comments about the HSRDC to HSRDC@pennmedicine.upenn.edu.
Access to HSRDC
Access to the HSRDC is available to Penn LDI-affiliated investigators. University of Pennsylvania graduate students, clinical fellows, and post-docs who are not affiliated with LDI will be considered for HSRDC server privileges if their mentor or collaborator on the research project is an LDI-affiliated investigator. LDI Associate Fellows are encouraged to first discuss their proposals with their faculty mentors.
In addition, faculty affiliated with Penn’s Center for Clinical Epidemiology and Biostatistics (CCEB) can access the HSRDC to use the Optum datasets.
Please contact Jibby Kurichi for further information, including costs associated with the use of the HSRDC and access for non-Penn collaborators.
Data and Documentation Requirements
Data Storage on HSRDC
HSRDC is maintained at a high security level in accordance with federal regulations governing secure computer systems, such as the Federal Information Security Management Act (FISMA). Data storage on HSRDC is limited to research using data that require high security, such as data with individually identifiable protected health information. Lower-security datasets, including anonymous patient surveys, de-identified data, and publicly available data, should be stored and analyzed on other University resources, such as Penn+Box.
The HSRDC server cluster is designed for the analysis of “complete” databases that are uploaded by our IT administrators. Since users are not allowed to upload their own data, the HSRDC is not an appropriate storage infrastructure for data from clinical trials, surveys, or other data sources that require more frequent updates.
Data that includes protected health information obtained outside of the University of Pennsylvania requires a Data Use Agreement (DUA) specifically permitting storage of the data on the HSRDC. Please provide HSRDC staff with an executed DUA before requesting a data upload, and send them current DUA documentation annually.
HSRDC servers run Red Hat Linux operating systems, and data can be stored and analyzed using SAS, Stata, and R. Please note that Windows-based programs are not available in the HSRDC environment. Users wishing to use other software are responsible for the licensing costs, and HSRDC staff will evaluate requests to install specialized software on a case-by-case basis.
The HSRDC is a service center within the Perelman School of Medicine that provides services for a fee. The total operating costs of the HSRDC, including support for IT, administrative personnel, software licenses, hardware maintenance and depreciation, storage space, CPU time, and database management, exceed $150,000 per year, and Penn LDI receives no core funds from the University or its schools to support this resource.
HSRDC staff can prepare a formal cost estimate for investigators submitting grant applications to be included with the budget justification. Cost estimate requests should be sent to Jibby Kurichi. Please allow five business days for a response.
For funded projects, PIs and/or their staff should contact Jibby Kurichi in the earliest stages of the project to plan the timing, scope, and logistics of HSRDC resource use. Invoices are generally sent to PIs in May for the use of the HSRDC during the current fiscal year (July–June) unless prior arrangements have been made. If payment is not received, access to the HSRDC will be disabled, and project folders will be archived and then deleted.
Available Data Resources
HSRDC houses a variety of data resources that can be shared with a range of restrictions and at varying costs. For additional information on data accessible through HSRDC, please refer to the description of each dataset.
Centers for Medicare & Medicaid Services (CMS)
The HSRDC houses a variety of Centers for Medicare & Medicaid Services (CMS) data from 1998–2020. These data require individual data use agreements (DUAs).
CMS data stored on the HSRDC are available for reuse. LDI-affiliated investigators may apply to reuse CMS data stored on the HSRDC under their own DUA. Investigators may submit these applications directly to the CMS Data Request Center. Reuse of CMS data under the investigator’s new DUA will be subject to fees paid directly to CMS. Once an executed DUA is obtained and provided to HSRDC staff, access to the HSRDC and the data can be granted. Current DUA documentation must be sent to HSRDC staff annually.
There are costs associated with using the HSRDC where these data are housed. Please contact Jibby Kurichi for information on costs and to find out what CMS data is currently available for reuse on the HSRDC. For more information on CMS data, including data dictionaries and more information about the DUA request process, please visit the ResDAC website.
Optum
Optum is a clinically rich U.S. health care claims database that can be used to conduct research studies. It draws on a large proprietary health care database maintained by Optum’s parent company. The Optum database contains health care claims from 2000 to 2022, covering more than 100 million people, including inpatient and outpatient claims, pharmacy claims, and laboratory results.
Penn faculty wishing to use Optum data need to be either LDI-affiliated investigators or affiliated with CCEB. Access to Penn’s Optum data is project-based, and all projects require LDI review and approval. Faculty members or students should submit to Jibby Kurichi a brief (approximately one-page) research proposal that clearly demonstrates why Optum data is appropriate to the research aims. The proposal should include a timeline, motivation, and brief research design, and should specify who will be conducting the analyses.
Investigators are required to email Optum their grant proposal, regardless of funding source, at least 10 business days prior to grant submission. Any work using Optum data must also be sent to Optum by email for review before it is submitted for publication. Since the University of Pennsylvania is the sole contracting entity with Optum, it is essential that all faculty (including those at CHOP, VA, or any other Penn-affiliated hospital) use and emphasize their University of Pennsylvania affiliation when publishing work using Optum data.
There are costs associated with using the HSRDC, where the Optum data is housed.
Please Note: Externally funded (by a government or nonprofit agency) research using Optum data must follow the requirements of the Penn-Optum agreement. Research using Optum data funded by for-profit/corporate entities is prohibited by terms of Penn’s contract with Optum. Investigators are required to alert Optum within 10 days of receiving notification of the award. Fees must be paid to Optum accordingly.
Questions? Please Contact Us.
Contact Jibby Kurichi for the Optum codebook and other related documents, or for further information.
HCUP National (Nationwide) Inpatient Sample (NIS)
The NIS from 1988–2020 is stored on the HSRDC. NIS is the largest publicly available all-payer inpatient care database in the United States, containing data on more than seven million hospital stays. Its large sample size is ideal for developing national and regional estimates and enables analyses of rare conditions, uncommon treatments, and special populations. For a description of the data elements, visit HCUP’s website. Contact Jibby Kurichi for user guides and additional documentation.
HCUP Nationwide Emergency Department Sample (NEDS)
NEDS data from 2007–2020 is stored on HSRDC. NEDS produces national estimates about emergency department (ED) visits across the country. The NEDS describes ED visits, regardless of whether they result in admission. Its large sample size allows for analysis across hospital types and the study of relatively uncommon disorders and procedures. HCUP’s website provides a description of the data elements. Contact Jibby Kurichi for user guides and additional documentation.
HCUP Nationwide Readmissions Database (NRD)
NRD data from 2011–2017 is stored on the HSRDC. NRD is a unique and powerful database designed to support various types of analyses of national readmission rates for all patients regardless of the expected payer for the hospital stay. The NRD includes discharges for patients with and without repeat hospital visits in a year and those who have died in the hospital. This database addresses a large gap in health care data – the lack of nationally representative information on hospital readmissions for all ages. Visit HCUP’s website for a description of the data elements. Contact Jibby Kurichi for user guides and additional documentation.
HCUP Kids’ Inpatient Database (KID)
KID is the largest publicly available all-payer pediatric inpatient care database in the U.S., containing data from two to three million hospital stays. Its large sample size is ideal for calculating national and regional estimates and enables analyses of rare conditions and uncommon treatments. Data from 2003, 2006, 2009, 2012, and 2016 are stored on the HSRDC. HCUP’s website provides a description of the data elements. Contact Jibby Kurichi for additional information.
Health Care Data on Wharton Research Data Services (WRDS)
WRDS provides clients with a broad collection of health care data, analytics, and the most robust computing infrastructure available, making it the global gold standard for integrated research systems. Corporate, academic, and government clients turn to WRDS for seamless data storage, management, and access, all backed by the credibility and leadership of The Wharton School. With expertise from LDI, WRDS has assembled a collection of health care data that is available to Penn investigators at no charge. Penn staff and faculty can sign up for a WRDS account. Contact Matthew Cohen for additional information.
Health Economics Data Analyst Pool (HEDAP)
The Health Economics Data Analyst Pool (HEDAP) is a Penn service center supported and managed by Penn LDI to provide LDI-affiliated investigators access to high-quality, skilled data analysts. HEDAP administrative staff recruits, trains, and manages a group of Master’s-level and PhD-level statistical analysts. These analysts work with LDI-affiliated investigators across funded projects using statistical software packages such as SAS, Stata, and R to manipulate and analyze health care data under the guidance of the investigators and other collaborators.
HEDAP administration supports the professional development of the analysts and provides state-of-the-art computer equipment and programming software required to conduct health services research.
For more information about HEDAP or to view the analyst request queue, please email Jibby Kurichi.
Lizzie Bair, MS
Xinwei Chen, MS
Zhi Geng, MPH
Qian (Erin) Huang, MPH
Seiyoun Kim, PhD
Sue Kim, MS
Junning Liang, MS, MS
Angira Mondal, MS
Selina Pan, MSEd
Saehwan Park, PhD, MS
Chen Peng, MPH
Charles Rareshide, MS
Kaitlyn Shultz, MS
Chuxuan Sun, MPA
Jinming Tao, MS
Erkuan Wang, MA
Jingyi Wu, MS
Yaxin Wu, MS
Ruiying (Aria) Xiong, MS
Lin Xu, MS
Lin Yang, MS
Yueming Zhao, MS
Yu Zhao, MS
Song Zhong, MSSPDA
Apply to an Open Position
If you are a statistical analyst looking for a new and exciting place to work supporting health policy-related research projects conducted by investigators at Penn LDI, please email your resume to Abby Kearns.
We are looking for analysts who have programming skills using SAS, Stata, or R to create analytical data sets from clinical trials, surveys, and health care claims data, construct and standardize outcome measures and other analytical variables, provide descriptive and analytical reports, and perform specialized statistical analyses.
Qualifications include a minimum of a Bachelor’s degree in Mathematics/Statistics, Health Care Management, Economics, or Public Health and three (3) years of related experience, or an equivalent combination of education and experience.