Volume 47 Number 8, 2010
Pages 679–688
Abstract — The Department of Veterans Affairs (VA) provides integrated services to more than 25,000 veterans with spinal cord injuries and disorders (SCI/D). VA data offer great potential for providing insights into healthcare utilization and morbidity, and these capabilities are central to efforts to improve healthcare for veterans with SCI/D. The objective of this article is to introduce researchers to the use of VA data to examine questions related to SCI/D using examples from Spinal Cord Injury (SCI) Quality Enhancement Research Initiative studies. Sources of VA data available to investigators interested in SCI/D-related research include national-level VA administrative and clinical databases and primary data (medical record review, patient surveys). Methods used to identify veterans with SCI/D include the Allocation Resource Center cohort, the Spinal Cord Dysfunction (SCD) Registry, and the VA inpatient SCI flag; only 33% of veterans were included in all three groups (n = 12,306). While neurological level of SCI was unknown for approximately a third of veterans (from SCD Registry data alone), the percent decreased to 13% when augmented with diagnostic codes. Primary data can be used to augment other missing SCI data and to provide more detailed information about complications commonly associated with SCI/D.
Key words: data quality, paraplegia, pressure ulcers, rehabilitation, respiratory, spinal cord injuries and disorders, spinal cord injury, tetraplegia, VA, veterans.
Of the estimated 250,000 individuals in the United States with serious spinal cord injuries and disorders (SCI/D), approximately 42,000 are veterans [1]. At present, the Department of Veterans Affairs (VA) provides care to more than 25,000 veterans with SCI/D [1]. The SCI/D system of care in VA, coupled with the extensive data sets available and the size of the population served, places VA in a unique position to address gaps in the empirical literature and to apply research to improve healthcare for veterans with SCI/D. Accomplishing this research, however, requires strategies to address the challenges of using data collected for clinical purposes for research. These challenges include identifying cohorts, addressing missing data, and addressing problems with specific conditions. The objective of this study is to investigate and discuss the usefulness of VA databases for researchers interested in studying veterans with SCI/D. The aims are to (1) describe and compare methods of identifying veterans with SCI/D and (2) assess the value of using multiple data sources to address data missing from the Spinal Cord Dysfunction (SCD) Registry. We also discuss examples of important topics in SCI/D research that cannot be addressed solely using administrative data and describe potential solutions.
VA provides comprehensive, integrated healthcare services to veterans with SCI/D, which is organized and delivered through a "hub and spoke" system that is composed of 24 spinal cord injury (SCI) centers ("hubs") and 134 "spokes" that have SCI outpatient clinics or SCI designated primary care teams. Multidisciplinary teams at regional SCI centers located in VA facilities deliver primary care, acute rehabilitation, disability management, ongoing rehabilitation, health maintenance (including a comprehensive annual assessment that is required by VA policy), and lifelong care for veterans with SCI/D.
In addition to its nationwide system of care for SCI/D, VA is a leader in the use of informatics to improve patient care. VA providers can simultaneously access all electronic medical records that exist for a patient across all VA facilities. The Computerized Patient Record System, a computer application used by clinicians to access the medical record when providing patient care, is an interactive component of the larger local database management system, the Veterans Health Information Systems and Technology Architecture (VistA). VistA clinical modules include extensive information about laboratory results, medications, mental health, outpatient visits, nursing notes, and other clinical and quality improvement functions [2]. VistA is maintained locally at the facility level, and as a result, some variations are found in the databases across sites. Selected data from VistA, including demographic information, bed section and clinic information, diagnostic and procedure codes, pharmacy data, and some laboratory data, are collected and aggregated at the national level at the Austin Information Technology Center. These aggregates make up several VA national databases, which include inpatient, outpatient, laboratory, and pharmacy data files, all of which also include measures of healthcare utilization and/or costs [3-5]. These national databases are an especially important tool for SCI researchers because patients with SCI/D may receive care through multiple VA sites.
To accomplish our objectives, we used data from several SCI Quality Enhancement Research Initiative (QUERI) studies to compare different strategies for using administrative data to identify veterans with SCI/D and describe the cohorts that resulted from the strategies. For the first objective, we compared three cohorts: the Allocation Resource Center (ARC), the SCD Registry, and the inpatient SCI flag. Individual records from these data sets were linked using patient identifiers. To examine the second objective, we used data from the SCD Registry, the VA Medical SAS Inpatient and Outpatient data sets, patient surveys, and medical record reviews to augment missing data on level of injury.
SCI QUERI's work to create an SCI cohort and examine healthcare use and outcomes has relied on several different data sources, including national-level data (ARC, SCD Registry, and VA Medical SAS Inpatient data sets), medical record reviews, and patient self-report surveys. Each is briefly described here.
The ARC maintains a cumulative list of persons with SCI/D, which includes all U.S. veterans who have had one of the International Classification of Diseases-9th Revision (ICD-9) codes listed in Table 1 noted in their record. The list was created for budgeting purposes (e.g., allocation of resources to VA facilities based on volume and complexity of patients) and is updated every year. Veterans are not removed upon death. The only element in these data is the patient identifier, which can be requested as either the real or the scrambled Social Security number. For the results described here, we used the fiscal year (FY) 2005 (October 1, 2004-September 30, 2005) ARC list. Veterans on this list include those with traumatic injuries as well as nontraumatic injuries; these may include nonmalignant neoplasms resulting in neurologic deficit; vascular insults of a thromboembolic, hemorrhagic, or ischemic nature; cauda equina syndrome producing neurologic deficit; inflammatory disease of the spine, spinal cord, or cauda equina resulting in nonprogressive neurologic deficit; demyelinating disease of the spinal cord; and multiple sclerosis (MS). For the analyses presented here, we removed veterans from the cohort if they had an etiology of MS in the SCD Registry or an ICD-9 code of 340.xx in the VA Medical SAS data sets.
In the 1990s, the VA and the Paralyzed Veterans of America jointly funded an effort to develop a computerized registry of all veterans with SCD, the SCD Registry [6]. This registry was designed to capture information about veterans with SCD. The registry includes information about etiology, date of onset, level of injury, completeness of injury, and the veteran's healthcare in addition to other administrative and clinical data. These data are collected at local VA facilities, entered into a database, and compiled into national files. These files are updated regularly, and values may be replaced during these updates if they have changed. Only a few studies have used these data; the registry was not created for research purposes, and with the exception of basic registry data, most other fields are infrequently completed. The variables used for this article include level of injury, registration status, age at onset, duration of injury, and etiology of injury. SCI QUERI previously obtained permission to use registry data through the Spinal Cord Injury and Disorders Services office; however, this process was recently changed and registry data are now obtained through a data request filed with the VA Office of Patient Care Services. Several documents are required to complete the request, such as research, privacy, and data security training documentation from the researchers involved and a copy of the protocol. Researchers should allow adequate time for the request process.
The SCD Registry was designed primarily as an administrative and clinical tool populated by local clinicians, administrators, and veterans themselves. Some staff members have received data entry training and others have not. Contradictions in the data or low completion rates for some variables should not be surprising to researchers. The limitations to a registry that is partially self-populated and voluntarily maintained by a combination of clerical and clinical personnel, many of whom are self-trained, should be apparent to researchers. Some data elements may be more accurately derived from other Veterans Health Administration (VHA) databases.
The inpatient SCI flag is found in the inpatient file of the VA Medical SAS data sets. These data sets include demographic and utilization data for all VA facilities and are based on databases that are updated nightly with patient medical record information from the local VA facilities [3-5]. We collected patients' date of birth (to compute age), race/ethnicity, and ICD-9 codes for level of injury (Table 1) for FY2005 from these databases. The inpatient SCI flag variable name in the Medical SAS inpatient data set is "SCI" and includes five categories: no spinal cord injury, paraplegia with a traumatic etiology, quadriplegia with a traumatic etiology, paraplegia with a nontraumatic etiology, and quadriplegia with a nontraumatic etiology. We used this variable to create the inpatient SCI flag cohort.
The Beneficiary Identification and Records Locator System (BIRLS) file includes veterans' dates of death that are obtained from the Social Security Administration Death Master File and reports from cemeteries, hospitals, and family or acquaintances of the veteran [6]. Dates of death were used to identify and exclude individuals who died prior to FY2005.
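The two exclusions described above (deaths before FY2005 and MS) can be illustrated with a minimal Python sketch. This is not the code used in the study. The record layout, the field names, and the etiology value "MS" are hypothetical; only the FY2005 start date (October 1, 2004) and the ICD-9 category 340 come from the text.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# FY2005 began on October 1, 2004 (stated in the text).
FY2005_START = date(2004, 10, 1)


@dataclass
class Veteran:
    # All field names are hypothetical, chosen for illustration only.
    patient_id: str
    date_of_death: Optional[date] = None      # e.g., from BIRLS
    registry_etiology: Optional[str] = None   # e.g., from the SCD Registry
    icd9_codes: List[str] = field(default_factory=list)


def has_ms(veteran: Veteran) -> bool:
    """True if the registry etiology is MS or any ICD-9 code falls in category 340."""
    if veteran.registry_etiology == "MS":  # assumed coding of the etiology field
        return True
    return any(code.strip().split(".")[0] == "340" for code in veteran.icd9_codes)


def died_before_fy2005(veteran: Veteran) -> bool:
    """True if a recorded date of death precedes the start of FY2005."""
    return veteran.date_of_death is not None and veteran.date_of_death < FY2005_START


def analytic_cohort(veterans: List[Veteran]) -> List[Veteran]:
    """Keep veterans who were alive in FY2005 and have no indication of MS."""
    return [v for v in veterans if not died_before_fy2005(v) and not has_ms(v)]


example = [
    Veteran("A", icd9_codes=["806.25"]),
    Veteran("B", date_of_death=date(2003, 5, 1)),
    Veteran("C", icd9_codes=["340"]),
    Veteran("D", registry_etiology="MS"),
    Veteran("E", date_of_death=date(2005, 2, 1), icd9_codes=["344.1"]),
]
kept_ids = [v.patient_id for v in analytic_cohort(example)]
```

In this sketch, a veteran who died during FY2005 (record "E") stays in the cohort, because the exclusion applies only to deaths before the fiscal year began.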
Additional SCI/D characteristics, such as level of injury, were obtained from patient surveys and medical record reviews from the VA Health Services Research and Development (HSR&D) Service study, "Characterizing Variability in Respiratory Care in SCI/D," also known as the "GAP study." Medical record reviews were conducted on persons identified with respiratory diagnoses (community-acquired pneumonia [CAP], sleep apnea, and chronic obstructive pulmonary disease [COPD]) using VA administrative data or patient survey. The goal of this study was to examine current management practices related to tobacco cessation, CAP, COPD, and sleep apnea.
Study participants included a total of 37,717 veterans identified through the ARC, the SCD Registry, and the inpatient SCI flag. The VA ARC file serves as the basis for most SCI QUERI work and is the primary resource for identifying the sample cohorts for our administrative data studies. We describe here how exclusions are made and how this list differs from other sources for identifying veterans with SCI/D.
The analyses in this study are primarily descriptive. We describe the number of veterans who are included in the ARC cohort and the SCD Registry and those who have an SCI flag in VA inpatient data. We also calculated the concordance between the SCD Registry registration status variable that indicated the level and etiology of injury and the SCI flag in the inpatient data that categorized level and etiology of injury. Concordance is the agreement rate between two sources of data. To measure the strength of the agreement and to account for agreement solely by chance, we calculated the kappa statistic [7]. We interpreted the strength using the approach outlined by Altman [8] and used by Jia et al. [9]: poor agreement, 0.20 or less; fair, 0.21 to 0.40; moderate, 0.41 to 0.60; good, 0.61 to 0.80; and very good, 0.81 or greater [7-8]. We described the mean ages for veterans in the ARC cohort who were and were not included in the registry and assessed the difference using a difference of means test. We also tested the difference in race/ethnicity between veterans from the ARC cohort who were and were not included in the SCD Registry.
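To make the concordance measures concrete, the following is a minimal Python sketch, not the code used in the study. It computes percent agreement and the simple (unweighted) Cohen's kappa for two paired categorical variables, then labels the result with the Altman bands. The category labels and the four example records are hypothetical.

```python
from collections import Counter


def agreement_and_kappa(source_a, source_b):
    """Return (percent agreement as a fraction, simple Cohen's kappa) for paired labels."""
    if len(source_a) != len(source_b) or not source_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(source_a)
    # Observed agreement: share of records with identical categories.
    observed = sum(a == b for a, b in zip(source_a, source_b)) / n
    # Chance agreement: product of the marginal proportions, summed over categories.
    freq_a, freq_b = Counter(source_a), Counter(source_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return observed, (observed - expected) / (1.0 - expected)


def altman_label(kappa):
    """Map a kappa value to the Altman strength-of-agreement bands."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical paired categories for four veterans (registry vs. inpatient flag).
registry = ["para_traumatic", "para_traumatic", "quad_traumatic", "quad_traumatic"]
flag = ["para_traumatic", "quad_traumatic", "quad_traumatic", "quad_traumatic"]
observed, kappa = agreement_and_kappa(registry, flag)
```

In this toy example, three of four records agree (75 percent), but half of that agreement would be expected by chance, so kappa is 0.50. This shows why percent agreement alone overstates concordance.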
We also calculated the percentage of veterans who were included in the SCD Registry who were missing level of injury, etiology of injury, duration of injury, or completeness of injury. We supplemented the level of injury data we obtained from the registration status variable with data from the level of injury variables and with diagnostic codes obtained from the inpatient and outpatient Medical SAS files. If a veteran was missing the registration status variable or had "not applicable" as a value, we filled the value based on data obtained from the level of injury variable. If a veteran was missing data in both the registration status and level of injury fields, we searched the veteran's inpatient and outpatient utilization data for diagnostic codes. Codes used to identify paraplegia included ICD-9 codes 806.2-806.7, 952.1-952.4, 344.1, and 907.2. To identify tetraplegia, we used ICD-9 codes 806, 806.0, 806.1, 806.8, 806.9, 952, 952.0, 952.8, 952.9, 344, and 344.0. The change in frequency of missing data using these augmentation strategies is presented here.
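The stepwise augmentation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's code. The function and variable names are hypothetical. The paraplegia range is read as 806.2 through 806.7 and codes are matched on their first decimal digit, both of which are assumptions. The sketch also assumes both registry fields are already coded as "paraplegia" or "tetraplegia" and takes the first matching diagnostic code, although the text does not say how conflicting codes were resolved.

```python
# Code lists from the text; the paraplegia range 806.2-806.7 is an assumed reading.
PARAPLEGIA_CODES = {
    "806.2", "806.3", "806.4", "806.5", "806.6", "806.7",
    "952.1", "952.2", "952.3", "952.4", "344.1", "907.2",
}
TETRAPLEGIA_CODES = {
    "806", "806.0", "806.1", "806.8", "806.9",
    "952", "952.0", "952.8", "952.9", "344", "344.0",
}
MISSING_VALUES = (None, "", "not applicable")


def level_from_icd9(code):
    """Classify one ICD-9 code, matching on the category plus first decimal digit."""
    head, _, tail = code.strip().partition(".")
    key = head if not tail else f"{head}.{tail[0]}"
    if key in PARAPLEGIA_CODES:
        return "paraplegia"
    if key in TETRAPLEGIA_CODES:
        return "tetraplegia"
    return None


def fill_level(registration_status, level_variable, icd9_codes):
    """Apply the three sources in order; return None if level remains unknown."""
    # Steps 1 and 2: registry fields, assumed already coded as paraplegia/tetraplegia.
    for value in (registration_status, level_variable):
        if value not in MISSING_VALUES:
            return value
    # Step 3: fall back to inpatient and outpatient diagnostic codes.
    for code in icd9_codes:
        level = level_from_icd9(code)
        if level is not None:
            return level
    return None


filled = fill_level("not applicable", None, ["707.0", "952.2"])
```

Diagnostic codes are consulted only when both registry fields are uninformative, so a registry value is never overwritten by a code.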
A Venn diagram depicting the number of unique veterans with SCI/D who were identified by the three data sources and the overlap between these cohorts is presented in the Figure. A total of 42,914 individuals with SCI/D were included in the 2005 ARC list; of these, 28,082 were alive in FY2005 and 23,916 did not have MS. A total of 30,655 living veterans were included in the SCD Registry after individuals with MS were excluded, and 15,096 living veterans were identified by the SCI inpatient flag. When the data were combined, 12,306 (32.6%) veterans were included in all three groups, 16,897 were included in both the ARC list and the SCD Registry, 14,575 were included in both the ARC list and the SCI inpatient flag group, and 12,784 were included in both the SCD Registry and the SCI inpatient flag group. Veterans who were on the ARC list but not in the SCD Registry were older (65.8 vs 60.6 years), but no significant difference was found between the groups for race/ethnicity.
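The reported counts can be checked against one another with the inclusion-exclusion principle. The short Python sketch below is offered only as a worked check of the arithmetic. The function name is hypothetical, and it assumes that each pairwise count includes the veterans found in all three sources.

```python
def union_of_three(a, b, c, ab, ac, bc, abc):
    """Size of the union of three sets by inclusion-exclusion.

    ab, ac, and bc are pairwise overlaps that include the triple overlap abc.
    """
    return a + b + c - ab - ac - bc + abc


# Counts reported in the text.
arc, registry, flag = 23_916, 30_655, 15_096
arc_registry, arc_flag, registry_flag = 16_897, 14_575, 12_784
all_three = 12_306

total_unique = union_of_three(
    arc, registry, flag, arc_registry, arc_flag, registry_flag, all_three
)
percent_in_all_three = round(100 * all_three / total_unique, 1)
```

The union works out to 37,717 unique veterans, which matches the total number of study participants, and 12,306 of 37,717 is the 32.6 percent reported for inclusion in all three groups.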
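Figure. Venn diagram of the number of unique veterans with SCI/D identified by the ARC list, the SCD Registry, and the inpatient SCI flag, and the overlap between these cohorts.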
The overall concordance, or percent agreement, between the four categories (Table 2) (paraplegia traumatic, quadriplegia traumatic, paraplegia nontraumatic, quadriplegia nontraumatic) in the SCD Registry and the SCI flag in the administrative data was 52 percent. The simple kappa statistic was 0.34 (95% confidence interval: 0.34-0.35). According to the criteria established by Altman, this is fair agreement [8].
We also examined the concordance between level of injury data collected by medical record review and survey for a small sample of veterans in the GAP study. For 331 veterans with survey and SCD Registry data, 83 percent agreement was found for level of injury categorizations. For 545 veterans with medical record review and SCD Registry data, agreement was found for 92 percent of veterans.
Considerable variation exists in the completeness of the variables in the SCD Registry. The percentages of individuals missing data from the ARC and SCD Registry fields are presented in Table 3. For FY2005, 68 percent of veterans on the ARC list (n = 23,916) had complete level of injury data in the SCD Registry as defined by the registration variable, which includes information about both level and etiology of injury. Of the ARC cohort of veterans who were included in the SCD Registry (n = 16,897), 43 percent were missing completeness of injury, 42 percent were missing date of onset, and 32 percent had unknown level of injury. When the SCD Registry level of injury as defined by the registration status variable was supplemented with data in the SCD Registry from the variable that defines specific level of injury (e.g., cervical nerve root level 4 or C4), the percentage of veterans in the SCD Registry with an unknown level of injury decreased only to 31 percent (data not shown). By further augmenting the level of injury with ICD-9 codes from the VA inpatient data, however, this percentage was decreased to 13 percent.
Slightly less than a third of veterans with an identifier for SCI/D were included in all three data sources, and many were missing information about their level and completeness of injury. Because no gold standard for defining veterans with SCI/D exists, decisions about whether to use the ARC list, the SCD Registry, or the SCI flag in the inpatient Medical SAS data sets to identify veterans with SCI/D should depend on the research questions of the study. If a study sample will be limited to veterans with acute SCIs, the SCD Registry cohort may be more restrictive but will enable the investigator to determine etiology of injury for many of the veterans.
For studies examining issues related to the SCI system of care, however, the ARC cohort will allow for a more complete picture of the veterans with SCI/D who use VA for their healthcare. Given the low concordance between the SCI status flag and the neurologic and etiologic category data in the SCD Registry, we recommend not using the flag to identify cohorts for studies. Additionally, the flag is only available in the inpatient data sets, and thus does not identify veterans who were only seen on an outpatient basis.
Several possible explanations exist for the differences in the groups. There are a large number of VA facilities, and facilities may vary in how staff members make decisions about entering individuals in the SCD Registry. Additionally, veterans with diseases that result in nontraumatic SCD may be less likely to be included in the registry. As noted in VHA Handbook 1176.1, the focus of the SCI/D System of Care is individuals with stable and nonprogressive spinal cord neurological deficits. Because none of these cohorts was created for research purposes, researchers should obtain clinical input about the best strategies to address specific questions. Using a combination of cohorts and identifying specific ICD-9 codes to exclude other patient cohorts are strategies researchers can use to decrease the heterogeneity of their study population.
Approximately 30 percent of veterans in the ARC cohort were not included in the SCD Registry, and a substantial percentage of veterans who were included were missing information about SCI/D characteristics such as level of injury. Even after ICD-9 codes from the VA inpatient data were used to categorize level of injury, approximately 13 percent of patients with SCI/D (ARC cohort) were missing level of injury data. For etiology, duration of injury, and other variables, we recommend that researchers consider using data collection strategies such as medical record reviews or surveys to collect additional clinical detail. While these data collection methods can be expensive and time-consuming, they often provide information that is otherwise unavailable.
In addition to the effects of neurological injury, persons with SCI/D face secondary complications that represent lifelong challenges to adequate functioning and quality of life. Clinical, epidemiological, and health services research focused on SCI/D is crucial for improving care for this population. Implementing research findings to improve healthcare is an ongoing challenge. In the late 1990s, VA created the QUERI to increase the use of evidence-based findings in routine clinical practice for several prominent health conditions, including SCI, in the veteran population [10-11]. Since its inception, researchers associated with the SCI QUERI have used a combination of VA data previously collected for clinical and administrative purposes along with primary data from surveys and medical record reviews to identify and address conditions that are priorities based on their high prevalence, cost, and/or increased risk of mortality and morbidity. These include respiratory complications, pressure ulcers, and obesity.
SCI/D causes physiological changes that increase the risks of respiratory complications and illnesses [12-13], which are leading causes of death [14] and hospitalizations after SCI/D [15]. To decrease respiratory complications, SCI QUERI researchers have implemented strategies to increase influenza vaccinations [16-17] and have examined trends (antibiotic prescription) and outcomes in respiratory conditions [18-20]. SCI QUERI also recently completed a study characterizing care for smoking, sleep apnea, COPD, and CAP. This study is noteworthy because of its extensive use of multiple data sources, including veteran surveys, provider interviews, and record reviews, to provide detailed information about clinical management and symptoms that are not available in VA data sets. For example, it is possible to identify smoking status through diagnostic codes in the inpatient and outpatient data, and researchers have used these codes to examine patterns of tobacco use and how they relate to substance abuse [21]. To create interventions to decrease smoking in the SCI/D population, however, researchers and clinicians need information that is not available in the national data sets. In the GAP study, information about smoking status, reasons for smoking, readiness to quit, and quitting strategies were obtained from surveys and record reviews. Additionally, data about respiratory symptoms, sleep apnea diagnosis strategies and treatment patterns, and COPD diagnostic methods were also obtained from primary sources to obtain a more detailed picture of how these conditions are diagnosed and treated.
Persons with SCI/D are also at high risk for pressure ulcers throughout their lifetime because of decreased mobility and sensation, coupled with other physiologic changes. Reported prevalence rates of pressure ulcers have ranged from 17 to 33 percent in persons with SCI/D residing in the community [22-23]. High rates of recurrence have also been reported, ranging from 31 to 79 percent [24-25]. Severe (e.g., Stage III/IV) pressure ulcers are associated with increased mortality and often result in long hospital stays that substantially affect quality of life [26-27]. For these reasons, pressure ulcer research represents another key priority area for SCI QUERI. Challenges to using administrative data to examine questions related to pressure ulcers include lack of details about number, severity, location, and etiology of ulcer(s) and discriminating between incident and prevalent ulcers.
Examining questions related to pressure ulcers will likely require supplementing the VA administrative files with primary data collection. Use of diagnostic codes in VA administrative data sets may underestimate the number of pressure ulcers. Of 135 admissions for treatment of severe (Stage III/IV) pressure ulcers identified in a recent prospective study, 123 had an ICD-9 code 707.0 (pressure ulcer) for the patient's primary (n = 99) or secondary (n = 24) diagnostic code. However, 12 patients (9%) had no pressure ulcer diagnosis. We have found that primary data collection is necessary to obtain more detailed information, including the number and severity of ulcer(s), the location, and other ulcer characteristics. Researchers affiliated with SCI QUERI recently studied predictors of recurrence using data collected in a prospective study; the amount of time a patient could sit at the time of discharge was a significant predictor of recurrence [28]. In a related study, medical record review data were used in combination with primary data collection to describe patients who had a pressure ulcer recurrence [29]. These examples demonstrate the importance of primary data collection for examining many of the questions related to pressure ulcers.
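The degree of undercounting in this example follows directly from the reported admission counts. The brief Python sketch below works through the arithmetic. The variable and function names are hypothetical, and the result applies only to this sample of admissions for severe pressure ulcers.

```python
def proportion(part, whole):
    """Return part / whole, guarding against an empty denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Counts reported in the text for the prospective study.
admissions = 135
primary_coded, secondary_coded = 99, 24

coded = primary_coded + secondary_coded   # admissions carrying ICD-9 code 707.0
uncoded = admissions - coded              # admissions with no pressure ulcer code

percent_captured = round(100 * proportion(coded, admissions), 1)
percent_missed = round(100 * proportion(uncoded, admissions), 1)
```

About 91 percent of these admissions would be found by searching for the diagnostic code, and about 9 percent would be missed.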
A final example of an SCI QUERI focus area that demonstrates the complexities of using national data sets is obesity. Results from retrospective medical record review data and a study using VA national data suggest that obesity is common among veterans with SCI/D [30-31]. Higher body mass index (BMI) in persons with SCI/D has been associated with a number of health conditions [32-34]. The increased prevalence of obesity and its associated health implications have served as an impetus for SCI QUERI's work in this area.
A key measure of interest for our obesity research is BMI. While the use of BMI is not without limitations in this population, it continues to be widely used in epidemiological studies [35]. We obtained height and weight to calculate BMI from the VA's Healthcare Analysis and Information Group. These data are recorded in the vital statistics package of the local VistA. Weaver et al. pointed out that it was unknown whether the height and weight extracted from a data set created from vital statistics information from electronic medical records were self-reported or measured. Having this information may reduce the chance of misclassification that may occur with the use of self-reported measures [30]. The authors found that the risk of being overweight or obese was higher among individuals with paraplegia than among those with tetraplegia, which is counter to what might be expected and warrants further study [30]. However, the significant amount of missing or unknown information on SCI characteristics and duration of injury in the SCD Registry limits our ability to further investigate these issues and requires primary data collection.
The majority of current and planned SCI QUERI studies use combinations of methods, including VA databases, surveys, record reviews, and other methods of primary data collection. A promising development is the ongoing implementation of the Spinal Cord Injury and Disorders Outcomes Database (SCIDO). The SCIDO includes data from VistA as well as outcomes data collected from veterans, such as information about impairments, pressure ulcer characteristics, medical complications, activity limitation measures, participation, and satisfaction with life. The implementation of this data source as a resource could address many of the limitations of the current SCD Registry and provide more complete injury characteristic data. This data set is not yet ready for release to researchers but will be available in the future through the VA Office of Patient Care Services.
Other promising developments include the new Vital Status File, which includes the data from the BIRLS file with the addition of Medicare data, and the availability of BMI data from the Corporate Data Warehouse. Another potential future source of data will be the Veterans Informatics Computing Infrastructure (VINCI). VINCI may provide access to consolidated data sources that will include text data obtained as a result of procedures developed by the Consortium for Healthcare Informatics Research.
Another resource for researchers interested in SCI is the National SCI Database (NSCID). The NSCID, which includes more than 26,000 patients who received initial SCI rehabilitation in one of the designated SCI Model Systems in the United States, has been used in a considerable number of studies pertaining to demographics, neurological status, outcomes, secondary complications, and life expectancy [36]. Because it is well known to many researchers outside of VA, it is useful to contrast NSCID with the VA data sources described here. The greatest difference between patient populations is that NSCID is essentially limited to patients with traumatic etiologies of SCI/D (defined around the occurrence of an external event), while VA SCI/D data sources include both traumatic and nontraumatic etiologies and only include veterans. As mentioned previously, data on neurological classification are missing in many VA databases, and the reliability of some data is suspect because they were entered by clinicians or administrative staff who may not have received formal training. In contrast, NSCID data were entered for research (not clinical or administrative) purposes, and there are currently multiple quality checks to assure the accuracy of data. All data pertaining to events during the acute care hospitalization and initial rehabilitation are abstracted from the medical record (NSCID Form I). However, all subsequent data on healthcare utilization are collected based on patient report, without review of any medical records, 1 year postinjury and then at 5-year intervals postinjury (NSCID Form II). VA administrative databases, in contrast, record all instances of VA healthcare utilization, such as hospitalizations, outpatient care, laboratory studies, radiology, and pharmacy.
Thus, NSCID may be preferable when precise neurological classification or details on acute care and initial rehabilitation are needed for a large number of patients, while VA databases are preferable if nontraumatic etiologies are to be examined or if healthcare utilization for chronic secondary conditions is of importance.
The ARC, SCD Registry, and Medical SAS data sets were not created for research purposes, but rather for resource allocation and clinical care. Therefore, individuals may be included in one list but not the others. Because SCI characteristics may be missing for some veterans, researchers should consider supplementing administrative data with primary data collection. VA data provide researchers with the largest longitudinal cohort of persons with SCI in the world and offer great potential for providing insights into healthcare utilization and morbidity. These capabilities are central to efforts to improve healthcare for veterans with SCI/D.
Last Reviewed or Updated Tuesday, November 9, 2010 9:46 AM