Volume 47 Number 8, 2010
Pages 781–796
Abstract: We evaluated the improvement in Department of Veterans Affairs (VA) race data completeness that could be achieved by linking VA data with data from Medicare and the Department of Defense (DOD) and examined agreement in values across the data sources. After linking VA with Medicare and DOD records for a 10% sample of VA patients, we calculated the percentage for which race could be identified in those sources. To evaluate race agreement, we calculated sensitivities, specificities, positive predictive values (PPVs), negative predictive values, and kappa statistics. Adding Medicare (and DOD) data improved race data completeness from 48% to 76%. Among older patients (≥65 years), adding Medicare data improved data completeness to nearly 100%. Among younger patients (<65 years), combining Medicare and DOD data improved completeness to 75%, 18 percentage points beyond that achieved with Medicare data alone. PPVs for white and African-American categories were 98.6 and 94.7, respectively, in Medicare and 97.0 and 96.5, respectively, in DOD data using VA self-reported race as the gold standard. PPVs for the non-African-American minority groups were lower, ranging from 30.5 to 48.2. Kappa statistics reflected these patterns. Supplementing VA with Medicare and DOD data improves VA race data completeness substantially. More study is needed to understand the poor rates of agreement between VA and external sources in identifying non-African-American minority individuals.
Key words: African American, data quality, ethnicity, health science, Hispanic, Medicare, minority, race, validity, veterans.
Race/ethnicity-based differences in healthcare and health status in the United States are well known and continue to receive much research attention. Research has demonstrated that U.S. minorities, particularly African-American and Hispanic patients, receive lower quantity and quality of healthcare in many settings and for a wide range of conditions. Many of these differences are not explained by clinical factors, patient preferences, or ability to pay (as measured by health insurance and income) and thus represent inequities in care.
While these disparities have been well documented, their root causes and solutions remain unclear, requiring further research [1-4]. Monitoring progress toward eliminating disparities in health and healthcare is a major U.S. public health goal [1-2,5]. One challenge to these research and monitoring activities in the United States is the paucity of reliable and consistent collection and reporting of race/ethnicity data [6-8].
The Department of Veterans Affairs (VA) Office of Research and Development has identified health disparities and minority health as a priority research area [9], and many VA studies with race/ethnicity as a central focus are in progress or have been completed [10-22]. As the largest integrated healthcare system in the United States and a pioneer in electronic health information, the VA has vast data stores that provide rich opportunities for health services research. In addition to clinical information, VA databases frequently used in research contain patient demographic information including race/ethnicity. However, the quality and completeness of this race/ethnicity information has been identified as a potential limitation to research [23-26].
Obtaining veteran race/ethnicity information from external sources, including Medicare and Department of Defense (DOD) databases, has the potential to improve data completeness in VA research studies. However, little information is available to inform researchers about the utility of this approach. In this study, we evaluated the improvement in VA race data completeness that could be achieved by linking VA data with data from Medicare and DOD. Further, we examined the agreement in race values between the Veterans Health Administration (VHA) and these external data sources.
The VA has a national network of facilities that provides a comprehensive set of healthcare services, including inpatient and outpatient care, medications, and medical equipment, to more than 5 million U.S. veterans annually, approximately 20 percent of whom are racial/ethnic minorities [27]. All veterans eligible for VA care are offered the same set of services and pay no premiums, although some veterans are subject to copayments for medications for conditions not related to military service and some veterans with financial means above a specified threshold also pay copayments for other services [28-29]. Given the large proportion of patients from racial/ethnic minority groups who receive care at VA facilities, the availability of national data on healthcare use and outcomes, and the limited financial barriers to care in the VA, the VA is a valuable setting for studying racial/ethnic disparities [25,27]. Although most financial barriers to healthcare found in the private sector have been removed for VA users, racial/ethnic disparities in healthcare utilization and outcomes have been found in the VA [13,15,20,27,30-36]. Other studies, however, have found no disparities in care or outcomes, or have found that the disparities that do exist in the VA population are smaller than those found in non-VA populations [12,14-16,21,37-40]. Two considerations highlight the need for continued research and monitoring: the VA's potential to help reduce health disparities by improving understanding of the factors responsible for their absence or attenuation, and the continued existence of racial/ethnic disparities in some areas of VA care. Accurate and complete patient race/ethnicity information is critical to these endeavors.
Veterans' racial/ethnic affiliation in VA data is entered into the local healthcare facility electronic medical record known as the Computerized Patient Record System by healthcare facility personnel and then transmitted with patient healthcare encounter data to the VA's centralized data repository at the Austin Information Technology Center, where it is stored in the National Patient Care Database (NPCD). Data extracts from the NPCD, known as the Medical SAS (MedSAS) data sets, are frequently used by researchers and contain race/ethnicity data as well as clinical and other demographic information [26,41].
In accordance with a 1997 revision of Office of Management and Budget Directive 15, which established standards for the classification of Federal data on race/ethnicity, and VHA Directive 2003-027, VA healthcare encounter records from 2004 forward contain race/ethnicity information that is self-reported or reported by a representative who is authorized to speak for the patient (i.e., patient proxy-reported) [42-43].
Although the VA did not record the method of collection prior to 2003, when it implemented the new data collection standards, race/ethnicity data from that period are widely assumed to have been predominantly observer-reported by clinic personnel. The transition to self-report (or patient proxy-report) as the preferred method of collection was an outgrowth of the evolving understanding of race as a social rather than biological construct, under which self-identity is regarded as the most accurate and useful, and perhaps the only, valid measure of race/ethnicity [6,8,44-45].
Unfortunately, patient race/ethnicity information is frequently missing in VA healthcare encounter records [24-26]. A review of 114 studies focusing on racial/ethnic disparities in VA found that these studies reported missing race/ethnicity data rates as high as 48 percent [25,46]. Approaches for addressing these missing data have included creating a "missing" or "unknown" category, using patient race/ethnicity information obtained from other sources, or excluding patients with missing race/ethnicity values from the study [25]. Moreover, in more than 40 percent of the studies reviewed, the authors did not discuss the issue of or methods for addressing missing race/ethnicity, even when the information was from data sources in which race/ethnicity values were known to be missing [25]. Thus, previous examinations of racial/ethnic disparities in VA have failed to completely and consistently address the issue of missing data, with unknown consequences for study results [25].
In this study, we examined the feasibility and utility of using non-VA data sources (Medicare and DOD) to address missing data problems in VA healthcare data. That is, we addressed two questions: (1) To what extent can missing patient race information be reduced using these sources? and (2) How likely is it that the information obtained from these sources will mirror the information that would have been available in VA data had it been obtained from the patient? Determining the agreement between self-reported VA race/ethnicity information and the information from external sources provides insight into the utility of using external sources to supplement incomplete race/ethnicity information in VA data. Because the vast majority of elderly VA users (who make up 42 percent of all VA users) and nearly 20 percent of younger users are enrolled in Medicare, results from this study could provide a method to address missing race/ethnicity values for a substantial portion of VA users. The utility of DOD data (more recently made available in the VA) for addressing missing data problems in VA has not previously been explored.
This was a retrospective cohort study of a representative 10 percent sample of individuals who received VA healthcare between October 1, 2003, and September 30, 2005 (fiscal years [FYs] 2004 and 2005). We identified patients in this cohort whose race in VA data was either missing or unknown (i.e., contained no "usable" value), and we determined the proportion for whom that information could be obtained from either Medicare or DOD data sources. For veterans whose VA records did contain a usable value, we determined the agreement between the VA values and those in Medicare and DOD data.
Our representative 10 percent sample consisted of 574,971 individuals who received VA healthcare in FY 2004 and 2005. We excluded 1,590 (0.3%) individuals whose age, calculated from the date of birth in the VA record, was implausible (younger than 18 or older than 110 years). We also excluded 3,363 (0.6%) individuals who had two or more different race values in the FY 2004 and 2005 data (approximately half reported multiple racial identities and the other half reported a single but different race over time).
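The two exclusion rules can be sketched as a single filtering pass. This is a minimal illustration, not the study's code: the `Patient` record layout, the helper names, and the reference date used to compute age are all hypothetical, and we assume "different race values" means distinct usable values seen across a patient's FY 2004 and 2005 records.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record layout: one row per patient, holding every distinct
# usable VA race value observed across that patient's FY 2004-2005 records.
@dataclass
class Patient:
    patient_id: str
    birth_date: date
    race_values: set[str]


def age_on(birth_date: date, ref: date) -> int:
    """Age in completed years on the reference date."""
    years = ref.year - birth_date.year
    if (ref.month, ref.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday not yet reached in the reference year
    return years


def apply_exclusions(patients, ref: date):
    """Drop implausible ages (<18 or >110), then patients with 2+ distinct race values.

    Returns (kept patients, number excluded for age, number excluded for race).
    """
    kept, bad_age, multi_race = [], 0, 0
    for p in patients:
        age = age_on(p.birth_date, ref)
        if age < 18 or age > 110:
            bad_age += 1
        elif len(p.race_values) >= 2:
            multi_race += 1
        else:
            kept.append(p)
    return kept, bad_age, multi_race
```

Applied in this order, the reported counts are consistent: 574,971 − 1,590 − 3,363 = 570,018, the size of the full study sample.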
To examine agreement between the data sources, we conducted a record match between VA and Medicare data and VA and DOD data. To ensure that the records from the three data sources contained information on the same individuals, we used conservative matching criteria: either Social Security number (SSN) plus date of birth, or SSN plus sex plus two of the three parts of the date of birth (month, day, year).
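The matching rule is easy to misread, so here is a sketch of it as a predicate. The `Person` type and function names are hypothetical, and the sketch assumes both records carry a complete, valid date of birth; how the study handled partially missing or invalid dates is not described.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Person:
    ssn: str
    birth_date: Optional[date]
    sex: Optional[str]


def dob_parts_agreeing(a: date, b: date) -> int:
    """Count how many of (month, day, year) agree between two birth dates."""
    return sum([a.month == b.month, a.day == b.day, a.year == b.year])


def records_match(va: Person, ext: Person) -> bool:
    """Conservative link: SSN + full date of birth, or SSN + sex + 2 of 3 DOB parts."""
    if not va.ssn or va.ssn != ext.ssn:
        return False
    if va.birth_date is None or ext.birth_date is None:
        return False  # both rules need date-of-birth information
    if va.birth_date == ext.birth_date:
        return True  # rule 1: SSN + date of birth
    same_sex = va.sex is not None and va.sex == ext.sex
    # rule 2: SSN + sex + at least two of month/day/year
    return same_sex and dob_parts_agreeing(va.birth_date, ext.birth_date) >= 2
```

For example, a Medicare record with the same SSN and sex whose birth date differs from the VA record only in the day would still be linked, whereas the same discrepancy with a differing sex would not.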
We obtained information on race from the VA MedSAS data sets for FY 2004 and 2005. These data sets are national workload data for VA-provided and VA-funded healthcare [42-43,47-48]. For this study, we used the outpatient "Visit" and inpatient "Acute Main" MedSAS data sets. Each record in the outpatient Visit file reflects the services provided to an individual at a VA facility on a single day. Information in these records, therefore, may be generated from multiple provider encounters (e.g., clinic visits) or services (e.g., radiology examination), all provided on the same day and at the same facility. The Acute Main file includes one record for each discharge from a VA acute care hospital stay in the respective FY. Race categories in VA data are white, black or African American, American Indian or Alaska Native, Asian, and Native Hawaiian or Other Pacific Islander (all referred to in this study as usable values). The data also contain "Declined to Answer," "Unknown," and missing values. Guidance during the time of our study period [43] instructed personnel to enter "Unknown" if the veteran either returned the Application for Health Benefits (1010-EZ) form to enroll in VA healthcare services without completing the race/ethnicity sections or refused to answer the question when he or she checked in for an outpatient visit or inpatient admission. Therefore, the frequency of "Declined to Answer" values in the data probably does not accurately reflect the number who declined, and the "Unknown" category has unclear meaning. The vast majority (about 93%) of nonusable race values in this study were due to null values. VA collects and reports Hispanic ethnicity separately from race.
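The distinction between usable and nonusable VA values can be expressed as a small normalization step. This sketch assumes the values arrive as the text labels listed above; the actual MedSAS field coding is not described here, so the string labels and the function name are hypothetical.

```python
from typing import Optional

# Usable VA race categories as listed in the text (labels assumed, lowercase).
VA_USABLE = {
    "white",
    "black or african american",
    "american indian or alaska native",
    "asian",
    "native hawaiian or other pacific islander",
}


def va_usable_race(raw: Optional[str]) -> Optional[str]:
    """Return a normalized usable VA race value, or None for null values,
    'Unknown', 'Declined to Answer', or any unrecognized entry."""
    if raw is None:
        return None  # null values: about 93% of nonusable values in this study
    value = raw.strip().lower()
    return value if value in VA_USABLE else None
```

Treating "Unknown" and "Declined to Answer" together with nulls as nonusable reflects the point above that neither label reliably records what happened at data entry.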
We obtained Medicare race/ethnicity information from the Medicare Vital Status file. This data set contains information about each beneficiary ever entitled to Medicare, including demographic information populated from the Medicare program's enrollment database [49]. It is updated annually to reflect changes in enrollment and vital status. Medicare race/ethnicity information comes primarily from the Social Security Administration (SSA) and has known data quality problems of its own, including the absence of a separate indicator of Hispanic ethnicity and a substantial proportion of "Unknown" values [50-51]. Medicare race/ethnicity categories are white, black, Asian, Hispanic, North American Native, and Other.
We obtained DOD race information from the VA/DOD Identity Repository (VADIR) database, which is owned by the Office of Enterprise Development of the VA Office of Information. VADIR contains DOD data elements for all veterans whose military separation date was 1980 or later and for some veterans discharged before 1980 [52]. The VA obtains these data from the Defense Manpower Data Center's (DMDC's) Defense Enrollment Eligibility Reporting System (DEERS) database through an established VA/DOD data-sharing agreement. Included in the DEERS database is self-reported race/ethnicity obtained from servicemembers when they first join the military as part of their military entrance processing. Active Duty servicemembers can change this information at any time at the service personnel office or online. Current rules allow servicemembers to decline to provide their race/ethnicity. In this article, we refer to race/ethnicity data obtained from VADIR as DOD data. Race values in the DOD data are white, Asian or Pacific Islander, black, American Indian or Alaskan Native, Other, and Unknown. DOD collects and reports Hispanic ethnicity separately; that information was not used in this study.
In Table 1, we have presented the race categories in the VA, DOD, and Medicare databases. Differences were present in categories of non-African-American minority groups across the three data sources. To facilitate comparison across databases, we combined some categories to create four mutually exclusive categories: white; black or African American; North American Native; and Asian, Pacific Islander, or Other (APIO). For the 1 percent of individuals in the Medicare data whose race/ethnicity was classified as Hispanic, we compared this with Hispanic or Latino ethnicity in VA MedSAS data sets.
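A crosswalk like the following is one way to collapse the three coding schemes into the four comparison categories. It is a sketch built from the category lists in the text; the lowercase labels are hypothetical, and treating Medicare "Hispanic" and "Unknown" as having no usable race category is our assumption (the text states only that Medicare "Hispanic" was compared with VA Hispanic ethnicity).

```python
from typing import Optional

WHITE, BLACK, NAN, APIO = "white", "black", "north american native", "apio"

# Hypothetical crosswalk from each source's labels to the four harmonized
# categories; None means no usable race category for this comparison.
CROSSWALK = {
    "va": {
        "white": WHITE,
        "black or african american": BLACK,
        "american indian or alaska native": NAN,
        "asian": APIO,
        "native hawaiian or other pacific islander": APIO,
    },
    "medicare": {
        "white": WHITE,
        "black": BLACK,
        "north american native": NAN,
        "asian": APIO,
        "other": APIO,
        "hispanic": None,  # assumption: compared with VA ethnicity instead
        "unknown": None,
    },
    "dod": {
        "white": WHITE,
        "black": BLACK,
        "american indian or alaskan native": NAN,
        "asian or pacific islander": APIO,
        "other": APIO,
        "unknown": None,
    },
}


def harmonize(source: str, raw: Optional[str]) -> Optional[str]:
    """Map a raw source label to a harmonized category (None if not usable)."""
    if raw is None:
        return None
    return CROSSWALK[source].get(raw.strip().lower())
```

Because DOD groups Asian and Pacific Islander together and Medicare has a residual "Other" category, APIO is the finest grouping all three sources can support.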
We identified two groups of patients: those with and without a usable race value in the VA MedSAS data sets. For the patients without a usable value (the "missing" subsample), we linked VA with Medicare records and identified those whose race information was present in those records. We then calculated the proportion of the missing subsample whose race information was present after the record linkage. We also calculated the improvement in data completeness in our full VA sample of 570,018 individuals that resulted from the record linkage. We computed these proportions among patients in two age groups: elderly (≥65) and nonelderly (<65).
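The completeness measure is simply the share of patients with a usable value, computed within each age group. A minimal sketch, assuming each patient is reduced to a hypothetical `(age, race_or_None)` pair after harmonization:

```python
from collections import defaultdict


def age_group(age: int) -> str:
    return "elderly (>=65)" if age >= 65 else "nonelderly (<65)"


def completeness_by_group(rows):
    """rows: iterable of (age, race or None). Returns {group: percent usable}."""
    total = defaultdict(int)
    usable = defaultdict(int)
    for age, race in rows:
        g = age_group(age)
        total[g] += 1
        usable[g] += int(race is not None)
    return {g: 100.0 * usable[g] / total[g] for g in total}
```

Running the same function before and after filling in linked values gives the improvement in completeness for each group.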
We conducted a similar analysis using data originating from DOD but focused on individuals <65 years. Since the VADIR database includes DOD data for only a limited number of veterans who separated from the military before 1980, very few individuals in our elderly sample would have DOD data in their VADIR record. In addition, because only approximately 20 percent of nonelderly veterans are enrolled in the Medicare program, Medicare data will have less value in improving race data completeness among individuals <65 years than among elderly individuals. Therefore, DOD data have the greatest potential to add value for the younger population.
In the final step of our missing race analysis, we linked records from all three sources—VA, Medicare, and DOD—and calculated the proportion of the full sample with a usable race value after all data sources were combined.
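Combining the three sources amounts to keeping the VA value when it is usable and otherwise taking the first usable external value. In this sketch the Medicare-before-DOD priority and the function names are assumptions; for the completeness measure the order does not matter, because any usable value counts.

```python
from typing import Optional


def combined_race(va: Optional[str], medicare: Optional[str], dod: Optional[str]) -> Optional[str]:
    """Keep the usable VA value; otherwise fill from Medicare, then DOD (assumed order)."""
    for value in (va, medicare, dod):
        if value is not None:
            return value
    return None


def combined_completeness(records) -> float:
    """Percent of (va, medicare, dod) triples with a usable value after combining."""
    records = list(records)
    filled = sum(combined_race(*r) is not None for r in records)
    return 100.0 * filled / len(records)
```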
For the individuals in our sample who had a usable race value in VA data, we compared Medicare and DOD race values to VA values in a linked record data set (the "race consistency" subsample). We calculated sensitivities, specificities, positive predictive values (PPVs), negative predictive values, and kappa statistics to evaluate race category agreement between VA and Medicare and VA and DOD data. Again, we limited the DOD analysis to individuals <65 years.
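Each statistic is computed one category at a time, collapsing the cross-tabulation into a 2 × 2 table of that category versus all others, with VA as the gold standard. A minimal sketch (the function name and input format are hypothetical; categories with an empty cell margin would need guarding against division by zero):

```python
def agreement_stats(pairs, category):
    """pairs: (va_race, external_race) for patients usable in both sources.

    VA is treated as the gold standard; the 2x2 table is `category` vs. all others.
    """
    tp = fp = fn = tn = 0
    for va, ext in pairs:
        truth, test = va == category, ext == category
        if truth and test:
            tp += 1
        elif test:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    po = (tp + tn) / n  # observed agreement
    # agreement expected by chance from the marginal totals
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (po - pe) / (1 - pe),  # Cohen's kappa
    }
```

Note that the sensitivity for a category equals the concordance shown for that VA category, while the PPV is read in the other direction, from the external source's category back to VA.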
To explore whether Medicare data might provide a useful source to supplement missing ethnicity information in the VA MedSAS data sets, we compared concordance of the patients reporting Hispanic or Latino ethnicity in VA data with the Hispanic ethnicity category in Medicare data.
Our full study sample comprised 570,018 individuals, 295,010 (52%) of whom had no usable race information in the MedSAS data sets and therefore were the focus of our completeness analysis (Figure 1, Table 2). The remaining 48 percent of the full sample (275,008 individuals) was the focus of the consistency analysis.
Figure 1.
Table 2 presents sample characteristics for those with and without a usable race value in VA data. Because of the large sample size, the differences between these groups were statistically significant for all sample characteristics, but very few could be considered meaningful in any practical sense. The largest differences were found for sex, geographic region, and period of military service. Compared with those with a usable value, individuals lacking a usable race value in VA data were less likely to be male, to reside in the South, or to have served during the Vietnam era, and were more likely to reside in the West.
Figure 2 shows the results of linking Medicare data to fill in missing values among the subgroup whose race was missing in VA data. Results for the elderly and nonelderly age groups are broken out in Figures 2(b) and 2(c). Of the 295,010 individuals in the missing race subsample, 157,189 (53%) had a Medicare record. As expected due to Medicare eligibility criteria, the Medicare record match rate was much higher among elderly than nonelderly individuals (97% vs 18%). The Medicare record contained a usable race value 99 percent of the time overall (99 percent among individuals ≥65 years and 98 percent among those <65 years).
Figure 2.
In Figure 3, we show the influence of combining VA and Medicare data on our full study sample of VA patients. Adding Medicare data improved race data completeness in the full sample from 48 to 76 percent. In the older age group, adding Medicare data improved completeness from 47 to 98 percent while completeness in the younger age group improved from 49 to 58 percent.
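The age-group figures follow from the linkage rates reported for Figure 2: the share of a group still missing race shrinks by the fraction of that missing subgroup that both links to Medicare and carries a usable value. A sketch of that arithmetic using the rounded published percentages (the function name is ours):

```python
def completeness_after_fill(base_pct: float, match_rate: float, usable_rate: float) -> float:
    """Percent of a group with a usable race value after filling VA gaps from one source.

    base_pct    -- percent of the group with a usable VA race value
    match_rate  -- fraction of the missing subgroup with a linked external record
    usable_rate -- fraction of linked records that carry a usable race value
    """
    missing_pct = 100.0 - base_pct
    return base_pct + missing_pct * match_rate * usable_rate


# Rounded Medicare figures reported in the text, by age group.
older = completeness_after_fill(47, 0.97, 0.99)
younger = completeness_after_fill(49, 0.18, 0.98)
print(f"elderly: {older:.1f}%  nonelderly: {younger:.1f}%")
```

This prints `elderly: 97.9%  nonelderly: 58.0%`, matching the reported 98 and 58 percent. Because the inputs are rounded, the same shortcut will not always reproduce published figures exactly; the full-sample value, for instance, was computed from unrounded counts.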
Figure 3.
Figure 4 shows the results of linking DOD data with VA data to fill in missing race among nonelderly veterans in VA data. Of the 162,882 individuals <65 years in our missing race subsample, 134,892 (83%) had a VADIR record (Figure 4). The VADIR record contained a usable DOD race value in 45 percent of those cases.
Figure 4.
In Figure 5, we show the influence of combining VA and DOD data on race data completeness among nonelderly individuals. Using DOD data to fill in missing values improved completeness in this age group from 49 to 68 percent.
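The 68 percent figure can be checked against the linkage rates reported for Figure 4. Writing $C$ for percent completeness, $m$ for the VADIR match rate among nonelderly patients missing VA race, and $u$ for the share of matched records with a usable DOD value, and ignoring rounding, a simple approximation of ours gives:

```latex
C_{\text{VA+DOD}} = C_{\text{VA}} + (100 - C_{\text{VA}})\, m\, u
  = 49 + 51 \times 0.83 \times 0.45
  \approx 49 + 19.0 \approx 68 \text{ percent}
```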
Figure 5.
Finally, we combined both Medicare and DOD data to examine the benefit gained from using all three data sources together to fill in missing race among individuals <65 years. Race data completeness improved from 49 to 76 percent in that group (Figure 6). Among the nonelderly, combining Medicare and DOD data improved completeness 8 percentage points over that achieved by adding DOD data alone and 18 percentage points over that achieved by adding Medicare data alone.
Figure 6.
Figure 7 shows the concordance between VA and Medicare (Figure 7(a)) and VA and DOD data (Figure 7(b)). A high degree of concordance was found between VA and Medicare data for individuals identified as white (99%) or African American (96%) in VA data. Among individuals who were North American Native in VA data, only 36 percent were recorded as such in Medicare data, and among those who were APIO in VA, just 47 percent were APIO in Medicare data. The majority who had discordant race information in the VA North American Native and APIO groups were recorded as white in Medicare data (55% and 47% of those groups, respectively).
We also found a high degree of concordance between VA and DOD data for individuals identified as white (93%) or African American (95%) in VA data. Among individuals who were North American Native in VA data, only 39 percent were recorded as such in DOD data, while among those who were APIO in VA, 65 percent were APIO in DOD data. The majority who had discordant race information in the VA North American Native and APIO groups were recorded as white in DOD data (46% and 27% of those groups, respectively).
Figure 7.
Compared with Medicare data, DOD data had poorer concordance for the VA white group (93% vs 99%) and similar concordance for the African-American group (95% vs 96%). In contrast, concordance between DOD and VA data was slightly better than the concordance between Medicare and VA data for the North American Native group (39% vs 36%) and markedly better for the APIO group (65% vs 47%).
Measures of agreement between VA data and each of the external data sources are shown in Table 3 (Medicare data) and Table 4 (DOD data), which assume the VA self-reported or proxy values to be the gold standard. In both data sources, sensitivities and PPVs for the white and African-American categories were high. In Medicare, the PPVs were 98.6 for the white and 94.7 for the African-American categories. In DOD, PPVs were 97.0 for the white and 96.5 for the African-American categories. Kappa statistics ranging from 0.86 (for the white category in DOD data) to 0.95 (for the African-American category in Medicare data) indicate high levels of agreement and reflect the high specificities and sensitivities shown.
In both Medicare and DOD data, sensitivities and PPVs for the North American Native and APIO groups were much lower. In Medicare, the PPVs were 38.0 for the North American Native and 48.2 for the APIO categories. In DOD, PPVs were 35.3 for the North American Native and 30.5 for the APIO categories. Kappa statistics ranging from 0.37 (for the North American Native category in both data sources) to 0.47 (for the APIO category in Medicare data) indicate only fair to moderate agreement.
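The low kappa values for these small categories can be sanity-checked against the reported sensitivities and PPVs. Writing $a$, $b$, $c$, and $d$ for true positives, false positives, false negatives, and true negatives for one category (VA as the gold standard), Cohen's kappa for a 2 × 2 table approaches the harmonic mean of sensitivity (Se) and PPV when true negatives dominate, as they do for categories covering only a few percent of the sample. This approximation is our addition, not a method used in the study:

```latex
\kappa = \frac{2(ad - bc)}{(a+b)(b+d) + (a+c)(c+d)}
\;\longrightarrow\;
\frac{2a}{2a + b + c}
= \frac{2\,\mathrm{Se}\,\mathrm{PPV}}{\mathrm{Se} + \mathrm{PPV}}
\qquad (d \gg a,\, b,\, c)
```

Taking the concordance percentages reported for Figure 7 as sensitivities gives roughly 0.37 for the North American Native category in both Medicare (Se = 0.36, PPV = 0.380) and DOD (Se = 0.39, PPV = 0.353) data, about 0.48 for APIO in Medicare (Se = 0.47, PPV = 0.482), and about 0.42 for APIO in DOD (Se = 0.65, PPV = 0.305).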
Of the 5,606 (3.7%) veterans in our sample who reported Hispanic or Latino ethnicity in the VA MedSAS data sets and were also in the Medicare data, only 25 percent were recorded as Hispanic in the Medicare data (Figure 8). The majority of patients reporting Hispanic ethnicity were recorded as white (64%) in the Medicare data.
Figure 8.
In this report, we evaluated the improvement in VA race data completeness that could be achieved by linking VA data with data from Medicare and the DOD. Merging Medicare with VA data substantially improved race data completeness; the proportion of the full sample with a usable value increased by 56 percent relative to VA data alone, and completeness reached 98 percent among individuals ≥65 years. Medicare data also improved completeness among the 18 percent of the younger age group who were Medicare enrollees. More modest improvements were realized with DOD data; race data completeness increased by 38 percent (from 49% to 68%) among those <65 years. The greatest improvement in data completeness in the younger age group was achieved when both Medicare and DOD data were used to supplement VA data. In a merged data set that included VA and the two external data sources together, more than 75 percent of individuals <65 years had a usable race value.
We also examined the agreement in race values between VA and the two external data sources and found high levels of agreement between VA and each source for self-reported white and African-American individuals. Agreement for self-reported North American Native and APIO individuals was fair; a large portion of these individuals was recorded as white in Medicare and DOD data. These results suggest that researchers who use Medicare or DOD race data to supplement VA data will under-identify the non-African-American minority groups and that a substantial proportion of those groups will be misclassified as white. Since together those groups represent less than 2 percent of the sample in the VA-Medicare merged data and just 3 percent of the sample in the VA-DOD merged data, the likely effect on research study results is small, except in cases in which the study sample is small.
Our finding of much lower sensitivity and PPV of Medicare race data for the North American Native and APIO classifications than for the white and African-American classifications is consistent with results of other studies examining the validity of Medicare data on race (we are unaware of other studies examining DOD race data quality) [51,53-55]. However, these other studies found much higher PPVs for the North American Native classification than we found. For example, Waldo compared Medicare data to self-reported race in the Medicare Current Beneficiary Data and found a PPV of 69.5 for the Medicare North American Native classification [55], while we found a PPV of 38.0 in our study. This low PPV reflects the large proportion of individuals who were identified as North American Native in Medicare data but as some other race in VA data. The large majority of these "false positives" were classified as white in VA data. In fact, 58 percent of the 482 individuals identified as North American Native in Medicare were recorded as white in VA. The DOD data had very similar proportions and distributions of false positives for North American Natives. Additionally, a VA study comparing self-reported race from a survey with race recorded in the VA's electronic health record found more than 85 percent concordance for whites and African Americans but only 20 percent concordance for North American Natives [56].
Reasons for this pattern of discordance in the North American Native and APIO groups are unclear. In this analysis, we have treated VA data as the gold standard against which Medicare and DOD data were compared. VA policy dictates that race/ethnicity be obtained from the patient or proxy. However, the data entry system does not prevent the entry of values that are not self-reported (for example, values based on clinic staff observation), and we have no way of verifying the true source of the information. In a recent study comparing VA self-reported race to observer-reported race in earlier VA data (prior to the 2003 mandated switch to the self-reported data collection standard), the investigators found that 58 percent of self-reported North American Natives were identified as white in observer-reported data [26]. Medicare has made special efforts to improve the quality of its race data and since 1999 has been using data provided by the Indian Health Service to identify enrollees who are North American Natives. These efforts have increased the identification of North American Natives by an estimated 68 percent [57]. We cannot rule out, then, that the misclassification is occurring in VA rather than Medicare data. Furthermore, since race information in VA, Medicare, and DOD was collected at different points in an individual's lifetime and race is a social rather than a biological construct, we also cannot rule out that individuals' racial identifications may have changed over time. While more than 93 percent of whites and blacks marry within their own racial group, only 70 percent of Asians and 33 percent of American Indians do so [58]. Consequently, a substantial portion of individuals in the APIO and North American Native groups are likely to be multiracial. Among multiracial individuals, self-identity has been found to change over time, with North American Natives having the most instability in their racial identity [59]. It is therefore also possible that a majority of individuals of North American Native heritage are reporting white as their race in VA data.
We also found some differences between the two external data sources in their concordance with VA data. For example, sensitivity of Medicare data for the VA white group was 98 percent but only 93 percent in DOD data. In contrast, sensitivity of DOD data for the VA APIO group (comprising Asian, Native Hawaiian, Pacific Islander, and Other) was 65 percent but only 47 percent in Medicare data. While race in both external data sources is principally self- or proxy-reported (in the case of SSA-originated Medicare data, parents may have applied for an SSN on behalf of a child), there could be several reasons for these differences in concordance. In our study and consistent with others' findings, the greatest discordance between VA and each of the external data sources was observed for the non-African-American minority groups, and most of the misclassification involved identification as white rather than the minority group [3,26]. The proportion of non-African-American minorities in our VA/DOD analysis sample (3.1%) was nearly twice that in the VA/Medicare analysis sample (1.7%). Therefore, we would expect poorer concordance overall in the VA/DOD than in the VA/Medicare analysis. Additionally, some of the discordance could be related to shifts over time in likelihood of self-identifying as belonging to a minority group (on average, individuals in the VA/DOD analysis are 28 years younger than those in the VA/Medicare analysis) and/or to different preferences for revealing racial affiliation in the various settings (VA/Medicare, DOD) [44,60]. Finally, while DOD race information is purportedly self-reported and can be updated at any time, we were unable to find an organizational directive, a manual, or instructions that operationalize this.
As the largest integrated healthcare system in the United States, with a large minority population who face minimal financial barriers to access to care relative to the private sector, the VA has served as the setting for a substantial number of investigations into racial/ethnic disparities in healthcare utilization and quality of care. Researchers have often used the VA's electronic health information system as the source of race/ethnicity information but have faced the perennial problem of incomplete values in these databases, the prevalence of which has been reported to be as high as 48 percent in previous research [25]. We have shown that using Medicare data to supplement VA data reduces missing data quite substantially: to approximately 25 percent in a representative sample, and to close to zero for the 53 percent of patients lacking a usable VA race value (97% of those ≥65 years; 18% of those <65 years) who were enrolled in Medicare. We have also shown a high level of agreement between VA and Medicare data for the white and African-American categories, which suggests that the information obtained from Medicare to supplement VA data will mirror the information that would have been available in VA data had it been obtained from the patient. This information is particularly important for researchers because Medicare race/ethnicity data are now available in the VA Vital Status file. Our results show that researchers can use this information to fill in missing white or African-American race with confidence; however, our results also show that researchers should use caution with race information for non-African-American minorities.
For veterans <65 years who are not enrolled in Medicare, this study is also the first to show the utility of supplementing race information with DOD data. Unfortunately, DOD DEERS data are not available for veterans discharged before the 1980s, so DOD data will not be a useful source of race/ethnicity information for older veterans at this time. The improvement in race data completeness was more modest with DOD data than with Medicare data. However, if race is a particular focus of research in this population, DOD data are a source that researchers could consider. Moreover, agreement between VA and DOD data was high for the white and African-American categories among individuals with a usable race value in both data sources, which supports using these data to fill in incomplete VA data for these categories.
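The incremental gain in completeness from adding each external source can be expressed as a simple stepwise calculation. The sketch below is illustrative only and is not the study's code: the field names `va_race`, `medicare_race`, and `dod_race` and the function names are hypothetical, and it assumes unusable codes (e.g., "unknown") have already been mapped to `None`. To mirror the <65 analysis, the caller would first restrict the cohort to that age group.

```python
from typing import Optional, Sequence


def first_usable(values: Sequence[Optional[str]]) -> Optional[str]:
    """Return the first non-missing race value, taking sources in priority order."""
    for value in values:
        if value is not None:
            return value
    return None


def completeness(records: Sequence[dict], sources: Sequence[str]) -> float:
    """Percentage of records with a usable race value after filling from `sources`
    in order (e.g., VA first, then Medicare, then DOD)."""
    if not records:
        return float("nan")
    filled = sum(
        first_usable([rec.get(src) for src in sources]) is not None for rec in records
    )
    return 100.0 * filled / len(records)


# Hypothetical toy cohort (not study data); None marks a missing or unusable race value.
cohort = [
    {"va_race": "White", "medicare_race": None, "dod_race": None},
    {"va_race": None, "medicare_race": "African American", "dod_race": None},
    {"va_race": None, "medicare_race": None, "dod_race": "White"},
    {"va_race": None, "medicare_race": None, "dod_race": None},
]
va_only = completeness(cohort, ["va_race"])                                # 25.0
with_medicare = completeness(cohort, ["va_race", "medicare_race"])         # 50.0
with_dod = completeness(cohort, ["va_race", "medicare_race", "dod_race"])  # 75.0
print(with_dod - with_medicare)  # percentage-point gain from adding DOD: 25.0
```

Reporting the DOD contribution as a difference in percentage points between the VA/Medicare and VA/Medicare/DOD fills matches how the gain for younger veterans is expressed in this article.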
We recommend that researchers needing race information supplement incomplete VA information with Medicare data from the VA Vital Status file. Because such a high proportion of VA non-African-American minorities are recorded as white in Medicare data, the most reliable classification when supplementing VA with Medicare data is a dichotomous grouping of African American versus not African American. Researchers needing race information for a younger cohort may wish to consider supplementing VA and Medicare data with DOD data, again combining the non-African-American individuals into a single category. Researchers focusing on non-African-American minorities might consider other sources, such as Indian Health Service data, or conducting a survey.
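The recommended classification can be sketched as a single function that keeps VA self-report when present, fills gaps from Medicare and then DOD, and collapses the result into the two recommended groups. This is a minimal sketch under stated assumptions, not a prescribed implementation: the function name and label strings are hypothetical, missing values are assumed to be `None`, and for older cohorts (where DOD data are unavailable) the caller would simply omit `dod_race`.

```python
from typing import Optional

AFRICAN_AMERICAN = "African American"


def dichotomous_race(
    va_race: Optional[str],
    medicare_race: Optional[str] = None,
    dod_race: Optional[str] = None,
) -> Optional[str]:
    """Collapse race to African American vs not African American.

    VA self-report takes precedence; Medicare and then DOD fill in only
    when earlier sources are missing (None). Returns None if all are missing.
    """
    for value in (va_race, medicare_race, dod_race):
        if value is not None:
            return AFRICAN_AMERICAN if value == AFRICAN_AMERICAN else "Not African American"
    return None


print(dichotomous_race(None, "White"))                 # Not African American
print(dichotomous_race(None, None, AFRICAN_AMERICAN))  # African American
```

Collapsing all non-African-American values, including VA-recorded minority categories, into one group avoids relying on the external sources for the categories where their agreement with VA was poorest.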
Our study has some limitations. To achieve comparability of categories across the three data sets, we combined the Asian, Native Hawaiian, Other Pacific Islander, and "Other" classifications into one category. Other published literature examining Medicare race data has shown much higher sensitivity and PPV for the Asian category than for the "Other" category [53-55]. By combining these, we likely underestimated the agreement between Medicare (and possibly DOD) and VA data for individuals identified as Asian in VA data. Because of VADIR data limitations, we were unable to assess the utility of DOD race data for supplementing VA race information for elderly VA patients. We were also unable to match VA with VADIR records for 17 percent (9,032) of the subgroup we tried to match (those <65 years); these records matched on SSN but not on the other match criteria (date of birth, sex). None of those individuals had a race value in their VADIR record. Therefore, including them would not have affected either our completeness or our consistency analysis.
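The matching rule described in this limitation (agreement on SSN, confirmed by date of birth and sex) can be illustrated as a deterministic linkage that separates full matches from SSN-only matches. The sketch is hypothetical and is not the study's linkage procedure: the field names `ssn`, `dob`, and `sex` and the function name are assumptions, and it assumes at most one VADIR record per SSN.

```python
from typing import Dict, List, Tuple


def link_va_to_vadir(
    va: List[dict], vadir: List[dict]
) -> Tuple[List[Tuple[dict, dict]], List[dict], List[dict]]:
    """Deterministic linkage: SSN must agree, then date of birth and sex must agree.

    Returns (matched VA/VADIR pairs, VA records that matched on SSN only,
    VA records with no SSN match). Assumes at most one VADIR record per SSN;
    if there are duplicates, this dict keeps the last one.
    """
    by_ssn: Dict[str, dict] = {rec["ssn"]: rec for rec in vadir}
    matched, ssn_only, unmatched = [], [], []
    for rec in va:
        candidate = by_ssn.get(rec["ssn"])
        if candidate is None:
            unmatched.append(rec)
        elif candidate["dob"] == rec["dob"] and candidate["sex"] == rec["sex"]:
            matched.append((rec, candidate))
        else:
            ssn_only.append(rec)  # SSN agrees but date of birth or sex does not
    return matched, ssn_only, unmatched
```

Keeping SSN-only matches in their own list makes it straightforward to check, as was done here, whether excluding them could have changed the completeness or consistency results (for example, by confirming that none carries a race value).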
Questions remain about the best approaches to addressing missing race information in VA data. Future studies should explore the potential contribution of Indian Health Service data as well as the additional benefit of DOD data obtained directly from the DMDC. Further exploration of VADIR data completeness would be highly desirable if VADIR is to be used in future VA research.
Using Medicare data to fill in missing race information in VA records improves data completeness substantially. Among veterans <65 years, the benefit of supplementation with DOD data was substantial: using the two data sources together improved completeness by 18 percentage points beyond that achieved with Medicare data alone. Medicare and DOD had similar rates of agreement with VA data. Use of either external data source will result in high rates of accurate classification of patients who are either African American or white. More study is needed to understand the poor rates of agreement between VA and external sources in identifying race for individuals who are neither white nor African American. The best approach to managing missing race/ethnicity information may vary from study to study; this study has demonstrated that one potentially useful approach is to supplement VA data with Medicare and DOD data.

Go to the Table of Contents of Vol. 47 No. 8
Last Reviewed or Updated Wednesday, October 27, 2010 11:33 AM