Determination of overall survival using on-line data from the Social Security Death Masterfile (Social Security Death Index/"SSDI").

Death dates were assigned to each patient using only data obtained through electronic searches of the SSDI, retrieved through the following World Wide Web site:

http://www.ancestry.com/search/rectype/vital/ssdi/main.htm

August 23, 2005 Note: The ancestry.com mirrored version of the Social Security Death Masterfile no longer provides the precise date of death. It now only provides the year of death, and is thus no longer useful for length of survival determinations.  The following (free) website is now a more useful mirror site:

 http://www.newenglandancestors.org/research/Database/ss/default.asp

The SSDI is a database maintained by the US Social Security Administration and mirrored on a number of commercial internet sites, including the above site, which we have found to be generally the most "user friendly" (subjectively useful). A number of studies have examined the accuracy of the SSDI for use in determining the death dates of medical patients. Past studies, using different "gold standards," have found the SSDI to be 99-100% specific and 86-88% sensitive in determining death dates [11]. This degree of accuracy was specifically reported in a recently published study which used our exact search algorithm (described below) on the above World Wide Web site [11].

The above studies, however, examined deaths occurring in earlier time periods, and the current sensitivity may be substantially higher. Searching the internet, we discovered a much larger and more recent study, covering a cohort of 6590 deaths in Department of Energy workers between 1994 and 1997 [12]. The first author of this study was interviewed by telephone by LMW to obtain details for the specific study cohort in which complete demographic information was available (i.e., social security number, name, and date of birth), since all of these identifiers were available for the patients in our own, presently described study.

In the telephone interview of October 28, 2002, the senior author (Dr. Mary Schubauer-Berigan of the National Institute for Occupational Safety and Health, Cincinnati, OH) provided the following statistics for sensitivity/specificity of the Social Security Death Master File in cases where the social security number, name, and birth date were available: 98.06% sensitivity (95% confidence interval 97.8 to 98.3) and specificity greater than 99%. The "gold standard" for the sensitivity comparison was the National Death Index, which has, itself, been documented to be 97% sensitive [13,14].

We searched the SSDI, using the above web site, according to the following protocol. First, we entered the social security number. If this resulted in a match for a person with the same name and birth date as the patient in question, this was considered to be a true date of death and this date was entered into our database. If the social security number did not result in a match, we then entered the patient's last name and compared the resulting retrieval with the patient's first name and date of birth. If we had a match by name and date of birth, but not by social security number, we then re-reviewed the chart to see if there was any indication that the social security number had been entered into the database incorrectly. We also cross-checked our financial (billing) database and billing records to determine if there was some mistake or omission.

Ultimately, if we were able to match name, date of birth, and social security number obtained from a credible source with the name, date of birth, and social security number in the SSDI, then the date of death listed in the SSDI was entered into our database as the date of death of the patient. If we had a social security number, patient name, and patient date of birth and there were no SSDI matches searching by either social security number or last name and birth date, then the patient was (for purposes of data analysis) considered to be alive as of the date of the last SSDI update (listed on the website).
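The decision rules of this protocol can be sketched in code. The following is only an illustrative sketch under stated assumptions, not the procedure actually used (the searches were performed by hand on the web site): the `Person` and `SsdiEntry` record types, the `vital_status` function, and the outcome labels are all hypothetical names, and the SSDI is modeled as a simple in-memory list. The text does not say how a social security number hit with a non-matching name was handled; in this sketch such a hit is simply not accepted as a death.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Person:
    """Identifiers shared by a patient chart and an SSDI entry (hypothetical schema)."""
    ssn: Optional[str]
    first_name: str
    last_name: str
    birth_date: date


@dataclass(frozen=True)
class SsdiEntry:
    """One SSDI record: a person plus the listed date of death (hypothetical schema)."""
    person: Person
    death_date: date


def _name_and_dob_match(a: Person, b: Person) -> bool:
    # Case-insensitive name comparison is an assumption of this sketch.
    return (
        a.first_name.strip().lower() == b.first_name.strip().lower()
        and a.last_name.strip().lower() == b.last_name.strip().lower()
        and a.birth_date == b.birth_date
    )


def vital_status(
    patient: Person,
    entries: Iterable[SsdiEntry],
    last_ssdi_update: date,
) -> Tuple[str, Optional[date]]:
    """Return (status, date) following the search protocol described in the text."""
    entries = list(entries)

    # Patients without a social security number were censored in the analysis.
    if patient.ssn is None:
        return ("censored", None)

    # Step 1: search by social security number, then confirm name and birth date.
    for entry in entries:
        if entry.person.ssn == patient.ssn and _name_and_dob_match(patient, entry.person):
            return ("dead", entry.death_date)

    # Step 2: search by name and birth date; a hit here means the recorded
    # social security number must be re-checked against the chart and billing records.
    for entry in entries:
        if _name_and_dob_match(patient, entry.person):
            return ("recheck_ssn", None)

    # No match by either route: considered alive as of the last SSDI update.
    return ("alive", last_ssdi_update)
```

A "recheck_ssn" result is deliberately not resolved automatically, since the text describes that step as a manual chart and billing review.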

To allow for adequate follow-up, the present data analysis was restricted to patients who had their biopsy (and assay) date a minimum of three years prior to the date of the last SSDI update in our data retrieval. The entire period covered by this data analysis comprised cancer specimens received between January 1, 1993 and August 31, 2000. January 1, 1993 was selected as the start date because the above-described assay and reporting methods had been standardized by then. It is important to note that patients for whom we did not have a social security number were censored in the data analysis, even in a case where we had a valid follow-up or death date from some other source of information.
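The inclusion window and the three-year follow-up requirement amount to a simple date test. The sketch below is illustrative only: the function names are hypothetical, the biopsy date is assumed to be the same as the specimen-received date, and the date of the last SSDI update (which is not stated in the text) is passed in as a parameter.

```python
from datetime import date

# Study window for specimens, as stated in the text.
STUDY_START = date(1993, 1, 1)
STUDY_END = date(2000, 8, 31)


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping February 29 to February 28 if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def is_eligible(biopsy_date: date, last_ssdi_update: date, min_years: int = 3) -> bool:
    """True if the biopsy falls in the study window with at least min_years of follow-up."""
    in_window = STUDY_START <= biopsy_date <= STUDY_END
    enough_follow_up = add_years(biopsy_date, min_years) <= last_ssdi_update
    return in_window and enough_follow_up
```

For instance, with a hypothetical last SSDI update of August 31, 2003, a biopsy on August 31, 2000 would just qualify.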

As a single example, among previously untreated ovarian cancer patients, there were 115 patients for whom we had names, birth dates, and social security numbers. Two additional patients did not have social security numbers: one European patient whose specimen was sent from Europe and one 80-year-old patient who never had a social security number. Only these two patients were censored. The European patient had an incomplete assay (MTT only; DISC technically unsuccessful). The 80-year-old patient had an assay which showed clear-cut sensitivity to platinums and was independently confirmed to be alive 5 years following biopsy, but was censored, nonetheless, to ensure consistency in data analysis.

In the case of previously treated ovarian cancer patients, there were 327 cases where we had all three types of required demographic information (name, birth date, social security number) and 6 additional cases which did not have a social security number. These latter 6 cases were, likewise, censored. Thus, of a total of 450 cases (117 untreated and 333 treated), there were 442 cases (98.2%) in which we had complete information and which comprised the database for the survival comparisons, and only 8 cases (1.8%) which required censoring for lack of a social security number. For untreated patients alone, the corresponding numbers were 98.3% evaluable and 1.7% censored.
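The proportions quoted above follow directly from the stated case counts. As a quick arithmetic check (a minimal sketch; the variable and function names are illustrative only):

```python
# Case counts as stated in the text.
untreated_complete, untreated_no_ssn = 115, 2
treated_complete, treated_no_ssn = 327, 6


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


untreated_total = untreated_complete + untreated_no_ssn   # 117
treated_total = treated_complete + treated_no_ssn         # 333
total = untreated_total + treated_total                   # 450
complete = untreated_complete + treated_complete          # 442
censored = untreated_no_ssn + treated_no_ssn              # 8

overall_evaluable = pct(complete, total)                      # 98.2
overall_censored = pct(censored, total)                       # 1.8
untreated_evaluable = pct(untreated_complete, untreated_total)  # 98.3
untreated_censored = pct(untreated_no_ssn, untreated_total)     # 1.7
```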

It is likewise important to note that all patients falling within a given assay category ("sensitive," "intermediate," "resistant") were included in the analysis, even if the patient was a very "early death," i.e., a death occurring within the first week after the biopsy date. As in the case of the "censored" patients (for whom censoring actually weakened correlations), including "early deaths" likewise weakened correlations. One example is a previously untreated patient with a highly "sensitive" assay, who died 7 days after the biopsy, before any chemotherapy was actually administered and before the assay was even completed (we did not learn of the patient's death until after the assay had been completed and reported). The inclusion of this "early death" obviously weakened the correlations between assay results and patient survival, but the patient was not censored, in order to maintain consistency. The intent was to be as neutral and objective as possible in the data analysis, to prevent bias from skewing the results.

Literature Cited

11. Hauser TH, Ho KK. Accuracy of on-line databases in determining vital status. J Clin Epidemiol 2001; 54:1267-1270.

12. Schubauer-Berigan M. Specificity of the National Death Index and the Social Security Administration Death Master File when information on social security number is lacking; NIOSH intramural study; 2001, World Wide Web, accessed October 28, 2002. http://www.cdc.gov/niosh/2001-133g.html#19.

13. Calle EE, Terrell DD. Utility of the National Death Index for ascertainment of mortality among cancer prevention study II participants. Am J Epidemiol 1993; 137:235-241.

14. Williams BC, Demitrack LB, Fries BE. The accuracy of the National Death Index when personal identifiers other than social security number are used. Am J Public Health 1992; 82:1145-1147.