National VA Rehabilitation Research and Development Center for Rehabilitative Auditory Research, Portland VA Medical Center, Portland, OR; Department of Otolaryngology, Oregon Health Sciences University, Portland, OR
Abstract--This study was conducted to document test-retest reliability of hearing thresholds using our computer-automated tinnitus matching technique and Etymotic ER-4B Canal Phone™ insert earphones. The research design involved repeated threshold measurements both within and between sessions, and testing to evaluate the potential effect of eartip removal and reinsertion. Twenty normal-hearing subjects were evaluated over two testing sessions using a fully automated protocol for determining thresholds with 1-dB precision. Thresholds were first obtained at 0.5-16 kHz, in one-third octave frequency steps (16 test frequencies). The octave frequencies were then retested, first without removing the eartips, then after eartip removal and replacement. Responses between sessions differed by an average of 2.5 dB across all 16 test frequencies, and 91.5 percent of the repeated thresholds varied within ±5 dB (98.1 percent within ±10 dB). Reliability of within-sessions thresholds was also good, and there was no effect of eartip removal and replacement.
Key words: auditory threshold, hearing, reliability of results.
Efforts are ongoing at the Rehabilitation Research and Development (RR&D) National Center for Rehabilitative Auditory Research (NCRAR) to develop clinical techniques for quantifying the phantom acoustical sensations that define tinnitus. A basic premise of this work is that patients, by "listening" to their tinnitus, can control the adjustment of acoustical parameters of external sounds to match these parameters to their tinnitus. By so doing, an acoustical image of the tinnitus can be created that can be useful for a variety of clinical and research purposes (1,2). Using our automated testing technique, individuals with essentially non-fluctuating tinnitus can match their tinnitus loudness very reliably to pure tones across the audible frequency range (3). Additional studies are in progress to develop automated methods for matching tinnitus pitch, and for assessing other acoustical parameters of tinnitus such as its maskability and spectral content.
Historically, methodological variations for matching tinnitus loudness and pitch have been myriad. A common element of most methods, however, has been the requirement to obtain hearing thresholds. Each threshold serves as the level from which to begin matching tinnitus loudness at a given test frequency, and also as the point from which to calculate sensation levels of the loudness matches. Because loudness matches are usually determined to within 1 dB, hearing thresholds must also be obtained with 1-dB precision. Tinnitus pitch matching is often part of an interleaved testing protocol that involves evaluation of thresholds, loudness matches and pitch matches (4-6).
Our automated tinnitus-matching protocol also involves measurement of hearing thresholds. With the automated system, a number of factors could affect test-retest reliability of the thresholds, including: 1) a unique computer algorithm for obtaining thresholds; 2) the measurement of thresholds with 1-dB resolution; 3) the use of Etymotic Research (Elk Grove Village, IL) ER-4B Canal Phone™ insert earphones; and, 4) reinsertion of eartips for the insert earphones. The present study was conducted, therefore, to demonstrate within-subject, within-session, and between-session reliability of hearing thresholds obtained with the automated tinnitus system, in a group of normal-hearing individuals.
Subjects
Twenty subjects with normal hearing sensitivity completed all
testing. One ear was selected as the test ear for each subject, and only
that ear was tested. For the test ear, the subjects were required to
have hearing thresholds ≤25 dB Hearing Level (HL) at octave
frequencies from 0.25-8 kHz, and at 3 and 6 kHz. Subjects consisted of
16 females and 4 males ranging in age from 19-54 y (mean=33.9 y;
SD=10.8 y).
Computer-Automated Testing System
The equipment used for this study has been described in detail (3),
and is described briefly herein. There were four major system
components: 1) main computer; 2) subject computer; 3) signal-conditioning module; and, 4) the ER-4B insert earphones. A block diagram
of this system has been shown (refer to Figure 1, Henry et al. (3)). Both
the main and subject computers used the Microsoft Windows 95 operating
system, and all custom software was Windows 95 compatible.
Main Computer
The main computer (Dell Dimension, 166 MHz Pentium CPU) resided in a
control room, and was used to control all testing functions. A 16-bit
signal generator card (National Instruments, AT-DSP2200-128k) was
installed in one of the peripheral card slots of the computer. A custom
software application was developed to control all processes necessary
for the delivery of pure tone signals to the earphones, including
generation of pure tone signals from the signal generator card, and
attenuation parameters for the signal conditioning module.
The main computer was connected to the subject computer via a local area network (LAN) interface using standard networking protocols for two-way communication. The custom software application of the main computer communicated with the subject computer over the network. As pen-touch responses were made on the subject computer, the main computer received and analyzed these responses for program control and recorded the responses into data files. The software program of the main computer also provided dialog forms on the main-computer monitor for examiner entry of subject information, test session information, parameters for testing, and visual displays for monitoring testing status, progress, and results.
Subject Computer
The subject computer (Compaq Concerto 4/25) was selected specially
to provide the testing interface between the individual being tested and
the main computer. This notebook computer was enabled for Microsoft
Windows for Pen; that is, the subject used a pen-pointing device to
indicate responses by "pen-touching" the appropriate buttons on the
touch-sensitive video screen.
The subject computer resided in the testing booth. A remote custom software application, under control of the main computer, displayed testing instructions for the subject, received the subject's responses during testing, and transmitted response information to the main computer. Acoustic and electrical noise emanating from the subject computer was not a concern because the computer was operated under battery power, there was no fan, and the hard drive was disabled during testing.
Signal-Conditioning Module
A signal-conditioning module (custom-built at Oregon Hearing
Research Center, Portland, OR) was installed in-line between the signal
generator of the main computer and the earphones, and was used for
signal mixing, attenuation, and earphone buffering.
ER-4B Canal Phone™ Insert Earphones
ER-4B Canal Phone™ insert earphones (www.etymotic.com) are
designed to be used as high-fidelity, studio-monitor-quality earphones.
Figure 1 provides photographs of the ER-4B earphone. The ER-4B
utilizes an ear-level transducer, eliminating the long tubing associated
with Etymotic Tubephone™ insert earphones. The ER-4B provides greater
overall output and enhanced high-frequency response (above 6000 Hz)
relative to the other insert phones (Figure 2). Sound output exceeds
100 dB Sound Pressure Level (SPL) from 1 to 16 kHz, with less than 3
percent harmonic distortion. Black foam eartips (ER4-14F) from Etymotic
Research were used during both calibration and testing.
Figure 1.
Photographs of ER-4B Canal Phone™ insert earphone: a) shown prior to
insertion into human ear; b) shown coupled to the B&K Type 4157 ear simulator for calibration.
Figure 2.
Swept-frequency output, in dB SPL, for four types of Etymotic Research
insert earphones, using a fixed voltage in a Zwislocki coupler. The
ER-4B Canal Phone™ had the highest relative output for frequencies
above 6000 Hz (data provided by Etymotic Research).
Instrumentation for Conventional Audiometry
Conventional-frequency (0.25-8 kHz) hearing thresholds were
obtained using a Virtual Corporation (Portland, Oregon) Model 320
audiometer with TDH-50P earphones in MX-41/AR cushions. Instrumentation
and procedures for manual threshold evaluation were as previously
described (7). Tympanometric screening was performed with a
Grason-Stadler GSI-37 Auto Tymp.
Calibration
Details of calibration have also been described (3). Briefly, output
of all pure tones was calibrated at the beginning of each test day,
using a custom automated-calibration application. The application used
serial interface control of a Bruel and Kjaer (B&K) Instruments
(Copenhagen) 2231 sound level meter with Type 1625 octave filter set.
The ER-4B insert earphone was coupled to the sound level meter using a
B&K Type 4157 ear simulator as shown in Figure 1b. A black foam
eartip of the same type used for testing (Etymotic ER4-14F) was applied
to the ER-4B earphone, inserted and aligned flush with the base of the
B&K DB2012 Ear Canal Extension (this ensured consistent placement for
calibration). Calibration values were stored in a database and later
accessed, while testing, to provide precise attenuation settings.
The conventional-frequency earphones (TDH-50P) were calibrated in compliance with American National Standards Institute standards (8) using a B&K 2231 sound-level meter with a one-third-octave band filter set in an artificial ear (B&K 4153).
Procedures
For each subject, procedures were conducted over two test sessions
that were separated by no more than 1 wk. Session 1 required 1-1.25 h
of time, and Session 2 required less than 1 h.
Initial Evaluation (Session 1 Only)
At the start of the first session, a short case history was obtained
to provide information regarding demographics, auditory and vestibular
disorders, and family history of hearing loss. Subjects were also asked
if they had been exposed to significant noise and, if so, they completed
a noise exposure questionnaire.
Tympanometric screening was performed with the Auto Tymp to rule out active middle-ear pathology. Before testing with the automated technique, hearing thresholds were obtained manually with the Virtual 320 audiometer at octave frequencies from 0.25-8 kHz, and at 3 and 6 kHz.
Selection of Test Ear
Subjects had little, if any, difference in hearing sensitivity
between ears. If one ear appeared to have better sensitivity, it was
chosen as the "test ear." If the ears were about equal in sensitivity,
the test ear was selected randomly.
Experimental Protocol (Both Sessions)
In order to evaluate test-retest reliability of threshold responses
of subjects both within and between sessions, thresholds were repeated
within sessions and all testing was repeated during a second session.
There were three stages of testing during each session. The first stage was to evaluate hearing thresholds at all frequencies in the frequency range 0.5-16 kHz, in one-third-octave steps (16 test frequencies). For the second stage, thresholds were repeated, but only at the octave frequencies between 0.5-16 kHz (six test frequencies). This second stage was conducted immediately following the first stage and without removing the foam eartip from the subject's ear canal. Testing in the third stage was identical to the second stage, except the foam eartip was removed and reinserted before retesting. With the eartip removed, the subject was encouraged to take a short break, which usually consisted of 5-10 min outside of the testing booth.
Foam Eartip Insertion
The examiner inserted the eartip for the ER-4B earphone by making
the outside eartip-surface flush with the concha bowl. If the eartip
could not be inserted to that depth, it was inserted as far as possible
without undue forcing.
Instructions to Subjects
Instructions for responding were presented at the beginning of each
of the three testing stages. This was accomplished by displaying the
instruction screen shown in Figure 3a. When subjects had read
and understood the instructions, they touched the "Go" button on the
screen with the pen device. The threshold-testing screen then appeared
(Figure 3b), and testing proceeded.
Figure 3.
Screen displays on subjects' notebook computer for hearing thresholds:
a) instructions; b) response screen.
Test Frequencies
Test frequencies for hearing thresholds obtained in Stage 1 included
0.5, 0.62, 0.8, 1, 1.26, 1.58, 2, 2.52, 3.18, 4, 5.04, 6.34, 8, 10.08,
12.7, and 16 kHz, and testing proceeded in a stepwise fashion, in this
frequency order. For Stages 2 and 3, only octave frequencies were
tested, which included 0.5, 1, 2, 4, 8, and 16 kHz.
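The one-third-octave spacing described above can be illustrated with a short sketch. It assumes exact steps of 2^(1/3) starting at 0.5 kHz; the function name is hypothetical, and the nominal values listed in the text are rounded slightly differently in places (for example, 0.62 kHz rather than the computed 0.63 kHz).

```python
from typing import List


def third_octave_frequencies(start_khz: float = 0.5, count: int = 16) -> List[float]:
    """Return `count` frequencies in kHz, each one-third octave above the last."""
    return [start_khz * 2 ** (k / 3) for k in range(count)]


# Stage 1 tests all 16 frequencies; Stages 2 and 3 retest every third one (the octaves).
all_freqs = third_octave_frequencies()
octave_freqs = all_freqs[::3]

if __name__ == "__main__":
    print([round(f, 2) for f in all_freqs])
    print(octave_freqs)
```

Taking every third element recovers the six octave frequencies used for the retest stages.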
Operational Definition of Hearing Thresholds
The goal for obtaining hearing thresholds with the automated system
was not to obtain hearing thresholds as normally defined (i.e., the 50-percent response level). Rather, "threshold" was defined operationally
as the average of two minimum response levels determined using an
adaptation of the modified Hughson-Westlake audiometric test technique
(9). The two responses defining threshold were obtained during
presentation of tones in ascending 1-dB increments (i.e., during Series
3 of the threshold-seeking algorithm described below).
Automated Testing for Hearing Thresholds
Details of the threshold-seeking algorithm were fully described in
Henry et al. (3). Briefly, initial presentation levels were fixed at 60
dB SPL for each test frequency. Three series of bracketing procedures
progressively reduced the step sizes to result in threshold responses
with 1-dB resolution. For Series 1, step increments were up 10 dB, down
20 dB, and the first response initiated the Series 2 algorithm. Series 2
and Series 3 used, respectively, increments of up 5 dB, down 10 dB and
up 1 dB, down 2 dB. Two responses were required for each of Series 2 and
3, and responses were averaged to obtain the minimum response level for
a series.
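The three bracketing series can be sketched in code. This is a minimal illustration and not the algorithm of Henry et al. (3): the text gives only the step sizes, the 60 dB SPL starting level, and the number of responses per series. The sketch therefore assumes that only responses on ascending presentations are counted, that each series starts one down-step below the last counted response, and that the listener is a deterministic function (`hears`, hypothetical) that responds whenever the level reaches a fixed threshold.

```python
from statistics import mean
from typing import Callable, List

Listener = Callable[[float], bool]  # True when the tone at this level (dB SPL) is heard


def run_series(hears: Listener, start: float, up: float, down: float,
               needed: int, floor: float = -10.0, ceiling: float = 100.0) -> List[float]:
    """Collect the levels of `needed` responses made on ascending presentations."""
    level = max(start, floor)
    ascending = False  # True when the current tone was reached by stepping up
    hits: List[float] = []
    while len(hits) < needed:
        if hears(level):
            # Count ascending responses (or responses at the floor, so the loop ends).
            if ascending or level <= floor:
                hits.append(level)
            level = max(level - down, floor)
            ascending = False
        else:
            if level >= ceiling:
                raise RuntimeError("no response at the maximum output level")
            level = min(level + up, ceiling)
            ascending = True
    return hits


def automated_threshold(hears: Listener, start: float = 60.0) -> float:
    """Series 1: up 10/down 20; Series 2: up 5/down 10; Series 3: up 1/down 2."""
    s1 = run_series(hears, start, up=10, down=20, needed=1)
    s2 = run_series(hears, s1[-1] - 10, up=5, down=10, needed=2)  # assumed start level
    s3 = run_series(hears, s2[-1] - 2, up=1, down=2, needed=2)    # assumed start level
    return mean(s3)  # average of the two Series 3 responses


if __name__ == "__main__":
    print(automated_threshold(lambda level: level >= 13))
```

With an idealized listener the procedure converges on the lowest 1-dB step at or above the true threshold; real listeners respond inconsistently near threshold, which is why two responses are averaged.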
Conventional Hearing Thresholds
Given an equivalent input voltage, the ER-4B earphones provide
higher output and greater frequency response than other insert earphones
used for conventional audiometry (see Figure 2). Thus, the ER-4B
earphones offer advantages for audiometric testing, and could be used
for this purpose in the future. It was of interest, therefore, to make
within-subject comparisons of thresholds obtained with the ER-4B
earphones to thresholds obtained from the same subjects using the TDH-50P
earphones. Such a comparison would provide preliminary normative
threshold data for the ER-4B earphones.
Mean thresholds were compared between the Virtual 320 audiometer and the automated system at test frequencies that were common to both systems (octaves from 500 to 8000 Hz). The threshold measurements using the Virtual 320 were obtained in dB HL. To compare between systems using the same dB metric, the dB SPL thresholds obtained with the ER-4B earphones were adjusted to dB HL using the reference equivalent threshold sound pressure levels (RETSPLs) for insert earphones calibrated in an occluded ear simulator (9). It should be noted that production of the same sound pressure level for both earphones in their respective calibration couplers did not ensure that the earphones produced the same sound pressure at the eardrum.
With this caveat in mind, Table 1 shows that the threshold means for the two systems differed by 1.0-10.2 dB at the different octave frequencies. To determine whether these differences were significant, paired t-tests were calculated. Because five tests were performed on these data, a Bonferroni correction set the significance level for each test at p<0.01 (corresponding to the 0.05 level for a single t-test). The mean thresholds were significantly different at 2, 4, and 8 kHz. All further threshold data are reported in dB SPL using the automated system and ER-4B Canal Phone™ earphones.
Table 1. Mean hearing thresholds, in dB HL, obtained with two systems: (1) Virtual 320 audiometer with TDH-50P supra-aural earphones; and (2) automated system with ER-4B Canal Phone™ earphones.

Frequency (Hz) | TDH-50P supra-aural earphones (dB HL) | ER-4B Canal Phone™ earphones (dB HL) | p-value* |
---|---|---|---|
500 | 2.0 | 4.7 | .0247 |
1000 | 4.3 | 5.3 | .2207 |
2000 | 4.3 | 7.0 | .0046 |
4000 | 10.8 | 3.8 | <.0001 |
8000 | 12.0 | 1.8 | <.0001 |

* Results of paired t-tests; comparisons at 2000, 4000, and 8000 Hz were significant after corrections for multiple tests using Bonferroni's method.
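The dB SPL to dB HL adjustment can be sketched as a subtraction of the RETSPL at each frequency. The RETSPL values below are assumptions: they are the insert-earphone, occluded-ear-simulator values as recalled from ANSI S3.6 and should be checked against the standard before any use. The input levels are the Session 1, Stage 1 means reported later in Table 2. With these assumed RETSPLs, the results land within rounding of the ER-4B column of Table 1.

```python
# Assumed RETSPLs (dB SPL) for insert earphones in an occluded ear simulator.
RETSPL_DB = {500: 9.5, 1000: 5.5, 2000: 11.5, 4000: 15.0, 8000: 15.5}

# Mean automated thresholds in dB SPL (Table 2, Session 1, Stage 1).
mean_spl = {500: 14.15, 1000: 10.75, 2000: 18.55, 4000: 18.80, 8000: 17.30}

# ER-4B column of Table 1 (dB HL), used only for comparison.
table1_hl = {500: 4.7, 1000: 5.3, 2000: 7.0, 4000: 3.8, 8000: 1.8}


def spl_to_hl(freq_hz: int, level_db_spl: float) -> float:
    """Convert a threshold from dB SPL to dB HL using the assumed RETSPL."""
    return level_db_spl - RETSPL_DB[freq_hz]


converted = {f: spl_to_hl(f, spl) for f, spl in mean_spl.items()}

if __name__ == "__main__":
    for f in sorted(converted):
        print(f, round(converted[f], 2), table1_hl[f])
```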
Between-Session Reliability
Within-Group Reliability
Table 2 shows the across-subjects mean thresholds, in dB SPL, separated by test frequency, session, and stage of testing during each session. During Stage 1, the hearing threshold for each of the 16 test frequencies between 0.5 and 16 kHz was determined. For Stages 2 and 3, threshold testing was repeated, but only at the octave frequencies (0.5, 1, 2, 4, 8, and 16 kHz). There were thus six means for each of the octave frequencies, and repeated measures ANOVAs were calculated on these means at each octave frequency. When there were only two means (i.e., at non-octave frequencies), t-tests were calculated. The multiple tests required Bonferroni corrections to determine significance levels (p<0.008 to correspond with 0.05 level for a single ANOVA; p<0.005 to correspond with 0.05 level for a single t-test). None of the ANOVAs or t-tests revealed significant differences.
Table 2. Means of hearing thresholds, in dB SPL, obtained with automated system. Between Stages 2 and 3 during each session, foam eartips from insert earphones were removed and reinserted.

Freq (Hz) | Session 1, Stage 1 (All freqs) | Session 1, Stage 2 (Octave freqs) | Session 1, Stage 3 (Octave freqs) | Session 2, Stage 1 (All freqs) | Session 2, Stage 2 (Octave freqs) | Session 2, Stage 3 (Octave freqs) | p-value* |
---|---|---|---|---|---|---|---|
500 | 14.15 | 13.25 | 13.05 | 14.00 | 12.75 | 11.85 | .0141 |
620 | 11.55 | | | 11.05 | | | .3828 |
800 | 9.75 | | | 9.60 | | | .7858 |
1000 | 10.75 | 10.50 | 10.75 | 10.25 | 10.60 | 9.85 | .3531 |
1260 | 12.40 | | | 10.80 | | | .0252 |
1580 | 13.00 | | | 12.35 | | | .4241 |
2000 | 18.55 | 18.70 | 18.40 | 18.30 | 18.50 | 18.15 | .9181 |
2520 | 20.05 | | | 19.65 | | | .5219 |
3180 | 19.40 | | | 18.65 | | | .4321 |
4000 | 18.80 | 18.25 | 18.05 | 18.40 | 18.00 | 16.95 | .2886 |
5040 | 18.35 | | | 16.50 | | | .0506 |
6340 | 19.45 | | | 18.40 | | | .3514 |
8000 | 17.30 | 16.90 | 16.60 | 16.40 | 17.20 | 16.50 | .9116 |
10,080 | 35.70 | | | 35.00 | | | .5619 |
12,700 | 47.50 | | | 46.70 | | | .4271 |
16,000 | 66.00 | 66.06 | 67.00 | 65.18 | 61.88 | 65.00 | .1139 |

* Results of repeated measures ANOVAs at octave frequencies (0.5, 1, 2, 4, 8, 16 kHz); results of t-tests at non-octave frequencies. None of the ANOVAs or t-tests was significant after corrections for multiple tests using Bonferroni's method.
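The Bonferroni cutoffs quoted in the text follow from dividing 0.05 by the number of tests in each family (five t-tests for Table 1, six ANOVAs and ten t-tests for Table 2). The sketch below applies that rule to the published p-values. The function names are hypothetical, and the two "<.0001" entries of Table 1 are entered at their upper bound of 0.0001.

```python
from typing import Dict, List


def bonferroni_cutoff(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise level at `alpha`."""
    return alpha / n_tests


def significant_keys(p_values: Dict[int, float], alpha: float = 0.05) -> List[int]:
    """Return the frequencies (Hz) whose p-value falls below the corrected cutoff."""
    cutoff = bonferroni_cutoff(len(p_values), alpha)
    return [freq for freq, p in p_values.items() if p < cutoff]


# Published p-values, keyed by frequency in Hz.
table1_ttests = {500: 0.0247, 1000: 0.2207, 2000: 0.0046, 4000: 0.0001, 8000: 0.0001}
table2_anovas = {500: 0.0141, 1000: 0.3531, 2000: 0.9181,
                 4000: 0.2886, 8000: 0.9116, 16000: 0.1139}
table2_ttests = {620: 0.3828, 800: 0.7858, 1260: 0.0252, 1580: 0.4241, 2520: 0.5219,
                 3180: 0.4321, 5040: 0.0506, 6340: 0.3514, 10080: 0.5619, 12700: 0.4271}

if __name__ == "__main__":
    print(significant_keys(table1_ttests))  # frequencies that differ between earphones
    print(significant_keys(table2_anovas), significant_keys(table2_ttests))
```

Note that the 500-Hz ANOVA (p=.0141) would pass an uncorrected 0.05 criterion but not the corrected one.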
Within-Subjects Reliability
Table 2 shows good reliability of threshold responses for
the subjects as a group, both within and between sessions. To evaluate
between-sessions reliability of responses, within subjects, differences
were calculated between individual repeated thresholds at each frequency
(Session 2, Stage 1 threshold minus Session 1, Stage 1 threshold). The
across-subjects means of these differences are shown in column 2 of
Table 3. These are the means of the actual differences, and thus
reflect the directionality of the responses between sessions.
Table 3. Means of individual differences in hearing thresholds, in dB, between Session 1 and Session 2. See text for full explanation of each column's data. (Diffs=Differences)

Freq (Hz) | Mean (dB) of actual diffs | Number of diffs > 0 | Number of diffs < 0 | Number of diffs = 0 | Standard deviation of diff scores (dB) | Pearson r* | r² | Mean of absolute values of diffs (dB) |
---|---|---|---|---|---|---|---|---|
500 | -0.15 | 9 | 6 | 5 | 2.93 | 0.853 | 0.728 | 2.15 |
620 | -0.50 | 7 | 10 | 3 | 2.50 | 0.865 | 0.748 | 2.00 |
800 | -0.15 | 10 | 8 | 2 | 2.43 | 0.859 | 0.738 | 2.05 |
1000 | -0.50 | 5 | 8 | 7 | 2.14 | 0.906 | 0.821 | 1.50 |
1260 | -1.60 | 6 | 13 | 1 | 2.95 | 0.806 | 0.650 | 3.00 |
1580 | -0.65 | 6 | 9 | 5 | 3.56 | 0.787 | 0.619 | 2.15 |
2000 | -0.25 | 6 | 10 | 4 | 1.94 | 0.971 | 0.943 | 1.45 |
2520 | -0.40 | 7 | 7 | 6 | 2.74 | 0.924 | 0.854 | 1.80 |
3180 | -0.75 | 6 | 10 | 4 | 4.18 | 0.837 | 0.701 | 2.55 |
4000 | -0.40 | 6 | 8 | 6 | 2.52 | 0.950 | 0.903 | 1.90 |
5040 | -1.85 | 5 | 15 | 0 | 3.96 | 0.879 | 0.773 | 2.85 |
6340 | -1.05 | 6 | 12 | 2 | 4.92 | 0.854 | 0.730 | 3.05 |
8000 | -0.90 | 5 | 14 | 1 | 3.64 | 0.949 | 0.901 | 3.10 |
10,080 | -0.70 | 7 | 9 | 4 | 5.30 | 0.948 | 0.899 | 3.60 |
12,700 | -0.80 | 9 | 10 | 1 | 4.41 | 0.986 | 0.972 | 3.50 |
16,000 | -0.82 | 5 | 9 | 3 | 4.02 | 0.983 | 0.966 | 2.94 |
Average | -0.72 | 6.56 | 9.88 | 3.38 | 3.38 | 0.897 | 0.809 | 2.47 |

* All correlation coefficients significant at p<0.0001.
It is noteworthy that all of these mean differences were negative, indicating a significant trend (p<0.05, Wilcoxon matched-pairs signed ranks test) for the Stage 1 mean threshold responses obtained at the second session to be less than those from the first session. The third column in Table 3 shows how many of the individual differences were positive at each frequency, which averaged 6.56 (out of a possible 20 individual differences), while column 4 shows an average of 9.88 negative differences. There was an average of 3.38 times, per frequency, when the thresholds were identical between Sessions 1 and 2 (column 5). The standard deviations of the between-sessions differences are shown in the next column, where it can be seen that they ranged from 1.94 dB to 5.30 dB, with an average standard deviation across frequencies of 3.38 dB.
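The strength of this trend can be checked by hand for the special case in which every difference has the same sign. Assuming the test was applied to the 16 per-frequency mean differences in Table 3, and under the null hypothesis that each sign is equally likely, the chance that all n differences share one sign is 2 x 0.5^n. That is the exact two-sided p-value of a sign test, and it is also the exact two-sided p-value of the Wilcoxon signed-ranks test in this extreme case, where the positive rank sum is zero. The sketch below covers only that special case and is not a general Wilcoxon implementation; the function name is hypothetical.

```python
from typing import Sequence

# Column 2 of Table 3: mean Session 2 minus Session 1 difference (dB) per frequency.
mean_diffs = [-0.15, -0.50, -0.15, -0.50, -1.60, -0.65, -0.25, -0.40,
              -0.75, -0.40, -1.85, -1.05, -0.90, -0.70, -0.80, -0.82]


def same_sign_p_value(diffs: Sequence[float]) -> float:
    """Exact two-sided p-value when all nonzero differences share one sign."""
    nonzero = [d for d in diffs if d != 0]
    if not (all(d < 0 for d in nonzero) or all(d > 0 for d in nonzero)):
        raise ValueError("differences have mixed signs; use a full signed-ranks test")
    return 2 * 0.5 ** len(nonzero)


p_value = same_sign_p_value(mean_diffs)

if __name__ == "__main__":
    print(len(mean_diffs), p_value)
```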
Pearson product-moment correlations were also evaluated for each frequency, and the Pearson r's are shown in Table 3. Each of these r-values was ≥0.787 (average r across frequencies=0.897), and all coefficients were significant at p<0.0001. The square of the correlation coefficient (r²) gives the proportion of the variance in the thresholds of the second session that is explained by the thresholds of the first session. These values ranged from 0.619-0.972, with a mean across frequencies of 0.809. Thus, approximately 81 percent of the variance in the Session 2 thresholds can be explained by the variance in the Session 1 thresholds. Put another way, 81 percent of the variance can be explained by the relationship between the Session 1 and Session 2 repeated thresholds, leaving an unaccountable variance of 19 percent.
The mean differences shown in column two of Table 3 are based on the actual differences in thresholds between Session 1 and Session 2. These means show the directionality of the responses, as described above. It was also of interest to determine the average magnitude of the differences between sessions. To do that, the absolute value of the between-session threshold difference for each subject was calculated before determining the across-subjects means at each frequency. These means of the absolute values of the between-sessions threshold differences are shown in the last column of Table 3. The means ranged from 1.45-3.60 dB. For the entire dataset of differences in hearing thresholds between Sessions 1 and 2, the average difference, ignoring the direction of the differences, was 2.47 dB.
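The per-frequency statistics in Table 3 can be reproduced from paired thresholds with a few lines of code. The sketch below is illustrative only: the five paired thresholds are made-up values, not study data, and the function name is hypothetical. It computes the signed mean difference (directionality), the mean absolute difference (magnitude), the standard deviation of the differences, and Pearson r with its square.

```python
from math import sqrt
from statistics import fmean, stdev
from typing import Dict, Sequence


def retest_summary(session1: Sequence[float], session2: Sequence[float]) -> Dict[str, float]:
    """Summarize test-retest agreement for one frequency (thresholds in dB SPL)."""
    diffs = [b - a for a, b in zip(session1, session2)]  # Session 2 minus Session 1
    m1, m2 = fmean(session1), fmean(session2)
    cross = sum((a - m1) * (b - m2) for a, b in zip(session1, session2))
    ss1 = sum((a - m1) ** 2 for a in session1)
    ss2 = sum((b - m2) ** 2 for b in session2)
    r = cross / sqrt(ss1 * ss2)
    return {
        "mean_diff": fmean(diffs),                       # sign shows direction of change
        "mean_abs_diff": fmean(abs(d) for d in diffs),   # size of change, sign ignored
        "sd_diff": stdev(diffs),
        "r": r,
        "r_squared": r * r,                              # share of variance explained
    }


# Hypothetical thresholds for five subjects at one frequency (not study data).
summary = retest_summary([10, 12, 15, 18, 20], [9, 12, 16, 15, 19])

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

Because positive and negative differences cancel in the signed mean, the mean absolute difference is never smaller in magnitude than the signed mean, which matches the pattern in Table 3.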
Confidence Intervals for Difference Scores
The above analyses are based on group comparisons, with the
assumption that the individual subjects were reasonably representative
of the group. Reporting confidence intervals best shows the range of
individual between-sessions differences in hearing thresholds. These
intervals are shown in Table 4 with the numbers and percentages
of difference scores falling within each specified interval. Of the 317
between-sessions threshold differences that are represented in Table
4, 290 (91.5 percent) were within ±5 dB, 311 (98.1 percent) were
within ±10 dB, and 315 (99.4 percent) were within ±15 dB. Threshold
differences equaled 15 dB on only two occasions, and never equaled 20
dB.
Table 4. Confidence intervals for between-sessions differences in hearing thresholds.

Interval (dB) in which between-sessions threshold differences occurred: From (≥) | To (<) | Number of differences* | Percent of differences |
---|---|---|---|
-1 | 1 | 92 | 29.0 |
-2 | 2 | 166 | 52.4 |
-3 | 3 | 227 | 71.6 |
-4 | 4 | 269 | 84.9 |
-5 | 5 | 290 | 91.5 |
-10 | 10 | 311 | 98.1 |
-15 | 15 | 315 | 99.4 |
-20 | 20 | 317 | 100 |

* Total number of between-sessions threshold differences=317.
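The percentages in Table 4 are the cumulative counts divided by the 317 available differences. Table 3 lists 17 differences at 16 kHz and 20 at each of the other 15 frequencies, which gives 15 x 20 + 17 = 317. A short check of that arithmetic, with variable names chosen for illustration:

```python
TOTAL_DIFFERENCES = 15 * 20 + 17  # 317 between-sessions differences

# Cumulative number of differences within each interval bound (dB), from Table 4.
cumulative_counts = {1: 92, 2: 166, 3: 227, 4: 269, 5: 290, 10: 311, 15: 315, 20: 317}

percent_within = {
    bound: round(100 * count / TOTAL_DIFFERENCES, 1)
    for bound, count in cumulative_counts.items()
}

if __name__ == "__main__":
    for bound, pct in percent_within.items():
        print(f"within {bound} dB: {pct} percent")
```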
We also evaluated the confidence intervals at the individual test frequencies. These results are shown in Table 5, which is similar to Table 4 except that the percentages of responses for each dB interval are shown separately for each test frequency. These data indicate that, in general, between-session responses were more reliable at frequencies up to 1.26 kHz, with less reliable responses at the higher test frequencies.
Table 5. Confidence intervals for between-sessions differences in hearing thresholds. Each value represents the percentage of responses which occurred for each interval indicated (from ≥ the lower bound to < the upper bound, in dB). Cells are blank once 100 percent has been reached.

Frequency (kHz) | ≥-1 to <1 | ≥-2 to <2 | ≥-3 to <3 | ≥-4 to <4 | ≥-5 to <5 | ≥-10 to <10 | ≥-15 to <15 |
---|---|---|---|---|---|---|---|
0.5 | 30 | 45 | 70 | 85 | 90 | 100 | |
0.62 | 25 | 65 | 75 | 90 | 100 | | |
0.8 | 15 | 45 | 80 | 95 | 100 | | |
1 | 45 | 70 | 85 | 95 | 100 | | |
1.26 | 10 | 20 | 50 | 85 | 100 | | |
1.58 | 30 | 60 | 90 | 90 | 95 | 95 | 100 |
2 | 55 | 70 | 85 | 95 | 100 | | |
2.52 | 35 | 60 | 90 | 90 | 95 | 100 | |
3.18 | 40 | 60 | 80 | 85 | 85 | 95 | 100 |
4 | 30 | 60 | 75 | 95 | 95 | 100 | |
5.04 | 21 | 63 | 79 | 95 | 100 | | |
6.34 | 42 | 59 | 68 | 95 | 95 | 100 | |
8 | 25 | 45 | 50 | 70 | 85 | 100 | |
10.08 | 25 | 40 | 60 | 65 | 80 | 90 | 100 |
12.7 | 20 | 30 | 50 | 65 | 75 | 100 | |
16 | 30 | 53 | 65 | 71 | 76 | 100 | |
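The interval columns are half-open: a difference counts toward an interval when it is greater than or equal to the lower bound and strictly less than the upper bound. The sketch below shows that binning rule on a small set of made-up differences (not study data); the function name is hypothetical.

```python
from typing import Sequence


def percent_within(diffs: Sequence[float], bound: float) -> float:
    """Percentage of differences d satisfying -bound <= d < bound."""
    inside = sum(1 for d in diffs if -bound <= d < bound)
    return 100 * inside / len(diffs)


# Hypothetical between-sessions differences in dB for ten subjects.
example_diffs = [-1, 0, 0, 1, 2, -3, 5, -5, 4, -2]

if __name__ == "__main__":
    for bound in (1, 2, 3, 4, 5, 10):
        print(bound, percent_within(example_diffs, bound))
```

Under this rule a difference of exactly +5 dB falls outside the -5 to 5 interval, while a difference of exactly -5 dB falls inside it.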
Within-Session Reliability
During each session, three thresholds were obtained at each of the
octave frequencies. This protocol enabled analyses of: 1)
within-subject, within-session response reliability; and, 2) the
potential effect of removing and reinserting the foam eartip of the
insert earphone before repeating the threshold measurement. Table
6 shows the means of the threshold differences between each possible
pair of tests (Stages 1, 2, and 3 as also shown in Table 2)
during each session.
Table 6. Means of actual values of individual differences in hearing thresholds. All means shown are for the various combinations of within-session differences.

Freq (Hz) | Session 1: Stage 2 minus Stage 1 | Session 1: Stage 3 minus Stage 1 | Session 1: Stage 3 minus Stage 2 | Session 2: Stage 2 minus Stage 1 | Session 2: Stage 3 minus Stage 1 | Session 2: Stage 3 minus Stage 2 |
---|---|---|---|---|---|---|
500 | -0.90 | -1.10 | -0.20 | -1.30 | -2.15 | -0.85 |
1000 | -0.25 | 0.00 | -0.25 | 0.35 | -0.40 | -0.75 |
2000 | 0.15 | -0.10 | -0.25 | 0.20 | -0.15 | -0.35 |
4000 | -0.55 | -0.75 | -0.20 | -0.80 | -1.45 | -1.05 |
8000 | -0.40 | -0.70 | -0.30 | 0.80 | 0.10 | -0.70 |
16,000 | 0.06 | 1.00 | 0.94 | -3.29 | -0.18 | 3.12 |
Average | -0.32 | -0.28 | -0.43 | -0.67 | -0.71 | -0.97 |
Stage 1 involved the baseline measurements (hearing thresholds at all 16 frequencies). For Stage 2, repeated thresholds were obtained at octave frequencies only, with the eartip left in place. Stage 3 involved repeated thresholds at octave frequencies only, with the eartip removed and reinserted.
Each difference score was calculated by subtracting an earlier response from a later response. The mean differences shown in Table 3, above, revealed a trend of Session 2 thresholds being lower than Session 1 thresholds, significantly more often than the reverse case. The within-session differences in Table 6 reflect the same trend (Wilcoxon, p<0.05). Of the 36 means shown in Table 6, 9 are positive and 26 are negative (with 1 mean being 0). The mean differences are again very small, with the average difference across the six conditions being less than 1 dB.
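The "later minus earlier" convention can be illustrated with the 8000-Hz means from Table 2. When the same subjects contribute to both stages, the difference between two stage means equals the mean of the individual differences, so the values below reproduce the 8000-Hz row of Table 6. The variable and function names are chosen for illustration.

```python
from itertools import combinations
from typing import Dict, Sequence

# Mean thresholds at 8000 Hz in dB SPL, Stages 1-3 (Table 2).
means_8k = {"Session 1": [17.30, 16.90, 16.60], "Session 2": [16.40, 17.20, 16.50]}


def stage_differences(stage_means: Sequence[float]) -> Dict[str, float]:
    """Later-stage mean minus earlier-stage mean for every pair of stages."""
    return {
        f"Stage {later + 1} minus Stage {earlier + 1}":
            round(stage_means[later] - stage_means[earlier], 2)
        for earlier, later in combinations(range(len(stage_means)), 2)
    }


differences = {session: stage_differences(m) for session, m in means_8k.items()}

if __name__ == "__main__":
    for session, diffs in differences.items():
        print(session, diffs)
```

A negative value therefore means the later threshold was lower (better) than the earlier one.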
Table 6 shows the means of the actual differences in thresholds between the various within-sessions conditions, and, because differences could be positive or negative, Table 6 reflects the directionality of the paired responses. To reveal the magnitude of the individual differences in thresholds, the absolute value of each difference was calculated, and the means of the absolute values were determined (Table 7). The averages of these mean differences ranged from 1.28 to 2.93 dB.
Table 7. Means of absolute values of individual differences in hearing thresholds. All means shown are for the various combinations of within-session differences.

Freq (Hz) | Session 1: Stage 2 minus Stage 1 | Session 1: Stage 3 minus Stage 1 | Session 1: Stage 3 minus Stage 2 | Session 2: Stage 2 minus Stage 1 | Session 2: Stage 3 minus Stage 1 | Session 2: Stage 3 minus Stage 2 |
---|---|---|---|---|---|---|
500 | 1.70 | 2.50 | 1.30 | 2.30 | 3.35 | 2.85 |
1000 | 1.15 | 1.30 | 1.45 | 1.65 | 1.90 | 1.75 |
2000 | 1.05 | 1.30 | 0.55 | 1.70 | 2.25 | 1.85 |
4000 | 1.25 | 1.45 | 1.00 | 2.60 | 3.05 | 2.55 |
8000 | 1.50 | 2.70 | 3.10 | 2.00 | 2.60 | 3.00 |
16,000 | 1.00 | 1.47 | 1.77 | 4.24 | 3.00 | 5.59 |
Average | 1.28 | 1.79 | 1.53 | 2.42 | 2.69 | 2.93 |
It was a primary objective of the within-session study design to establish whether there was any effect on hearing thresholds when the foam eartip was removed and reinserted. To evaluate that potential effect, t-tests were calculated, at each frequency, between the following means: "Stage 2 minus Stage 1" versus "Stage 3 minus Stage 1." This was done for both Session 1 and Session 2 pairs of means. None of these t-tests was significant (all p's >0.05). For completeness, t-tests were also calculated to examine potential differences in thresholds between Stage 1 versus Stage 2, and Stage 1 versus Stage 3. Again, none of the t-tests was significant.
Confidence Intervals for Difference Scores
The range of individual between-sessions differences in hearing
thresholds is shown by reporting confidence intervals, seen in Table
4. Similarly, the range of within-sessions differences is shown in
Table 8. There were, however, multiple combinations of
differences to be reported for the within-sessions repeated thresholds.
For each session, three thresholds were obtained at each of the octave
frequencies, which allowed three difference scores to be calculated from
each session: 1) Stage 2 threshold minus Stage 1 threshold; 2) Stage 3
threshold minus Stage 1 threshold; and, 3) Stage 3 threshold minus Stage
2 threshold.
Table 8. Confidence intervals for within-sessions differences in hearing thresholds. Values are percent of differences* falling in each interval (dB).

From (≥) | To (<) | Session 1: Stage 2 minus Stage 1 | Session 1: Stage 3 minus Stage 1 | Session 1: Stage 3 minus Stage 2 | Session 2: Stage 2 minus Stage 1 | Session 2: Stage 3 minus Stage 1 | Session 2: Stage 3 minus Stage 2 |
---|---|---|---|---|---|---|---|
-1 | 1 | 51.3 | 40.2 | 49.6 | 33.3 | 29.1 | 30.8 |
-2 | 2 | 76.1 | 59.8 | 70.9 | 59.8 | 45.3 | 53.0 |
-3 | 3 | 88.9 | 77.8 | 82.9 | 80.3 | 69.2 | 69.2 |
-4 | 4 | 95.7 | 93.2 | 91.4 | 88.9 | 88.0 | 81.2 |
-5 | 5 | 97.4 | 94.9 | 94.9 | 94.0 | 92.3 | 88.9 |
-10 | 10 | 100 | 100 | 100 | 98.3 | 99.1 | 98.3 |
-15 | 15 | | | | 99.1 | 99.1 | 99.1 |
-20 | 20 | | | | 100 | 100 | 100 |

* Total number of within-sessions threshold differences=117.
Table 8 shows the percentages of difference scores for the various combinations within each specified confidence interval. For Session 1, the Stage 2 minus Stage 1 column shows the percentages of differences when testing was repeated without removing the eartips. The eartips were removed and replaced between Stages 2 and 3; thus, the next two columns in Table 8 (Stage 3 minus Stage 1, and Stage 3 minus Stage 2) reflect eartip replacement. In general, the percentages of differences were slightly higher for the no-replacement condition than for the replacement condition for each session.
Table 8 also shows that within-session reliability was somewhat better during Session 1 than during Session 2. During Session 1, 97.4 percent of the differences occurred within ±5 dB for the no-replacement condition, and 94.9 percent of the differences occurred within ±5 dB for each of the replacement conditions. The Session 2 respective percentages were 94.0 percent, 92.3 percent and 88.9 percent. For Session 1, 100 percent of the differences were within ±10 dB, while a few differences were between 10 and 20 dB for Session 2.
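The percentages in Table 8 are of the kind produced by counting how many difference scores fall inside each interval. The sketch below assumes the half-open interval convention suggested by the table's column headings (from >= lower bound, to < upper bound); that reading, the function name, and the example data are assumptions made for illustration and are not values from the study.

```python
from typing import Iterable


def percent_within(differences: Iterable[float], lower: float, upper: float) -> float:
    """Percent of difference scores d (dB) satisfying lower <= d < upper."""
    diffs = list(differences)
    if not diffs:
        raise ValueError("at least one difference score is required")
    count = sum(1 for d in diffs if lower <= d < upper)
    return 100.0 * count / len(diffs)


# Hypothetical difference scores in dB (not study data).
hypothetical_diffs = [0, -1, 2, -3, 5, 0, 1, -6, 4, -2]

pct_5 = percent_within(hypothetical_diffs, -5, 5)
pct_10 = percent_within(hypothetical_diffs, -10, 10)
```

Because each interval contains the narrower ones, the percentages can only stay the same or increase as the interval widens.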
Our ultimate goal is to develop tinnitus assessment methodology suitable for routine clinical application. Attaining this goal will require the ability to conduct all testing rapidly, while maintaining a high level of test-retest response reliability. The automated method was developed specifically for quantification of acoustical parameters of tinnitus, and an essential component of such testing is the measurement of hearing thresholds. Although test-retest reliability of hearing thresholds is well documented, the unique features of the automated system required a system-specific analysis of threshold reliability. The purpose of the present study was, therefore, to demonstrate reliability of auditory thresholds using our computer-automated method.
Test-Retest Reliability of Pure-Tone Thresholds
Pure tone audiometry involves routine procedures that have been
thoroughly documented for response reliability by studies dating back to
the 1930s (10-13). Since that time, many studies have shown good
reliability of repeated threshold measurements in the
conventional-frequency (<=8 kHz) range (13-20). For high-frequency (>8
kHz) pure tone testing, standing waves have often been cited as a
concern (19,21-24). At frequencies >8 kHz, the quarter wavelength is
short enough to produce nodes and anti-nodes in the ear canal, resulting
in varied sound pressure across the surface of the tympanic membrane
(21). Thus, changes in the position of a transducer, unavoidable with
repeated testing, would be expected to have greater effects on higher
frequency tones in the ear canal than on lower frequency tones.
Therefore, investigators have compared threshold reliability between
conventional- and high-frequency ranges, and have reported that
reliability is equally good in both ranges (7,14,24-29).
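The standing-wave concern can be made concrete with a short calculation of the quarter wavelength at different test frequencies. The sketch below assumes a speed of sound in air of about 343 m/s (roughly room temperature); that constant and the function name are illustrative assumptions, not parameters reported in this study.

```python
# Assumed speed of sound in air near room temperature (m/s).
SPEED_OF_SOUND_M_PER_S = 343.0


def quarter_wavelength_mm(frequency_hz: float, speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Return one quarter of the acoustic wavelength, in millimeters."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    wavelength_m = speed_m_per_s / frequency_hz
    return wavelength_m / 4.0 * 1000.0


# Quarter wavelengths at a low, a boundary, and a high test frequency.
quarter_1k = quarter_wavelength_mm(1000.0)
quarter_8k = quarter_wavelength_mm(8000.0)
quarter_16k = quarter_wavelength_mm(16000.0)
```

Under that assumption the quarter wavelength is about 86 mm at 1 kHz, about 10.7 mm at 8 kHz, and about 5.4 mm at 16 kHz. Only the higher-frequency values are shorter than a typical adult ear canal, which is commonly cited as roughly 25 mm long.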
1-dB Threshold Resolution
For most audiological applications, hearing thresholds are obtained
with 5-dB resolution; therefore, use of 5-dB step sizes was adopted for
the majority of reliability studies cited above. In the absence of
organic or non-organic change between tests, the standard error of the
estimated threshold (a measure of the intra-subject consistency) is
considered to be approximately 5 dB for both air- and bone-conduction
measurements (30,31). Clinical audiologists thus operate under the
assumption that repeated thresholds within ±5 dB reflect normal
tolerance for clinical error (13,32). Tinnitus loudness-matching,
however, requires step changes of 1 dB to obtain precise loudness
matches. Because the loudness matches are referenced to hearing
thresholds at each test frequency, the thresholds must also be obtained
with 1-dB precision. Thus, for the present study it was necessary to
obtain all thresholds to the nearest decibel.
Means of the actual differences in hearing thresholds were shown, both between sessions (Table 3) and within sessions (Table 6). These analyses reveal whether the thresholds trended higher or lower upon repeated testing (discussed in the next paragraph). The absolute values of these differences were also calculated, the means of which reveal the magnitude of the differences across subjects. These means generally ranged between 1 and 3 dB. For audiologists, the expected ±5 dB test-retest variability of hearing thresholds is predicated upon testing in 5-dB steps. Hearing thresholds are not normally obtained with 1-dB precision, thus there are no clinical norms for the variability of these measurements. However, results of this study indicate that the performance of this automated technique for obtaining reliable hearing thresholds is well within a clinically acceptable range. The mean differences between responses across all subjects and conditions were 1-3 dB, and 91.5 percent and 99.4 percent of the between-sessions threshold differences were within, respectively, ±5 dB, and ±10 dB. Our finding that 91.5 percent of differences are within ±5 dB indicates an improvement in test-retest reliability compared to previously reported data (33-35). Our data, therefore, suggest that greater precision of clinical thresholds may be achieved using a 1-dB step procedure as compared to the traditional use of 5-dB steps.
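The distinction between the mean of the actual (signed) differences and the mean of their absolute values can be illustrated with a small sketch. The function name and the threshold values are hypothetical and serve only to show the arithmetic; they are not data from this study.

```python
from statistics import mean
from typing import Sequence, Tuple


def summarize_differences(first: Sequence[float], second: Sequence[float]) -> Tuple[float, float]:
    """Return (mean signed difference, mean absolute difference) in dB.

    Differences are second minus first, so a negative signed mean indicates
    that thresholds tended to be lower (better) on the repeated test.
    """
    if len(first) != len(second) or not first:
        raise ValueError("both sequences must be non-empty and of equal length")
    diffs = [s - f for f, s in zip(first, second)]
    return mean(diffs), mean(abs(d) for d in diffs)


# Hypothetical thresholds (dB) for four subjects at one frequency.
signed_mean, absolute_mean = summarize_differences([10, 8, 15, 12], [9, 9, 12, 12])
```

The signed mean reveals the direction of any trend, whereas the absolute mean reflects the typical size of a difference regardless of direction. Positive and negative differences partly cancel in the signed mean, so its magnitude can never exceed the absolute mean.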
Learning/Practice Effect
There was a significant trend for the threshold measurements to
improve with repeated testing. All of the between-sessions mean
differences were negative (Table 3). These mean differences,
however, were small--all were less than 2 dB, and the average of the
means across the 16 test frequencies was only -0.72 dB. Improvements
in mean thresholds were also observed within sessions (Table 6).
These differences were again very small and averaged less than -1 dB.
The systematic improvement in absolute auditory thresholds after repeated measurements has been reported previously (36). Zwislocki et al. studied this effect under various experimental conditions and concluded that the threshold of audibility improves with practice. The improvement was attributed to effects of practice and motivation, and thresholds were noted to improve during several experimental sessions. The effect was also postulated to be due to the discrimination of tones against a background of physiological noise, and, with practice, this discrimination ability becomes more sensitive. Improvements in thresholds with repeated testing have been reported by additional investigations (10,18,19,37-39). Other studies, however, have shown no improvement in thresholds with repeated testing (15,40-43).
Although the practice/learning effect for thresholds is equivocal in the literature, the present data suggest that there is such an improvement in normal-hearing individuals. Our results agree with those of the one study that tested systematically for this effect (36). Although not stated in the study by Zwislocki, it is likely that his listeners also had normal hearing. The data from the present study, along with those from the Zwislocki study, together argue strongly that this effect occurs when hearing sensitivity is normal. There is yet the need to determine if this effect also occurs for subjects with cochlear hearing loss.
Automatic Audiometry
The present study was a component of a larger project that is
designed to develop automated methodology for obtaining
tinnitus-matching measurements. Thus, development of computer automation
to obtain hearing thresholds was not an end in itself. However, because
of the history of attempts to develop automatic audiometry as an
alternative to traditional manual audiometry, these data contribute to
this area and some relevant comments are warranted.
The defining characteristics of automatic methods for pure tone audiometry are that the listener maintains control over the level of stimulus presentation and that at least some of the procedures are automated (44). The first automatic, self-recording audiometer was described by von Bekesy in 1947 (45). That audiometer produced a sweep-frequency tone, and in 1956, a fixed-frequency version appeared, inviting direct comparison with manual audiometers. A number of studies were conducted subsequently to compare hearing thresholds, in the same individuals, between manual and self-recording audiometers. Most generally, these studies showed that self-recording audiometry resulted in slightly more sensitive thresholds than manual audiometry (46). Using 1-dB step sizes, most studies have shown an improved sensitivity of 1-2 dB with automatic audiometry, while an average difference of about 3 dB was reported by Robinson and Whittle (39).
For the present study, mean thresholds were compared between the conventional audiometer and the automated system at octave frequencies between 500-8000 Hz (Table 1). The use of different headphones (TDH-50P supra-aural versus ER-4B insert) required the caveat that, although dB HL was matched between earphones (8), the pressure produced at the eardrum was not necessarily equal because of the different acoustic characteristics of both the earphones and the couplers.
The advent of microcomputers provided another method for conducting automatic audiometry, and the potential advantages of computerized audiometry were recognized as long ago as 1971 (47). At that time, it was considered a "foregone conclusion" that computer-driven audiometry would supplant manual audiometry. Such a transition has obviously never occurred, but automatic audiometry has found utility for certain applications, especially industrial audiometric testing. The use of automated testing in the audiology clinic would require programming of computer algorithms to perform testing at the level of a skilled audiologist. This may be feasible for unmasked pure-tone air conduction audiometry, but sophisticated masking and bone conduction techniques may never be adaptable to automation. Nevertheless, just as automated testing is used for industrial monitoring purposes, it could also have application for ototoxicity monitoring.
The main problem with ototoxicity monitoring is the difficulty of obtaining audiometric data from patients at repeated intervals. Whether these patients are in the hospital or in their homes, scheduling their repeated audiometric exams has proven to be cumbersome, and impossible in many cases. Consequently, many patients who are included in an ototoxicity-monitoring program do not receive the level of service that is available to prevent significant ototoxic effects. These kinds of difficulties make ototoxicity-monitoring programs difficult to operate effectively, and may be the reason such programs are scarce, despite published standards for early detection of ototoxicity (48).
In the present study, the between-sessions differences in hearing thresholds did not produce a single value that would have met the ASHA (1994) criteria for ototoxicity (48). Thus, this technique has the potential to reduce false positive responses that are associated with ototoxicity monitoring. To investigate this further, a threshold reliability evaluation of the automated technique should be conducted in a population of patients not receiving ototoxic drugs.
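A check of repeated thresholds against shift criteria of this kind might be sketched as follows. The criteria are not restated in this article. The default values used here (a worsening of 20 dB or more at any one frequency, or of 10 dB or more at two adjacent test frequencies) follow a common summary of the ASHA guideline and should be treated as assumptions to be verified against the guideline itself. A third commonly cited criterion, loss of response at consecutive frequencies, is omitted. The function name and example thresholds are hypothetical.

```python
from typing import Sequence


def meets_shift_criteria(
    baseline: Sequence[float],
    follow_up: Sequence[float],
    single_db: float = 20.0,    # assumed single-frequency criterion
    adjacent_db: float = 10.0,  # assumed criterion at two adjacent frequencies
) -> bool:
    """Return True if threshold worsening meets either assumed criterion.

    Thresholds must be ordered by test frequency. A shift is follow-up minus
    baseline, so positive values indicate poorer hearing on retest.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must have equal length")
    shifts = [f - b for b, f in zip(baseline, follow_up)]
    if any(s >= single_db for s in shifts):
        return True
    return any(a >= adjacent_db and b >= adjacent_db for a, b in zip(shifts, shifts[1:]))


# Hypothetical thresholds (dB) at four adjacent test frequencies.
stable = meets_shift_criteria([10, 10, 15, 20], [12, 8, 17, 22])
adjacent_shift = meets_shift_criteria([10, 10, 15, 20], [10, 20, 25, 20])
single_shift = meets_shift_criteria([10, 10, 15, 20], [10, 10, 35, 20])
```

Test-retest differences of a few decibels, as in the first example, do not trigger either assumed criterion.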
Etymotic Research ER-4B Canal Phone™ Earphones
When faced with the decision of selecting earphones for use with the
automated testing system, our primary concern was to use earphones that
were capable of reproducing tones at high levels throughout the
frequency range of 0.5-16 kHz for tinnitus matching. Testing at high
frequencies (>8 kHz) requires high output capability due to the
gradual reduction in human auditory sensitivity in this frequency range.
After evaluating the commercial possibilities, the Etymotic ER-4B Canal
Phone™ earphones appeared to provide the best performance
characteristics for our application. Considering the availability of a
variety of circum-aural and insert earphones that are intended
specifically for audiometric application, selection of an in-the-ear
transducer that was designed for listening to binaural recordings was
unexpected. The present study has demonstrated that the ER-4B has
practical application for use as a single earphone transducer to
evaluate an extended range of auditory sensitivity.
In addition to its utility for full-frequency testing capability, the ER-4B shares in the advantages that are offered by any type of insert earphone. Some of the most obvious advantages include the reduction of ambient noise during testing (49,50) and the significant increase in inter-aural attenuation (51,52). Lilly and Purdy (53) have described other advantages of insert earphones relative to supra-aural/circum-aural earphones.
The present study has further documented that test-retest reliability of threshold sensitivity using the ER-4B is at least as good as that shown for other insert earphones and for traditional audiometric earphones. Studies have compared test-retest reliability of hearing thresholds using the Etymotic ER-3A Tubephone™ versus standard supra-aural earphones, including the TDH-50 (34,54), TDH-39 (55) and TDH-49 (56). These studies all showed that reliability of thresholds for frequencies up to 8 kHz was at least as good for the ER-3A as for the standard audiometric earphones.
Other studies have shown good threshold reliability with insert earphones for frequencies above 8 kHz. Tang and Letowski (57) obtained repeated thresholds at 10-16 kHz using the Sennheiser HD-250 circum-aural earphone and the Etymotic ER-1 Tubephone™. Their results revealed significantly smaller variability with the insert earphones. Valente, Valente, and Goebel (58) compared test-retest reliability of high-frequency thresholds up to 18 kHz using the Koss HV/1A+ versus the Etymotic ER-2 Tubephone™. Intra-subject response variability was comparable between the two earphones.
The present study adds to this literature by showing that the Etymotic ER-4B earphones can provide response reliability that is comparable to all earphones that have been demonstrated to be reliable for testing auditory sensitivity.
Reinsertion of Foam Eartips
An additional concern addressed by this study was whether removal
and replacement of the ER-4B foam eartips might have an effect on
threshold reliability. This is a particularly important question when
testing at higher frequencies where standing waves might be affected by
earphone placement, with the potential to significantly affect sound
pressure level at the eardrum.
Hickling (19) found that when TDH-39 supra-aural earphones were removed and replaced between tests, the reliability of 6 and 8 kHz thresholds was significantly poorer than when earphones were left in place during repeated testing. Earphone replacement did not have an effect at 1 and 2 kHz. The effect at 6 and 8 kHz was attributed to standing wave formation at these frequencies. Erlandsson et al. (40) found greater variability of repeated auditory thresholds when a circum-aural earphone was repeatedly replaced versus when thresholds were retested with the earphone fixed in position for each repetition. The authors suggested that a circum-aural earphone deforms the pinna, which can affect the transmission of sound pressure to the ear canal. Gauz, Robinson, and Peters (59) found no effect on threshold measurements when a circum-aural earphone was replaced.
Stelmachowicz et al. (60) compared reliability of high-frequency (8-20 kHz) thresholds using two systems. One was a prototype high-frequency audiometer, originally described by Stevens et al. (21), which used a 60-cm plastic tube to couple the high-frequency transducer to the ear of the subject. The other system used Koss HV/X supra-aural earphones. Repeated thresholds were obtained without moving the earphones. The earphones were then removed and replaced, and a third set of thresholds was obtained. For both systems, replacement of earphones resulted in a slightly higher standard error of measurement (SEM) than when earphones were left in place, with the supra-aural earphones having the best response reliability with replacement.
Larson et al. (34) conducted test-retest measurements using the ER-3A Tubephone™. A component of that study was to conduct two retests, one with the ER-3A eartips left in place, and the second after removal and replacement of the eartips. They found no significant effect on test reliability when eartips were replaced. Larson's study is the only one we know of that evaluated threshold reliability between the two conditions of eartips fixed versus replaced. The present study confirms the results of Larson for reliability of thresholds using insert-style earphones.
These data validate use of our automated technique for obtaining reliable hearing thresholds. Results of this study may have further generalized application, including confirmation that: 1) the ER-4B Canal Phone™ earphones can be used for clinical audiometry; 2) the ER-4B eartips can be removed and reinserted without appreciably affecting the measurements; and, 3) 1-dB step sizes can be used for obtaining precise tinnitus matching measurements. In addition to using the technique for tinnitus matching, there may be further uses for applications requiring serial monitoring of auditory thresholds, such as hearing conservation and ototoxicity monitoring.
Last revised Fri 8/24/2001; comments, problems, etc., to WM.