Volume 46 Number 8, 2009
Pages 1021 — 1036
Abstract: This study compared three different signal-processing principles (transposing, modulating, and filtering), implemented as eight basic algorithms, to find the principle(s)/algorithm(s) that resulted in the best tactile identification of environmental sounds. The subjects were 19 volunteers (9 female/10 male) who were between 18 and 50 years old and profoundly hearing impaired. We processed sounds produced by 45 representative environmental events with the different algorithms and presented them to subjects as tactile stimuli using a wide-band stationary vibrator. We compared eight algorithms based on the three principles (one unprocessed, as reference). The subjects identified the stimuli by choosing among 10 alternatives drawn from the 45 events. We found that algorithm and subject were significant factors affecting the results (repeated measures analysis of variance, p < 0.001). We also found large differences between individuals regarding which algorithm was best. The test-retest variability was small (mean ± 95% confidence interval = 8 ± 3 percentage units), and no correlation was noted between identification score and individual vibratory thresholds. One transposing algorithm and two modulating algorithms led to significantly better results than did the unprocessed signals (p < 0.05). Thus, the two principles of transposing and modulating were appropriate, whereas filtering was unsuccessful. In future work, the two transposing algorithms and the modulating algorithm will be used in tests with a portable vibrator for people with dual sensory impairment (hearing and vision).
Key words: deaf, deafblind, environmental sound, identification, modulating, monitoring, perception, tactile, transposing, vibration.
For humans, vision and hearing are two highly important senses for acquiring information about the surrounding world. When one of these senses is severely impaired (the person is blind or profoundly hearing impaired), the other sense compensates to a great extent. Consequently, people with dual sensory impairment (hearing and vision) receive very limited information about events in their surroundings. They cannot detect someone approaching them until they feel vibrations from footsteps, smell the person's perfume, perceive the person's body heat or breath at a short distance, or finally, feel the person touching them. This lack of information about environmental events makes it difficult for people with dual sensory impairment to know what is happening around them; it creates difficulties in planning and may also cause fear or anxiety.
Borg et al. interviewed 19 people with dual sensory impairment about their feelings on the shortage of information about events in their surroundings and their experiences and strategies. They found that people with dual sensory impairment regard this lack of environmental information as a significant problem and that they compensate for their impaired vision and hearing using the cutaneous senses (e.g., vibratory sense) to perceive the vibrations (sounds) produced by events. They also receive information through olfaction and are able to sense temperature and drafts (streams of air). The participants in Borg et al.'s study showed an interest in a portable aid that could help them monitor the environment.
Transformation of visual information into tactile or acoustic information has been attempted in several studies aimed at improving the environmental perception of blind people [2-6]. Johnson and Higgins developed a wearable device that converts the visual information picked up by a Webcam into tactile signals, which are then presented by 14 vibrating motors spaced on a flexible belt. The results showed that the device was useful for object detection. Bach-y-Rita et al. used a 7 × 7-point electrotactile array, placed on the tongue, to study the form perception of objects shaped as circles, squares, and vertex-up equilateral triangles of different sizes. Subjects identified 79.8 percent of the objects correctly. In further studies, Bach-y-Rita et al. developed a tactile vision system in which the optical images picked up by a head-mounted television camera were transduced into vibratory or direct electrical stimulation [2-4]. The stimuli were displayed on a vibrotactile array, which could be sensed by the skin, a finger, or the tongue. After the blind subjects were adequately trained, they could identify objects in space. In the present study, we focused on translating sounds produced by events into vibrations that could be presented to the fingers and palm of the hand.
The possibility of receiving speech information through the vibratory sense ("tactile transfer of speech") has been studied extensively in people with profound hearing impairment. The speechreading of adults with postlingual profound hearing impairment (persons who became hearing impaired after having established their oral language) improved when they combined speechreading with a tactile speech signal presented through a vibratory aid. The Sentiphone, MiniVib4, TAM, and Tactaid II and VII are examples of aids that use tactile transfer of speech information for profoundly hearing impaired persons who receive insignificant or no benefit from conventional hearing aids [8-10]. Traunmüller showed that the Sentiphone decreased word errors from 24 percent for speechreading alone to 3.3 percent for speechreading combined with the vibratory aid. In the MiniVib4 (Stockholm, Sweden; http://www.specialinstrument.se/), a 220 Hz sine wave is amplitude-modulated by the envelope of the input signal (extracted by low-pass filtering at 30 Hz) in the frequency range 500 to 2,300 Hz. Users of this tactile aid experienced improved speechreading, exhibited better control of their own voice, and became aware of sounds in their surroundings. Spens and Plant showed that subjects who were aided with a single-channel tactile aid rated their disability as less severe than when unaided because they could sense environmental sounds better.
Plant and Spens studied the speech perception skills of a Swedish man with postlingual profound hearing impairment in two languages (Swedish and English) with and without vibrotactile support. The tested subject used the "tactiling" method, in which he detected the vibrations that accompany speech by placing his thumb directly on the speaker's throat as a supplement to speechreading. The aid consisted of a throat microphone, amplifier, and a handheld bone vibrator. The results showed that the tracking rate in Swedish increased from around 40 words per minute with speechreading alone to 60 to 65 words per minute with speechreading plus the tactile aid. The subject also showed improved tracking rates for English materials, that is, for a nonnative language. Tactile cues led to improvements of around 40 percent over speechreading alone.
Today, vibrotactile aids are also used to inform the person with profound hearing impairment or dual sensory impairment about a limited number of selected events. For example, the vibratory aid Lynx Tactum (GN ReSound ALD Division; Stockholm, Sweden) is a watch with a vibrator that directs the user's attention to up to seven different events, e.g., baby crying, telephone signal, door signal, fire alarm, awakening alarm (http://www.gnresound-ald.com/Lynx.pdf).
Some people with dual sensory impairment use vibrotactile aids for speech perception, and promising results have been obtained. However, these preliminary studies have not been published.
Currently, cochlear implantation is also an expanding technique for those with dual sensory impairment. However, it requires surgery and functioning auditory nerves and is considerably more expensive than vibrotactile aids. Few adults who were born with profound hearing impairment or dual sensory impairment and use sign language are interested in obtaining a cochlear implant [7,17] because they cannot develop oral language (they are too old) and because, as a rule, their sign language communication is good. Another reason is that they risk encountering a cultural collision: the hearing culture versus the deaf culture [17-19].
Use of the tactile sense for perceiving sound, e.g., for speech communication, is hindered by the narrow bandwidth of the tactile sense and the physical properties of the vibrators. Four different types of mechanoreceptors are present in the skin, and they sense different aspects of mechanical energy applied to the skin. Fast adapting (FA) mechanoreceptors, which are divided into two different types (FA I and FA II), are sensitive to changes in the skin such as tangential forces (friction) and orthogonal vibrations. FA I receptors sense vibrations below 50 Hz, and FA II receptors are sensitive to vibrations above 50 Hz. Slowly adapting (SA) mechanoreceptors are divided into two types (SA I and SA II) and sense pressure. SA I receptors register pressure near the skin's surface and in a small field, whereas SA II receptors register pressure deeper in the skin and in a large field.
The vibration threshold of the skin primarily depends on receptor density. It is affected by several factors, for example, body location, stimulus frequency and duration, temperature, sex, age, and hairiness. For example, the distal pad of the middle finger has a lower tactile detection threshold than do the thenar eminence and the volar forearm, and women and adults have higher vibrotactile detection thresholds than do men and children, respectively.
The vibration threshold also varies depending on the properties of the vibrator, for example, size of the contact area, size of the surrounding area, size of the gap between the contact area and the surrounding area, pressure (probe force), skin indentation, and measurement method [8,12,21-24]. For example, the vibrotactile detection threshold at 200 Hz, with a contact size of 0.28 cm² at the thenar eminence, is -8 dB relative to 1 µm, while the corresponding value is -18 dB at the distal pad of the middle finger and +18 dB on the forearm.
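To make these decibel values concrete, the following minimal sketch converts a threshold level to a displacement amplitude. It assumes that the reference is a displacement of 1 µm and that levels follow the amplitude convention (20·log10); both are assumptions made here for illustration, and the function name is hypothetical.

```python
def db_to_displacement_um(level_db, reference_um=1.0):
    """Convert a level in dB (amplitude convention, 20*log10) to micrometers.

    Assumption: the reference displacement is 1 um; adjust reference_um if
    the thresholds are expressed against a different reference.
    """
    return reference_um * 10 ** (level_db / 20.0)


# Threshold levels quoted in the text for 200 Hz and a 0.28 cm^2 contactor.
thresholds_db = {
    "thenar eminence": -8,
    "distal pad of middle finger": -18,
    "forearm": 18,
}

for site, level in thresholds_db.items():
    print(f"{site}: {db_to_displacement_um(level):.3f} um")
```

Under these assumptions, the 36 dB span between the finger pad and the forearm corresponds to a displacement ratio of roughly 63 to 1.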
The detection threshold is frequency dependent and is lowest for 250 Hz at the thenar eminence. Sensitivity increases with the duration of the signal. The dynamic range in which the skin can detect vibrations, the range between the lowest sensitivity level (detection level) and the highest sensitivity level (unpleasant or painful level), is 55 dB (compared with about 113 dB for hearing). The skin is more sensitive to vibrations at temperatures around 30 °C than around 15 °C. Contact force (pressure) affects high frequencies such that larger contact force leads to lower thresholds.
Further, the skin has limitations in frequency discrimination, i.e., separation of nonsimultaneous signals (frequency difference [Δf] 30%), and in intensity discrimination (intensity difference [ΔI] 0.7 dB), which again depend on body location. These values can be compared with Δf 0.3 percent and ΔI 0.3 dB (depending on the frequency and intensity level of the stimulus), respectively, for the auditory system.
The skin also has poorer frequency resolution and ability to separate simultaneous signals than the auditory system. The frequency resolution of the auditory system is 10 percent. For the skin to resolve two simultaneous signals with different frequencies, the frequencies must be in the sensitivity range of the different mechanoreceptors; i.e., one must be below about 50 Hz and the other must be above about 50 Hz [27-28]. Differentiating frequency resolution from frequency discrimination is important, as sound coding depends on both.
The skin and auditory system also have different temporal resolution (gap detection) thresholds depending on, for example, the stimulus (sinusoids or noise) and age of the subject. The gap detection of the auditory system is approximately 3 ms, while the gap detection of the skin is approximately 10 ms (100 Hz) for sinusoids. The gap detection threshold of the skin is lower for bursts of sinusoids than for bursts of band-limited noise. The maximal vibratory sensitivity for amplitude modulations (using sinusoids as carriers) occurs at modulation frequencies of 20 to 40 Hz [8,29-30].
For cochlear implants, multiple channels are generally accepted as having a clear advantage in signaling the processed sounds. Furthermore, no major disadvantages exist, except regarding financial concerns and availability. On the other hand, tactile aids with several vibrators are cumbersome, result in increased demands on the physical design (bigger and with additional cables), and require a larger electrical power supply. Regarding these concerns with multiple vibrators, our focus is on developing a single-channel device that is easy to handle, has low battery costs, and is emotionally and socially acceptable to the user.
In conclusion, presenting sounds as vibrations has its limitations and the sounds must be processed to fit the properties of the receptors at the location on the body where the vibrations will be presented.
The general aim of our ongoing project is to develop a vibrotactile aid that will help people with dual sensory impairment monitor sounds produced by environmental events. The aided person is expected to interpret the vibrations and obtain essential information about her or his surroundings that can be given meaningful interpretations with the help of additional contextual information.
In a previous study, six algorithms developed on the basis of two principles (transposition and modulation) were used to process environmental sounds and tested using the hearing sense. In the present study, the six basic algorithms (three transposing and three modulating) from the previous study (with some modifications) and seven additional algorithms were used to process the same environmental sounds and tested using the vibratory sense. The seven additional algorithms consisted of five of the basic algorithms adapted to the vibratory thresholds of the skin, one filtering (based on filtration principle), and one with no processing (NP), i.e., original sounds used as a reference.
The specific purpose of the present study was to develop/modify and test signal-processing algorithms based on three different principles (transposing, modulating, and filtering) using a stationary wide-band vibrator and to choose the principle(s)/algorithm(s) most suitable for identification of environmental events by the cutaneous senses.
Nineteen volunteers with profound hearing impairment (nine female/ten male) between 18 and 50 years of age participated. No subject had a hearing threshold better than 65, 70, 70, or 70 dB hearing level at the frequencies 250, 500, 750, and 1,000 Hz, respectively, and at higher frequencies, none had any hearing within the limits of the audiometer. Thirteen of the subjects were born with profound hearing impairment, six had become profoundly hearing impaired as children or teenagers, and all had sign language as their first language. Because the subjects were profoundly hearing impaired, they did not hear the sounds produced by the vibrator. Therefore, we did not need to mask their hearing, and possible problems associated with incomplete masking were avoided. The subjects had different experiences of using vibrations for observing environmental events, e.g., intentionally observing vibrations in a table or in the floor. The subjects were all members of the network "Dovas" (www.dovas.se); lived in Örebro, Sweden; and had introduced themselves as deaf on their home pages.
The test sounds used in the present experiment were the same 45 environmental sounds (Table 1) used in previous experiments by Ranjbar et al. The sounds were selected by people with normal hearing and people with dual sensory impairment, who classified the events causing the sounds as the most important to be informed about (described in more detail in Borg et al. and Ranjbar et al.).
Table 1. Sound number and label of environmental event (sound) used in experiments.
Sound No. | Environmental Sound | Sound No. | Environmental Sound | Sound No. | Environmental Sound
1 | Doorbell | 16 | Two Men Talking | 31 | Noise from Breeze
2 | Stream Murmur | 17 | Telephone Signalling Several Times | 32 | Spectator Excitement
3 | Dripping Water | 18 | Door Opening and Closing | 33 | House Alarm
4 | Heavy Traffic | 19 | Frying Bacon | 34 | Copier
5 | Car Signalling Several Times | 20 | Water Running | 35 | Restaurant Buzz
6 | Barking Dog | 21 | Coffee Maker | 36 | Keyboard
7 | Wave | 22 | Washing Machine | 37 | Cutting Wood
8 | People Laughing | 23 | Vacuum Cleaner | 38 | Cat Meowing
9 | Bird Song | 24 | Toilet Flushing Twice | 39 | Signal at Crossing
10 | Thunder Followed by Rain | 25 | Rain on Window | 40 | Hammer Blow
11 | Train that Slows Down and Drives Past | 26 | Boiling Water | 41 | Opening Champagne Twice
12 | Person Sneezing | 27 | Tractor Comes, Stops, and Idles | 42 | Riding Horse
13 | Motorcycle Passing | 28 | Cry from Loudspeaker | 43 | Hiccup
14 | Bicycle Bell | 29 | Person Walking on Gravel | 44 | Cow Mooing
15 | Signal from Ice Cream Truck | 30 | Cutlery Clatter | 45 | Helicopter
The acoustic analysis showed that, for most of the sounds, the dominating spectral components were below 2,000 Hz, though some had important components up to 8,000 Hz. The temporal characteristics of the sounds were analyzed with the Soundswell Signal Workstation (Saven Hitech AB; Stockholm, Sweden). Spectral analysis of the envelope showed that most of the temporal information was in the range below about 10 Hz for the vast majority of sounds (90%).
The algorithms were implemented in MATLAB, version 7.0.4 (The MathWorks, Inc; Natick, Massachusetts). The sounds were played by a computer (Pentium® 4, 1.70 GHz, 256 MB RAM) and presented using a wide-band vibrator (Brüel & Kjær shaker type 4810 [Nærum, Denmark], weighing 1.1 kg). The vibrator was placed on a solid stand on the floor to keep it stable (Figure 1). An accelerometer (charge accelerometer, Brüel & Kjær type 4371) was placed on the membrane of the vibrator to measure the acceleration of the vibrations (which was recalculated automatically to amplitude in micrometers by integrating twice). Finally, the accelerometer was connected to a 10æ0 mm-long rod of only 1 mm diameter (to keep the weight low). The rod was surrounded by a metal tube (15 mm diameter) for protection. At the end of the rod, a cap with an area of 0.32 cm2 was mounted, and this cap made contact with the subject's skin. The amplitude values measured by the accelerometer were displayed by vibration exciter control (Brüel & Kjær type 1050).
Eight basic algorithms were developed to process the environmental sounds based on three principles: (1) transposing (transposing frequency components with highest amplitude [range 100-8,000 Hz to range 30-800 Hz] [TRHA]; transferring the sum of complex frequency components within every 1/3 octave [range 100-8,000 Hz to range 200-800 Hz] [TR1/3]; and transposing frequency range 1,200-2,400 Hz to 100-700 Hz [TR]), (2) modulating (amplitude modulation [250 Hz carrier wave] [AM]; amplitude and frequency modulation [250 Hz carrier wave] [AMFM]; and amplitude modulation with multiple channels [AMMC]), and (3) filtering (filtering using threshold of vibratory sense [equalizing] [EQ]) (Table 2). In addition, the NP algorithm (in which the sounds were presented in their original form) was used as a reference.
Table 2. Eight algorithms used to signal-process 45 environmental sounds. The TRHA, TR1/3, TR, AMFM, and AMMC algorithms existed in two versions; in the second version, the sounds were also adapted (basic + EQ) to the vibratory thresholds of the skin. The AM algorithm was tested twice.
Algorithm | Description
TRHA | Transposing frequency components with highest amplitude in range 100-8,000 Hz to range 30-800 Hz (two versions: basic + adapted).
TR1/3 | Transferring sum of complex frequency components within every 1/3 octave in range 100-8,000 Hz to range 200-800 Hz (two versions: basic + adapted).
TR | Transposing frequency range 1,200-2,400 Hz to 100-700 Hz (two versions: basic + adapted).
AM | Amplitude modulation of 250 Hz carrier wave (tested twice: test + retest).
AMFM | Amplitude and frequency modulation of 250 Hz carrier wave (two versions: basic + adapted).
AMMC | Amplitude modulation with multiple channels (two versions: basic + adapted).
EQ | Filtering using threshold of vibratory sense (equalizing).
NP | No processing (i.e., original sounds).
The TRHA, TR1/3, TR, AMFM, and AMMC algorithms were also tested in combination with the EQ algorithm, which was used for equalizing (adapting) with respect to the vibratory thresholds of the skin. The AM algorithm was not tested after adaptation to the skin because the output was dominated by one frequency (250 Hz), which would be practically the same after adaptation. The AM algorithm was tested twice: test and retest. In total, 14 test sequences (TRHA, TR1/3, TR, AMFM, AMMC, EQ, NP, AM test, AM retest, TRHA + EQ, TR1/3 + EQ, TR + EQ, AMFM + EQ, and AMMC + EQ) were evaluated. For a detailed description of the algorithms, see Ranjbar et al.
The TRHA, TR, AM, AMFM, and AMMC algorithms were the same as those used in the previous study in which the processed sounds were identified auditorily. The TR1/3 algorithm was modified to make use of a wider range of the vibratory sensitivity of the skin than the frequency range used in the previous study.
The original sounds were sampled at a sampling frequency (FS) of 16,000 Hz, with an antialiasing filter just below 8,000 Hz. After signal processing, we down-sampled the sounds to 2,000 Hz using a decimate function that filtered the data with an eighth-order, Chebyshev Type I, low-pass filter, with a cutoff frequency of 800 Hz.
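The relationship between the sampling rates and the cutoff frequency stated above can be checked with a short calculation. The sketch below assumes that the low-pass cutoff is placed at 0.8 of the new Nyquist frequency, which is consistent with the quoted figures; the function name and the `passband_fraction` parameter are illustrative and not part of the original implementation.

```python
def decimation_parameters(fs_in_hz, fs_out_hz, passband_fraction=0.8):
    """Return (decimation factor, new Nyquist frequency, low-pass cutoff).

    Assumption: the anti-aliasing filter cuts off at passband_fraction
    times the Nyquist frequency of the down-sampled signal.
    """
    if fs_in_hz % fs_out_hz != 0:
        raise ValueError("this sketch only handles integer decimation factors")
    factor = fs_in_hz // fs_out_hz
    nyquist_out_hz = fs_out_hz / 2
    cutoff_hz = passband_fraction * nyquist_out_hz
    return factor, nyquist_out_hz, cutoff_hz


factor, nyquist_hz, cutoff_hz = decimation_parameters(16000, 2000)
print(factor, nyquist_hz, cutoff_hz)  # 8 1000.0 800.0
```

A decimation factor of 8 therefore leaves a usable band up to about 800 Hz, which matches the upper limit of the frequency ranges targeted by the transposing algorithms. In Python, `scipy.signal.decimate` provides a comparable Chebyshev Type I decimator, although the study itself used MATLAB.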
In the TRHA algorithm, we transposed the important acoustic information in the frequency range 100 to 8,000 Hz to the "sensitive" frequency range 30 to 800 Hz by transposing the 24 frequency components with the highest energy using a Fourier transform-based method. The algorithm was the same as that used in the previous study. Good temporal information was maintained because no low-pass filtering was used.
In the TR1/3 algorithm, we transposed the frequency components within the range 150 to 300 Hz (containing the fundamental frequency of speech, f0) to the frequency range 50 to 200 Hz (Figure 2). Further, we fed the input signal to a filter bank consisting of 13 third-order Butterworth band-pass filters (18 dB/octave). The pass-bands (3 dB cutoff) were 300 to 400; 400 to 500; 500 to 600; 600 to 800; 800 to 1,000; 1,000 to 1,200; 1,200 to 1,600; 1,600 to 2,000; 2,000 to 2,400; 2,400 to 3,200; 3,200 to 4,000; 4,000 to 5,300; and 5,300 to 6,600 Hz. The outputs from the filters were rectified and low-pass filtered (third-order Butterworth, cutoff frequency 10 Hz, 18 dB/octave); thus, an envelope signal for each pass-band was obtained. We then used these 13 signals to amplitude-modulate 13 carrier waves with frequencies 307, 353, 379, 419, 431, 461, 509, 557, 577, 593, 631, 673, and 701 Hz. Further, we frequency-modulated the carriers (deviation typically ±50% of the carrier frequency) by independent uniformly distributed noise to avoid interference effects. We obtained the total output by adding the transposed signal (50-200 Hz) and the 13 modulated signals just described. This algorithm can be regarded as both modulating and transposing, though transposition dominates.
In the TR algorithm, we transferred the frequencies within the range 1,200 to 2,400 Hz, which contain the most important acoustic information for identification of environmental sounds, to a low-frequency range with the best sensitivity for the skin. Using a Fourier transform method, we transposed the frequencies within the range 1,200 to 2,400 Hz to the frequency range 100 to 700 Hz after first adding the complex frequency components in pairs, thereby decreasing the number of frequencies by one-half. We removed the spectral components outside the frequency range 1,200 to 2,400 Hz. No temporal information was removed through low-pass filtering of the envelope.
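The pairwise summation explains why a 1,200 Hz wide source band fits into a 600 Hz wide target band. The following minimal sketch illustrates the idea on a one-sided list of complex spectrum bins. It assumes that adjacent bins are the ones added together and that the summed bins are written consecutively starting at 100 Hz; the study does not spell out these details, and the function name and bin layout are hypothetical.

```python
def transpose_band(spectrum, bin_hz, src_lo_hz=1200, src_hi_hz=2400, dst_lo_hz=100):
    """Sum adjacent source bins in pairs and move them to a lower band.

    spectrum: list of complex values; bin k represents frequency k * bin_hz.
    Everything outside the source band is discarded (output bins stay zero).
    Assumption: "in pairs" means adjacent bins (k, k + 1).
    """
    lo = round(src_lo_hz / bin_hz)
    hi = round(src_hi_hz / bin_hz)      # exclusive upper bin index
    dst = round(dst_lo_hz / bin_hz)
    out = [0j] * len(spectrum)
    for i, k in enumerate(range(lo, hi - 1, 2)):
        out[dst + i] = spectrum[k] + spectrum[k + 1]
    return out


# Toy spectrum with 100 Hz bin spacing covering 0 to 8,000 Hz.
toy = [complex(k, 0) for k in range(81)]
moved = transpose_band(toy, bin_hz=100)
occupied = [k * 100 for k, value in enumerate(moved) if value != 0]
print(occupied)  # [100, 200, 300, 400, 500, 600]
```

With 100 Hz bins, the 12 source bins from 1,200 to 2,300 Hz collapse into 6 bins from 100 to 600 Hz, that is, a band spanning 100 to 700 Hz. An inverse Fourier transform of the resulting spectrum (not shown) would give the time-domain vibration signal.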
In the AM algorithm, we transferred the temporal pattern of the environmental sound, which contains important information for auditory identification, to a low-frequency range by amplitude-modulating a sine signal (250 Hz) with the envelope of the input signal. We extracted the envelope of the input signal by first rectifying the waveform and then filtering with a three-pole, low-pass Butterworth filter (18 dB/octave) at a cutoff frequency of 10 Hz. We chose the frequency 250 Hz because it is in the range of the lowest vibration threshold.
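The processing chain of rectification, 10 Hz low-pass filtering, and modulation of a 250 Hz carrier can be sketched as follows. This is an illustrative reimplementation, not the original MATLAB code: it assumes full-wave rectification, and the three-pole Butterworth filter is realized here as a first-order section followed by a second-order section obtained with the bilinear transform. All function names are hypothetical.

```python
import math


def butter3_lowpass(x, cutoff_hz, fs_hz):
    """Three-pole Butterworth low-pass via the bilinear transform (prewarped).

    Implemented as a first-order section, H(s) = 1/(s + 1), followed by a
    second-order section, H(s) = 1/(s^2 + s + 1), both with unity DC gain.
    """
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    g1 = k / (1 + k)                    # first-order feed-forward gain
    p1 = (k - 1) / (1 + k)              # first-order feedback coefficient
    norm = 1 / (1 + k + k * k)
    b0 = k * k * norm                   # biquad numerator: b0 * (1, 2, 1)
    a1 = 2 * (k * k - 1) * norm
    a2 = (1 - k + k * k) * norm
    out = []
    x1 = u1 = u2 = y1 = y2 = 0.0
    for xn in x:
        u = g1 * (xn + x1) - p1 * u1                      # first-order output
        y = b0 * (u + 2 * u1 + u2) - a1 * y1 - a2 * y2    # biquad output
        x1, u2, u1, y2, y1 = xn, u1, u, y1, y
        out.append(y)
    return out


def am_vibration(signal, fs_hz, carrier_hz=250.0, env_cutoff_hz=10.0):
    """Return (modulated carrier, envelope) for an input sound."""
    rectified = [abs(s) for s in signal]   # assumption: full-wave rectification
    envelope = butter3_lowpass(rectified, env_cutoff_hz, fs_hz)
    carrier = (math.sin(2 * math.pi * carrier_hz * n / fs_hz)
               for n in range(len(envelope)))
    return [e * c for e, c in zip(envelope, carrier)], envelope


# Example: a 0.5 s, 1,000 Hz tone of unit amplitude sampled at 16,000 Hz.
fs = 16000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 2)]
vibration, envelope = am_vibration(tone, fs)
print(round(envelope[-1], 2))  # settles near 0.63 for a unit-amplitude tone
```

Whatever the spectrum of the input, the output contains only the 250 Hz carrier and narrow sidebands within about 10 Hz of it, which is why this algorithm has the smallest bandwidth of those compared.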
In the AMFM algorithm, we both amplitude- and frequency-modulated the environmental sounds with the purpose of transferring the temporal and spectral information of the sounds to the low-frequency range. The idea was based on the study by Gygi et al., which showed that the frequency composition and amplitude variations of the sound both carry important information. First, we extracted the envelope of the input signal by rectifying and then low-pass filtering at 10 Hz, as in the AM algorithm. Thereafter, we frequency-modulated a 250 Hz carrier signal by the derivative of the envelope (the derivative enhances the time variations, especially transients). Finally, we amplitude-modulated the resulting frequency-modulated carrier signal by the envelope.
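A minimal sketch of the combined modulation step is given below. It starts from an already extracted envelope and assumes a simple first-difference derivative and a linear mapping from envelope slope to frequency deviation. The scaling constant `fm_gain` is a hypothetical parameter, since the deviation used in the study is not stated here.

```python
import math


def amfm_from_envelope(envelope, fs_hz, carrier_hz=250.0, fm_gain=5.0):
    """Frequency-modulate a carrier by the envelope slope, then apply AM.

    fm_gain (Hz per unit of envelope slope, slope in amplitude per second)
    is an illustrative assumption, not a value taken from the study.
    """
    out = []
    phase = 0.0
    previous = envelope[0] if envelope else 0.0
    for e in envelope:
        slope = (e - previous) * fs_hz              # first-difference derivative
        instantaneous_hz = carrier_hz + fm_gain * slope
        phase += 2 * math.pi * instantaneous_hz / fs_hz
        out.append(e * math.sin(phase))             # amplitude modulation
        previous = e
    return out


# Example: envelope rising linearly for 0.1 s and then held constant.
fs = 2000
ramp_and_hold = [i / 200 for i in range(200)] + [1.0] * 200
vibration = amfm_from_envelope(ramp_and_hold, fs)
print(len(vibration))  # 400
```

During the ramp, the envelope slope is 10 per second, so with the assumed gain the carrier is shifted from 250 Hz to about 300 Hz; once the envelope is constant, the carrier returns to 250 Hz. Onsets and other transients are thus marked by a brief change in vibration frequency in addition to the change in amplitude.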
In the AMMC algorithm, we first filtered the input signal using six third-order Butterworth band-pass filters with different pass-bands: 120 to 240; 240 to 480; 480 to 960; 960 to 1,920; 1,920 to 3,840; and 3,840 to 6,000 Hz. Thereafter, we extracted the envelope of the output signal from each filter by rectifying and low-pass filtering (at 10 Hz) the output signal. We used the envelope from each filter to amplitude-modulate one of six sine signals with frequencies 55, 105, 215, 335, 445, and 650 Hz. We added the six modulated signals to produce the final output signal.
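The final recombination step of the multichannel scheme can be sketched as follows. The band-pass filtering and envelope extraction are omitted; the sketch takes six per-band envelopes as input and assumes that the bands are assigned to the carriers in ascending order, which is an assumption made for illustration. The function name is hypothetical.

```python
import math

# Carrier frequencies quoted in the text, one per analysis band.
CARRIERS_HZ = [55, 105, 215, 335, 445, 650]


def ammc_mix(band_envelopes, fs_hz, carriers_hz=CARRIERS_HZ):
    """Sum sine carriers, each amplitude-modulated by one band envelope.

    band_envelopes: list of equally long envelope sequences, lowest band first
    (assumed mapping: lowest band to lowest carrier, and so on).
    """
    if len(band_envelopes) != len(carriers_hz):
        raise ValueError("one envelope per carrier is required")
    n_samples = len(band_envelopes[0])
    out = [0.0] * n_samples
    for envelope, carrier_hz in zip(band_envelopes, carriers_hz):
        for n in range(n_samples):
            out[n] += envelope[n] * math.sin(2 * math.pi * carrier_hz * n / fs_hz)
    return out


# Example: energy only in the third band (480 to 960 Hz) drives the 215 Hz carrier.
fs = 2000
silent = [0.0] * 100
active = [1.0] * 100
vibration = ammc_mix([silent, silent, active, silent, silent, silent], fs)
print(len(vibration))  # 100
```

Because each band controls its own carrier, a coarse version of the spectral shape of the sound is preserved in the vibration, unlike in the single-carrier AM algorithm.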
In the EQ algorithm, we adapted the input signal to the vibratory threshold of the skin using the approximate values described in the study by Verrillo and our results in Figure 3. Our purpose was to equalize the sound spectrum and to widen the available frequency range of the cutaneous presentation. We amplified the sound spectrum below 80 Hz and above 500 Hz, but in the region 100 to 500 Hz, we attenuated it. We removed the high-frequency components above 1,000 Hz. We tested the eight basic algorithms individually; five of them (TRHA, TR1/3, TR, AMFM, and AMMC) were also tested in conjunction with EQ (adapted, i.e., basic + EQ).
For the NP algorithm, the test sounds were not processed (i.e., the original sounds were used). The purpose of the NP algorithm was to provide a reference to determine whether processing the environmental sounds was advantageous. The presented sounds had a sampling frequency of 16,000 Hz, and the only possible filtering was that caused by skin and receptor properties [8,22]. The amplitude of the signal-processed signal for all algorithms was adjusted to the same level as the original (natural) sound.
The TRHA, TR1/3, AM, AMFM, AMMC, and NP algorithms covered the whole spectrum of the original sound, while the TR and EQ algorithms covered the spectrum from 1,200 to 2,400 Hz and from 0 to 1,000 Hz, respectively.
The output signal for the EQ and NP algorithms had the largest bandwidth, after which (in order of decreasing frequency bandwidth) came the TRHA, TR1/3, TR, AMMC, and AMFM algorithms, and the AM algorithm contained only one frequency with small side bands.
The subjects were seated in a relaxed manner in a quiet room. They kept the thenar eminence of their dominant hand in a fixed position on a table on which the vibrator surface was placed and sensed the presented vibrations. The contact surface was at the same level as the table surface on which the hand rested in order to minimize pressure on the vibrator. The subjects could not see the computer screen or the test leader.
We measured the subjects' vibratory thresholds at the frequencies 25, 40, 80, 150, 250, 350, 450, 700, and 1,000 Hz using the ascending and descending method. For each frequency, we presented pulses of 1,300 ms total duration (150 ms rise and fall time, 1,000 ms steady time). The subjects were instructed to signal perceived vibration by pressing a button when they were certain they felt the signal. The experiment was repeated twice, in a practice and a test phase. The average vibration threshold values of the 19 test subjects are shown in Figure 3. The values in both practice and test were in fair agreement with those found in the study by Verrillo. For example, detection of a vibrotactile stimulus at 80 Hz required an approximately 20 dB stronger stimulus than detection of a vibrotactile stimulus at 250 Hz.
When testing the algorithms, we seated the subjects in the same room and under the same conditions as when we measured their vibratory thresholds. The subjects adjusted the signal amplitude to a comfortable level, once for each algorithm, when the test started. When the EQ and NP algorithms and the adapted versions (basic + EQ) of the basic algorithms TRHA, TR1/3, TR, AMFM, and AMMC were tested, the subjects changed the amplitude of the vibrations several times (<5 times) during the tests (e.g., when testing "bird song," "bicycle bell," or "house alarm"). The level of the processed signal was to some extent an identification cue (e.g., the sound "spectator excitement" was stronger than the sound "motorcycle passing").
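A threshold test pulse with the timing described above can be generated as in the following sketch. The shape of the rise and fall is not specified in the text, so linear ramps are assumed here, and the sampling rate of 16,000 Hz is likewise an assumption. The function name is hypothetical.

```python
import math


def threshold_pulse(freq_hz, fs_hz=16000, rise_ms=150, steady_ms=1000, fall_ms=150):
    """Return (pulse, gate) for a gated sinusoid with linear on/off ramps.

    Assumptions: linear ramps and a 16 kHz sampling rate; neither is
    stated in the description of the threshold measurement.
    """
    n_rise = round(fs_hz * rise_ms / 1000)
    n_steady = round(fs_hz * steady_ms / 1000)
    n_fall = round(fs_hz * fall_ms / 1000)
    gate = ([i / n_rise for i in range(n_rise)]          # 0 up to just below 1
            + [1.0] * n_steady                           # steady portion
            + [1 - i / n_fall for i in range(n_fall)])   # 1 down toward 0
    pulse = [g * math.sin(2 * math.pi * freq_hz * n / fs_hz)
             for n, g in enumerate(gate)]
    return pulse, gate


pulse, gate = threshold_pulse(250)
print(len(pulse) / 16000)  # 1.3 (seconds)
```

Gradual onsets and offsets of this kind avoid the broadband transient that an abruptly switched sinusoid would produce, so that detection reflects sensitivity at the test frequency rather than at the onset click.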
For each presented sound in each algorithm, the subjects had 10 response alternatives, of which 1 was correct and the other 9 were randomly selected from the 45 sounds. The order of the 14 presented sequences (8 basic algorithms, 5 adapted, and 1 retest) was random for each subject. The order of the presented sounds was random for each test sequence but constant across subjects; i.e., a certain algorithm was always tested with the same set of 45 × 10 sounds (45 presented × 10 response alternatives).
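The construction of a response set can be sketched as follows: one correct alternative plus nine distractors drawn at random, without replacement, from the remaining 44 sounds. Whether the alternatives were shuffled on screen is not stated, so the shuffling step is an assumption, as is the function name.

```python
import random


def response_alternatives(correct, n_sounds=45, n_alternatives=10, rng=random):
    """Return the correct sound number plus randomly drawn distractors.

    Sounds are numbered 1..n_sounds as in Table 1. Shuffling the final list
    is an assumption about how the alternatives were displayed.
    """
    others = [s for s in range(1, n_sounds + 1) if s != correct]
    options = rng.sample(others, n_alternatives - 1) + [correct]
    rng.shuffle(options)
    return options


# Example: alternatives for sound 7 ("Wave"), with a fixed seed for repeatability.
options = response_alternatives(7, rng=random.Random(0))
print(len(options), 7 in options)  # 10 True
```

With 10 alternatives per stimulus, pure guessing gives an expected identification score of 1/10, that is, 10 percent.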
The subjects sensed the vibrations presented and indicated the sound (one of the 10 response alternatives) the vibration represented. The sounds were presented up to five times if the subject required repetitions, and subjects were allowed to take as much time as they needed to identify the environmental sounds. The same procedure was applied to all 14 sequences. The experiment took up to 9 hours and could be performed over 2 or 3 days. The subjects could choose to take a break between each test sequence. The AM algorithm was tested twice (test and retest) in random order among the other algorithms, though the subjects were not aware that AM was tested twice.
Each subject identified 630 signals in total ([13 algorithms + 1 retest] × 45 sounds) without any feedback. The project was approved by the Regional Ethics Committee in Uppsala, Sweden, Reg. No. 2006:AÄ16.
A correct response resulted in 1 point, and an incorrect response resulted in 0 points. Thus, the maximum number of points was 45 (100% identification score) for the total of 45 events.
We calculated median and mean values to evaluate systematic errors and result trends.
We used intraclass correlation coefficients (ICCs) and the Spearman test for test-retest analysis [34-35]. The Spearman test also defined the correlation between identification scores and vibratory thresholds.
We used the Friedman test and the Wilcoxon signed rank test with asymptotic two-tailed significance and Bonferroni correction for description and comparison of the algorithms.
We used repeated measures analysis of variance (RM-ANOVA) to evaluate the effect of the factors subject and algorithm. To test for violation of sphericity, we used Mauchly's test of sphericity. We used the Huynh-Feldt correction if sphericity could not be assumed.
The results of vibratory identification of the 45 environmental sounds processed by the different algorithms and identified by 19 subjects determined the percentage scores for each participant and each algorithm. The identification results for sounds processed using the adapted versions of the TRHA, TR1/3, TR, AMFM, and AMMC algorithms showed that the corresponding basic algorithms had better scores than their adapted versions, and the difference was significant (p < 0.007, Wilcoxon signed rank test) for the TR1/3 and AMFM algorithms. Therefore, in the present article, only the results for the basic algorithms will be presented in detail.
The median value of identification scores for the AM algorithm, both in test and retest, was 42 percent, while the mean value (± standard error of the mean [SEM]) for AM was 41.5 ± 2.9 percent in test and 42.2 ± 3.5 percent in retest. The results showed no significant difference (p = 0.70, Wilcoxon signed rank test) between the mean identification scores. The mean (± 95% confidence interval) of the absolute value of the difference between identification scores at test and retest was 8 ± 3 percentage units. The correlation between test-retest values for the AM algorithm was r = 0.71 (p < 0.01, Spearman). The ICC between test-retest was 0.71 (p < 0.01, one-way random).
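The test-retest summary (mean absolute difference with a 95% confidence interval) can be illustrated as follows. This is a sketch under stated assumptions: the article does not say how the interval was computed, so a t-based interval is assumed here, the function name `mean_abs_difference_ci` is hypothetical, and the scores are invented rather than taken from the study.

```python
import math
import statistics


def mean_abs_difference_ci(test, retest, t_crit):
    """Return (mean, half_width) of the absolute test-retest differences.

    half_width is t_crit times the standard error of the mean; the caller
    supplies the critical t value for the desired confidence level.
    """
    diffs = [abs(a - b) for a, b in zip(test, retest)]
    mean = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, t_crit * sem


# Invented identification scores (percent) for three subjects.
test_scores = [40, 50, 30]
retest_scores = [44, 46, 38]

# For 3 subjects (2 degrees of freedom) the two-sided 95% critical t is about 4.303.
mean, half_width = mean_abs_difference_ci(test_scores, retest_scores, t_crit=4.303)
print(f"{mean:.1f} +/- {half_width:.1f} percentage units")
```

For 19 subjects (18 degrees of freedom), the corresponding critical value would be about 2.101.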
Subject was a significant factor (RM-ANOVA, F(18, 126) = 27.2, p < 0.001) affecting the identification scores (regardless of assumption of sphericity). We noted a significant effect of the factor algorithm (RM-ANOVA, F(7, 126) = 6.34, p < 0.001) for the eight basic algorithms.
The algorithms were also grouped into four groups (corresponding to the design principles): transposing (mean value of identification scores of TRHA, TR1/3, and TR), modulating (mean of AM, AMFM, and AMMC), filtering (EQ), and NP. The modulating algorithms had better results than did the filtering algorithms (p < 0.05, Wilcoxon signed rank test). RM-ANOVA showed a significant effect of the factor group (F(3, 54) = 4.779, p < 0.01) for the four groups.
Vibratory identification of the 45 environmental sounds processed by the 8 basic algorithms and identified by 19 subjects is shown in Figure 4. Figure 4(a) shows the percentage identification score for each subject and algorithm. Most subjects had a similar pattern (heavy lines) and scored above the chance level (10%), but the curves are shifted more or less in parallel, for example, subjects 1, 2, 4, 6, 7, and 17. Some subjects also had a different pattern at the chance level, particularly subject 19, who showed signs of low motivation during the test and lack of concentration and had a low identification score. The correlation between the individual vibration threshold and identification score was not significant (r = -0.14, p = 0.58, Spearman).
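The 10 percent chance level follows from choosing among 10 response alternatives. As a rough illustration of what scoring above chance means for a 45-item sequence, the sketch below computes the probability of reaching a given number of correct responses by guessing alone. It assumes independent, uniformly random guesses (a simple binomial model that is not part of the original analysis), and the function name is hypothetical.

```python
from math import comb


def chance_tail_probability(k_min, n_items=45, n_alternatives=10):
    """Probability of at least k_min correct responses from pure guessing."""
    p = 1.0 / n_alternatives
    return sum(
        comb(n_items, k) * p**k * (1.0 - p) ** (n_items - k)
        for k in range(k_min, n_items + 1)
    )


# Expected number of correct responses from guessing: 45 items at 10 percent each.
expected_correct = 45 / 10
print(expected_correct)
print(chance_tail_probability(10))  # probability of 10 or more correct by chance
```

Under this model, a subject scoring well above 4.5 correct out of 45 is unlikely to be guessing.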
Figure 4(b) shows the mean ± SEM value of the identification scores of different subjects, and Figure 4(c) shows the mean ± SEM value of the identification scores of the different basic algorithms. As seen in the figures, we noted a greater difference between the results for different subjects (Figure 4(b)) than between the different algorithms (Figure 4(c)). For example, the results of different subjects for the TRHA algorithm varied between 16 and 67 percent correct (51 percentage units difference), while the results of subject 13 (one of the subjects with the largest difference) varied between 19 and 34 percent (15 percentage units difference) for the different algorithms.
Of the total 19 subjects, 2 had the TRHA algorithm as their best, 2 had the TR1/3, 1 had the TR, 1 had the AM, 6 had the AMFM, 3 had the AMMC, 1 had the EQ, and 1 had the NP.
The AMFM algorithm had the largest total scattering between subjects, defined as maximum score minus minimum score (64 percentage units), followed by the AMMC (56 percentage units), TRHA (51 percentage units), NP (49 percentage units), AM (47 percentage units), TR (42 percentage units), and TR1/3 (36 percentage units) algorithms, and lastly, the EQ algorithm had the smallest scattering (36 percentage units).
The median value was highest for the TRHA and AMFM (44%) algorithms, followed by the TR1/3, AM, and AMMC (42%); NP (38%); and TR (35%) algorithms; the EQ algorithm was lowest (33%). The mean value was highest for the AMFM (47%) algorithm, followed by the TRHA (44%), AMMC (44%), AM (42%), TR1/3(41%), and NP and TR (38%) algorithms; the EQ algorithm was lowest (36%).
We used the Wilcoxon signed rank test with asymptotic significance (two-tailed) to compare and define the significance of possible differences between the algorithms. Accordingly, the AMFM algorithm had the highest mean score (across subjects) for vibratory identification, followed by the TRHA, AMMC, AM, TR1/3, NP, and TR algorithms; lastly, the EQ algorithm had the lowest identification score. The TRHA and AMMC algorithms showed better results (p < 0.05, Wilcoxon signed rank test) than did the TR, EQ, and NP algorithms. The AMFM algorithm had a better (p < 0.05, Wilcoxon signed rank test) identification score than did the TR1/3, TR, AM, EQ, and NP algorithms. The TR1/3 and AM algorithms showed better (p < 0.05, Wilcoxon signed rank test) results than did the EQ algorithm.
The AMFM algorithm showed better results (p < 0.05, Wilcoxon signed rank test) than did the TR, EQ, and NP algorithms after Bonferroni correction. The TRHA algorithm had a better (p < 0.05, Wilcoxon signed rank test) identification score than did the EQ algorithm.
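The effect of a Bonferroni correction on a set of pairwise comparisons can be sketched as below. This is an illustration only: the article does not state how many comparisons were included in the correction, so the number of comparisons is simply the number of p-values passed in, the function name `bonferroni` is hypothetical, and the p-values are invented rather than taken from the study.

```python
def bonferroni(p_values, alpha=0.05):
    """Apply a Bonferroni correction to a dict of {label: raw p-value}.

    Returns {label: (adjusted_p, significant)}, where adjusted_p is the raw
    p-value multiplied by the number of comparisons (capped at 1.0) and
    significant is True when the raw p-value is below alpha / m.
    """
    m = len(p_values)
    threshold = alpha / m
    return {
        label: (min(1.0, p * m), p < threshold)
        for label, p in p_values.items()
    }


# Invented raw p-values for three hypothetical pairwise comparisons.
raw = {"comparison A": 0.004, "comparison B": 0.030, "comparison C": 0.200}
for label, (adjusted, significant) in bonferroni(raw).items():
    print(label, round(adjusted, 3), significant)
```

A comparison that is significant at p < 0.05 before correction, such as the invented "comparison B", can lose significance afterward, which is the pattern seen when the uncorrected and corrected Wilcoxon results are compared.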
We also ranked the algorithms using the Friedman test, which resulted in the same ranking order as with the Wilcoxon signed rank test. According to the Friedman test, the AMFM algorithm showed better results (p < 0.05, Friedman) than did the TR and EQ algorithms, and the TRHA algorithm had a better (p < 0.05, Friedman) identification score than did the EQ algorithm.
In summary, adaptation to vibratory sensitivity threshold did not improve the identification scores. The test and retest results did not differ (p = 0.7). Individual variability was large, but no correlations were found between vibration thresholds and identification scores. Subject and algorithm were significant factors (RM-ANOVA, p < 0.001), but the differences between the algorithms were relatively small. The TRHA, AMFM, and AMMC algorithms had better (p < 0.05) scores than did the TR, EQ, and NP algorithms. The TR1/3 and AM algorithms showed better (p < 0.05) results than did the EQ algorithm.
We found considerable individual variability in the results, and subject was a significant factor (Figure 4(b)). The variability can be assumed to be due to person-related as well as test-related conditions and random variations. Possible person-related factors are age at onset of hearing loss and earlier experience of use of vibrations for environmental monitoring or communication. It is interesting to observe that we did not find any correlation between perceptual thresholds and identification scores. Other possible causes of variability are differences in sound alternatives for each presented sound for each algorithm, equipment stability, algorithm order (training effect), and levels of motivation and concentration.
The analysis of test-retest variability (AM algorithm) showed that no significant improvement existed at retest, and a relatively good correlation (ICC = 0.71, p < 0.01, one-way, random; r = 0.71, p < 0.01, Spearman) indicated relatively good stability and reproducibility for the test conditions. In contrast to auditory experiments in which subjects had heard the original sounds, in the present tactile experiments, some of the subjects were born with profound hearing impairment and had neither heard the original sounds nor had any auditory memory of them. After two or three algorithms, the subjects became familiar with the signals and could respond more rapidly. Note that they did not receive any correct answer feedback.
Individual differences are common in perception studies and decrease after long-term training [38-40]. The sound alternatives could have been chosen differently. For example, the sounds could have been grouped in classes on the basis of contextual similarities, e.g., general home, kitchen, office, and outdoors, as in the study by Reed and Delhorne [40-41], and the subjects could have identified the sounds by choosing one from the same class. Contextual information, however, may also confuse the subject, as the same sound, e.g., "telephone signaling," can occur in several environments. In addition, contextual information would increase the probability of correct responses from guessing. Training effects would improve the results of subjects with low identification scores and thereby decrease individual variability.
Algorithm was a significant factor (RM-ANOVA, p < 0.001), as was algorithm group (RM-ANOVA, p < 0.01). The transposing and modulating principles had better results than did the filtering principles and the EQ and NP algorithms. The low scores for EQ can partly be explained by the fact that the environmental sounds have a smaller part of their spectrum left after the frequency components higher than 1,000 Hz have been removed. This means that the environmental sounds that have their spectrum above 1,000 Hz, e.g., "bird song," "bicycle bell," and "house alarm," are removed completely (the skin has a high vibratory threshold at frequencies above 800 Hz) and that the energy of these high-frequency sounds must be transposed to the sensitive low-frequency range of the skin [31-32,42].
The transposing algorithms TRHA and TR1/3 showed higher identification scores than did the TR algorithm (transposing only 1,200-2,400 Hz), confirming the importance of the high-energy components and of covering the whole original spectrum [32,42]. In the TRHA and TR1/3 algorithms, the frequency components that were transposed to the frequency range above 600 Hz were probably difficult to perceive through the skin [8,22], which means that the TRHA and TR1/3 algorithms are not optimal. One way to solve this problem in, for example, the TRHA algorithm, is to reduce the number of transposed frequency components from 24 to, for example, 12 or less and in the TR1/3 algorithm to reduce the number of filters in the bank (see subsection "Signal-Processing Algorithms" in main "Materials and Methods" p. 1025) from 13 to, for example, 6 and transpose only to the range below 600 Hz. Decreasing the number of frequency components could also increase the frequency intervals (Δf > 30%), thereby improving the skin's ability to separate the components.
Persons with profound hearing impairment would also benefit from using the TR1/3 and TRHA algorithms to improve their speechreading. The formants have the highest energy in speech and will therefore be selected for transformation with the TRHA algorithm. The fundamental frequency for the female voice (f0: 150-300 Hz) is retained with the TR1/3 algorithm but somewhat attenuated for the male voice (f0: 100-150 Hz).
The modulating AM (p < 0.05), AMFM (p > 0.05), and AMMC (p > 0.05) algorithms had a better result than did the NP algorithm, even when the test sounds dominated by high frequencies ("bird song," "bicycle bell," and "house alarm," which could not be perceived by any of the subjects) were excluded, which is compatible with the lower frequency resolution of the skin as compared with the auditory system. The poor result for the NP algorithm, also after the test sounds dominated by high frequencies were excluded, indicates that the frequency components must not only be detectable but that the skin sensors must also be able to separate the components. When the NP algorithm was used, a great number of frequency components were present, possibly masking each other.
The modulating AMFM algorithm used almost the same signal-processing method as the AM algorithm but achieved a higher identification score than AM did, which could be explained by the fact that the AMFM algorithm used a wider range of frequencies that the skin is capable of sensing than the AM algorithm did. In addition, the AMFM algorithm emphasizes the time variations, i.e., temporal information, more than the AM algorithm does (see "Materials and Methods" section, p. 1024).
The AMMC algorithm had better scores than the TR, EQ, and NP algorithms did. The carrier frequencies 550 and 650 Hz of the AMMC algorithm were probably difficult to separate. In order to improve this algorithm, we could choose the carrier waves differently, for example, by beginning at about 30 Hz and keeping frequency differences at a minimum of 30 percent (Δf ≥ 30%).
To increase the temporal information of sounds processed by the TR1/3, AM, AMFM, and AMMC algorithms, we could extract the envelope of the output from the filters by filtering at about 50 Hz and not at 10 Hz as in the present study because the temporal resolution of the skin is maximal at 20 to 40 Hz [8,30]. The choice of 10 Hz was based on the fact that most (90%) of the temporal information was below about 10 Hz for the vast majority of sounds, as well as on the study by Gygi et al.
An alternative design for a vibratory aid using the AMMC and TR1/3 algorithms could be to feed the signal-processed sounds from each filter bank to separate vibrators, as in the Tactaid II and VII. In this way, spatial separation would compensate for the low frequency resolution of the skin. Such an extension would, however, increase the size of the aid, add cables, and require a large battery supply. The advantages of the current vibratory aid compared with the Tactaid II and VII are its signal-processing method, which is designed for environmental sounds and not for speech, as well as its potentially smaller size.
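The proposed carrier spacing can be illustrated with a short sketch that starts at 30 Hz and places each subsequent carrier 30 percent above the previous one. The upper limit of 600 Hz is an assumption borrowed from the discussion of the transposing algorithms above, and the function name `carrier_frequencies` is hypothetical; this is not the carrier set used in the study.

```python
def carrier_frequencies(f_start=30.0, min_ratio=1.3, f_max=600.0):
    """Return carriers from f_start upward, each min_ratio times the previous one."""
    freqs = []
    f = f_start
    while f <= f_max:
        freqs.append(f)
        f *= min_ratio
    return freqs


carriers = carrier_frequencies()
print(len(carriers))
print([round(f, 1) for f in carriers])
```

Under these assumptions the sketch yields 12 carriers between 30 Hz and about 538 Hz. By comparison, the 550 and 650 Hz carriers mentioned above differ by only about 18 percent.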
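The effect of the envelope cutoff frequency can be illustrated with a small sketch. Everything here is an assumption for illustration: the article does not specify the envelope extractor or filter type, so full-wave rectification followed by a one-pole low-pass filter is used, and the test signal (a 250 Hz carrier with 30 Hz amplitude modulation, sampled at 8 kHz) and all function names are hypothetical.

```python
import math


def one_pole_lowpass(x, cutoff_hz, fs):
    """Simple first-order IIR low-pass filter (assumed filter type)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)
        y.append(state)
    return y


def envelope(signal, cutoff_hz, fs):
    """Full-wave rectify the signal, then low-pass filter it."""
    rectified = [abs(s) for s in signal]
    return one_pole_lowpass(rectified, cutoff_hz, fs)


def modulation_depth(env):
    """Peak-to-peak envelope variation over the second half (after settling)."""
    tail = env[len(env) // 2:]
    return max(tail) - min(tail)


fs = 8000  # sampling rate in Hz (assumed)
t = [n / fs for n in range(fs)]  # 1 second of samples
signal = [
    0.5 * (1.0 + math.cos(2.0 * math.pi * 30.0 * ti)) * math.sin(2.0 * math.pi * 250.0 * ti)
    for ti in t
]

depth_10 = modulation_depth(envelope(signal, 10.0, fs))
depth_50 = modulation_depth(envelope(signal, 50.0, fs))
print(depth_10, depth_50)
```

In this sketch the 50 Hz cutoff preserves noticeably more of the 30 Hz modulation than the 10 Hz cutoff does, which is the kind of temporal detail the proposed change is meant to retain.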
The EQ and the adapted versions of the algorithms had low identification scores. The sounds that had their most important spectrum information between 100 and 500 Hz were attenuated by the EQ algorithm and thus more difficult to sense. Attenuation may be one of the reasons for the lower identification scores of the adapted algorithms. A better alternative might have been to use the subjects' own vibratory thresholds. Using the personal vibratory threshold, we could have fixed the amplitude setting of the vibrator and the subjects would not have needed to adjust it. Frequent adjustments, as in the EQ, NP, and adapted versions of the algorithms, may negatively affect the quality of the signal; for example, the signal could be too weak to sense or be overloaded, which disrupts the temporal pattern. Fixed amplitude will be used in experiments in a forthcoming field study, in which the subjects will be few and will use their individually adapted (EQ) prototype of the portable monitoring aid.
The AM, TR1/3, and AMMC algorithms are similar to methods used in the speech processing vibratory aids MiniVib4 and TAM, the vibratory aid developed by Ling, and the vibratory aid Tactaid VII, respectively. However, comparing the results of the current study and other studies is difficult, as the test conditions and test sounds were different. No vibratory aids were found that use algorithms similar to the TRHA, TR, AMFM, EQ, or NP algorithms.
Tactaid VII was used to identify environmental sounds through vibrations in a situation in which the subjects also had contextual information. After sufficient training with correct answer feedback, the subjects' performance varied between 40 and 80 percent correct, which is at about the same level as the present results, though our subjects did not receive any training or contextual information. The TRHA, TR1/3, and AMMC algorithms would also work for speech, because they retain information about the f0, have relatively good temporal coding, and include more than five frequency channels, which is necessary to achieve a high level of speech understanding through the hearing sense. The temporal information in the TR1/3 and AMMC algorithms could be improved by low-pass filtering the envelope used to amplitude modulate at, for example, 50 Hz (rather than 10 Hz; "Materials and Methods" section, p. 1024).
We can compare the identification results of the TRHA, TR, AM, AMFM, AMMC, and EQ algorithms because they are the same in the present and previous studies. However, the subjects differ, which limits the validity of the comparisons.
The rank order of algorithms in the vibratory tests differed from that in the hearing tests. In the vibratory tests, algorithms based on the filtering principle had the lowest rank order and modulating algorithms the highest, in contrast to the auditory tests in which the filtering algorithms had the highest rank order and the modulating algorithms the lowest.*
The AMFM algorithm had the highest rank order in vibratory identifications but the lowest rank order in auditory identifications of environmental sounds in the previous study.*
The differences in rank order of the algorithms in the vibratory and auditory tests can partly be explained by the fact that the frequency discrimination/resolution and temporal resolution of the vibratory sense are inferior to those of the hearing sense.
The frequency discrimination of the hearing sense is approximately 100 times better than that of the skin (Δf skin/Δf auditory system, 30/0.3 = 100). The corresponding relationship for intensity discrimination is approximately two times better (ΔI skin/ΔI auditory system, 0.7/0.3 ≈ 2), and for temporal resolution three times better (time difference [Δt] skin/Δt auditory system, 10/3 ≈ 3).
The finding that the auditory system has a higher rank order for algorithms with good spectral information while the skin has a higher rank order for algorithms with good temporal identification is compatible with the basic differences between the two systems.
The transposing algorithm TRHA had a relatively high rank order in both the vibratory and the auditory tests, confirming the importance of frequency components with high energy. The good results with the TRHA algorithm also show the importance of temporal information because the envelope was not low-pass filtered at 10 Hz, and they also show the benefit of frequency transposition. To increase the temporal information of the AM, AMFM, and AMMC algorithms, we could low-pass filter the sounds up to ~100 Hz instead of 10 Hz because gap detection is approximately 10 ms for the skin. In conclusion, our findings on environmental sounds are in line with the basic psychophysical data on the hearing versus the cutaneous senses.
The present study is part of a series of studies describing the development of a technical aid for detection, localization, and identification of environmental sounds. The purpose is to help persons with dual sensory impairment monitor environmental events using the skin senses. A prototype for localization has been developed previously [45-46]. The identification aspect, in which the sounds must be processed by an algorithm, is the focus of the present study. Five algorithms (TRHA, TR1/3, AM, AMFM, and AMMC) covered the entire spectrum of the environmental sounds and gave high identification scores. Therefore, they are good candidates for use in a vibratory aid and can be chosen for testing in future experiments performed under more realistic conditions. These algorithms have no obvious shortcomings, such as attenuating or partly removing important frequency components (as do the TR, EQ, and NP algorithms) and will be tested further both with and without adaptation to the skin (using better adapting algorithms). In future experiments, we will continue to use transposing and modulating principles. The TRHA and TR1/3 algorithms will be modified by decreasing the number of transposed signals and using a 30 percent interval between main frequency components. The TR1/3, AM, AMFM, and AMMC algorithms will be modified by extracting the envelope of the sounds by low-pass filtering at a higher frequency, for example, 50 Hz.
The transposing (algorithms TRHA and TR1/3) and modulating (algorithms AM, AMFM, and AMMC) principles were suitable for further application in a portable vibratory aid for people with dual sensory impairment. Algorithm and subject were significant factors affecting the identification results. The TRHA, AMFM, and AMMC algorithms were significantly better than the NP algorithm.
Last Reviewed or Updated Wednesday, March 31, 2010 10:20 AM