When I was in high school, I spent the summer of 1983 fixing video games. I had some background in electronics, so a guy who owned video arcades around my hometown hired me to sit in a workshop and repair broken games. That whole summer, my closest companion was a little radio in the shop, which played the songs of the summer over and over.
One of the songs that played at least twice a day that summer was “Every Breath You Take” by the Police. Most of the song had clear lyrics (“Every breath you take, every move you make…”). But there was one line that puzzled me. A few times in the song, Sting sounded like he was singing, “I’m a pool hall ace…” which made no sense at all. At some point, I bought Synchronicity on cassette (it was the ’80s, after all) and was able to read the lyrics. Turns out, it was “Now my poor heart aches.”
So, is there any way to get better at understanding the lyrics of songs?
It turns out that being able to see the singer as the song is being sung will help. This issue was examined in a paper by Alexandra Jesse and Dominic Massaro in the June 2010 issue of Psychonomic Bulletin & Review.
You might think that understanding speech is done basically through what you hear. After all, speech is an auditory mode of communication. Of course, many people have the ability to read lips, so there must be some information about what is being spoken available in the movements of people’s mouths.
In fact, there is a lot of evidence that people combine various sources of information to determine what is said. The best example of the influence of vision on speech is the McGurk effect. In this effect, you hear a soundtrack of someone saying a syllable usually spoken from the front of the mouth (like ba, where you put your lips together to speak it). At the same time, you see a mouth saying a syllable spoken from the back of the mouth (like ga, where you place your tongue at the back of your mouth). If you close your eyes and listen to the soundtrack, you hear ba. But if you watch the face as you listen, then what you perceive is a compromise between what you hear and what you see. So, you end up hearing da, which is a sound you would make if you put your tongue toward the front of your mouth.
For an example of the McGurk effect in action, check out this demo from Dominic Massaro’s website. http://mambo.ucsc.edu/psl/dwmdir/01_5.mov
In the paper by Jesse and Massaro, they had people watch a video of someone singing the song “Don’t Cry for Me Argentina” from the musical Evita. People either saw the video, heard the singing, or both saw and heard the performance. To make the task a bit harder, the soundtrack had some noise added to it to make the lyrics harder to hear. Interestingly, people were about twice as good at identifying the lyrics when they were able to see the singer while the song was being sung than when they just heard the song or just saw the singer. This is an impressive gain in performance.
This finding has a variety of interesting implications. For example, it may help to explain why we often have so much trouble understanding people on the phone. When you talk with someone face-to-face, you get information both from the sounds of what they say and from the movements of their mouth. Over the phone, you have only the auditory information.
In fact, a lot of research suggests that talking on a cell phone while driving (even using a hands-free device) still draws a lot of attention and can increase people’s likelihood of getting in an accident. One reason why cell phone conversations may draw so much attention is that people have to concentrate hard on the voice to understand what is being said, because they are getting only the spoken information.
And, of course, it means that you can save yourself some embarrassment when singing with your friends if you get the lyrics right before you belt out a song in a group.