I built a porta-booth the other day, and was agog to test it out. I made two recordings in very similar, horribly challenging, conditions. For both of the recordings I was within 10 feet of a fridge whose motor was running, and I could hear vehicles passing outside the house. Each recording has a maximum (peak) level of -3.1 dB. Although I didn’t use exactly the same text for each recording, I did say roughly similar things, and I did try to keep my speaking voice about the same for both recordings.
I listened to both recordings, and I was surprised by how “dead” the porta-booth version sounded compared to the other recording. I wanted to see whether I could understand what was happening, so I made a graph of the spectrum of each recording. The blue line in the graph represents the spectrum with the microphone placed outside the porta-booth, while the red line represents the microphone in the porta-booth.
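For anyone who wants to reproduce this kind of graph, here is a minimal sketch in Python with NumPy. It is not necessarily how the graph described above was made. The function name `magnitude_spectrum_db` and the synthetic 120 Hz test tone are assumptions for illustration only; a real comparison would load the two recordings as mono sample arrays instead.

```python
import numpy as np

def magnitude_spectrum_db(samples, sample_rate):
    """Return (frequencies in Hz, levels in dB) for a mono signal.

    Hypothetical helper: applies a Hann window, takes a real FFT and
    scales so that a full-scale sine wave reads roughly 0 dB.
    """
    window = np.hanning(len(samples))
    spectrum = np.fft.rfft(samples * window)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    magnitude = 2.0 * np.abs(spectrum) / np.sum(window)
    levels_db = 20.0 * np.log10(np.maximum(magnitude, 1e-12))
    return freqs, levels_db

# Synthetic stand-in for a recording: one second of a 120 Hz tone
# at half of full scale (an assumption, not the real audio).
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
test_tone = 0.5 * np.sin(2 * np.pi * 120 * t)

freqs, levels_db = magnitude_spectrum_db(test_tone, sample_rate)
peak_freq = freqs[np.argmax(levels_db)]
peak_level_db = levels_db.max()
print(f"strongest component: {peak_freq:.0f} Hz at {peak_level_db:.1f} dB")
```

Plotting `levels_db` against `freqs` for each recording, on a logarithmic frequency axis, gives a pair of curves like the blue and red lines.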
I can see three distinct areas in each curve. The first area, on the left up to about 700 Hz, is the highest and has a sharp peak. The second area, 700-9100 Hz, wiggles around a very gently falling line. The final area, on the right from 9100 Hz upwards, falls faster, but more smoothly.
I don’t really know how to interpret this pair of curves, or whether they even explain the differences in the qualities of the two recordings. Let’s look at the differences in each area, and see whether I can make anything of them.
In the first area, the porta-booth recording peaks much higher than the plain one. That means its lower frequencies are stronger than those in the plain recording. Certainly, the porta-booth recording sounds deeper overall. Wikipedia says that the typical adult male’s fundamental voice frequency is between 85 and 155 Hz, which falls well within this area. I have neither an especially high nor an especially deep voice, so I’ll assume that my fundamental vocal frequency is about 120 Hz. I can easily cover two octaves, probably more. 2.5 octaves is equivalent to multiplying by 2×2×1.4=5.6, and 5.6×120=672, which is close to the 700 Hz boundary of this area. I’m going to guess that this first area represents the fundamental frequencies of my voice. The fridge might well operate in this area too, but I’ve just checked the spectrum for a section of audio in which I’m not speaking, and it’s nowhere near as high as the parts where I am speaking, even in this area of the spectrum. I think that rules out any fridge theory.
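As a quick check on the octave arithmetic, each octave doubles the frequency, so 2.5 octaves multiply it by 2 raised to the power 2.5. The sketch below assumes the same 120 Hz fundamental guessed at above; the variable names are purely illustrative.

```python
# Each octave doubles the frequency, so n octaves multiply it by 2**n.
fundamental_hz = 120.0   # assumed fundamental, as guessed in the text
octaves = 2.5

ratio = 2 ** octaves                 # exact multiplier for 2.5 octaves
top_hz = fundamental_hz * ratio      # top of the range using the exact ratio
rounded_top_hz = fundamental_hz * 5.6  # same sum using the rounded 5.6

print(f"ratio = {ratio:.3f}")
print(f"exact top = {top_hz:.0f} Hz, rounded top = {rounded_top_hz:.0f} Hz")
```

Either way the result lands a little below 700 Hz, which is consistent with the first area of the graph covering the fundamentals.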
Next I’ll look at the area on the right, from 9100 Hz up. I think that represents mostly noise. Looking closely, the smoother lines here fall at two different rates, changing at about 16400 Hz. I suspect that the microphone responds less to these higher frequencies, hence the faster fall-off. The porta-booth recording is much quieter than the plain recording in this region, which probably contributes quite a lot to the deadness of the sound.
Finally, the middle region probably represents the harmonics of the voice, which give it its unique colour. Again the porta-booth version is lower than the plain one, meaning that the plain one has a higher proportion of high frequencies. This time, I think, that contributes to the difference in the timbre of my voice between the two recordings.
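One way to put numbers on the three areas is to add up the energy in each band and compare the two recordings band by band. The following is a rough sketch under stated assumptions: the band edges are the ones read off the graph (700 Hz and 9100 Hz), the helper `band_levels_db` is hypothetical, and the two signals are synthetic stand-ins rather than the real recordings. The absolute levels are unnormalised, so only the differences between recordings mean anything.

```python
import numpy as np

# Band edges in Hz, taken from the three areas described in the text.
BANDS_HZ = {
    "low": (0.0, 700.0),
    "mid": (700.0, 9100.0),
    "high": (9100.0, float("inf")),
}

def band_levels_db(samples, sample_rate):
    """Return the summed spectral power in each band, in dB (unnormalised)."""
    power = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    levels = {}
    for name, (low_hz, high_hz) in BANDS_HZ.items():
        in_band = (freqs >= low_hz) & (freqs < high_hz)
        levels[name] = 10.0 * np.log10(max(power[in_band].sum(), 1e-12))
    return levels

# Synthetic stand-ins: the same 120 Hz "fundamental", with a 3000 Hz
# "harmonic" that is four times weaker in the pretend booth recording.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
fundamental = np.sin(2 * np.pi * 120 * t)
harmonic = np.sin(2 * np.pi * 3000 * t)
plain = fundamental + 0.2 * harmonic
booth = fundamental + 0.05 * harmonic

plain_levels = band_levels_db(plain, sample_rate)
booth_levels = band_levels_db(booth, sample_rate)
difference = {name: booth_levels[name] - plain_levels[name] for name in ("low", "mid")}
print({name: round(value, 1) for name, value in difference.items()})
```

A negative difference in the middle band would match what the graph shows: the booth version carrying less of the harmonic content than the plain one.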
The upshot is that I should be able to recover some of the brightness of tone that seems to have gone by boosting some of the middle frequencies using EQ. Time for some experimentation, I think…
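As a starting point for that experimentation, here is a deliberately crude sketch of a mid-band boost done directly in the frequency domain with NumPy. It is not how a real EQ plugin works (those use smooth filters rather than a hard-edged band), and the function name `boost_band`, the 700-9100 Hz range and the 6 dB gain are all assumptions chosen for illustration.

```python
import numpy as np

def boost_band(samples, sample_rate, low_hz, high_hz, gain_db):
    """Scale every frequency between low_hz and high_hz by gain_db.

    Hypothetical helper: a brick-wall boost applied to the whole signal
    at once, useful for quick experiments rather than finished audio.
    """
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    gain = np.ones_like(freqs)
    gain[(freqs >= low_hz) & (freqs < high_hz)] = 10 ** (gain_db / 20.0)
    return np.fft.irfft(spectrum * gain, n=len(samples))

def rms(samples):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(samples))))

# Synthetic check: a 3000 Hz tone sits inside the boosted band,
# while a 120 Hz tone sits below it and should be left alone.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
mid_tone = 0.1 * np.sin(2 * np.pi * 3000 * t)
low_tone = 0.1 * np.sin(2 * np.pi * 120 * t)

mid_ratio = rms(boost_band(mid_tone, sample_rate, 700, 9100, 6.0)) / rms(mid_tone)
low_ratio = rms(boost_band(low_tone, sample_rate, 700, 9100, 6.0)) / rms(low_tone)
print(f"mid tone scaled by {mid_ratio:.2f}, low tone scaled by {low_ratio:.2f}")
```

Boosting a band also raises the overall peak level, so the result may need turning down afterwards to stay at the same peak as the original.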