Audio transcription is the process of converting spoken words into written text. In the past, a person would sit and write down words as they were spoken. Today, several types of audio recording and several methods of transcription exist. Analog and digital recordings allow a person who wasn’t present during the talking to still transcribe the text. In addition, many software packages will read audio files and quickly convert them to text without having to actually play them.
For many years, audio transcription was a specialized and tedious profession. People who transcribed speech had to be present at the time of speaking, which often meant companies had to hire people trained in advanced techniques such as shorthand. This also limited transcription services to those who had access to a trained transcriber.
With the invention of audio recordings, this field changed dramatically. With a recording, the transcriber could work from anywhere the recording could be delivered. In addition, transcription no longer required shorthand, as the recording could be rewound and listened to multiple times. A single transcriber could also work for many clients at once, since she no longer needed to be present for the speeches.
Even as computer use and Internet speeds increased, the field of audio transcription stayed largely the same. Audio files were often emailed where tapes had once been sent by normal mail. The speed of the process increased, but the methods didn’t.
This changed in the late 1990s with the increasing use of speech recognition and dictation software. The job of transcribing shifted more and more toward computer assistance and then full automation. Software packages came out that could read the information inside an audio file and use the wave patterns of the speaker’s voice to build a text version of a speech. This would take seconds rather than the minutes or hours a human transcriber would need.
Computer-automated audio transcription has a few flaws that are difficult to overcome, the largest of which is a relative lack of corrective speech. When a human transcriber listens to a recording, she can correct slight errors in the speech in order to make the text more readable. While some transcription is verbatim, meaning it is exactly what the person said, most is not. Because software cannot make these corrections, a human will often have to check an automated transcription for errors before it is used.
The other common flaw of computer-based audio transcription lies in human speech itself. Since people have a huge range of tones and patterns when they speak, creating a computer program that can accurately read and transcribe the entire range is exceptionally difficult. This means that a certain amount of error is common in nearly all transcription software. The most common way of working around this flaw is through learned speech, where the program works with a single speaker long enough to adapt to that person’s speech patterns.