[Nfb-web] Speech recognition and production of DTB's and text transcripts
Peter Donahue
pdonahue1 at sbcglobal.net
Tue Jun 5 14:40:19 CDT 2007
Hello everyone,
This appeared on our Digital Talking Books list, but I wanted to share it
here as this technology could aid in enabling us as Webmasters to quickly
and efficiently produce text transcripts of audio and video content on
affiliate Web sites to permit deaf-blind visitors to avail themselves of
this information. Otherwise one needs to produce such transcripts by typing
the audio presentation using a typewriter or a PC. I've already done some
of this for one of our affiliate Web sites. The nice thing about
digital-audio editors is that they double nicely as a Dictaphone machine.
You can move back or forward in the audio by a small amount. In Sound
Forge the Page-Up and Page-Down keys permit you to move back or forward in
the audio to review what is being said. Thus if you miss a word or two,
pressing Page-Up will let you hear it again so you can type it into the
text document. The same keys are handy when proofreading.
What a time-saver it would be to have a program that would do
this automatically for you, leaving you only to proofread and make
corrections in the text transcript before marking it up for the Web. I
couldn't help but think about this as a result of a recent conversation with
Gary Wunder. See you all in Atlanta.
Peter Donahue

----- Original Message -----
From: "Graczyk, William" <WGracz at milwaukee.gov>
To: <dtb-talk at nfbnet.org>
Sent: Tuesday, June 05, 2007 11:25 AM
Subject: [Dtb-talk] Speech recognition and production of DTB's
Aaron Cannon wrote:
"I would imagine that, at some point, someone will employ the use of a
speech recognition engine to automate the synchronization of the text to
the voice. It could be pretty accurate, because the computer would know
what the person was going to say, it just wouldn't know when. This seems
like a much simpler task for the computer to deal with than if the source
text were not known, something which many programs do routinely. . . ."
This has in fact been done by the Portuguese Library for the Blind and a
team of computer scientists. See: "Modular Production of Digital Talking
Books": http://www.inesc-id.pt/pt/indicadores/Ficheiros/1711.pdf
The PDF has several diagrams; if you want just the HTML, do a Google
search on the title of the paper and View as HTML. There are several
other papers by the same team available too.
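As a rough illustration of the idea Aaron describes (and not the method
used in the paper above or in any of the patents below), here is a minimal
Python sketch. It assumes a speech recognizer has already produced a list
of (word, start time in seconds) pairs; that input format and the function
names normalize and align are hypothetical, made up for this example. The
known text is matched against the recognizer's words with the standard
library's difflib, and each matched word borrows the recognizer's
timestamp.

```python
from difflib import SequenceMatcher


def normalize(word):
    """Lowercase a word and drop punctuation so 'Fox,' matches 'fox'."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def align(known_text, recognized):
    """Attach start times to the words of a text we already know.

    known_text: the book or transcript text as a single string.
    recognized: list of (word, start_seconds) pairs. This shape is an
                assumption about what a recognizer might report.
    Returns a list of (word, start_seconds or None), one per known word.
    None marks a word the recognizer missed or misheard.
    """
    known_words = known_text.split()
    a = [normalize(w) for w in known_words]
    b = [normalize(w) for w, _ in recognized]

    timings = [None] * len(known_words)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # Each matching block is a run of words that agree in both sequences.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            timings[block.a + k] = recognized[block.b + k][1]
    return list(zip(known_words, timings))
```

Calling align("The quick brown fox jumps.", ...) with a recognizer result
that misheard "brown" as "crown" would time every word except "brown",
which comes back with None. Words left without a time are the ones a
person would still have to check by ear, which bears on the cost question
raised further down.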
There are three earlier efforts that you can read about in the patent
applications:
IBM (1997), patent number 5,649,060
Microsoft (2000), patent number 6,260,011
RFBD (2005), patent number 6,961,895
And those are just the patents applied for by the heavy hitters. You
can get to the text of the patents by using the Patent Office Quick
Search and just copying and pasting the number:
http://patft.uspto.gov/netahtml/PTO/search-bool.html
The real questions are how much it would cost to verify the accuracy of
the digital text and of the synchronization between narration and text.
It may well be cheaper to start from scratch. I haven't found any
evidence that anyone is presently producing books this way. If there are
people on this list from RFBD, perhaps they could tell us whether they
ever put their patented method into operation and what the results were.
Bill Graczyk
Wisconsin Regional Library for the Blind
_______________________________________________
Dtb-talk mailing list
Dtb-talk at nfbnet.org
http://www.nfbnet.org/mailman/listinfo/dtb-talk