[Dtb-talk] Speech recognition and production of DTB's
Graczyk, William
WGracz at milwaukee.gov
Tue Jun 5 11:25:54 CDT 2007
Aaron Cannon wrote:
"I would imagine that, at some point, someone will employ the use of a speech
recognition engine to automate the synchronization of the text to the voice.
It could be pretty accurate, because the computer would know what the person
was going to say, it just wouldn't know when. This seems like a much
simpler task for the computer to deal with than if the source text were not
known, something which many programs do routinely. . . ."
This has in fact been done by the Portuguese Library for the Blind and a
team of computer scientists. See: "Modular Production of Digital Talking
Books": http://www.inesc-id.pt/pt/indicadores/Ficheiros/1711.pdf
The PDF has several diagrams; if you want just the HTML, do a Google
search on the title of the paper and View as HTML. There are several
other papers by the same team available too.
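
For readers who want a concrete picture of the idea Aaron describes (the text is known, only the timing is not), here is a minimal sketch in Python. It is not the method used in the paper or in any of the patents. It assumes a hypothetical recognizer that has already produced a list of (word, start, end) tuples with times in seconds, and it uses the standard library's difflib to line those words up against the known text. The function names align and normalize are made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(word):
    """Lowercase a word and drop punctuation so 'Times,' matches 'times'."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def align(known_text, recognized):
    """Attach timings from recognizer output to the words of a known text.

    known_text: the book text as a string.
    recognized: list of (word, start_seconds, end_seconds) tuples from a
                hypothetical speech recognizer (an assumption of this sketch).
    Returns a list of (known_word, (start, end)) pairs; the timing is None
    for any known word that the recognizer output did not match.
    """
    known = known_text.split()
    known_norm = [normalize(w) for w in known]
    heard_norm = [normalize(w) for w, _, _ in recognized]

    # autojunk=False so that frequent words such as "the" are never
    # discarded as junk when the sequences are long.
    matcher = SequenceMatcher(None, known_norm, heard_norm, autojunk=False)

    times = [None] * len(known)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start, end = recognized[block.b + k]
            times[block.a + k] = (start, end)
    return list(zip(known, times))


if __name__ == "__main__":
    text = "It was the best of times"
    # The recognizer misheard "best" as "rest".
    heard = [("it", 0.0, 0.2), ("was", 0.2, 0.5), ("the", 0.5, 0.6),
             ("rest", 0.6, 0.9), ("of", 0.9, 1.0), ("times", 1.0, 1.5)]
    for word, timing in align(text, heard):
        print(word, timing)
```

In the sample run the misheard word "best" comes back with no timing while its neighbors are aligned. Words left unaligned like this are the ones a person would have to check by hand.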
There are three earlier efforts that you can read about in the patent
applications:
IBM (1997), patent number 5,649,060
Microsoft (2000), patent number 6,260,011
RFBD (2005), patent number 6,961,895
And those are just the patents applied for by the heavy hitters. You
can get to the text of the patents by using the Patent Office Quick
Search and just copying and pasting the number:
http://patft.uspto.gov/netahtml/PTO/search-bool.html
The real questions are how much it would cost to verify the accuracy of
the digital text and of the synchronization between narration and text.
It may well be cheaper to start from scratch. I haven't found any
evidence that anyone is presently producing books this way. If there are
people on this list from RFBD, perhaps they could tell us whether they
ever put their patented method into operation and what the results were.
Bill Graczyk
Wisconsin Regional Library for the Blind