[Dtb-talk] Speech recognition and production of DTB's

Graczyk, William WGracz at milwaukee.gov
Tue Jun 5 11:25:54 CDT 2007


Aaron Cannon wrote:

"I would imagine that, at some point, someone will employ the use of a
speech recognition engine to automate the synchronization of the text to
the voice. It could be pretty accurate, because the computer would know
what the person was going to say, it just wouldn't know when.  This seems
like a much simpler task for the computer to deal with than if the source
text were not known, something which many programs do routinely. . . ."

 

This has in fact been done by the Portuguese Library for the Blind and a
team of computer scientists. See: "Modular Production of Digital Talking
Books":  http://www.inesc-id.pt/pt/indicadores/Ficheiros/1711.pdf 

 

The PDF has several diagrams; if you want just the HTML, do a Google
search on the title of the paper and choose "View as HTML." Several
other papers by the same team are available too.
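
For anyone who wants to see the basic idea in concrete terms, here is a
minimal sketch in Python. It is not the method used in the paper or in any
of the patents; it only illustrates the point that when the text is known
in advance, the job reduces to matching it against whatever a recognizer
heard and borrowing the timestamps. The input format (a list of word and
start-time pairs) and the function names are assumptions made up for the
example, and the standard-library difflib matcher stands in for a real
alignment algorithm.

```python
from difflib import SequenceMatcher


def normalize(word):
    """Lowercase a word and drop punctuation so comparisons are forgiving."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def align(book_words, recognized):
    """Attach start times to the known text.

    book_words: list of words from the book's text, in reading order.
    recognized: list of (word, start_seconds) pairs, a hypothetical
                stand-in for the output of a speech recognizer.
    Returns a list of (book_word, start_seconds or None).
    """
    ref = [normalize(w) for w in book_words]
    hyp = [normalize(w) for w, _ in recognized]
    times = [None] * len(book_words)

    # Find runs of words that the text and the recognizer agree on.
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            times[block.a + k] = recognized[block.b + k][1]
    return list(zip(book_words, times))


if __name__ == "__main__":
    book = "It was the best of times".split()
    heard = [("it", 0.0), ("was", 0.4), ("the", 0.7),
             ("pest", 0.9), ("of", 1.3), ("times", 1.5)]
    for word, start in align(book, heard):
        print(word, start)
```

Words the recognizer got wrong come back with no timestamp (None), so in
this sketch they are exactly the spots a person would have to check by ear.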

 

There are three earlier efforts that you can read about in the patent
applications:

 

IBM (1997), patent number 5,649,060

Microsoft (2000), patent number 6,260,011

RFBD (2005), patent number 6,961,895

 

And those are just the patents applied for by the heavy hitters. You
can get to the text of the patents by using the Patent Office Quick
Search and just copying and pasting the number:

http://patft.uspto.gov/netahtml/PTO/search-bool.html

 

The real question is how much it would cost to verify the accuracy of
the digital text and of the synchronization between narration and text.
It may well be cheaper to start from scratch. I haven't found any
evidence that anyone is presently producing books this way. If there are
people on this list from RFBD, perhaps they could tell us whether they
ever put their patented method into operation and what the results were.

 

Bill Graczyk

Wisconsin Regional Library for the Blind

 

 



More information about the Dtb-talk mailing list