[BlindMath] Future of AI
Al Maneki
apmaneki at gmail.com
Fri Nov 1 14:46:59 UTC 2024
Hello all,
*Today’s Artificial Intelligence: Our Future for Reading and Writing STEM?*
By Al Maneki
*The author acknowledges the valuable contributions of David Austin, Rob
Beezer, Michael Cantino, David Farmer, Karen Herstein, Alexei Kolesnikov,
Martha Siegel, and Volker Sorge to this article. They have read several
drafts, including the final version, of this article. They have offered
many valuable comments and suggestions for improvements. As always, the
author assumes all responsibility for errors and oversights. For comments
and questions, please email me: apmaneki at gmail.com.
-Al Maneki*
*Introduction*
Much bandwidth has been devoted in the public media lately to the
growing uses of Artificial Intelligence (AI) and its impact on our daily
lives. We have heard from the boosters who claim that AI will dramatically
reduce the drudgery of human existence by eliminating the most boring
tasks. We have also heard from the nay-sayers who claim that this time is
different: “let’s not go there.” They worry that this time AI will
eliminate so many jobs, even those requiring intellectual abilities, that
our livelihoods will go to ruin, leaving us with little purpose in life.
These pessimists maintain that the job growth stimulated by AI will not be
sufficient to compensate for the job losses created by it. The jury is
still out on this. Along with the benefits that AI has already produced, we
have also seen its abuses, e.g. the production of videos, for nefarious
purposes, showing perfect replicas of prominent personalities espousing
points of view that are contrary to their well-publicized value systems.
AI has been around for many years. I took a course in AI over 30 years ago.
The textbook for this course was *Artificial Intelligence* by Patrick Henry
Winston, now available as a free PDF download from
*https://courses.csail.mit.edu/6.034f/ai3/rest.pdf*. The AI of today is
much more robust than the AI of 30 years ago. Computing hardware today is
cheaper, faster, and smaller. Advances in AI’s computing hardware are a
prime example of Moore’s Law, which futurist Ray Kurzweil has discussed
with us in any number of speeches he has delivered before our annual NFB
conventions. (Moore’s Law states that “the number of transistors on an
integrated circuit will double every two years with minimal rise in cost.”)
Since I studied from Winston’s book, AI has benefited from further advances
in machine learning. For example, Alexei Kolesnikov, one of the automated
Nemeth translation collaborators, used Google’s NotebookLM to produce a
podcast based on an earlier draft of this article, available at
https://wp.towson.edu/akolesni/files/2024/10/Podcast-about-Al.wav, using
only the draft’s Word file and no input from humans. Advances in AI have
also been stimulated by spectacular advances in neural networks, or models
inspired by the structure and function of biological neural networks in
animal brains. According to the popular science writer David Berlinski in
his book *The Advent of the Algorithm* (Library of Congress Braille
edition, BR13263), the concept of the algorithm, an essential component of
AI, has its humble beginnings in the ancient Greek writings of Aristotle.
In the remainder of this article, I will use the terms “AI”, “AI algorithm”
and “AI device” interchangeably. I also want to familiarize readers of NFB
publications with some of AI's terminology and how AI works generally.
Then, based on my limited experiences with AI, I present my views on AI’s
possible impact on how we, blind people, can benefit from it in the areas
of learning and doing STEM.
*A Bit of Background*
Although the term “AI” has only recently appeared in the public discourse,
the NFB was involved in AI research well before this term came into vogue.
In 1975, I attended my first national convention in Chicago. During one of
the general sessions, Dr. Kenneth Jernigan introduced us to Ray Kurzweil to
talk about his remarkable reading machine. Dr. Jernigan opined that
Kurzweil’s reading machine would expand the availability of reading
materials to us. The first Kurzweil reader was a floor model console, too
heavy to be moved by one person. Unfortunately, Dr. Jernigan did not live
long enough to see the day when the KNFB reader, much more powerful than
the first Kurzweil reading machine, was tiny enough to be loaded as an app
on our smart phones.
In more recent history, the NFB has sponsored the Blind Driver Challenge,
the initiative to develop a vehicle which a blind driver could operate
independently. As a result, in 2011, Mark Riccobono independently drove a
prototype through an obstacle course on the Daytona International Speedway.
This test vehicle was developed by an engineering team from Virginia Tech.
It should be noted here that obstacles on the course were laid
down after Mark began to drive this vehicle. These two examples now fit
neatly into a branch of AI known as machine vision.
It is easy to imagine other ways in which machine vision can help blind
people. For example, if we are in an unfamiliar building, an AI could,
using a map of that building, direct us to a specific location. If we must
get to a certain floor in that building, the AI could direct us to the
elevators, or even direct our hands to the key panel to call for that
elevator. Applications of machine vision to serve as visual aids may
already be under development. However, this is not the area of AI I want to
discuss.
Let us remind ourselves again that many technologies have been previously
oversold to us. In this case, AI will be no different. Regardless of the
technology, there will always be tasks and functions we can perform more
efficiently with alternative techniques. As technology changes, however, we
may have to adapt some of these techniques to work with new devices and new
modes of thinking. While it is difficult to see exactly what impact AI will
have on our lives, we should always examine the offerings of AI carefully.
We should not hesitate to call out its flimflams when the promoters claim
that what they have to offer is the next wonder drug or the greatest
invention since sliced bread. At the same time, we must also recognize the
benefits of AI when truly innovative ideas are brought before us for
consideration. As in the past, we, the organized blind, will work closely
with the innovators and inventors who best understand our needs.
STEM activities may be broken down into two parts: learning the material by
reading textbooks, lecture notes, and research papers; and disseminating our work
in Braille or print. In this article, I will pay special attention to the
uses of AI for the translation of spoken mathematics into Nemeth Braille.
There will be an obvious spillover into other STEM subjects because math is
involved in virtually every aspect of STEM.
With AAF funding, my academic colleagues and I have been working on the
automated translation of PreTeXt-specified math content into either Nemeth
Braille or synthetic
speech. An additional NSF grant has allowed us to make improvements in the
automated process of translating graphics from print to tactile form.
Moving in the reverse direction, given a document in Nemeth Braille, we may
wish to have it produced in a printed format, enabling us to communicate
our ideas and results with sighted readers. To date, no research has been
done in this direction of information flow. This is where I think AI comes
into the picture. More on this later. There is also the challenge of
converting a verbal description of a mathematical diagram into embossed or
printed formats. This poses an even greater challenge. But if you believe
the pronouncements of AI’s most enthusiastic advocates, all things are
possible with AI.
The two essential components of AI are the algorithm, a mechanical
procedure for arriving at the most probable conclusions or deductions based
on a given set of data, and the computing hardware that is necessary to
make the calculations required by the AI’s algorithms.
In order to arrive at correct conclusions, i.e. the actions taken by human
subjects given that set of data, the best algorithms require enormous
quantities of data/response pairs that have been accumulated. We may think
that it is an easy task to recognize the voice of a specific individual or
to identify the face of a particular person in a huge crowd. But underlying
these tasks is an enormous quantity of work performed by our brains to make
these correct judgements. To duplicate these human mental tasks
electronically requires an enormous amount of computer power. Today, our
best computers can barely approach human mental capacity. However, with the
development of modern microchips, i.e. the integrated circuits that are
packed into minute pieces of silicon, we are better able to process these
massive quantities of data/response pairs. We may think of these integrated
circuits as direct translations of lines of computer code specified by the
AI algorithms. Thus, the millions of calculations required by these
algorithms can be computed in microseconds.
*But What Can It Do for Us?*
When I think of what AI could do for us in STEM, I think of my own
experiences using human readers. My entire math career has involved the use
of readers one way or another. I’ve had many readers all of these years,
some better than others. The best readers were with me for longer periods
of time. When every symbol must be uttered (comma, dot, left parenthesis, left
bracket, etc.), the transmission from written to spoken math or vice versa
can be extremely time-consuming and most boring. But, given time and
experience, the rapport that developed between me and my reader could, in
most instances, enable us to dispense with all of this mathematical
verbiage and communicate the exact context entirely from the manner of
speaking. For the most part, we tend to speak quite consistently in terms
of pauses and inflections of voice. The direction of speaking went both
ways. When a textbook or research article was being read, I was the
listener. When I was dictating a homework assignment, course or seminar
lecture, or research paper, my reader was the listener. Regardless of who
the speaker was, the listener deduced the exact mathematical context from
the consistent manner in which the pauses and inflections were employed.
As an example of inexact verbiage, the phrase “a slash b plus c” could be
interpreted as either a/(b+c) or (a/b)+c. With experience, I could
understand which was meant (depending on how my reader read it), or my
reader could understand it (depending on how I said it). There are numerous
examples of this type of ambiguity in spoken math. With sufficiently many
samples of how an individual speaks math compared with the correct written
expressions of those spoken samples, an AI algorithm could “learn” how to
interpret your spoken math.
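
As a rough illustration of how such "learning" from samples might work, here is a minimal Python sketch. It assumes, purely for illustration, that a speech front end has already measured two pauses in the phrase "a slash b plus c": the pause after "slash" and the pause after "b". The sample timings and the nearest-neighbor rule are hypothetical choices made for this example. They are not taken from any existing system, and a practical algorithm would need far more data and richer features such as inflection.

```python
from math import dist

# Hypothetical samples from one speaker saying "a slash b plus c".
# Each entry pairs (pause after "slash", pause after "b"), in seconds,
# with the written expression the speaker actually meant.
SAMPLES = [
    ((0.30, 0.05), "a/(b+c)"),  # pause before the denominator, "b plus c" run together
    ((0.28, 0.08), "a/(b+c)"),
    ((0.35, 0.04), "a/(b+c)"),
    ((0.05, 0.30), "(a/b)+c"),  # "a slash b" run together, pause before "plus c"
    ((0.07, 0.33), "(a/b)+c"),
    ((0.04, 0.27), "(a/b)+c"),
]


def interpret(pauses, samples=SAMPLES):
    """Return the written form of the stored sample closest to `pauses`."""
    _, written = min(samples, key=lambda sample: dist(sample[0], pauses))
    return written


print(interpret((0.29, 0.06)))
print(interpret((0.06, 0.31)))
```

The more data/response pairs a speaker supplies, the more reliably a rule of this kind can separate the two readings. Corrections made by the user could simply be appended to the sample list.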
There are cases where a spoken expression bears absolutely no resemblance
to what is written. The instance of this which immediately comes to my mind
is that of the binomial coefficient, a staple in many required
undergraduate courses. The binomial coefficient is represented by a column
of two non-negative integer variables, n and k, with k no greater than n, in which n
is written above k with elongated parentheses surrounding the column formed
by these two variables. Here, the binomial coefficient is defined as
n!/((n-k)! k!), where n! represents the product of integers from 1 to n,
(n-k)! represents the product of integers from 1 to n-k, and k! represents
the product of integers from 1 to k. When we refer to the binomial
coefficient of n and k in speech, we could say “the binomial coefficient of
n and k” or “n choose k” or “n C k”. (The use of the word “choose” here
refers to the fact that there are exactly “n choose k” ways in which a
subset of k objects can be chosen from a set of n objects). The AI
algorithm should contain instructions to recognize any of these three
spoken forms as the column of n and k described above.
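
The idea of mapping several spoken forms to one written form can be sketched in a few lines of Python. The three patterns below are assumptions covering only the phrasings mentioned above, and the LaTeX command \binom is used here as a stand-in for the printed column notation. The function names are hypothetical. A real recognizer would have to handle many more variants, as well as the output of an actual speech engine.

```python
import re
from math import comb, factorial

# Assumed spoken patterns; only the three forms discussed in the text.
PATTERNS = [
    re.compile(r"^the binomial coefficient of (\w+) and (\w+)$"),
    re.compile(r"^(\w+) choose (\w+)$"),
    re.compile(r"^(\w+) c (\w+)$"),
]


def spoken_to_latex(phrase):
    """Map a spoken binomial coefficient to LaTeX, or return None if unrecognized."""
    text = phrase.strip().lower()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            n, k = match.groups()
            return rf"\binom{{{n}}}{{{k}}}"
    return None


def binomial(n, k):
    """Evaluate n!/((n-k)! k!) directly from the definition."""
    return factorial(n) // (factorial(n - k) * factorial(k))


for phrase in ("the binomial coefficient of n and k", "n choose k", "n C k"):
    print(phrase, "->", spoken_to_latex(phrase))

# The definition agrees with Python's built-in binomial function.
print(binomial(5, 2) == comb(5, 2))
```

All three spoken forms produce the same written result, which is the behavior the AI algorithm would need.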
Word processors and software editors are equipped with compilers which
enable them to recognize spelling and syntax errors. When a compiler
detects such an error, it offers the user a range of choices to correct
this error. It also enables the user to instruct the compiler to ignore the
error in this case. In a similar vein, a UEB or Nemeth compiler could be
developed and installed on refreshable Braille displays to aid users with
suggestions for correct code usage.
AI software is not needed if all we want to do is to have a tool which aids
the user in typing correct UEB/Nemeth code. Here, the UEB/Nemeth compiler
is sufficient. However, AI will come into the picture if we are ever to
produce UEB/Nemeth code on our Braille displays directly from speech. In
this case, a UEB/Nemeth compiler is an absolute prerequisite if an AI
algorithm is to be written for UEB/Nemeth code from human speech. If the
produced Braille code is not consistent with what the speaker wants, the
compiler will be the means through which the user communicates the
corrected code. A UEB/Nemeth compiler could serve as the conduit through
which an AI algorithm “learns” the correct AI interpretation of spoken
text.
The typical math document that I, or anyone else, dictates to an AI will
consist of a mixture of UEB and Nemeth output. It would be most desirable
if the AI were smart enough to know when UEB was to be used and when Nemeth
Braille was to be used. Short of this capability, we should have a switch
on our Braille displays to set the AI in UEB mode or in Nemeth mode,
depending on what is needed.
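
The "switch" described above could be approximated in software by spoken cue phrases. The following Python sketch assumes two hypothetical cues, "begin math" and "end math", and simply labels each stretch of dictation with the code that should be used for it. It does not produce any Braille. It only shows the mode-switching logic.

```python
import re


def tag_segments(dictation, start="begin math", end="end math"):
    """Split dictated text into (mode, text) pairs using spoken cue phrases.

    The cue phrases are hypothetical; text outside them is tagged "UEB"
    and text between them is tagged "Nemeth".
    """
    cues = re.compile(rf"\b({re.escape(start)}|{re.escape(end)})\b")
    segments = []
    mode = "UEB"
    # Because the pattern has a capturing group, split() keeps the cues.
    for piece in cues.split(dictation):
        piece = piece.strip()
        if piece == start:
            mode = "Nemeth"
        elif piece == end:
            mode = "UEB"
        elif piece:
            segments.append((mode, piece))
    return segments


for mode, text in tag_segments(
    "the area is begin math pi r squared end math for a circle"
):
    print(mode, "|", text)
```

A smarter AI would infer the mode from the content itself. Explicit cues are the fallback, in the same spirit as a physical switch on the Braille display.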
Just as a word processor still requires the user to have a knowledge of the
rules of English grammar, an AI algorithm would still require its users to
have a command of the UEB and the Nemeth code. Without this knowledge a
user is totally dependent on what the AI recommends, a most unsatisfactory
situation.
When I was doing mathematics in graduate school and at my job, in the
interest of saving time, I developed my own shorthand Nemeth, ignoring the
rules for exact usage according to context. After all, I knew what I was
writing about, so the context was always clear to me. I would then dictate
a math document to my human reader who would write my spoken material into
perfect printed notation. It seems to me that the ideal Nemeth AI algorithm
would work in the same way. As I am reading from my Braille notes, the
algorithm would translate my speech into perfect Nemeth Braille code. Since the
rules of UEB and Nemeth Braille are precise and since the printed math
notations are precise, AI should not be required to translate from Braille
to print, or vice versa. As far as I know, we still do not have software
for reverse Braille to print translation.
If you have taken a number of math courses, you have probably endured the
professor who has lectured by speaking minimally and writing minimally. He
would often say something, then point to this or that item on his
blackboard, and simultaneously say something like “from this (pointing) and
that (pointing), we conclude that …”, or he would write his conclusion that
seemed to have no resemblance to what he had previously written or said.
Oftentimes such antics would leave even the sighted members of the class
befuddled and confused. An AI, possibly installed on our smart phones,
could combine spoken and blackboard materials into a more comprehensible
form that would benefit everyone.
Also, think of the ways in which a math AI could assist in test taking.
Suppose there wasn’t sufficient time to produce a test in Braille, large
print, or spoken form. University disability support services (DSS) offices
hesitate to let us use our own readers. The readers they provide don’t always know
how to read the math content. How much simpler it would be for us and for
the DSS offices if there were a math AI to read test questions to us and
take dictation of our answers if needed.
*Perchance to Dream…*
The problem with reading math or building OCR software for math is that
math is not consistently read linearly. There are times when within a line
you must read vertically (think of subscripts and superscripts, limits of
integration, or binomial coefficients). In our work on automated Nemeth
translation, we have evaded this problem by extending the PreTeXt authoring
language to specify items for Nemeth translation and UEB. It may be
possible to build a neural network capable of parsing a page of printed
math and reconstructing it for tactile or spoken formats.
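
To make the point about non-linear reading concrete, here is a small Python sketch in which a math expression is stored as a tree and not as a left-to-right string. The class names and the spoken wording are hypothetical choices for this example and are not part of PreTeXt or of any existing tool. Once the two-dimensional structure is captured in the tree, a linear output for print (LaTeX here) or for speech can be generated from the same data.

```python
from dataclasses import dataclass


@dataclass
class Sym:
    """A single symbol such as x or 2."""
    name: str


@dataclass
class Sup:
    """A base with a superscript, which is read vertically in print."""
    base: object
    exponent: object


@dataclass
class Frac:
    """A fraction with the numerator printed above the denominator."""
    num: object
    den: object


def to_latex(node):
    """Linearize the tree as LaTeX source."""
    if isinstance(node, Sym):
        return node.name
    if isinstance(node, Sup):
        return f"{to_latex(node.base)}^{{{to_latex(node.exponent)}}}"
    if isinstance(node, Frac):
        return rf"\frac{{{to_latex(node.num)}}}{{{to_latex(node.den)}}}"
    raise TypeError(f"unknown node: {node!r}")


def to_speech(node):
    """Linearize the tree as an unambiguous spoken phrase."""
    if isinstance(node, Sym):
        return node.name
    if isinstance(node, Sup):
        return f"{to_speech(node.base)} to the power {to_speech(node.exponent)}"
    if isinstance(node, Frac):
        return f"the fraction {to_speech(node.num)} over {to_speech(node.den)} end fraction"
    raise TypeError(f"unknown node: {node!r}")


expr = Frac(Sup(Sym("x"), Sym("2")), Sym("y"))
print(to_latex(expr))
print(to_speech(expr))
```

A neural network that parses a printed page would, in effect, have to recover a tree of this kind from the positions of the symbols. A Nemeth translator would be a third function walking the same tree.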
Even if the math AI that I have suggested were to be built, I hope that
such an AI would never dispense with our need for human readers. The value
of personal contact and working relations should never be discounted. The
reason I was so successful in getting classmates to read was that it
afforded us time to study together, learn by asking questions of each
other, and share what we had learned. The use of readers also gave me
experiences that have carried over into daily as well as professional
activities. I developed the confidence to sell myself to potential readers
by explaining how beneficial it would be for both of us. I learned how to
schedule my study time efficiently, how to adjust to the schedules of
others, and how to plan the work I needed my readers to do in a limited
amount of time.
Given the state of AI algorithms and hardware today, the AI that I have
described for translation from human speech to Braille/print is achievable.
But it is unreasonable to expect commercial adaptive technology vendors to
undertake the massive research and development efforts that are needed to
put this kind of AI application together. The number of Nemeth readers is
just a small fraction of UEB readers. What is needed here is a massive
collaborative effort between the organized blind movement, the
universities, the science and math organizations, and the adaptive
technology vendors. Before we can begin this collaborative effort, we, the
organized blind, must have a clear and unified understanding of the AI
products that we want. This article is just the first step in coming to
this understanding. Others will have different ideas that need to be
considered. Once we know exactly what we want, then we will be in a strong
position to promote our ideas, recruit the talent that is needed, and
secure the needed funding. This effort will require much more than the
generous funding that AAF has previously given us. Obviously, the talent we
have gathered around us for automated Nemeth translation will not be
sufficient. Professionals with other skill sets will have to be recruited.
However, we are off to a strong start with the team we currently have in
place. We have established strong working relationships with the academic
and government sectors. We need to extend these relationships.
I don’t have the foresight possessed by futurists on the order of Ray
Kurzweil. But I remain convinced that given the language recognition
possessed by today’s smartphones and smart speakers, what we want is well
within the realm of possibility. I’m not suggesting that my ideas for the
future of AI in STEM are entirely correct, but I hope that this article
will stimulate others into thinking about what AI could do for us and
bringing their ideas to the table. Perhaps the brighter souls among us
could even take part in writing the code for some of these AI applications.
After 60 years of learning and doing math, and watching all of the
technological developments, I find myself on the side of the boosters, at
least in the area of AI applications to help us with STEM. The broad goals
we set now may only be accomplished incrementally. But let us never lose
sight of what we are after. Let the future of AI for us begin, now!
--
*Dr. Al Maneki*
Senior STEM Advisor
NFB Jernigan Institute
443-745-9274
apmaneki at gmail.com