During my undergraduate studies, a professor I greatly admired
(perhaps feared would be a better word) once told me that optimization
theory was perhaps one of the most important advances
of this century. Specifically, he pointed to optimality theory,
developed simultaneously by Richard Bellman ("Principle of
Optimality", 1958) and Lev Pontryagin ("The Maximum Principle", 1959)
as a fundamental breakthrough in optimization theory. It is hard to
underemphasize the impact on society of these advances. Every time
you make a telephone call, travel reservation, stock transaction, or
purchase at the supermarket, you are benefiting from advances
in optimization theory. Modern manufacturing processes are dominated
by optimization engines today (what is the best way to cut a tree so
that it yields the most lumber?). Problems such as speech recognition
would be far beyond the capabilities of even today's most powerful
computers without innovations such as dynamic programming (DP).
The two featured papers in this issue deal with the problem of search
in speech recognition. Search is perhaps the single most important
part of a speech recognizer today. The search algorithms in use today
are rooted in dynamic programming (DP). However, we must add many
enhancements to these algorithms beyond classic DP to even approach a
system capable of performing the tasks embodied in commercial products
such as Dragon Systems' Naturally Speaking or IBM's ViaVoice. A
reasonable analogy might be DP is to search what wheels are to an
automobile. There is a lot more to an automobile beyond its spinning
wheels.
Search is an essential part of every day life. Humans perform the
complex operations required for efficient searching effortlessly. We
find optimal, or near optimal solutions, with minimal amounts of
conscious effort. For example, while writing this editorial, I had
the occasion to be traveling through an airport I use frequently
during a busy period. I was reminded how experienced travelers soon
learn all the tricks required to find the shortest path through an
airport. This involves large amounts of prior information (layout of
the airport, time of arrival, efficiency of various airport services,
etc.) and lots of real-time computing (weaving your way through
rush-hour crowds as you exit the airplane).
In the early days of speech recognition, we used essentially a
one-level DP algorithm. This was easy to implement - the core engine
amounted to about 5 lines of code. Today's search engines are
monuments to data structure programming. We try to empower our
computer programs with the same type of human-like intelligence in
search problems, but often fall far short of the mark. The essence of
search is to make use of as many knowledge sources as early as
possible in the search process to limit the search space.
It is easy dismiss this topic as simply an implementation detail. The
words from a famous popular music song resonate with the challenge of
the search problem:
"I have climbed the highest mountains... but I still haven't found what
I'm looking for." (U2, 1987)
Algorithms systematically explore local optima hoping that one of
these will survive to be the global optimum. The millions and millions
of alternatives we search are usually dead-ends. Fortunately, for
speech problems, suboptimal solutions work extremely well, especially
since we rarely find the globally best solution. A significant body of
knowledge has been developed in the last ten years about how to speed
up the search process without introducing search errors. Most of this
information has yet to appear in modern enginering or computer science
textbooks. Hence, the motivation behind this issue is to document these
techniques in a tutorial fashion, and to help students understand the
theory and implementation of these algorithms.
The first paper in this issue, by H. Ney and S. Ortmanns, represents
an excellent overview of the topic, and provides a nice theoretical
framework for this problem. Hermann Ney has done seminal work in this
area and is widely regarded as one of the leading experts in this
field. The second paper, by Deshmukh, et al, describes a specific
approach to search used in a public domain speech recognition
system. This software can be downloaded via the world wide web, and
reviewed along with the paper. The system described is part of an
ambitious program to disseminate speech recognition technology and
promote its research and development.
The search algorithms presented in this issue are rapidly finding
their ways into many non-speech applications. For example, maximum
likelihood decoding, long the staple of digital communications systems
and error control coding, are rapidly moving to beam search strategies
as code lengths become extremely large and the associated memory
requirements exceed resources. It is our hope that this issue will
shed light on a topic not well-known to people outside the speech
recognition community, and motivate signal processing researchers to
apply this technology to other research areas.
Joe Picone
picone@isip.msstate.edu