|
|
|
|
|
|
|
|
|
|
Great East Software | 2010-09-04 | |||
|
Aho and Corasick |
|||||
| Applications Benchmarks Contact Home |
Aho and Corasick first developed a system for construction of finite state automata (fsa) for matching parallel patterns in text (A.V. Aho and M.J. Corasick. Fast pattern matching: an aid to bibliographic search. CACM, 18(6):333–340, June 1975). Since their system was initially implemented in Fortran, a language primarily used for mathematical programming, they suggested representing each state by a number. They first run each pattern through a function which creates a partial state machine. This first step is straightforward and intuitive. However, due to the severe restrictions imposed by the use of Fortran as their implementation language, their further steps involve a quite complicated (and, to one not familiar with Fortran, counterintuitive) set of manipulations of the states, transitions (and failure states) already created in order to complete the construction of a non-deterministic fsa. They then describe a further function for construction of a deterministic fsa from non-deterministic one. Once again, the logic is extremely complicated. Since the publication of their article, C programmers have followed their directions, implementing them in the C language, and replacing the arrays of numbers with linked lists of data structures. A prominent example of this appeared in an article in Dr. Dobbs Journal by Ian Ashdown. “Parallel pattern matching and fgrep ”, (Dr. Dobbs J., 10(9):46-67, 1985). Since then others have devised even more complex algorithms for parallel pattern matching, notably Commentz-Walter, (“A string matching algorithm fast on the average”, In Proc. ICALP'79, number 6) whose algorithm is optimized for a few long patterns, but performs poorly for many short patterns. | ||||
|
|
|
|
Sealevel SiteCare™
|
|