
"The idea for BLAST was very simple. So simple, most people did not believe it would be very effective." Webb Miller, professor of computer science and engineering at Penn State, is talking about the Basic Local Alignment Search Tool, the subject of a 1990 paper he co-authored with Stephen Altschul, Warren Gish, and David Lipman, from the National Center for Biotechnology Information, and Eugene Myers, of the University of Arizona. BLAST is a computer program that compares a protein's amino-acid sequence with a database of all known proteins.
BLAST works in three steps, Miller explains. "First, it finds pairs of short regions, one region from each of the two sequences being compared, that are exactly the same. Second, for each of the pairs found in step one, it determines whether these short matching regions lie in longer regions that match even if insertions or deletions in the regions are not allowed. Third, for each of the longer matching regions in step two, it determines whether the matching regions are alike in longer regions that match when insertions and deletions are allowed.
"The reason for having so many steps is that step one is much faster than step two, and step two is much faster than step three," Miller adds. "The idea is to do a fast step that eliminates most of the possible matches and then apply the slower step to relatively few cases."
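The three steps Miller describes can be sketched in a few lines of Python. This is a simplified illustration, not the code of the BLAST program itself: the function names, the scoring values (+1 for a match, -1 for a mismatch, -2 for a gap), the word length `k`, and the cutoffs `x_drop` and `min_ungapped` are all assumptions chosen for readability. For brevity, step three here runs a standard Smith-Waterman local alignment over the whole pair of sequences once any seed survives step two, rather than working region by region.

```python
def find_seeds(a, b, k=3):
    """Step 1: find pairs (i, j) where a[i:i+k] and b[j:j+k] are identical."""
    index = {}
    for j in range(len(b) - k + 1):
        index.setdefault(b[j:j + k], []).append(j)
    return [(i, j)
            for i in range(len(a) - k + 1)
            for j in index.get(a[i:i + k], [])]


def extend_ungapped(a, b, i, j, k, x_drop=2):
    """Step 2: grow a seed in both directions with no insertions or deletions.

    Mismatches are tolerated; extension stops once the running score falls
    more than x_drop below the best score seen in that direction.
    """
    def walk(pa, pb, step):
        score = best = 0
        while 0 <= pa < len(a) and 0 <= pb < len(b) and best - score <= x_drop:
            score += 1 if a[pa] == b[pb] else -1
            best = max(best, score)
            pa += step
            pb += step
        return best

    return k + walk(i + k, j + k, 1) + walk(i - 1, j - 1, -1)


def gapped_score(a, b, match=1, mismatch=-1, gap=-2):
    """Step 3: best local alignment score with insertions and deletions
    allowed (Smith-Waterman, keeping only one row of the table at a time)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            s = max(0,
                    prev[j - 1] + (match if ca == cb else mismatch),
                    prev[j] + gap,
                    cur[j - 1] + gap)
            cur.append(s)
            best = max(best, s)
        prev = cur
    return best


def search(query, subject, k=3, min_ungapped=5):
    """Run the fast steps first; pay for the slow step only if they pass."""
    seeds = find_seeds(query, subject, k)
    if not seeds:
        return 0
    best = max(extend_ungapped(query, subject, i, j, k) for i, j in seeds)
    if best < min_ungapped:
        return 0
    return gapped_score(query, subject)
```

Calling `search("MKTAYIAKQR", "MKTAYLAKQR")` passes all three filters, while `search("AAAA", "CCCC")` is rejected at step one without ever reaching the slower alignment, which is the point of the layered design.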
The program surprised even its creators. It was both fast and accurate. You could quickly determine the function of a new amino-acid sequence by searching for matching sequences that had already been investigated. BLAST was so successful, in fact, that the article in the Journal of Molecular Biology became the most cited paper of the decade, according to the Institute for Scientific Information. "The number of citations a paper receives reflects its impact on science," says Miller; with over 100,000 citations in ten years, BLAST has played a critical role in DNA research.
Miller hopes his latest program, PIPMaker, published in the April 2000 issue of Genome Research, will have a similar impact. BLAST can compare amino-acid sequences thousands of letters long; PIPMaker can handle sequences millions of letters long. It can compare the complete genomes of the human and the mouse. "When I charted my goal in 1990," Miller says, "I was aware that my prediction for completion of the mouse sequence, 2008, coincided with me reaching retirement age. If the prediction had been accurate, I could finish my project and then turn my attention to some hobby. As it turns out, I'll soon need to identify another goal, because the completion of the human and mouse sequences is way ahead of schedule."
Kristin McKee
Webb Miller, Ph.D., is professor of computer science and engineering in the College of Science, 326a Pond Laboratory, University Park, PA 16802; 814-865-4551; wcm2@psu.edu.