The Statistics of Gapped Sequence Alignments
John L. Spouge
National Institutes of Health
National Library of Medicine
National Center for Biotechnology Information
ABSTRACT:
Sequence comparison is indispensable to modern molecular biology. For
example, biologists use the BLAST program more than once a second over
the web to compare their query sequences to databases. If a query
matches a database sequence of known function with a small p-value, the
biological function of the query can be inferred. At present, no on-line
method can compute p-values to the accuracy that BLAST requires, so
BLAST restricts sequence comparisons to scoring systems whose
statistical parameters have been pre-computed, to the detriment of some
database retrieval applications. Over the past two years, we have
reduced the simulation time required to estimate BLAST statistical
parameters from about two days to less than one second, and we have
written prototype code for on-line estimation. Our mathematical methods
also raise many interesting conjectures.
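The abstract does not name the statistical parameters involved. In the standard Karlin-Altschul framework that BLAST uses, an alignment score S between a query of length m and a database of length n is converted to an E-value E = K m n exp(-λS) and a p-value 1 - exp(-E). For ungapped alignments, λ is the positive root of Σ p_a p_b exp(λ s_ab) = 1. For gapped alignments no such equation is known, so λ and K must be estimated by simulation, which is the costly step described above. The following is a minimal Python sketch of these textbook formulas, not the author's estimation method. The helper names (`ungapped_lambda`, `evalue`, `pvalue`) are hypothetical, the value K = 0.1 is an illustrative placeholder, and the sketch ignores the finite-length (edge-effect) corrections that BLAST applies.

```python
import math

# Hypothetical helper names; a sketch of standard Karlin-Altschul formulas,
# not the author's estimation method.

def ungapped_lambda(scores, freqs, tol=1e-12):
    """Return the positive root lam of sum_{a,b} p_a p_b exp(lam * s_ab) = 1.

    scores: dict mapping residue pairs (a, b) to substitution scores.
    freqs:  dict mapping residues to background frequencies (summing to 1).
    Valid only for ungapped alignments; gapped lambda has no such equation.
    """
    pairs = [(a, b) for a in freqs for b in freqs]
    if sum(freqs[a] * freqs[b] * scores[(a, b)] for a, b in pairs) >= 0:
        raise ValueError("expected score must be negative")
    if max(scores[p] for p in pairs) <= 0:
        raise ValueError("at least one score must be positive")

    def f(lam):
        return sum(freqs[a] * freqs[b] * math.exp(lam * scores[(a, b)])
                   for a, b in pairs) - 1.0

    # f(0) = 0 and f'(0) < 0, so f dips below 0 and then grows without
    # bound; double hi until it lies past the positive root.
    hi = 1.0
    while f(hi) <= 0:
        hi *= 2.0
    lo = 0.0
    # Bisection: keep f(lo) <= 0 and f(hi) > 0.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def evalue(score, m, n, lam, k):
    """Expected number of chance alignments scoring >= score: K m n e^(-lam S)."""
    return k * m * n * math.exp(-lam * score)


def pvalue(score, m, n, lam, k):
    """P-value 1 - exp(-E), computed stably with expm1."""
    return -math.expm1(-evalue(score, m, n, lam, k))


# Usage: +1/-1 DNA scoring with uniform base frequencies.
dna = "ACGT"
freqs = {c: 0.25 for c in dna}
scores = {(a, b): (1 if a == b else -1) for a in dna for b in dna}
lam = ungapped_lambda(scores, freqs)  # analytically ln 3, about 1.0986
K = 0.1  # illustrative placeholder, not a computed value
print(lam, evalue(10, 1000, 1000, lam, K), pvalue(10, 1000, 1000, lam, K))
```

For this scoring system the root can be checked by hand: 0.25 e^λ + 0.75 e^(-λ) = 1 gives e^λ = 3. Bisection is used because the left-hand side is convex in λ with a single positive root, so a bracket is easy to maintain; it offers no help in the gapped case, where the point of the work summarized above is to make the simulation-based estimates fast enough for on-line use.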