
Modifying the one state ground process

We modify the computation of the array $P$ to compute a related array $Q$.

We consider further implications of the reversibility of the ground process. The parameters $l$ and $m$ are a priori two degrees of freedom in the ground process. If the lengths of ${\bf A}$ and ${\bf B}$ are informative, setting $l$ and $m$ to maximize the probability of observing sequences of the given lengths gives a relationship between $l$ and $m$. We choose $m$ and derive $l$.

If the sequences ${\bf A}$ and ${\bf B}$ are subsequences of very long genomes, the lengths of ${\bf A}$ and ${\bf B}$ may be artifacts of truncation. We express this in the ground process by $l = m$, making insertions and deletions equally likely. In this case, extension of ${\bf A}$ is a zero-information event, expressed by $\alpha = 1$. Other transition probabilities are computed in terms of

\begin{displaymath}B = l / (1 + l).\end{displaymath}
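
As a quick check on this definition, using nothing beyond the formula above and assuming $l \ge 0$, the complement of $B$ and the inverse relation are

\begin{displaymath}1 - B = \frac{1}{1 + l}, \qquad l = \frac{B}{1 - B},\end{displaymath}

so $B$ increases from $0$ toward $1$ as $l$ grows, and $l = 1$ gives $B = 1/2$.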

Gaps at the ends of alignments could be artifacts of truncation. We model this by replacing $B$ by $\alpha $ at the edges of the $P$ array. This models extending the sequences in each direction infinitely with bases selected from distribution $\pi.$

We normalize by dividing by the probability of observing sequences ${\bf A}$ and ${\bf B}$ separately given $l,$ $m,$ and $\pi$, and by omitting the factor $1 - B$ for the initial state. We report separately the log likelihood of observing the given sequences,

\begin{displaymath}\log \Pi_{\bf A}+ \log \Pi_{\bf B}.\end{displaymath}

Assembling the above insights, we compute

\begin{eqnarray*}
Q(0,0) & = & 1, \\
Q(i,0) & = & Q(i-1, 0), \\
Q(i,j) & = & Q...
... \pi_{B[l_{\bf B}]}) - \\
& & Q( i-1, l_{{\bf B}}-1) \cdot E^2.
\end{eqnarray*}



We do not multiply the final entry by $1 - \alpha $, because we are not asserting the sequences end.

The $Q$ array provides the same kind of information as the $P$ array. The essential difference is that the gap cost for leading and trailing gaps is canceled. It is possible to treat gaps differently along each edge of the array.
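
As an illustration of this cancellation, reading only the first two lines of the recursion above, the boundary column unrolls to

\begin{displaymath}Q(i,0) = Q(i-1,0) = \cdots = Q(0,0) = 1\end{displaymath}

for every $i$ covered by the second line. Under this reading, each step along that edge multiplies by $1$, so a leading gap of any length contributes no cost there.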


Lawren Smithline 2003-11-13