Kolmogorov Complexity Based Distance Phylogenies

John Scoville

 

Because biological sequences are strings over a finite alphabet, information theory plays a natural and central role in bioinformatics. The purpose of this project is to study how the theory of Kolmogorov complexity can be applied to determining protein phylogeny. After a short mathematical exposition and a word about algorithms for producing phylogenies from sequence data, a new scheme for generating protein-protein distances is proposed. This scheme is based on universal data compression algorithms and produces distance matrices without requiring a multiple sequence alignment.
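As a rough illustration of the general idea only, the sketch below builds a distance matrix from compressed sizes, with no alignment step. It is not the scheme proposed in this project. It assumes zlib as the compressor and uses the normalized compression distance, a well-known compression-based measure swapped in here for illustration. The function names and the toy sequences are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input.

    Assumption: zlib stands in for a universal compressor, so this value is
    only a crude, computable upper-bound proxy for Kolmogorov complexity.
    """
    return len(zlib.compress(data, 9))


def compression_distance(x: str, y: str) -> float:
    """Normalized compression distance between two sequences.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Illustrative choice, not the
    author's formula.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(sequences: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Pairwise distances for named sequences, with no alignment step.

    Each pair is compressed once in a fixed order and the value is mirrored,
    so the returned matrix is symmetric with a zero diagonal by construction.
    """
    names = list(sequences)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = compression_distance(sequences[names[i]], sequences[names[j]])
            matrix[i][j] = matrix[j][i] = d
    return names, matrix


# Toy inputs (made up for illustration, not real protein records).
toy = {
    "seq_a": "ACDEFGHIKLMNPQRSTVWY" * 20,
    "seq_b": "ACDEFGHIKLMNPQRSTVWY" * 19 + "ACDEFGHIKLMNPQRSTVWA",
    "seq_c": "YWVTSRQPNMLKIHGFEDCA" * 20,
}
names, matrix = distance_matrix(toy)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{d:.3f}" for d in row))
```

A matrix of this kind can be passed to any standard distance-based tree-building method. Real proteins are short relative to a general-purpose compressor's overhead, so the values from this sketch are noisy and should not be read as results.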

 

Entropy and Information

Kolmogorov Complexity and Solomonoff Induction

Phylogenies, Parsimony, and Distance Methods

The Algorithm

Interfacing Swiss-Prot

Summary of Results

Bibliography