1) Download sprot40.dat (or the newest release) from the ExPASy server.
2) Run the buildindex.pl script to build a protein.dat file.
3) Run the extract.pl script to extract the sequences of proteins matching a desired regular expression into a directory.
   usage: ./extract.pl regexp (directory)
4) Calculate distances from sequence files matching fileglob: ./distance.pl fileglob
Warning: typing something like ./extract.pl * can use up a disk quota rather quickly.
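
To get a feel for how much a pattern would pull out before running step 3, a rough count of matching entries can help. The Python sketch below is not part of these scripts and is only an illustration. It assumes the data file follows the Swiss-Prot flat-file layout (a two-letter line code, data starting in column 6, each entry terminated by a // line), and it assumes, purely for illustration, that the pattern is tested against the DE (description) lines; extract.pl may match on different fields. The names iter_entries and count_matching_entries are hypothetical.

```python
import re
from typing import Iterable, Iterator, List


def iter_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each entry as a list of lines; an entry ends at a '//' line."""
    entry: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)
    if entry:
        # Tolerate a final entry that lacks its '//' terminator.
        yield entry


def count_matching_entries(lines: Iterable[str], pattern: str) -> int:
    """Count entries that have at least one DE line matching the pattern."""
    regex = re.compile(pattern)
    count = 0
    for entry in iter_entries(lines):
        # ASSUMPTION: only DE (description) lines are searched here;
        # the real extract.pl may look at other fields.
        if any(l.startswith("DE") and regex.search(l[5:]) for l in entry):
            count += 1
    return count
```

For example, passing an open handle on sprot40.dat (opened with encoding="latin-1" to be safe with older releases) and a pattern string to count_matching_entries gives a number of entries; a very large count suggests narrowing the pattern before extracting.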