I enjoyed this cartoon from xkcd (via genetic future).
In order to see how quickly FastTree runs for me I need some automated method of timing it. While some programs like phyML report a runtime at the end, FastTree doesn’t seem to. So I searched the web and found bits of perl code to put a script timer together.
I have uploaded the relevant script (posixtime.pl) to my repository website. It seems to work well for me but test it for yourself. Since it is based on POSIX I think it will only run on *nix systems (like Linux and MacOSX) although it can be made to work on Windows too perhaps (see here).
It has some placeholder code that prints out stuff and reports back how long it has taken. All the section between # Script goes here # and # Script ends here # can be deleted and replaced with the appropriate commands. So to run FastTree include a line like this-
system("FastTree -nt alignment_file > tree_file");
The system function is part of core perl, so it should also work on Windows (in e.g. ActivePerl), although I’ve never run perl on a Windows machine myself, so test it first.
I think the above command without the ./ prefix depends on FastTree being installed somewhere on your PATH. This is /usr/bin on my system.
Type the command: which perl
You should probably get: /usr/bin/perl
Move to that directory: cd /usr/bin
Copy the FastTree application to here: sudo cp path_to_application ./
Enter password when asked. If you don’t want to type out the path to the application you can (in OSX) just drag and drop the application into the terminal window after you have typed the sudo cp part, and it will paste in the location of the file you have dropped.
You should then be able to launch FastTree by giving the FastTree command wherever you are, without having to cd and move to the directory containing the application.
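As a rough illustration of the timing idea, here is a minimal sketch. It is not the posixtime.pl script itself: it uses the core Time::HiRes module rather than POSIX, and the sub name time_command and the file name timeit.pl are just names I have made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);   # core module: wall-clock time with sub-second resolution

# Run a shell command and return (elapsed seconds, exit code).
sub time_command {
    my ($command) = @_;
    my $start   = time;
    my $status  = system($command);   # raw wait status, or -1 if it could not start
    my $elapsed = time - $start;
    my $exit    = $status == -1 ? -1 : $status >> 8;
    return ($elapsed, $exit);
}

# Time whatever command is given on the command line, if any.
if (@ARGV) {
    my ($elapsed, $exit) = time_command("@ARGV");
    printf STDERR "Finished in %.2f seconds (exit code %d)\n", $elapsed, $exit;
}
```

Saved as timeit.pl, it could be run as: perl timeit.pl 'FastTree -nt alignment_file > tree_file'. The report goes to STDERR so that it does not end up in the redirected tree file.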
—
I have posted the FastTree application I described in the previous post at my file repository site, in case you don’t want to install developer tools and mess around with malloc errors.
The script I referred to in my last post is actually seqConverter.pl written by Olaf Bininda-Emonds, with a few minor modifications to send the output directly to phyml. I thought I would flag up his site which has a large and very useful collection of perl scripts for phylogenetic data wrangling. These are open-source scripts and I frequently find myself using and modifying these programs. Thanks Olaf!
I’m not a very competent perl programmer. Even writing the word programmer here makes me slightly embarrassed. I do carry out frequent sequence conversions and manipulations with perl scripts I’ve put together though. When I need to run a script many times, I’ve found the most irritating thing is launching the script and pointing it towards the right input file. A much simpler option in this case is to save the script as an application and drop the files onto it to carry out the conversion. I’ve come across two options for doing this (all this is very Mac-centric I’m afraid but I’d be interested to see MS equivalents in the comments).
The first is the open-source program Platypus by Sveinbjorn Thordarson that “can be used to create native, flawlessly integrated Mac OS X applications from interpreted scripts such as shell scripts or Perl and Python programs”. Make sure that the “is droppable” check box is selected. I found it quite straightforward to turn scripts into droppable applications this way. As it says on the site, but it needs some remembering, you will need to modify your script slightly to accept the infile correctly. The basic tutorial page says the following:
Enabling “Is droppable” for an app will modify the property list for the app in question so that it can receive dropped files in the Dock and Finder. These files are then passed on to the script as arguments via @ARGV. However, the first argument to the script ($ARGV[1], $1 etc., depending on your scripting language of choice) is always the path to the application bundle (for example “/Applications/MyPlatypusApp.app”).
Essentially this means that (in perl at least) where your script would normally read its input file from $ARGV[0] right at the start, it should be changed to read from $ARGV[1] before you create your application.
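A small sketch of how a script could cope with this, assuming Platypus behaves as the tutorial quote describes (your version may differ, so check). The sub name dropped_files is made up for the example. It drops a leading argument that looks like an application bundle, so the same script still works when run from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the input files, skipping a leading ".app" bundle path if present.
sub dropped_files {
    my @args = @_;
    shift @args if @args && $args[0] =~ /\.app\/?$/;
    return @args;
}

for my $infile (dropped_files(@ARGV)) {
    print "Processing $infile\n";   # replace with the real conversion
}
```

This also loops over every dropped file, so several files can be dropped at once.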
Another interesting aspect is the ability to bundle in code files referred to in your script. This means, for example, that a script that depends on bioperl needn’t break: just add in the path to the parts of bioperl needed.
The second option is an AppleScript droppable application. I have to admit that I have never written an AppleScript but I came across this post recently from TUAW outlining the “do script” command. AppleScripts can be saved as droplet applications onto which you drop input files. A bit of Googling reveals people using both do script “script.pl” and do shell script “script.pl”. The last seems a bit odd since script.pl is a perl script, not a shell script, but it looks like either will work.
As an example I once created a perl script that took an alignment in a range of formats and converted to a format acceptable to phyml, then ran the program using standard settings of my choice. I have this on my desktop as a droppable application called “runPhyml”. It works very nicely for generating quick trees.
In order to really get information out of building phylogenetic trees (especially large ones) some thought has to be given to how to annotate the tips (OTUs).
The two programs that seem to do this in a powerful way are ARB and Treedyn. I also want to explore Tree-Q vista, which looks promising, but haven’t really had the chance yet. (Has anybody got experience with Tree-Q vista?)
Treedyn is a very good program for editing and annotating phylogenetic trees. Its action can be driven by scripts and it can carry out many sophisticated graphical transformations.
“Many powerful tree editors are now available, but existing tree visualisation tools make little use of meta-information related to the entities under study such as taxonomic descriptions, geographic distribution or gene functions. This meta-information is useful for the analyses of trees and their publications, but can hardly be encoded within the tree itself (the so-called newick format). Consequently, a tedious manual analysis and post-processing of the tree’s images is required. Particularly with large trees, multiple trees and multiple meta-information variables. TreeDyn links unique leaf labels to lists of variables/values pairs of annotations (meta-information), independently of the tree topologies, remaining fully compatible with the basic newick format.” [www.treedyn.org]
What information can it be labeled with? The best thing would be to parse the information out of the original GenBank files of the sequences that created the tree. Treedyn allows conditional annotation of OTUs by adding to or replacing the existing names. This can be done from an annotation file where the information is held as “key{value}” pairs, such as accession_number{AY123456}, on a line following the unique name from the newick file.
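To make the layout concrete, here is a minimal sketch that builds one annotation line. The sub name annotation_line and the genus and species values are made up for illustration, and the spacing (tab-separated fields, a space before each brace) is simply one layout that keeps the key{value} idea, so check it against the Treedyn documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one annotation line: the unique tip name followed by key {value} pairs.
# Pairs are passed as a flat list (key, value, key, value ...) to keep their order.
sub annotation_line {
    my ($name, @pairs) = @_;
    my @fields;
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        die "key '$key' must not contain spaces\n" if $key =~ /\s/;
        push @fields, sprintf('%s {%s}', $key, defined $value ? $value : '');
    }
    return join(" \t", $name, @fields) . "\n";
}

# Made-up example values for a tip named after its accession number.
print annotation_line('AY123456', genus => 'Homo', species => 'sapiens');
```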
I wrote a little perl script to do this. This could be done much better using BioPerl. My perl skills are very basic, but it works.
#!/usr/bin/perl
# Creates an annotation file for treedyn from a file containing multiple
# Genbank files. Annotations are of the form key{value}. Keys must not
# contain spaces.
# usage: genbank2treedyn.pl infile.gb > outfile.tlf
use strict;
use warnings;

$/ = "\n//"; # break up records on the genbank // terminator line
while (my $record = <>) {
    # matches ACCESSION line; anything without one (e.g. trailing blank text) is skipped
    next unless $record =~ /ACCESSION\s+(\S+)/;
    my $accession = $1;
    my ($author) = $record =~ /AUTHORS\s+([^,\s]+),/; # matches first author surname
    my ($genus, $species) = $record =~ /organism="\s*(\S+)\s+([^\s"]+)/; # matches genus, species
    my ($isolate) = $record =~ /isolate="([^"]*)"/; # matches isolate qualifier
    # fields missing from a record are left empty rather than reusing the previous match
    $_ //= '' for $author, $genus, $species, $isolate;
    print "$accession \tgenus {$genus} \tspecies {$species} \taccession {$accession} \tisolate {$isolate} \tauthor {$author}\n";
}
exit;
In addition to tip names, Treedyn is able to annotate OTUs with graphical character data; there are some nice examples on the website.
Of course I also have some grumbles about Treedyn. It doesn’t work properly on Macs, never has. The PC version though seems very stable. The interface is an absolute nightmare. One of the most illogical and confusing I have ever seen. But you can learn to survive it with a little patience. Despite all this the actual functions are well thought out and powerful, even if applying them is difficult sometimes.
The best thing about Treedyn in my opinion is that it is open source.



