Mar 28 2014

We have two jobs open at the moment in the Hull Evolutionary Genetics group @EvoHull. Both are, I think, quite exciting: they are not your standard postdoc positions, and the group is looking forward to getting two new colleagues.

1-Year Lectureship in Evolutionary Biology

This is maternity cover for Dr Domino Joyce. You will be covering teaching in evolution, ecology, genetics and related subjects. All the teaching is already prepared, but you can modify and improve it as much as you wish. You will be strongly encouraged to be part of the dynamic EvoHull group, which has regular lab meetings, journal clubs, workshops and the like. It’s a really fun place to work, and you could gain great experience, not just in university teaching but also in research, and forge new collaborations. This position could really improve your CV when applying for permanent lectureships! Feel free to discuss the position with Domino Joyce or me. Closing date 10th April 2014. Apply here:

2-Year Bioinformatics Research Fellow in Evolutionary and Environmental Genomics

This is an exciting new Research Fellow position for a bioinformatician to work with staff in Evolutionary and Environmental Genomics. We are looking to work with a bioinformatics colleague and scientist; this is not a technical post. We have quite a number of projects, most already with data, on which you could take the lead. We would also welcome the development of new projects in collaboration with staff in the group. We anticipate that for the right candidate this could be a very productive fellowship in terms of publications and collaborations. We know that there are a lot of positions open for bioinformaticians at the moment, but what sets this opportunity apart is that it’s a fellowship, not a technical position. You will be treated as a colleague, get to choose from a range of projects, build research collaborations, and develop your own interests alongside the core projects. This is a great position for someone who has existing genomic bioinformatics skills, is a first-rate scientist, and likes writing lots of papers. Please feel free to discuss the position with me. Closing date Sunday 24th April 2014. Job advert here, apply here:

Other positions

We regularly have postdoc positions to advertise, but if you would like to be pro-active we would love to hear from you. Have a look at the staff on the website and get in touch. Several of us have projects that you could adapt to your own tastes. Our department has a great track record of really supporting fellows (several of whom have gone on to permanent positions) so if you would like to apply for an independent fellowship to work here, make contact and we can help you to develop it (and help you through the bureaucracy too).



School of Biological, Biomedical and Environmental Sciences, University of Hull, UK

Hull named in Sunday Times ‘best cities’ list :)

EvoHull group website

Follow @EvoHull on Twitter

Jul 17 2013
Godfrey Hewitt

Godfrey Hewitt in 2001 after examining my first ever PhD student

10 JANUARY 1940 – 18 FEBRUARY 2013

I was asked to write a piece about my PhD supervisor Godfrey Hewitt for the UK Genetics Society magazine, and have reproduced a version here. I’d been putting off writing about Godfrey since he died in February, making excuses to myself, so a big thank you to the editor Manuela Marescotti for prompting me to just sit down to type. 

Godfrey Hewitt was an outstanding researcher, mentor, teacher, and professor of evolutionary biology at the University of East Anglia. Godfrey was an excellent geneticist who championed the field and promoted the incorporation of molecular genetics into diverse biological fields throughout his distinguished career. A probably incomplete list of the disciplines in which he applied evolutionary genetics might include speciation, phylogeography, hybridization, phylogenetics, molecular evolution, cytology, ancient DNA, conservation, pest biology, animal domestication, island biogeography, population genetics and molecular ecology. It is hard to overstate how influential Godfrey was in several of these areas. He is very highly cited (by some metrics one of the world’s most influential ‘ecologists’), received many awards, and had several conferences organized in his honour. Perhaps more importantly, though, along with his many collaborators he synthesized a change in scientific worldview for those working in the areas of phylogeography, speciation and Quaternary biology.

Born in Worcester, and always proudly associating with the city, Godfrey chose in the late 1950s to become an undergraduate at the University of Birmingham. This decision was made largely due to the department’s expertise in genetics, and Godfrey later carried out his PhD research there with Kenneth Mather, John Jinks and Bernard John. It was genetics that he initially identified as both personally fascinating and of increasing importance in biology, a view that he maintained throughout his career and which would be hard for anyone to argue with today. Something he proudly recognized, and often mentioned, was the academic rigour of genetics-based science compared to some other disciplines in biology, and scientific rigour was an important component of his own research.

Godfrey was far from a one-dimensional character, and conversations with him about almost anything would soon spiral off in completely different and fascinating directions. This was sometimes disconcerting, especially for students first meeting him at conferences: although he was always very friendly, you were quickly far from the topics on which you might have rehearsed speaking to the great man. In addition to broad scientific knowledge, he had strong, sparky and often provocative views on history, geography, human civilizations, current affairs and sport. In conversation on many science topics I often found myself wondering, slightly bemused, how on earth he knew anything about this specific and obscure area. He often didn’t, but like all great scientists he could incisively follow the logic (or lack of it) of an argument without prior knowledge. This excellent foundation of logic and scientific rigour was something that he imparted to the very many scientists who passed through his lab. As a PhD student in Godfrey’s group I learned to think like a scientist, and although I owe him many things, this is perhaps the most valuable.

Godfrey Hewitt was one of the most intelligent people that I have ever met. This may surprise some who have spoken with him, as his bonhomie and down-to-earth common sense were a million miles away from the quirky boffin-like ‘intelligence’ with which the popular media caricatures outstanding scientists. Godfrey, though, was not one for intellectual showmanship; he was for getting things done, and the ability to get truly important and complex problems solved is as good a definition of intelligence as I have found. This was Godfrey’s real talent. He could see the wood for the trees, the wood in all its beautiful complexity, the patterns by which the wood had come to have its position, composition and structure, and the relevance of this for other biological systems. He was interested in, but did not obsess over, small areas of methodological or theoretical advance, preferring instead to collaborate productively with those who were experts. This approach and vision were the basis for many of the significant advances that he synthesized.

Godfrey worked extensively with the journal Molecular Ecology, including a period as senior editor. Colleagues speak of the huge amount of time he freely gave, reflected not only in the handling of prodigious numbers of manuscripts but also in advice and discussion with authors. His generosity with his time was a central part of his character, and it extended beyond journal activities: it was given equally in person to those who approached him at meetings, came to his lab, or just happened to work in the same building.

Much has been written in tribute about his exceptional mentorship of students and postdocs, for which he won a Nature lifetime achievement award. He gave personal support and scientific mentorship naturally and spontaneously, which is a topic frequently returned to by those who worked with him. It is not contradictory to say that although almost all remember Godfrey fondly he could also be very tough. He did not tolerate foolishness, selfishness, or inactivity, and would be very direct with those who disappointed him. This toughness has left a positive mark, still subconsciously setting the bar very high for many of his students and postdocs, even though their enduring memory may still be his fatherly support. Very many of his former lab members have themselves gone on to academic positions worldwide and his scientific genealogy is truly impressive.

Godfrey died in February 2013 after a stubborn battle with cancer that had lasted a number of years. Most will remember him for his exceptional scientific legacy, although for those who knew him this impressive body of work will be eclipsed by his generous humanity.

Dr Dave Lunt, The University of Hull, June 2013

Godfrey Hewitt Wikipedia page

UEA tribute page

Lewis Spurgin’s excellent blog post

Heredity tribute

Molecular Ecology tribute 2013

Godfrey Hewitt — Recipient of 2005 Molecular Ecology Prize

Telegraph newspaper obituary

Mar 16 2013

In today’s Guardian newspaper geneticist Steve Jones has a short column replying to a 7-year-old child who had asked “Will humans evolve into a new species?”. Jones is known in the UK as the media’s favourite geneticist and evolutionary biologist; he is a frequent guest on broadcast shows and a contributor to print media. Unfortunately, although very polished and far from incompetent, he really isn’t very good with the details. He seems to be a self-confident man and often promotes his personal (not very mainstream) views at the expense of what evolutionary geneticists in general think. I don’t like this much, especially when the outlets he does it in are looking for science information as currently understood rather than any one person’s views.

Replying to the 7-year-old today, he first talked about how the speciation process is driven primarily by natural selection (I’m not going to address that in this post, though many would be uncomfortable with that idea too). In the second part of the column he goes on to trot out his view that evolution has stopped for humans. I’m not going to pick apart this silly idea here, as many others have already done so; I really just want to encourage him to publish it as soon as possible. I haven’t found any academic paper in which he puts forward this view, though he has been talking about it in the media for approximately 20 years. If this idea were true it would be important, very important, and very interesting. I would love to read that paper. He should gather his evidence and publish it as soon as possible in a peer-reviewed open access scientific journal. Or else shut up.

Some other scientists’ views on Steve Jones’ ideas:

Human evolution stopping? Wrong, wrong, wrong
No Virginia, evolution isn’t ending
Evolution, why it still happens (in pictures)
Steven Jones is being silly
Not the end of evolution again!
Some comments on Steve Jones and human evolution

Mar 10 2013

I’ve been thinking about sustainable and accessible archiving of bioinformatics software. I’m pretty scandalized at the current state of affairs, and have complained about it before. I thought I’d post some links to other people’s ideas and talk a bit about the situation and the action that is needed right now.

Casey Bergman wrote an excellent blog post (read the comments too) and created the BioinformaticsArchive on GitHub. There is a Storify of tweets on this topic.

Hilmar Lapp posted on G+ about the similarity between bioinformatics software persistence and the DataDryad archiving policy implemented by a collection of evolutionary biology journals. That policy change is described in a DataDryad blog post, and the policies themselves are available with links to the journal editorials.

The journal Computers & Geosciences has a code archiving policy and provides author instructions (PDF) for uploading code when the paper is accepted.

So this is all very nice, and many people seem to agree it’s important, but what is actually happening? What can be done? Well, Casey has led the way with action rather than just words by forking public GitHub repositories mentioned in article abstracts to BioinformaticsArchive. I really support this, but we can’t rely on Casey to manage all this indefinitely; he has (aspirations) to have a life too!
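The first step of Casey’s approach, spotting GitHub repositories mentioned in abstracts, is easy to automate. Below is a minimal Python sketch, assuming abstracts are available as plain text; the function name `extract_github_repos` and the example abstract are hypothetical, and the forking itself (which would go through GitHub’s API or web interface) is not shown.

```python
import re

# Matches github.com/<owner>/<repo>. Owner names use letters, digits and hyphens;
# repository names may also contain underscores and dots.
GITHUB_REPO = re.compile(r"github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9_.-]+)", re.IGNORECASE)


def extract_github_repos(abstract: str) -> list[str]:
    """Return unique 'owner/repo' strings found in an abstract, in order of first appearance."""
    seen: list[str] = []
    for owner, repo in GITHUB_REPO.findall(abstract):
        repo = repo.rstrip(".")               # drop a sentence-final full stop
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]       # normalise clone URLs to the repo name
        slug = f"{owner}/{repo}"
        if repo and slug not in seen:
            seen.append(slug)
    return seen


# Hypothetical abstract text, for illustration only.
abstract = ("Availability: the source code is at https://github.com/example-lab/seqtool. "
            "A mirror is at https://github.com/example-lab/seqtool.git and "
            "plots use github.com/other/plotkit.")
print(extract_github_repos(abstract))  # ['example-lab/seqtool', 'other/plotkit']
```

A regular expression like this will miss repositories given only as shortened links or hosted elsewhere, so it is best treated as a first pass rather than a complete harvester.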

What I would like to see

My thoughts aren’t very novel; others have put forward many of these ideas:

1. A publisher driven version of the Bioinformatics Archive

I would like to see bioinformatics journals take the lead on this: not just recommending but actually enforcing software archiving, just as they enforce submission of sequence data to GenBank. A snapshot at the time of publication is the minimum required. Even in cases where the code is not submitted (bad), an archive of the program binary is needed so that it can actually be found and used later. Hosting on authors’ websites just isn’t good enough. There are good studies of how frequently URLs cited in the biomedical literature decay with time (17238638), and the same is certainly true of links to software. Use of the standard code repositories is what we should expect from authors, just as we expect submission of sequence data to a standard repository rather than hosting on the authors’ website.

I think there is great merit in using a public GitHub repository owned by a consortium of publishers and maybe also academic community representatives. Discuss. An advantage of using a version-controlled host like GitHub is that it would apply not-too-subtle pressure to host the code rather than just the binary.

2. Redundancy to ensure persistence in the worst case scenario

Archive persistence and preventing deletion is a topic that needs careful consideration. Casey discusses this extensively; authors must be prevented from deleting the archive either intentionally or accidentally. If the public repository was owned by the journals’ “Bioinformatics Software Archiving Consortium” (I just made up this consortium, unfortunately it doesn’t exist) then authors could not delete the repository. Sure they could delete their own repository, but the fork at the community GitHub would remain. It is the permanent community fork that must be referenced in the manuscript, though a link to the authors’ perhaps more up to date code repository could be included in the archived publication snapshot via a wiki page, or README document.

Perhaps this archive could be mirrored to BitBucket or similar for added redundancy? FigShare and DataDryad could also be used for archiving, although for code this would be a suboptimal re-invention of the wheel. I would like to see the FigShare and DataDryad teams enter the discussion and offer advice, since they are experts at data archiving.

3. The community to initiate actual action

A conversation with the publishers of bioinformatics software needs to be started right now. Even just PLOS, BMC and Oxford Journals adopting a joint policy would establish a critical mass for bioinformatics software publishing. I think an open letter signed by as many people as possible might convince these publishers. Pressure on Twitter and Google+ would help too, as it always does. Who can think of a cool hashtag? Then again, if anyone knows journal editors, an exploratory email conversation might be very productive too. Technically this is not challenging; Casey did a version himself at BioinformaticsArchive. There is very little, if any, monetary cost to implementing this, and it wouldn’t take long.

But can competing journals really be organised like this? Yes, absolutely; there is clear precedent in the 2011 action of more than 30 ecology and evolutionary biology journals. Forward-looking journals will also realize it is in their interests to make this happen. By implementing this they will seem more modern and professional than journals not thinking along these lines. Researchers will see a strict archiving policy as a reason to trust publications in those journals as more than just ephemeral, vague descriptions. These will become the prestige journals, because ultimately we researchers determine what the good journals are.

So what next? Well, I think gathering solid advice on good practice is important, but we also need action. I’d like to see discussions with the relevant journals begin ASAP. I’m really not sure if I’m the best person to do this, and there may be better ways of doing it than just blurting it all out in a blog like this, but we do need action soon. It feels like the days before GenBank, and I think we should be ashamed of maintaining this status quo.


Dec 04 2012

Today I got an email from David E. Schindel, who is the Executive Secretary of the Consortium for the Barcode of Life, announcing Google funding for DNA barcoding. The project aims to create a reference library of endangered species COI sequences so that DNA barcoding can be used as a tool against wildlife trafficking. Good for them, this is a good use of money.

However I was shocked to read later in the email

DNA barcoding is a technique developed at a Canadian university for identifying species using a short, standardized gene sequence

What? Either this was typed in a bad moment and not checked, or we have entered the world of barcoding political spin. I assume that ‘at a Canadian university’ refers to Guelph, where the Canadian Centre for DNA Barcoding is based, led by Paul Hebert.

The problem is that this Canadian group didn’t invent barcoding, neither the name nor the discipline. I can’t really go into a detailed history of DNA barcoding in this post, but the statement in this email makes me squirm, just like when I hear politicians take credit for natural events or someone else’s work. But the meme is out there; the Consortium for the Barcode of Life’s own account begins

In 2003, researchers at the University of Guelph in Ontario, Canada, proposed ‘DNA barcoding’ as a way to identify species.

I don’t want to deny Paul Hebert’s contribution, nor that of the barcoding organisations. They have together popularised, formalised, extended and refined DNA barcoding. DNA barcoding is a force for good in the world and they have explained it beautifully to many diverse biologists, gained funding for several large studies, and refined the methodologies. Good for them.

I would like someone unconnected to the international barcoding groups to write a history of the discipline in a broad context, not just of the projects labelling themselves ‘DNA barcoding’. The origins of the methodology and approach probably lie with the bacterial 16S sequencers like Norm Pace. They used short standardised gene segments to identify species, and although some bacterial projects were undoubtedly environmental surveys, assigning taxa into molecular clusters with little extra biological information, many others incorporated well-characterised reference strains, which is exactly what most people would describe as DNA barcoding. Jonathan Eisen has an article (“Barcoding” researchers keep ignoring microbes) of relevance here; make sure to read the comments. The first use of the exact term “DNA barcoding” is unclear to me, and may possibly be in the classic Hebert paper (12614582), although Blaxter used something essentially the same in the title of his 2002 paper “Molecular barcodes for soil nematode identification”, which also employed a short standardised segment of 18S rRNA (11972769). Although some dismiss these sorts of similarity-based groupings as ‘environmental surveys’ like those used for bacteria, Floyd et al also use a phylogenetic approach to link their environmental sequence clusters (MOTUs) to known, classically described species that have been identified through morphology and vouchers lodged in museums (see Fig 4 in Floyd et al 2002). This is DNA barcoding, and differs from typical studies only in the reference locus used. Blaxter was cited as talking about a ‘molecular bar-code’ a few years earlier still, by Ritz and Trudgill in a 1999 publication (Ritz K and Trudgill DL 1999 Plant and Soil 212: 1–11).

Baker and Palumbi (1999) tree identifying whale meat samples by comparison to whale voucher specimen sequences.

So what about mtDNA studies? Well, I haven’t done real research, I’m just trying to remember stuff, and I would be delighted to hear of examples in the comments. It wouldn’t surprise me at all to find that John Avise’s group (pioneers of mtDNA analysis) had used mtDNA to match unknown samples to voucher specimens. They tended to use whole-mtDNA RFLPs rather than sequencing, though; would that still count? What do you think? Certainly Silberman and Walsh (1364049) were identifying lobster larvae by RFLPs of PCR-amplified rRNA early on; does that count? Alan Wilson’s lab developed some of the first ‘universal’ mtDNA primers used in ecology and evolution (2762322), and again I wouldn’t be surprised to learn that they had assigned unknown specimens to type by DNA barcoding. But they usually chose cytochrome b or 12S rRNA, so would that still count?

A classic DNA barcoding study was published in Science in 1994 (17801528). They took ‘whale’ meat samples from Japanese markets and tried to identify which species they really belonged to. This is almost identical to many classic DNA barcoding studies (10.1016/j.foodres.2008.07.005) in every respect except that they used a standardised section of the mitochondrial control region rather than COI. I could also mention Hoelzel (2001) “Shark fishing in a fin soup”, who identified the species present in shark fin soup by comparing cytb and NADH2 sequences to the database.

So what about COI? Folmer et al designed some of the earliest (and best) COI universal primers (7881515). These are great primers and still the most commonly used for DNA barcoding. I was unaware of the Folmer primers when I designed my own universal primers (Lunt 1994 PhD thesis)(8799733), and several labs were doing this. In Godfrey Hewitt’s lab at UEA we had up to that point been using conserved mtDNA primers from Richard Harrison’s lab at Cornell (they were in pairs named after US presidents and their wives). We weren’t barcoding; the primers were being used for phylogeography, phylogeny and molecular evolution studies. This background just illustrates that COI primers had been around and used widely in all types of evolutionary biology for over a decade before the famous Hebert et al 2003 paper. So had anyone used DNA sequencing of COI with universal primers to match unknown specimens to described vouchered species? Had anyone used this approach to discover and describe cryptic species (another important aspect of DNA barcoding)? Definitely, probably lots of people! A study I designed with Africa Gomez and published in 2002 did exactly this (12206243). We had known rotifer isolates characterised by morphology, mating, ecology etc. We had lots of unknown eggs and identified them using a phylogenetic analysis of COI with the standard barcoding primers. Were we the first? Definitely not; we never thought for a minute that we were the first to do this, but I couldn’t tell you who was. Let me just repeat that: we were NOT the first, we did NOT invent DNA barcoding, not even in animals. I just wish people would stop claiming to have ‘invented’ DNA barcoding and instead understand the context in which their work stands. I doubt very much that DNA barcoding in any meaningful sense had a single origin. It was not a moment of inspiration; it was incremental change, as almost all scientific advance is.
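The logic of matching unknown specimens to vouchered reference sequences can be sketched as nearest-reference assignment by uncorrected p-distance. This is a deliberate simplification: the rotifer study above used a phylogenetic analysis rather than a distance cut-off, and the function names, toy sequences and 10% threshold below are hypothetical illustrations, not values from any study.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned, equal-length sequences.
    Sites where either sequence has a gap or ambiguous base are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def identify(query: str, references: dict[str, str], threshold: float):
    """Return (reference name, distance) for the closest voucher reference,
    or (None, distance) if even the closest one exceeds the threshold."""
    name, dist = min(
        ((ref_name, p_distance(query, seq)) for ref_name, seq in references.items()),
        key=lambda pair: pair[1],
    )
    return (name, dist) if dist <= threshold else (None, dist)


# Toy, hypothetical voucher sequences (not real COI data).
vouchers = {
    "voucher_A": "ACGTACGTACGTACGTACGT",
    "voucher_B": "ACGTTCGTACCTACGAACGT",
}
print(identify("ACGTACGTACGTACGTACGA", vouchers, threshold=0.10))  # ('voucher_A', 0.05)
print(identify("TTTTTTTTTTTTTTTTTTTT", vouchers, threshold=0.10))  # (None, 0.75)
```

Returning `None` when nothing is close enough is where a possible cryptic or undescribed species would show up: a sample matching no voucher becomes a candidate for closer study rather than a forced identification.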

If you know any good science journalists please buy them beers and persuade them to write the history of ‘DNA barcoding’ in the wide sense, and especially of the work of the bacterial 16S pioneers, I’d like to read that.


Dec 04 2012

I have in front of me a copy of the book “Nucleotide sequences 1984 Part 1 A compilation from the GenBank™ and EMBL data libraries” published by IRL Press. Wow, what a surreal book for anyone used to dealing with sequence databases today. The idea that DNA sequences would be printed out, in an actual book made of paper, and put on a shelf for people to consult, takes some getting used to. To say that it is an idea that has passed is something of an understatement. I bought it for almost nothing as a curio, and it is going to sit proudly on my office shelves. I might even buy Part 2 to go with it.

The sequences range from 1967 to late 1983. The paper is not very white and is slightly absorbent; I don’t think this is due to age, I think it was just published that way. It weighs 1.55 kg and isn’t a large book. I’ve put a gallery of images below with the book next to a DNA double helix for scale! OK, there is a baseball too; a strange collection of things just came to hand, apparently. Quite a number of sequences are very short (<100 bp) and remind me of second-gen sequence reads! Despite my incredulity at the start of this post, some of the ideas concerning open access to data that are referred to in this book’s Introduction are very contemporary. The international sequence databases really have been important torch bearers for open access to research data for the last few decades.

There are some nice quotes in the Introduction

While computerized management of the data is needed to provide accuracy, easy maintenance, and electronic access, it is also important to publish the complete database in printed form. This first annual printed compendium effectively makes the entire collection of information available to every member of the scientific community who wishes to use it, including investigators without access to computers.

One of the goals of the collaboration between GenBank and EMBL is continued movement toward common standards and conventions for the two databases.

This compendium, drawn from the American and European databases, is the first printed compilation of substantially all nucleic acid sequences reported between 1967 and late 1983.

As combined in this compendium, the two databases contain a total of nearly three million bases from over 4000 reported sequences.

Yeast and fungal sequences are in the Plant Sequences section

The individual entries within each section are arranged alphabetically by entry name.

The records seem to be closer to EMBL format than GenBank, although Appendix E (which is in Part 2) “illustrates how the format used in the compendium relates to the formats used in the two databases”. The sequences are grouped into mammalian, other vertebrate, invertebrate, plant, and organelle sequence lists. There is also a table of contents, one record per line, giving the length of the sequence and the page it is on.

The first sequence in the entire book is “APE (CHIMPANZEE) ALU TYPE DNA ACCESSION NUMBERS: J00322” and the last is “YEAST (S. CEREVISIAE) MITOCHONDRIAL VAR1 GENE 3′ FLANK . ACCESION NUMBERS: K00385”

Google books seems to have scanned in the entirety of both volumes, but I couldn’t get it to work for me. What a fantastic book.

Oct 14 2011

Cryptic Species: Illustration of genetically, geographically, ecologically and reproductively isolated 'groups' currently classified as the single bryozoan species Celleporella hyalina. From Gomez et al 2007

Rod Page at iPhylo draws attention to a new paper in Systematic Biology (Costello et al 2011) estimating the total number of species. They come to a much lower figure than a previous paper (Camilo Mora et al 2011). Rod said something interesting that linked in to my thoughts on species numbers.

“The fuss over the number of bacteria and archaea seems to me to be largely a misunderstanding of how taxonomic databases count taxa. Databases like Catalogue of Life record described species, and most bacteria aren’t formally described because they can’t be cultured. Hence there will always be a disparity between the extent of diversity revealed by phylogenetics and by classical taxonomy.”

These papers seem to be estimating the number of species that would be formally described if we carried on as we have been. The interesting thing is how this relates to the actual number of species that exist. I wonder what the slope of the line of increasing numbers of formally described species and the slope of informal ‘descriptions’ (e.g. from DNA) would look like? After all, surely we are only really interested in the number of species in nature, not in our catalogue of nature. Studies of the change in our estimates of a parameter are always less interesting and useful than the parameter itself, species number in this case.

Cryptic Species

Even putting consideration of bacteria and archaea aside, use of population-level DNA barcoding has revealed large numbers of cryptic species. These are often, as you might expect, among small dull-looking taxa where it’s hard to tell them apart by eye (although we do also find cryptic species in very well characterised groups such as birds and mammals).

My feeling is that it is very rare indeed for the outputs of DNA barcoding to lead to formal descriptions of species. This is partly because the scientists producing barcode data do not have suitable taxonomic training, and partly because species description is a very difficult and frustrating task.

Meiofaunal Community Sequencing

Meiofaunal community sequencing has suggested very large increases in eukaryote biodiversity compared with morphological approaches. Studies of nematodes, for example, reveal very large numbers of (conservatively judged) Operational Clustered Taxonomic Units (OCTUs; likely species or higher-level groupings). The work of Si Creer and colleagues is particularly informative.

Remarkably, along only an 800 m transect, we detected 182 Nematoda OCTUs, compared with 450 species of Nematode that have been described from around the entire British Isles. From a geographical perspective, these data represent the discovery of 40% of the previously known phylum richness from a transect that represents 0.004% of the length of the British coastline (~17,820 km, Ordnance Survey). (Fonseca et al. 2010)
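The two headline figures in this quotation follow directly from the numbers it gives. A quick check in Python, using only the figures quoted above (the variable names are mine, not the paper’s):

```python
# Figures taken from the Fonseca et al. (2010) quotation above.
nematode_octus = 182          # nematode OCTUs detected along the transect
described_uk_species = 450    # nematode species described from the British Isles
transect_km = 0.8             # the 800 m transect
coastline_km = 17_820         # approximate British coastline length

share_of_species = nematode_octus / described_uk_species
share_of_coastline = transect_km / coastline_km

print(f"OCTUs relative to described species: {share_of_species:.1%}")   # 40.4%
print(f"Transect as share of coastline: {share_of_coastline:.4%}")      # 0.0045%
```

The second ratio comes out at about 0.0045%, consistent with the rounded 0.004% in the quotation.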

Yes, we can argue about what a species is. Yes, there can be problems with defining taxa by % sequence divergence alone. But really these would be fine-scale adjustments; it’s hard to get away from the fact that many lines of evidence suggest there are a lot of undescribed species. Don’t forget that this isn’t DNA barcoding, this is species identification and discovery. Often, when extensive geographic sampling is carried out on these small organisms, they additionally fall into cryptic species assemblies. So cryptic species complexes may be overlaid on top of this realisation that much/most biodiversity is undiscovered.
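OCTUs are typically delimited by clustering sequences at a fixed divergence cut-off, which is where the arguments about % divergence come in. A minimal sketch of simple greedy clustering follows, assuming pre-aligned, gap-free sequences of equal length; the function names, toy sequences and cut-offs are hypothetical and are not those used by Fonseca et al. (2010) or Creer et al. (2010).

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of mismatched sites between two aligned, gap-free, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_clusters(seqs: list[str], max_divergence: float) -> list[list[int]]:
    """Assign each sequence, in input order, to the first cluster whose founding
    sequence (its centroid) lies within max_divergence; otherwise start a new cluster.
    Returns the member indices of each cluster."""
    centroids: list[str] = []
    members: list[list[int]] = []
    for i, s in enumerate(seqs):
        for c, centroid in enumerate(centroids):
            if p_distance(s, centroid) <= max_divergence:
                members[c].append(i)
                break
        else:  # no existing centroid was close enough
            centroids.append(s)
            members.append([i])
    return members


# Toy, hypothetical sequences: two pairs, each pair differing at one site in ten.
seqs = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCAAAAA", "CCCCCAAAAC"]
for cutoff in (0.0, 0.1, 0.5):
    print(cutoff, greedy_clusters(seqs, cutoff))
```

The same four sequences give four, two and two clusters at these cut-offs, and the membership changes between 0.1 and 0.5; a greedy scheme like this also depends on input order. These are the kinds of fine-scale adjustments discussed above.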

Will these species ever be described?

Will these species ever be described? No, they won’t. Almost none of these will ever be described formally, and yet they exist, they comprise a very important component of our ecosystems. This problem is not going to go away, and will likely get more evident with high throughput environmental sequencing. The approaches to estimating species numbers need to be more explicit (especially towards the press) about what they are actually counting. They are not counting species numbers, but the frequency with which people write up species descriptions, and I would argue that DNA barcoding and environmental sequencing remove any plausible correspondence between these two rates.

You could of course take the view that a species only exists when it has been formally described. I’m sure formal description is a good thing, but irrespective of their official status these species do exist, they do contribute to actual biodiversity, they do interact in networks, they do harm/help our soil, crops, livestock, and health. Unlike King Canute at some point we have to respond to the flood in a practical manner, and counting described species as if they were true estimates of species numbers is starting to look rather naive.

I don’t want to criticise these groups too much. I like anyone who has a go at estimating species diversity; Camilo Mora et al. in particular do look at links between described and actual species numbers, and it is always difficult to get the subtleties of your work across in the press. It’s just that I have yet to see this important distinction come across in the reporting of this work, nor have I met many biologists (over coffee, or on the web) who recognise it. We are assuming that species description in birds and mammals is representative of ‘small beasties’, and that pre-molecular estimates are representative of post-molecular ones. Do you feel comfortable with those assumptions? I don’t really have new suggestions for how we should estimate the true number of eukaryotic species on our planet, but we need to think much more broadly and critically about how to do it. And don’t even get me started on the overlooked bacterial and archaeal diversity…


Mora, C., Tittensor, D.P., Adl, S., Simpson, A.G.B. and Worm, B. (2011) How many species are there on Earth and in the ocean? PLoS Biology 9(8): e1001127. doi:10.1371/journal.pbio.1001127

Costello, M.J., Wilson, S. and Houlding, B. (2011) Predicting total global species richness using rates of species description and estimates of taxonomic effort. Systematic Biology. doi:10.1093/sysbio/syr080

Gómez, A. et al. (2007) Mating trials validate the use of DNA barcoding to reveal cryptic speciation of a marine bryozoan taxon. Proceedings of the Royal Society B: Biological Sciences 274(1607): 199. doi:10.1098/rspb.2006.3718

Fonseca, V.G. et al. (2010) Second-generation environmental sequencing unmasks marine metazoan biodiversity. Nature Communications 1: 98. doi:10.1038/ncomms1095

Creer, S. et al. (2010) Ultrasequencing of the meiofaunal biosphere: practice, pitfalls and promises. Molecular Ecology 19 (Suppl. 1): 4-20. doi:10.1111/j.1365-294X.2009.04473.x


Sep 28 2011

There is a really interesting take on the ethics of human genomics from Dienekes’ Anthropology Blog, prompted by the recently released Aboriginal Australian genome. I can’t say I disagree with anything in it. Potential bad ethical outcomes of genetic sampling are very rarely explained clearly; they are just left hanging in the air as something that must be true. If 23andMe and the other genomic testing companies have taught us anything, it is that huge numbers of people want to know about their genomes. They are interested in their ancestry and not at all concerned by the supposed dangers of knowing more about themselves. I remember, when the southern African genomes were released, seeing interviews with one man who had been sequenced (Desmond Tutu, whose genome was among the others, had gone to meet him, I seem to remember). He was really proud that his part of human diversity was being represented. Good for him. I doubt very much that this is a rare view, and I find it slightly patronising that, although we know there is no real concern, we assume a priori that non-Western peoples might be concerned. Do we also assume that they will be concerned that photographs may steal their souls? Even if this were true, don’t we have a much greater duty to explain and teach than to pander to possibly non-existent fears?

I can’t help but agree with Dienekes’ concern about the worrying power of unelected bodies to represent the community:

I am glad that the “Land and Sea Council” gave Willerslev its consent. But, seriously, who are they to decide whether the hair sample should be used or not?

It could be argued that Haddon’s unknown hair donor did not authorize a particular use of his hair sample. But it is ludicrous to expect people from the past to anticipate all the potential uses that their tissues may have in the future. Nor is there any evidence that the anonymous donor authorized some council representing 5,000 future Aboriginal Australians, including a few of his distant relatives, to prevent it from being used.

I would go even further: even elected bodies such as governments do not have an automatic right to decide such ethical issues for their citizens. They are elected to collect taxes, fix roads and the like. If they do wish to impose ill-defined ‘ethical’ restrictions, they should start putting them in their election manifestos immediately.

I do very little science that could be of ethical concern to anyone, yet ethics committees still manage to make my life worse. Their actions are often nonsensical, and occasionally even unethical. They often seem to exist mainly to protect organisations from criticism rather than to consider actual ethics. My university has ethical restrictions for all animals, not just those mandated by UK law (vertebrates). So, do we get rid of those fish parasites or not? One fish, lots of parasites: do we treat them equivalently? I love nematodes, but even I find equating them with fish to be quite hard-core ethics! I was once told that before I could run a student practical class I had to get a medical declaration from every student about their health status, including infectious diseases and whether they were pregnant. I tried to point out that, since I couldn’t use any of this information for any purpose, it seemed actually unethical to demand that students tell me such personal information via these ethics forms. Appreciation of irony is not common among ethics committees.

Jul 22 2011

For those of you who haven’t come across it before, Bio-Linux is an operating system set up for bioinformatics with a huge number of programs pre-installed. It can be obtained (for free) from the NERC Environmental Bioinformatics Centre. I’ve spent quite a while recently messing about with installing software packages and wanted to see how everything would work in a pre-installed environment. You can obtain a USB drive from NERC and boot from that, but that doesn’t work for OSX. Also, I wasn’t sure I wanted to reboot each time, as I may need to switch back and forth between applications in Bio-Linux and OSX. Here I document a few experiments with installing and running Bio-Linux within OSX (so I don’t have to reboot) using VirtualBox.

Here are a few choice quotes about Bio-Linux:

Bio-Linux 6 packs a wealth of bioinformatics tools, scientific software and documentation into a powerful and user-friendly 64-bit Ubuntu Linux system. Download Bio-Linux today and turn your PC into a powerful workstation in minutes.

Bio-Linux 6.0 is a fully featured, powerful, configurable and easy to maintain bioinformatics workstation. Bio-Linux provides more than 500 bioinformatics programs on an Ubuntu Linux 10.04 base. There is a graphical menu for bioinformatics programs, as well as easy access to the Bio-Linux bioinformatics documentation system and sample data useful for testing programs. You can also install Bio-Linux packages to handle new generation sequence data types.

FYI: I’m running OSX 10.6.8 (Snow Leopard) on a Mac Pro with 4 GB RAM and 2x 2.8 GHz Quad-Core Intel Xeon processors. Working through the steps below will take more than an hour.

Here’s what I did to install it:

  1. Download and install VirtualBox from
  2. Download Bio-Linux 6 (2.2 GB) from the NERC Environmental Bioinformatics Centre. Since this is free, supported software paid for by the UK taxpayer, it would be really great for NBAF-W if you registered, so that they can say ‘X people have downloaded this software’. Also, please cite the paper (Field et al. 2006) when you can.
  3. Open VirtualBox and click “New” in the toolbar, then follow the wizard.
  4. Give your virtual machine a name like “BioLinux”, choose Linux as the operating system, and select Ubuntu 64 bit as the version.
  5. Select the amount of RAM to give it: 1024 MB should be OK, whereas the default of 512 MB could be a bit mean. More RAM is always better, especially if you are going to set it to do a lot of hard work. This can always be changed later.
  6. Virtual Hard Disk: use the defaults (Create new), and again on the next screen (VDI).
  7. Virtual disk storage details: “Dynamically allocated” is the default and I used it the first time. I suspect it was the cause of some slowness, though, and changed to “Fixed size” the next time through. If you do go for Dynamically allocated, make sure to give it enough space on the following screen.
  8. Virtual disk file location and size: I used 8 GB and Dynamic the first time through, and it ran short of space immediately after a system update. I would definitely choose 16 GB if you have the space on your hard disk. When I compared the two, the 16 GB fixed-size disk felt much faster.
  9. The next screen is a summary, and now you can press “Create” to create your virtual disk. If you have chosen “Fixed size” it will take a little while (5-10 min) to create the virtual disk, but it will likely run faster in the future. At the end of the process you come back to exactly the same summary screen as at the start, with no indication that anything has happened. If you press the “Create” button again, though, the VirtualBox Manager window immediately updates to show your new virtual disk.
  10. You can now press the green Start arrow in the toolbar to launch it. You will then get a “First Run Wizard”.
  11. Select Installation Media. Now is the time to select the operating system that you specified in step 4, i.e. point it at your Bio-Linux download. If you click on the little folder icon to the right of the drop-down menu, you can select your Bio-Linux file. Use the drop-down in the file list window to select “RAW (*.iso *.cdr)”, as Bio-Linux comes as an .iso file. Check your downloads folder to locate it. At this point it is very easy (I did it 4 times across 2 installs) to click on something that causes the screen to freeze and bleep whenever you click on anything. The Esc key solved this for me. Be careful where you click! When you have selected the file you should be back at the Select Installation Media dialog with bio-linux-6-latest.iso now selected. Continue.
  12. The next screen claims that you are installing from CD/DVD; ignore that, you know the truth. Click Start.
  13. You should now get an Ubuntu window; wait a couple of minutes while it boots and you see the Bio-Linux desktop and the install window.
  14. Choose your language and click “Install Bio-Linux 6” at the bottom. Don’t click on “Try Bio-Linux”. Then select your time zone.
  15. Keyboard layout: select “Choose your own”, then pick “United Kingdom Macintosh” from the right panel.
  16. Accept the defaults, then add your name and password. I set it to log in automatically here.
  17. Now click INSTALL. Almost done: it will take a few minutes to install, so go and have a coffee.
  18. “Installation complete- you need to restart the computer.” This refers only to the virtual computer, so restart it. You will then see “Please remove the disc and close the tray (if any) then press enter”. This is because the software still thinks you are installing from a DVD. Ignore it and press Enter, and Bio-Linux will boot.
Congratulations, you’re all done and ready to go off and play with it.
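If you would rather script the VM set-up than click through the wizard, the sketch below builds roughly equivalent VBoxManage commands for steps 3 to 12. It is an illustration under assumptions, not the GUI procedure described above: the VM name, disk file name and ISO path are placeholders to adjust, the single SATA controller is my own choice, and it assumes a reasonably recent VirtualBox (older releases call the disk-creation subcommand `createhd` rather than `createmedium disk`). By default it only prints the commands; the Bio-Linux installer itself (steps 13 to 18) still runs interactively inside the VM.

```python
import shlex
import subprocess

# Placeholder values (assumptions): adjust to your own set-up.
VM_NAME = "BioLinux"                   # step 4: the name given to the VM
ISO_PATH = "bio-linux-6-latest.iso"    # step 11: the downloaded Bio-Linux image
DISK_PATH = "BioLinux.vdi"             # hypothetical file name for the virtual disk
RAM_MB = 1024                          # step 5: 1024 MB rather than the 512 MB default
DISK_MB = 16 * 1024                    # step 8: a 16 GB disk


def vbox_commands(vm=VM_NAME, iso=ISO_PATH, disk=DISK_PATH,
                  ram_mb=RAM_MB, disk_mb=DISK_MB):
    """Return VBoxManage argument lists that roughly mirror the GUI steps."""
    return [
        # Steps 3-4: register a new VM of type Ubuntu (64-bit).
        ["VBoxManage", "createvm", "--name", vm,
         "--ostype", "Ubuntu_64", "--register"],
        # Step 5: memory in MB.
        ["VBoxManage", "modifyvm", vm, "--memory", str(ram_mb)],
        # Steps 6-8: a fixed-size VDI disk (slow to create, likely faster in use).
        ["VBoxManage", "createmedium", "disk", "--filename", disk,
         "--size", str(disk_mb), "--format", "VDI", "--variant", "Fixed"],
        # A SATA controller for both the hard disk and the virtual DVD drive.
        ["VBoxManage", "storagectl", vm, "--name", "SATA", "--add", "sata"],
        ["VBoxManage", "storageattach", vm, "--storagectl", "SATA",
         "--port", "0", "--device", "0", "--type", "hdd", "--medium", disk],
        # Step 11: the Bio-Linux .iso as the installation "DVD".
        ["VBoxManage", "storageattach", vm, "--storagectl", "SATA",
         "--port", "1", "--device", "0", "--type", "dvddrive", "--medium", iso],
        # Steps 10 and 12: boot the VM, which starts the Bio-Linux installer.
        ["VBoxManage", "startvm", vm],
    ]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in vbox_commands():
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Keeping the commands as argument lists means they can be checked by eye (or printed and pasted into Terminal) before anything touches VirtualBox; call `run_all(dry_run=False)` only once they look right.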
There might be a few things you want to do in this new operating system.
  • You should probably set a network proxy: System → Preferences → Network proxy. Similarly, you might want to use the “Ignored hosts” tab to exclude your university domain (“*” in my case).
  • You might want to update the system software. System –> Administration –> Update manager.
  • You might want to go to the VirtualBox Manager window and click on Shared Folders. Then add a folder on your hard disk where you want to keep data accessible to both operating systems. I set mine to auto-mount when I log in. I don’t think this works until you have restarted Bio-Linux.
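The shared-folder step can also be done from the OSX side with VBoxManage. This minimal sketch assumes the VM is called “BioLinux” and uses a hypothetical share name (“data”) and host folder (~/BioLinuxShared); it builds a `VBoxManage sharedfolder add` call with `--automount`. As far as I understand it, the VM should be powered off for a permanent share, shared folders need the VirtualBox Guest Additions inside Bio-Linux, and on Linux guests auto-mounted shares typically appear under /media/sf_<name>.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical default host folder: change to wherever you keep shared data.
DEFAULT_HOST_DIR = Path.home() / "BioLinuxShared"


def shared_folder_command(vm="BioLinux", share="data", host_path=DEFAULT_HOST_DIR):
    """Build the VBoxManage call that adds an auto-mounted shared folder."""
    return [
        "VBoxManage", "sharedfolder", "add", vm,
        "--name", share,
        "--hostpath", str(host_path),
        "--automount",  # ask the guest to mount the share automatically
    ]


def add_shared_folder(apply=False, **kwargs):
    """Print the command; only execute it when apply=True (VM powered off)."""
    cmd = shared_folder_command(**kwargs)
    print(shlex.join(cmd))
    if apply:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    add_shared_folder()
```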


You may also find a preconfigured VirtualBox Bio-Linux image, but when I wrote this it was for an older version (v5) rather than the latest. It might be worth checking.
Many thanks to Steve Moss, who introduced me to VirtualBox, helped me install this, and showed me some useful stuff.
Vested interest? I am on the NERC Biomolecular Analysis Facility (NBAF) steering committee, which has a role in overseeing NBAF-W, who created Bio-Linux. I don’t feel in any way biased by this, but hey, you decide.
Jul 18 2011

There have been several obituaries for Horace Judson recently [1][2], and today, in an excellent Sandwalk blog post, Larry Moran talked about molecular biologists’ lack of knowledge of the history of their own field:

modern researchers are completely unaware of the history of their field. That’s partly because the work on bacteria and bacteriophage—where the basic concepts were often discovered—is no longer taught in biochemistry and molecular biology courses. This leads to the false idea, as expressed in the press release, that all new discoveries in eukaryotes are truly new concepts that nobody ever thought of before. The solution to this problem is to make all students read The Eighth Day of Creation.

I liked the quote from John Hawks too:

I suppose we could rephrase Santayana: Those who ignore history feel privileged to reinvent it.

Judson wrote the truly epic book “The Eighth Day of Creation: Makers of the Revolution in Biology”, which describes in detail the development of molecular biology, drawing on extensive interviews with its early pioneers. It’s a great read: his writing style is easy and absorbing, and the content fascinating. Despite not having finished the book yet, I can recommend it very highly indeed. What? Wait, you haven’t finished the book yet? How good can it really be? It’s a great book, but one that suffers from poor publishing by Cold Spring Harbor Press. Let me get my excuses out of the way now: I’m really busy, have little time for reading things that aren’t journal articles, and have a big backlog of other books to read. Yet these aren’t the real reasons. The real reasons are that it is enormous and only comes as a paper copy. At 714 pages, the book is very weighty and thick even as a paperback. It is about as thick as a single volume can practically be, and of course the pages don’t open out very flat. It is so heavy that I have decided not to take it on holiday with me for this reason alone. That is a shame, as holidays are when I catch up on reading.

There is a simple solution, however: release it as an eBook. I would love to read this as a Kindle book on my iPad, take it anywhere and just dip into it. It wouldn’t matter then how long it was. What is more, I would be able to look things up when sitting in seminars and journal clubs, quickly checking the history of a topic. Lastly, I would like to be able to highlight and comment on sections. I have an absolute phobia of writing in books; I just can’t do it. Somehow (almost religiously) I know it is just plain wrong, even though I can’t think of a single reason why. I have no such qualms about marking up an eBook, however, highlighting sections and adding notes. These notes and highlights are searchable and easily found again: very useful indeed.

Although I wholeheartedly agree with Larry Moran’s concluding sentence, “The solution to this problem is to make all students read The Eighth Day of Creation”, I think the chances are remote without good modern publishing helping the process along. Do something useful today: go to the Amazon page for The Eighth Day of Creation and click on the link (usually just under the picture) to request a Kindle version from the publisher.