Dec 04 2012

I have in front of me a copy of the book “Nucleotide sequences 1984 Part 1: A compilation from the GenBank™ and EMBL data libraries” published by IRL Press. Wow, what a surreal book for anyone used to dealing with sequence databases today. The idea that DNA sequences would be printed out, in an actual book made of paper, and put on a shelf for people to consult, takes some getting used to. To say that it is an idea that has passed is something of an understatement. I bought it for almost nothing as a curio, and it is going to sit proudly on my office shelves. I might even buy Part 2 to go with it.

The sequences range from 1967 to late 1983. The paper is not very white and slightly absorbent; I don’t think that is due to age, I think it was just published that way. It weighs 1.55 kg and isn’t a large book. I’ve put a gallery of images below with the book next to a DNA double helix for scale! OK, there is a baseball too; a strange collection of things just came to hand, apparently. Quite a number of sequences are very short (<100 bp) and remind me of second-generation sequence reads! Despite my incredulity at the start of this post, some of the ideas about open access to data in this book’s Introduction are very contemporary. The international sequence databases really have been important torch bearers for open access to research data for the last few decades.

There are some nice quotes in the Introduction:

While computerized management of the data is needed to provide accuracy, easy maintenance, and electronic access, it is also important to publish the complete database in printed form. This first annual printed compendium effectively makes the entire collection of information available to every member of the scientific community who wishes to use it, including investigators without access to computers.

One of the goals of the collaboration between GenBank and EMBL is continued movement toward common standards and conventions for the two databases.

This compendium, drawn from the American and European databases, is the first printed compilation of substantially all nucleic acid sequences reported between 1967 and late 1983.

As combined in this compendium, the two databases contain a total of nearly three million bases from over 4000 reported sequences.

Yeast and fungal sequences are in the Plant Sequences section

The individual entries within each section are arranged alphabetically by entry name.

The records seem to be closer to EMBL format than GenBank, although Appendix E (which is in Part 2) “illustrates how the format used in the compendium relates to the formats used in the two databases”. The sequences are grouped into mammalian, other vertebrate, invertebrate, plant, and organelle sequence lists. There is also a table of contents, one record per line, giving the length of the sequence and what page it is on.

The first sequence in the entire book is “APE (CHIMPANZEE) ALU TYPE DNA ACCESSION NUMBERS: J00322” and the last is “YEAST (S. CEREVISIAE) MITOCHONDRIAL VAR1 GENE 3′ FLANK. ACCESSION NUMBERS: K00385”.

Google Books seems to have scanned the entirety of both volumes, but I couldn’t get it to work for me. What a fantastic book.

Jan 18 2010

Last January I made a list of (science) new year resolutions and made some predictions for the coming year. Thought I’d have a look back…

2009 Resolutions
* Read more. I used to read at least a paper a day during my PhD. Some PDF counting last year showed me I had averaged 3 per week over the last 10 years. I think I could get back to 1 per day with a bit of determination. I must concentrate the effort a little bit more though, no more reading up on snail biogeography just because I’ve found a cool one at the beach.
I’ve read slightly fewer papers, but many more academic blogs. I’m not yet sure what I think of this strategy. It may fall into the “snail biogeography” category (above), but on the other hand I have learned a lot, some of it even related to my research areas.

* Sort out my electronic lab book system
Success, I think. My ELN is running very well. I have implemented it with students, mostly successfully. I’m very pleased with the whole ELN thing.

* Add more to Wikipedia, especially species, and get into the habit of taking and posting images to Wikimedia.
I’m starting to get addicted to Wikipedia, and I’ve even suggested starting a Biology Reviews course for undergrads based on writing a Wikipedia page.

* Reread the Origin of Species (it’s been too many years)
Fail. I read about half of it over Christmas, but then the holiday ended and I stopped having any book time. Really enjoyed the first half though.

* Celebrate Darwin year!
Yes indeed

Resolutions for 2010

  1. Take 1 day per week purely for science (rather than bureaucracy)
  2. Teach myself some Second Generation Sequencing informatics
  3. Sequence my first genome
  4. Blog more (I have moved all the small “posts” to FriendFeed this year, but that is still no excuse)

2009 Predictions
Stuff that didn’t happen
* Creationists will exploit PR better than scientists to get their stories into mainstream newspapers and onto TV and we will see a new telegenic and ‘reasonable’ face of evolution-bashing.
I am very glad to say that this was wrong. A good year for evolution on TV.

Stuff that I didn’t notice happen and thankfully probably didn’t
* Some apparently maladaptive (to the casual public observer) part of the human body or disease susceptibility will be touted as a demonstration that evolution does not work.
* A famous, possibly well-meaning, UK politician will advocate ‘teaching the controversy’ (i.e. creationism alongside evolution in science lessons).
* Some evolutionary biologist you have actually heard of with a new paper disagreeing with some minutiae of evolutionary biology (maybe in some aspect of population genetics) will be put forward as a critic of evolution on a really slow news day.
* Steve Jones’s ‘evolution has stopped’ claim will resurface yet again and get more air time and column inches than all evolutionary biology research published in 2009 put together.

Stuff that was almost right
* The Pope will give a speech extolling the power and vision of God in bringing his laws of evolution by natural selection into Darwin’s stubborn mind. I hope he remembers to mention Wallace too!
Close. “The Vatican has admitted that Charles Darwin was on the right track when he claimed that Man descended from apes. A leading official declared yesterday that Darwin’s theory of evolution was compatible with Christian faith, and could even be traced to St Augustine and St Thomas Aquinas. “In fact, what we mean by evolution is the world as created by God,” said Archbishop Gianfranco Ravasi, head of the Pontifical Council for Culture.” The Times, Feb 11 2009

Predictions for 2010

  1. New sequencing technologies will launch and emphasize why we should be calling 454/Illumina second (not ‘next’) generation sequencing.
  2. BBC reporters will continue to call DNA sequencing “mapping” in all possible situations until, finally, biologists agree to change their terms and alter all the textbooks
  3. Large scale sequencing and evolutionary analysis of flu will (continue to) make a really powerful case for evolution to the public, and there will be an evolution-centric TV documentary on flu

May 03 2009

I’ve moved out of my office while falling plaster and cracks in the wall (last year’s earthquake damage!) are repaired. While packing up I decided to see if I could get rid of most of the paper in my office (and not replace it). This is partly for environmental reasons and partly because I can never find anything when it’s a paper copy, whereas a search through my hard drive is almost instantaneous.

The first thing I’ve done is get rid of almost all journal articles in paper form. I knew there would be a few exceptions to this where I have rare articles, but I intended that everything else would be kept as PDFs or not at all. I use the excellent Papers software for my literature.
This recycling worked quite well. I got rid of almost two filing cabinets, but I was expecting more. It turned out that some collections of papers, particularly those I use with students, just work better as physical copies. This is partly because there are some classic papers that we need to go through together. I also found that papers I am working with a lot (either because I need to read them 10 times to understand them or because they are important sources for something I’m writing) are more comfortable to use as paper copies. In the end I kept more than I thought I would, but maybe this will change when I start the unpacking cull.

In all I estimate that I recycled about 600 kg of paper (about the same as a large cow!). Most of this truly amazing amount wasn’t journal articles but catalogs, old teaching material, all my back issues of Nature, Evolution, Molecular Phylogenetics and Evolution, Systematic Biology and TIG, manuals for old equipment, files full of old grant applications and folders of old data printouts.

I now need to buy a decent scanner and make sure I don’t start to restock my herd.

Jan 11 2009

Just got back from a trip visiting family in Spain. I was relying on my iPod touch for net access, but I was unable to blog in any sensible way at all. I couldn’t create posts on Blogger, for some reason I’m not quite sure of. I have a few posts I wrote anyway…
My new year resolutions include…

  • Read more. I used to read at least a paper a day during my PhD. Some PDF counting last year showed me I had averaged 3 per week over the last 10 years. I think I could get back to 1 per day with a bit of determination. I must concentrate the effort a little bit more though, no more reading up on snail biogeography just because I’ve found a cool one at the beach.
  • Sort out my electronic lab book system (I’m still testing, future post on this).
  • Add more to Wikipedia, especially species, and get into the habit of taking and posting images to Wikimedia.
  • Reread the Origin of Species (it’s been too many years)
  • Celebrate Darwin year!