Sep 28 2011

There is a really interesting take on the ethics of human genomics from Dienekes’ Anthropology Blog, prompted by the recently released Aboriginal genome. I can’t say I disagree with anything. Potential bad ethical outcomes of genetic sampling are very rarely explained clearly; they are just left hanging in the air as something that must be true. If 23andMe and the other genomic testing companies have taught us anything, it is that huge numbers of people want to know about their genomes. They are interested in their ancestry and not at all concerned by the supposed dangers of knowing something more about themselves. I remember, when the southern African genomes were released, seeing interviews with one guy who had been sequenced (Desmond Tutu, another of those sequenced, had gone to meet him, I seem to remember). He was really proud that his part of human diversity was being represented. Good for him. I doubt very much that this is a rare view, and I find it slightly patronising that, although we know there is no real concern, we assume a priori that non-Western peoples might be concerned. Do we also assume that they will be concerned that photographs may steal their souls? Even if this were true, don’t we have a duty to explain and teach much more than we have a duty to pander to possibly non-existent fears?

I can’t help but agree with Dienekes’ concern over the worrying power of unelected bodies to represent the community.

I am glad that the “Land and Sea Council” gave Willerslev its consent. But, seriously, who are they to decide whether the hair sample should be used or not?

It could be argued that Haddon’s unknown hair donor did not authorize a particular use of his hair sample. But, it is ludicrous to expect people from the past to anticipate all the potential uses that their tissues may have in the future. Nor is there any evidence that the anonymous donor authorized some council representing 5,000 future Aboriginal Australians, including a few of his distant relatives to prevent it from being used.

I would take it even further though, in that even elected bodies such as governments do not have automatic rights to determine such ethical issues over their citizens. They are elected to collect taxes, fix roads and the like. If they do wish to set out ill-defined ‘ethical’ restrictions they should start putting them in their election manifesto immediately.

I do very little science that could be of ethical concern to anyone, yet ethics committees still manage to make my life worse. Their actions are often nonsensical, and occasionally even unethical. They often seem to be constituted mostly to protect organisations from criticism rather than to consider actual ethics. My university has ethical restrictions for all animals, not just those mandated by UK law (vertebrates). So, do we get rid of those fish parasites or not? One fish, lots of parasites: do we treat them equivalently? I love nematodes, but even I find equating them with their host to be quite hard-core ethics! I was once told that before I could run a student practical class I had to get a medical declaration from all students as to their health status, including infectious diseases and whether they were pregnant or not. I tried to point out that, since I couldn’t use any of this information for any reason, it seemed actually unethical to demand that the students give me such personal information via these ethics forms. Appreciation of irony is not common among ethics committees.

Jul 26 2011

I have a project going at the moment to examine changes in intron diversity, size and location in animal genomes. I am always a bit frustrated with the way introns are treated in many genome characterisation papers: “the genome contained Y introns with mean intron size X bp” is usually all we get. This sort of summary stat can hide all manner of interesting trends. One measure that is often useful is intron density, but unfortunately there doesn’t seem to be any standardised way to use this measure. Density is often measured as ‘introns per gene’, which is a reasonable shorthand, although since genes vary in length very considerably both within and between genomes it makes quantitative analysis very difficult indeed. I have seen ‘introns per 10kb’! This is OK, but what number to choose? What if each study chooses a different number? ‘Introns per nucleotide’ will standardise this better, and although the number will be very small, we seem to manage just fine with small mutation rate numbers and the like. But the more I think about it the less simple this seems.
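
To make the difference between these conventions concrete, here is a minimal Python sketch. The function name `intron_densities` and all of the numbers are invented for illustration; they do not come from any real genome. It shows two hypothetical genomes that look identical as ‘introns per gene’ but differ once gene length is taken into account.

```python
def intron_densities(n_introns, n_genes, total_gene_length_nt):
    """Express one intron count under three common density conventions."""
    return {
        "per_gene": n_introns / n_genes,
        "per_10kb": n_introns * 10_000 / total_gene_length_nt,
        "per_nucleotide": n_introns / total_gene_length_nt,
    }


# Two hypothetical genomes: same intron and gene counts, different gene lengths.
genome_a = intron_densities(n_introns=100_000, n_genes=20_000,
                            total_gene_length_nt=40_000_000)
genome_b = intron_densities(n_introns=100_000, n_genes=20_000,
                            total_gene_length_nt=100_000_000)

for name, d in (("A", genome_a), ("B", genome_b)):
    print(name, d)
```

Both genomes report 5 introns per gene, yet genome A has two and a half times the density of genome B per nucleotide. ‘Per 10kb’ and ‘per nucleotide’ carry the same information and differ only by a constant factor.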

Introns per bond

Something that is often overlooked when calculating the number of introns per nucleotide is that introns do not insert into nucleotides but rather the phosphodiester bonds between them. I would suggest therefore that the most accurate and effective way to specify density would be introns per bond. It seems reasonable that counting nucleotides is a convenient shorthand for this, but actually this shorthand leads to small but persistent errors. This is an unfortunate consequence of genome annotation restricting itself to nucleotides but genomic processes sometimes targeting bonds.

In the cartoon of a gene above, CDS represents the protein coding region and UTR stands for the 5′ and 3′ untranslated regions. The dashes between nucleotides represent phosphodiester bonds joining the nucleotides. There are 6 nucleotides in the 5′-UTR, 9 nucleotides in the CDS and 5 nucleotides in the 3′-UTR. What would happen if we were to insert an intron in this sequence at the boundary of one of the gene regions? In which gene region would it be counted?

Coding regions almost always begin with a codon specifying a methionine residue, the start codon ATG. Nothing preceding this A nucleotide is counted as part of the CDS. Coding regions finish with a termination codon, TGA in the example above. This A nucleotide is the end of the CDS. By usual practice, therefore, any intron inserting into the bond between the T and the A at the 5′ end of the CDS would not be counted as part of the CDS, nor would any intron inserting into the bond between the A and the A at the 3′ end of the CDS. This is quite reasonable in many ways, but defining UTRs by reference to the CDS (after the last nucleotide, before the first nucleotide) means that the CDS has one fewer bond than it has nucleotides, whereas each UTR has exactly as many bonds as nucleotides! The 5′-UTR here has 6 nucleotides and 6 bonds, the 3′-UTR has 5 nucleotides and 5 bonds, but the CDS has 9 nucleotides and only 8 bonds where an inserting intron would be labelled as a ‘CDS intron’.
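
The bookkeeping in this example can be written out as a short Python sketch. The function name `bonds_per_region` is hypothetical, and the sketch assumes a single linear transcript with one 5′-UTR, one CDS of at least one nucleotide and one 3′-UTR, using the convention described above: a bond counts as ‘CDS’ only when both flanking nucleotides are in the CDS.

```python
def bonds_per_region(utr5_nt, cds_nt, utr3_nt):
    """Count phosphodiester bonds assigned to each region of a linear transcript.

    Assumed convention: the two bonds at the UTR/CDS boundaries are
    assigned to the UTRs, so the CDS only gets its internal bonds.
    """
    return {
        # (utr5_nt - 1) internal bonds plus the boundary bond to the CDS
        "5'-UTR": utr5_nt,
        # only bonds with a CDS nucleotide on both sides
        "CDS": max(cds_nt - 1, 0),
        # the boundary bond from the CDS plus (utr3_nt - 1) internal bonds
        "3'-UTR": utr3_nt,
    }


# The example gene: 6 + 9 + 5 = 20 nucleotides, hence 19 bonds in total.
counts = bonds_per_region(utr5_nt=6, cds_nt=9, utr3_nt=5)
print(counts)                 # {"5'-UTR": 6, 'CDS': 8, "3'-UTR": 5}
print(sum(counts.values()))   # 19
```

The per-region bond counts sum to 19, one fewer than the 20 nucleotides, and the whole deficit falls on the CDS.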

Does this matter?

Both yes and no. Counts using introns/nucleotide will be very similar to introns/bond. I am not claiming that work needs to be repeated or that substantial errors put into question previous work. But there are two issues here:

  1. We should do it right. Understanding the actual insertion process requires us to use the right language. We should label introns as inserting between nucleotides to avoid confusion. You may not be confused, but try writing a script that counts introns when everything is labelled by nucleotide position.
  2. We can’t yet be sure what difference correct counts make in large data sets. The age of genomics is here. We can study hundreds of thousands of introns from lots of species and treat this mass of data statistically. The numbers of introns/nucleotide and introns/bond may look similar to our eyes, but trusting our savannah ape brains to make the right call is a risky strategy with big numbers.

The lab now uses these ‘per bond’ counts in our genomic intron scripts, which will be released when the first paper is out. I think it would be great if there were a biological standard for intron density. Maybe we should even give it a unit: a Gilbert, perhaps, could equal one intron/10⁻³ bonds?
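
The lab scripts themselves are not reproduced here. As a purely illustrative stand-in, the following Python sketch labels introns by the bond they insert into rather than by a nucleotide position. Everything in it is an assumption made for the example: the function names, the 1-based numbering in which bond k lies between nucleotides k and k+1, and the intron positions in the usage lines.

```python
from collections import Counter


def region_of_bond(k, cds_start, cds_end, transcript_len):
    """Label bond k (between nucleotides k and k+1, 1-based) by gene region.

    cds_start and cds_end are the 1-based, inclusive nucleotide
    coordinates of the CDS within the transcript.
    """
    if not 1 <= k < transcript_len:
        raise ValueError(f"bond {k} does not exist in a {transcript_len}-nt transcript")
    if k < cds_start:
        return "5'-UTR"   # includes the boundary bond just before the start codon
    if k < cds_end:
        return "CDS"      # both flanking nucleotides lie inside the CDS
    return "3'-UTR"       # includes the boundary bond just after the stop codon


def cds_intron_density(intron_bonds, cds_start, cds_end, transcript_len):
    """Compare CDS intron density per bond and per nucleotide."""
    if cds_end <= cds_start:
        raise ValueError("this sketch assumes a CDS of at least 2 nucleotides")
    labels = Counter(
        region_of_bond(k, cds_start, cds_end, transcript_len) for k in intron_bonds
    )
    cds_nt = cds_end - cds_start + 1
    return {
        "per_bond": labels["CDS"] / (cds_nt - 1),
        "per_nucleotide": labels["CDS"] / cds_nt,
    }


# The example gene: CDS at nucleotides 7..15 of a 20-nt transcript,
# with made-up introns in bonds 6, 10 and 15.
density = cds_intron_density([6, 10, 15], cds_start=7, cds_end=15, transcript_len=20)
print(density)
```

Only the intron in bond 10 is a CDS intron; the introns in bonds 6 and 15 sit on the boundary bonds and are labelled as UTR introns. On this toy gene the two CDS densities are 1/8 and 1/9. Real coding regions are much longer, so the gap per gene is far smaller.
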
This post may be cited using the DOI: