The selection was based primarily on the journal's citation and download data from the first full year after publication. In a second round, the best papers were compiled and put to a vote by the Human Immunology editorial board, and Klasberg and his colleagues emerged as the happy winners.
The paper, entitled "Patterns of non-ARD variation in more than 300 full-length HLA-DPB1 alleles", examines the HLA-DPB1 gene over its full length. Normally, only DPB1 exon 2 is sequenced for HLA typing. Exon 2 encodes the antigen recognition domain (ARD), the region that is important for donor matching. The rest of the gene usually goes unexamined - but not in this case. "We as scientists, of course, wanted to know more about this gene. We wanted to understand the role of its diversity outside of the ARD," says Dr. Gerhard Schöfl, Head of Bioinformatics, enthusiastically.
Klasberg and the team have started looking deeper into the whole-gene sequence data. They work with two types of data: Illumina short-read sequencing data and PacBio long-read sequencing data.
"This is the perfect combination to get very accurate and clean nucleotide sequences. These methods complement each other perfectly," Klasberg explains. "Illumina sequencing has a very low error rate and you can mostly trust the sequence. The reason you still need long-read single-molecule sequencing - even though it has a relatively high error rate - is because of the gene’s structure and length. With long homomorphic segments or large repetitive sequences, correctly phasing the two alleles can otherwise be difficult to manage."
The published results reveal new differences between the two previously known clades of the gene. 'Clade' is a term used in evolutionary science. In this case, it describes a group of alleles descended from the same ancestral allele. The two clades differ functionally in a way that is considered important in allogeneic stem cell transplantation: one has high levels of expression, whereas the other has low levels of expression.
"However, we don't yet know why their expression levels differ so much," Schöfl says. The published results could help solve this puzzle in the future. Several hypotheses have been proposed.
Microsatellite region
The scientists found that a known microsatellite region, a region that carries a large number of tandem repeats, differs in length between the two clades. It is already known that tandem repeats can have an effect on the expression level of a gene.
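To make the idea of a length difference in a tandem-repeat region concrete, the following is a minimal sketch of how the number of back-to-back copies of a repeat motif could be counted in a sequence. The function name `longest_repeat_run`, the motif "GA" and the toy sequences are assumptions made up for illustration; they are not taken from the paper and do not represent the actual DPB1 microsatellite or the authors' method.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    # Match one or more consecutive copies of the motif.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Toy sequences, invented for illustration only.
allele_long = "TTGAGAGAGACC"   # four copies of "GA"
allele_short = "TTGAGACC"      # two copies of "GA"

print(longest_repeat_run(allele_long, "GA"))
print(longest_repeat_run(allele_short, "GA"))
```

Comparing such counts across alleles grouped by clade would show whether the repeat region is consistently longer in one clade than in the other.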
175 polymorphic positions
In addition, they found 175 polymorphic sites within the second half of the DPB1 gene that are specific to each clade. Polymorphic means that a position can carry different DNA bases. At these sites, a given base occurs in only one of the two clades and therefore defines that clade. These sites could possibly also play a part in regulating the expression of the gene.
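The notion of a clade-specific site can be illustrated with a short sketch. It assumes that the alleles are already aligned to the same length and assigned to a clade, and it treats a position as clade-specific when the two clades share no base at that position. The function name `clade_specific_positions` and the toy sequences are hypothetical; this is one possible reading of the definition above, not the analysis used in the paper.

```python
from typing import List


def clade_specific_positions(clade_a: List[str], clade_b: List[str]) -> List[int]:
    """Return 0-based alignment columns where the two clades share no base."""
    if not clade_a or not clade_b:
        raise ValueError("each clade needs at least one sequence")
    length = len(clade_a[0])
    if any(len(seq) != length for seq in clade_a + clade_b):
        raise ValueError("all sequences must be aligned to the same length")

    positions = []
    for i in range(length):
        bases_a = {seq[i] for seq in clade_a}
        bases_b = {seq[i] for seq in clade_b}
        # A column defines the clades only if no base occurs in both of them.
        if bases_a.isdisjoint(bases_b):
            positions.append(i)
    return positions


# Toy aligned sequences, invented for illustration only.
clade_a = ["ACGTAC", "ACGTAT"]
clade_b = ["ATGTGC", "ATGTGG"]

print(clade_specific_positions(clade_a, clade_b))  # [1, 4]
```

In the toy data, the last column is polymorphic but not clade-specific, because the base C occurs in both clades.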
Transcription factor CTCF binding motif
Finally, Klasberg et al. identified diverging polymorphisms in a CTCF element in intron 2. CTCF elements are sites that can bind the transcription factor CTCF, which in turn regulates gene expression. The element carries three polymorphic sites that differ between the two evolutionary clades. This could be another reason for the different expression patterns of the clades.
William H. Hildebrand, PhD, D(ABHI) President, also complimented the authors: “Congratulations on this impressive achievement! We look forward to seeing more papers submitted by you in the future.”
Indeed, this will not be the last publication on the HLA class II DPB1 gene. Moreover, there will also be follow-up studies on the other two class II genes. "We already have the data for HLA-DRB1 and HLA-DQB1. The analyses are underway," says Schöfl.
We can all look forward to the new discoveries about MHC class II genes that the near future will bring.