Press Releases
Jul. 24, 2007

First 100% Complete Eukaryotic Genome

— A Modern Miracle Accomplished by Japanese Spirit and Skill —


Apart from prokaryotes such as bacteria, all organisms, including humans, are composed of eukaryotic cells. Despite rapidly accumulating genomic-sequence data, all eukaryotic genome sequences published until now - including our own previously published sequence for the genome of the primitive red alga Cyanidioschyzon merolae 10D (Matsuzaki et al. 2004 Nature 428: 653-657) - have been incomplete. In the present paper, our Japanese research community established the first 100% complete nuclear genome sequence by filling every contig gap and sequencing all chromosomal ends in C. merolae. We have unambiguously established that the C. merolae genome contains the smallest known histone-gene cluster, a unique telomeric repeat, and an extremely low abundance of transposons. Together these constitute the simplest set of genomic features found in any free-living eukaryote yet studied. We believe that this study constitutes an important step toward understanding the fundamental genomic attributes of the eukaryotes. In addition, although it is a basic tenet of modern science that all genomes are composed of linear or circular sequences of the four kinds of deoxyribonucleotides, no eukaryotic nuclear genome had previously been demonstrated as “unambiguous sequencings of DNA molecules.” This problem has been resolved here with the 100% complete nuclear genome of C. merolae, approximately 50 years after Watson and Crick’s DNA model. Together with the mitochondrial and plastid genome sequences previously determined by the Japanese community, all genetic information in a eukaryotic cell is now described as concrete sequences of nucleotides constituting 20 linear (nuclear) and 2 circular (organelle) DNA molecules. This paper therefore also marks an important epoch in human history.


Figure 1. Epifluorescence microscopy of the ultrasmall, hot-spring red alga Cyanidioschyzon merolae 10D.

All previously reported eukaryotic nuclear genome sequences have been incomplete, especially in highly repeated regions and at chromosomal ends. Because repetitive DNA is important for many aspects of biology, complete chromosomal structures are fundamental for understanding eukaryotic cells. Our earlier, nearly complete genome sequence of the ultrasmall, hot-spring red alga Cyanidioschyzon merolae 10D (Fig. 1) revealed several unique features, including just three ribosomal DNA copies, very few introns, and a small total number of genes. However, because the exact structures of certain functionally important repeated elements remained ambiguous, that sequence was not complete. Those ambiguities had to be resolved before the unique features of the C. merolae genome could be summarized, and they could only be resolved by completing the sequence. We therefore set out to close all remaining gaps and sequence all remaining chromosomal ends, and we now report the first nuclear-genome sequence for any eukaryote that is 100% complete.


Our present complete sequence consists of 16 546 747 nucleotides covering 100% of the 20 linear chromosomes from telomere to telomere. These 20 linear nuclear chromosomes, together with the two circular organellar DNA molecules (Ohta et al. 1998, 2003), make up the entire genome of the organism and total 16 728 945 base pairs (Table 1). The 100% complete nuclear genome reveals the simple and unique chromosomal structures of this eukaryotic cell. We have unambiguously established that the C. merolae genome contains the smallest known histone-gene cluster, a unique telomeric repeat shared by all chromosomal ends, and an extremely low number of transposons.

Genome / Chromosome     No. of nucleotides (bp)   Shape of chromosome   No. of protein-coding genes
Nucleus [1]
  1                     422 616                   Linear                102
  2                     457 013                   Linear                125
  3                     481 791                   Linear                144
  4                     513 455                   Linear                140
  5                     528 682                   Linear                161
  6                     536 163                   Linear                131
  7                     584 452                   Linear                173
  8                     739 753                   Linear                213
  9                     810 151                   Linear                231
  10                    839 707                   Linear                247
  11                    852 849                   Linear                236
  12                    859 119                   Linear                258
  13                    866 983                   Linear                249
  14                    852 727                   Linear                256
  15                    902 900                   Linear                265
  16                    908 485                   Linear                261
  17                    1 232 258                 Linear                355
  18                    1 253 087                 Linear                360
  19                    1 282 939                 Linear                384
  20                    1 621 617                 Linear                484
  Total                 16 546 747                                      4 775
  Unassigned            0                                               0
Plastid [2]             149 987                   Circular              208
Mitochondrion [3]       32 211                    Circular              34
Total of 3 genomes      16 728 945                                      5 017

Table 1. Key features of the 22 chromosomes constituting the three genomes of the hot-spring red alga Cyanidioschyzon merolae 10D

[1] Nozaki et al. (2007)
[2] Ohta et al. (2003)
[3] Ohta et al. (1998)
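
The totals reported in Table 1 can be re-derived from its per-chromosome rows. The following is a minimal Python check, not part of the published paper; the figures are copied from Table 1, and the variable names are illustrative only.

```python
# Per-chromosome values copied from Table 1:
# chromosome number -> (length in bp, number of protein-coding genes)
NUCLEAR = {
    1: (422_616, 102),    2: (457_013, 125),    3: (481_791, 144),
    4: (513_455, 140),    5: (528_682, 161),    6: (536_163, 131),
    7: (584_452, 173),    8: (739_753, 213),    9: (810_151, 231),
    10: (839_707, 247),   11: (852_849, 236),   12: (859_119, 258),
    13: (866_983, 249),   14: (852_727, 256),   15: (902_900, 265),
    16: (908_485, 261),   17: (1_232_258, 355), 18: (1_253_087, 360),
    19: (1_282_939, 384), 20: (1_621_617, 484),
}
PLASTID = (149_987, 208)        # circular
MITOCHONDRION = (32_211, 34)    # circular

# Sum the nuclear chromosomes, then add the two organellar genomes.
nuclear_bp = sum(bp for bp, _ in NUCLEAR.values())
nuclear_genes = sum(genes for _, genes in NUCLEAR.values())
total_bp = nuclear_bp + PLASTID[0] + MITOCHONDRION[0]
total_genes = nuclear_genes + PLASTID[1] + MITOCHONDRION[1]

print(f"Nuclear genome: {nuclear_bp} bp, {nuclear_genes} genes")
print(f"All 3 genomes:  {total_bp} bp, {total_genes} genes")
```

The sums agree with the "Total" and "Total of 3 genomes" rows of the table.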

Conclusions and Perspectives

Figure 2. Comparison of the nuclear genomes of Cyanidioschyzon, Ostreococcus (an ultra-small green alga) and Arabidopsis (a flowering plant). Asterisks above the generic names indicate telomere repeat sequences. The number of protein-coding genes in the nuclear genome of Schizosaccharomyces pombe has been updated to 5004.

By virtue of these attributes and others that we had discovered previously (Matsuzaki et al. 2004), C. merolae appears to have the simplest nuclear genome of the non-symbiotic and non-pathogenic eukaryotes (Fig. 2). Three kinds of genomes are found in many eukaryotic cells: nuclear, mitochondrial, and plastid. Based on the present nuclear genome data and the previously published mitochondrial and plastid genome sequences (Ohta et al. 1998, 2003), all major types of eukaryotic genetic information are present in C. merolae. In addition, as revealed by the present 100% complete genome, C. merolae contains unusually simple sets of genes and sequences (Fig. 2). For example, because almost all protein-coding nuclear genes of C. merolae lack introns (Fig. 2), the complete sequence of the genome provided here can be used directly to deduce the sequences of all of its proteins, which will make it extremely valuable for future proteomics research. Therefore, C. merolae represents an ideal model organism for studying the fundamental relationships among the plastid, mitochondrial, and nuclear genomes. The complete nuclear genome sequence reported here will greatly improve the precision of biological analyses of C. merolae, including studies of chromosome structure and gene structure/annotation. Furthermore, because C. merolae inhabits hot springs (45°C), most of its proteins must be unusually heat-stable, and so its proteome may well provide important new insights into the structural basis for heat stability of proteins.
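
To illustrate why intron-poor genes make protein deduction straightforward: when a gene has no introns, its protein can in principle be read codon by codon from the genomic DNA. The sketch below is a minimal Python illustration, not the authors' annotation method. It assumes the standard genetic code and an open reading frame on the forward strand; the function name `translate_orf` and the toy sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), with amino acids
# listed in TCAG codon order: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna: str) -> str:
    """Translate an intronless, forward-strand ORF up to the first stop codon.

    Raises KeyError if a codon contains characters other than A, C, G, T.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy sequence for illustration only (not a real C. merolae gene).
toy_orf = "ATGGCCAAATAA"
print(translate_orf(toy_orf))  # prints "MAK"
```

A gene with introns would first need its exon boundaries determined and the introns removed, which is the step that the near absence of introns in C. merolae makes unnecessary for most genes.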

Literature Cited

  1. Matsuzaki et al. (2004) Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature 428: 653-657.
  2. Ohta et al. (1998) Structure and organization of the mitochondrial genome of the unicellular red alga Cyanidioschyzon merolae deduced from the complete nucleotide sequence. Nucleic Acids Res. 26: 5190-5198.
  3. Ohta et al. (2003) Complete sequence and analysis of the plastid genome of the unicellular red alga Cyanidioschyzon merolae. DNA Res. 10: 67-77.

Information for Publication of the Paper

BMC Biology (published by the open-access publisher BioMed Central), volume 5: 28.
Published on 10 July 2007. A provisional PDF can be downloaded freely from the BMC Biology website, and the formal versions will be published soon. The abstract is available in PubMed.
The paper has been featured as a research highlight, "First complete eukaryotic genome", on the websites of BMC Biology and BioMed Central.
A 100%-complete sequence reveals unusually simple genomic features in the hot-spring red alga Cyanidioschyzon merolae.
Nozaki, H., Takano, H., Misumi, O., Terasawa, K., Matsuzaki, M., Maruyama, S., Nishida, K., Yagisawa, F., Yoshida, Y., Fujiwara, T., Takio, S., Tamura, K., Chung, S. J., Nakamura, S., Kuroiwa, H., Tanaka, K., Sato, N. & Kuroiwa, T. A