Each DNA base is one of four nucleotides (A, C, G, or T), so it can be encoded in 2 bits, and a single byte (8 bits) can represent 4 DNA base pairs. To express the entire diploid human genome in bytes, we can perform the following calculation: 6×10^9 base pairs/diploid genome × 1 byte/4 base pairs = 1.5×10^9 bytes, or 1.5 gigabytes, about 2 CDs' worth of space!
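The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 2-bit code table and the function names `pack_bases` and `genome_bytes` are assumptions made for this example, not a standard file format, and real sequence data also has to handle ambiguous bases such as N.

```python
# Hypothetical 2-bit code for each base; the specific bit patterns are an
# arbitrary choice made for this illustration.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_bases(seq):
    """Pack a string of A/C/G/T into bytes, 4 bases per byte."""
    packed = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            # Shift the previous bases left and append this base's 2 bits.
            byte = (byte << 2) | CODES[base]
        # Left-align a final chunk that holds fewer than 4 bases.
        byte <<= 2 * (4 - len(chunk))
        packed.append(byte)
    return bytes(packed)


def genome_bytes(base_pairs):
    """Bytes needed at 4 base pairs per byte, rounded up."""
    return -(-base_pairs // 4)


print(pack_bases("ACGT"))       # one byte holding four bases
print(genome_bytes(6 * 10**9))  # size of a diploid genome in bytes
```

Packing four bases per byte is what gives the 1.5×10^9 byte figure; storing the same sequence as plain text, at one byte per base, would take four times as much space.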
How big is a genome file?
A binary alignment/map (BAM) file, which contains the sequences, base qualities, and alignments to a reference sequence, is about 80-90 gigabytes in size for a 30x whole genome. The BAM files for a modest sample size (1,000 genomes) might therefore consume roughly 80-90 terabytes of disk space.
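As a rough check on the cohort estimate, the multiplication can be written out. This is a back-of-the-envelope sketch that assumes decimal units (1 terabyte = 1,000 gigabytes); the function name `cohort_storage_tb` is made up for this example.

```python
def cohort_storage_tb(samples, gb_per_bam):
    """Total disk space in terabytes, assuming 1 TB = 1,000 GB."""
    return samples * gb_per_bam / 1000


# Lower and upper ends of the 80-90 GB per-file range quoted above.
low = cohort_storage_tb(1000, 80)
high = cohort_storage_tb(1000, 90)
print(f"{low:.0f} to {high:.0f} TB for 1,000 genomes")
```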