Genome assembly is the process of taking many small pieces of genetic sequence and merging them into a coherent whole that represents an organism's entire genome. It is a major focus of bioinformatics, and a variety of genome projects exist to carry it out. Genome assembly has been used to begin analyzing the genomes of many species, including humans, other animals, plants, and bacteria.
Analyzing an organism's genes is a long process, and genome assembly is one of its first steps. Many other analysis methods build on a successful assembly, and gene identification cannot proceed without one. Even before any genes are found, a successful assembly yields useful information for later analysis, including the size of the genome, its structure, and its general composition.
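As a rough illustration of the kind of information an assembly can give before any genes are found, the Python sketch below computes the total size and base composition of a set of assembled pieces. The function name summarize_assembly and the three short sequences are made up for this example (assembled pieces are often called contigs), and a real assembly would be far larger and would usually be read from a file.

```python
def summarize_assembly(contigs):
    """Return basic size and composition figures for a list of DNA sequences."""
    total_length = sum(len(c) for c in contigs)
    # Count G and C bases across all pieces, ignoring letter case.
    gc_count = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    gc_fraction = gc_count / total_length if total_length else 0.0
    return {
        "num_contigs": len(contigs),
        "total_length": total_length,
        "gc_fraction": gc_fraction,
    }


# Hypothetical toy data, not taken from any real genome.
contigs = ["ACGTACGT", "GGCC", "ATAT"]
stats = summarize_assembly(contigs)
print(stats)
```

The fraction of G and C bases is only one simple measure of composition, chosen here because it can be computed directly from the sequence.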
Genome assembly is like putting together a jigsaw puzzle with no picture and no distinctive piece shapes as a guide. The first genome pieces, called raw reads, rarely give any indication of where a particular piece goes, or even how it is oriented. Every piece is written in the same four DNA bases, abbreviated A, C, G, and T. The genome could be compacted into one large chromosome or split across many. Nor is there any guarantee that every raw read is distinct: some may be duplicates covering the same area of the genome, which would mean that less unique information exists than it appears at first glance.
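The orientation and duplication problems can be made concrete with a small Python sketch. DNA is double-stranded, so a read may come from either strand, and the same stretch can appear either as written or as its reverse complement. The code below treats a read and its reverse complement as the same piece and keeps only one copy of each. The helper names and the four toy reads are assumptions made for this example, and the sketch catches only exact duplicates, whereas real reads usually overlap partially and contain sequencing errors.

```python
# Translation table mapping each base to its complementary base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(read):
    """Return the sequence of the opposite strand, read in its own direction."""
    return read.translate(COMPLEMENT)[::-1]


def canonical(read):
    """Pick one fixed representative for a read and its reverse complement."""
    return min(read, reverse_complement(read))


def unique_reads(reads):
    """Drop reads that duplicate an earlier read in either orientation."""
    seen = set()
    unique = []
    for read in reads:
        key = canonical(read)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


# Hypothetical toy reads: "CGTT" is the reverse complement of "AACG".
reads = ["AACG", "CGTT", "AACG", "GGTA"]
print(unique_reads(reads))  # prints ['AACG', 'GGTA']
```

Four reads go in, but only two distinct pieces of information come out, which is the sense in which raw data can hold less than it appears to.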
General knowledge of genome structure is invaluable when starting the assembly process. Although genomes differ markedly between species, specific genome types follow certain rules, and these rules can be applied when putting together another genome of the same type. For example, if a certain type of organism always has a particular pattern near its genes, one could reasonably assume, when assembling a similar organism, that finding such a pattern signals a gene nearby. On a larger scale, many bacterial genomes consist of one circular chromosome, so it would be reasonable to anticipate that all the raw reads from a new bacterium would somehow fit together on one chromosome. Applying general genetic knowledge in this way can allow a researcher to start making sense of potentially hundreds of thousands of pieces of data.
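Both examples can be sketched in a few lines of Python. The function below looks for a known pattern in a sequence and can optionally treat the sequence as circular, so that a match running past the end and continuing at the start is still found. The function name find_motif, the toy sequence, and the use of "TATAAT" as the signal are assumptions made for illustration, and real signals near genes are rarely exact, fixed strings.

```python
def find_motif(sequence, motif, circular=False):
    """Return 0-based start positions of motif in sequence, overlaps included."""
    if not motif or len(motif) > len(sequence):
        return []
    # For a circular chromosome, append the first few bases to the end so
    # that matches spanning the end-to-start junction are found as well.
    text = sequence + sequence[:len(motif) - 1] if circular else sequence
    positions = []
    start = text.find(motif)
    while start != -1:
        positions.append(start)
        start = text.find(motif, start + 1)
    return positions


# Hypothetical toy chromosome and signal pattern.
genome = "AATGCCTATAATGCGTAT"
print(find_motif(genome, "TATAAT"))                 # prints [6]
print(find_motif(genome, "TATAAT", circular=True))  # prints [6, 15]
```

Treating the sequence as a line finds one match, while treating it as a circle finds a second that wraps around the end, which shows how an assumption about genome shape changes what is found in the same data.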
Many other methods can be used in genome assembly, including computational predictions and manual comparisons. Whatever the method, genome assembly is a large job that is often time-consuming and difficult. Because it is the basis for many later genetic analyses of an organism, there is little room for error.
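To give a flavor of the computational side, the Python sketch below uses greedy overlap merging, one simple textbook idea: repeatedly find the two pieces whose ends overlap the most and join them. It is a deliberately simplified illustration and not the method of any particular genome project. The function names, the min_overlap setting, and the toy reads are assumptions, and the sketch ignores read orientation, sequencing errors, and repeated regions, all of which make real assembly much harder.

```python
def overlap_length(left, right, min_overlap=3):
    """Longest suffix of left that equals a prefix of right (0 if too short)."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Merge reads pairwise, always joining the pair with the largest overlap."""
    pieces = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, left in enumerate(pieces):
            for j, right in enumerate(pieces):
                if i == j:
                    continue
                length = overlap_length(left, right, min_overlap)
                if length > best_len:
                    best_len, best_i, best_j = length, i, j
        if best_len == 0:
            break  # no remaining pair overlaps enough to merge
        merged = pieces[best_i] + pieces[best_j][best_len:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces


# Hypothetical toy reads, given out of order.
reads = ["TACCTGA", "ATGGCGT", "GCGTACC"]
print(greedy_assemble(reads))  # prints ['ATGGCGTACCTGA']
```

Always taking the largest overlap first is easy to follow but can join the wrong pieces when a genome contains repeated stretches, which is one reason the job leaves so little room for error.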