DNA Microarray and Bioinformatics Technologies: A Mini-Review

This bibliographic review gives a concise overview of how DNA microarrays became one of the most important recent technologies. In brief, DNA sequences acting as probes are fixed on small slides in arrays to detect fluorescently tagged nucleic acids from a given sample. Nucleic acids whose sequences are complementary to the probes hybridize on the array slides and, because they are labeled, are easily detected using a specialized scanner and software. According to the way they are manufactured, DNA microarrays fall into three main types: spotted arrays, in situ synthesized arrays, and self-assembled arrays. Bioinformatics, the use of mathematical and computational methods to manage biological information, plays a crucial role in this technology because it is used to design specific primers and probes suitable for each sample investigated. Moreover, although DNA microarrays are widely used to determine gene expression levels, they are also used in single nucleotide polymorphism (SNP) genotyping. For this application, more than one approach can be used, including allele discrimination by hybridization, the array primer extension (APEX) assay, and the Illumina Infinium assay. The APEX approach is discussed in more detail because it has been found useful in screening for inherited genetic diseases such as Stargardt disease. Finally, the review ends with a short discussion of some limitations of DNA microarrays arising from hybridization kinetics, as well as difficulties in designing arrays for highly variable genomes.


INTRODUCTION
An "array" is an ordered arrangement of items, and the term is used for a remarkable recent technology in which DNA sequences are fixed in arrays on microscope slides or chips through chemical bonds. The purpose of these "DNA arrays" is to determine whether fluorescently tagged nucleic acids are present in a given biological sample, along with their abundance [1]. These nucleic acids hybridize with the DNA molecules on the array slides through Watson-Crick base pairing and, because they are labeled, are easily detected. As a result, thousands of DNA molecules bound to the micro-slide are tested simultaneously [2,3]. Bioinformatics, the use of computer science to manage biological information, plays a crucial role in this technology because it provides the downstream analysis of DNA from the target sample, such as sequencing [4,5]. In fact, it is meaningless to carry out DNA microarray experiments without bioinformatics.
The early history of DNA arrays began with the colony hybridization method proposed by Hogness and Grunstein in 1975. These scientists randomly cloned DNA into E. coli plasmids and used nitrocellulose filters to cover agar Petri dishes plated with the transformed cells [6]. After lysis, the colonies' DNA was denatured and fixed on the filters, generating a collection of cloned DNA fragments. These were then hybridized with a radio-labeled probe of interest, which made it easy to screen thousands of colonies containing cloned DNA [6]. Building on this approach, in 1979 Gergen constructed a mechanical device that replicated numerous micro-titer plates on agar, generating arrays of 1728 colonies in an area of only 26 × 38 cm. His protocol also included transferring the colonies to filter paper, followed by cell lysis, denaturation, and fixing of the DNA on the filter, which produced DNA arrays on filters that could be re-used many times [3,6]. This was a great advantage because it saved considerable time and effort.
Later, in 1995, Patrick Brown and his research group automated the protocols developed by Gergen using robotic systems. They measured the expression of 48 Arabidopsis thaliana genes using cDNA derived from PCR products, printing the array with a purpose-built robot [6,7]. By the next decade, fluorescent detection had become part of DNA array technology, which gave a further advantage because hybridized DNA was easier to visualize and could be read quickly by the system [2,8,9]. Further research gradually improved the protocols; for example, the very long sequences used at the beginning did not yield much success [7]. Shorter oligonucleotides of 25-60 bp gave higher specificity because they were designed to bind to target sites in the gene that were most divergent from other sites or genes [6,9]. In this bibliographic review, we discuss the basic principle of DNA microarrays and their main types, the role of bioinformatics in this technology, and one of its most important applications, as well as their limitations.
Proc. Nat. Res. Soc., 2, 02010 (2018)

EXPERIMENTAL PRINCIPLES OF DNA MICROARRAYS
As mentioned earlier, the whole technique is based on matching unknown and known DNA samples via Watson-Crick base pairing [1]. The known DNA samples are called 'probes' and are spotted and fixed in their thousands on glass microscope slides or silicon chips. These can be oligonucleotides, cDNA, or even just DNA [2,4]. The unknown DNA samples, on the other hand, are those to be analyzed (for example, for their gene expression level) and are tagged using reporter molecules such as fluorophores, which replaced radioactive labels because of the health risks the latter pose [1]. If the target sites in the genes are present, they pair with the complementary probes and emit a fluorescence signal that can be read by specialized camera and computer systems. The following subsections describe in more detail the main steps in measuring gene expression levels in a biological sample using DNA microarray technology [2,7]: sample preparation and tagging, hybridization, washing, image acquisition, and normalization.

Sample Preparation and Tagging
First, mRNA has to be extracted from the biological sample of interest and purified. A control must be included in the experiment as well (e.g. diseased tissue vs. healthy tissue). Next, tagging involves performing a reverse transcription reaction to synthesize a complementary DNA (cDNA) strand [10]. In this method, a poly T primer is annealed to the mRNA so that reverse transcription starts from the polyadenylated 3' end, at the 3' untranslated region (UTR) of the mRNA. A proportion of the nucleotides added in this reaction (dATP, dGTP, dCTP, and dTTP) carry a covalently attached fluorescent dye (e.g. only dCTP is labeled with a Cy dye) [11]. Diseased and healthy samples can also be tagged with different dyes, such as Cy3 (excited by a green laser) and Cy5 (excited by a red laser), to distinguish between them when both are used on the same microarray (Figure 1). Thus, the tagged transcripts hybridize to their complementary cDNA probes and are eventually visualized as colored spots under the camera. One reason for initiating reverse transcription at the 3' UTR is that it is considered the most variable site in genes, which gives better specificity when designing probes [12,13].
Moreover, the cDNA produced by reverse transcription can also be labeled using the Klenow fragment of DNA polymerase I and random priming [14]. In this reaction, cDNA primed with random primers is extended by the Klenow fragment in the presence of tagged dCTPs. The product of the reaction includes short labeled fragments that are complementary to both strands of the gene [13,14]. This helps researchers to check for cross-hybridization on the arrays.

Hybridization
In this step, the DNA probe on the glass micro-slide and the tagged target cDNA pair through Watson-Crick base pairing [4]. This can be accomplished either manually or using a robotic system. In the manual approach, the array is placed in a special chamber, where the researcher injects the solution containing the target cDNA onto the array under sterile conditions and incubates it at a set temperature for 12 to 24 hours [1,15]. In the automated approach, everything is performed by a programmed robot at a specialized station, which saves time and effort and gives better control over the temperature, usually between 45 and 65°C [16]. Hybridization is affected by many conditions, such as salt concentration, temperature, formamide concentration, humidity, and the amount of target solution [17]. For example, a higher temperature and a lower salt concentration increase stringency, meaning that only specific strands hybridize. To limit or prevent cross-hybridization, repetitive DNA and poly T or poly A can be added to mask the genomic repeat sequences and the polyadenylation sites on the cDNA, respectively [18].

Washing
Removing excess hybridization solution from the microarray is one reason why this step is crucial, as it ensures that only the tagged target cDNA to be measured remains bound to the microarray [19]. In addition, washing raises the stringency and so limits cross-hybridization. Researchers can use low-salt solutions containing 0.1% SDS and 0.1× standard saline citrate (SSC). Many automated hybridization stations include a washing cycle as part of the whole process [20].

Image Acquisition
This can be considered the final step in the experimental process, in which an image of the results is taken. Because the bound target cDNA is tagged with fluorescent dyes, these can be excited by light of a suitable wavelength and emit characteristic colors, as shown in Figure 2 [1,21]. The microslides are therefore placed under a scanner with two lasers (for example, to excite two different dyes for diseased and healthy tissues) to be read [4,8]. For better accuracy, the optics are moved across the whole slide to read every point on the microarray, and the pixel size (the physical area each pixel represents) is set to match the laser spot so that the light read does not come from neighboring spots on the microarray [1,21].

Normalization
Because errors may arise during image acquisition, bias within microarrays should be corrected before the final analysis [5]. This process is known as normalization, and it serves as a 'calibration' that removes systematic variation between samples. Sources of inconsistency include different scanner settings, hybridization properties, and dye efficiencies [22]. These have a large effect on experimental results and may therefore lead to misleading conclusions. Although there are various normalization methods, LOWESS (locally weighted scatterplot smoothing) is commonly used: it detects systematic variation by a locally weighted linear regression as a function of the log10 intensity and subtracts the calculated best-fit average log2 ratio from the observed log2 ratio of each spot [23]. After normalization, the expression ratio, which is the normalized value for the expressed gene over that of the control, can be calculated using the formula $T_i = R_i / G_i$, where $i$ indexes the gene and $R$ (red) and $G$ (green) represent the target and control, respectively. Using $T_i = \log_2(R_i / G_i)$ instead treats up- and down-regulation symmetrically: a two-fold increase gives $+1$ and a two-fold decrease gives $-1$ [5,23].
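The log ratio described above can be sketched in a few lines of Python. This is a minimal illustration only: the intensities are made-up values, the function names are hypothetical, and the normalization shown is a simple global median centering, which is a much cruder stand-in for LOWESS and not the method of [23].

```python
import math
from statistics import median

def log2_ratios(red, green):
    """Return T_i = log2(R_i / G_i) for paired spot intensities."""
    return [math.log2(r / g) for r, g in zip(red, green)]

def median_center(log_ratios):
    """Global normalization: shift all log ratios so their median is zero.

    This assumes most genes are not differentially expressed; it is a
    simplified substitute for intensity-dependent methods such as LOWESS.
    """
    m = median(log_ratios)
    return [t - m for t in log_ratios]

# Hypothetical background-corrected intensities for four spots.
red = [800.0, 400.0, 1600.0, 200.0]    # target sample (Cy5)
green = [400.0, 400.0, 400.0, 400.0]   # control sample (Cy3)

raw = log2_ratios(red, green)
normalized = median_center(raw)
```

With these inputs, `raw` is `[1.0, 0.0, 2.0, -1.0]` and the median-centered values are `[0.5, -0.5, 1.5, -1.5]`, so a two-fold increase and a two-fold decrease sit at equal distances from zero before centering.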

COMMON TYPES OF DNA MICROARRAYS
DNA microarrays can be classified according to the technology used to make them. Although there are two main technologies, spotted microarrays and in situ synthesized microarrays, a third type, the self-assembled microarray, can be mentioned as well.

Spotted Microarrays
This method was proposed by DeRisi in 1996 and is considered the first microarray technology. It is based on robotic spotters that transfer DNA from a single micro-titer dish onto numerous glass slide microarrays by dipping a slotted pin [2,6]. The DNA probes, however, are synthesized before spotting. The use of glass slides allows the sample to be fluorescently labeled. As mentioned earlier, fluorescent detection offers several advantages over the chemiluminescent or radioactive labeling used for filter-based microarrays [24]: it is quite sensitive, less expensive, and requires a less complicated protocol than the other two labeling methods [25]. Moreover, fluorescent labeling allows researchers to label two or more samples with different colors and co-hybridize them on the same microarray [26].

In Situ Synthesized Microarrays
This approach was initially proposed by Fodor in 1991. Unlike in spotted microarrays, the DNA probes are built base by base on the glass microarray surface [3]. Each nucleotide added to the probe has a protective group on its 5' position to prevent more than one base being added during each synthesis step. The protective group is then converted to a hydroxyl group using either acid or light [27]. In 1994, Fodor and his colleagues developed the Affymetrix technology, which is based on the latter approach. It uses photolithography, in which light is directed only at selected areas of the microarray with the help of a mask [28]. However, each synthesis step requires a unique mask, and masks are very expensive to produce. One main advantage of the Affymetrix technology over spotted microarrays is that synthesizing DNA sequences directly on the surface requires fewer reagents [3,27].
In 1996, another group of researchers led by Blanchard introduced inkjet array technology. Their method uses chemicals to convert the protective groups, and at each synthesis step nucleotides are fired onto the target spot using nozzles of the kind found in inkjet printers [29,30]. A major advantage of this technology over the Affymetrix one is that a computer input controls the synthesis of the oligonucleotides on each microarray. Although this is very flexible, it is less efficient when making a large number of identical microarrays [29,30].

Self-assembled Microarrays
This technology is based on synthesizing DNA on minute polystyrene or silicon beads of about 3 μm [6,9]. The beads are then deposited randomly on silicon wafers or optical fibers containing arrays of micro-wells, each of which catches a bead and holds it in place through strong adhesion forces, as illustrated in Figure 3 [2]. David Walt, a researcher at Tufts University in the United States, invented this approach, which was later licensed by Illumina, a leading company in array manufacturing and oligonucleotide analysis [9].
Because different bead types are used, each carrying a particular oligonucleotide sequence, it is both possible and essential to map the bead locations using a 'decoding' method [6,31]. This process involves a series of hybridization and washing steps in which fluorescently tagged complementary sequences attach to their specific sequence on the bead, making it possible to track the location of each bead type [2,9,31]. Another advantage is the high degree of control over each microarray feature [9,31]. Later, Illumina manufactured etched glass surfaces to hold the beads as an alternative to the silicon option [6]. Although the three main types of microarray technology discussed above do not cover the whole history of DNA microarrays, they dominate the methodology and underlie other new advances in manufacturing.

THE ROLE OF BIOINFORMATICS IN DNA MICROARRAYS
Bioinformatics is the use of mathematical and computational methods to manage biological information, and it is one of the most recent fields of biological research [1]. It is useful at different stages of microarray design and interpretation, e.g. sequence search, alignment, and analysis (such as melting temperature prediction using the nearest-neighbor thermodynamic model), as well as oligonucleotide design for probes and PCR primers [1,5]. For instance, the website https://www.ncbi.nlm.nih.gov/pmc/, which belongs to the National Center for Biotechnology Information (NCBI), is a public resource for bioinformatics that hosts a series of databases relevant to biomedicine, and NCBI is an essential source of bioinformatics tools and databases such as GenBank and PubMed [3,6]. Accordingly, using the available sequence information for a microbe of interest, one can perform a systematic genome analysis to find the optimum probe and primer choices under specified conditions [2]. Other analyses include sequence alignment, when many strains are expected to be found in the sample, and computational checks of whether PCR primers self-anneal or form hairpins, so that such primers can be avoided and the amplification yield maximized [1].
Using these tools, the researcher can also compare a designed probe, which is a short sequence, against a genome sequence, which is a much longer one. The aim is either to confirm that no significant similarity exists or to find a close match [3,7]. This can be done using local alignment search algorithms such as LAlign and BLAST, which are fast and freely available. In multiple alignment, by contrast, all sequences are aligned to each other and the arrangement that gives the best alignment is chosen [1]. Multiple alignment uses a set of rules to generate an alignment score, and the programs look for high-scoring segments between the input sequences. This is extremely helpful in microarray design: if sequences representing the target organisms are entered, the alignment can reveal conserved regions with little sequence variability as well as diverged ones [2,3].
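To make the idea of a local alignment score concrete, the following is a minimal Smith-Waterman sketch in Python. It is not how BLAST or LAlign are implemented (BLAST in particular relies on heuristics for speed); the scoring values and the example sequences are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Uses a linear gap penalty; the scoring parameters are illustrative.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + s,   # match or mismatch
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

# Hypothetical short probe compared against a longer sequence.
probe = "GATTACA"
genome = "CCCGATTACATTT"
score, aligned_probe, aligned_genome = smith_waterman(probe, genome)
```

Here the probe occurs exactly once in the longer sequence, so the best local alignment covers all seven bases with a score of 14. A low best score against the rest of a genome would suggest that the probe is unlikely to cross-hybridize.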
Conserved regions can be used to design primers that anneal to all targets, allowing several targets to be amplified with a single primer set [5]. Primers are designed using computer software that can efficiently list all primer sets meeting given design criteria for the entered target sequence. One of the most commonly used programs is Oligo Design, which automatically evaluates 3' end stability, melting temperature, and self-annealing as it searches a target gene sequence [1]. Another feature of this software is that an entered primer set is evaluated for potential primer dimers and for whether it yields the amplicon [1,5].
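The kind of self-annealing and primer-dimer screening described above can be approximated with simple string checks. The sketch below is a crude heuristic written for illustration; it is not the algorithm used by any particular primer design program, and the function names, the minimum run length, and the example primers are all assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def longest_self_complementary_run(primer, min_len=4):
    """Return the longest segment whose reverse complement also occurs
    in the primer, or "" if none is at least min_len bases long.

    Such a segment hints at possible self-annealing or hairpins. The check
    ignores loop length and thermodynamics, so it only flags candidates.
    """
    primer = primer.upper()
    n = len(primer)
    for length in range(n, min_len - 1, -1):
        for start in range(n - length + 1):
            segment = primer[start:start + length]
            if reverse_complement(segment) in primer:
                return segment
    return ""

def may_form_dimer(primer_a, primer_b, k=4):
    """True if the last k bases at the 3' end of primer_a can pair
    with some stretch of primer_b."""
    tail = primer_a.upper()[-k:]
    return reverse_complement(tail) in primer_b.upper()

# Hypothetical primers for illustration.
clean = longest_self_complementary_run("AAAAAAAAAAAA")
palindromic = longest_self_complementary_run("GGGAAATTTCCC")
dimer = may_form_dimer("ACGTACGTGGCC", "TTTTGGCCTTTT")
```

A fully palindromic primer such as the second example is returned whole, which marks it as a poor candidate, while the poly-A primer returns an empty string.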
Designing the probes is the next step after identifying the target genes, and it depends on a number of variables such as melting temperature, which itself depends on the probe's length [1,4,6]. Short oligonucleotide probes may be 20 to 100 nucleotides long, whereas cDNA sequences or PCR products may be 100 to 1000 nucleotides long. The probe length is chosen according to hybridization specificity and selectivity as well as the microarray cost [1].
Moreover, the melting temperature also depends on the GC content, DNA concentration, and salt concentration. Ideally, all of the probes on a particular microarray should have the same melting temperature [2,4]. However, this can be a problem for in situ array printing technologies, which use fixed-length probes. Here again, temperature-matched probes are designed from the start with the aid of bioinformatics: a computational search of a gene under specified temperature constraints generates a set of probe choices that are expected to give consistent hybridization intensity if the target gene is present in the sample [1].
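The dependence of melting temperature on length, GC content, and salt can be illustrated with two widely used textbook approximations, neither of which is taken from the sources cited here: the Wallace rule for short oligonucleotides and a salt-adjusted GC formula. The nearest-neighbor model mentioned earlier is more accurate but is omitted for brevity. The function names, the default sodium concentration, and the candidate probes are assumptions.

```python
import math

def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def tm_wallace(seq):
    """Wallace rule: 2 degrees C per A/T and 4 per G/C (short oligos only)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def tm_salt_adjusted(seq, na_molar=0.05):
    """Approximate Tm in degrees C:
    81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N."""
    n = len(seq)
    return (81.5 + 16.6 * math.log10(na_molar)
            + 41.0 * gc_fraction(seq) - 675.0 / n)

def select_probes(candidates, tm_min, tm_max, na_molar=0.05):
    """Keep only the candidates whose estimated Tm lies in [tm_min, tm_max]."""
    return [p for p in candidates
            if tm_min <= tm_salt_adjusted(p, na_molar) <= tm_max]

# Hypothetical 24-mer candidates with 0%, 100%, and 50% GC content.
candidates = ["AT" * 12, "GC" * 12, "ATGC" * 6]
matched = select_probes(candidates, tm_min=45.0, tm_max=60.0)
```

Under these assumptions only the 50% GC candidate falls inside the 45 to 60 °C window, which shows why fixed-length probes with very different GC content are hard to temperature-match.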

APPLICATIONS OF DNA MICROARRAYS
DNA microarray technology is widely used to measure levels of gene expression. This was discussed in detail above in terms of the methodology and the different approaches researchers use when dealing with different sources of DNA and RNA [7]. Other applications include genotyping, in which scientists detect single nucleotide polymorphisms (SNPs), a type of genetic variation among people caused by a difference in a single nucleotide [32].
Many human diseases, such as β-thalassemia, sickle-cell anemia, and cystic fibrosis, are caused by SNPs. SNPs are also useful in characterizing the allelic diversity of certain genes and in mapping genomic loci [33]. For genotyping, allele-specific oligonucleotides are used as probes, and this can be accomplished using different assays. First, using Affymetrix microarrays, one can discriminate alleles by placing oligonucleotides complementary to each allele on the microarray and hybridizing labeled target DNA to them [34].
The variant nucleotide is placed in the center of the oligonucleotide (around 25 bp long), as this position has the most impact on hybridization (Figure 4). Using multiple microarray positions for each allele usually gives a better signal and more accurate results [2,35]. Secondly, in APEX (array primer extension), the oligonucleotide is attached to the microarray through its 5' end, and its 3' end stops one nucleotide short of the SNP. When the target DNA is hybridized to the microarray, the oligonucleotide is extended by a single-nucleotide "dye terminator" sequencing reaction [36].
Finally, in Illumina's Infinium assay, which is similar to APEX, the oligonucleotide extension takes place on beads and the added nucleotides are tagged with specific haptens instead of fluorescent molecules. Haptens are small molecules that can be bound by fluorescently tagged proteins and thus eventually detected [37]. All of these techniques have been very successful and are widely used in the field of SNP genotyping.
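For the first of these assays, allele discrimination by hybridization, the placement of the variant base at the center of each probe can be sketched as follows. This is an illustrative helper only: the function name, the toy reference sequence, and the choice to report probes on the reference strand (rather than the strand complementary to the labeled target) are assumptions.

```python
def allele_probes(reference, snp_index, alleles, probe_len=25):
    """Build one probe per allele with the variant base at the probe center.

    reference : sequence surrounding the SNP
    snp_index : 0-based position of the SNP within reference
    alleles   : iterable of single bases, e.g. "GT"
    probe_len : odd probe length so that a single central base exists
    """
    if probe_len % 2 == 0:
        raise ValueError("probe_len must be odd to have a central base")
    half = probe_len // 2
    start, end = snp_index - half, snp_index + half + 1
    if start < 0 or end > len(reference):
        raise ValueError("SNP is too close to the end of the sequence")
    left = reference[start:snp_index]
    right = reference[snp_index + 1:end]
    return {allele: left + allele + right for allele in alleles}

# Hypothetical 25-base context with a G/T polymorphism at position 12.
reference = "A" * 12 + "G" + "C" * 12
probes = allele_probes(reference, snp_index=12, alleles="GT")
```

Each returned probe is 25 bases long and differs from the other only at the central position, which is where a single mismatch destabilizes the duplex the most.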

Array Primer Extension (APEX) Assay
As mentioned previously, one of the approaches to genotyping single nucleotide polymorphisms (SNPs) is the array primer extension assay, also known as APEX [38]. It is more beneficial than other laboratory assays because its small reaction volumes (e.g. on small slides) lower reagent costs. The main principle of APEX is that oligonucleotides are attached to the glass microarray slide through their 5' ends, and a complementary PCR-amplified fragment from the DNA sample is annealed to them [39].
Subsequently, DNA polymerase extends the 3' ends of the primers with dye-tagged nucleotides (e.g. ddNTPs) via sequence-specific single-nucleotide extension [32]. Then, during the washing step, unincorporated ddNTPs and sample DNA fragments are removed, as shown in Figure 5. After signal detection, the nucleotide being typed is identified from the dye-tagged nucleotide (e.g. T) bound to the oligonucleotide on the microarray slide [3,39].
This microarray genotyping approach can be applied to the ABCA4 gene, mutations in which cause an inherited retinal disease known as Stargardt disease that leads to progressive vision loss [40]. This gene is an extremely hard target for diagnostic applications because it contains 50 exons and has more than 450 known mutations [3,40]. In a study of 136 confirmed Stargardt samples, which were also screened by single-strand conformation polymorphism (SSCP), microarray screening detected 70% of all disease-associated alleles, whereas SSCP screening detected only 55% [3,40]. This means that the method was efficient in screening for known variants, comparable to direct sequencing, and can therefore be used to pre-screen patients with suspected ABCA4-associated retinal diseases.
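The final read-out step can be sketched as a simple genotype call from the four dye-terminator signals at one array position. The thresholds, the function name, and the intensities below are hypothetical and chosen only to illustrate the logic; they do not come from any published APEX protocol.

```python
def call_apex_genotype(signals, het_ratio=0.3, min_signal=100.0):
    """Call a genotype from the four dye-terminator intensities at one spot.

    signals    : dict mapping each base ("A", "C", "G", "T") to its intensity
    het_ratio  : a second base counts as present if its signal is at least
                 this fraction of the strongest signal (illustrative value)
    min_signal : below this intensity no call is made (illustrative value)

    Returns the incorporated base(s), e.g. "GG" or "AG", or None for no call.
    """
    ranked = sorted(signals.items(), key=lambda kv: kv[1], reverse=True)
    (top_base, top), (second_base, second) = ranked[0], ranked[1]
    if top < min_signal:
        return None
    if second >= het_ratio * top:
        return "".join(sorted((top_base, second_base)))
    return top_base * 2

# Hypothetical intensities for three spots.
homozygous = call_apex_genotype({"A": 50, "C": 40, "G": 2000, "T": 60})
heterozygous = call_apex_genotype({"A": 1500, "C": 30, "G": 1400, "T": 20})
no_call = call_apex_genotype({"A": 10, "C": 12, "G": 8, "T": 9})
```

The three calls are `"GG"`, `"AG"`, and `None`. Note that the reported bases are those incorporated at the primer's 3' end, which are complementary to the bases in the annealed sample strand.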

LIMITATIONS OF DNA MICROARRAYS
Even though DNA microarrays have been very useful in various applications, the technology, like any other invention, has a number of limitations. First, because of hybridization kinetics, the signal at a microarray location is linearly proportional to the concentration of the hybridizing species only within a limited range of concentrations [6,15]. At low concentrations there is no detectable binding, and at high concentrations the microarray becomes saturated.
Secondly, when researchers are dealing with several related DNA sequences (e.g. mammalian genomes), it is hard, and sometimes even impossible, to design a microarray that distinguishes them, because they may end up binding to the same probe [32]. For example, a probe designed to detect gene X can also detect other genes with a highly similar sequence. This is a major problem for gene families and for genes with multiple splice variants [41]. Thirdly, each microarray is designed to detect only certain DNA sequences [42]. In gene expression analysis, a gene that has not yet been annotated in the genome will not be detected on the microarray [42].
Furthermore, for highly variable genomes, which are characteristic of bacterial species, microarrays are designed from the genome information of a reference strain [43]. For an isolate from another strain of the same species, these microarrays may miss a large fraction of the genes [43].
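The saturation behaviour can be illustrated with a Langmuir-type binding curve, a common simplified model of probe-target hybridization that is assumed here for illustration and is not taken from the cited sources. The parameter values, the units, and the background term are hypothetical.

```python
def langmuir_signal(concentration, s_max=1000.0, k_d=1.0, background=0.0):
    """Signal predicted by a Langmuir isotherm.

    concentration : target concentration (arbitrary units)
    s_max         : signal when every probe is occupied
    k_d           : concentration that gives half of s_max
    background    : constant non-specific signal added to every spot
    """
    return background + s_max * concentration / (k_d + concentration)

# Doubling the concentration at the low end nearly doubles the signal...
low_ratio = langmuir_signal(0.002) / langmuir_signal(0.001)
# ...but barely changes it once the spot is close to saturation.
high_ratio = langmuir_signal(200.0) / langmuir_signal(100.0)
half_max = langmuir_signal(1.0)
```

With a non-zero `background`, very weak signals at the low end become indistinguishable from noise, which corresponds to the lower limit of the usable concentration range.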

CONCLUSION
In conclusion, despite the limitations of DNA microarrays, one cannot disregard their important applications, such as measuring gene expression levels and genotyping, in various fields of the biological and biomedical sciences. Based on simple principles such as hybridization and fluorescence labeling, and thanks to the role that bioinformatics plays in this technology, DNA microarrays have been developed into numerous types.
Thanks to scientific research, some of these approaches, such as APEX, were later adapted to particular applications through a few adjustments to the protocols, which gave them advantages over conventional laboratory methods that were costly and time-consuming. Although other technologies such as DNA sequencing may replace DNA microarrays for genotyping in the future, working with these tiny 'chips' over the past 20 years has led to essential discoveries on other research questions. Therefore, it is worth continuing to develop this technology, which could be beneficial in other scientific fields.

Notes
The author declares no competing financial interest.

Figure 1. A summary of the DNA microarray protocol in which the cDNA of the control sample (healthy tissue) is labelled with the green dye Cy3 and that of the experimental sample (diseased tissue) is labelled with the red dye Cy5 [4].

Figure 2. A microarray scanner that excites the dyes incorporated into hybridised molecules on the microarray surface. The fluorescence of the dye is measured by a photomultiplier tube (PMT) and then converted to a digital signal [1].

Figure 3. An illustrated summary of the self-assembled microarray protocol. These are also known as bead-based microarrays [2].