Metagenomics (MG) can be defined as the study of the combined genomes in an environment containing a mixture of organisms. The advantages of metagenomics are best understood through a global initiative for the surveillance of emerging and re-emerging infectious diseases, the aim being to analyse and understand the transmission pathways and reservoirs of pathogens. It is intended to be a sequence-based analysis of the genomes contained within an environmental sample. This blog is meant to provide you with a brief insight into the factors taken into consideration before undertaking a metagenomic study, an introduction to 16S rRNA gene profiling, whole-genome sequencing, and the dynamics of a single-cell genome. A glossary is attached at the end for quick reference to the terms used in the course of this blog.
MG can be viewed as a project flowchart, from the development of a research question to the final data reporting and sharing stages. The workflow can be categorised into three phases: (1) the design phase, (2) the data extraction phase, and (3) the data analysis phase.
The design phase takes a researcher from identifying a core research question through to developing a study design and acquiring samples (in whatever number is deemed appropriate) for genome isolation and analysis of the variables. The design phase is the most crucial step: no experiment can be carried out without a strong study design and research question. This phase requires the researcher to have obtained the necessary approvals for testing and isolation of DNA/RNA from animal cells before initiating the experiment, and mandates strict adherence to local standards, with approval from regulatory bodies and informed consent (and feedback) from patients. Samples are handled and stored as the researchers deem appropriate for the timeline of their project, to prevent damage to the samples or degradation of the DNA. MG is greatly useful in the study of antimicrobial resistance (AMR) genes, as discussed below.
The extraction phase uses the sample isolates to prepare a library, which is then sequenced with short- or long-read technologies. One approach is shotgun metagenomics: the sequencing of random DNA fragments drawn from across a microbial community, as an alternative to amplicon sequencing. Once sequencing is complete, the data are checked for potential contamination and other errors, and then carried forward into statistical modelling under the analysis division of metagenomics.
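To make the shotgun idea concrete, here is a minimal Python sketch. It is purely illustrative: the `community` string and the `shotgun_reads` helper are hypothetical, and real library preparation and sequencing involve far more than random substring sampling.

```python
import random

def shotgun_reads(genome: str, n_reads: int, read_len: int, seed: int = 42) -> list:
    """Toy shotgun sequencing: sample reads at random positions in a genome."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append(genome[start:start + read_len])
    return reads

# A toy "community": two genomes mixed together, as in a metagenomic sample.
community = "ATGCGTACGTTAGC" * 20 + "GGCATTACGATCCA" * 20
for read in shotgun_reads(community, n_reads=3, read_len=25):
    print(read)
```

Assembling or classifying such random fragments back to their source organisms is exactly the challenge the analysis phase has to solve.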
The final division, data analysis, can be split in two based on the approach taken by the researcher. Either approach leads to the acquisition of large subsets of data that are arranged categorically to generate a larger dataset, metadata or epidata. Some procedures move directly from sampling to data storage, as when a previously discovered disease is catalogued: the incidence is reported and the data are archived. In other instances, when a new variant or infectious disease is discovered, the data are moved forward for epidemiological analysis, reporting and documentation.
This article deals only with the preliminaries of metagenomics and discusses the challenges in brief.
The use of MG is most common in AMR surveillance, where it serves to detect and quantify genes associated with AMR. AMR hinders treatment strategies, often leading to complications that can cause a patient's death if not identified and treated early. Additionally, AMR is known to increase the duration of treatment and the overall costs. AMR can be (1) intrinsic, wherein the organism innately produces enzymes that inactivate the drug, alters the affinity of the drug's target, or prevents the drug from accessing its target; or (2) acquired, wherein mutational changes or the uptake of new genes alter the microbe's resistance. In a bid to improve early surveillance of AMR, to guide policy changes and to understand the impact of AMR on treatment, MG is a strong and valuable tool: it is cost-effective and reduces overall time by enabling data sharing and cross-oceanic collaboration through a standardised language of data sharing. MG additionally enables the pooling of known samples, which can provide better surveillance results against the true status of the population. A concern here is the trade-off between specificity and sensitivity.
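A quick back-of-the-envelope sketch of why pooling helps: assuming (hypothetically) independent carriage of an AMR gene at a fixed prevalence, the chance that a pool contains at least one positive sample grows rapidly with pool size, though at the cost of individual-level resolution.

```python
def pool_positive_prob(prevalence: float, pool_size: int) -> float:
    """P(pool contains >= 1 positive), assuming independent carriage."""
    return 1 - (1 - prevalence) ** pool_size

# At a 2% prevalence, modest pooling greatly raises the hit rate per test.
for k in (1, 5, 10, 20):
    print(f"pool of {k:2d} samples: P(positive) = {pool_positive_prob(0.02, k):.3f}")
```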
In analytical research, high precision and high accuracy are valuable, but a fear I am certain any researcher would share with me is a low-accuracy, high-precision result. In simpler terms, when answering a research question, it is better to know that your answers are wrong because they scatter away from where they should be across multiple attempts; finding your results consistently at the same point, yet being unable to prove whether that point is right or wrong, can leave a researcher blue-faced. Identifying false positives and false negatives is crucial in that regard, and metagenomics helps by providing strategies to weigh specificity against sensitivity in such scenarios. Specificity (SP) is the number of true negatives (TN) divided by the sum of true negatives and false positives (FP) [SP = TN/(TN+FP)], while sensitivity (SY) is the number of true positives (TP) divided by the sum of true positives and false negatives (FN) [SY = TP/(TP+FN)]. Specificity is assessed against samples known to lack the target of interest, while sensitivity is assessed against samples known to contain it.
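The two formulas translate directly into code. A minimal sketch, with counts invented purely for illustration:

```python
def sensitivity(tp: int, fn: int) -> float:
    """SY = TP / (TP + FN): the fraction of true positives that are detected."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """SP = TN / (TN + FP): the fraction of true negatives correctly excluded."""
    return tn / (tn + fp)

# Hypothetical AMR-gene screen: 90 hits confirmed, 10 missed,
# 85 negatives correctly excluded, 15 false alarms.
print(f"Sensitivity: {sensitivity(tp=90, fn=10):.2f}")  # 0.90
print(f"Specificity: {specificity(tn=85, fp=15):.2f}")  # 0.85
```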
A key segment of any MG analysis is the choice of sequencing platform. Platforms are discussed here in two sets: short-read techniques and long-read techniques.
Short-read techniques: examples include the Ion Torrent, Roche 454 and Illumina sequencers, all of which are designed to sequence short fragments of nucleic acid using a "sequencing by synthesis" strategy. A crucial reason Illumina is preferred over 454 and Ion Torrent lies in how nucleotide additions are signalled: in the latter two, each incorporation releases a signal whose strength indicates how many identical nucleotides were added. When identical nucleotides sit in close proximity (a homopolymer) and the run of repeats grows long, the signal saturates, causing the machine to over- or under-estimate the number of identical nucleotides in that stretch. Roche 454 is no longer in use, and Illumina is the preferred choice of NGS for high-throughput sequencing, with an error rate of approximately 0.5% and reads of less than 500 bp.
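The homopolymer problem can be caricatured in a few lines of Python. This is only a toy model of signal saturation (the threshold of six is an arbitrary, assumed value), but it shows how long runs get under-called:

```python
import itertools

def call_homopolymers(sequence: str, saturation: int = 6) -> str:
    """Toy base caller: homopolymer runs longer than the saturation
    point are truncated, mimicking a saturated flow signal."""
    called = []
    for base, run in itertools.groupby(sequence):
        length = len(list(run))
        # Beyond saturation the signal no longer scales with run length,
        # so the caller reports at most `saturation` copies of the base.
        called.append(base * min(length, saturation))
    return "".join(called)

true_seq = "ACGTTTTTTTTTG"              # contains a run of nine Ts
print(call_homopolymers(true_seq))      # ACGTTTTTTG: the run is under-called
```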
Long-read techniques: examples include PacBio and Oxford Nanopore (OxNano). PacBio uses a fixed DNA polymerase, the enzyme that catalyses the synthesis of DNA molecules from molecular precursors and is essential to the replication machinery that builds the double helix. The immobilised polymerase sits at the bottom of a transparent, deep-bored well, allowing the instrument to analyse the amplicon for colorimetric differences and read long strands of nucleic acid in real time, with a specific colour assigned to each corresponding nucleotide; the camera at the upper end of the well detects the signals and reports them for display. OxNano, on the other hand, is a pocket-drive-sized device that can read very long sequences of up to 900 kilobase pairs (kb). It reads DNA through a protein pore: a motor protein above the pore unwinds the opened DNA helix and feeds a single strand through, and the resulting signal is reported as sequence. OxNano has an error margin of up to 8%, as opposed to PacBio's 15% in a single pass.
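The phrase "in a single pass" matters because repeated passes over the same molecule let random errors average out, which is how PacBio recovers accuracy in its consensus reads. A minimal sketch, assuming purely random substitution errors (real error profiles also include insertions and deletions):

```python
import random
from collections import Counter

BASES = "ACGT"

def noisy_pass(template: str, error_rate: float, rng: random.Random) -> str:
    """Simulate one sequencing pass with random substitution errors."""
    return "".join(
        rng.choice(BASES.replace(base, "")) if rng.random() < error_rate else base
        for base in template
    )

def consensus(passes: list) -> str:
    """Per-position majority vote across repeated passes of one molecule."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

rng = random.Random(7)
template = "ACGTACGTACGTACGTACGT"
passes = [noisy_pass(template, error_rate=0.15, rng=rng) for _ in range(9)]
errors = sum(a != b for a, b in zip(consensus(passes), template))
print(f"consensus mismatches vs template: {errors}")  # typically 0
```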
The scientific community has come a long way in the last five decades, with modern machinery for diagnostics and visualisation playing a key role in aiding clinicians, academics and researchers alike in developing better surveillance strategies to target infectious diseases and treat non-infectious diseases. Metagenomics will play a key role in the years to come, with an increased focus on achieving high standards of resolution and clarity in biomedical devices and on making them available at lower cost: a "number one priority" ("unum numero prioritas").
Glossary:
- Sensitivity: The ability of a test to correctly identify those who carry the target (true positives).
- Specificity: The ability of a test to correctly exclude those who do not carry the target (true negatives).
- Accuracy: The ability of a test to measure the true value.
- Precision: The ability of a test to give similar results in repeated determinations on the same sample.
- Genetics: The study of individual genes.
- Genomics: The study of the structure, function and evolution of whole genomes.
- Metagenome: The genetic material recovered directly from an environmental sample.
- Antibiotic: A substance produced by microorganisms that, in dilute concentrations, inhibits or kills other microorganisms.
- Antimicrobial: A substance that negatively affects microbial life (all antibiotics are antimicrobials, but not vice versa).
- 16S rRNA gene: The gene encoding the small ribosomal subunit RNA, used to reconstruct phylogenies because specific conserved regions of it evolve slowly.
- Library preparation: The preparation of genetic material into a form compatible with sequencing instruments.
- Long-read sequencing: Sequencing that produces reads greater than 10 kilobase pairs.
- Short-read sequencing: Sequencing that produces reads shorter than 600 base pairs.
- NGS: Modern sequencing technologies that read many DNA fragments in parallel, with far higher throughput than Sanger sequencing.
- Special thanks to Professor Sunje Johanna Pamp, University of Denmark