dna
DNA and genes
Introduction
- the haploid human genome consists of over three billion nucleotide pairs spread across 23 chromosomes; each nucleated diploid cell carries two copies, giving 46 chromosomes
- an average chromosome contains 130 million nucleotide pairs which, if laid out in a line, would be about 4 cm long
- to fit all of this DNA into the nucleus of a microscopic cell, the DNA must be tightly wound around proteins
- it is also organized so that specific segments can be accessed as needed by a specific cell type
- first level of organization, or packing, is the winding of DNA strands around histone proteins
- histones package and order DNA into structural units called nucleosome complexes, which can control the access of proteins to the DNA regions
- If a gene is to be transcribed, the nucleosomes surrounding that region of DNA can slide down the DNA to open that specific chromosomal region and allow access for RNA polymerase and other proteins, called transcription factors, to bind to the promoter region and initiate transcription
- as DNA is negatively charged, changes in the charge of the histone will change how tightly wound the DNA molecule will be
- unmodified, the histone proteins have a large positive charge;
- by adding chemical modifications like acetyl groups, the charge becomes less positive
- epigenetic modification of histone proteins can thus affect DNA transcription
- If a gene is to remain turned off, or silenced, the histone proteins and DNA have different modifications that signal a closed chromosomal configuration
- how the histone proteins move depends on signals found on the histone proteins
- These signals are “tags” – in the form of phosphate, methyl, or acetyl groups – that open or close a chromosomal region
- these tags are not permanent, but may be added or removed as needed
- cohesin is a protein that forms a ring-shaped complex which wraps the DNA molecule and alters its shape. It moves along the DNA and creates specific loops in the genetic material which determine the architecture of the genome and gene expression. The NIPBL protein (the cohesin loading factor), bound to the MAU2 protein, enables cohesin to bind to specific points on the DNA known as gene enhancers. These genomic regions are DNA sequences where transcription factors, such as members of the nuclear receptor superfamily, bind. NIPBL interacts with both MAU2 and the glucocorticoid receptor (GR), and the resulting NIPBL-MAU2-GR ternary complex modulates transcription: when the GR interacts with these two proteins, it alters the structure of chromatin and affects gene expression 1)
- the majority of DNA is made up of non-coding regions
- these do not normally result in the creation of proteins; however, when mutated, they have the potential to create novel “orphan” proteins, and it has been shown that a large proportion of these can have folded structures, which are important for them to have a function
- over half of our genome consists of thousands of remnants of ancient viral DNA, known as transposable elements (TEs)
- TE sequences contain cis-regulatory elements that enable recruitment of transcription factors (TFs) and chromatin remodelers, which in turn regulate and initiate transcriptional activity.
- many of these are reactivated during the first hours and days following fertilization and become part of the embryonic genome activation (EGA) transcriptome
- a hallmark of preimplantation development is embryonic genome activation (EGA), during which the embryo transitions from inherited maternal transcripts to genes transcribed from its own genome. EGA coincides with extensive reprogramming of both parental chromatins, as histone modifications are reestablished and TFs regain binding.2)
- it seems each mammalian species expresses distinct types of these elements 3)
- ~8% of our genome comprises sequences called Human Endogenous Retroviruses (HERVs), which are products of ancient viral infections that occurred hundreds of thousands of years ago.
- a set of specific HERVs expressed in the human brain contributes to psychiatric disorder susceptibility. In 2024, a study identified five robust HERV expression signatures associated with psychiatric disorders, including two HERVs that are associated with risk for schizophrenia, one associated with risk for both bipolar disorder and schizophrenia, and one associated with risk for depression 4)
- 24-hour rhythmicity in gene expression is ubiquitous: it is present in all eukaryotic and some prokaryotic species and in nearly all tissues, organs, and cells of these organisms. This rhythmicity in gene expression generates variation in molecular and physiological processes in central and peripheral tissues to optimize adaptation to the organism’s temporal niche, e.g., nocturnal or diurnal. This rhythmicity is driven by a core circadian molecular mechanism but also by environmental, systemic, and behavioral factors, which may affect either the core circadian machinery or the expression of genes outside the core circadian machinery. Factors include light-dark and endocrine cycles, the “sleep-wake” cycle, feeding-fasting cycles, immune challenges, and alternations between upright and supine posture.
- simulated microgravity (e.g. 60 days of constant bed rest with head-down tilt) severely disrupts rhythmic gene expression in humans, and even 10 days later, 3/4 of the transcriptome, including core circadian genes, was still affected 5)
- assessing “Identity by Descent” (IBD) segments can detect relatives up to the 6th degree (for example, second to third cousins, or a great-great-great-great-grandparent)
- regeneration of limbs and organs
- this appears to be orchestrated by the Hand2 gene which in humans becomes generally inactivated after embryogenesis but which is responsible for regenerating limbs in salamanders
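The 4 cm figure quoted in this introduction for an average chromosome can be checked with simple arithmetic. The sketch below is only a rough estimate: the 0.34 nm rise per base pair is the textbook value for B-form DNA and is an assumption not stated in these notes, and the function name is purely illustrative.

```python
# Rise per base pair of B-form DNA, in nanometres.
# Assumed textbook value; it is not given in the notes above.
RISE_PER_BP_NM = 0.34


def stretched_length_cm(base_pairs: int, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Length in centimetres of a DNA molecule laid out in a straight line."""
    nm_per_cm = 1e7  # 1 cm = 10,000,000 nm
    return base_pairs * rise_nm / nm_per_cm


# Average chromosome size quoted in the introduction
average_chromosome_bp = 130_000_000
print(f"{stretched_length_cm(average_chromosome_bp):.1f} cm")  # prints "4.4 cm"
```

With these assumptions the result is about 4.4 cm, in line with the rounded figure above.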
CRISPR gene editing
- CRISPR = “clustered regularly interspaced short palindromic repeats” found in bacteria as part of their adaptive immunity system
- by 2010, the basic function and mechanism of the CRISPR-cas system had become clear.
- this system comprises a genetic locus in which short repeats are separated by non-repetitive spacer sequences, together with 6–20 adjacent genes that encode CRISPR-associated (cas) proteins.
- use in genome editing 6)
- cas-9 in particular is being used by scientists to selectively edit an organism's DNA as it allows genetic material to be added, removed, or altered at particular locations in the genome
- They create a small piece of RNA with a short “guide” sequence that attaches (binds) to a specific target sequence in a cell's DNA, much like the RNA segments bacteria produce from the CRISPR array.
- This guide RNA also attaches to the Cas9 enzyme. When introduced into cells, the guide RNA recognizes the intended DNA sequence, and the Cas9 enzyme cuts the DNA at the targeted location, mirroring the process in bacteria.
- Once the DNA is cut, they use the cell's own DNA repair machinery to add or delete pieces of genetic material, or to make changes to the DNA by replacing an existing segment with a customized DNA sequence.
- VERVE-101 deactivates the PCSK9 gene in the liver by using CRISPR to change just one letter in its DNA. After a single dose, this appeared to permanently halve LDL cholesterol levels in patients with heterozygous familial hypercholesterolemia (HeFH), although some safety concerns will need to be addressed. This was the first trial of such technology in humans 7)
- a Jan 2024 paper demonstrated a new CRISPR technique that restored sight in blind mice by editing 20% of their retinal cells. The authors re-engineered the eVLP (engineered virus-like particle) used to deliver the gene editing tools and made tweaks to two parts of their prime editor, including the Cas9 protein: the prime editor cargo (protein-RNA complexes) must be efficiently packaged into eVLPs when the particles form but must also be efficiently released from the particles after target cell entry. They also tested the new system in the brains of mice and found that it edited nearly 50% of the cells in the target region without making any unwanted edits. Compared to a standard prime editor packed into the original eVLP, the new combination was 170 times more efficient at editing human cells and could be used to correct almost 90% of genetic conditions 8)
- bacterial roles:
- adaptive immune role:
- it is an adaptive immune system of bacteria and archaea, which protects the bacteria from invaders, including bacteriophages or phages and mobile genetic elements (MGEs).
- it degrades foreign genetic elements in three steps 9):
- after recognition, a spacer sequence is integrated into the CRISPR array
- expression of CRISPR RNA (crRNA), in which pre-CRISPR RNA (pre-crRNA) is transcribed by RNA polymerase (RNAP) and then cleaved into small crRNAs (aka “guide RNAs”) by specific endoribonucleases
- crRNAs then recognize foreign RNA or DNA and base-pair with it with almost perfect complementarity, leading to cleavage of the crRNA-foreign nucleic acid complex
- if there is any mutation in the protospacer adjacent motif (PAM) or a mismatch between the spacer and the invader’s DNA, cleavage does not occur and the host is susceptible to infection
- bacterial pathogenicity role:
- the system also controls endogenous transcription and is involved in the regulation of bacterial pathogenicity
- for example, it may repress bacterial lipoproteins and hence improve the ability of the bacteria to escape activation of Toll-like receptor 2 (TLR2) complement inflammatory response mechanism in host phagocytic cells
- Neisseria meningitidis uses cas9 for attachment to the host cell surface and for intracellular replication
- Campylobacter jejuni uses cas9 for attachment as well as for invasion in epithelial cells
- antimicrobial resistance role
- regulation of bacterial lipoproteins (BLP) can promote envelope integrity and resistance to certain antibiotics
- role in promoting genetic heterogeneity and new bacteria species
- the CRISPR-cas system is divided into three types, all having cas1 and cas2 proteins, since these two proteins play a key role in spacer acquisition:
- CRISPR-cas system type I
- has cas3 gene
- present in most bacteria and archaea
- CRISPR-cas system type II
- has cas9 gene
- present only in bacteria and is the simplest type
- CRISPR-cas system type III
- has cas10 gene
- most commonly present in archaea
- in 2023, a new search algorithm identified 188 kinds of new rare CRISPR systems in bacterial genomes 10)
- they found several new variants of known Type I CRISPR systems, which use a guide RNA that is 32 base pairs long rather than the 20-nucleotide guide of Cas9. Because of their longer guide RNAs, these Type I systems could potentially be used to develop more precise gene-editing technology that is less prone to off-target editing.
- the new systems could potentially be harnessed to edit mammalian cells with fewer off-target effects than current Cas9 systems.
- they could also one day be used as diagnostics or serve as molecular records of activity inside cells
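The targeting rule described in this section (a roughly 20-nucleotide guide that must sit next to a PAM) can be sketched as a simple sequence search. This is a minimal illustration, not a guide-design tool: the `NGG` PAM is the motif of the commonly used Streptococcus pyogenes Cas9 and is an assumption not stated in these notes, the function name is hypothetical, and only the strand passed in is scanned. Real tools also scan the opposite strand and score off-target matches.

```python
import re


def find_cas9_targets(dna: str, guide_len: int = 20):
    """Return (position, protospacer, pam) for every site on the given strand
    where a guide_len-nucleotide protospacer is followed by an NGG PAM.

    NGG is assumed here (the PAM of S. pyogenes Cas9); other Cas proteins
    recognise different motifs.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


def guide_rna_for(protospacer: str) -> str:
    """Spacer portion of a guide RNA matching the protospacer (T replaced by U)."""
    return protospacer.upper().replace("T", "U")


if __name__ == "__main__":
    # Made-up sequence for illustration only
    sequence = "GATTACAGATTACAGATTACTGGCC"
    for position, protospacer, pam in find_cas9_targets(sequence):
        print(position, protospacer, pam, guide_rna_for(protospacer))
```

A site with no adjacent PAM is never returned, which mirrors the point above that cleavage does not occur when the PAM is mutated.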
DNA replication
- there is a complex process to ensure DNA is replicated accurately when cells are dividing
- nearly all cells divide into just two cells, each with its own copy of the DNA
- some bacteria such as Corynebacterium matruchotii undergo multiple fission, dividing into multiple cells at once; this presumably is what allows the colonies that form dental plaque biofilms to grow up to half a millimeter per day 11)
- in 1- and 2-cell embryos, DNA is replicated uniformly with slower fork movement, and replication at these stages has higher copy error rates 12)
- from the 8-cell stage onwards, DNA is replicated sequentially in different regions with faster fork movement
- a molecular complex called CTF18-RFC in humans and Ctf18-RFC in yeast loads a “clamp” onto DNA to keep parts of the replication machinery from falling off the DNA strand; this is the leading strand clamp loader. There is also a lagging strand clamp loader (RFC) 13)
- replication begins by unzipping DNA's ladder-like structure, resulting in two strands called the leading and lagging strands
- replication machinery including DNA polymerases then assembles the missing halves of the strands, turning a single DNA helix into two
- DNA polymerases
- pathology:
- at least 40 diseases, including many cancers and rare disorders, have been linked to problems with DNA replication
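The unzip-and-rebuild process described in this section can be sketched with the standard base-pairing rules (A with T, G with C). This is a deliberately simplified model with hypothetical function names: it ignores forks, clamps, primers and copy errors, and simply shows each parental strand acting as the template for a new partner strand.

```python
# Watson-Crick pairing rules for DNA
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Build the partner of `strand`, written 5'->3' (the reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def replicate(duplex):
    """Split a duplex (top, bottom) and rebuild the missing half of each strand.

    Each daughter duplex keeps one parental strand and gains one new strand.
    """
    top, bottom = duplex
    daughter_1 = (top, complement_strand(top))        # old top + new bottom
    daughter_2 = (complement_strand(bottom), bottom)  # new top + old bottom
    return daughter_1, daughter_2


if __name__ == "__main__":
    parent = ("ATGC", complement_strand("ATGC"))
    print(replicate(parent))
```

In this error-free model both daughters are identical to the parent duplex; the diseases listed above arise when the real machinery fails to achieve that.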
DNA to RNA
- RNA is usually a single-stranded molecule (ssRNA); unlike DNA, its sugar backbone is ribose, not deoxyribose, and whereas the complementary base to adenine in DNA is thymine, in RNA it is uracil
- several types of RNA are synthesized within the cell by RNA polymerases based upon the base sequences in DNA
- messenger RNA (mRNA)
- to convey genetic information (using the bases of guanine, uracil, adenine, and cytosine, denoted by the letters G, U, A, and C) that directs synthesis of specific proteins
- non-coding RNA
- performs the function itself rather than synthesizing a protein to do it
- ribosomal RNA (rRNA)
- forms part of the ribosomes, the structures which read the base sequences on mRNA and then synthesize protein
- transfer RNA (tRNA)
- delivers amino acids to the ribosome for incorporation into protein strands
- precursor tRNAs (pre-tRNAs) require nucleolytic removal of 5′-leader sequences (by the endoribonuclease RNase P) and 3′-trailer sequences for maturation, which is essential for proper tRNA function
- HARP protein forms a star-shaped complex of 12 protein molecules, making it capable of cutting both the 5' and 3' ends of tRNA 14)
- small nuclear RNA (snRNAs)
- found within the splicing speckles and Cajal bodies of the cell nucleus in eukaryotic cells - spliceosomes catalyse splicing, an integral step in eukaryotic precursor messenger RNA maturation
- primary function is in the processing of pre-messenger RNA (hnRNA) in the nucleus
- they also aid in the regulation of transcription factors (7SK RNA) or RNA polymerase II (B2 RNA), and maintaining the telomeres
- are always associated with a set of specific proteins, and the complexes are referred to as small nuclear ribonucleoproteins (snRNP, often pronounced “snurps”)
- eg. U1 spliceosomal RNA, U2 spliceosomal RNA, U4 spliceosomal RNA, U5 spliceosomal RNA, and U6 spliceosomal RNA
- mutations in these may result in diseases such as spinal muscular atrophy, dyskeratosis congenita, Prader–Willi syndrome, and medulloblastoma
- small nucleolar RNAs (snoRNAs)
- a class of small RNA molecules that primarily guide chemical modifications of other RNAs, mainly ribosomal RNAs, transfer RNAs and small nuclear RNAs.
- small interfering RNAs (siRNAs)
- part of the RNA interference (RNAi) pathway
- microRNA (miRNA)
- see also micro RNAs (miRNAs)
- small, single-stranded, non-coding RNA molecules containing 21 to 23 nucleotides
- base-pair to complementary sequences in mRNA molecules
- are involved in RNA silencing and post-transcriptional regulation of gene expression
- 90 families of miRNAs have been conserved since at least the common ancestor of mammals and fish, and most of these conserved miRNAs have important functions
- at least 40% of miRNA genes may lie in the introns or even exons of other genes
- 6% of human miRNAs show RNA editing which increases the diversity and scope of miRNA action beyond that implicated from the genome alone
- guide RNA (gRNA)
- a short sequence of RNA that functions as a guide for the Cas9-endonuclease or other Cas-proteins that cut the double-stranded DNA and thereby can be used for CRISPR gene editing
- PIWI-interacting RNA (piRNA)
- can silence jumping genes (aka transposons or transposable elements (TEs)) and can do so variably so that new transposon mutations can also be silenced 15)
- piRNA malfunctions have been linked to conditions such as human male infertility
- alternative splicing
- a genetic process where different segments of genes are removed, and the remaining pieces are joined together during transcription to messenger RNA (mRNA)
- this allows the evolutionary creation of new protein isoforms but also, perhaps more importantly, has a role in regulating gene expression levels
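The base-pairing rule given at the start of this section (uracil rather than thymine pairs with adenine in RNA) can be sketched as a small transcription function. This is a simplified illustration with hypothetical names: it assumes the template strand is supplied 5'->3' and returns the RNA 5'->3', and it ignores promoters, splicing and every other step of real transcription.

```python
# Pairing of each DNA template base with the RNA base placed opposite it
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Return the RNA complementary to a DNA template strand.

    The template is read in reverse so that the RNA comes out 5'->3'.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template_strand.upper()))


if __name__ == "__main__":
    # The coding strand here would be ATGC; its template partner is GCAT.
    print(transcribe("GCAT"))  # prints "AUGC"
```

The RNA produced matches the coding strand with every T replaced by U.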
Epigenetics
- DNA epigenetics organises the available genes and RNA epigenetics dynamically adjusts their use and offers incredibly precise regulation of gene activity, essential to the development of organisms and the harmonious functioning of cells 16)
- this mechanism is particularly important in key stages like cells' development or their specialisation into different types
- these modifications also occur in response to the exposome
- see also gene regulation
DNA damage and repair
- DNA-protein cross-links (DPCs)
- these are highly toxic lesions in which proteins become covalently attached to DNA, blocking essential processes such as replication and transcription and thus these need to be removed to ensure the cell's genomic stability and ability to divide
- DPCs can result from:
- natural cellular activities:
- Reactive Oxygen Species (ROS)
- reactive aldehydes:
- endogenous aldehydes (e.g., formaldehyde, malondialdehyde) act as bridges, cross-linking protein amino groups to DNA bases
- topoisomerase trapping:
- during DNA replication or transcription, TOP1 or TOP2 enzymes can become covalently trapped on DNA when trying to relieve tension, forming Top-DPCs
- exogenous issues:
- UV radiation
- excessive alcohol consumption, which increases aldehyde levels
- cytotoxic drugs
- toxins
- exposure to chromium (Cr), nickel (Ni), and formaldehyde
- the protease SPRTN was the first enzyme identified to resolve these lesions by cleaving the protein component from DNA - SPRTN repairs DPCs not only during replication (S phase) but also mitosis (M phase)
- inherited inactivating mutations in SPRTN cause Ruijs-Aalfs progeria syndrome (RJALS), a rare disorder marked by premature aging and early-onset liver cancer
- if this repair process is inadequate, DPCs can leak into the cytoplasm which activates the cGAS-STING innate immune pathway through recognition of cytosolic DNA and micronuclei and this activation can result in cell death 17)
dna.txt · Last modified: 2026/01/30 22:58 by gary1