

The Eukaryotic Cell

2.1 Cellular Information Processes

The cell is a reactive system. This ability is intimately connected with a highly evolved information processing capability. The underlying processing plant is implemented by a highly connected system of interacting bio-polymers, the class of polymers that are produced by living organisms.

Molecular Complementarity. The notion of molecular complementarity is central in all chemical interactions. Like most relationships, the bonds formed by atomic and molecular entities are based on the mutual satisfaction of interests.

Some of the formed relationships are very strong and may form stable molecules from otherwise separate entities. This is the case for covalent bonds caused by the sharing of electrons between atoms of complementary valence.

Other relationships are weaker and may form stable or less stable molecular complexes. Ionic bonds are caused by electrostatic forces between atoms of complementary charge, i.e., one is electronegative and the other electropositive.

Hydrogen bonds are caused by (weaker) electrostatic forces between an electronegative atom and an (electropositive) hydrogen atom bound in a dipolar constellation to another electronegative atom. Finally, van der Waals interaction is a weak unspecific attractive force between atoms in close proximity.

Water, the solvent of the cell, is an example of a dipole; it exhibits a small positive charge near the hydrogen atoms and a small negative one near the oxygen. The water solubility of molecules is determined by their electrochemical properties: charged molecules are generally soluble, whereas the solubility of uncharged molecules is determined by their ability to form hydrogen bonds with water molecules. Hydrophobic (water insoluble) molecules tend to associate tightly when submerged in water. This is called the hydrophobic effect.

2.1.1 Bio-molecular Agents and their Interactions

For the small molecules typically found in ordinary solution chemistry it is easy to predict the effects of these various forces. This is different, however, for bio-molecular agents, which are mostly long polymeric chains composed of smaller monomeric units with individual electrochemical properties. The complicated structure of these entities results in a very involved notion of molecular complementarity that induces a high degree of binding specificity. This is one of the features that most distinguish biochemistry from typical solution chemistry.

Figure 2.1: Protein-protein interaction.

Three types of bio-polymers play a central role in cellular information processing:

Proteins. The proteins are the main actors of the cellular metabolism, i.e., the set of chemical reactions that occur in the living cell. They are sequences of amino acids, i.e., simple acids consisting of a residue, an amino group, and a carboxylate group. About 20 different amino acids occur in living organisms.

The properties of any protein are determined by the sequence of amino acids from which it is composed. This sequence constitutes the primary structure of the protein.

In turn, the properties of each individual amino acid are determined by its acid residue, which may be non-polar, basic, acidic, or uncharged polar. The diversity of these properties allows non-trivial interaction patterns to emerge within the primary structure. In particular, hydrogen bonds may form between certain residues and cause stable localised foldings, such as α-helices, β-sheets, and turns. The resulting spatial arrangements constitute the secondary structure of the protein.

The properties of the resulting chain of secondary structure elements are largely determined by the properties of the individual substructures. Some sections may be hydrophilic (water soluble), and others hydrophobic. When submerged in water, proteins seek the most stable conformation and invariably fold to hide hydrophobic sections and expose the hydrophilic ones. This is an example of the hydrophobic effect. Internal formation of strong or, more commonly, weak bonds further stabilises the folded structures. The resulting 3-dimensional conformations constitute the tertiary structure of the protein.

The properties of a protein that has folded into a stable conformation are determined by the resulting solvent-accessible surface. The physical and electrochemical contours of the surface are characteristic for the protein. Two folded polymers may exhibit surface areas of (nearly) complementary contour and these binding sites then allow them to interact biochemically. If the binding sites match well they allow the formation of many stabilising bonds. The molecules then have a high affinity for one another and may form stable structures called complexes or coordination compounds. The number and organisation of subunits in such a compound constitutes the quaternary structure. If the interaction is a brief reaction and the protein is a catalyst of change in the other molecule we call it an enzyme. In contrast, if the protein is changed by the interaction we call it a substrate.

DNA. Deoxyribonucleic Acids are the main carriers of cellular hereditary information. They are sequences of nucleotides, i.e., monomeric units consisting of a base, a sugar, and one or more phosphate groups. In DNA the sugar is always a deoxyribose and the bases are Adenine, Guanine, Cytosine, and Thymine.

Every cell has a repository of hereditary information stored in DNA. The smallest unit of heredity is the gene. The collection of all genes present in a cell is the genome and, under normal circumstances, this is an extremely stable entity.

Figure 2.2: DNA double helix [Jon07].

This stability is due to the structure of DNA. The polymer sequence is formed as the sugar phosphates are linked up sequentially to form the backbone. From this backbone the bases protrude as a sequence of stubs. The bases are prone to the formation of hydrogen bonds between pairs (A-T and C-G). Thus, complementary strands can base-pair and link up in anti-parallel to form a very stable duplex shaped like a double helix. Genetic information is invariably stored in this form.

It is usual to represent DNA as strings over the corresponding four letter alphabet: A, C, G, T. The sequence is directional; one end is denoted by 5′ and the other by 3′, denotations that reveal the positions of recognisable terminators on the sugar-ring of the first and last molecule synthesised, respectively. Thus, DNA and RNA are invariably synthesised in the direction 5′ → 3′, which is called downstream. The opposite direction is upstream.
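The string representation and the anti-parallel pairing rule (A-T, C-G) can be combined in a few lines of code. The following is a minimal sketch, not part of the original text: the function name `reverse_complement` is chosen here for illustration, and the input is assumed to be an upper-case string over A, C, G, T written 5′ → 3′.

```python
# Watson-Crick pairs named in the text: A-T and C-G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the strand that base-pairs with `strand`.

    Both the input and the result are written 5' -> 3'. Because the two
    strands of a duplex run anti-parallel, the partner is read in the
    opposite direction, hence the reversal.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original string, which mirrors the fact that each strand of the duplex determines the other.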

RNA. Ribonucleic Acids are the main facilitators of interaction between protein and DNA/RNA. They are nucleic acids where the sugar is a ribose and the bases are Adenine, Cytosine, Guanine, and Uracil.

Structurally, RNA is very similar to DNA. In contrast to DNA, however, it mostly occurs as comparatively short single-stranded chains of nucleotides and, due to the extra hydroxyl group of ribose, it is more prone to hydrolysis. Consequently RNA is relatively unstable and tends to fold into more stable (3-dimensional) conformations. Folded RNA molecules often form mixed ribonucleoprotein (RNP) complexes with proteins.

Thus, RNA strands can act both as information carriers (coding RNA) and functional units (non-coding RNA):

Coding RNA is synonymous with messenger RNA (mRNA). Molecules of this type carry transcripts of genes that encode proteins from the genome to the ribosomes, where the proteins are produced in accordance with the transcribed information.

The major types of non-coding RNA also play important roles in the transfer of information from DNA to proteins. Ribosomal RNA (rRNA) molecules constitute the main part of the ribosomes, the enzymes that read mRNA in order to produce proteins. Two ribosomal ribonucleoproteins (rRNPs), known as the large and the small subunit, respectively, form a functional ribosome. Transfer RNA (tRNA) molecules are the adaptors that select and hold individual amino acids in place for the ribosomal processing.

Other types of non-coding RNA serve in various regulatory capacities throughout the information transfer process.

2.1.2 The Central Dogma

The Central Dogma of Molecular Biology, first pronounced by Crick in 1958 [Cri58], states that the molecular flow of information is from DNA via transcription to RNA and from RNA via translation to protein(s). As shown in Figure 2.3 there are known exceptions, but these are associated with abnormal conditions.

CASE: Transcription of Genes. Each gene encodes either a set of protein isoforms or a non-coding RNA string. The first step in actually producing these entities is the transcription of the corresponding DNA into RNA.

Figure 2.3: Flow of information in biological systems.

Figure 2.4: The transcription of genes [Jon07].

The stretch of DNA that is the gene consists of two regions. The coding region is what describes the actual product(s) and upstream from that is the promoter region.

A number of enzymes are involved in the transcription process, which has three phases: During initiation an RNA polymerase attaches at the promoter and melts the DNA locally. During elongation the polymerase synthesises a primary transcript RNA from the sense strand (the one that codes the gene) by chaining of nucleoside tri-phosphates (NTPs). Finally, during termination the nascent (growing) RNA strand and the polymerase are released from the DNA.
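At the level of strings, elongation can be illustrated with a small sketch. It rests on assumptions that go beyond the text: strands are upper-case strings written 5′ → 3′, the function names are hypothetical, and initiation, termination and regulation are ignored entirely. The sketch shows two equivalent views: the transcript carries the sequence of the sense strand with U in place of T, and the same transcript is obtained by base-pairing nucleotides against the complementary (template) strand.

```python
# RNA nucleotide that pairs with each DNA base (U takes the place of T).
RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_sense(sense: str) -> str:
    """Transcript read off the sense strand: same sequence, T replaced by U."""
    return sense.replace("T", "U")


def transcribe_template(template: str) -> str:
    """Transcript built by pairing against the complementary strand.

    `template` is written 5' -> 3'. It is traversed from its 3' end so
    that the RNA grows in the 5' -> 3' direction.
    """
    return "".join(RNA_PAIR[base] for base in reversed(template))


sense = "ATGGCT"
template = "AGCCAT"  # reverse complement of the sense strand
print(transcribe_sense(sense))        # AUGGCU
print(transcribe_template(template))  # AUGGCU
```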

Transcription is heavily regulated. A number of regulatory regions, located in the upstream or downstream area of the promoter, accommodate the binding of transcription factors. These affect the affinity of the promoter for RNA polymerase and may be activators or repressors, depending on their influence.

Figure 2.5: Post-transcriptional processing of pre-mRNA into mRNA.

The gene usually contains both coding regions (exons) and non-coding regions (introns). The introns are removed post-transcriptionally from the primary transcript, called precursor mRNA (pre-mRNA), by a process called splicing.

Regulated variations in this process allow a single gene to code for a family of mRNAs.
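Splicing, and the way variations in it yield several mRNAs from one gene, can be sketched as a selection of exon intervals from the primary transcript. This is an illustrative toy under stated assumptions: the `splice` function, the half-open `(start, end)` interval representation, and the example sequence are all hypothetical, and real splice-site recognition is not modelled. Introns are written in lower case only to make them visible.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon intervals (half-open, 0-based) and drop everything else."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy primary transcript: exon, intron, exon, intron, exon.
pre_mrna = "AUGGCC" + "guaagu" + "UUUGGA" + "guccag" + "UAA"
exon_1, exon_2, exon_3 = (0, 6), (12, 18), (24, 27)

# Retaining all exons gives one mRNA ...
print(splice(pre_mrna, [exon_1, exon_2, exon_3]))  # AUGGCCUUUGGAUAA
# ... while skipping exon 2 gives another from the same gene.
print(splice(pre_mrna, [exon_1, exon_3]))          # AUGGCCUAA
```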

Translation of RNA. Translation begins when the small ribosomal subunit assembles on the mRNA and seeks out a start codon. Once the start codon is found the large subunit assembles and translation commences by stepwise elongation. When a stop codon is encountered the subunits disassemble and translation terminates.

Each elongation step consumes a tRNA molecule. At one end the tRNA has a nucleotide sequence (anti-codon) that can base-pair to a matching three-nucleotide fragment (codon) of mRNA. At the other end it has a binding domain that matches one of the twenty available amino acid monomers. This implicitly defines the genetic code.
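The start-codon search, stepwise elongation and stop-codon termination described above can be sketched as a simple loop over codons. The sketch makes several simplifying assumptions not found in the text: the `translate` function is hypothetical, the first AUG in the string is taken as the start codon, and `CODON_TABLE` holds only a handful of entries from the standard genetic code rather than all 64 codons.

```python
# Partial codon table (standard genetic code); None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met", "GCC": "Ala", "UUU": "Phe", "GGA": "Gly", "AAA": "Lys",
    "UAA": None, "UAG": None, "UGA": None,
}


def translate(mrna: str) -> list[str]:
    """Translate from the first AUG until a stop codon or the end of the string.

    Raises KeyError for codons missing from the partial table above.
    """
    start = mrna.find("AUG")  # simplified stand-in for start-codon scanning
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):  # one elongation step per codon
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue is None:  # stop codon: translation terminates
            break
        peptide.append(residue)
    return peptide


print(translate("GGAUGGCCUUUGGAUAAGG"))  # ['Met', 'Ala', 'Phe', 'Gly']
```

Each dictionary lookup plays the role of one tRNA: the key corresponds to the codon matched by the anti-codon, and the value to the amino acid carried at the other end.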

Proteins are folded and subjected to various enzyme-induced modifications as they emerge from the ribosome during translation.