PROTEINS... Work Horses of Cell Metabolism

entire complement of an organism's proteins:
                             yeast         6,000 proteins
                             human     100,000 proteins

 We'll look at how Structure gives rise to Function
   a)  structure:
            primary, secondary, tertiary, & quaternary
   b)  protein folding mechanisms - chaperones
   c)  degradation/turnover - proteasomes
   d)  molecular motors
   e)  enzyme kinetics
             reading pages 63-78         





Proteins - classified by functions...
   (not structurally) 

Enzymes - catalytic activity and function


Transport Proteins - bind & carry ligands
Storage Proteins - ovalbumin, gluten, casein, ferritin
Contractile  (Motor):  can contract, change shape, elements of  cytoskeleton  (actin, myosin, tubulin)
Structural  (Support):  collagen of  tendons & cartilage, elastin of ligaments (tropoelastin),  
                              keratin of hair, feathers, & nails, fibroin of silk & webs
Defensive  (Protect):  antibodies (IgG),  fibrinogen & thrombin, snake venoms,  bacterial toxins
Regulatory  (Signal):  regulate metabolic processes, hormones, transcription factors & enhancers,
                           growth factor proteins
Receptors (detect stimuli):  light & rhodopsin, membrane receptor proteins and acetylcholine or insulin.
                                                                                                        ecb panel 4.1 pg 120  





Nomenclature - classes of proteins
        Historically based on solubility


SIMPLE PROTEINS: on hydrolysis yield only amino acids:
  1. Albumins - soluble in water (distilled),  globular,  most enzymes
  2. Globulins - soluble in dilute aqueous salt solutions; insoluble in pure distilled water
  3. Prolamins - insoluble in water; soluble in 50% to 90% simple alcohols
  4. Glutelins - insoluble in most solvents; soluble in dilute acids/bases
  5. Protamines - not based upon solubility; small MW proteins
                           with 80% Arginine & no Cysteine
  6. Histones - unique/structural: complexed w DNA,  high # basic aa's - 90% Arg, Lys, or His
  7. Scleroproteins - insoluble in most solvents
                           fibrous structure - architectural proteins of cartilage & connective tissue
       Collagen - high Glycine & Proline, no Cysteine; when boiled makes gelatin
       Keratins - proteins of skin & hair;
                           high basic aa's (Arg, His, Lys), but w Cys







    Complex Proteins:
     on hydrolysis yield amino acids + other molecules

Ribonuclease (purple)  &  RNA (green)

     lipoproteins -   (+ lipids)
              blood, membrane, &   transport proteins
     glycoproteins -  (+ carbohydrates)
              antibodies, cell surface proteins
     nucleoproteins -  (+ nucleic acids)
              ribosomes & organelles
             Common terminology:
                            dipeptide = 2 amino acids    tripeptide = 3 amino acids
                            peptide = short chain of amino acids (20-30)      
                            polypeptide = many amino acids (up to 4,000)
                            protein =  polypeptide with well defined 3D structure                                  






Structure of Proteins 

the Variety of Protein Structures may be INFINITE...
         average protein has 300-400 amino acids  &  has a MW of 30 kD to 45 kD
          a PROTEIN of 300 amino acids made with 20 different kinds
          of amino acids can have 20^300 different linear arrays of aa's
          [about 10^390 different proteins]
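
The count above can be checked with exact integer arithmetic. A minimal Python sketch, assuming only the figures in these notes (20 kinds of amino acids, a chain of 300 residues); the function name is illustrative and not from any library:

```python
import math

def sequence_count(chain_length, alphabet_size=20):
    """Number of distinct linear sequences of the given length."""
    return alphabet_size ** chain_length

n = sequence_count(300)

# Order of magnitude: 300 * log10(20) is about 390.3,
# so 20^300 is roughly 2 x 10^390.
exponent = 300 * math.log10(20)
print(f"20^300 has {len(str(n))} digits; log10(20^300) = {exponent:.1f}")
```

Python integers have arbitrary precision, so the full value is computed exactly rather than overflowing.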

     1st protein sequenced was Beef  Insulin
          by Fred Sanger - 1958 Nobel Prize winner     
          2 polypeptides [21/30 aa's]       Humulin   &  ADA

          to date about 100,000 human proteins are suspected
          only about 10,000 structures have been identified...  
                                        E. coli make about 3,000 proteins.


  HIV Peptidase





               4 levels of protein structure are recognized

  primary      - linear sequence of aa's
  secondary  - regular, recurring orientation of aa's
                          in a peptide chain due to H-bonds
  tertiary      - complete 3-D shape of a peptide
                          due to weak electrostatic forces
  quaternary - spatial relationships between different
                          polypeptides or subunits









Primary sequence…
         Linear sequence of amino acids in a polypeptide                            
                  repeated peptide bonds form the back bone of the polypeptide chain
                  R side groups project outward on alternate sides
         Chain... one end of polypeptide chain has a free (unlinked) amine group:  N-terminus
                     other end has a free (unlinked) carboxyl group:  C-terminus
         Size… a protein's size is specified by its mass (MW in Daltons; 1 Da = 1 amu)
                   average MW of a single amino acid residue  =  113 Da
                   thus if a protein is determined to have a mass of 5,763 Da,
                   it has about 5,763 / 113  =  51 amino acids
                   average yeast protein  =  52,728 Da  [52.7 kDa] with about 466 amino acids
       Protein Primary Sequence today is determined by reading the GENOME Sequence      
         Protein function is derived from the 3D structure (conformation) specified by
                   the primary amino acid sequence and its interactions with the local environment.
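
The size estimate above is a single division. A minimal Python sketch, assuming the 113 Da average residue mass quoted in these notes; the function and variable names are illustrative:

```python
AVG_RESIDUE_MASS_DA = 113  # average mass of one amino acid residue, from the notes

def estimate_residue_count(protein_mass_da, avg_residue_mass=AVG_RESIDUE_MASS_DA):
    """Rough number of amino acids in a protein of the given mass (in Da)."""
    return protein_mass_da / avg_residue_mass

small_protein = estimate_residue_count(5763)    # the 5,763 Da example
yeast_average = estimate_residue_count(52728)   # the average yeast protein

print(f"5,763 Da  -> about {small_protein:.0f} residues")
print(f"52,728 Da -> about {yeast_average:.1f} residues")
```

This is only an average-based estimate; an exact mass needs the actual sequence, since individual residues range from glycine (light) to tryptophan (heavy).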






some consequences... of Primary Sequence…    
         Polymorphism... proteins may vary in primary sequence but still 
                      exhibit the same catalytic activity.    ex: catalase (a peroxidase)... 2 H2O2  -->  2 H2O + O2
                                                    inter-specific:   between species  [have diff.  aa sequences] 
                                              intra-specific:   within a species   [ liver vs. kidney ] 
         Invariants... don't vary significantly in aa sequence 
                                                examples:   ubiquitin (proteasomes)   &    histones (chromosomes)
         Site Specificity… unique sequences determine intra-cellular location & function
                                      signal sequences of protein targeting, prosthetic binding sites, etc… 
         Families of Proteins: different structure but with related functions 
                              evolved from a single ancestral protein, up to 30%+ commonality of sequence...
                           serine proteases (trypsin, chymotrypsin, elastase) all have SER at active site.
          Homologous Proteins… similar characteristics: structurally similar; may perform the same
                  cellular function, often in different species, & are related by common
                  evolutionary history;
                  ex: cytochrome-C: duck vs. chicken differ at 2 aa positions  &  yeast vs. horse differ at 48
          Mutation - change in primary amino acid sequence that can yield a defective protein - ex: SICKLE CELL





Secondary structure
- well defined periodic structure: makes up 60% of a protein's structure

   Alpha helix*   described by Linus Pauling 1954 Nobel    using  X-ray*diffraction technique   

rigid rodlike cylinder around long axis core

R-groups radiate outward
3.6 aa per 360° turn
single repeat turn of helix (360°) = 0.54 nm
forms right-handed helix - (counterclockwise)
helix formed from H-bond interactions
    H of N  (of one aa)   &   -C=O  (of 4th aa)
¼ of aa's in globular proteins occur in alpha helix
flexible - wool is stretchable (breaks H-bonds)
                               mcb fig 3.4*                                                click*
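
The two helix numbers above (3.6 aa per turn, 0.54 nm per turn) fix the rise and rotation per residue. A minimal Python sketch using only those two figures; the names are illustrative:

```python
RESIDUES_PER_TURN = 3.6   # from the notes
PITCH_NM = 0.54           # length of one full 360-degree turn along the helix axis

# Derived quantities
RISE_PER_RESIDUE_NM = PITCH_NM / RESIDUES_PER_TURN      # about 0.15 nm
ROTATION_PER_RESIDUE_DEG = 360 / RESIDUES_PER_TURN      # about 100 degrees

def helix_length_nm(n_residues):
    """Approximate length along the axis of an ideal alpha helix."""
    return n_residues * RISE_PER_RESIDUE_NM

print(f"rise per residue: {RISE_PER_RESIDUE_NM:.2f} nm")
print(f"rotation per residue: {ROTATION_PER_RESIDUE_DEG:.0f} degrees")
print(f"a 20-residue helix is about {helix_length_nm(20):.1f} nm long")
```

The 20-residue case is an arbitrary example length, chosen only to show the scaling.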







Secondary structure- BETA SHEET     fig 3.5*      (ecb 4.10) 

            short segments (5-8 residues) connect laterally by H-bonds to form pleated sheets:
            a linear extended ZIG-ZAG pleated sheet formed by H-bonds -  intra- & inter-chain

              can be parallel or antiparallel - figure*
              resist pulling (tensile) forces  =  strength of silk fibers
              model = silk protein fibroin
              non- α/β regions  =  hinges, turns, loops, etc  =  flexibility          ribbons & sheets*

              turns - a region of 3 or 4 amino acids that redirect backbone;          mcb 3.6*  





  Structural MOTIFS: regular 3D conformations or folds within secondary
    or tertiary structure common to many different proteins...

    indicative of a particular 3-D architecture & associated with specific function...
    same structure is present in different proteins that have similar functions;
    recurring arrangements of α-helix and/or β-sheets in unrelated proteins....  such as:

     EF hand...  two short helices connected by a loop; a Ca2+ ion binding region of hydrophilic
                     residues present in over 100 Ca2+ binding proteins... aka helix-loop-helix    fig 3.9b
                     helix-loop-helix motifs also commonly bind gene transcription factors to DNA

     zinc finger...  1 α-helix and 2 β-strands with antiparallel orientations.
                         forms fingers bound by Zn ion that often link to DNA (RNA)            fig 3.9c
     coiled coil...  α-helices, where the hydrophobic amino acids in one helix wind together
                           with those of others, forming a coil; also called leucine zippers due to high
                           leucine content; also common to transcription factors.              fig 3.9a*
                    Prints:  a protein fingerprint database of conserved protein motifs







Tertiary level
        level most responsible for 3-D orientation of proteins in space 
               is the thermodynamically most stable conformation of a protein... and is due to
                         - weak non-covalent interactions  [figure*]
                         - hydrophobic interior & hydrophilic exterior favors globular shapes
                         - via H-bonds (ecb fig 4.31*)
                         - & S-S bridges  fig 16.19*  [ecb fig 4.29]
               results in Protein Folding into specific 3D shapes & unique binding sites   ecb fig 4.9
        some examples of 3D structure in proteins: 
                 Lysozyme        MW 14,600 enzyme; egg white & human tears  pdb-lysozyme
                                        129 aa's with 4 S-S; hydrolyzes polysaccharides
                                        in bacterial cell walls = bactericidal agent   catalog
                 Myoglobin       MW 16,700 - animal muscle protein - stores O2  pic
                 Cytochrome-C    MW 12,400 - heme binding single polypeptide
                                        of 100 aa's in ETS of mitochondria  pic
                 Ribonuclease    MW 13,700 enzyme of 124 aa w 4 S-S  pic





  DOMAINS - distinct modules or structural elements of the tertiary level of protein structure...
                      compact folded regions in a polypeptide of 100-150 amino acids, often self-forming,
                      self-stabilizing, that often fold independently.         
 ecb 5.12* & ecb 5.13*
  3 classes of domains:
     functional domain - region with particular activity characteristic of a protein CATALYSIS:
                ex: tyrosine kinase* activity domain of the human insulin receptor, which adds P~ to other molecules. pic
     structural domain - region of 40+ aa's in a stable 2nd or 3rd-ary conformation, often repeatable.
                         ex: 1. hemagglutinin - a surface protein on influenza viruses, that is made of 3
                                  identical quaternary subunits, each composed of 2 polypeptides (HA1 & HA2);
                                  each HA peptide has two domains... a globular domain and a fibrous domain      mcb 3.10a*
                              2. EGF (Epidermal Growth Factor) domain - a small soluble peptide hormone
                                  that binds to embryonic cells in skin/connective tissue & promotes cell division.
                                  EGF is generated by proteolytic hydrolysis as a domain from several other
                                  proteins, all of which have an EGF domain as a structural part.      mcb 3.11*
     topological domain - distinctive spatial relationships to rest of a protein;
                                  ex: membrane proteins with extrinsic cytoplasmic domain   (CD4 protein pic)
                             and intrinsic membrane spanning domain.









PROTEIN FAMILIES -  proteins with a common evolutionary ancestry...
              some proteins have many identical or chemically similar amino acids in identical
              sequence positions; each may contain domains that closely resemble those of other proteins.

              Proteins with common ancestors are known as homologs & belong to a "family".
         protein family: proteins with evolutionary relationships
                                (>30% aa sequence homology or common descent)
                     ex: 1.  serine proteases  ecb fig 4.21* - proteolytic enzymes with nearly identical
                                                            amino acid sequences all with a SER at the active site 
         protein superfamily: proteins with a probable common evolutionary origin that generally
                                                            contain one or more common motifs or domains
              family relations often best displayed by taxonomic cladistics (tree diagrams)
                       globins - gene slowly diverged into animal and plant lineages                  mcb 3.13a*
                                                        myoglobin - monomeric oxygen binder of muscle
                                                        hemoglobin - tetrameric oxygen binder of blood   mcb 3.13b*

                     Today, computer modeling is used to predict the function of not-yet-isolated proteins
              by comparing them with known sequence homologies     sequence analysis = 2ndary structure
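
The ">30% aa sequence" family criterion above can be illustrated with a simple percent-identity count. A minimal Python sketch, assuming the two sequences are already aligned to equal length; the function names, the 30% default, and the three short fragments are made up for illustration and are not real protein data:

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions holding the same residue (gaps '-' never match)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)

def same_family(seq_a, seq_b, threshold=30.0):
    """Apply the >30% identity rule of thumb from the notes."""
    return percent_identity(seq_a, seq_b) > threshold

# Hypothetical 10-residue fragments, for illustration only
frag_1 = "GDSGGPLVCK"
frag_2 = "GDSGGPVVCN"
frag_3 = "AKTWMEHLRY"

print(percent_identity(frag_1, frag_2), same_family(frag_1, frag_2))
print(percent_identity(frag_1, frag_3), same_family(frag_1, frag_3))
```

Real homology searches first compute the alignment itself and score chemically similar substitutions, which this sketch deliberately leaves out.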





 QUATERNARY structure:
multiple polypeptides, each with its own 3-D conformation = final shape
                         ex:   hemoglobin (pic*),     RNA polymerase,     ASP-trans-carbamylase

Some Common Quaternary Level Protein Shapes...
dimers -   self recognizing symmetrical regions - bind together @ identical binding sites
                  [ Catabolite Activator Protein* ]   homodimers - 2 identical subunits
                  heterodimers - non-identical subunits (as in PDH)
filaments  - polymers of protein subunits each bound together in an identical way
                             forming a ring or helix      see ecb 4.24
coiled-coil - 2 parallel helices forming a stiff filament, linked via
                             a stripe of hydrophobic aa's.     gk 4.16   [ keratin - ecb 4.16* ]
tetramers -  4 subunits... ex: neuraminidase (4 identical subunits) - ecb 4.22*
                             and hemoglobin* (2 α + 2 β subunits)

    Multi-Enzyme Complexes : hemagglutinin A - a trimer of 3 identical polypeptide units  fig 3.10b*
                                             pyruvate dehydrogenase    picture*   &    pic
                                       ATP-synthase                   figure*  







 Multimeric proteins can have very complex...  Quaternary Structure

and  form very large Macromolecular Assemblies...
                 ( > 1 MDa in mass ),    30-300 nm in size,    &    10-100 individual peptides

                 examples include: 
                                            viral capsids, some cytoskeletal complexes, molecular machines,
                                            and mRNA transcription complex (some 60 proteins - fig 3.12*)

                 examples of such Molecular Machines can be seen in mcb/5e - table 3.1*
                                                                  we will look at some of these in greater detail later...

                 summary figure of protein structure - mcb 3.2*












Protein 3D Conformation is critical to Biological Function...     

 DENATURATION  loss of 3-D conformation by heat,  pH, organic solvents, detergents

                                             (anything to disturb tertiary/quaternary level forces)

                    fig 4.7 p125 ecb

              RENATURATION - regaining of biological activity via self-assembly









protein shape & conformation...
                      3-D SPATIAL ORIENTATION that is MOST thermodynamically STABLE
                                      and has the lowest free energy expenditure (forms spontaneously)
      3 most common conformations HELIX - a spiral staircase-like shape
                                                         FIBER - elongated bound monomers
                                                         GLOBULAR - roughly a sphere
    the Native Conformation of most enzyme proteins is GLOBULAR:
                                      an interior pocket of hydrophobic amino acids
                                      an exterior surface of hydrophilic amino acids
                                      maximizes the number of H-bonds that form   fig 5.5*
    the PHYSICAL forces include mostly weak electrostatic bonds*:
                   non-covalent bonds, H-bonds,  hydrophobic  &  hydrophilic
                   interactions, & covalent bonds (as peptide bonds & disulfide bonds)
                          results in a great variety of protein shapes & sizes - ecb 4.9 pg 127*





How does 3D protein folding come about?             "FUNCTION follows FORM"
 peptide bond is PLANAR (partial double bond character) as are all the atoms bonded to it;
 all occur in the same plane* & thus there is no free rotation = restricts protein conformations
   - the native folded conformation is most stable, i.e., in lowest free energy state, often
       dictated by R-group properties (size, hydrophobicity, hydrophilicity) & ionic strength
       folding involves: changes in 3D conformations:
                - by orderly steps in a sequential way, each step facilitating the next -
                - first 2° structure (α & β), then structural motifs & assembly of complex domains,
                         followed by 3° level forces and/or 4° shapes.             fig 3.15*
       Unless protected during folding, proteins would interact with all the other molecules in a cell.
Cells make 2 sets of proteins that facilitate folding:
Molecular Chaperones - which bind and stabilize newly made unfolded proteins preventing these
                         proteins from self aggregating and/or being denatured before folding.
Chaperonins - which makeup a small folding chamber into which unfolded proteins are moved
                         to provide a proper environment favoring native folding of a protein.



CHAPERONES - are families of proteins to help "properly fold" a new protein...
          multiple chaperones bind to newly made proteins and include:
               Hsp70 (of cytosol & mitoplasm);     BiP (of the E.R.);  &     DnaK (of bacteria).
   1st discovered via heat shock treatment [via temperature elevation 25° --> 32°C]
        by Ferruccio Ritossa (1962 - Italy) in heat shocked fruit flies =  Chromosome puffs
   all cells make heat shock proteins (HSPs); but mutant bacteria didn't make HSPs,
        nor did they assemble normal proteins.


   HSP's are ubiquitous to all cells - produced in response to stress (heat, infection, etc...)
         and they act as "Chaperones" for other proteins by:
                           1. inhibiting undesirable interactions with other proteins
                           2. promoting desirable interactions
                                          help form stable bonds between protein partners
                                          in establishing proper conformation & preventing aggregations.









  Classes of Heat Shock Proteins:    HSP -40, -60, -70, -90  &  -100.
                     HSPs are named according to their molecular weights (e.g., Hsp-70 = 70 kilodaltons)

HSP-40 binds new protein amino acid chains & carries them to Hsp-70

Hsp-70 grabs proteins by an open cleft when ATP is bound to it
                 OPEN conformation has hydrophobic pocket for new unfolded protein...
                 in its ADP conformation it closes around the protein and aids native folding...

HSP-90 receives partially folded proteins from Hsp-70's and other chaperones...
                    helps join polypeptides into larger quaternary proteins forming
                    multi-subunit proteins, such as cellular receptors.                 figure*

             protein folding animations*view @ home                      protein folding video*








 HSP-60  also known as CHAPERONINS  or foldase -
                      is a small folding CHAMBER of HSP's into which unfolded proteins are
                      moved to provide a proper environment favoring native folding...       figure

Molecular Machine: made of chaperone proteins  hsp70's & 60's form barrel shaped structure
                made of 14 polypeptides (from GroEL gene) in 2 donut rings with a cap (from GroES gene)
                that opens an inner chamber, where a cell's new protein enters & is folded.
        barrel chamber has 2 conformations: tight & relaxed
        new peptide is inserted into cavity of GroEL chamber & conformational changes favor native
        protein folding; ATP hydrolysis = relaxed state & release of native 3D-protein  mcb6e-fig 3.17*
            fig A  &  fig B  [A.L. Horwich: PNAS 96:11-37, 1999] 

  HSP 100  - also known as unfoldase; also has a multi-subunit ring structure;
                     along with HSP-70 disassembles degraded proteins.









Misfolded Proteins  &  Disease
  PRION:     a defective protein agent (PrPsc) due to mis-coded gene (PRNPc)
                           native prion protein is PrPc & resides on nerve cell surfaces...
                           defective protein PrPsc accumulates forming aggregates that lead to CJD & SE's

CJD:     Creutzfeldt-Jakob disease, genetic based or acquired - (by eating "mad cow" tissue)
                                fatal neurological disease due to presence of misfolded PrPsc protein.
               Spongiform Encephalopathy (SE) - vacuolation (holes) in brain nerve tissue  

 Both PRION proteins can have identical aa sequence,
    but may fold differently
       [are conformers = proteins that differ only in conformation].
 A. normal (PrPc) protein...
         mostly α-helix foldings - remains soluble
 B. abnormal PrPsc protein...
         45% β-sheet - insoluble & protease insensitive
         produces cell surface aggregates that kill cells
           mechanism of action chart
          McGraw-Hill Online Learning - Raven et al 7th edition





  PROTEIN DEGRADATION (Digestion/Turnover)..  getting rid of misfolded proteins

    cells often contain specialized mechanisms or pathways to digest cell proteins...
           1.  to rapidly turnover proteins with short half-lives
           2.  to recognize & eliminate damaged or misfolded proteins that can lead to diseases
                           as Huntington's, Alzheimer's, and Creutzfeldt-Jakob disease.
    a. many proteins are degraded in cytosol using PROTEASES to cut (hydrolyze) peptide bonds

    b. some proteins are degraded in the LYSOSOMES via phagocytosis,

    c. but most proteins are degraded by large complexes of proteolytic enzymes in structures
             known as PROTEASOMES in a process known as Ubiquitin-mediated Proteolysis (UMP).

                    short half-life proteins hold a signal sequence targeting proteins for UMP
                    and misfolded proteins seem to be recognized for degradation by the UMP.

                                                         protein digestion by proteasomes*






Discovered by Alfred Goldberg & Martin Rechsteiner in the 1980's
  PROTEASOMES... a large MOLECULAR MACHINE (mcb6e fig 3.29*)
     Average mammalian cell holds between 20,000 & 30,000 proteasomes.

     Each proteasome is a barrel shaped complex (2,400 kD) made of 3 parts...
       1) a Regulatory Cap of 16-18 proteins (6 with ATPase activity);
            the 19S cap lets in only ubiquitinized proteins,
       2) a barrel core of 4 stacked protein rings w protease activities, &
       3) a Base Cap.

  Protein Digestion... begins when cells add a small polypeptide (ubiquitin) to protein to be degraded.
ubiquitin: globular protein of 76 aa (virtually identical aa sequence in bacteria, yeast, or mammals)
    3 enzymes [ E1 (activating), E2 (conjugating), E3 (ligase) ] add Ubiquitin to proteins to be degraded,
ubiquitinized protein is targeted for entry into a Proteasome's central interior chamber, where
     proteases with chymotryptic, tryptic, & caspase-like proteolytic activity cleave the protein
     into peptides.  The ubiquitin is recycled.
     Ubiquitin mediated proteolysis*  &    protein life cycle animation (chaperonin & proteasomes)*









 Protein Engineering...  
      producing novel proteins, with unique shapes, via artificial means...

      uses PROTEOMICS... to make artificial proteins of desired sequence..
                          vaccine proteins - which can bind to viral surface and inactivates it
                          simplistic idea - but it's hard to make connection from 1° to 3° structure
  1.  modify existing proteins via site directed mutagenesis...
            isolate a gene,  alter its sequence in precise way, clone the protein product...
                          - can be used to study effect of one amino acid change on 3D-folding
                          - often done with clinically useful proteins to enhance efficiency (Km)
  2.  structure based drug design...
            make drug molecules with high binding affinity to known proteins [to remove it]
            use computers to design 'virtual' drug to fit into a protein rendering it inactive
  3.  bionanotechnology...
            the idea is to exploit molecules' assembly skills to build new nanodevices. Instead of
            domesticating plants and animals, it's time to domesticate molecules. Biology may be
            able to design nanodevices that build themselves.





  α-β Barrels...   and   Coiled Coils          How do we know these structures exist?

3 procedures are commonly used to help determine peptide structures:

  1.  Mass Spectrometry : detects exact MASS of small peptides.
            - a purified protein is treated with trypsin to produce peptides
                          [trypsin cleaves polypeptides on COOH side of LYS & ARG residues]
            - peptides are dried onto metal plate, blasted with laser, vaporizing them as peptide ions,
            - peptide ions flow through electric field & the time it takes to pass a detector
                     is a function of their charge & mass (fig 4.11); the measured masses can be
                     matched to peptide masses predicted from genome sequences.

   2.  X-Ray Crystallography : determines 3-D shape of molecules mathematically.
            - 1st crystallize a purified protein (a large, highly ordered, conformational array)
            - atoms of crystal scatter a beam of X-rays (fig 4.12) forming a diffraction pattern
            - with up to 25,000 spots; a computer program interprets the pattern to give the atomic structure.


   3.  NMR Spectroscopy : magnetic signals indicate distances between atoms.
            - nuclei of atoms are "magnetic"... and magnetism is influenced by adjacent atoms
            - solution of protein is placed in strong magnetic field; then bombarded with radio waves
            - Hydrogen nuclei generate NMR signals (fig 4.13) indicating distances between atoms
            - allows computation of 3D structure of molecules
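
The trypsin step of procedure 1 (mass spectrometry) can be sketched in code: cut after every LYS (K) or ARG (R), then estimate each peptide's mass. A minimal Python sketch under stated assumptions: it ignores the real enzyme's reduced cleavage before proline, it uses the 113 Da average residue mass from these notes instead of exact residue masses, and the input sequence and function names are made up for illustration:

```python
AVG_RESIDUE_MASS_DA = 113  # average residue mass from the notes (an approximation)

def trypsin_digest(sequence):
    """Split a one-letter protein sequence on the COOH side of every K or R."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):          # trailing peptide with no K/R at its end
        peptides.append(sequence[start:])
    return peptides

def approx_mass_da(peptide):
    """Crude peptide mass estimate: residue count times the average residue mass."""
    return len(peptide) * AVG_RESIDUE_MASS_DA

example = "MKTAYRGGSLKW"  # hypothetical sequence, not a real protein
for peptide in trypsin_digest(example):
    print(peptide, approx_mass_da(peptide), "Da")
```

In practice the predicted masses are computed from exact residue masses for every protein encoded in a genome, and the measured spectrum is matched against that list.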




DOMAINS - a structural element that is self-stabilizing & often folds independently
                      of the rest of the protein, found in a variety of other proteins;

       100 to 250 aa's of alpha ribbons & beta sheets making modular units  fig 4.19*  &  fig 4.20*
α-β Barrel   sheets fold to make staves of a "barrel" within a core of the protein
           ex: figure*  and  triose-P-isomerase
α-β Saddle   sheets fold to make a core that looks like a saddle
           ex: fig 4.20*   and   LDH (lactic dehydrogenase)
4-helix Bundle   4 α-helices connected by 3 bends form pocket for CoE  & metal ion binding sites
           ex: fig 4.20*   and   cytochrome-B562 & TMV coat protein
β-β Sandwich   criss-cross patchwork of β sheets; forms a hydrophobic interior pocket
           ex: fig 4.20*    and immunoglobulin, insecticyanin, and  antitrypsin
β-β Barrel   β sheets occur in a circle each connected by helix link   fig 4.32*
           ex: PYR kinase, glyceraldehyde-P-isomerase,  Rubisco,   immunoglobulin antibody