LOCUS       ADRCG                   2907 bp    DNA     linear   VRL 14-MAR-1996
DEFINITION  Adenovirus type 2, complete genome.
ACCESSION   J01917 REGION: 18838..21744
VERSION     J01917.1  GI:209811
KEYWORDS    DNA polymerase; DNA-binding protein; RNA polymerase III;
            alternative splicing; coat protein; complete genome; genome-linked
            protein; glycoprotein; overlapping genes; polymerase; terminal
            repeat; unidentified reading frame; virus-associated RNA.
SOURCE      Human adenovirus 2
  ORGANISM  Human adenovirus 2
            Viruses; dsDNA viruses, no RNA stage; Adenoviridae; Mastadenovirus;
            Human adenovirus C.
COMMENT     [1]  RNA.
            [2]  sites; cds start for the hexon protein.
            [4]  sites; 5' terminus of VA I RNA.
            [6]  mRNA and DNA.
            [11]  cDNA to hexon mRNA.
            [12]  several fragments over this span.
            [13]  mRNA and DNA.
            [19]  cDNA.
            [16]  sites; acceptor splice site for fiber mRNA.
            [18]  several leader fragments over this span.
            [27]  sites; splice sites for E1b mRNAs.
            [28]  sites; cds start for E3 19K glycoprotein.
            [29]  sites; cds start for 15K, IX and fiber polypeptides.
            [37]  sites; cap site for E4 mRNAs.
            [39]  sites; splice sites in E2a mRNA.
            [31]  fragments over this span.
            [34]  sites; splice site in 52,55K-pept mRNA.
            [30]  sites; splice sites in IVa2 mRNA, Ad5.
            [43]  cDNA and DNA.
            [42]  sites; splice site for 'i' leader.
            [46]  mRNA and DNA.
            [41]  sites; E1a mutational analysis.
            [55]  sites; splice sites for 33K mRNA.
            [56]  sites; cds start for E4 11K-pept, ad5.
            [60]  sites; cds start for the 13.6K-pept.
            [61]  sites; splice sites for 72K and 100K mRNAs.
            [50]  sites; splice sites for leaders; poly-A sites.
            [52]  sites; splice sites for E1a mRNAs.
            [58]  sites; splice sites in E2 mRNA.
            [53]  sites; H2ts1 mutation between 57.0% and 69.0%.
            [49]  H2ts125
            [68]  sites; cds start for E1a proteins.
            [69]  sites; splice sites in E4 region.
            [71]  sites; splice sites in E4 region; poly-A site for E4 mRNAs.
            [63]  sites; cds start for 57K-pept.
            [63]  sites; splice sites in E4 region; poly-A site for E4 mRNAs.
            [63]  sites; splice sites in E1b region.
            [(in) Doerfler,W. (Ed.);Adenovirus DNA: 1-51;Martinus Nijhoff
            Publishing, Boston ]  review; bases 1 to 35937.
            [75]  sites; recombination analysis of ad2 and ad5.
            [74]  sites; splice sites in major late mRNA.
            [73]  sites; IVa2 transcription start.
            [72]  sites; transcription start for E1a mRNAs.
            [70]  sites; E3 11.6K protein.
            [76]  sites; L3 mRNA polyadenylation site.
            [78]  sites; L3 mRNA polyadenylation site.
            Communicated on tape by R. Roberts.  That tape and [(in) Doerfler,
            W. (Ed.);Adenovirus DNA: 1-51;Martinus Nijhoff Publishing, Boston ]
            are the immediate sources of the annotation herein.
            A consensus sequence for the l-strand of the genome is shown.
            Population heterogeneity as distinct from strain variation is known
            (35937 +/- 9 bp) [67]; both are annotated as 'variation' below. For
            site differences with adenovirus type 5, see loci beginning <ad5>
            which are arranged in the library according to the map coordinates
            of <ad2> where one map unit corresponds to 360 bases throughout
            (see [44],[67]).  For mutational changes in the ad2 sequence, see
            the appropriate references above.
            The origin of replication is located in the first fifty bases from
            each end.
            Transcription is leftward off the l-strand and rightward off the
            r-strand; in the former case, the annotation shows '(c)' for
            complementary strand.  Complex splicing events give rise to perhaps
            fifty or more distinct mRNA transcripts at early, intermediate and
            late times after infection, many of which are still being
            characterized; in particular, some transcripts are known from
            electron microscopy which are not yet characterized at the sequence
            level.  To date nine mRNA start sites (cap sites) have been
            identified, and these define the general units of mRNAs under which
            all known transcripts are classified.
            From the r-strand, the early transcripts are E1a, E1b and E3.  The
            28 kb late transcript called herein 'major late mRNA' comprises
            five families, L1 through L5, of 3' co-terminal mRNAs.  L1, and to
            a lesser extent L2, can be expressed at early and intermediate
            times [34].  Transcripts from this region contain a common
            tripartite leader sequence at their 5' ends: the three segments of
            this leader are encoded at bases 6039-6079, 7101-7172 and
            9634-9723.  At early and intermediate times, an extra leader
            segment, the 'i' leader, is frequently present (bases 7942-8381).
            The IX message, the only unspliced message in ad2, is intermediate,
            and its termination overlaps that for E1b on the same strand and
            that for IVa2, and most likely E2b, on the opposite strand. From
            the l-strand, or the 'comp strand', early expression derives from
            the E2a, E2b and E4 families of mRNAs, although there can be late
            transcription from E2a.  The E2b cap sites, splice sites and
            termination sites have not been determined at the sequence level.
            From electron microscopy there is evidence that the E2b mRNAs may
            originate at the E2a early cap site at 27092 (c) and terminate at
            the poly-A addition site found for the IVa2 mRNA at 4050 (c) [44].
            IVa2 is an intermediate message.  The promoters for these nine
            classes of mRNAs can be localized and characterized to the
            following extent [32]:
                mRNA          cap site      possible promoter region
               ------        ----------    -----------------------------
                E1a             498         tatttata at 468-474
                E1b            1699         tatataat at 1669-1676
                IX             3576         tatataa at 3545-3551
                major late     6039         tataaaa at 6008-6014
                E3            27609         tataa at 27580-27584
                E4            35609 (c)     tatatata at 35641-35633 (c)
                E2a early     27092 (c)     no obvious sequence for 100 bases
                E2a late      25956 (c)     tacaaattt at 25985-25977 (c)
                IVa2           5826 (c)     no obvious sequence for 100 bases
            The mRNA responsible for the 13.6K protein encoded at 7968 has not
            been identified.  The VA I and VA II transcripts are unique in that
            they are generated by RNA polymerase III; for a discussion of these
            low molecular weight RNAs-- the modulation of their start points,
            their promoters, their heterogeneity and their similarity to tRNA--
            see [3],[4],[5],[26] and <ad5a2>.
            The proteins known to be encoded from these mRNAs are given in the
            Features table below, though the details of translation and
            processing have not been fully determined.  In cases such as the
            IIIa peptide or the 11K peptide, the exact span of the coding
            awaits elucidation of the mRNA splicing.  Some of these products
            share reading frames and therefore manifest partial homologies. The
            following table summarizes the unidentified reading frames ('URF')
            of 100 or more amino acids:
                initiator    terminator      frame    protein encoded
                -----------  ----------     -------   -----------------
                 6280           6600           1      11.6K URF
                17284          17763           1      17.4K URF
                23782          24138           1      12.9K URF
                24481          24867           1      14.2K URF
                26044          26826           1      28.6K URF(contains the
                                                      N-terminus of 33K cds)
                30973          32778           1      63.9K URF(contains the
                                                      fiber cds)
                10421          10834           2      14.4K URF
                20504          20935           2      15.7K URF
                27899          28222           2      12.4K URF
                30059          30451           2      14.5K URF
                33956          34456           2      18.8K URF
                 9294           9800           3      17.7K URF
                23526          26525           3      110.2K URF(contains the
                                                      100K-pept cds)
                30444          30830           3      14.7K URF
                34470          34808           3      12.7K URF
                complementary strand
                35532          35146           1      14.3K URF
                34077          33193           1      34.1K URF
                11109          10744           1      12.8K URF
                 9030           8383           1      22.8K URF
                 6780           6442           1      12.8K URF
                31604          31290           2      10.7K URF
                31211          30852           2      13.5K URF
                18707          18159           2      18.9K URF
                14861          14424           2      16.4K URF
                14114          13728           2      13.5K URF
                11618          11250           2      13.6K URF
                 1712           1194           2      18.1K URF
                35113          34703           3      15.3K URF
                34342          33998           3      13.3K URF
                 5674           5327           3      12.2K URF
            Additionally there are numerous unidentified reading frames of
            less than 100 amino acid residues; and further small modifications
            of a few of the coding sequences are possible.
            [7] missing data project.
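The COMMENT text states that one map unit corresponds to 360 bases, and the URF table lists initiator and terminator positions for each reading frame. A minimal Python sketch of both calculations follows. The function names are hypothetical and not part of the record. The sketch assumes each span is counted inclusively from initiator to terminator; whether the terminator position includes the stop codon is not stated in the record.

```python
from typing import Tuple

# Bases per map unit, as stated in the COMMENT text.
MAP_UNIT_BP = 360


def base_to_map_units(base: int) -> float:
    """Convert a genome base coordinate to map units (hypothetical helper)."""
    return base / MAP_UNIT_BP


def map_units_to_base(units: float) -> int:
    """Convert map units back to the nearest base coordinate."""
    return round(units * MAP_UNIT_BP)


def urf_codons(initiator: int, terminator: int) -> Tuple[int, int]:
    """Return (whole codons, leftover bases) for an inclusive span.

    abs() lets the same call handle complementary-strand rows, where the
    initiator coordinate is larger than the terminator coordinate.
    """
    length = abs(terminator - initiator) + 1
    return divmod(length, 3)


if __name__ == "__main__":
    # H2ts1 interval quoted in the reference notes (57.0 to 69.0 map units).
    print(map_units_to_base(57.0), map_units_to_base(69.0))
    # First URF row on each strand of the table.
    print(urf_codons(6280, 6600))
    print(urf_codons(35532, 35146))
```

Under these assumptions, the first URF row (6280 to 6600) spans 321 bases, or 107 codons, consistent with the table's threshold of 100 or more amino acids.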
FEATURES             Location/Qualifiers
     source          1..2907
                     /organism="Human adenovirus 2"
                     /mol_type="genomic DNA"
     gene            complement(<1..>2907)
     mRNA            complement(<1..>2907)
     prim_transcript <1..>2907
     gene            <1..>2907
     prim_transcript <1..>2907
     gene            <1..>2907
     prim_transcript <1..>2907
     gene            <1..1650
     intron          <1..>2907
                     /note="precedes 100K mRNA"
     intron          <1..2812
                     /note="precedes 23K mRNA"
     CDS             1..2907
                     /note="virion component II"
                     /product="hexon protein"
     old_sequence    77..78
     old_sequence    82
     old_sequence    780
     old_sequence    829
     old_sequence    986
     old_sequence    1590
     old_sequence    1650
ORIGIN
        1 atggctaccc cttcgatgat gccgcagtgg tcttacatgc acatctcggg ccaggacgcc
       61 tcggagtacc tgagccccgg gctggtgcag tttgcccgcg ccaccgagac gtacttcagc
      121 ctgaataaca agtttagaaa ccccacggtg gcacctacgc acgacgtaac cacagaccgg
      181 tcccagcgtt tgacgctgcg gttcatccct gtggaccgcg aggataccgc gtactcgtac
      241 aaagcgcggt tcaccctggc tgtgggtgac aaccgtgtgc ttgatatggc ttccacgtac
      301 tttgacatcc gcggcgtgct ggacaggggg cctactttta agccctactc cggcactgcc
      361 tacaacgctc tagctcccaa gggcgctcct aactcctgtg agtgggaaca aaccgaagat
      421 agcggccggg cagttgccga ggatgaagaa gaggaagatg aagatgaaga agaggaagaa
      481 gaagagcaaa acgctcgaga tcaggctact aagaaaacac atgtctatgc ccaggctcct
      541 ttgtctggag aaacaattac aaaaagcggg ctacaaatag gatcagacaa tgcagaaaca
      601 caagctaaac ctgtatacgc agatccttcc tatcaaccag aacctcaaat tggcgaatct
      661 cagtggaacg aagctgatgc taatgcggca ggagggagag tgcttaaaaa aacaactccc
      721 atgaaaccat gctatggatc ttatgccagg cctacaaatc cttttggtgg tcaatccgtt
      781 ctggttccgg atgaaaaagg ggtgcctctt ccaaaggttg acttgcaatt cttctcaaat
      841 actacctctt tgaacgaccg gcaaggcaat gctactaaac caaaagtggt tttgtacagt
      901 gaagatgtaa atatggaaac cccagacaca catctgtctt acaaacctgg aaaaggtgat
      961 gaaaattcta aagctatgtt gggtcaacaa tctatgccaa acagacccaa ttacattgct
     1021 ttcagggaca attttattgg cctaatgtat tataacagca ctggcaacat gggtgttctt
     1081 gctggtcagg catcgcagct aaatgccgtg gtagatttgc aagacagaaa cacagagctg
     1141 tcctatcaac tcttgcttga ttccataggt gatagaacca gatatttttc tatgtggaat
     1201 caggctgtag acagctatga tccagatgtt agaatcattg aaaaccatgg aactgaggat
     1261 gaattgccaa attattgttt tcctcttggg ggtattgggg taactgacac ctatcaagct
     1321 attaaggcta atggcaatgg ctcaggcgat aatggagata ctacatggac aaaagatgaa
     1381 acttttgcaa cacgtaatga aataggagtg ggtaacaact ttgccatgga aattaaccta
     1441 aatgccaacc tatggagaaa tttcctttac tccaatattg cgctgtacct gccagacaag
     1501 ctaaaataca accccaccaa tgtggaaata tctgacaacc ccaacaccta cgactacatg
     1561 aacaagcgag tggtggctcc cgggcttgta gactgctaca ttaaccttgg ggcgcgctgg
     1621 tctctggact acatggacaa cgttaatccc tttaaccacc accgcaatgc gggcctccgt
     1681 tatcgctcca tgttgttggg aaacggccgc tacgtgccct ttcacattca ggtgccccaa
     1741 aagttttttg ccattaaaaa cctcctcctc ctgccaggct catatacata tgaatggaac
     1801 ttcaggaagg atgttaacat ggttctgcag agctctctgg gaaacgatct tagagttgac
     1861 ggggctagca ttaagtttga cagcatttgt ctttacgcca ccttcttccc catggcccac
     1921 aacacggcct ccacgctgga agccatgctc agaaatgaca ccaacgacca gtcctttaat
     1981 gactaccttt ccgccgccaa catgctatac cccatacccg ccaacgccac caacgtgccc
     2041 atctccatcc catcgcgcaa ctgggcagca tttcgcggtt gggccttcac acgcttgaag
     2101 acaaaggaaa ccccttccct gggatcaggc tacgaccctt actacaccta ctctggctcc
     2161 ataccatacc ttgacggaac cttctatctt aatcacacct ttaagaaggt ggccattacc
     2221 tttgactctt ctgttagctg gccgggcaac gaccgcctgc ttactcccaa tgagtttgag
     2281 attaaacgct cagttgacgg ggagggctac aacgtagctc agtgcaacat gaccaaggac
     2341 tggttcctgg tgcagatgtt ggccaactac aatattggct accagggctt ctacattcca
     2401 gaaagctaca aggaccgcat gtactcgttc ttcagaaact tccagcccat gagccggcaa
     2461 gtggttgacg atactaaata caaggagtat cagcaggttg gaattcttca ccagcataac
     2521 aactcaggat tcgtaggcta cctcgctccc accatgcgcg agggacaggc ttaccccgcc
     2581 aacgtgccct acccactaat aggcaaaacc gcggttgaca gtattaccca gaaaaagttt
     2641 ctttgcgatc gcaccctttg gcgcatccca ttctccagta actttatgtc catgggcgca
     2701 ctcacagacc tgggccaaaa ccttctctac gccaactccg cccacgcgct agacatgact
     2761 tttgaggtgg atcccatgga cgagcccacc cttctttatg ttttgtttga agtctttgac
     2821 gtggtccgtg tgcaccagcc gcaccgcggc gtcatcgaga ccgtgtacct gcgcacgccc
     2881 ttctcggccg gcaacgccac aacataa
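The CDS feature above spans 1..2907 of this sub-record, and the ACCESSION line gives the region as 18838..21744 of J01917. A minimal Python sketch follows, showing how the numbered sequence lines could be cleaned and translated, and how a local position maps back to a genome coordinate. It assumes the standard genetic code. The helper names are hypothetical, and sequences containing symbols other than a, c, g and t are not handled.

```python
import re
from itertools import product

# Standard genetic code in t, c, a, g order (first base varies slowest).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Start of the region quoted on the ACCESSION line of this record.
REGION_START = 18838


def clean_sequence_line(line: str) -> str:
    """Drop the position number and spacing from one sequence line."""
    return re.sub(r"[^acgt]", "", line.lower())


def translate(dna: str) -> str:
    """Translate frame 1 of a cleaned sequence; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def to_genome_coord(local: int) -> int:
    """Map a 1-based position in this sub-record to a J01917 coordinate."""
    return REGION_START + local - 1


if __name__ == "__main__":
    first = ("        1 atggctaccc cttcgatgat gccgcagtgg "
             "tcttacatgc acatctcggg ccaggacgcc")
    last = "     2881 ttctcggccg gcaacgccac aacataa"
    print(translate(clean_sequence_line(first)))  # MATPSMMPQWSYMHISGQDA
    print(translate(clean_sequence_line(last)))   # FSAGNATT*
    print(to_genome_coord(1), to_genome_coord(2907))  # 18838 21744
```

Because position 2881 begins a codon (2880 is a multiple of 3), the last line can be translated on its own; it ends in the taa stop codon at 2905..2907.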