BINF630/BINF530/BIOL580/BINF401 Spring 2024.
Homework 1. Due March 14, 2024.

Submit the report by email to both instructors as a Word or PDF file named "b630_24_hw1_Your_Name.doc" or "b630_24_hw1_Your_Name.pdf". The string "b630_24_hw1_Your_Name" should also be included in the message subject line.

1. Identify the protein encoded by the DNA sequence corresponding to your G-number using two different approaches. Find the entries for this protein in two different databases.

>Last digit of G-number: 0, 1, 2 
ATGCAGGCTCAACAGTACCAGCAGCAGCGTCGAAAATTTGCAGCTGCCTTCTTGGCATTCATTTTCATAC
TGGCAGCTGTGGATACTGCTGAAGCAGGGAAGAAAGAGAAACCAGAAAAAAAAGTGAAGAAGTCTGACTG
TGGAGAATGGCAGTGGAGTGTGTGTGTGCCCACCAGTGGAGACTGTGGGCTGGGCACACGGGAGGGCACT
CGGACTGGAGCTGAGTGCAAGCAAACCATGAAGACCCAGAGATGTAAGATCCCCTGCAACTGGAAGAAGC
AATTTGGCGCGGAGTGCAAATACCAGTTCCAGGCCTGGGGAGAATGTGACCTGAACACAGCCCTGAAGAC
CAGAACTGGAAGTCTGAAGCGAGCCCTGCACAATGCCGAATGCCAGAAGACTGTCACCATCTCCAAGCCC
TGTGGCAAACTGACCAAGCCCAAACCTCAAGCAGAATCTAAGAAGAAGAAAAAGGAAGGCAAGAAACAGG
AGAAGATGCTGGATTAA

>Last digit of G-number: 3, 4, 5, 6
ATGAAAGTCCTGCTTTGTGACCTGCTGCTGCTCAGTCTCTTCTCCAGTGTGTTCAGCAGTTGTCAGAGGG
ACTGTCTCACATGCCAGGAGAAGCTCCACCCAGCCCTGGACAGCTTCGACCTGGAGGTGTGCATCCTCGA
GTGTGAAGAGAAGGTCTTCCCCAGCCCCCTCTGGACTCCATGCACCAAGGTCATGGCCAGGAGCTCTTGG
CAGCTCAGCCCTGCCGCCCCAGAGCATGTGGCGGCTGCTCTCTACCAGCCGAGAGCTTCGGAGATGCAGC
ATCTGCGGCGAATGCCCCGAGTCCGGAGCTTGTTCCAGGAGCAGGAAGAGCCCGAGCCTGGCATGGAGGA
GGCTGGTGAGATGGAGCAGAAGCAGCTGCAGAAGAGATTTGGGGGCTTCACCGGGGCCCGGAAGTCGGCC
AGGAAGTTGGCCAATCAGAAGCGGTTCAGTGAGTTTATGAGGCAATACTTGGTCCTGAGCATGCAGTCCA
GCCAGCGCCGGCGCACCCTGCACCAGAATGGTAATGTGTAG

>Last digit of G-number: 7, 8, 9
ATGGCCTCCGGTGTGGCTGTCTCTGATGGTGTCATCAAGGTGTTCAACGACATGAAGGTGCGTAAGTCTT
CAACGCCAGAGGAGGTGAAGAAGCGCAAGAAGGCGGTGCTCTTCTGCCTGAGTGAGGACAAGAAGAACAT
CATCCTGGAGGAGGGCAAGGAGATCCTGGTGGGCGATGTGGGCCAGACTGTCGACGACCCCTACGCCACC
TTTGTCAAGATGCTGCCAGATAAGGACTGCCGCTATGCCCTCTATGATGCAACCTATGAGACCAAGGAGA
GCAAGAAGGAGGATCTGGTGTTTATCTTCTGGGCCCCCGAGTCTGCGCCCCTTAAGAGCAAAATGATTTA
TGCCAGCTCCAAGGACGCCATCAAGAAGAAGCTGACAGGGATCAAGCATGAATTGCAAGCAAACTGCTAC
GAGGAGGTCAAGGACCGCTGCACCCTGGCAGAGAAGCTGGGGGGCAGTGCCGTCATCTCCCTGGAGGGCA
AGCCTTTGTGA
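
As a possible first step for Q1, the DNA sequence can be translated into its amino acid sequence before searching at the protein level. The sketch below is only an illustration, not a required method: it assumes the standard genetic code (NCBI translation table 1) and that the reading frame starts at the first base, as it does for the sequences above. The names CODON_TABLE and translate are made up for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, stopping at the first stop codon."""
    # Remove line breaks and spaces so a pasted multi-line sequence works.
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons with ambiguous bases (e.g. N) are reported as X.
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Toy input, not one of the assigned sequences.
    print(translate("ATGGCCTCCGGTTAA"))  # prints MASG
```

To use it, paste the sequence lines for your G-number (without the ">" header line) into a string and pass that string to translate.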

2. Briefly describe the function of your protein and identify biologically important sites/regions in this protein.

3. Find nine proteins homologous to your protein. The proteins should come from different species belonging to the following groups: monkeys, cattle, cats, rodents, bats, birds, fish, worms, and yeast. At least five of these groups should be represented in your list of proteins (mark them in your report). Create a table listing several of the most important characteristics of these proteins.

4. Build multiple sequence alignments of the ten sequences identified in Q1 and Q3 using two different MSA algorithms. Generate a tree for each alignment. Compare (qualitatively) the alignments and the trees and interpret the results of these comparisons.

5. Find known protein sequence motifs or patterns in your protein. Explain the statistical results of using these motifs to identify relevant proteins in a protein sequence database.
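
When interpreting motif statistics for Q5, it may help to recall how hit counts translate into precision and recall. The sketch below assumes the motif database reports the numbers of true positive, false positive, and false negative hits for a pattern; the function name motif_statistics and the counts in the example are hypothetical and are not taken from any real entry.

```python
def motif_statistics(true_pos: int, false_pos: int, false_neg: int):
    """Return (precision, recall) for a motif scanned against a database.

    precision = fraction of reported hits that are genuine family members
    recall    = fraction of genuine family members that the motif detects
    """
    reported = true_pos + false_pos   # everything the motif matched
    genuine = true_pos + false_neg    # everything it should have matched
    precision = true_pos / reported if reported else 0.0
    recall = true_pos / genuine if genuine else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    precision, recall = motif_statistics(true_pos=90, false_pos=10, false_neg=30)
    print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

A motif with high precision but low recall is specific yet misses family members; the opposite case matches most members but also many unrelated proteins.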

6. FOR BINF630, BINF530 AND BIOL580 STUDENTS ONLY: Locate one highly conserved 10-residue region and one of the least conserved 10-residue regions in one of the alignments from Q4. Write a regular expression for each region. Search a protein sequence database using these two regular expressions. Report and interpret the results of the two searches.
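
One way to derive a regular expression from an alignment block for Q6 is to list, column by column, the residues observed at each position. The sketch below illustrates this idea with Python's re module under simplifying assumptions: the block contains no gap characters, and every observed residue is allowed at its position. The function names block_to_regex and find_matches, and the toy sequences, are invented for this example; other ways of writing the expression are equally valid.

```python
import re


def block_to_regex(aligned_block):
    """Build a regex from equal-length, gap-free aligned segments."""
    lengths = {len(segment) for segment in aligned_block}
    if len(lengths) != 1:
        raise ValueError("all aligned segments must have the same length")
    parts = []
    for column in zip(*aligned_block):
        residues = sorted(set(column))
        if "-" in residues:
            raise ValueError("gap characters are not handled in this sketch")
        if len(residues) == 1:
            parts.append(residues[0])                   # invariant position
        else:
            parts.append("[" + "".join(residues) + "]")  # variable position
    return "".join(parts)


def find_matches(pattern, sequences):
    """Return (name, start, matched_text) for every hit in a dict of sequences."""
    compiled = re.compile(pattern)
    hits = []
    for name, seq in sequences.items():
        for match in compiled.finditer(seq):
            hits.append((name, match.start(), match.group()))
    return hits


if __name__ == "__main__":
    # Toy 5-residue block; a real answer would use a 10-residue region.
    pattern = block_to_regex(["MKVLA", "MRVLA", "MKVIA"])
    print(pattern)  # prints M[KR]V[IL]A
    print(find_matches(pattern, {"p1": "AAMRVIAGG", "p2": "MKVVA"}))
```

A highly conserved region yields mostly single letters and matches few sequences, while a poorly conserved region yields wide character classes and is expected to match many unrelated proteins.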