From phd@EMBL-Heidelberg.de Wed Nov 25 10:24:25 1998
Date: Tue, 24 Nov 1998 17:45:25 +0100
From: Protein Prediction
To: eric.beitz@uni-tuebingen.de
Subject: PredictProtein

The following information has been received by the server:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________

reference predict_h25873 (Tue Nov 24 17:43:21 MET 1998)
from eric.beitz@uni-tuebingen.de
password(###)
resp MAIL
orig HTML
prediction of:
 - secondary structure (PHDsec)
 - solvent accessibility (PHDacc)
return msf format
# no description
MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALGFNYPLERNQTLVQDNVKVSLAFGLSIATLAQSVGHISGAHSNPAVT
LGLLLSCQISILRAVMYIIAQCVGAIVASAILSGITSSLLENSLGRNDLARGVNSGQGLGIEIIGTLQLVLCVLATTDRR
RRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVGPFIGSALAVLIYDFILAPRSSDFTD
RMKVWTSGQVEEYDLDADDINSRVEMKPK

________________________________________________________________________________

Result of PROSITE search (Amos Bairoch):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

please quote:
A Bairoch, P Bucher & K Hofmann: The PROSITE database, its status in 1997.
Nucl. Acids Res., 1997, 25, 217-221.
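The PROSITE patterns in this report are printed in regular-expression syntax (`[^P]` means "any residue except P", `.{2}` means "any two residues"), so the query can be re-scanned locally as a cross-check of the reported hit positions. The sketch below assumes exactly that; the helper name `scan` and the `PATTERNS` subset are illustrative, positions are 1-based as in the report, and a zero-width lookahead is used so that overlapping matches are not skipped.

```python
import re

# Query sequence as submitted in the request above (269 residues).
QUERY = (
    "MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALGFNYPLERNQTLVQDNVKVSLAFGLSIATLAQSVGHISGAHSNPAVT"
    "LGLLLSCQISILRAVMYIIAQCVGAIVASAILSGITSSLLENSLGRNDLARGVNSGQGLGIEIIGTLQLVLCVLATTDRR"
    "RRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVGPFIGSALAVLIYDFILAPRSSDFTD"
    "RMKVWTSGQVEEYDLDADDINSRVEMKPK"
)

# A subset of the pattern strings listed in the PROSITE results.
PATTERNS = {
    "ASN_GLYCOSYLATION": "N[^P][ST][^P]",
    "GLYCOSAMINOGLYCAN": "SG.G",
    "PKC_PHOSPHO_SITE": "[ST].[RK]",
    "CK2_PHOSPHO_SITE": "[ST].{2}[DE]",
    "MIP": "[HNQA].NP[STA][LIVMF][ST][LIVMF][GSTAFY]",
}


def scan(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for every match of pattern.

    The lookahead (?=(...)) consumes no characters, so re.finditer tries
    every start position and overlapping matches are all reported.
    """
    return [(m.start() + 1, m.group(1))
            for m in re.finditer(f"(?=({pattern}))", sequence)]


if __name__ == "__main__":
    assert len(QUERY) == 269
    for name, pattern in PATTERNS.items():
        for pos, site in scan(QUERY, pattern):
            print(f"{name:<18} {pos:>4} {site}")
```

Run as a script, this prints one line per hit, which can be compared position by position with the PROSITE listing.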
________________________________________________________________________________

--------------------------------------------------------
--------------------------------------------------------

Pattern-ID: ASN_GLYCOSYLATION PS00001 PDOC00001
Pattern-DE: N-glycosylation site
Pattern:    N[^P][ST][^P]
   42 NQTL
  205 NFSN

Pattern-ID: GLYCOSAMINOGLYCAN PS00002 PDOC00002
Pattern-DE: Glycosaminoglycan attachment site
Pattern:    SG.G
  135 SGQG

Pattern-ID: PKC_PHOSPHO_SITE PS00005 PDOC00005
Pattern-DE: Protein kinase C phosphorylation site
Pattern:    [ST].[RK]
  157 TDR
  239 TDR

Pattern-ID: CK2_PHOSPHO_SITE PS00006 PDOC00006
Pattern-DE: Casein kinase II phosphorylation site
Pattern:    [ST].{2}[DE]
  118 SLLE
  262 SRVE

Pattern-ID: MYRISTYL PS00008 PDOC00008
Pattern-DE: N-myristoylation site
Pattern:    G[^EDRKHPFYW].{2}[STAGCN][^P]
   30 GSALGF
   57 GLSIAT
   82 GLLLSC
  104 GAIVAS
  114 GITSSL
  132 GVNSGQ
  173 GLSVAL
  190 GINPAR
  219 GSALAV

Pattern-ID: PROKAR_LIPOPROTEIN PS00013 PDOC00013
Pattern-DE: Prokaryotic membrane lipoprotein lipid attachment site
Pattern:    [^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C
   77 PAVTLGLLLSC

Pattern-ID: MIP PS00221 PDOC00193
Pattern-DE: MIP family signature
Pattern:    [HNQA].NP[STA][LIVMF][ST][LIVMF][GSTAFY]
   74 HSNPAVTLG

________________________________________________________________________________

Result of ProDom domain search (Corpet, Gouzy, Kahn):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- please quote: ELL Sonnhammer & D Kahn, Prot. Sci., 1994, 3, 482-492

________________________________________________________________________________

--- ------------------------------------------------------------
--- Results from running BLAST against PRODOM domains
---
--- PLEASE quote:
--- F Corpet, J Gouzy, D Kahn (1998). The ProDom database
--- of protein domain families. Nucleic Ac Res 26:323-326.
---
--- BEGIN of BLASTP output
BLASTP 1.4.7 [16-Oct-94] [Build 17:06:52 Oct 31 1994]

Reference:  Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers,
and David J.
Lipman (1990).  Basic local alignment search tool.  J. Mol. Biol. 215:403-10.

Query=  prot (#) ppOld, no description /home/phd/server/work/predict_h25873
        (269 letters)

Database:  /home/phd/ut/prodom/prodom_34_2
           53,597 sequences; 6,740,067 total letters.
Searching..................................................done

                                                                    Smallest
                                                                      Sum
                                                             High  Probability
Sequences producing High-scoring Segment Pairs:             Score  P(N)     N

390 p34.2 (45) MIP(6) AQP1(4) GLPF(4) // PROTEIN INTRIN...    270  2.0e-32  1
45663 p34.2 (1) AQPZ_ECOLI // AQUAPORIN Z.                     90  3.2e-13  2
45611 p34.2 (1) AQP2_HUMAN // AQUAPORIN-CD (AQP-CD) (WAT...   136  6.0e-13  1
304 p34.2 (61) AQP2(10) GLPF(6) MIP(5) // PROTEIN CHANN...    121  9.2e-11  1
45607 p34.2 (1) PMIP_NICAL // POLLEN-SPECIFIC MEMBRANE I...    80  1.2e-07  2
45606 p34.2 (1) BIB_DROME // NEUROGENIC PROTEIN BIG BRAIN.     80  1.2e-05  2
2027 p34.2 (15) GLPF(9) AQP3(2) // PROTEIN FACILITATOR ...     60  3.4e-05  2
45615 p34.2 (1) GLPF_STRPN // GLYCEROL UPTAKE FACILITATO...    63  0.024    1
45638 p34.2 (1) AQP5_HUMAN // AQUAPORIN 5.                     61  0.044    1

>390 p34.2 (45) MIP(6) AQP1(4) GLPF(4) // PROTEIN INTRINSIC CHANNEL WATER AQUAPORIN TONOPLAST MEMBRANE FOR PLASMA LENS
            Length = 88

  Score = 270 (125.3 bits), Expect = 2.0e-32, P = 2.0e-32
  Identities = 47/67 (70%), Positives = 56/67 (83%)

Query:   156 TTDRRRRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVG 215
             T D+RR  +GGSAPL IG SVALGHL+ I YTGCG+NPARSFG AV+T NF+NHW++WVG
Sbjct:    22 TDDKRRGSVGGSAPLPIGFSVALGHLIGIPYTGCGMNPARSFGPAVVTGNFTNHWVYWVG 81

Query:   216 PFIGSAL 222
             P IG+ L
Sbjct:    82 PIIGAVL 88

  Score = 95 (44.1 bits), Expect = 2.3e-06, P = 2.3e-06
  Identities = 20/33 (60%), Positives = 23/33 (69%)

Query:   136 GQGLGIEIIGTLQLVLCVLATTDRRRRDLGGSA 168
             GQ L +EIIGT QLV CV ATTD +RR   G +
Sbjct:     1 GQNLVVEIIGTFQLVYCVFATTDDKRRGSVGGS 33

>45663 p34.2 (1) AQPZ_ECOLI // AQUAPORIN Z.
            Length = 96

  Score = 90 (41.8 bits), Expect = 3.2e-13, Sum P(2) = 3.2e-13
  Identities = 18/36 (50%), Positives = 25/36 (69%)

Query:   166 GSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAV 201
             G AP+AIGL++ L HL++I  T   +NPARS   A+
Sbjct:    25 GFAPIAIGLALTLIHLISIPVTNTSVNPARSTAVAI 60

  Score = 63 (29.2 bits), Expect = 3.2e-13, Sum P(2) = 3.2e-13
  Identities = 11/25 (44%), Positives = 14/25 (56%)

Query:   210 WIFWVGPFIGSALAVLIYDFILAPR 234
             W FWV P +G  +  LIY  +L  R
Sbjct:    71 WFFWVVPIVGGIIGGLIYRTLLEKR 95

>45611 p34.2 (1) AQP2_HUMAN // AQUAPORIN-CD (AQP-CD) (WATER CHANNEL PROTEIN FOR RENAL COLLECTING DUCT) (ADH WATER CHANNEL) (AQUAPORIN 2) (COLLECTING DUCT WATER CHANNEL PROTEIN) (WCH-CD).
            Length = 49

  Score = 136 (63.1 bits), Expect = 6.0e-13, P = 6.0e-13
  Identities = 23/42 (54%), Positives = 34/42 (80%)

Query:    50 VKVSLAFGLSIATLAQSVGHISGAHSNPAVTLGLLLSCQISI 91
             +++++AFGL I TL Q++GHISGAH NPAVT+  L+ C +S+
Sbjct:     8 LQIAMAFGLGIGTLVQALGHISGAHINPAVTVACLVGCHVSV 49

>304 p34.2 (61) AQP2(10) GLPF(6) MIP(5) // PROTEIN CHANNEL WATER AQUAPORIN INTRINSIC DUCT COLLECTING FOR TONOPLAST WCH-CD
            Length = 43

  Score = 121 (56.1 bits), Expect = 9.2e-11, P = 9.2e-11
  Identities = 24/43 (55%), Positives = 31/43 (72%)

Query:    70 ISGAHSNPAVTLGLLLSCQISILRAVMYIIAQCVGAIVASAIL 112
             ISG H NPAVT+GLL+  +   LRAV YI AQ +GA+  +A+L
Sbjct:     1 ISGGHINPAVTIGLLIGGRFPFLRAVFYIAAQLLGAVAGAALL 43

>45607 p34.2 (1) PMIP_NICAL // POLLEN-SPECIFIC MEMBRANE INTEGRAL PROTEIN.
            Length = 69

  Score = 80 (37.1 bits), Expect = 1.2e-07, Sum P(2) = 1.2e-07
  Identities = 17/54 (31%), Positives = 32/54 (59%)

Query:   149 LVLCVLATTDRRRRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAVL 202
             L++ V++      R +G  A +A+G+++ L   +A   +G  +NPARS G A++
Sbjct:    13 LLMFVISGVATDDRAIGQVAGIAVGMTITLNVFVAGPISGASMNPARSIGPAIV 66

  Score = 34 (15.8 bits), Expect = 1.2e-07, Sum P(2) = 1.2e-07
  Identities = 8/18 (44%), Positives = 11/18 (61%)

Query:   136 GQGLGIEIIGTLQLVLCV 153
             GQ L IEII +  L+  +
Sbjct:     1 GQSLAIEIIISFLLMFVI 18

>45606 p34.2 (1) BIB_DROME // NEUROGENIC PROTEIN BIG BRAIN.
            Length = 119

  Score = 80 (37.1 bits), Expect = 1.2e-05, Sum P(2) = 1.2e-05
  Identities = 15/34 (44%), Positives = 24/34 (70%)

Query:     1 MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALG 34
             M +EI+   FWR++++E LA  ++VFI  G+A G
Sbjct:    55 MQAEIRTLEFWRSIISECLASFMYVFIVCGAAAG 88

  Score = 39 (18.1 bits), Expect = 1.2e-05, Sum P(2) = 1.2e-05
  Identities = 9/17 (52%), Positives = 12/17 (70%)

Query:    53 SLAFGLSIATLAQSVGH 69
             +LA GL++ATL Q   H
Sbjct:   103 ALASGLAMATLTQCFLH 119

>2027 p34.2 (15) GLPF(9) AQP3(2) // PROTEIN FACILITATOR GLYCEROL UPTAKE AQUAPORIN DIFFUSION UPTAKE/EFFLUX PEPX 5'REGION ORF1
            Length = 55

  Score = 60 (27.8 bits), Expect = 3.4e-05, Sum P(2) = 3.4e-05
  Identities = 17/46 (36%), Positives = 20/46 (43%)

Query:   156 TTDRRRRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAV 201
             T D      GG  PL +G  V    +     TG  INPAR FG  +
Sbjct:    10 TDDGNNVPSGGLHPLMVGFLVMGIGMSLGGTTGYAINPARDFGPRI 55

  Score = 37 (17.2 bits), Expect = 3.4e-05, Sum P(2) = 3.4e-05
  Identities = 7/10 (70%), Positives = 8/10 (80%)

Query:   149 LVLCVLATTD 158
             L+ CVLA TD
Sbjct:     2 LIACVLALTD 11

>45615 p34.2 (1) GLPF_STRPN // GLYCEROL UPTAKE FACILITATOR PROTEIN.
            Length = 26

  Score = 63 (29.2 bits), Expect = 0.025, P = 0.024
  Identities = 13/23 (56%), Positives = 18/23 (78%)

Query:   205 NFSNHWIFWVGPFIGSALAVLIY 227
             ++S  WI  VGP IG+ALAVL++
Sbjct:     1 DWSYAWIPVVGPVIGAALAVLVF 23

>45638 p34.2 (1) AQP5_HUMAN // AQUAPORIN 5.
            Length = 27

  Score = 61 (28.3 bits), Expect = 0.045, P = 0.044
  Identities = 11/19 (57%), Positives = 18/19 (94%)

Query:    50 VKVSLAFGLSIATLAQSVG 68
             ++++LAFGL+I TLAQ++G
Sbjct:     8 LQIALAFGLAIGTLAQALG 26

Parameters:
  E=0.1
  B=500
  V=500
  -ctxfactor=1.00

  Query                      ----- As Used -----     ----- Computed ----
  Frame  MatID  Matrix name  Lambda  K       H       Lambda  K       H
  +0     0      BLOSUM62     0.322   0.138   0.394   same    same    same

  Query
  Frame  MatID  Length  Eff.Length  E     S   W   T   X   E2    S2
  +0     0      269     269         0.10  69  3   11  22  0.22  33

Statistics:

  Query         Expected          Observed           HSPs        HSPs
  Frame  MatID  High Score        High Score         Reportable  Reported
  +0     0      59 (27.4 bits)    270 (125.3 bits)   14          14

  Query         Neighborhd  Word      Excluded  Failed      Successful  Overlaps
  Frame  MatID  Words       Hits      Hits      Extensions  Extensions  Excluded
  +0     0      5349        3124825   609708    2510548     4569        2

  Database:  /home/phd/ut/prodom/prodom_34_2
    Release date:  unknown
    Posted date:  12:24 PM MET DST May 06, 1998
  # of letters in database:  6,740,067
  # of sequences in database:  53,597
  # of database sequences satisfying E:  9
  No. of states in DFA:  564 (111 KB)
  Total size of DFA:  226 KB (256 KB)
  Time to generate neighborhood:  0.03u 0.00s 0.03t  Real: 00:00:00
  Time to search database:  9.80u 0.03s 9.83t  Real: 00:00:10
  Total cpu time:  9.90u 0.06s 9.96t  Real: 00:00:10

--- END of BLASTP output
--- ------------------------------------------------------------
---
--- Again: these results were obtained using the domain database
--- collected by Daniel Kahn and his coworkers in Toulouse.
---
--- PLEASE quote:
--- F Corpet, J Gouzy, D Kahn (1998). The ProDom database
--- of protein domain families. Nucleic Ac Res 26:323-326.
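Two of the numbers attached to each HSP above can be recomputed from the alignment itself. In this report the identity percentage matches integer truncation (47/67 = 70.1% is printed as 70%, 56/67 = 83.6% as 83%), and the bit values agree, to within the rounding of lambda, with bits = lambda * S / ln 2 using lambda = 0.322 from the Parameters table (270 * 0.322 / ln 2 is about 125.4, printed as 125.3). The Python sketch below encodes both observations. The function names are hypothetical, it handles only ungapped HSPs like the ones shown here, and it is a consistency check on these particular numbers, not a description of how BLASTP 1.4.7 computes them internally.

```python
import math


def identity_stats(query: str, sbjct: str) -> tuple[int, int, int]:
    """Return (identical columns, alignment length, truncated % identity)
    for an ungapped HSP given as two equal-length residue strings."""
    if len(query) != len(sbjct):
        raise ValueError("ungapped HSP segments must have the same length")
    same = sum(q == s for q, s in zip(query, sbjct))
    return same, len(query), (100 * same) // len(query)  # floor, not round


def raw_to_bits(score: int, lam: float = 0.322) -> float:
    """Scale a raw score by lambda / ln 2 (the relation observed in this report)."""
    return lam * score / math.log(2)


if __name__ == "__main__":
    # First HSP against ProDom domain 390: query 156-222, subject 22-88.
    q = "TTDRRRRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVG" + "PFIGSAL"
    s = "TDDKRRGSVGGSAPLPIGFSVALGHLIGIPYTGCGMNPARSFGPAVVTGNFTNHWVYWVG" + "PIIGAVL"
    same, length, pct = identity_stats(q, s)
    print(f"Identities = {same}/{length} ({pct}%)")
    print(f"Score = 270 ({raw_to_bits(270):.1f} bits)")
```

With the rounded lambda from the table, the sketch prints 125.4 bits for the top score where the report shows 125.3; the remaining difference reflects the three-digit rounding of lambda, not a different formula.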
---
--- The general WWW page is on:
---- ---------------------------------------
--- http://www.toulouse.inra.fr/prodom.html
---- ---------------------------------------
---
--- For WWW graphic interfaces to PRODOM, in particular for your
--- protein family, use the following links (each line is ONE
--- single link for your protein!):
---
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=390
    ==> multiple alignment, consensus, PDB and PROSITE links of domain 390
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=390
    ==> graphical output of all proteins having domain 390
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45663
    ==> multiple alignment, consensus, PDB and PROSITE links of domain 45663
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45663
    ==> graphical output of all proteins having domain 45663
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45611
    ==> multiple alignment, consensus, PDB and PROSITE links of domain 45611
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45611
    ==> graphical output of all proteins having domain 45611
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=304
    ==> multiple alignment, consensus, PDB and PROSITE links of domain 304
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=304
    ==> graphical output of all proteins having domain 304
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45607
    ==> multiple alignment, consensus, PDB and PROSITE links of domain 45607
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45607
    ==> graphical output of all proteins having domain 45607
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45606
    ==> multiple alignment, consensus, PDB and PROSITE links of domain 45606
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45606
    ==> graphical output of all proteins having domain 45606
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=2027
    ==> multiple alignment, consensus, PDB and PROSITE links of domain 2027
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=2027
    ==> graphical output of all proteins having domain 2027
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45615
    ==> multiple alignment, consensus, PDB and PROSITE links of domain 45615
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45615
    ==> graphical output of all proteins having domain 45615
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45638
    ==> multiple alignment, consensus, PDB and PROSITE links of domain 45638
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45638
    ==> graphical output of all proteins having domain 45638
---
--- NOTE: if you want to use a link, make sure the entire line
--- is pasted as the URL into your browser!
---
--- END of PRODOM
--- ------------------------------------------------------------

________________________________________________________________________________

--- Database used for sequence comparison:
--- SEQBASE RELEASE 34.0 OF EMBL/SWISS-PROT WITH 59021 SEQUENCES

The alignment that has been used as input to the network is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________

--- ------------------------------------------------------------
--- MAXHOM multiple sequence alignment
--- ------------------------------------------------------------
---
--- MAXHOM ALIGNMENT HEADER: ABBREVIATIONS FOR SUMMARY
--- ID     : identifier of aligned (homologous) protein
--- STRID  : PDB identifier (only for known structures)
--- PIDE   : percentage of pairwise sequence identity
--- WSIM   : percentage of weighted similarity
--- LALI   : number of residues aligned
--- NGAP   : number of insertions and deletions (indels)
--- LGAP   : number of residues in all indels
--- LSEQ2  : length of aligned sequence
--- ACCNUM : SwissProt accession number
--- NAME   : one-line description of aligned protein
---
--- MAXHOM ALIGNMENT HEADER: SUMMARY
ID         STRID  PIDE WSIM LALI NGAP LGAP LSEQ2  ACCNUM  NAME
aqp1_rat           100  100  269    0    0   269  P29975  PROXIMAL TUBULE) (AQUAPOR
aqp1_mouse          98   99  269    0    0   269  Q02013  PROXIMAL TUBULE) (AQUAPOR
aqp1_human          93   97  269    0    0   269  P29972  PROXIMAL TUBULE) (AQUAPOR
aqp1_bovin          90   95  269    1    2   271  P47865  PROXIMAL TUBULE) (AQUAPOR
aqp1_sheep          90   94  269    2    3   272  P56401  PROXIMAL TUBULE) (AQUAPOR
aqpa_ranes          78   89  268    2    5   272  P50501  AQUAPORIN FA-CHIP.
aqp2_dasno          49   73  109    1    7   109  P79164  PROTEIN) (WCH-CD) (FRAGME
aqp2_bovin          49   73  109    1    7   109  P79099  PROTEIN) (WCH-CD) (FRAGME
aqp2_canfa          48   72  109    1    7   109  P79144  PROTEIN) (WCH-CD) (FRAGME
aqp2_rabit          48   73  109    1    7   109  P79213  PROTEIN) (WCH-CD) (FRAGME
aqp2_elema          47   72  109    1    7   109  P79168  PROTEIN) (WCH-CD) (FRAGME
aqp2_horse          47   72  109    1    7   109  P79165  PROTEIN) (WCH-CD) (FRAGME
aqp2_proha          47   73  109    1    7   109  P79229  PROTEIN) (WCH-CD) (FRAGME
mip_rat             46   73  259    1    7   261  P09011  LENS FIBER MAJOR INTRINSI
aqp2_oryaf          46   72  109    1    7   109  P79200  PROTEIN) (WCH-CD) (FRAGME
mip_mouse           46   73  261    1    7   263  P51180  LENS FIBER MAJOR INTRINSI
mip_ranpi           45   73  261    1    7   263  Q06019  LENS FIBER MAJOR INTRINSI
mip_bovin           45   73  261    1    7   263  P06624  LENS FIBER MAJOR INTRINSI
mip_human           45   73  261    1    7   263  P30301  LENS FIBER MAJOR INTRINSI
mip_chick           45   72  110    1    1   112  P28238  LENS FIBER MAJOR INTRINSI
aqp5_rat            44   71  262    2    8   265  P47864  AQUAPORIN 5.
aqp5_human          44   71  262    2    8   265  P55064  AQUAPORIN 5.
aqp2_human          44   72  261    2    8   271  P41181  PROTEIN) (WCH-CD).
aqp4_human          43   70  266    2    5   323  P55087  AQUAPORIN 4 (WCH4) (MERCU
aqp4_rat            43   70  266    2    5   323  P47863  AQUAPORIN 4 (WCH4) (MERCU
aqp4_mouse          43   69  265    3    6   322  P55088  AQUAPORIN 4 (WCH4) (MERCU
aqp2_rat            42   71  261    2    8   271  P34080  PROTEIN) (WCH-CD).
aqp2_mouse          42   71  261    2    8   271  P56402  PROTEIN) (WCH-CD).
wc2a_arath          42   67  248    4   12   287  P43286  PLASMA MEMBRANE INTRINSIC
aqp6_human          42   68  260    2    9   282  Q13520  AQUAPORIN 6 (AQUAPORIN-2
wc2c_arath          41   66  248    4   12   285  P30302  INTRINSIC PROTEIN) (WSI-T
wc2b_arath          41   66  248    4   12   285  P43287  PLASMA MEMBRANE INTRINSIC
wc1c_arath          41   65  238    4   10   286  Q08733  (TMP-B).
wc1b_arath          41   65  238    4   10   286  Q06611  (TMP-A).
tipw_lyces          40   65  237    4   10   286  Q08451  (RIPENING-ASSOCIATED MEMB
wc1a_arath          40   64  238    4   10   286  P43285  PLASMA MEMBRANE INTRINSIC
tipw_pea            40   64  237    4   11   289  P25794  RESPONSIVE PROTEIN 7A).
tipa_arath          38   64  250    3    9   268  P26587  TONOPLAST INTRINSIC PROTE
aqua_atrca          38   64  246    4   10   282  P42767  AQUAPORIN.
dip_antma           38   65  242    2    4   250  P33560  PROBABLE TONOPLAST INTRIN
aqpz_ecoli          37   59  220    4   17   231  P48838  AQUAPORIN Z (BACTERIAL NO
tip2_tobac          37   64  242    2    4   250  P24422  TONOPLAST INTRINSIC PROTE
tip1_tobac          37   64  242    2    4   250  P21653  TONOPLAST INTRINSIC PROTE
tipg_arath          33   62  241    2    4   251  P25818  TONOPLAST INTRINSIC PROTE
bib_drome           33   60  260    4   10   700  P23645  NEUROGENIC PROTEIN BIG BR
tipr_arath          33   62  243    2    4   253  P21652  TONOPLAST INTRINSIC PROTE
tipa_phavu          33   62  246    2    4   256  P23958  TONOPLAST INTRINSIC PROTE
tipg_orysa          32   62  240    2    5   250  P50156  TONOPLAST INTRINSIC PROTE
---
--- MAXHOM ALIGNMENT: IN MSF FORMAT
MSF of: /home/phd/server/work/predict_h25873-22040.hssp from: 1 to: 269
/home/phd/server/work/predict_h25873-22040.msfRet  MSF: 269  Type: P  24-Nov-98 17:44:5  Check: 3448  ..
 Name: predict_h258 Len:   269  Check: 8331  Weight:  1.00
 Name: aqp1_rat     Len:   269  Check: 8331  Weight:  1.00
 Name: aqp1_mouse   Len:   269  Check: 7552  Weight:  1.00
 Name: aqp1_human   Len:   269  Check: 6501  Weight:  1.00
 Name: aqp1_bovin   Len:   269  Check: 7067  Weight:  1.00
 Name: aqp1_sheep   Len:   269  Check: 7582  Weight:  1.00
 Name: aqpa_ranes   Len:   269  Check: 4844  Weight:  1.00
 Name: aqp2_dasno   Len:   269  Check: 8933  Weight:  1.00
 Name: aqp2_bovin   Len:   269  Check: 9649  Weight:  1.00
 Name: aqp2_canfa   Len:   269  Check: 8990  Weight:  1.00
 Name: aqp2_rabit   Len:   269  Check: 8787  Weight:  1.00
 Name: aqp2_elema   Len:   269  Check: 9381  Weight:  1.00
 Name: aqp2_horse   Len:   269  Check: 8993  Weight:  1.00
 Name: aqp2_proha   Len:   269  Check: 8855  Weight:  1.00
 Name: mip_rat      Len:   269  Check: 9773  Weight:  1.00
 Name: aqp2_oryaf   Len:   269  Check: 8554  Weight:  1.00
 Name: mip_mouse    Len:   269  Check: 9723  Weight:  1.00
 Name: mip_ranpi    Len:   269  Check: 5937  Weight:  1.00
 Name: mip_bovin    Len:   269  Check: 1430  Weight:  1.00
 Name: mip_human    Len:   269  Check:  372  Weight:  1.00
 Name: mip_chick    Len:   269  Check: 4658  Weight:  1.00
 Name: aqp5_rat     Len:   269  Check: 9033  Weight:  1.00
 Name: aqp5_human   Len:   269  Check: 6547  Weight:  1.00
 Name: aqp2_human   Len:   269  Check: 6209  Weight:  1.00
 Name: aqp4_human   Len:   269  Check: 2589  Weight:  1.00
 Name: aqp4_rat     Len:   269  Check: 4412  Weight:  1.00
 Name: aqp4_mouse   Len:   269  Check: 2845  Weight:  1.00
 Name: aqp2_rat     Len:   269  Check: 5748  Weight:  1.00
 Name: aqp2_mouse   Len:   269  Check: 6526  Weight:  1.00
 Name: wc2a_arath   Len:   269  Check: 4866  Weight:  1.00
 Name: aqp6_human   Len:   269  Check: 9404  Weight:  1.00
 Name: wc2c_arath   Len:   269  Check: 6187  Weight:  1.00
 Name: wc2b_arath   Len:   269  Check: 7328  Weight:  1.00
 Name: wc1c_arath   Len:   269  Check: 8575  Weight:  1.00
 Name: wc1b_arath   Len:   269  Check: 9544  Weight:  1.00
 Name: tipw_lyces   Len:   269  Check: 9283  Weight:  1.00
 Name: wc1a_arath   Len:   269  Check:  598  Weight:  1.00
 Name: tipw_pea     Len:   269  Check: 9253  Weight:  1.00
 Name: tipa_arath   Len:   269  Check: 6544  Weight:  1.00
 Name: aqua_atrca   Len:   269  Check: 2848  Weight:  1.00
 Name: dip_antma    Len:   269  Check: 9619  Weight:  1.00
 Name: aqpz_ecoli   Len:   269  Check: 5641  Weight:  1.00
 Name: tip2_tobac   Len:   269  Check:  490  Weight:  1.00
 Name: tip1_tobac   Len:   269  Check:  622  Weight:  1.00
 Name: tipg_arath   Len:   269  Check: 3231  Weight:  1.00
 Name: bib_drome    Len:   269  Check: 7687  Weight:  1.00
 Name: tipr_arath   Len:   269  Check: 4476  Weight:  1.00
 Name: tipa_phavu   Len:   269  Check: 5563  Weight:  1.00
 Name: tipg_orysa   Len:   269  Check: 3537  Weight:  1.00

//

              1                                                   50
predict_h258  MASEIKKKLF WRAVVAEFLA MTLFVFISIG SALGFNYPLE RNQTLVQDNV
aqp1_rat      MASEIKKKLF WRAVVAEFLA MTLFVFISIG SALGFNYPLE RNQTLVQDNV
aqp1_mouse    MASEIKKKLF WRAVVAEFLA MTLFVFISIG SALGFNYPLE RNQTLVQDNV
aqp1_human    MASEFKKKLF WRAVVAEFLA TTLFVFISIG SALGFKYPVG NNQTAVQDNV
aqp1_bovin    MASEFKKKLF WRAVVAEFLA MILFIFISIG SALGFHYPIK SNQTtvQDNV
aqp1_sheep    MASEFKKKLF WRAVVAEFLA MILFIFISIG SALGFHYPIK SNQTtvQDNV
aqpa_ranes    MASEFKKKAF WRAVIAEFLA MILFVFISIG AALGFNFPIE EKANQtqDIV
aqp2_dasno    ......SVAF SRAVLAEFLA TLIFVFFGLG SALSWPQALP S.......VL
aqp2_bovin    ......SIAF SRAVLAEFLA TLLFVFFGLG SALNWPQALP S.......VL
aqp2_canfa    ......SVAF SRAVFAEFLA TLLFVFFGLG SALNWPQALP S.......VL
aqp2_rabit    ......SIAF SRAVFAEFLA TLLFVFFGLG SALNWPSALP S.......TL
aqp2_elema    ......SIAF SRAVFSEFLA TLLFVFFGLG SALNWPQALP S.......VL
aqp2_horse    ......SIAF SRAVLAEFLA TLLFVFFGLG SALNWPQAMP S.......VL
aqp2_proha    ......SIAF SRAVLSEFLA TLLFVFFGLG SALNWPQALP S.......VL
mip_rat       ...ELRSASF WRAIFAEFFA TLFYVFFGLG SSLRWA.... ...PGPLHVL
aqp2_oryaf    ......SIAF SKAVFSEFLA TLLFVFFGLG SALNWPQALP S.......GL
mip_mouse     .MWELRSASF WRAIFAEFFA TLFYVFFGLG ASLRWA.... ...PGPLHVL
mip_ranpi     .MWEFRSFSF WRAVFAEFFG TMFYVFFGLG ASLKWAAGPA .......NVL
mip_bovin     .MWELRSASF WRAICAEFFA SLFYVFFGLG ASLRWA.... ...PGPLHVL
mip_human     .MWELRSASF WRAIFAEFFA TLFYVFFGLG SSLRWA.... ...PGPLHVL
mip_chick     .......... .......... .......... .......... ..........
aqp5_rat      MKKEVCSLAF FKAVFAEFLA TLIFVFFGLG SALKWPSALP T.......IL
aqp5_human    MKKEVCSVAF LKAVFAEFLA TLIFVFFGLG SALKWPSALP T.......IL
aqp2_human    .MWELRSIAF SRAVFAEFLA TLLFVFFGLG SALNWPQALP S.......VL
aqp4_human    AFKGVWTQAF WKAVTAEFLA MLIFVLLSLG STINWG...G TEKPLPVDMV
aqp4_rat      AFKGVWTQAF WKAVTAEFLA MLIFVLLSVG STINWG...G SENPLPVDMV
aqp4_mouse    AFKGVWTQAF WKAVSAEFLA TLIFVL.GVG STINWG...G SENPLPVDMV
aqp2_rat      .MWELRSIAF SRAVLAEFLA TLLFVFFGLG SALQWASSPP S.......VL
aqp2_mouse    .MWELRSIAY CRAVLAEFLA TLLFVFFGLG SALQWASSPP S.......VL
wc2a_arath    DGAELKKWSF YRAVIAEFVA TLLFLYITVL TVIGYKIQSD TDAGGVdgIL
aqp6_human    MLACRLWKAI SRALFAEFLA TGLYVFFGVG SVMRWPTALP S.......VL
wc2c_arath    DAEELTKWSL YRAVIAEFVA TLLFLYVTVL TVIGYKIQSD TKAGGVdgIL
wc2b_arath    DADELTKWSL YRAVIAEFVA TLLFLYITVL TVIGYKIQSD TKAGGVdgIL
wc1c_arath    EPGELSSWSF YRAGIAEFIA TFLFLYITVL TVMGVKRA.. PNMCASVGIQ
wc1b_arath    EPGELASWSF WRAGIAEFIA TFLFLYITVL TVMGVKR..S PNMCASVGIQ
tipw_lyces    EPGELSSWSF YRAGIAEFMA TFLFLYITIL TVMGLKRSDS LCSSV..GIQ
wc1a_arath    EPGELSSWSF WRAGIAEFIA TFLFLYITVL TVMGVKR..S PNMCASVGIQ
tipw_pea      EPSELTSWSF YRAGIAEFIA TFLFLYITVL TVMGVVRESS KCKTV..GIQ
tipa_arath    RADEATHPDS IRATLAEFLS TFVFVFAAEG SILSLDKLYW EHAAHAGTni
aqua_atrca    DMGELKLWSF WRAAIAEFIA TLLFLYITVA TVIGYKKETD PCASVGL..L
dip_antma     SIGDSFSVAS IKAYVAEFIA TLLFVFAGVG SAIAYNKLTS DAALDPAGLV
aqpz_ecoli    .........M FRKLAAECFG TFWLVFGGCG SAVLAAGFPE ....LGIGFA
tip2_tobac    SIGDSFSVGS LKAYVAEFIA TLLFVFAGVG SAIAYNKLTA DAALDPAGLV
tip1_tobac    SIGDSFSVGS LKAYVAEFIA TLLFVFAGVG SAIAYNKLTA DAALDPAGLV
tipg_arath    RPDEATRPDA LKAALAEFIS TLIFVVAGSG SGMAFNKLTE NGATTPSGLV
bib_drome     MQAEIRTLEF WRSIISECLA SFMYVFIVCG AAAGVGVGAS VSSVL....L
tipr_arath    RPDEATRPDA LKAALAEFIS TLIFVVAGSG SGMAFNKLTE NGATTPSGLV
tipa_phavu    RTDEATHPDS MRASLAEFAS TFIFVFAGEG SGLALVKIYQ DSAFSAGELL
tipg_orysa    SHQEVYHPGA LKAALAEFIS TLIFVFAGQG SGMAFSKLTG GGATTPAGLI

              51                                                 100
predict_h258  KVSLAFGLSI ATLAQSVGHI SGAHSNPAVT LGLLLSCQIS ILRAVMYIIA
aqp1_rat      KVSLAFGLSI ATLAQSVGHI SGAHSNPAVT LGLLLSCQIS ILRAVMYIIA
aqp1_mouse    KVSLAFGLSI ATLAQSVGHI SGAHLNPAVT LGLLLSCQIS ILRAVMYIIA
aqp1_human    KVSLAFGLSI ATLAQSVGHI SGAHLNPAVT LGLLLSCQIS IFRALMYIIA
aqp1_bovin    KVSLAFGLSI ATLAQSVGHI SGAHLNPAVT LGLLLSCQIS VLRAIMYIIA
aqp1_sheep    KVSLAFGLSI ATLAQSVGHI SGAHLNPAVT LGLLLSCQIS ILRAIMYIIA
aqpa_ranes    KVSLAFGISI ATMAQSVGHV SGAHLNPAVT LGCLLSCQIS ILKAVMYIIA
aqp2_dasno    QIALAFGLAI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_bovin    QIAMAFGLAI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAVFYVAA
aqp2_canfa    QIAMAFGLGI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_rabit    QIAMAFGLGI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_elema    QIAMAFGLAI GTLVQTLGHI SGAHINPAVT VACLVGCHVS FLRATFYLAA
aqp2_horse    QIAMAFGLAI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_proha    QIAMAFGLAI GTLVQTLGHI SGAHINPAVT IACLVGCHVS FLRALFYLAA
mip_rat       QVALAFGLAL ATLVQTVGHI SGAHVNPAVT FAFLVGSQMS LLRAFCYIAA
aqp2_oryaf    QIAMAFGLAI GTLVQTLGHI SGAHINPAVT VACLVGCHVS FLRAIFYVAA
mip_mouse     QVALAFGLAL ATLVQTVGHI SGAHVNPAVT FAFLVGSQMS LLRAFCYIAA
mip_ranpi     VIALAFGLVL ATMVQSIGHV SGAHINPAVT FAFLIGSQMS LFRAIFYIAA
mip_bovin     QVALAFGLAL ATLVQAVGHI SGAHVNPAVT FAFLVGSQMS LLRAICYMVA
mip_human     QVAMAFGLAL ATLVQSVGHI SGAHVNPAVT FAFLVGSQMS LLRAFCYMAA
mip_chick     .......... .......... .......... .......... ..........
aqp5_rat      QISIAFGLAI GTLAQALGPV SGGHINPAIT LALLIGNQIS LLRAVFYVAA
aqp5_human    QIALAFGLAI GTLAQALGPV SGGHINPAIT LALLVGNQIS LLRAFFYVAA
aqp2_human    QIAMAFGLGI GTLVQALGHI SGAHINPAVT VACLVGCHVS VLRAAFYVAA
aqp4_human    LISLCFGLSI ATMVQCFGHI SGGHINPAVT VAMVCTRKIS IAKSVFYIAA
aqp4_rat      LISLCFGLSI ATMVQCFGHI SGGHINPAVT VAMVCTRKIS IAKSVFYITA
aqp4_mouse    LISLCFGLSI ATMVQCLGHI SGGHINPAVT VAMVCTRKIS IAKSVFYIIA
aqp2_rat      QIAVAFGLGI GILVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_mouse    QIAVAFGLGI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
wc2a_arath    GIAWAFGGMI FILVYCTAGI SGGHINPAVT FGLFLARKVS LPRALLYIIA
aqp6_human    QIAITFNLVT AMAVQVTWKT SGAHANPAVT LAFLVGSHIS LPRAVAYVAA
wc2c_arath    GIAWAFGGMI FILVYCTAGI SGGHINPAVT FGLFLARKVS LIRAVLYMVA
wc2b_arath    GIAWAFGGMI FILVYCTAGI SGGHINPAVT FGLFLARKVS LIRAVLYMVA
wc1c_arath    GIAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRAVFYIVM
wc1b_arath    GIAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRAVYYIVM
tipw_lyces    GVAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRAVFYMVM
wc1a_arath    GIAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRALYYIVM
tipw_pea      GIAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRAIFYMVM
tipa_arath    LVALAHAFAL FAAVSAAINV SGGHVNPAVT FGALVGGRVT AIRAIYYWIA
aqua_atrca    GIAWSFGGMI FVLVYCTAGI SGGHINPAVT FGLFLARKVS LLRALVYMIA
dip_antma     AVAVAHAFAL FVGVSMAANV SGGHLNPAVT LGLAVGGNIT ILTGLFYWIA
aqpz_ecoli    GVALAFGLTV LTMAFAVGHI SGGHFNPAVT IGLWAGGRFP AKEVVGYVIA
tip2_tobac    AVAVAHAFAL FVGVSIAANI SGGHLNPAVT LGLAVGGNIT ILTGFFYWIA
tip1_tobac    AVAVAHAFAL FVGVSIAANI SGGHLNPAVT LGLAVGGNIT ILTGFFYWIA
tipg_arath    AAAVAHAFGL FVAVSVGANI SGGHVNPAVT FGAFIGGNIT LLRGILYWIA
bib_drome     ATALASGLAM ATLTQCFLHI SGAHINPAVT LALCVVRSIS PIRAAMYITA
tipr_arath    AAAVAHAFGL FVAVSVGANI SGGHVNPAVT FGAFIGGNIT LLRGILYWIA
tipa_phavu    ALALAHAFAL FAAVSASMHV SGGHVNPAVS FGALIGGRIS VIRAVYYWIA
tipg_orysa    AAAVAHAFAL FVAVSVGANI SGGHVNPAVT FGAFVGGNIT LFRGLLYWIA

              101                                                150
predict_h258  QCVGAIVASA ILSGITSSLL ENSLGRNDLA RGVNSGQGLG IEIIGTLQLV
aqp1_rat      QCVGAIVASA ILSGITSSLL ENSLGRNDLA RGVNSGQGLG IEIIGTLQLV
aqp1_mouse    QCVGAIVATA ILSGITSSLV DNSLGRNDLA HGVNSGQGLG IEIIGTLQLV
aqp1_human    QCVGAIVATA ILSGITSSLT GNSLGRNDLA DGVNSGQGLG IEIIGTLQLV
aqp1_bovin    QCVGAIVATA ILSGITSSLP DNSLGLNALA PGVNSGQGLG IEIIGTLQLV
aqp1_sheep    QCVGAIVATV ILSGITSSLP DNSLGLNALA PGVNSGQGLG IEIIGTLQLV
aqpa_ranes    QCLGAVVATA ILSGITSGLE NNSLGLNGLS PGVSAGQGLG VEILVTFQLV
aqp2_dasno    QLLGAVAGAA ILHEITPPDV RG........ .......... ..........
aqp2_bovin    QLLGAVAGAA LLHEITPPAI RG........ .......... ..........
aqp2_canfa    QLLGAVAGAA LLHEITPPHV RG........ .......... ..........
aqp2_rabit    QLLGAVAGAA LLHEITPAEV RG........ .......... ..........
aqp2_elema    QLLGAVAGAA LLHELTPPDI RG........ .......... ..........
aqp2_horse    QLLGAVAGAA LLHEITPPDI RR........ .......... ..........
aqp2_proha    QLLGAVAGAA LLHELTPPDI RG........ .......... ..........
mip_rat       QLLGAVAGAA VLYSVTPPAV RGNLALNTLH AGVSVGQATT VEIFLTLQFV
aqp2_oryaf    QLLGAVAGAA LLHELTPPDI RG........ .......... ..........
mip_mouse     QLLGAVAGAA VLYSVTPPAV RGNLALNTLH TGVSVGQATT VEIFLTLQFV
mip_ranpi     QLLGAVAGAA VLYGVTPAAI RGNLALNTLH PGVSLGQATT VEIFLTLQFV
mip_bovin     QLLGAVAGAA VLYSVTPPAV RGNLALNTLH PGVSVGQATI VEIFLTLQFV
mip_human     QLLGAVAGAA VLYSVTPPAV RGNLALNTLH PAVSVGQATT VEIFLTLQFV
mip_chick     .......... .......... .......... .......... ..........
aqp5_rat      QLVGAIAGAG ILYWLAPLNA RGNLAVNALN NNTTPGKAMV VELILTFQLA
aqp5_human    QLVGAIAGAG ILYGVAPLNA RGNLAVNALN NNTTQGQAMV VELILTFQLA
aqp2_human    QLLGAVAGAA LLHEITPADI RGDLAVNALS NSTTAGQAVT VELFLTLQLV
aqp4_human    QCLGAIIGAG ILYLVTPPSV VGGLGVTMVH GNLTAGHGLL VELIITFQLV
aqp4_rat      QCLGAIIGAG ILYLVTPPSV VGGLGVTTVH GNLTAGHGLL VELIITFQLV
aqp4_mouse    QCLGAIIGAG ILYLVTPPSV VGGLGVTTVH GNLTAGHGLL VELIITFQLV
aqp2_rat      QLLGAVAGAA ILHEITPVEI RGDLAVNALH NNATAGQAVT VELFLTMQLV
aqp2_mouse    QLLGAVAGAA ILHEITPVEI RGDLAVNALH NNATAGQAVT VELFLTMQLV
wc2a_arath    QCLGAICGVG FVKAFQSSYY TRYGGgnSLA DGYSTGTGLA AEIIGTFVLV
aqp6_human    QLVGATVGAA LLYGVMPGDI RETLGINVVR NSVSTGQAVA VELLLTLQLV
wc2c_arath    QCLGAICGVG FVKAFQSSHY VNYGGgnFLA DGYNTGTGLA AEIIGTFVLV
wc2b_arath    QCLGAICGVG FRQSFQSSYY DRYGGgnSLA DGYNTGTGLA AEIIGTFVLV
wc1c_arath    QCLGAICGAG VVKGFQPNPY QtgGGANTVA HGYTKGSGLG AEIIGTFVLV
wc1b_arath    QCLGAICGAG VVKGFQPKQY QagGGANTIA HGYTKGSGLG AEIIGTFVLV
tipw_lyces    QCLGAICGAG VVKGFMVGPY QrgGGANVVN PGYTKGDGLG AEIIGTFVLV
wc1a_arath    QCLGAICGAG VVKGFQPKQY QagGGANTVA HGYTKGSGLG AEIIGTFVLV
tipw_pea      QVLGAICGAG VVKGFEGKQR FGDLNgnFVA PGYTKGDGLG AEIVGTFILV
tipa_arath    QLLGAILACL LLRLTTNGMR PVGFR...LA SGVGAVNGLV LEIILTFGLV
aqua_atrca    QCAGAICGVG LVKAFMKGPY NqgGGANSVA LGYNKGTAFG AELIGTFVLV
dip_antma     QCLGSTVACL LLKFVTNGL. ..SVPTHGVA AGMDAIQGVV MEIIITFALV
aqpz_ecoli    QVVGGIVAAA LLYLIASGKT GFDAAASGFA sgYSMLSALV VELVLSAGFL
tip2_tobac    QLLGSTVACL LLKYVTNGL. ..AVPTHGVA AGLNGFQGVV MEIIITFALV
tip1_tobac    QLLGSTVACL LLKYVTNGL. ..AVPTHGVA AGLNGLQGVV MEIIITFALV
tipg_arath    QLLGSVVACL ILKFATGGLA VPAFG...LS AGVGVLNAFV FEIVMTFGLV
bib_drome     QCGGGIAGAA LLYGVTVPGY QGNLQAasHS AALAAWERFG VEFILTSLVV
tipr_arath    QLLGSVVACL ILKFATGGLA VPPFG...LS AGVGVLNAFV FEIVMTFGLV
tipa_phavu    QLLGSIVAAL VLRLVTNNMR PSGF...HVS PGVGVGHMFI LEVVMTFGLM
tipg_orysa    QLLGSTVACF LLRFSTGGLA TGTFGL.... TGVSVWEALV LEIVMTFGLV

              151                                                200
predict_h258  LCVLATTDRR RRDLGGSAPL AIGLSVALGH LLAIDYTGCG INPARSFGSA
aqp1_rat      LCVLATTDRR RRDLGGSAPL AIGLSVALGH LLAIDYTGCG INPARSFGSA
aqp1_mouse    LCVLATTDRR RRDLGGSAPL AIGLSVALGH LLAIDYTGCG INPARSFGSA
aqp1_human    LCVLATTDRR RRDLGGSAPL AIGLSVALGH LLAIDYTGCG INPARSFGSA
aqp1_bovin    LCVLATTDRR RRDLGGSGPL AIGFSVALGH LLAIDYTGCG INPARSFGSS
aqp1_sheep    LCVLATTDRR RrdLGDSGPL AIGFSVALGH LLAIDYTGCG INPARSFGSS
aqpa_ranes    LCVVAVTDRR RHDVSGSVPL AIGLSVALGH LIAIDYTGCG MNPARSFGSA
aqp2_dasno    .......... .......... .......... .......... ..........
aqp2_bovin    .......... .......... .......... .......... ..........
aqp2_canfa    .......... .......... .......... .......... ..........
aqp2_rabit    .......... .......... .......... .......... ..........
aqp2_elema    .......... .......... .......... .......... ..........
aqp2_horse    .......... .......... .......... .......... ..........
aqp2_proha    .......... .......... .......... .......... ..........
mip_rat       LCIFATYDER RNGRMGSVAL AVGFSLTLGH LFGMYYTGAG MNPARSFAPA
aqp2_oryaf    .......... .......... .......... .......... ..........
mip_mouse     LCIFATYDER RNGRMGSVAL AVGFSLTLGH LFGMYYTGAG MNPARSFAPA
mip_ranpi     LCIFATYDER RNGRLGSVSL AIGFSLTLGH LFGLYYTGAS MNPARSFAPA
mip_bovin     LCIFATYDER RNGRLGSVAL AVGFSLTLGH LFGMYYTGAG MNPARSFAPA
mip_human     LCIFATYDER RNGQLGSVAL AVGFSLALGH LFGMYYTGAG MNPARSFAPA
mip_chick     ........DR HDGRPGSAAL PVGFSLALGH LFGIPFTGAG MNPARSFAPA
aqp5_rat      LCIFSSTDSR RTSPVGSPAL SIGLSVTLGH LVGIYFTGCS MNPARSFGPA
aqp5_human    LCIFASTDSR RTSPVGSPAL SIGLSVTLGH LVGIYFTGCS MNPARSFGPA
aqp2_human    LCIFASTDER RGENPGTPAL SIGFSVALGH LLGIHYTGCS MNPARSLAPA
aqp4_human    FTIFASCDSK RTDVTGSIAL AIGFSVAIGH LFAINYTGAS MNPARSFGPA
aqp4_rat      FTIFASCDSK RTDVTGSVAL AIGFSVAIGH LFAINYTGAS MNPARSFGPA
aqp4_mouse    FTVFASCDSK RTDVTGSIAL AIGFSVAIGH LFAINYTGAS MNPARSFGPA
aqp2_rat      LCIFASTDER RGDNLGSPAL SIGFSVTLGH LLGIYFTGCS MNPARSLAPA
aqp2_mouse    LCIFASTDER RSDNLGSPAL SIGFSVTLGH LLGIYFTGCS MNPARSLAPA
wc2a_arath    YTVFSATDPK RSavPVLAPL PIGFAVFMVH LATIPITGTG INPARSFGAA
aqp6_human    LCVFASTDSR QTS..GSPAT MIGISWALGH LIGILFTGCS MNPARSFGPA
wc2c_arath    YTVFSATDPK RNavPVLAPL PIGFAVFMVH LATIPITGTG INPARSFGAA
wc2b_arath    YTVFSATDPK RNavPVLAPL PIGFAVFMVH LATIPITGTG INPARSFGAS
wc1c_arath    YTVFSATDAK RSavPILAPL PIGFAVFLVH LATIPITGTG INPARSLGAA
wc1b_arath    YTVFSATDAK RNavPILAPL PIGFAVFLVH LATIPITGTG INPARSLGAA
tipw_lyces    YTVFSATDAK RNavPILAPL PIGFAVFLVH LATIPITGTG INPARSLGAA
wc1a_arath    YTVFSATDAK RNavPILAPL PIGFAVFLVH LATIPITATG INPARSLGAA
tipw_pea      YTVFSATDAK RSavPILAPL PIGFAVFLVH LATIPITGTG INPARSLGAA
tipa_arath    YVVYStiDPK RGSLGIIAPL AIGLIVGANI LVGGPFSGAS MNPARAFGPA
aqua_atrca    YTVFSATDPK RSavPILAPL PIGFAVFMVH LATIPITGTG INPARSFGAA
dip_antma     YTVYAtaDPK KGSLGVIAPI AIGFIVGANI LAAGPFSGGS MNPARSFGPA
aqpz_ecoli    LVIHGATDKF APA..GFAPI AIGLALTLIH LISIPVTNTS VNPARSTAVA
tip2_tobac    YTVYAtaDPK KGSLGTIAPI AIGFIVGANI LAAGPFSGGS MNPARSFGPA
tip1_tobac    YTVYAtaDPK KGSLGTIAPI AIGFIVGANI LAAGPFSGGS MNPARSFGPA
tipg_arath    YTVYAtiDPK NGSLGTIAPI AIGFIVGANI LAGGAFSGAS MNPAVAFGPA
bib_drome     LCYFVSTDPM KKFMGNS.AA SIGCAYSACC FVSMPYLN.. ..PARSLGPS
tipr_arath    YTVYAtiDPK NGSLGTIAPI AIGFIVGANI LAGGAFSGAS MNPAVAFGPA
tipa_phavu    YTVYGtiDPK RGAVSYIAPL AIGLIVGANI LVGGPFDGAC MNPALAFGPS
tipg_orysa    YTVYAtvDPK KGSLGTIAPI AIGFIVGANI LVGGAFDGAS MNPAVSFGPA

              201                                                250
predict_h258  VLTRNFSNHW IFWVGPFIGS ALAVLIYDFI LAPRSSDFTD RMKVWTSGQV
aqp1_rat      VLTRNFSNHW IFWVGPFIGS ALAVLIYDFI LAPRSSDFTD RMKVWTSGQV
aqp1_mouse    VLTRNFSNHW IFWVGPFIGG ALAVLIYDFI LAPRSSDFTD RMKVWTSGQV
aqp1_human    VITHNFSNHW IFWVGPFIGG ALAVLIYDFI LAPRSSDLTD RVKVWTSGQV
aqp1_bovin    VITHNFQDHW IFWVGPFIGA ALAVLIYDFI LAPRSSDLTD RVKVWTSGQV
aqp1_sheep    VITHNFQDHW IFWVGPFIGA ALAVLIYDFI LAPRSSDLTD RVKVWTSGQV
aqpa_ranes    VLTKNFTYHW IFWVGPMIGG AAAAIIYDFI LAPRTSDLTD RMKVWTNGQV
aqp2_dasno    .......... .......... .......... .......... ..........
aqp2_bovin    .......... .......... .......... .......... ..........
aqp2_canfa    .......... .......... .......... .......... ..........
aqp2_rabit    .......... .......... .......... .......... ..........
aqp2_elema    .......... .......... .......... .......... ..........
aqp2_horse    .......... .......... .......... .......... ..........
aqp2_proha    .......... .......... .......... .......... ..........
mip_rat       ILTRNFSNHW VYWVGPIIGG GLGSLLYDFL LFPRLKSVSE RLSILKGARP
aqp2_oryaf    .......... .......... .......... .......... ..........
mip_mouse ILTRNFSNHW VYWVGPIIGG GLGSLLYDFL LFPRLKSVSE RLSILKGARP mip_ranpi VLTRNFTNHW VYWVGPIIGG ALGGLVYDFI LFPRMRGLSE RLSILKGARP mip_bovin ILTRNFTNHW VYWVGPVIGA GLGSLLYDFL LFPRLKSVSE RLSILKGSRP mip_human ILTGNFTNHW VYWVGPIIGG GLGSLLYDFL LFPRLKSISE RLSVLKGAKP mip_chick VITRNFTNHW VFWAGPLLGA ALAALLYELA LCPRARSMAE RLAV.LRGEP aqp5_rat VVMNRFssHW VFWVGPIVGA MLAAILYFYL LFPSSLSLHD RVAVVKGTYE aqp5_human VVMNRFsaHW VFWVGPIVGA VLAAILYFYL LFPNSLSLSE RVAIIKGTYE aqp2_human VVTGKFDDHW VFWIGPLVGA ILGSLLYNYV LFPPAKSLSE RLAVLKGLEp aqp4_human VIMGNWENHW IYWVGPIIGA VLAGGLYEYV FCPDVEFKRR FKEAFSKaqT aqp4_rat VIMGNWENHW IYWVGPIIGA VLAGALYEYV FCPDVELKRR LKEAFSKaqT aqp4_mouse VIMGNWANHW IYWVGPIMGA VLAGALYEYV FCPDVELKRR LKEAFSKaqT aqp2_rat VVTGKFDDHW VFWIGPLVGA IIGSLLYNYL LFPSAKSLQE RLAVLKGLEp aqp2_mouse VVTGKFDDHW VFWIGPLVGA IIGSLLYNYL LFPSTKSLQE RLAVLKGLEp wc2a_arath VIYnpWDDHW IFWVGPFIGA AIAAFYHQFV LRASGSKSLG SFRSAANV.. aqp6_human IIIGKFTVHW VFWVGPLMGA LLASLIYNFV LFPDTKTLAQ RLAILTGTVE wc2c_arath VIFnpWDDHW IFWVGPFIGA TIAAFYHQFV LRASGSKSLG SFRSAANV.. wc2b_arath VIYnpWDDHW IFWVGPFIGA AIAAFYHQFV LRASGSKSLG SFRSAANV.. wc1c_arath IIYnaWDDHW IFWVGPFIGA ALAALYHQLV IRAIPFKSRS .......... wc1b_arath IIFnaWDDHW VFWVGPFIGA ALAALYHVIV IRAIPFKSRS .......... tipw_lyces IIYnaWNDHW IFWVGPMIGA ALAAIYHQII IRAMPFHRS. .......... wc1a_arath IIYnsWDDHW VFWVGPFIGA ALAALYHVVV IRAIPFKSRS .......... tipw_pea IVFngWNDHW IFWVGPFIGA ALAALYHQVV IRAIPFKSK. .......... tipa_arath LVGWRWHDHW IYWVGPFIGS ALAALIYEYM VIPTEPPTHH AHGVHQPLAP aqua_atrca VIyrVWDDHW IFWVGPFVGA LAAAAYHQYV LRAAAIKALG SFRSNPTN.. dip_antma VASGDFSQNW IYWAGPLIGG ALAGFIYGDV FITAHAPLPT SEDYA..... aqpz_ecoli IFQgaLEQLW FFWVVPIVGG IIGGLIYRTL LEKRD..... .......... tip2_tobac VVAGDFSQNW IYWAGPLIGG GLAGFIYGDV FIGCHTPLPT SEDYA..... tip1_tobac VVAGDFSQNW IYWAGPLIGG GLAGFIYGDV FIGCHTPLPT SEDYA..... tipg_arath VVSWTWTNHW VYWAGPLVGG GIAGLIYEVF FINTTHEQLP TTDY...... 
bib_drome FVLNKWDSHW VYWFGPLVGG MASGLVYEYI FNSRNRNLRH NKGSIDNDSS tipr_arath VVSWTWTNHW VYWAGPLVGG GIAGLIYEVF FINTTHTSSS NHRLLN.... tipa_phavu LVGWQWHQHW IFWVGPLLGA ALAALVYEYA VIPIEPPPHH HQPLATEDY. tipg_orysa LVSWSWESQW VYWVGPLIGG GLAGVIYEVL FISHTHEQLP TTDY...... 251 269 predict_h258 EEYDLDADDI NSRVEMKPK aqp1_rat EEYDLDADDI NSRVEMKPK aqp1_mouse EEYDLDADDI NSRVEMKPK aqp1_human EEYDLDADDI NSRVEMKPK aqp1_bovin EEYDLDADDI NSRVEMKPK aqp1_sheep EEYDLDADDI NSRVEMKPK aqpa_ranes EEYELDGDD. NTRVEMKPK aqp2_dasno .......... ......... aqp2_bovin .......... ......... aqp2_canfa .......... ......... aqp2_rabit .......... ......... aqp2_elema .......... ......... aqp2_horse .......... ......... aqp2_proha .......... ......... mip_rat SDSNGQPEGT GEPVELKTQ aqp2_oryaf .......... ......... mip_mouse SDSNGQPEGT GEPVELKTQ mip_ranpi AEPEGQQEAT GEPIELKTQ mip_bovin SESNGQPEVT GEPVELKTQ mip_human DVSNGQPEVT GEPVELNTQ mip_chick PAAAPPPEPP AEPLELKTQ aqp5_rat PEEDWEDHRE ERKKTIELT aqp5_human PDEDWEEQRE ERKKTMELT aqp2_human tDWEEREVRR RQSVELHSP aqp4_human KGSYMEVEDN RSQVETDDL aqp4_rat KGSYMEVEDN RSQVETEDL aqp4_mouse KGSYMEVEDN RSQVETEDL aqp2_rat tDWEEREVRR RQSVELHSP aqp2_mouse tDWEEREVRR RQSVELHSP wc2a_arath .......... ......... aqp6_human VGTGARAGAE PLKKESQPG wc2c_arath .......... ......... wc2b_arath .......... ......... wc1c_arath .......... ......... wc1b_arath .......... ......... tipw_lyces .......... ......... wc1a_arath .......... ......... tipw_pea .......... ......... tipa_arath EDY....... ......... aqua_atrca .......... ......... dip_antma .......... ......... aqpz_ecoli .......... ......... tip2_tobac .......... ......... tip1_tobac .......... ......... tipg_arath .......... ......... bib_drome SIHSEDELNY DMDMEKPNK tipr_arath .......... ......... tipa_phavu .......... ......... tipg_orysa .......... ......... 
________________________________________________________________________________

Prediction of:
   - secondary structure,                by PHDsec
   - solvent accessibility,              by PHDacc
   - and helical transmembrane regions,  by PHDhtm

PHD: Profile fed neural network systems from HeiDelberg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Author:   Burkhard Rost
          EMBL, Heidelberg, FRG
          Meyerhofstrasse 1, 69 117 Heidelberg
          Internet: Predict-Help@EMBL-Heidelberg.DE
All rights reserved.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Secondary structure prediction by PHDsec:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Author:   Burkhard Rost
          EMBL, Heidelberg, FRG
          Meyerhofstrasse 1, 69 117 Heidelberg
          Internet: Rost@EMBL-Heidelberg.DE
All rights reserved.

About the network method
~~~~~~~~~~~~~~~~~~~~~~~~

The network procedure is described in detail in:
1) Rost, Burkhard; Sander, Chris: Prediction of protein structure at better than 70% accuracy. J. Mol. Biol., 1993, 232, 584-599.

A brief description is given in:
   Rost, Burkhard; Sander, Chris: Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proc. Natl. Acad. Sci. U.S.A., 1993, 90, 7558-7562.

The PHD mail server is described in:
2) Rost, Burkhard; Sander, Chris; Schneider, Reinhard: PHD - an automatic mail server for protein secondary structure prediction. CABIOS, 1994, 10, 53-60.

The latest improvement steps (up to 72%) are explained in:
3) Rost, Burkhard; Sander, Chris: Combining evolutionary information and neural networks to predict protein secondary structure. Proteins, 1994, 19, 55-72.

To be quoted for publications of PHD output:
   Papers 1-3 for the prediction of secondary structure and the prediction server.

About the input to the network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The prediction is performed by a system of neural networks. The input is a multiple sequence alignment.
It is taken from an HSSP file (produced by the program MaxHom: Sander, Chris & Schneider, Reinhard: Database of Homology-Derived Structures and the Structural Meaning of Sequence Alignment. Proteins, 1991, 9, 56-68).

For optimal results the alignment should contain sequences with varying degrees of sequence similarity relative to the input protein. The following is an ideal situation:

+-----------------+----------------------+
| sequence:       | sequence identity    |
+-----------------+----------------------+
| target sequence | 100 %                |
| aligned seq. 1  |  90 %                |
| aligned seq. 2  |  80 %                |
| ...             |  ...                 |
| aligned seq. 7  |  30 %                |
+-----------------+----------------------+

Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A careful cross validation test on some 250 protein chains (in total about 55,000 residues) with less than 25% pairwise sequence identity gave the following results:

++================++-----------------------------------------+
|| Qtotal = 72.1% || ("overall three state accuracy")        |
++================++-----------------------------------------+

+----------------------------+-----------------------------+
| Qhelix (% of observed)=70% | Qhelix (% of predicted)=77% |
| Qstrand(% of observed)=62% | Qstrand(% of predicted)=64% |
| Qloop  (% of observed)=79% | Qloop  (% of predicted)=72% |
+----------------------------+-----------------------------+

..........................................................................
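The identity levels in the ideal-situation table above are pairwise sequence identities between the target and each aligned sequence. As an illustration only, the Python sketch below computes percent identity for two rows of an alignment such as the MSF output earlier in this report. It assumes identity = identical residues / columns in which both rows carry a residue (gap columns skipped, case ignored because some MSF rows contain lowercase residues); MaxHom/HSSP may use a different convention.

```python
def percent_identity(row_a: str, row_b: str, gaps: str = ".-~") -> float:
    """Percent identity of two aligned rows (assumed convention, see text)."""
    # Drop the blanks that separate the 10-residue blocks in MSF; ignore case.
    a = row_a.replace(" ", "").upper()
    b = row_b.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns in which both rows have a residue.
    pairs = [(x, y) for x, y in zip(a, b) if x not in gaps and y not in gaps]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Alignment columns 201-250 of the target and of aqp1_human (MSF output above).
target = "VLTRNFSNHW IFWVGPFIGS ALAVLIYDFI LAPRSSDFTD RMKVWTSGQV"
aqp1_human = "VITHNFSNHW IFWVGPFIGG ALAVLIYDFI LAPRSSDLTD RVKVWTSGQV"
print(percent_identity(target, aqp1_human))  # 90.0
```

For this stretch 45 of the 50 positions agree; over the full-length alignment the value will differ.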
These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                     number of correctly predicted residues
Qtotal =             --------------------------------------- (*100)
                     number of all residues

                     no of res correctly predicted to be in helix
Qhelix (% of obs)  = -------------------------------------------- (*100)
                     no of all res observed to be in helix

                     no of res correctly predicted to be in helix
Qhelix (% of pred) = -------------------------------------------- (*100)
                     no of all residues predicted to be in helix

..........................................................................

Averaging over single chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most reasonable way to compute the overall accuracies is the above quoted percentage of correctly predicted residues. However, since the user is mainly interested in the expected performance of the prediction for a particular protein, the mean value when averaging over protein chains might be of help as well. Computing first the three state accuracy for each protein chain, and then averaging over 250 chains yields the following average:

+-------------------------------====--+
| Qtotal/averaged over chains = 72.2% |
+-------------------------------====--+
| standard deviation          =  9.3% |
+-------------------------------------+

..........................................................................

Further measures of performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Matthews correlation coefficient:

+---------------------------------------------+
| Chelix = 0.63, Cstrand = 0.53, Cloop = 0.52 |
+---------------------------------------------+

..........................................................................

Average length of predicted secondary structure segments:

            +------------+----------+
            |  predicted | observed |
+-----------+------------+----------+
| Lhelix =  |    10.3    |   9.3    |
| Lstrand = |     5.0    |   5.3    |
| Lloop =   |     7.2    |   5.9    |
+-----------+------------+----------+

..........................................................................

The accuracy matrix in detail:

+---------------------------------------+
|  number of residues with H, E, L      |
+---------+------+------+------+--------+
|         |net H |net E |net L |sum obs |
+---------+------+------+------+--------+
| obs H   |12447 | 1255 | 3990 | 17692  |
| obs E   |  949 | 7493 | 3750 | 12192  |
| obs L   | 2604 | 2875 |19962 | 25441  |
+---------+------+------+------+--------+
| sum Net |16000 |11623 |27702 | 55325  |
+---------+------+------+------+--------+

Note: This table is to be read in the following manner: of all residues predicted to be in helix, 12447 were observed to be in helix; 949, however, belong to observed strands and 2604 to observed loop regions. The term "observed" refers to the DSSP assignment of secondary structure calculated from 3D coordinates of experimentally determined structures (Dictionary of Secondary Structure of Proteins: Kabsch & Sander (1983) Biopolymers, 22, 2577-2637).

Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The network predicts the three secondary structure types using real numbers from the output units. The prediction is assigned by choosing the maximal unit ("winner takes all"). However, the real numbers contain additional information. E.g. the difference between the maximal and the second largest output unit can be used to derive a "reliability index". This index is given for each residue along with the prediction. The index is scaled to have values between 0 (lowest reliability) and 9 (highest).
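A minimal sketch of the winner-takes-all assignment together with a reliability index derived from the gap between the two largest outputs. The scaling used here (outputs assumed to lie in [0, 1], RI = floor(10 * gap), clipped to 0-9) is a hypothetical choice for illustration, not the scaling PHD actually uses. The `min_offset` parameter is also hypothetical; it mimics the PHDacc variant described further down, in which the runner-up is taken only from units at least 2 positions away from the winner.

```python
from typing import List, Sequence, Tuple


def winner_and_reliability(outputs: Sequence[float], min_offset: int = 1) -> Tuple[int, int]:
    """Return (index of the winning unit, reliability index 0-9)."""
    # Winner takes all: the unit with the largest output is the prediction.
    winner = max(range(len(outputs)), key=lambda i: outputs[i])
    # Runner-up: best unit at least `min_offset` positions from the winner
    # (min_offset=1 means "any other unit").
    rivals: List[float] = [o for i, o in enumerate(outputs) if abs(i - winner) >= min_offset]
    gap = outputs[winner] - max(rivals)
    # Hypothetical scaling: 10 * gap, truncated and clipped to 0..9.
    return winner, max(0, min(9, int(10 * gap)))


# Three-state example (units H, E, L) with made-up output values.
states = "HEL"
w, ri = winner_and_reliability([0.125, 0.875, 0.25])
print(states[w], ri)  # E 6

# Ten-state example in the PHDacc style: skip the winner's direct neighbours.
acc_out = [0.25, 0, 0, 0, 0.5, 0.75, 0, 0, 0, 0]
print(winner_and_reliability(acc_out), winner_and_reliability(acc_out, min_offset=2))
# (5, 2) (5, 5)
```

Ignoring the neighbouring units makes sense for ordered states such as accessibility levels, where a close second choice one level away is a small error rather than a real ambiguity.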
The accuracies (Qtot) to be expected for residues with values above a particular value of the index are given below, as well as the fraction of such residues (%res):

+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
| %res |100.0| 99.2| 90.4| 80.9| 71.6| 62.5| 52.8| 42.3| 29.8| 14.1|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      |     |     |     |     |     |     |     |     |     |     |
| Qtot | 72.1| 72.3| 74.8| 77.7| 80.3| 82.9| 85.7| 88.5| 91.1| 94.2|
|      |     |     |     |     |     |     |     |     |     |     |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 70.4| 70.6| 73.7| 77.1| 80.1| 83.1| 86.0| 89.3| 92.5| 96.4|
| E%obs| 61.5| 61.7| 63.7| 66.6| 69.1| 71.7| 74.6| 77.0| 77.8| 68.1|
|      |     |     |     |     |     |     |     |     |     |     |
| H%prd| 77.8| 78.0| 80.0| 82.6| 84.7| 86.9| 89.2| 91.3| 93.1| 95.4|
| E%prd| 64.5| 64.7| 67.8| 71.0| 74.2| 77.6| 81.4| 85.1| 89.8| 93.5|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

The above table gives the cumulative results, e.g. 62.5% of all residues have a reliability of at least 5. The overall three-state accuracy for this subset of almost two thirds of all residues is 82.9%. For this subset, e.g., 83.1% of the observed helices are correctly predicted, and 86.9% of all residues predicted to be in helix are correct.

..........................................................................

The following table gives the non-cumulative quantities, i.e. the values per reliability index range. These numbers answer the question: how reliable is the prediction for all residues labeled with the particular index i?
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index|  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
| %res |  8.8|  9.5|  9.3|  9.1|  9.7| 10.5| 12.5| 15.7| 14.1|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      |     |     |     |     |     |     |     |     |     |
| Qtot | 46.6| 50.6| 57.7| 62.6| 67.9| 74.2| 82.2| 88.3| 94.2|
|      |     |     |     |     |     |     |     |     |     |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 36.8| 42.3| 49.5| 55.2| 61.7| 69.9| 78.8| 87.4| 96.4|
| E%obs| 44.7| 44.5| 52.1| 55.4| 60.9| 68.0| 75.9| 81.0| 68.1|
|      |     |     |     |     |     |     |     |     |     |
| H%prd| 49.9| 52.5| 60.3| 64.2| 69.2| 77.5| 85.4| 89.9| 95.4|
| E%prd| 41.7| 47.1| 53.6| 57.0| 64.0| 71.6| 78.8| 88.8| 93.5|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+

For example, for residues with Relindex = 5, 64% of all residues predicted to be in beta-strand are correct.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Solvent accessibility prediction by PHDacc:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Author:   Burkhard Rost
          EMBL, Heidelberg, FRG
          Meyerhofstrasse 1, 69 117 Heidelberg
          Internet: Rost@EMBL-Heidelberg.DE
All rights reserved.

About the network method
~~~~~~~~~~~~~~~~~~~~~~~~

The network for prediction of secondary structure is described in detail in:
   Rost, Burkhard; Sander, Chris: Prediction of protein structure at better than 70% accuracy. J. Mol. Biol., 1993, 232, 584-599.

The analysis of the prediction of solvent exposure is given in:
   Rost, Burkhard; Sander, Chris: Conservation and prediction of solvent accessibility in protein families. Proteins, 1994, 20, 216-226.

To be quoted for publications of PHD exposure prediction:
   Both papers quoted above.

Definition of accessibility
~~~~~~~~~~~~~~~~~~~~~~~~~~~

For training, the DSSP (Dictionary of Secondary Structure of Proteins; Kabsch & Sander (1983) Biopolymers, 22, 2577-2637) values of the accessible surface area of each residue were used. The prediction provides values for the relative solvent accessibility.
The normalisation is the following:

                           ACCESSIBILITY (from DSSP in Angstrom)
RELATIVE_ACCESSIBILITY = ------------------------------------- * 100
                           MAXIMAL_ACC (amino acid type i)

where MAXIMAL_ACC (i) is the maximal accessibility of amino acid type i. The maximal values are:

+----+----+----+----+----+----+----+----+----+----+----+----+
|  A |  B |  C |  D |  E |  F |  G |  H |  I |  K |  L |  M |
| 106| 160| 135| 163| 194| 197|  84| 184| 169| 205| 164| 188|
+----+----+----+----+----+----+----+----+----+----+----+----+
|  N |  P |  Q |  R |  S |  T |  V |  W |  X |  Y |  Z |
| 157| 136| 198| 248| 130| 142| 142| 227| 180| 222| 196|
+----+----+----+----+----+----+----+----+----+----+----+

Notation: one letter code for amino acid, B stands for D or N; Z stands for E or Q; and X stands for undetermined.

The accessibility (the DSSP value, not the relative one) can be used to estimate the number of water molecules (W) in contact with the residue:

   W = ACCESSIBILITY / 10

The prediction is given in 10 states for relative accessibility, with

   RELATIVE_ACCESSIBILITY = (PREDICTED_ACC * PREDICTED_ACC)

where PREDICTED_ACC = 0 - 9.

Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A careful cross validation test on some 238 protein chains (in total about 62,000 residues) with less than 25% pairwise sequence identity gave the following results:

Correlation
...........

The correlation between observed and predicted solvent accessibility is:

   -----------
   corr = 0.53
   -----------

This value ought to be compared to the worst and best case prediction scenarios: random prediction (corr = 0.0) and homology modelling (corr = 0.66). (Note: homology modelling yields a relatively accurate prediction in 3D if, and only if, a sequence with significant sequence identity has a known 3D structure.)

3-state accuracy
................

Often the relative accessibility is projected onto, e.g., 3 states:
   b = buried       (here defined as < 9% relative accessibility),
   i = intermediate ( 9% <= rel. acc. < 36% ),
   e = exposed      ( rel. acc. >= 36% ).
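The normalisation, the water estimate and the 10- and 3-state encodings above can be sketched in Python as follows. Binning a relative accessibility into a 10-state index via floor(sqrt(value)), capped at 9, is an assumption consistent with "state i corresponds to i*i %"; with that binning, states 0-2 fall into b, 3-5 into i and 6-9 into e.

```python
import math

# MAXIMAL_ACC per amino acid, copied from the table above.
MAX_ACC = {
    "A": 106, "B": 160, "C": 135, "D": 163, "E": 194, "F": 197, "G": 84,
    "H": 184, "I": 169, "K": 205, "L": 164, "M": 188, "N": 157, "P": 136,
    "Q": 198, "R": 248, "S": 130, "T": 142, "V": 142, "W": 227, "X": 180,
    "Y": 222, "Z": 196,
}


def relative_accessibility(aa: str, acc: float) -> float:
    """RELATIVE_ACCESSIBILITY = ACCESSIBILITY / MAXIMAL_ACC(aa) * 100."""
    return acc / MAX_ACC[aa.upper()] * 100.0


def water_molecules(acc: float) -> float:
    """W = ACCESSIBILITY / 10, using the absolute DSSP value."""
    return acc / 10.0


def ten_state(rel_acc: float) -> int:
    """Assumed binning: state i covers i*i % <= rel. acc. < (i+1)*(i+1) %."""
    return min(9, int(math.sqrt(max(0.0, rel_acc))))


def three_state(rel_acc: float) -> str:
    """b: < 9 %, i: 9 % <= rel. acc. < 36 %, e: >= 36 %."""
    if rel_acc < 9:
        return "b"
    if rel_acc < 36:
        return "i"
    return "e"


rel = relative_accessibility("L", 82.0)
print(rel, ten_state(rel), three_state(rel), water_molecules(82.0))  # 50.0 7 e 8.2
```

The cap at 9 also absorbs DSSP values that slightly exceed the tabulated maximum.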
A projection onto 3 states or 2 states (buried/exposed) enables the compilation of a 3- and 2-state prediction accuracy. PHD reaches an overall 3-state accuracy of:

   Q3 = 57.5%

(compared to 35% for random prediction and 70% for homology modelling). In detail:

+-----------------------------------+-------------------------+
| Qburied       (% of observed)=77% | Qb (% of predicted)=60% |
| Qintermediate (% of observed)= 9% | Qi (% of predicted)=44% |
| Qexposed      (% of observed)=78% | Qe (% of predicted)=56% |
+-----------------------------------+-------------------------+

10-state accuracy
.................

The network predicts relative solvent accessibility in 10 states, with state i (i = 0-9) corresponding to a relative solvent accessibility of i*i %. The 10-state accuracy of the network is:

   Q10 = 24.5%

..........................................................................

These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                      number of correctly predicted residues
Q3 =                  --------------------------------------- (*100)
                      number of all residues

                      no of res. correctly predicted to be buried
Qburied (% of obs)  = ------------------------------------------- (*100)
                      no of all res. observed to be buried

                      no of res. correctly predicted to be buried
Qburied (% of pred) = ------------------------------------------- (*100)
                      no of all residues predicted to be buried

..........................................................................

Averaging over single chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most reasonable way to compute the overall accuracies is the above quoted percentage of correctly predicted residues. However, since the user is mainly interested in the expected performance of the prediction for a particular protein, the mean value when averaging over protein chains might be of help as well.
Computing first the correlation between observed and predicted accessibility for each protein chain, and then averaging over all 238 chains yields the following average:

+-------------------------------====--+
| corr/averaged over chains  = 0.53   |
+-------------------------------====--+
| standard deviation         = 0.11   |
+-------------------------------------+

..........................................................................

Further details of performance accuracy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The accuracy matrix in detail:
..............................

-------+-------------------------------------------------------------+-----------
 \ PHD |     0     1     2     3     4     5     6     7     8     9 |   SUM %obs
-------+-------------------------------------------------------------+-----------
 OBS 0 |  8611   140     8    44    82   169   772   334    27     0 | 10187 16.6
 OBS 1 |  4367   164     0    50   106   231   738   346    44     3 |  6049  9.8
 OBS 2 |  3194   168     1    68   125   303   951   513    42     7 |  5372  8.7
 OBS 3 |  2760   159     8    80   136   327  1246   746    58    19 |  5539  9.0
 OBS 4 |  2312   144     2    72   166   396  1615  1245   124    19 |  6095  9.9
 OBS 5 |  1873    96     3    84   138   425  1979  1834   187    27 |  6646 10.8
 OBS 6 |  1387    67     1    60    80   278  2237  2627   231    51 |  7019 11.4
 OBS 7 |  1082    35     0    32    56   225  1871  3107   302    60 |  6770 11.0
 OBS 8 |   660    25     0    27    43   136  1206  2374   325    87 |  4883  7.9
 OBS 9 |   325    20     2    27    29    74   648  1159   366   214 |  2864  4.7
-------+-------------------------------------------------------------+-----------
  SUM  | 26571  1018    25   544   961  2564 13263 14285  1706   487 |
-------+-------------------------------------------------------------+-----------

Note: This table is to be read in the following manner: 8611 of all residues predicted to be 0% exposed were observed with 0% relative accessibility. However, 325 of all residues predicted to have 0% are observed as completely exposed (obs = 9 -> rel. acc. >= 81%).
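Projecting the 10-state matrix onto the three states defined earlier reproduces the quoted 3-state figures. The sketch below assumes that the projection maps a 10-state index s to a relative accessibility of s*s % and then applies the b/i/e thresholds. Q3 comes out at 57.5%, and the per-state values match the 77/9/78 % (of observed) and 60/44/56 % (of predicted) quoted above if those were truncated rather than rounded.

```python
# Rows: observed state 0-9; columns: PHD-predicted state 0-9 (matrix above).
MATRIX = [
    [8611, 140, 8, 44, 82, 169, 772, 334, 27, 0],
    [4367, 164, 0, 50, 106, 231, 738, 346, 44, 3],
    [3194, 168, 1, 68, 125, 303, 951, 513, 42, 7],
    [2760, 159, 8, 80, 136, 327, 1246, 746, 58, 19],
    [2312, 144, 2, 72, 166, 396, 1615, 1245, 124, 19],
    [1873, 96, 3, 84, 138, 425, 1979, 1834, 187, 27],
    [1387, 67, 1, 60, 80, 278, 2237, 2627, 231, 51],
    [1082, 35, 0, 32, 56, 225, 1871, 3107, 302, 60],
    [660, 25, 0, 27, 43, 136, 1206, 2374, 325, 87],
    [325, 20, 2, 27, 29, 74, 648, 1159, 366, 214],
]


def to_three(state: int) -> int:
    """10-state index -> 0 (b), 1 (i), 2 (e), via rel. acc. = state * state %."""
    rel = state * state
    return 0 if rel < 9 else (1 if rel < 36 else 2)


def collapse(matrix):
    """Sum the 10x10 counts into a 3x3 matrix (observed x predicted)."""
    m3 = [[0] * 3 for _ in range(3)]
    for obs, row in enumerate(matrix):
        for prd, n in enumerate(row):
            m3[to_three(obs)][to_three(prd)] += n
    return m3


m3 = collapse(MATRIX)
total = sum(map(sum, m3))
q3 = 100.0 * sum(m3[k][k] for k in range(3)) / total
q_obs = [100.0 * m3[k][k] / sum(m3[k]) for k in range(3)]
q_prd = [100.0 * m3[k][k] / sum(row[k] for row in m3) for k in range(3)]
print(f"Q3 = {q3:.1f}%")  # Q3 = 57.5%
print("% of observed (b, i, e):", [f"{q:.1f}" for q in q_obs])
print("% of predicted (b, i, e):", [f"{q:.1f}" for q in q_prd])
```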
The term "observed" refers to the DSSP compilation of area of solvent accessibility calculated from 3D coordinates of experimentally determined structures (Dictionary of Secondary Structure of Proteins: Kabsch & Sander (1983) Biopolymers, 22, 2577-2637).

Accuracy for each amino acid:
.............................

+---+------------------------------+-----+-------+------+
|AA | Q3   b%o b%p i%o i%p e%o e%p | Q10 | corr  |  N   |
+---+------------------------------+-----+-------+------+
| A | 59.0  87  60   2  38  66  57 |  31 | 0.530 | 5054 |
| C | 62.0  91  67   5  39  25  21 |  34 | 0.244 |  893 |
| D | 56.5  21  45   6  49  94  57 |  20 | 0.321 | 3536 |
| E | 60.8   9  40   3  41  98  61 |  21 | 0.347 | 3743 |
| F | 63.3  94  67   9  46  29  37 |  27 | 0.366 | 2436 |
| G | 52.1  75  51   1  31  67  53 |  22 | 0.405 | 4787 |
| H | 50.9  63  53  23  45  71  50 |  18 | 0.442 | 1366 |
| I | 64.9  95  68   6  41  30  38 |  34 | 0.360 | 3437 |
| K | 66.6   2  11   2  37  98  67 |  23 | 0.267 | 3652 |
| L | 61.6  93  65   8  44  31  40 |  31 | 0.368 | 5016 |
| M | 60.1  92  64   5  39  45  44 |  29 | 0.452 | 1371 |
| N | 55.5  45  45   8  38  87  59 |  17 | 0.410 | 2923 |
| P | 53.0  48  48   9  39  83  56 |  18 | 0.364 | 2920 |
| Q | 54.3  27  44   7  44  92  56 |  20 | 0.344 | 2225 |
| R | 49.9  15  47  36  47  76  51 |  18 | 0.372 | 2765 |
| S | 55.6  69  53   3  51  81  56 |  22 | 0.464 | 3981 |
| T | 51.8  61  51   8  38  78  53 |  21 | 0.432 | 3740 |
| V | 61.1  93  65   5  40  39  42 |  34 | 0.418 | 4156 |
| W | 56.2  85  62  20  49  29  27 |  21 | 0.318 |  891 |
| Y | 49.7  73  52  33  49  36  38 |  19 | 0.359 | 2301 |
+---+------------------------------+-----+-------+------+

Abbreviations:
AA:            amino acid in one-letter code
b%o, i%o, e%o: = Qburied, Qintermediate, Qexposed (% of observed), i.e. percentage of correct prediction in each state, see above
b%p, i%p, e%p: = Qburied, Qintermediate, Qexposed (% of predicted), i.e. probability of correct prediction in each state, see above
Q10:           percentage of correctly predicted residues in each of the 10 states of predicted relative accessibility
corr:          correlation between predicted and observed rel. acc.
N:             number of residues in data set

Accuracy for different secondary structure:
...........................................

+--------+------------------------------+----+-------+-------+
| type   | Q3   b%o b%p i%o i%p e%o e%p |Q10 | corr  |   N   |
+--------+------------------------------+----+-------+-------+
| helix  | 59.5  79  64   8  44  80  56 | 27 | 0.574 | 20100 |
| strand | 61.3  84  73   9  46  69  37 | 35 | 0.524 | 13356 |
| loop   | 54.4  64  43  11  44  78  61 | 18 | 0.442 | 27968 |
+--------+------------------------------+----+-------+-------+

Abbreviations as before.

Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The network predicts the 10 states for relative accessibility using real numbers from the output units. The prediction is assigned by choosing the maximal unit ("winner takes all"). However, the real numbers contain additional information. E.g. the difference between the maximal and the second largest output unit (with the constraint that the second largest output is compiled among all units at least 2 positions off the maximal unit) can be used to derive a "reliability index". This index is given for each residue along with the prediction. The index is scaled to have values between 0 (lowest reliability) and 9 (highest). The accuracies (Q3, corr, etc.) to be expected for residues with values above a particular value of the index are given below, as well as the fraction of such residues (%res):

+---+------------------------------+----+-------+-------+
|RI | Q3   b%o b%p i%o i%p e%o e%p |Q10 | corr  | %res  |
+---+------------------------------+----+-------+-------+
| 0 | 57.5  77  60   9  44  78  56 | 24 | 0.535 | 100.0 |
| 1 | 59.1  76  63   9  45  82  57 | 25 | 0.560 |  91.2 |
| 2 | 61.7  79  66   4  47  87  58 | 27 | 0.594 |  77.1 |
| 3 | 66.6  87  70   1  51  89  63 | 30 | 0.650 |  57.1 |
| 4 | 70.0  89  72   0  83  91  67 | 32 | 0.686 |  45.8 |
| 5 | 72.9  92  75   0   0  93  70 | 34 | 0.722 |  35.6 |
| 6 | 76.3  95  77   0   0  93  75 | 36 | 0.769 |  24.7 |
| 7 | 79.0  97  79   0   0  93  78 | 39 | 0.803 |  16.0 |
| 8 | 80.9  98  80   0   0  91  81 | 43 | 0.824 |   9.6 |
| 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |
+---+------------------------------+----+-------+-------+

Abbreviations as before.

The above table gives the cumulative results, e.g. 45.8% of all residues have a reliability of at least 4. The correlation for this most reliably predicted (almost) half of the residues is 0.686, i.e. a value comparable to what could be expected if homology modelling were possible. For this subset of 45.8% of all residues, 89% of the buried residues are correctly predicted, and 72% of all residues predicted to be buried are correct.

..........................................................................

The following table gives the non-cumulative quantities, i.e. the values per reliability index range. These numbers answer the question: how reliable is the prediction for all residues labeled with the particular index i?
+---+------------------------------+----+-------+-------+
|RI | Q3   b%o b%p i%o i%p e%o e%p |Q10 | corr  | %res  |
+---+------------------------------+----+-------+-------+
| 0 | 40.9  79  40  16  41  21  40 | 14 | 0.175 |   8.8 |
| 1 | 45.4  61  46  28  44  48  44 | 17 | 0.278 |  14.1 |
| 2 | 47.4  53  52  10  46  80  44 | 19 | 0.343 |  19.9 |
| 3 | 52.9  75  59   4  50  77  47 | 23 | 0.439 |  11.4 |
| 4 | 60.0  81  63   0  83  84  56 | 25 | 0.547 |  10.1 |
| 5 | 65.2  82  70   0   0  93  62 | 28 | 0.607 |  10.9 |
| 6 | 71.3  90  72   0   0  94  70 | 31 | 0.692 |   8.8 |
| 7 | 76.0  94  76   0   0  95  75 | 34 | 0.762 |   6.3 |
| 8 | 80.5  97  81   0   0  94  79 | 39 | 0.808 |   3.8 |
| 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |
+---+------------------------------+----+-------+-------+

For example, for residues with RI = 4, 83% of all residues predicted to be intermediate are correctly predicted as such.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Prediction of helical transmembrane segments by PHDhtm:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Author:   Burkhard Rost
          EMBL, Heidelberg, FRG
          Meyerhofstrasse 1, 69 117 Heidelberg
          Internet: Rost@EMBL-Heidelberg.DE
All rights reserved.

About the network method
~~~~~~~~~~~~~~~~~~~~~~~~

The PHD mail server is described in:
   Rost, Burkhard; Sander, Chris; Schneider, Reinhard: PHD - an automatic mail server for protein secondary structure prediction. CABIOS, 1994, 10, 53-60.

To be quoted for publications of PHDhtm output:
   Rost, Burkhard; Casadio, Rita; Fariselli, Piero; Sander, Chris: Prediction of helical transmembrane segments at 95% accuracy. Protein Science, 1995, 4, 521-533.
Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A cross validation test on 69 helical trans-membrane proteins (in total about 30,000 residues) with less than 25% pairwise sequence identity gave the following results:

++================++-----------------------------------------+
|| Qtotal = 94.7% || ("overall two state accuracy")          |
++================++-----------------------------------------+

+----------------------------+-----------------------------+
| Qhelix (% of observed)=92% | Qhelix (% of predicted)=83% |
| Qloop  (% of observed)=96% | Qloop  (% of predicted)=97% |
+----------------------------+-----------------------------+

..........................................................................

These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                     number of correctly predicted residues
Qtotal =             --------------------------------------- (*100)
                     number of all residues

                     no of res correctly predicted to be in helix
Qhelix (% of obs)  = -------------------------------------------- (*100)
                     no of all res observed to be in helix

                     no of res correctly predicted to be in helix
Qhelix (% of pred) = -------------------------------------------- (*100)
                     no of all residues predicted to be in helix

..........................................................................

Further measures of performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Matthews correlation coefficient:

+---------------------------------------------+
| Chelix = 0.84, Cloop = 0.84                 |
+---------------------------------------------+

..........................................................................

Average length of predicted secondary structure segments:

            +------------+----------+
            |  predicted | observed |
+-----------+------------+----------+
| Lhelix =  |    24.6    |   22.2   |
+-----------+------------+----------+

..........................................................................
The accuracy matrix in detail:

+---------------------------------+
|  number of residues with H, L   |
+---------+------+-------+--------+
|         |net H | net L |sum obs |
+---------+------+-------+--------+
| obs H   | 5214 |   492 |  5706  |
| obs L   | 1050 | 22423 | 23473  |
+---------+------+-------+--------+
| sum Net | 6264 | 22915 | 29179  |
+---------+------+-------+--------+

Note: This table is to be read in the following manner: 5214 of all residues predicted to be in a helical trans-membrane region were observed to be in the lipid bilayer; 1050, however, were observed in loop (i.e. non-membrane) regions on either side of the membrane. The term "observed" refers to the DSSP assignment of secondary structure calculated from 3D coordinates of experimentally determined structures (Dictionary of Secondary Structure of Proteins: Kabsch & Sander (1983) Biopolymers, 22, 2577-2637) where these were available. For all other proteins, the assignment of trans-membrane segments has been taken from the Swissprot data bank (Bairoch, A.; Boeckmann, B.: The SWISS-PROT protein sequence data bank. Nucl. Acids Res. 20: 2019-2022, 1992).

..........................................................................

Overlap between predicted and observed segments:

+-----------------+---------------+----------------+
| segment overlap | % of observed | % of predicted |
| Sov helix       |     95.6%     |     95.5%      |
| Sov loop        |     83.6%     |     97.2%      |
+-----------------+---------------+----------------+
| Sov total       |     86.0%     |     96.8%      |
+-----------------+---------------+----------------+

Definition of Sov in: Rost et al., JMB, 1994, 235, 13-26.

As helical trans-membrane segments are longer than globular helices, correctly predicted segments can easily be made out. PHDhtm misses 5 out of 258 observed segments, predicts 6 where none is observed, and 3 times the predicted helical segment overlaps two observed regions. Thus, in total more than 95% of all segments are correctly predicted.
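The overall accuracies and Matthews coefficients quoted for PHDsec and PHDhtm can be recomputed from their accuracy matrices (rows: observed, columns: predicted). The sketch below uses the standard one-class-versus-rest definition of the Matthews correlation coefficient; that PHD used exactly this definition is an assumption, although the values it gives agree with the quoted ones to two decimals.

```python
import math


def q_total(m):
    """Percentage of residues on the diagonal (rows: observed, cols: predicted)."""
    return 100.0 * sum(m[k][k] for k in range(len(m))) / sum(map(sum, m))


def matthews(m, k):
    """Matthews correlation for class k, treating all other classes as 'not k'."""
    n = sum(map(sum, m))
    tp = m[k][k]
    fn = sum(m[k]) - tp                 # observed k, predicted something else
    fp = sum(row[k] for row in m) - tp  # predicted k, observed something else
    tn = n - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# PHDsec matrix (H, E, L) and PHDhtm matrix (H, L) from this report.
SEC = [[12447, 1255, 3990], [949, 7493, 3750], [2604, 2875, 19962]]
HTM = [[5214, 492], [1050, 22423]]

print(f"PHDsec Qtotal = {q_total(SEC):.1f}%")           # PHDsec Qtotal = 72.1%
print([round(matthews(SEC, k), 2) for k in range(3)])  # [0.63, 0.53, 0.52]
print(f"PHDhtm Qtotal = {q_total(HTM):.1f}%")           # PHDhtm Qtotal = 94.7%
print(round(matthews(HTM, 0), 2))                      # 0.84
```

For a two-state matrix the coefficient is the same for both classes, which is consistent with Chelix and Cloop both being 0.84.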
..........................................................................

Entropy of prediction (information measure):

+-----------------+
|     I = 0.64    |
+-----------------+
(For comparison: homology modelling of globular proteins in three states: I = 0.62.)

Definition of Sov in: Rost et al., JMB, 1994, 235, 13-26.

Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The network predicts two states, helical trans-membrane region and rest, using two output units. The prediction is assigned by choosing the maximal unit ("winner takes all"). However, the real numbers of the output units contain additional information. E.g. the difference between the two output units can be used to derive a "reliability index". This index is given for each residue along with the prediction. The index is scaled to have values between 0 (lowest reliability) and 9 (highest). The accuracies (Qtot) to be expected for residues with values above a particular value of the index are given below, as well as the fraction of such residues (%res):

+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
| %res |100.0| 98.8| 97.3| 95.9| 94.1| 92.3| 89.9| 86.2| 75.0| 66.8|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      |     |     |     |     |     |     |     |     |     |     |
| Qtot | 94.7| 95.2| 95.6| 96.2| 96.7| 97.2| 97.7| 98.4| 99.4| 99.8|
|      |     |     |     |     |     |     |     |     |     |     |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 91.8| 92.9| 93.8| 94.4| 95.0| 95.7| 96.2| 96.8| 95.5| 78.7|
| L%obs| 95.3| 95.7| 96.1| 96.6| 97.0| 97.5| 98.1| 98.8| 99.7|100.0|
|      |     |     |     |     |     |     |     |     |     |     |
| H%prd| 82.7| 83.8| 85.0| 86.7| 88.1| 89.7| 91.4| 93.8| 96.3| 97.1|
| L%prd| 97.9| 98.3| 98.5| 98.7| 98.8| 99.0| 99.2| 99.4| 99.7| 99.9|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

The above table gives the cumulative results, e.g. 92.3% of all residues have a reliability of at least 5.
The overall two-state accuracy for this subset is 97.2%. For this
subset, e.g., 95.7% of the observed helical trans-membrane residues are
correctly predicted, and 89.7% of all residues predicted to be in
helical trans-membrane segments are correct.

The resulting network (PHD) prediction is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________

PHD: Profile fed neural network systems from HeiDelberg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prediction of:
      secondary structure,                by PHDsec
      solvent accessibility,              by PHDacc
      and helical transmembrane regions,  by PHDhtm

Author:
      Burkhard Rost
      EMBL, 69012 Heidelberg, Germany
      Internet: Rost@EMBL-Heidelberg.DE

All rights reserved.

The network systems are described in:
      PHDsec: B Rost & C Sander: JMB, 1993, 232, 584-599.
              B Rost & C Sander: Proteins, 1994, 19, 55-72.
      PHDacc: B Rost & C Sander: Proteins, 1994, 20, 216-226.
      PHDhtm: B Rost et al.: Prot. Science, 1995, 4, 521-533.
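Returning to the position-specific reliability index described above: the following is a minimal sketch of how a two-state index and the cumulative %res/Qtot table could be computed. ASSUMPTION: the index is taken as 10 times the absolute difference of the two output units, truncated and clipped to 0-9; the output only says that this difference is used and scaled to 0-9, so PHD's exact formula may differ. The function names and the toy outputs are hypothetical.

```python
from typing import List, Tuple


def reliability(out_h: float, out_l: float) -> int:
    """Two-state reliability index in 0..9 (ASSUMED formula, see lead-in)."""
    return max(0, min(9, int(10 * abs(out_h - out_l))))


def cumulative_table(residues: List[Tuple[int, bool]]) -> List[Tuple[int, float, float]]:
    """Rows (index k, %res with RI >= k, Qtot on that subset).

    `residues` holds one (reliability index, prediction correct?) pair per
    residue; Qtot is reported as NaN when no residue reaches index k.
    """
    n = len(residues)
    rows = []
    for k in range(10):
        subset = [ok for ri, ok in residues if ri >= k]
        pct_res = 100.0 * len(subset) / n
        qtot = 100.0 * sum(subset) / len(subset) if subset else float("nan")
        rows.append((k, pct_res, qtot))
    return rows


# Made-up network outputs (out_H, out_L) and whether each call was right.
toy = [((0.93, 0.07), True), ((0.52, 0.48), False),
       ((0.75, 0.25), True), ((0.12, 0.88), True)]
residues = [(reliability(h, l), ok) for (h, l), ok in toy]
for k, pct_res, qtot in cumulative_table(residues):
    print(f"index {k}: %res {pct_res:5.1f}  Qtot {qtot:5.1f}")
```

Because the table is cumulative, %res can only fall as the index rises, while Qtot typically rises: the residues that remain are the ones the network separates most clearly.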
Some statistics
~~~~~~~~~~~~~~~

Percentage of amino acids:
+--------------+--------+--------+--------+--------+--------+
| AA:          |   L    |   A    |   S    |   G    |   I    |
| % of AA:     |  13.0  |  10.0  |   9.7  |   8.9  |   8.6  |
+--------------+--------+--------+--------+--------+--------+
| AA:          |   V    |   R    |   T    |   F    |   D    |
| % of AA:     |   7.8  |   5.2  |   4.5  |   4.5  |   4.5  |
+--------------+--------+--------+--------+--------+--------+
| AA:          |   N    |   Q    |   E    |   P    |   K    |
| % of AA:     |   4.1  |   3.0  |   3.0  |   2.6  |   2.6  |
+--------------+--------+--------+--------+--------+--------+
| AA:          |   Y    |   M    |   W    |   H    |   C    |
| % of AA:     |   1.9  |   1.9  |   1.5  |   1.5  |   1.5  |
+--------------+--------+--------+--------+--------+--------+

Percentage of secondary structure predicted:
+--------------+--------+--------+--------+
| SecStr:      |   H    |   E    |   L    |
| % Predicted: |  43.9  |  16.7  |  39.4  |
+--------------+--------+--------+--------+

According to the following classes:
      all-alpha:   %H>45 and %E< 5
      all-beta:    %H< 5 and %E>45
      alpha-beta:  %H>30 and %E>20
      mixed:       rest
this means that the predicted class is: mixed class

PHD output for your protein
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Tue Nov 24 17:44:57 1998
Jury on: 10 different architectures (version 5.94_317).
Note: differently trained architectures, i.e., different versions, can
result in different predictions.
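The composition table and the class rules above can be reproduced in a few lines. The sketch assumes the query is exactly the 269-residue sequence printed in the AA rows of the output; the helper names are illustrative. Note that the class is assigned from the predicted %H and %E, not from the sequence itself.

```python
from collections import Counter

# Query sequence as printed in the AA rows of the PHD output (269 residues).
SEQUENCE = (
    "MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALGFNYPLERNQTLVQDNVKVSLAFGLSI"
    "ATLAQSVGHISGAHSNPAVTLGLLLSCQISILRAVMYIIAQCVGAIVASAILSGITSSLL"
    "ENSLGRNDLARGVNSGQGLGIEIIGTLQLVLCVLATTDRRRRDLGGSAPLAIGLSVALGH"
    "LLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVGPFIGSALAVLIYDFILAPRSSDFTD"
    "RMKVWTSGQVEEYDLDADDINSRVEMKPK"
)


def composition(seq: str) -> dict:
    """Percentage of each amino acid, most frequent first."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.most_common()}


def structural_class(pct_h: float, pct_e: float) -> str:
    """Class rules as quoted in the PHD output."""
    if pct_h > 45 and pct_e < 5:
        return "all-alpha"
    if pct_h < 5 and pct_e > 45:
        return "all-beta"
    if pct_h > 30 and pct_e > 20:
        return "alpha-beta"
    return "mixed"


comp = composition(SEQUENCE)
print({aa: round(p, 1) for aa, p in list(comp.items())[:5]})
print(structural_class(43.9, 16.7))  # predicted %H and %E from the output
```

With 43.9% H and 16.7% E none of the three specific rules fires (E is below 20% for alpha-beta), which is why the output reports the mixed class.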
About the protein
~~~~~~~~~~~~~~~~~

HEADER     /home/phd/server/work/predict_h25873-220
COMPND
SOURCE
AUTHOR
SEQLENGTH  269
NCHAIN     1 chain(s) in predict_h25873-22040 data set
NALIGN     48
           (=number of aligned sequences in HSSP file)

Abbreviations: PHDsec
~~~~~~~~~~~~~~~~~~~~~

  sequence:
    AA    : amino acid sequence
  secondary structure:
    HEL   : H=helix, E=extended (sheet), blank=other (loop)
    PHD   : Profile network prediction HeiDelberg
    Rel   : Reliability index of prediction (0-9)
  detail:
    prH   : 'probability' for assigning helix
    prE   : 'probability' for assigning strand
    prL   : 'probability' for assigning loop
            note: the 'probabilities' are scaled to the interval 0-9,
            e.g., prH=5 means that the first output node is 0.5-0.6
  subset:
    SUB   : a subset of the prediction, for all residues with an
            expected average accuracy > 82% (tables in header)
            note: for this subset the following symbols are used:
            L: is loop (for which above " " is used)
            ".": means that no prediction is made for this residue,
                 as the reliability is: Rel < 5

Abbreviations: PHDacc
~~~~~~~~~~~~~~~~~~~~~

    SS    : secondary structure
            HEL: H=helix, E=extended (sheet), blank=other (loop)
  solvent accessibility:
    3st   : relative solvent accessibility (acc) in 3 states:
            b = 0-9%, i = 9-36%, e = 36-100%.
    PHD   : Profile network prediction HeiDelberg
    Rel   : Reliability index of prediction (0-9)
    O_3   : observed relative acc. in 3 states: B, I, E
            note: for convenience a blank is used for intermediate (i).
    P_3   : predicted relative accessibility in 3 states
    10st  : relative accessibility in 10 states:
            = n corresponds to a relative acc.
            of n*n %
  subset:
    SUB   : a subset of the prediction, for all residues with an
            expected average correlation > 0.69 (tables in header)
            note: for this subset the following symbols are used:
            "I": is intermediate (for which above " " is used)
            ".": means that no prediction is made for this residue,
                 as the reliability is: Rel < 4

Abbreviations: PHDhtm
~~~~~~~~~~~~~~~~~~~~~

  secondary structure:
    HL    : T=helical transmembrane region, blank=other (loop)
    PHD   : Profile network prediction HeiDelberg
    PHDF  : filtered prediction, i.e., too long transmembrane segments
            are split, too short ones are deleted
    Rel   : Reliability index of prediction (0-9)
  detail:
    prH   : 'probability' for assigning helical transmembrane region
    prL   : 'probability' for assigning loop
            note: the 'probabilities' are scaled to the interval 0-9,
            e.g., prH=5 means that the first output node is 0.5-0.6
  subset:
    SUB   : a subset of the prediction, for all residues with an
            expected average accuracy > 82% (tables in header)
            note: for this subset the following symbols are used:
            L: is loop (for which above " " is used)
            ".": means that no prediction is made for this residue,
                 as the reliability is: Rel < 5

protein: predict length 269
....,....1....,....2....,....3....,....4....,....5....,....6 AA |MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALGFNYPLERNQTLVQDNVKVSLAFGLSI| PHD sec | HHHHHHHHHHHHHHHHHHHHHHHHHHEE HHHHHHHHHHHHH| Rel sec |998443148899999999999998997676530312469989998623353579999999| detail: prH sec |001223468899999999999998888777653112210000000145566788999999| prE sec |000011000000000000000001001111233542100000000000323211000000| prL sec |998665420100000000000000000011112244578988998753100000000000| subset: SUB sec |LLL.....HHHHHHHHHHHHHHHHHHHHHHH......LLLLLLLLL...H.HHHHHHHHH| ACCESSIBILITY 3st: P_3 acc |eeeebee bbb bbbbbbbbbbbbbbbbbbbbbebeee eeeeeeeeebbbbbbbbbbbb| 10st: PHD acc |997706650005000000000000000000000607775779776677000000000000| Rel acc |735421110541467608662789996343122133420454330023453975664547| subset: SUB acc
|e.ee.....bb.bbbb.bbb.bbbbbb.b.......e..eee......bb.bbbbbbbbb| ....,....7....,....8....,....9....,....10...,....11...,....12 AA |ATLAQSVGHISGAHSNPAVTLGLLLSCQISILRAVMYIIAQCVGAIVASAILSGITSSLL| PHD sec |HHHHHHHHHE HHHHEHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH | Rel sec |999996412122653167703135552356779999999999999999999998467213| detail: prH sec |998986544334223477843456665567779999999999999999999998611343| prE sec |001001123420010000145432101221110000000000000000000000000000| prL sec |000001232245765521000000123210000000000000000000000000278555| subset: SUB sec |HHHHHH......LL..HHH....HHH..HHHHHHHHHHHHHHHHHHHHHHHHHH.LL...| ACCESSIBILITY 3st: P_3 acc |bbbbebbbebbbbbb bbbbbbbbbbbebbbbbbbbbbbbbbbbbbbbbbbbeebbeeeb| 10st: PHD acc |000060006000000500000000000600000000000000000000000067006760| Rel acc |456515321655013144869663400154551757478936465465467713401400| subset: SUB acc |bbbb.b...bbb....bbbbbbb.b...bbbb.bbbbbbb.bbbbbbbbbbb..b..e..| ....,....13...,....14...,....15...,....16...,....17...,....18 AA |ENSLGRNDLARGVNSGQGLGIEIIGTLQLVLCVLATTDRRRRDLGGSAPLAIGLSVALGH| PHD sec | HHH EEEEEEEEEEEEEEEEEEE E E HHHHHH| Rel sec |359985212134223651899898866789799875436658889963211351457756| detail: prH sec |320002345432332111000000000000100000221120000000001113567767| prE sec |100000000000011014899888877789789886100000000013544222221111| prL sec |568986543466545763100000011100000112567768889975454564210111| subset: SUB sec |.LLLLL.........LL.EEEEEEEEEEEEEEEEEE..LLLLLLLLL.....L..HHHHH| ACCESSIBILITY 3st: P_3 acc |eeebbbebbbeebeebeebbbbbbbbbbbbbbbbbbbeeeeeeeebbbbbbbbbbbbbbb| 10st: PHD acc |677000600077076077000000000000000000077767767000000000000000| Rel acc |133100124043040233247198656399879530035414413123255869586654| subset: SUB acc |........b.e..e.....bb.bbbbb.bbbbbb....ee.ee......bbbbbbbbbbb| ....,....19...,....20...,....21...,....22...,....23...,....24 AA |LLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVGPFIGSALAVLIYDFILAPRSSDFTD| PHD sec |HEEEE E HHHEEEE EEEEEE HHHHHHHHHHHHHEEEEE | Rel sec 
|321341126989622145152653534229996251699999999973147525556642| detail: prH sec |521100000000145432463121122000000114789999999875421111121124| prE sec |244564431000000000015765121358997510000000000013467642110000| prL sec |233234457889754567411012655530002364200000000010010136667765| subset: SUB sec |........LLLLL....H.H.EE.L....EEEE.L.HHHHHHHHHHH...EE.LLLLL..| ACCESSIBILITY 3st: P_3 acc |bbbbebbbbbbebb bbbbbbbbeebeebbbbbbbbbbbbbbbbbbbbbbbbeeeee ee| 10st: PHD acc |000060000006005000000007606600000000000000000000000076777577| Rel acc |754424240102242141047612131118967874356346635751777031345044| subset: SUB acc |bbbb.b.b.....b..b..bbb.......bbbbbbb.bb.bbb.bbb.bbb....ee.ee| ....,....25...,....26...,....27...,....28...,....29...,....30 AA |RMKVWTSGQVEEYDLDADDINSRVEMKPK| PHD sec |HHHHHH | Rel sec |66775259975467555457776422699| detail: prH sec |77887520012221222221111100000| prE sec |00000000000000000000001233200| prL sec |11112379987678777678887655799| subset: SUB sec |HHHHH.LLLLL.LLLLL.LLLLL...LLL| ACCESSIBILITY 3st: P_3 acc |ebebbeeeeeeeeeeeeeeeeeebeeeee| 10st: PHD acc |60700787677777677777767067789| Rel acc |10411563134335144444514212559| subset: SUB acc |..e..ee...e..e.eeeeee.e...eee|

PHDhtm Helical transmembrane prediction
  note: PHDacc and PHDsec are reliable for water-soluble globular
        proteins only. Thus, please take the predictions above with
        particular caution wherever transmembrane helices are
        predicted by PHDhtm!
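The PHDacc rows above encode relative accessibility twice: as a 10-state digit ("10st: PHD acc") and as the 3-state letters b/i/e ("P_3 acc", with a blank for i). The sketch below shows how the two scales relate. ASSUMPTION: state n covers relative accessibilities from n*n % up to (n+1)*(n+1) %, capped at 9; the legend only says that "n corresponds to a relative acc. of n*n %". The function names are hypothetical.

```python
import math


def acc_to_10state(rel_acc: float) -> int:
    """Relative accessibility (%) -> 10-state value n (ASSUMED bins [n*n, (n+1)^2))."""
    return min(9, int(math.sqrt(max(rel_acc, 0.0))))


def acc_to_3state(rel_acc: float) -> str:
    """Relative accessibility (%) -> b (0-9 %), i (9-36 %) or e (36-100 %)."""
    if rel_acc < 9:
        return "b"
    if rel_acc < 36:
        return "i"
    return "e"


def decode_10state_row(row: str) -> str:
    """Turn a '10st: PHD acc' digit row into the 'P_3 acc' alphabet.

    Each digit n is mapped via its lower bin edge n*n %; intermediate
    residues are shown as a blank, as in the PHD output.
    """
    states = (acc_to_3state(int(ch) ** 2) for ch in row)
    return "".join(" " if s == "i" else s for s in states)


print(repr(decode_10state_row("997706650005")))  # 'eeeebee bbb '
```

Applied to the first twelve digits of the first "10st: PHD acc" row, this reproduces the start of the corresponding "P_3 acc" row, including the blanks for the two residues predicted as intermediate (digit 5, i.e., 25%).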
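The PHDhtm output that follows reports, besides the raw network states, a filtered line (PHDFhtm: too long segments split, too short ones deleted) and a topology line (PHDThtm: i/T/o). The sketch below illustrates both ideas under loudly stated assumptions: the length cut-offs `MIN_HTM` and `MAX_HTM`, the split-in-the-middle rule, and the caller-supplied side of the N-terminus are all hypothetical, and PHD's own refinement step (PHDRhtm) is not reproduced.

```python
from typing import List, Tuple

# HYPOTHETICAL length limits: the PHD output only says that "too long"
# segments are split and "too short" ones deleted, not where the cut-offs lie.
MIN_HTM = 12
MAX_HTM = 30


def segments(states: str, symbol: str = "T") -> List[Tuple[int, int]]:
    """Maximal runs of `symbol` as half-open (start, end) intervals."""
    runs, start = [], None
    for i, s in enumerate(states + " "):  # sentinel closes a trailing run
        if s == symbol and start is None:
            start = i
        elif s != symbol and start is not None:
            runs.append((start, i))
            start = None
    return runs


def filter_htm(states: str, min_len: int = MIN_HTM, max_len: int = MAX_HTM) -> str:
    """Delete too-short HTMs and split too-long ones in the middle."""
    out = [" "] * len(states)
    for start, end in segments(states):
        length = end - start
        if length < min_len:
            continue  # too short: treated as loop
        if length > max_len:
            mid = start + length // 2
            pieces = [(start, mid), (mid + 1, end)]  # one loop residue between halves
        else:
            pieces = [(start, end)]
        for s, e in pieces:
            out[s:e] = "T" * (e - s)
    return "".join(out)


def topology(states: str, n_term_inside: bool = True) -> str:
    """Label non-HTM residues i/o, flipping the side after every HTM.

    ASSUMPTION: the side of the N-terminus is supplied by the caller; how
    PHDhtm decides it is not reproduced here.
    """
    inside, after_htm, out = n_term_inside, False, []
    for s in states:
        if s == "T":
            out.append("T")
            after_htm = True
        else:
            if after_htm:
                inside = not inside  # crossing the membrane changes sides
                after_htm = False
            out.append("i" if inside else "o")
    return "".join(out)


# Made-up raw prediction: one 8-residue and one 6-residue segment.
raw = "." * 4 + "T" * 8 + "." * 3 + "T" * 6 + "." * 2
filtered = filter_htm(raw, min_len=7, max_len=30)
print(repr(filtered))
print(topology(filtered))
```

Here the 6-residue segment falls below the (hypothetical) minimum and is removed, so every residue after the remaining helix is labelled 'o'. Alternating i and o between helices matches how the PHDThtm lines read, e.g. the switch from 'i' to 'o' around each run of 'T'.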
PHDhtm
---
--- PhdTopology REFINEMENT AND TOPOLOGY PREDICTION: SYMBOLS
--- AA      : amino acid in one-letter code
--- PHD htm : HTMs predicted by the PHD neural network
---           system (T=HTM, ' '=not HTM)
--- Rel htm : Reliability index of prediction (0-9, 0 is low)
--- detail  : Neural network output in detail
--- prH htm : 'Probability' for assigning a helical
---           transmembrane region (HTM)
--- prL htm : 'Probability' for assigning a non-HTM region
---           note: 'Probabilities' are scaled to the interval
---                 0-9, e.g., prH=5 means that the first
---                 output node is 0.5-0.6
--- subset  : Subset of more reliable predictions
--- SUB htm : All residues for which the expected average
---           accuracy is > 82% (tables in header).
---           note: for this subset the following symbols are used:
---             L:   is loop (for which above ' ' is used)
---             '.': means that no prediction is made for this
---                  residue, as the reliability is: Rel < 5
--- other   : predictions derived based on PHDhtm
--- PHDFhtm : filtered prediction, i.e., too long HTMs are
---           split, too short ones are deleted
--- PHDRhtm : refinement of neural network output
--- PHDThtm : topology prediction based on refined model
---           symbols used:
---             i: intra-cytoplasmic
---             T: transmembrane region
---             o: extra-cytoplasmic
---
--- PhdTopology REFINEMENT AND TOPOLOGY PREDICTION
....,....1....,....2....,....3....,....4....,....5....,....6 AA |MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALGFNYPLERNQTLVQDNVKVSLAFGLSI| PHD htm | TTTTTTTTTTTTTTTTTTT TTTTTTTTTTTT| detail: | | prH htm |000000000001136788999999999988875321110000000123678889999988| prL htm |999999999998863211000000000011124678889999999876321110000011| other: | | PHDFhtm | TTTTTTTTTTTTTTTTTTT TTTTTTTTTTT| PHDRhtm | TTTTTTTTTTTTTTTTTT TTTTTTTTTTT| PHDThtm |iiiiiiiiiiiiiiTTTTTTTTTTTTTTTTTToooooooooooooooooTTTTTTTTTTT| subset: | | SUB htm |............................................................| ....,....7....,....8....,....9....,....10...,....11...,....12 AA
|ATLAQSVGHISGAHSNPAVTLGLLLSCQISILRAVMYIIAQCVGAIVASAILSGITSSLL| PHD htm |TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT | detail: | | prH htm |888888877777666677788888888888888888888888888888888876543211| prL htm |111111122222333322211111111111111111111111111111111123456788| other: | | PHDFhtm |TTTTTTTTTTTTTTTT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT | PHDRhtm |TTTTTTTT TTTTTTTTTTTTTTTTTTTTTTTTT | PHDThtm |TTTTTTTTiiiiiiiiiiiiiTTTTTTTTTTTTTTTTTTTTTTTTToooooooooooooo| subset: | | SUB htm |............................................................| ....,....13...,....14...,....15...,....16...,....17...,....18 AA |ENSLGRNDLARGVNSGQGLGIEIIGTLQLVLCVLATTDRRRRDLGGSAPLAIGLSVALGH| PHD htm | TTTTTTTTTTTTTTTTTTT TTTTTTTTTTTTT| detail: | | prH htm |000000000001234567788888999988887643211111111235788899998888| prL htm |999999999998765432211111000011112356788888888764211100001111| other: | | PHDFhtm | TTTTTTTTTTTTTTTTTTT TTTTTTTTTTTTT| PHDRhtm | TTTTTTTTTTTTTTTTTT TTTTTTTTTTTT| PHDThtm |ooooooooooooooooTTTTTTTTTTTTTTTTTTiiiiiiiiiiiiiiTTTTTTTTTTTT| subset: | | SUB htm |............................................................| ....,....19...,....20...,....21...,....22...,....23...,....24 AA |LLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVGPFIGSALAVLIYDFILAPRSSDFTD| PHD htm |TTTTTTTTT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT | detail: | | prH htm |888887765443432233334566777777788888888888888888887542100000| prL htm |111112234556567766665433222222211111111111111111112457899999| other: | | PHDFhtm |TTTTTTTTT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT | PHDRhtm |TTTTTT TTTTTTTTTTTTTTTTTTT | PHDThtm |TTTTTToooooooooooooooooooooooooTTTTTTTTTTTTTTTTTTTiiiiiiiiii| subset: | | SUB htm |............................................................| ....,....25...,....26...,....27...,....28...,....29...,....30 AA |RMKVWTSGQVEEYDLDADDINSRVEMKPK| PHD htm | | detail: | | prH htm |00000000000000000000000000000| prL htm |99999999999999999999999999999| other: | | PHDFhtm | | PHDRhtm | | PHDThtm 
|iiiiiiiiiiiiiiiiiiiiiiiiiiiii| subset: | | SUB htm |.............................|
---
--- PhdTopology REFINEMENT AND TOPOLOGY PREDICTION END
---
________________________________________________________________________________
________________________________________________________________________________

-----------------------------------------------------------------------------
--- PredictProtein: NEWS from January, 1997                               ---
---                                                                       ---
--- Dear user,                                                            ---
---                                                                       ---
--- as of January 1, 1997, EMBL has effectively decided not to support    ---
--- the PredictProtein service with personnel resources. I do maintain    ---
--- the program, so to speak, in my private time. However, my contract    ---
--- obliges me to do science instead. Unfortunately, the computer         ---
--- environment at EMBL is at the same time becoming increasingly         ---
--- unstable. As a consequence of these two recent developments, the      ---
--- PredictProtein service is not as stable as it was.                    ---
---                                                                       ---
--- I apologise for the problems this may cause. In particular, I         ---
--- apologise for my inability to reply to the 20-30 personal mails       ---
--- I receive daily, and suggest re-submitting requests after 24 hours!   ---
---                                                                       ---
--- Hoping that I shall find a more convenient solution for the future    ---
--- of PredictProtein, I remain with my best regards,                     ---
---                                                                       ---
--- Burkhard Rost                                                         ---
-----------------------------------------------------------------------------
--- PredictProtein: NEWS from April, 1998                                 ---
---                                                                       ---
--- --------------------------------                                      ---
--- MOVING PredictProtein                                                 ---
--- There appears to be light on the horizon! PP may be having many       ---
--- hiccups over the next months (as I shall leave EMBL). However, the    ---
--- server seems to have a fair chance of survival thanks to major        ---
--- support that is being raised by Columbia University (New York,        ---
--- U.S.A.). I hope that this will settle the issue for the years to      ---
--- come ...                                                              ---
--- --------------------------------                                      ---
--- WARNING                                                               ---
--- After a recent major rewrite of most of the PP code, I am afraid      ---
--- that not all errors have been traced by me yet. Thus, please have     ---
--- mercy and report any bug you encounter!                               ---
--- THANKS, Burkhard Rost                                                 ---
--- --------------------------------                                      ---
--- NEW PREDICTION DEFAULTS                                               ---
--- * Coiled-coil regions: now by default the program COILS written by    ---
---   Andrei Lupas is run on your sequence. An output is returned if a    ---
---   coiled-coil region has been detected.                               ---
--- * Functional sequence motifs: now by default the PROSITE database     ---
---   written by Amos Bairoch, Philip Bucher and Kay Hofmann is scanned   ---
---   for sequence motifs. An output is returned if any motif has been    ---
---   detected.                                                           ---
--- --------------------------------                                      ---
--- see http://www.embl-heidelberg.de/predictprotein/ppNews.html          ---
--- for a description of the following new options.                       ---
--- NEW INPUT OPTION                                                      ---
--- * Your input sequence(s) in FASTA-list format ("# FASTA list ")       ---
--- NEW OUTPUT OPTIONS                                                    ---
--- * Return also BLASTP output ("return blast")                          ---
--- * Return prediction additionally in RDB format ("return phd rdb")     ---
--- * Return topits hssp ("return topits hssp")                           ---
--- * Return topits strip ("return topits strip")                         ---
--- * Return topits own ("return topits own")                             ---
--- * Return no coils ("return no coils")                                 ---
--- * Return no prosite ("return no prosite")                             ---
-----------------------------------------------------------------------------