C7orf43
C7orf43 | | |
---|---|---|
Identifiers | | |
Aliases | C7orf43 | |
External IDs | MGI: 2385896; HomoloGene: 10106; GeneCards: C7orf43 | |
Orthologs | | |
Species | Human | Mouse |
Location (UCSC) | Chr 7: 100.15 – 100.16 Mb | Chr 5: 138.26 – 138.26 Mb |
PubMed search | [1] | [2] |
C7orf43 (Chromosome 7 Open reading frame 43) is a protein that in humans is encoded by the gene C7orf43.[3] C7orf43 has no other human alias, but in mice it is known as BC037034.[4]
Gene Locus
In humans, C7orf43 is located on the long arm of chromosome 7 (7q22.1), on the negative (antisense) strand.[3] Neighboring genes include GAL3ST4, LAMTOR4, and GPC2.[3] C7orf43 has 9 detected common single-nucleotide polymorphisms (SNPs) in humans, all of which are located in non-coding regions and thus do not affect the amino acid sequence.[5]
mRNA
Splice variants
C7orf43 encodes 2 isoforms. The longer, C7orf43 isoform 1, is 2585 base pairs long and has 11 exons and 10 introns.[3] It has a single polyadenylation site and encodes a protein of 580 amino acids.[3] C7orf43 isoform 2 is 2085 base pairs long and encodes a protein of 311 amino acids. Two additional isoforms have been reported on several occasions, encoding proteins of 199 and 206 amino acids.[6]
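The relationship between the transcript and protein lengths quoted above can be checked with simple arithmetic. The sketch below assumes that the stated base-pair figures are mature transcript lengths and that each coding sequence ends in a single stop codon; the function names are illustrative and do not come from any cited tool.

```python
def cds_length_nt(protein_length_aa: int) -> int:
    """Nucleotides needed to encode a protein: 3 per residue plus one stop codon."""
    return 3 * protein_length_aa + 3


def untranslated_nt(mrna_length_nt: int, protein_length_aa: int) -> int:
    """Transcript nucleotides left over once the coding sequence is accounted for."""
    return mrna_length_nt - cds_length_nt(protein_length_aa)


# (transcript length in nt, protein length in aa), as stated in the text
isoforms = {"isoform 1": (2585, 580), "isoform 2": (2085, 311)}

for name, (mrna_nt, protein_aa) in isoforms.items():
    print(name, cds_length_nt(protein_aa), untranslated_nt(mrna_nt, protein_aa))
```

Under these assumptions, the coding sequence takes up 1743 of the 2585 nucleotides of isoform 1 and 936 of the 2085 nucleotides of isoform 2; the remainder would be untranslated sequence.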
Tissue expression
C7orf43 is widely expressed at moderate levels, with tissue-to-tissue variability, in humans and across mammalian species.[7][8] The mouse C7orf43 ortholog has been shown to be ubiquitously expressed in the brain[9] and in the embryonic central nervous system.[10]
Regulation
C7orf43 has one promoter region upstream of its transcription start site, as predicted by Genomatix. This promoter is 657 base pairs long and is located at positions 99756182 to 99756838 on the negative strand of chromosome 7.[11] There are several transcription factor binding sites located in this promoter, including binding sites for zinc fingers and Kruppel-like transcription factors.[12] The top 20 transcription factor binding sites predicted by ElDorado from Genomatix are listed in the following table.
Detailed Family Information | Detailed Matrix Information | Start Position | End Position | Anchor Position | Strand | Matrix Similarity Score | Sequence |
---|---|---|---|---|---|---|---|
Brachyury gene, mesoderm developmental factor | T-box transcription factor TBX20 | 617 | 645 | 631 | + | 1 | agcagccggAGGTgtcgggaccctctgga |
C2H2 zinc finger transcription factors 2 | KRAB-containing zinc finger protein 300 | 596 | 618 | 607 | + | 1 | ccggccgCCCCagccgggcgcag |
Fork head domain factors | Alternative splicing variant of FOXP1, activated in ESCs | 37 | 53 | 45 | - | 1 | aaaaaaaAACAaccctt |
Pleomorphic adenoma gene | Pleomorphic adenoma gene 1 | 411 | 433 | 422 | - | 1 | gaGGGGgcggggtcccgctgctc |
Pleomorphic adenoma gene | Pleomorphic adenoma gene 1 | 464 | 486 | 475 | - | 1 | gaGGGGgcgtggccgccgaggcc |
RNA polymerase II transcription factor II B | Transcription factor II B (TFIIB) recognition element | 197 | 203 | 200 | + | 1 | ccgCGCC |
TGF-beta induced apoptosis proteins | Cysteine-serine-rich nuclear protein 1 (AXUD1, AXIN1 up-regulated 1) | 73 | 79 | 76 | - | 1 | AGAGtga |
GC-Box factors SP1/GC | Stimulating protein 1, ubiquitous zinc finger transcription factor | 418 | 434 | 426 | - | 0.998 | ggaggGGGCggggtccc |
Human and murine ETS1 factors | Ets variant 3 | 486 | 506 | 496 | - | 0.996 | gagaaacaGGAAgcggaaggg |
Krueppel like transcription factors | Gut-enriched Krueppel-like factor / KLF4 | 469 | 485 | 477 | - | 0.994 | agggggcGTGGccgccg |
Two-handed zinc finger homeodomain transcription factors | AREB6 (Atp1a1 regulatory element binding factor 6) | 495 | 507 | 501 | + | 0.994 | ttcctGTTTctct |
Zinc finger transcription factor RU49, zinc finger proliferation 1 - Zipro1 | Zinc finger transcription factor RU49 (zinc finger proliferation 1 - Zipro 1). RU49 exhibits a strong preference for binding to tandem repeats of the minimal RU49 consensus binding site. | 522 | 528 | 525 | + | 0.994 | cAGTAcc |
Krueppel like transcription factors | Core promoter-binding protein (CPBP) with 3 Krueppel-type zinc fingers (KLF6, ZF9) | 418 | 434 | 426 | - | 0.992 | ggagGGGGcggggtccc |
C2H2 zinc finger transcription factors 7 | Zinc finger protein 263, ZKSCAN12 (zinc finger protein with KRAB and SCAN domains 12) | 425 | 439 | 432 | + | 0.99 | cgccccCTCCtccac |
C2H2 zinc finger transcription factors 6 | Zinc finger and BTB domain containing 7, Proto-oncogene FBI-1, Pokémon (secondary DNA binding preference) | 252 | 264 | 258 | - | 0.989 | caaGACCaccctg |
Krueppel like transcription factors | Kruppel-like factor 7 (ubiquitous, UKLF) | 416 | 432 | 424 | - | 0.989 | agggGGCGgggtcccgc |
GC-Box factors SP1/GC | Sp4 transcription factor | 471 | 487 | 479 | - | 0.986 | ggagggGGCGtggccgc |
Krueppel like transcription factors | Gut-enriched Krueppel-like factor | 137 | 153 | 145 | + | 0.986 | gggctcAAAGgatcctc |
Krueppel like transcription factors | Krueppel-like factor 2 (lung) (LKLF) | 641 | 657 | 649 | - | 0.986 | cgctaGGGTgggtccag |
Human and murine ETS1 factors | Ets variant 1 | 6 | 26 | 16 | - | 0.984 | ttctcccaGGAAgattctcca |
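The coordinates in this section follow a simple pattern that can be verified directly. The sketch below rests on two assumptions: positions are treated as 1-based and inclusive, and the anchor is read as the midpoint of each site. Both are inferred from the numbers in the table rather than taken from Genomatix documentation. Only four rows are transcribed, and the short labels are abbreviations chosen here.

```python
PROMOTER_START = 99_756_182  # chromosome 7, negative strand (from the text)
PROMOTER_END = 99_756_838


def span_length(start: int, end: int) -> int:
    """Length of an inclusive coordinate interval."""
    return end - start + 1


# (label, start, end, anchor, sequence) transcribed from the table above
sites = [
    ("TBX20", 617, 645, 631, "agcagccggAGGTgtcgggaccctctgga"),
    ("TFIIB recognition element", 197, 203, 200, "ccgCGCC"),
    ("Ets variant 3", 486, 506, 496, "gagaaacaGGAAgcggaaggg"),
    ("KLF4", 469, 485, 477, "agggggcGTGGccgccg"),
]


def site_is_consistent(site) -> bool:
    """True if the sequence length matches the span and the anchor is its midpoint."""
    _, start, end, anchor, sequence = site
    return len(sequence) == span_length(start, end) and anchor == (start + end) // 2


print(span_length(PROMOTER_START, PROMOTER_END))  # promoter length in base pairs
print(all(site_is_consistent(s) for s in sites))
```

The same check can be extended to the remaining rows by adding them to `sites`.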
Protein
Composition and Domains
The human protein C7orf43 has an isoelectric point of 8.94. C7orf43 also has a glycine-rich region spanning amino acids 54 through 134.[13] Analysis using the SAPS tool from the SDSC Biology Workbench showed that the positions of individual glycine residues in this region are not conserved, but that its overall glycine content is well conserved in mammals and reptiles, although not in bony fishes.[14][15] C7orf43 is mostly uncharged, and this neutral charge distribution is conserved in mammals and reptiles, whereas bony fishes have at least one negative charge cluster.[14][15] C7orf43 is predicted to have no signal peptide in its first 70 amino acid residues. However, it is predicted to have a vacuolar targeting motif starting at residue 258 in the human protein.[16] This vacuolar targeting motif is conserved throughout mammals, reptiles, birds, amphibians, and bony fishes.
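Conservation of overall glycine content, as opposed to conservation of individual glycine positions, can be illustrated by computing the fraction of glycine within a residue window. The following is a minimal sketch and not the SAPS algorithm. The sequence is a short hypothetical string used only for demonstration; for the human protein, the window would be residues 54 through 134 of the real sequence.

```python
def residue_fraction(sequence: str, residue: str, start: int, end: int) -> float:
    """Fraction of `residue` in the 1-based, inclusive window [start, end]."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("window must lie inside the sequence")
    window = sequence[start - 1:end].upper()
    return window.count(residue.upper()) / len(window)


# Hypothetical toy sequence, NOT the C7orf43 sequence
toy_sequence = "MSAAGGSGGAGLPKK"

# Residues 5..11 of the toy sequence are G G S G G A G
print(residue_fraction(toy_sequence, "G", 5, 11))
```

Two orthologs could share a similar glycine fraction over the aligned region even if the glycines fall at different positions, which is the distinction drawn in the paragraph above.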
Evolutionary history
The C7orf43 protein has no paralogs in humans. However, C7orf43 orthologs are highly conserved in mammals, reptiles, and several species of bony fishes. C7orf43 is also conserved in birds, although several bird species lack parts of the N-terminus.[17] No C7orf43 orthologs can be found outside the animal kingdom.[17] The following tables list representative C7orf43 orthologs across multiple animal classes.
Strict orthologs
No. | Species | Common Name | Date of Divergence (MYA) | Accession No. | E-value | Length (aa) | Identity (%) | Similarity (%) |
---|---|---|---|---|---|---|---|---|
1 | Homo sapiens | Human | - | NP_060745.3 | 0.0 | 580 | 100 | 100 |
2 | Pan troglodytes | Common Chimpanzee | 6.3 | XP_009452032 | 0.0 | 580 | 99 | 100 |
3 | Macaca mulatta | Macaque | 29.0 | XP_001102238 | 0.0 | 580 | 99 | 99 |
4 | Cavia porcellus | Guinea pig | 92.3 | XP_003470051 | 0.0 | 580 | 98 | 98 |
5 | Sus scrofa | Wild boar | 94.2 | XP_003124386 | 0.0 | 580 | 98 | 99 |
6 | Odobenus rosmarus divergens | Walrus | 94.2 | XP_004399075 | 0.0 | 580 | 98 | 98 |
7 | Tursiops truncatus | Common bottlenose dolphin | 94.2 | XP_004315199 | 0.0 | 582 | 92 | 93 |
8 | Echinops telfairi | Lesser hedgehog tenrec | 98.7 | XP_004705644 | 0.0 | 581 | 95 | 97 |
9 | Dasypus novemcinctus | Nine-banded armadillo | 104.2 | XP_004457234 | 0.0 | 580 | 97 | 98 |
10 | Monodelphis domestica | Gray short-tailed opossum | 162.6 | XP_001367097 | 0.0 | 568 | 89 | 92 |
11 | Chrysemys picta bellii | Painted turtle | 296.0 | XP_008175974 | 0.0 | 572 | 76 | 83 |
12 | Alligator mississippiensis | American alligator | 296.0 | XP_006266384 | 0.0 | 582 | 75 | 82 |
13 | Pelodiscus sinensis | Chinese softshell turtle | 296.0 | XP_006127325 | 0.0 | 569 | 73 | 81 |
14 | Xenopus tropicalis | Western clawed frog | 371.2 | NP_001121523 | 0.0 | 580 | 64 | 74 |
15 | Oncorhynchus mykiss | Rainbow trout | 400.1 | CDQ84878 | 0.0 | 581 | 64 | 75 |
16 | Danio rerio | Zebrafish | 400.1 | XP_001339329 | 0.0 | 595 | 63 | 74 |
17 | Oryzias latipes | Japanese rice fish | 400.1 | XP_004076807 | 0.0 | 609 | 62 | 70 |
18 | Takifugu rubripes | Pufferfish | 400.1 | XP_003970822 | 0.0 | 618 | 61 | 71 |
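The trend in the strict ortholog table, where sequence identity falls as divergence time increases, can be summarized with a correlation coefficient. The sketch below uses the divergence dates and identity percentages from the table, omitting the human reference row. Pearson correlation is a summary chosen here for illustration and is not an analysis reported by the cited sources.

```python
from math import sqrt

# (species, divergence in MYA, identity in %) from the strict ortholog table
strict_orthologs = [
    ("Pan troglodytes", 6.3, 99), ("Macaca mulatta", 29.0, 99),
    ("Cavia porcellus", 92.3, 98), ("Sus scrofa", 94.2, 98),
    ("Odobenus rosmarus divergens", 94.2, 98), ("Tursiops truncatus", 94.2, 92),
    ("Echinops telfairi", 98.7, 95), ("Dasypus novemcinctus", 104.2, 97),
    ("Monodelphis domestica", 162.6, 89), ("Chrysemys picta bellii", 296.0, 76),
    ("Alligator mississippiensis", 296.0, 75), ("Pelodiscus sinensis", 296.0, 73),
    ("Xenopus tropicalis", 371.2, 64), ("Oncorhynchus mykiss", 400.1, 64),
    ("Danio rerio", 400.1, 63), ("Oryzias latipes", 400.1, 62),
    ("Takifugu rubripes", 400.1, 61),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


divergence = [row[1] for row in strict_orthologs]
identity = [row[2] for row in strict_orthologs]
r = pearson(divergence, identity)
print(round(r, 3))  # a strongly negative value is expected
```

Because several species share the same estimated divergence date, the coefficient describes the overall trend only and says nothing about differences within a clade.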
Distant orthologs
No. | Species | Common Name | Date of Divergence (MYA) | Accession No. | E-value | Length (aa) | Identity (%) | Similarity (%) |
---|---|---|---|---|---|---|---|---|
1 | Nipponia nippon | Crested ibis | 296.0 | XP_009472339 | 0.0 | 503 | 80 | 88 |
2 | Charadrius vociferus | Killdeer | 296.0 | XP_009892747 | 0.0 | 456 | 82 | 90 |
3 | Pseudopodoces humilis | Ground tit | 296.0 | XP_005533426 | 0.0 | 600 | 66 | 76 |
4 | Latimeria chalumnae | West Indian Ocean coelacanth | 414.9 | XP_006011612 | 3E-177 | 429 | 65 | 75 |
5 | Branchiostoma floridae | Florida lancelet | 713.2 | XP_002592972 | 9E-67 | 557 | 32 | 46 |
6 | Strongylocentrotus purpuratus | Purple sea urchin | 742.9 | XP_003727419 | 3E-46 | 725 | 35 | 51 |
7 | Aplysia californica | California sea slug | 782.7 | XP_005113015 | 4E-21 | 692 | 25 | 39 |
8 | Nematostella vectensis | Starlet sea anemone | 855.3 | XP_001632706 | 4E-19 | 494 | 24 | 39 |
9 | Trichoplax adhaerens | - | - | XP_002108809 | 5E-15 | 645 | 24 | 41 |
Post-translational modifications
C7orf43 has three phosphorylated sites: Ser 517, Thr 541, and Ser 546.[13] All three sites are relatively well conserved throughout mammals, reptiles, birds, amphibians, and bony fishes. The protein has no predicted N-myristoylation, as it has no N-terminal glycine.[18] However, C7orf43 is predicted to have one N-acetylation on a serine residue at the N-terminus.[19]
Secondary structure
The secondary structure of C7orf43 has yet to be determined. However, C7orf43 is predicted to have no transmembrane domain and to be secreted from the cell.[20][21] An analysis using the PELE tool from the SDSC Biology Workbench predicted mostly beta sheets and random coils, which are conserved throughout the strict orthologs.[15] Two similarly conserved alpha helix motifs have also been predicted, one near the N-terminus and one near the C-terminus.
Clinical significance
While no studies have focused on the characterization of C7orf43, several large-scale screenings have revealed information related to C7orf43 function. A study using FLAG affinity purification mass spectrometry (AP-MS) to profile protein interactions in the Hippo signaling pathway identified C7orf43 as one of the interacting proteins.[22] C7orf43 was found to interact with angiomotin-like protein 2 (AMOTL2), also known as Leman Coiled-Coil Protein (LCCP), a regulator of Hippo signaling.[22][23] AMOTL2 is also known to be an inhibitor of Wnt signaling, a pathway with known associations to cancer development, and to be a factor for angiogenesis, a process essential to tumour maintenance and metastasis.[23]
Other studies have also linked C7orf43 to cancer-related events. A large-scale yeast two-hybrid experiment identified an interaction between C7orf43 and transmembrane protein 50A (TMEM50A), also known as cervical cancer gene 9 or small membrane protein 1 (SMP1).[24][25][26] While the exact function of TMEM50A is unknown, it has been associated with cervical cancer.
C7orf43 has also been identified as a target gene of the transcription factor AP-2 gamma (TFAP2C).[27] TFAP2C has been shown to be involved in the development, differentiation, and oncogenesis of mammary tissues. Specifically, TFAP2C has a role in breast carcinoma through its regulatory effect on ESR1 and ERBB2, both of which encode receptors whose aberrations have been associated with breast carcinomas.[27][28] TFAP2C has also been shown to have an oncogenic role in neuroblastoma, where it promotes cell proliferation and tumour growth.[29][30]
Its location on the q arm of chromosome 7 also links C7orf43 to various diseases. Several diseases involving deletions in the q arm of chromosome 7 have been described, among them myeloid disorders, including acute myelogenous leukemia and myelodysplasia.[31]
References
- ↑ "Human PubMed Reference:".
- ↑ "Mouse PubMed Reference:".
- 1 2 3 4 5 "C7orf43 chromosome 7 open reading frame 43 [ Homo sapiens (human) ]". NCBI Gene. Retrieved 9 May 2015.
- ↑ "BC037034 cDNA sequence BC037034 [ Mus musculus (house mouse) ]". NCBI Gene. Retrieved 9 May 2015.
- ↑ "C7orf43 UCSC Genome Browser". UCSC Genome Browser. Retrieved 1 May 2015.
- ↑ "Q8WVR3 - CG043_HUMAN". UniProt. Retrieved 8 May 2015.
- ↑ "C7orf43-Large-scale analysis of the human transcriptome (HG-U133A)". NCBI GEO Profiles. Retrieved 2 April 2015.
- ↑ "C7orf43-Multiple normal tissues". NCBI GEO Profiles. Retrieved 2 April 2015.
- ↑ "BC037034-sagittal". Allen Brain Atlas. Retrieved 2 April 2015.
- ↑ "BC037034 expression". GenePaint. Retrieved 2 April 2015.
- ↑ "C7orf43 promoter GXP_116482". Genomatix. Retrieved 5 April 2015.
- ↑ "C7orf43-promoter binding sites". Genomatix. Retrieved 5 April 2015.
- 1 2 "Uncharacterized protein C7orf43 [Homo sapiens]". NCBI Protein. Retrieved 8 May 2015.
- 1 2 Brendel, V.; Bucher, P.; Nourbakhsh, I.R.; Blaisdell, B.E. & Karlin, S. "Methods and algorithms for statistical analysis of protein sequences". SAPS (Statistical Analysis of PS). Proc. Natl. Acad. Sci. U.S.A. Retrieved 2015-04-26.
- 1 2 3 "SDSC Biology Workbench". Department of Bioengineering. University of California San Diego. Retrieved 1 May 2015.
- ↑ Nakai, K; Horton, P (January 1999). "PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization.". Trends in Biochemical Sciences. 24 (1): 34–6. doi:10.1016/s0968-0004(98)01336-x. PMID 10087920.
- 1 2 "BLAST: Basic Local Alignment Search Tool". Conserved Domain Database. National Center for Biotechnology Information. Retrieved 2015-03-01.
- ↑ "Myristoylator". ExPASy Bioinformatics Resource Portal. Retrieved 9 May 2015.
- ↑ "NetAcet 1.0 Server". CBS. Retrieved 9 May 2015.
- ↑ "Transmembrane Topology". Phobius. Stockholm Bioinformatics Centre. Retrieved 1 May 2015.
- ↑ "SOSUI". Classification and Secondary Structure Prediction of Membrane Proteins. Mitaku Group.
- 1 2 Couzens, A. L.; Knight, J. D. R.; Kean, M. J.; Teo, G.; Weiss, A.; Dunham, W. H.; Lin, Z.-Y.; Bagshaw, R. D.; Sicheri, F.; Pawson, T.; Wrana, J. L.; Choi, H.; Gingras, A.-C. (19 November 2013). "Protein Interaction Network of the Mammalian Hippo Pathway Reveals Mechanisms of Kinase-Phosphatase Interactions". Science Signaling. 6 (302): rs15. doi:10.1126/scisignal.2004712.
- 1 2 "Q9Y2J4 - AMOL2_HUMAN". UniProt. Retrieved 30 April 2015.
- ↑ Stelzl, Ulrich; Worm, Uwe; Lalowski, Maciej; Haenig, Christian; Brembeck, Felix H.; Goehler, Heike; Stroedicke, Martin; Zenkner, Martina; Schoenherr, Anke; Koeppen, Susanne; Timm, Jan; Mintzlaff, Sascha; Abraham, Claudia; Bock, Nicole; Kietzmann, Silvia; Goedde, Astrid; Toksöz, Engin; Droege, Anja; Krobitsch, Sylvia; Korn, Bernhard; Birchmeier, Walter; Lehrach, Hans; Wanker, Erich E. (September 2005). "A Human Protein-Protein Interaction Network: A Resource for Annotating the Proteome". Cell. 122 (6): 957–968. doi:10.1016/j.cell.2005.08.029. PMID 16169070.
- ↑ "Q7RU07 - Q7RU07_HUMAN". UniProt. Retrieved 8 May 2015.
- ↑ "TMEM50A transmembrane protein 50A [ Homo sapiens (human) ]". NCBI Gene. Retrieved 9 May 2015.
- 1 2 Woodfield, George W.; Chen, Yizhen; Bair, Thomas B.; Domann, Frederick E.; Weigel, Ronald J. (October 2010). "Identification of primary gene targets of TFAP2C in hormone responsive breast carcinoma cells". Genes, Chromosomes and Cancer. 49 (10): 948–962. doi:10.1002/gcc.20807. PMC 2928401. PMID 20629094.
- ↑ Ailan, He; Xiangwen, Xiao; Daolong, Ren; Lu, Gan; Xiaofeng, Ding; Xi, Qiao; Xingwang, Hu; Rushi, Liu; Jian, Zhang; Shuanglin, Xiang (2009). "Identification of target genes of transcription factor activator protein 2 gamma in breast cancer cells". BMC Cancer. 9 (1): 279. doi:10.1186/1471-2407-9-279. PMC 3224728. PMID 19671168.
- ↑ Gao, Shun-Li; Wang, Li-Zhong; Liu, Hai-Ying; Liu, Dan-Li; Xie, Li-Ming; Zhang, Zhi-Wei (15 June 2014). "miR-200a Inhibits Tumor Proliferation by Targeting AP-2γ in Neuroblastoma Cells". Asian Pacific Journal of Cancer Prevention. 15 (11): 4671–4676. doi:10.7314/APJCP.2014.15.11.4671.
- ↑ Begon, D. Y. (25 April 2005). "Yin Yang 1 Cooperates with Activator Protein 2 to Stimulate ERBB2 Gene Expression in Mammary Cancer Cells". Journal of Biological Chemistry. 280 (26): 24428–24434. doi:10.1074/jbc.M503790200.
- ↑ Březinová, Jana; Zemanová, Zuzana; Ransdorfová, Šárka; Pavlištová, Lenka; Babická, Libuše; Houšková, Lucie; Melicherčíková, Jela; Šišková, Magda; Čermák, Jaroslav; Michalová, Kyra (February 2007). "Structural aberrations of chromosome 7 revealed by a combination of molecular cytogenetic techniques in myeloid malignancies". Cancer Genetics and Cytogenetics. 173 (1): 10–16. doi:10.1016/j.cancergencyto.2006.09.003.
Further reading
- Hartley JL, Temple GF, Brasch MA (2001). "DNA cloning using in vitro site-specific recombination.". Genome Res. 10 (11): 1788–95. doi:10.1101/gr.143000. PMC 310948. PMID 11076863.
- Wiemann S, Weil B, Wellenreuther R, et al. (2001). "Toward a catalog of human genes and proteins: sequencing and analysis of 500 novel complete protein coding human cDNAs.". Genome Res. 11 (3): 422–35. doi:10.1101/gr.GR1547R. PMC 311072. PMID 11230166.
- Simpson JC, Wellenreuther R, Poustka A, et al. (2001). "Systematic subcellular localization of novel proteins identified by large-scale cDNA sequencing.". EMBO Rep. 1 (3): 287–92. doi:10.1093/embo-reports/kvd058. PMC 1083732. PMID 11256614.
- Scherer SW, Cheung J, MacDonald JR, et al. (2003). "Human chromosome 7: DNA sequence and biology.". Science. 300 (5620): 767–72. doi:10.1126/science.1083423. PMC 2882961. PMID 12690205.
- Ota T, Suzuki Y, Nishikawa T, et al. (2004). "Complete sequencing and characterization of 21,243 full-length human cDNAs.". Nat. Genet. 36 (1): 40–5. doi:10.1038/ng1285. PMID 14702039.
- Gerhard DS, Wagner L, Feingold EA, et al. (2004). "The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC).". Genome Res. 14 (10B): 2121–7. doi:10.1101/gr.2596504. PMC 528928. PMID 15489334.
- Wiemann S, Arlt D, Huber W, et al. (2004). "From ORFeome to biology: a functional genomics pipeline.". Genome Res. 14 (10B): 2136–44. doi:10.1101/gr.2576704. PMC 528930. PMID 15489336.
- Mehrle A, Rosenfelder H, Schupp I, et al. (2006). "The LIFEdb database in 2006.". Nucleic Acids Res. 34 (Database issue): D415–8. doi:10.1093/nar/gkj139. PMC 1347501. PMID 16381901.