How to Curate a Gene Page

CCGA Gene curation guidelines: How to create a page in CCGA

Summary:

Steps in the curation process:

  1. After you decide to become a curator, contact ???, read this tutorial, watch the tutorial video and sign your Honor Agreement.
  2. Contact your editor to obtain write permission for the CCGA site.
  3. Choose, or be assigned, a few GENE pages to curate.
  4. Curate each page by reading the suggested materials and writing original content on the GENE page (DO NOT PLAGIARIZE!).
  5. Contact your editor when you have finished curating the GENE page.

Introduction:

Thank you for volunteering to help curate the Compendium of Cancer Genome Aberrations (CCGA)! Your help will make this resource a valuable tool for its users, including researchers, clinicians and others. This short written guide will help you get started and should serve as a collection of best practices and content style as you curate. Before you start curating, please sign, date and return your “Honor Agreement”, which is available here: . (TO DO: make link to Honor Agreement, downloadable by PDF, if possible). There is also a tutorial video that shows you how to curate. You must watch it before you start curating. It can be seen here:

TO DO: create a tutorial video based on this script

The basic logic of the CCGA is that molecular biology information about a gene and its protein (and mutations) goes on the “Gene Pages”, and disease and clinical information goes on the “Disease Pages”. The CCGA is especially interested in the curation of fusion genes and mutations that arise in disease, especially in the hematological cancers. However, the dividing line between "Gene/Protein/Mutation" information and "Disease" information is sometimes hard to draw. One example is describing the value of a fusion gene in diagnostic or prognostic tests for a specific disease. This type of "crossover" molecular disease information can be included IN ABBREVIATED form on the GENE pages, and will be addressed more fully on the DISEASE pages. These guidelines will help you decide what information goes where. Please plan to spend 4-8 hours curating a single gene page, and more if you are unfamiliar with the gene.

Editor:

An “EDITOR” will support, edit and review your curation of the gene and protein information. Your assigned Editor is your go-to person for help while you curate, and will review your curated information when you are done. This written document and the curation tutorial cannot cover every question and decision you will face as you curate, so PLEASE feel free to contact your editor with any questions. You will also need to contact your assigned editor to get “write” permissions to the CCGA so that you can create and edit pages in the wiki.

Wiki Pages:

The CCGA website is based on wiki pages and can be accessed here: http://www.ccga.io/index.php/Main_Page. Please ask your editor for write permissions.

The functionality of the wiki pages is described briefly at MediaWiki: https://www.mediawiki.org/wiki/MediaWiki.

There are also short videos on YouTube that describe MediaWiki functionality: https://www.youtube.com/watch?v=F8irbbwNo2E&list=PLAagofQWV6pf0xFyUw7gJg2yYYB-nCS4l. You can find others by searching YouTube for "MediaWiki tutorial".

Gene Pages:

QUESTION: do we need to designate Genes as Drivers vs. passengers?

You will be asked to choose from a number of gene pages that you would like to curate. See the list here: http://www.ccga.io/index.php/List_of_Gene_Pages. A “Gene specific Template” provides a MediaWiki template for you to fill in with the information you have curated. It is shown here: http://www.ccga.io/index.php/Gene-Specific_Template.

Note that the Template already contains the markup you will need for headers, links and other syntax used in the Gene Pages. Please do not change the syntax of the template. Users of the CCGA pages expect to see the same format from page to page.

NOTE: When you make large edits to a wiki page (for example, edits that add new external links), a security check at the top of the page asks you to prove you are human. You do this by reading a short message and solving a simple arithmetic problem, as shown below. You MUST answer the arithmetic problem for your edits to be saved.

Your edit includes new external links. To protect the wiki against automated spam, we kindly ask you to solve the following task below and enter the answer in the box in order to save your edit (more info): 45−1 =

Gene curation using the “Gene specific Template” MediaWiki Template in CCGA

In most cases, the page for the gene you will curate has already been created from the Gene specific template. If it has not, you will need to copy the Gene specific template to a new page:

  1. Go to the gene specific template (http://www.ccga.io/index.php/Gene-Specific_Template).
  2. Click the “edit source” link at the top center-right of the page.
  3. Copy the entire page source (e.g., on a Mac, “cmd+A” and then “cmd+C”).
  4. In another browser window, open the gene page you will curate for editing.
  5. Paste in the template (cmd+V) and then save (the save button is at the bottom of the page).

The template provides an easy "fill in the blanks" MediaWiki page that already has formatted markup (for headers) and examples for you to follow. The Template is described briefly below.

Sections:

  1. Primary Authors
  2. Synonyms
  3. Genomic Location
  4. Cancer Category/Type
  5. Gene Overview
  6. Common Alteration Types
  7. Internal Pages
  8. External Links
  9. References

The sections of the Template are ordered for ease of reading by the USER. HOWEVER, as a curator, you will want to curate and enter information in the wiki in a non-linear order. This is the suggested workflow:

1. PRIMARY AUTHORS section.

Start by adding your name to the PRIMARY AUTHORS section. This shows that the gene is already being curated, so that no one else starts on it. It also ensures that your peers credit you for curating this information.

8. EXTERNAL LINKS Section.

To familiarize yourself with the latest information on the gene you are curating, “Your Gene of Interest” (designated YGI throughout), start with one of the last sections of the Template, the EXTERNAL LINKS section. The listed resources provide exhaustive (but not necessarily recent) molecular information about YGI, its mutations, and its role in disease, cancer and treatment. The resources suggested as EXTERNAL LINKS as of Dec 2018 are as follows:

  1. Atlas of Genetics and Cytogenetics in Oncology and Haematology
  2. COSMIC
  3. CIViC
  4. St. Jude ProteinPaint
  5. Precision Medicine Knowledgebase (Weill Cornell)
  6. Cancer Index
  7. OncoKB
  8. NCBI Gene
  9. My Cancer Genome
  10. UniProt
  11. Pfam
  12. GeneCards
  13. OMIM
  14. LOVD(3)
  15. TICdb
  16. In some cases, a specialized resource available for a particular gene, for example the IARC TP53 Database, which provides reference sequences and mutational landscapes for TP53

Note that some genes have special resources devoted to them. An example is the International Agency for Research on Cancer (IARC) database devoted to TP53 (see http://p53.iarc.fr/). This is unusual, but TP53/p53 is among the most studied genes and proteins, so in this case a specialized resource is justified. To find out whether there are any special resources for YGI, perform a Google search for YGI together with words like "database" or "resource". You can also ask your Editor.

Note that not every external resource has an entry for YGI. For example, the gene MECOM is not present in either the OncoKB site or the CIViC site. In these cases, write "No Entry for YGI at Resource" in place of the hyperlink to that site. Check all links after you have curated them, because you may accidentally change some of the link syntax as you curate.

As you read through each of the EXTERNAL LINKS resources, note information that you can use in subsequent sections. For example, NCBI Gene and GeneCards are especially rich in SYNONYM information. You can record this information directly in the wiki page for YGI. Alternatively, keep it in a text document on your computer and copy and paste it into the wiki page later. You can also take notes by hand and retype them into the wiki page later, but this is not recommended because it is very error-prone.
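Checking every link by eye is easy to get wrong on a long page. The Python sketch below is one way to list the bracketed external links in a page's source and flag any whose URL does not start with http:// or https:// (for example, after an accidental edit). It assumes the standard MediaWiki external-link form [URL label] and skips internal [[...]] links. It does not check bare URLs or links generated by templates, and it does not test whether the sites respond, so you should still click through each link. The function names and sample URLs are illustrative.

```python
import re

# Standard MediaWiki external link: [URL optional-label].
# The lookarounds skip internal links written as [[Page]].
EXTERNAL_LINK = re.compile(r"(?<!\[)\[(?!\[)([^\s\]]+)(?:\s+([^\]]*))?\]")


def find_external_links(wikitext: str) -> list[tuple[str, str]]:
    """Return (url, label) pairs for every bracketed external link in the page source."""
    return [
        (m.group(1), (m.group(2) or "").strip())
        for m in EXTERNAL_LINK.finditer(wikitext)
    ]


def suspicious_links(wikitext: str) -> list[str]:
    """Return URLs that do not start with http:// or https://."""
    return [
        url
        for url, _label in find_external_links(wikitext)
        if not url.startswith(("http://", "https://"))
    ]


# Illustrative page source: one broken link, one "No Entry" line, one internal link.
page = (
    "* [https://www.genecards.org/ GeneCards]\n"
    "* [htps://www.oncokb.org/ OncoKB]\n"
    "* CIViC: No Entry for YGI at Resource\n"
    "* See [[BCR]] for the fusion partner.\n"
)
print(suspicious_links(page))  # ['htps://www.oncokb.org/']
```

To use it, paste the text from the “edit source” box into a string and review whatever the script reports.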

PLEASE NOTE: DO NOT PLAGIARIZE. You can borrow a phrase here or there without attribution. You can even use a sentence here and there if you attribute the quote directly (e.g., "From NCBI"). However, you CANNOT cut and paste whole paragraphs from other resources into the CCGA pages.

2. SYNONYMS Section

As you read through the EXTERNAL LINKS resources, you will find that many of them list synonyms, especially NCBI Gene and GeneCards. Please italicize the alternative gene names (synonyms).
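In MediaWiki markup, text is italicized by wrapping it in two apostrophes (''like this''). Synonym lists gathered from several resources often overlap. The Python sketch below merges such lists, removes exact duplicates while keeping the first-seen order, and produces an italicized, comma-separated line for the SYNONYMS section. The function name is hypothetical, not part of CCGA. The comma-separated layout is also an assumption, so follow whatever layout the Gene-Specific Template shows.

```python
def italicize_synonyms(*sources: list[str]) -> str:
    """Merge synonym lists, drop exact duplicates (keeping first-seen order),
    and wrap each name in MediaWiki italics ('').

    Comparison is case-sensitive, so "p53" and "P53" are both kept;
    review the result by hand.
    """
    merged = []
    for source in sources:
        for name in source:
            name = name.strip()
            if name and name not in merged:
                merged.append(name)
    return ", ".join(f"''{name}''" for name in merged)


# Illustrative input: the same synonyms listed by two resources.
ncbi = ["AML1", "CBFA2"]
genecards = ["CBFA2", " AML1 "]
print(italicize_synonyms(ncbi, genecards))  # ''AML1'', ''CBFA2''
```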

3. GENOMIC LOCATION Section.

This information is most easily obtained from GeneCards. In the "Genomics" section, see the subsections "Genomic Locations for YGI" and "Genomic View for YGI", and the sub-subsection "Cytogenetic band".
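A cytogenetic band is written as chromosome, arm and band, so a string such as 21q22.12 means chromosome 21, long arm q, band 22.12. A quick format check before pasting it into the GENOMIC LOCATION section can catch copy errors. The Python sketch below is a simplified check, not a full implementation of cytogenetic nomenclature. It assumes a single band on chromosome 1-22, X or Y, and it does not handle ranges such as 21q22.1-q22.3. The function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Simplified band pattern: chromosome (1-22, X or Y), arm (p or q),
# then a band number with an optional sub-band after a dot.
BAND = re.compile(r"(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)")


def parse_cytoband(text: str) -> Optional[Tuple[str, str, str]]:
    """Split a single band such as '21q22.12' into (chromosome, arm, band).

    Returns None if the text does not match the simplified pattern
    or names a numbered chromosome outside 1-22.
    """
    match = BAND.fullmatch(text.strip())
    if match is None:
        return None
    chromosome, arm, band = match.groups()
    if chromosome.isdigit() and not 1 <= int(chromosome) <= 22:
        return None
    return chromosome, arm, band


print(parse_cytoband("21q22.12"))  # ('21', 'q', '22.12')
print(parse_cytoband("Xp11.23"))   # ('X', 'p', '11.23')
print(parse_cytoband("21z22"))     # None
```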

5. GENE OVERVIEW Section

This is the section where you will need to find and read journal articles and book chapters. You may have found some relevant articles listed at some of the EXTERNAL LINKS sites (especially UniProt and NCBI Gene). You will also have to search for new, relevant articles in PubMed and in the hematological cancer resource "WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues" (download from http://publications.iarc.fr/_publications/media/download/1511/700ac655d7f248cf1044efd985275086ed4f341f.pdf). Note that you will also find information that fits in other sections of the gene template, such as CANCER CATEGORY/TYPE, COMMON ALTERATION TYPES and REFERENCES. In one paragraph (approximately 5-7 sentences), this section should describe:

  1. the molecular function of the gene/protein
  2. its normal cellular role, for example in cellular processes or cellular pathways.
  3. its role in diseases, esp. cancer diseases in the CCGA.
  4. interesting or frequent mutations and their effects on specific cancers.
  5. role in drug resistance, if any.
  6. diagnostic or prognostic value, if any.

Two examples, for RUNX1 and ABL1, are shown below.

The protein encoded by RUNX1 can bind the protein encoded by CBFB to form "Core Binding Factor". This heterodimeric transcription factor regulates a number of genes responsible for hematopoiesis and osteogenesis [2]. The RUNX1 protein can also bind DNA as a monomer through its Runt domain. RUNX1 is the most frequent target for chromosomal translocation in leukemia [1]. Alterations of RUNX1 are typically loss-of-function or decreased-function, and are considered "secondary driver mutations" (disease progression) in sporadic leukemias [2]. However, germline RUNX1 mutations are associated with a lifetime risk of myeloid malignancy of about 44% [2]. RUNX1 mutations (loss-of-function or decreased function) have been associated with decreased p53 activity, increased DNA repair defects and increased inflammation [2]. RUNX1 mutations are associated with mutations in ASXL1, MLL-PTD and IDH1/IDH2, and are mutually exclusive with NPM1 mutations [3]. Non-complex RUNX1 mutations were found to be associated with resistance to chemotherapy. They were also associated with decreased disease-free survival (DFS), event-free survival (EFS) and overall survival (OS) [3].

The ABL1 gene encodes a non-receptor tyrosine kinase that is ubiquitously expressed and involved in a large number of cellular processes (see NCBI Gene). By far the most prevalent ABL1 alterations associated with cancer are fusions of the ABL1 gene with a number of partners. The most important partner is the BCR gene, in CML [1,2] and, to a lesser extent, in B-ALL and T-ALL. The head-to-tail arrangement of the BCR-ABL1 fusion gene results in an activated tyrosine kinase [6]. It appears that the N-terminal domain of BCR can cause oligomerization of the BCR-ABL1 protein product, which activates the ABL1 tyrosine kinase domain of the fusion protein [6,10,11]. The ABL1 and ABL2 genes encode tyrosine kinases with overlapping physiological roles. Somatic mutations and amplifications of ABL2 are more common than similar alterations in ABL1 [6]. See the BCR gene page for additional details of the BCR-ABL1 gene fusion.

For molecular information (and some disease and mutation information), the most useful EXTERNAL LINKS resources for this section are:

  1. UniProt (by far the best)
  2. GeneCards
  3. NCBI Gene
  4. CIViC
  5. Cancer Index
  6. My Cancer Genome
  7. Pfam

For mutational information, the most useful resources for this section are:

  1. Atlas of Genetics and Cytogenetics in Oncology and Haematology
  2. COSMIC
  3. LOVD(3)
  4. TICdb
  5. St. Jude ProteinPaint
  6. Precision Medicine Knowledgebase (Weill Cornell)
  7. OncoKB

For Disease information, the best resources are:

  1. OMIM
  2. WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues (download from http://publications.iarc.fr/_publications/media/download/1511/700ac655d7f248cf1044efd985275086ed4f341f.pdf)
  3. PubMed (search for YGI, especially in the context of "cancer" or specific cancers)

Note that for this section you must also go to PubMed (https://www.ncbi.nlm.nih.gov/pubmed/) and search for articles about YGI and cancer or specific cancers. The best articles to read are usually the most recent, and they should be freely available to the public (e.g., through PubMed Central). If you are new to YGI and some of these cancers, a recent review article may be helpful. The book "WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues" is also very helpful (download from http://publications.iarc.fr/_publications/media/download/1511/700ac655d7f248cf1044efd985275086ed4f341f.pdf). Search the PDF file for YGI and read the sections where YGI is mentioned.

When writing in the wiki, we suggest that you use a reference "placeholder" in the text. For example, put the first author's last name and the publication year at the end of a sentence as a placeholder. Then write the full reference in the REFERENCES section (section 9), using the syntax suggested for a book, article or internet source. When you are done with all sections, including the references, replace the placeholders with the numbered references.
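Replacing placeholders with numbers is a mechanical step, and it is easy to get wrong on a long page. The Python sketch below shows one way to do it. It assumes a hypothetical convention where placeholders are written as [LastName Year], with several citations separated by semicolons. It also assumes that references are numbered in the order they appear in the REFERENCES section. If a placeholder has no matching reference, the script stops with an error, which helps catch missing entries. Adapt the pattern if you use a different placeholder style.

```python
import re

# Hypothetical placeholder style: "[Smith 2017]" or "[Lee 2018; Smith 2017]".
NAME_YEAR = r"[A-Z][A-Za-z'\- ]*? \d{4}[a-z]?"
PLACEHOLDER = re.compile(r"\[(" + NAME_YEAR + r"(?:;\s*" + NAME_YEAR + r")*)\]")


def number_references(text: str, reference_keys: list[str]) -> str:
    """Replace [Author Year] placeholders with numbered citations such as [1,2].

    reference_keys lists the placeholder keys in the same order as the
    REFERENCES section: the first key becomes [1], the second [2], and so on.
    Existing numeric citations and [[internal links]] are left untouched.
    """
    numbers = {key: i for i, key in enumerate(reference_keys, start=1)}

    def substitute(match):
        keys = [key.strip() for key in match.group(1).split(";")]
        missing = [key for key in keys if key not in numbers]
        if missing:
            raise KeyError(f"No REFERENCES entry for: {', '.join(missing)}")
        cited = sorted({numbers[key] for key in keys})
        return "[" + ",".join(str(n) for n in cited) + "]"

    return PLACEHOLDER.sub(substitute, text)


draft = "YGI binds DNA [Smith 2017]. It is mutated in AML [Lee 2018; Smith 2017]."
print(number_references(draft, ["Smith 2017", "Lee 2018"]))
# YGI binds DNA [1]. It is mutated in AML [1,2].
```

The author names and sentences in the example are made up for illustration.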

4. CANCER CATEGORY/TYPE Section

NOTE: This section crosses over into the "Disease Pages" in CCGA. It is meant to be a short summary of, or introduction to, the information that is already on the Disease pages or will be written there in the future. This section should be a list of cancer types that are important to CCGA, most of which already have disease pages in CCGA. Fill in this section as you read journal articles and abstracts from PubMed, content in the "World Health Organization Classification of Tumours of Haematopoietic and Lymphoid Tissues", and OMIM. As you come across cancer types that are associated with YGI, search CCGA for those diseases (or subclasses of disease) and read the content there. Link the YGI page to the corresponding Disease page in CCGA (see the examples below). You may be able to summarize the content you find on the CCGA disease pages.

For each cancer type, write a short paragraph (1-7 sentences). The content should include:

  1. The frequency of YGI mutations or fusion genes in patients with that specific disease.
  2. Whether the mutation is common in that specific disease. If it is uncommon, you will likely NOT mention it, because otherwise you would end up referencing dozens or hundreds of mutations and diseases.
  3. Whether the gene or mutation is a drug target.
  4. Whether the gene or mutation(s) have diagnostic value.
  5. Whether the gene or mutation(s) have prognostic value.
  6. Whether there are companion diagnostics (for certain therapies) based on the gene or mutation(s).

In some cases, you will find that a CCGA Disease/Cancer Type page has not yet been created. Please DO NOT create a disease page. Instead, curate that information on the Gene page under a heading for that disease, and contact your editor. As CCGA grows, we will add new Disease/Cancer Type pages and link them to the Gene pages later (see the example of CML below).

The example of RUNX1 is shown below.

  • Acute Myeloid Leukemia (AML) with t(8;21)(q22;q22.1); RUNX1-RUNX1T1

  • Acute Myeloid Leukemia (AML) with Mutated RUNX1

RUNX1 mutations are found in 5-18% of all AML patients tested [3]. The most common RUNX1 chromosomal translocation in de novo AML is t(8;21)(q22;q22). It results in the RUNX1-RUNX1T1 fusion and occurs in approximately 7% of cases [2,6]. This translocation confers a favorable prognosis in AML and other neoplasms [2,5,6]. Another RUNX1 alteration is t(3;21)(q26;q22), in which the Runt domain of RUNX1 is fused to the entire EVI1 gene. This translocation is rarely found in patients diagnosed with de novo AML and is more common in those with therapy-related myelodysplastic syndrome (MDS)/AML [9]. Other RUNX1 mutations include deletions and missense, splicing, frameshift and nonsense alterations (mostly loss-of-function or decreased function). They occur in approximately 10% of AML patients [6]. These mutations are mechanistically distinct from the chromosomal translocations and confer a worse prognosis [2,5,6].


iAMP21 is an intrachromosomal amplification of chromosome 21 that includes the genes RUNX1 and miR-802, among others. This amplification occurs in about 1.5-2% of all acute lymphocytic leukemia cases tested and is associated with a poor prognosis [5].


RUNX1 mutations have been described in 20% of patients with early T-cell precursor acute lymphoblastic leukemia (ETP-ALL) [6].


The most common RUNX1 chromosomal translocation in B-cell acute lymphocytic leukemia (B-ALL) is t(12;21)(p13;q22), which results in the ETV6-RUNX1 fusion [2]. This translocation occurs in 25% of pediatric B-ALL but only 2% of adult B-ALL [5,6], and confers a favorable prognosis in B-ALL and other neoplasms [2,5,6]. iAMP21 is an intrachromosomal amplification of chromosome 21 that includes the genes RUNX1 and miR-802, among others. This amplification occurs in about 2% of all B-cell acute lymphocytic leukemia cases tested and is associated with a poor prognosis [6].


  • Chronic Myeloid Leukemia (CML)

A number of simple mutations in RUNX1 have been reported in CML patients, and these mutations may be in part responsible for progression from the chronic phase to blast crisis (BC) [7].


A high frequency (42%) of RUNX1 mutations has been reported among radiation-associated and therapy-related Myelodysplastic Syndrome (MDS) patients [8].


  • CCUS (Clonal cytopenia of undetermined significance) or ICUS (Idiopathic cytopenia of undetermined significance)

RUNX1 mutations are more common in clonal cytopenia of undetermined significance (CCUS) [2].


  • Familial platelet disorder with predisposition to acute myeloid leukemia (FPD/AML)

Germline mutations of RUNX1 have been reported in the rare autosomal dominant Familial platelet disorder with predisposition to acute myeloid leukemia (FPD/AML) [8].

6. COMMON ALTERATION TYPES Section

This section should be completed together with the GENE OVERVIEW and CANCER CATEGORY/TYPE sections. It is meant to be an "At a Glance" summary of the types of alteration that ARE MOST COMMON ACROSS ALL CANCERS. The types of alterations are:

  • Copy Number Loss
  • Copy Number Gain
  • LOH (Loss of Heterozygosity)
  • Loss-of-Function Mutation
  • Gain-of-Function Mutation
  • Translocation/Fusion

In the TP53 example, two short paragraphs are followed by a table. This is appropriate because TP53 is altered in a vast number of different cancers.

The TP53 gene carries homozygous mutations in about 50-60% of human cancers. About 90% of TP53 mutations are missense mutations spanning about 190 codons in the DNA-binding domain. None of the 50 most common pathogenic missense mutations occur outside the DNA-binding region. These mutations produce a protein with a reduced capacity to bind the specific DNA sequence that regulates the p53 transcriptional pathway [15]. The eight most common mutations across all cancer types are R175H, R248Q, R273H, R248W, R273C, R282W, G245S and R249S. They are found in codons that account for about 28% of all TP53 mutations (see Table 1 in [15]). These alleles appear to be preferentially selected in human cancers of many tissue types. Seven of the eight mutations occur at methylated CpG sites in TP53, which encode arginine residues that contact the DNA and are conserved over evolutionary time scales [15].


Inactivating mutations resulting in loss of p53 function, including deletions, LOH, and loss of function (LOF) alterations often confer a poor prognosis and chemoresistance. Alternatively, gain-of-function mutations promoting the expression and stability of the p53 protein in the nucleus can also lead to oncogenic effects, including genomic instability and excessive cell proliferation [12].

Copy Number Loss | Copy Number Gain | LOH | Loss-of-Function Mutation | Gain-of-Function Mutation | Translocation/Fusion
X | | X | X | X |

In the RUNX1 example, the two most common cancers associated with RUNX1 alterations are listed, along with their most common associated fusion genes. A table follows.


Acute Myeloid Leukemia (AML); t(8;21)(q22;q22) resulting in RUNX1-RUNX1T1 fusion

B-cell Acute Lymphocytic Leukemia (B-ALL); t(12;21)(p13;q22) resulting in ETV6-RUNX1 fusion

Copy Number Loss | Copy Number Gain | LOH | Loss-of-Function Mutation | Gain-of-Function Mutation | Translocation/Fusion
 | | | X | | X
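To check or rebuild the X row of an alteration table, it helps to see how a one-row table is written in standard MediaWiki table markup. The table opens with {| and closes with |}, header cells start with !, and data cells start with |. The Python sketch below generates that markup. The class="wikitable" styling and the column order are assumptions, with the columns taken from the alteration-type list in this section. The function name is hypothetical. Compare the output with the markup already in the Gene-Specific Template, which takes precedence.

```python
ALTERATION_TYPES = [
    "Copy Number Loss",
    "Copy Number Gain",
    "LOH",
    "Loss-of-Function Mutation",
    "Gain-of-Function Mutation",
    "Translocation/Fusion",
]


def alteration_table(present: set[str]) -> str:
    """Return a one-row MediaWiki table with an X under each alteration type in `present`."""
    unknown = present - set(ALTERATION_TYPES)
    if unknown:
        raise ValueError(f"Unknown alteration type(s): {sorted(unknown)}")
    header = "! " + " !! ".join(ALTERATION_TYPES)  # header cells separated by !!
    row = "| " + " || ".join("X" if kind in present else "" for kind in ALTERATION_TYPES)
    return "\n".join(['{| class="wikitable"', header, "|-", row, "|}"])


print(alteration_table({"Loss-of-Function Mutation", "Translocation/Fusion"}))
```

Listing the alteration types once and generating both rows from that list keeps the header and the X row aligned. A misplaced X is exactly the error that is hard to spot in raw table markup.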

7. INTERNAL PAGES Section

After you have completed the GENE OVERVIEW, CANCER CATEGORY/TYPE and COMMON ALTERATION TYPES sections, you will know more about the other INTERNAL PAGES that are relevant to the YGI page. These include links to diseases and links to fusion partner genes. In effect, this section collects in one place all the related INTERNAL PAGES linked from the GENE OVERVIEW, CANCER CATEGORY/TYPE and COMMON ALTERATION TYPES sections.

A good example is the INTERNAL PAGES section of the ABL1 page:

Chronic Myeloid Leukemia

Acute Lymphoblastic Leukemia

Acute Myeloid Leukemia (AML) with BCR-ABL1

Mixed Phenotype Acute Leukemia (MPAL) with t(9;22)(q34.1;q11.2); BCR-ABL1

See the "BCR gene" for additional details of the BCR-ABL1 gene fusion.

9. REFERENCES section

When you are done with all sections, including the REFERENCES section, replace the author last-name "placeholders" in the relevant sections with the corresponding numbered references.

Appendix

Resources:

  1. Gene-Specific-Template: http://www.ccga.io/index.php/Gene-Specific_Template
  2. MediaWiki Help: https://www.mediawiki.org/wiki/MediaWiki
  3. YouTube videos for how to use MediaWiki: https://www.youtube.com/watch?v=F8irbbwNo2E&list=PLAagofQWV6pf0xFyUw7gJg2yYYB-nCS4l
  4. CCGA website: http://www.ccga.io/index.php/Main_Page
  5. “WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues” (PDF): download from http://publications.iarc.fr/_publications/media/download/1511/700ac655d7f248cf1044efd985275086ed4f341f.pdf


FAQ (Frequently Asked Questions)