Learn how to use our database
In this tutorial we will show you how to explore the different tools that are part of Cancer3D. The main page is divided into three sections. The first one (marked in red in Figure 1) allows you to search for your gene or protein of interest: you just need to type in its official symbol or UniProt ID. The second one (marked in blue in Figure 1) describes what Cancer3D is about and which tools it currently includes. Finally, at the bottom of the page you will find a link to the gene of the day (marked in yellow in Figure 1) and a link to our help page (marked in green in Figure 1). The first will take you to one of our preselected examples so that you can get a glimpse of how to use the database, while the second will take you to this tutorial.
Figure 1 - Cancer3D main page
We will now use the search bar to look for data on BRAF. To do that, type “BRAF” in the bar and click the “Submit” button underneath it. A pop-up should then appear with information on the various protein isoforms of BRAF, including the number of mutations in cell lines for which we have drug activity data, the number of mutations in samples from TCGA and the length of the isoform (Figure 2). It might take a few seconds to retrieve that information from the database, so please be patient. You should now select the BRAF isoform named “ENSP00000288602” by clicking on it.
Figure 2 - Selecting the isoform
The e-Driver analysis page is split into four main regions. On the upper left side (in yellow in Figure 3) you will see a histogram representing the mutation frequency at each position of the selected protein. If you also selected a specific region, its mutations are highlighted in red and its boundaries are marked by two vertical red dashed lines. In this case you will also see the p value for the selected region. For example, if you selected the e-Driver analysis for BRAF in the main page, you should see the region belonging to the kinase domain highlighted in red, just as in Figure 3.
Right below this, on the lower left side (in red in Figure 3), you can find the protein navigator. This allows you to select the different annotated regions to see their respective e-Driver results. The first row, shown in yellow, contains PFAM domains according to ENSEMBL. The second row, colored in blue, contains intrinsically disordered regions (IDRs) predicted by FoldIndex. The third row (in red) contains novel domains predicted by AIDA. The remaining rows, colored in green, show PDB structures with homology to that region. If you click on any of the first three types of regions (PFAM domains, IDRs or novel predicted domains) you will see the corresponding analysis in the upper left window. If you click on any PDB structure you will see the mutations occurring in the protein of interest mapped onto the selected PDB structure in the upper right window (in blue in Figure 3). Structures in that window are represented by ribbon diagrams, with mutated positions showing all the atoms of the amino acid. These mutated amino acids are colored according to their mutation frequency in TCGA: the more intense the red color of the amino acid, the higher its mutation frequency.
Finally, on the lower right side (shown in green in Figure 3) you can find known interaction partners for your protein according to HPRD. If you click on the View Protein button, the isoform selection menu for that protein should pop up so you can see our predictions for it. If you click on the See Paper button instead, it will take you to the PubMed entry for the paper describing the interaction between these two proteins.
Figure 3 - e-Driver analysis page
The drug analysis page has the same layout as the e-Driver analysis page, but the data represented in it is slightly different. In this case, on the upper left region (shown in yellow in Figure 4), you should find a scatterplot. Each point in this scatterplot represents a missense mutation found in the protein you selected (BRAF in this tutorial) in some of the ~900 cell lines used in the CCLE. It is positioned according to its location within the protein (x-axis) and the activity of the selected drug (PLX4720 in our case) in the cell lines where it was found (y-axis). If you also selected a functional region you should see the boundaries of the region as two vertical red dashed lines and all the mutations occurring within that region colored in red instead of black. For example, if you selected the kinase domain of BRAF you should see the same plot as the one shown in Figure 4.
On the lower left side (shown in red in Figure 4) you will find the protein navigator. Just as for the e-Driver page, yellow boxes represent annotated PFAM domains, blue boxes predicted intrinsically disordered regions (IDRs) by FoldIndex, red boxes predicted novel domains by AIDA and green boxes regions overlapping with PDB structures.
By clicking on any of the boxes representing a functional region (PFAM, IDR or predicted novel domains) you can go to the specific analysis for that region and the selected drug. This will cause the region that you selected to be highlighted in the scatterplot in the upper left side and a boxplot to appear in the upper right side (shown in blue in Figure 4). This boxplot groups cell lines into three categories: those with mutations in the region you selected (left box), those with mutations in other regions of the same protein (middle box) and those with no mutations in any region of the protein (right box). Above the boxes you should find the corresponding p values for each comparison.
If instead of a region you select a PDB structure in the protein navigator, the upper right region will show you the selected PDB structure with the mutations found in the protein mapped onto it. Again, the structure will be shown as a gray ribbon, with mutated amino acids showing their atoms. These amino acids are colored according to the average activity of cell lines with mutations in that position. Amino acids colored in red represent lower drug activity (higher resistance) and amino acids colored in green represent higher drug activity (higher sensitivity).
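The grouping behind the boxplot can be sketched in a few lines of Python. This is only an illustration, not the code used by Cancer3D: the function name, the cell line names, the region boundaries and the activity values are all hypothetical, and we assume that a cell line with mutations both inside and outside the selected region is counted in the first group.

```python
def group_activities(mutations, activity, region_start, region_end):
    """Split drug activity values into the three boxplot categories.

    mutations: dict mapping cell line -> list of mutated positions in the protein
    activity:  dict mapping cell line -> drug activity value
    Assumption: a cell line with at least one mutation inside the region
    goes to "in_region", even if it also has mutations elsewhere.
    """
    groups = {"in_region": [], "other_regions": [], "no_mutation": []}
    for cell_line, value in activity.items():
        positions = mutations.get(cell_line, [])
        if any(region_start <= pos <= region_end for pos in positions):
            groups["in_region"].append(value)
        elif positions:
            groups["other_regions"].append(value)
        else:
            groups["no_mutation"].append(value)
    return groups


# Hypothetical data: four cell lines and a region spanning residues 100-200.
mutations = {"line_A": [150], "line_B": [20], "line_C": [30, 180]}
activity = {"line_A": 0.9, "line_B": 0.4, "line_C": 0.8, "line_D": 0.5}

groups = group_activities(mutations, activity, region_start=100, region_end=200)
print(groups)
```

Each of the three lists corresponds to one box in the plot (left, middle and right, respectively).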
You can change the drug being studied by clicking on the “Drugs” button. This will give you the full list of drugs for which we have activity data. Once you select a drug, the scatterplot will change accordingly and the upper right region will show you, when available, one of the PDB structures for the protein with the mutations mapped on it.
Figure 4 - Drug analysis page
e-Driver (Porta-Pardo and Godzik, Bioinformatics, 2014, accepted) is based on the idea that protein functions are mediated by different and (to some extent) independent regions, and that not all such functional regions may be equally relevant for carcinogenesis. If this is the case, it should be reflected in the distribution of missense mutations along the protein, with regions under selection showing an enrichment or depletion of mutations compared to regions with random (passenger) mutations.
In order to identify protein functional regions (PFRs) under selective pressure, e-Driver first retrieves the coordinates of all missense mutations in a cancer cohort (in our case samples from the TCGA pan-cancer analysis dataset) located in any given protein and maps them to the protein's functional regions. Then, for every PFR, it uses a binomial test to check whether the observed number of mutations in this region is significantly different from that predicted by chance. We assume that each mutation is an independent event and that all residues of the protein have the same probability of being mutated. The overall process is exemplified in Figure 5 using mutation data for TP53 in TCGA.
Figure 5 - e-Driver example with TP53 data
TP53 has a total of 916 missense somatic mutations in TCGA's pan-cancer dataset. In order to analyze TP53's mutation distribution, e-Driver first calculates the probability of a mutation being in a specific functional region. In our example we will use the DNA binding domain (PF00870), which is located between the two vertical red dashed lines in Figure 5. This domain has a length of 193 amino acids, as it goes from residue 95 to residue 288. Given that TP53 is 393 amino acids long (ENSP00000269305), the probability of a mutation falling randomly in that domain is ~0.49 (193/393). Since we found that 900 out of the 916 mutations in TP53 are in its DNA binding domain, we can calculate the probability of randomly observing a distribution at least as extreme as this one using a binomial test with 916 trials, 900 successes and a success probability of ~0.49.
The p value for this region is extremely low, as it is enriched in cancer mutations. Since it is below the 1e-4 threshold that we recommend for the pan-cancer analysis (see the "P-value explained" page), e-Driver predicts that TP53 is a likely cancer driver.
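The TP53 calculation above can be reproduced with a short Python sketch. It is not the e-Driver implementation: the function names are our own, and we assume a one-sided test for enrichment (the probability of seeing 900 or more of the 916 mutations in the domain), computed in log space to avoid numerical underflow.

```python
import math


def binom_log_pmf(k, n, p):
    """Natural log of the binomial probability of k successes in n trials (0 < p < 1)."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): one-sided test for enrichment."""
    return sum(math.exp(binom_log_pmf(i, n, p)) for i in range(k, n + 1))


# Numbers taken from the TP53 example in the text.
total_mutations = 916    # missense mutations in TP53
domain_mutations = 900   # mutations inside the DNA binding domain
domain_length = 193      # length of the domain in amino acids
protein_length = 393     # length of TP53 in amino acids

p_random = domain_length / protein_length  # ~0.49
p_value = binom_upper_tail(domain_mutations, total_mutations, p_random)

print(f"Probability of a random mutation in the domain: {p_random:.2f}")
print(f"Below the 1e-4 threshold: {p_value < 1e-4}")
```

To test for depletion instead of enrichment, one would sum the lower tail (from 0 to the observed count) in the same way.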
e-Drug (Porta-Pardo and Godzik, submitted), just as e-Driver, exploits the idea that proteins are made of distinct functional modules such as domains or motifs. In this case, however, we try to identify functional modules or regions that are associated with changes in anticancer drug sensitivity. In order to do that, we use drug activity data from CCLE and, given a protein region, we divide this drug activity data into three different groups: activity in cell lines with mutations in the region, activity in cell lines with mutations in other regions of the same protein and activity in cell lines with no mutations in the protein at all. We then compare the activity levels from the first group against the other two using a Wilcoxon signed-rank test to see if there are statistically significant differences between the different groups.
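The comparison between groups can be sketched as follows. This is an illustration under stated assumptions, not the e-Drug code: because the three groups contain different cell lines (unpaired samples), the sketch uses the Wilcoxon rank-sum (Mann-Whitney U) test rather than the paired signed-rank variant, with a normal approximation, tie correction and no continuity correction. The function name and the activity values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, p value). Ties receive average ranks.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 0.0, 1.0  # nothing to compare
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Items i..j-1 share the value; their 1-based ranks are i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    return u, p


# Hypothetical drug activity values for the three groups of cell lines.
in_region = [0.2, 0.3, 0.25, 0.1]
other_regions = [0.7, 0.8, 0.65, 0.9, 0.75]
no_mutation = [0.6, 0.85, 0.7, 0.95, 0.8, 0.72]

# The first group is compared against each of the other two.
for name, group in [("other regions", other_regions), ("no mutation", no_mutation)]:
    u, p = rank_sum_test(in_region, group)
    print(f"in region vs {name}: U = {u}, p = {p:.4f}")
```

For small groups an exact test would be more appropriate than the normal approximation used here.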
The whole process is exemplified in Figure 6 for PIK3CA and the drug AEW541. PIK3CA has 5 different domains, including the p85 interacting domain and the helical domain (which contains the well-known mutation hotspot E545). While analysis at the whole protein level (Figure 6a) suggests that mutations in this protein are not associated with changes in the activity of AEW541, by looking at the drug-position scatterplot (Figure 6b) one can identify domains that differ significantly in their mean drug activity. For example, mutations in the p85 interacting domain (red dots) seem to be associated with lower AEW541 activity, whereas cell lines with mutations in the helical domain (blue dots) are more sensitive to this same drug (Figure 6c).
As shown in Figure 7, Cancer3D currently has at least one PFR for 80% of all proteins. Such PFRs cover as much as 60% of the human proteome. The most abundant PFRs are PFAM domains. These are present in 70% of proteins and cover 35% of the proteome. Next come intrinsically disordered regions, present in 50% of the proteins and covering 25% of the proteome. Finally, there are new domains in almost 10% of the human proteins, covering around 5% of the proteome. In terms of three-dimensional structures, we have experimental or template-based structures for 60% of the human proteins, covering 30% of the proteome.
| Win XP | N/A | Works | Works | IE 7 not supported |
| Win7 | Works | Works | Works | IE 11 works |