
PhenCode: Example of adding a new data source


Once we have downloaded the mutation data and the reference sequences used to define the mutations' positions, we need to align the reference sequences with one or more of the UCSC genome assemblies so that the given positions can be translated to chromosome coordinates. I use Blat to get these alignments in PSL format. If there are many reference sequences, the Blat runs can be automated, but in this example there are only two, so they are easy to do by hand.
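To make the coordinate translation concrete, here is a minimal Python sketch of how a position in a reference sequence could be mapped to a chromosome position using one PSL alignment line. It is not the PhenCode code itself (that is written in Perl and not shown here). The function names and the example alignment are made up for illustration; only the 21-column PSL layout and the handling of the query strand follow the published PSL format for nucleotide alignments.

```python
from typing import List, NamedTuple, Optional


class PslAlignment(NamedTuple):
    strand: str            # "+" or "-" (query strand, nucleotide Blat)
    q_name: str            # reference sequence name
    q_size: int            # reference sequence length
    t_name: str            # chromosome name
    block_sizes: List[int]
    q_starts: List[int]    # 0-based block starts in the query
    t_starts: List[int]    # 0-based block starts in the chromosome


def parse_psl_line(line: str) -> PslAlignment:
    """Parse one tab-separated PSL data line (21 columns, no header)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 21:
        raise ValueError("expected 21 tab-separated PSL columns")

    def ints(text: str) -> List[int]:
        # PSL lists are comma-separated with a trailing comma.
        return [int(x) for x in text.split(",") if x]

    return PslAlignment(
        strand=fields[8],
        q_name=fields[9],
        q_size=int(fields[10]),
        t_name=fields[13],
        block_sizes=ints(fields[18]),
        q_starts=ints(fields[19]),
        t_starts=ints(fields[20]),
    )


def query_to_target(aln: PslAlignment, q_pos: int) -> Optional[int]:
    """Map a 0-based query position to a 0-based chromosome position.

    Returns None if the position falls outside every aligned block.
    """
    if aln.strand not in ("+", "-"):
        raise ValueError("only single-character strands are handled")
    # On the minus strand, qStarts refer to the reverse-complemented query.
    pos = q_pos if aln.strand == "+" else aln.q_size - 1 - q_pos
    for size, q_start, t_start in zip(aln.block_sizes, aln.q_starts, aln.t_starts):
        if q_start <= pos < q_start + size:
            return t_start + (pos - q_start)
    return None


# Made-up alignment: a 100-base reference aligned to chr12 in two blocks.
EXAMPLE_PSL = "\t".join([
    "90", "0", "0", "0", "0", "0", "1", "960", "+",
    "refSeq1", "100", "0", "90",
    "chr12", "133000000", "1000", "2050",
    "2", "40,50,", "0,40,", "1000,2000,",
])
```

With this example, position 45 of the reference falls in the second block and maps to chromosome position 2005, while position 95 lies outside the aligned region and returns None. A real script would also need to convert between the 1-based numbering used by the source database and these 0-based coordinates.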

I then write a custom Perl script that reads in the downloaded mutation data and the alignment, and does most of the work of generating the table files. This script is broadly similar from one database to the next, but needs to be adjusted to reflect different input formats, numbering systems, etc. For example, in the case of PAHdb, the region field from the input file is used to determine the location (exon, intron, UTR, etc.), but this might be different for another database. A number of common utility scripts that do not change have been factored out for easy reuse; this simplifies the custom scripts considerably. In general the custom script will: loop over each mutation in the input file; parse out the HGVS name and send it to the parseHgvsName2 script to get the chromosome coordinates, strand, and mutation type; also send the HGVS name to the sequenceCheck script to make sure that the wild-type sequence matches the reference sequence; read and compute links, attributes, and other fields as necessary; and print output lines for the gv, gvPos, and other tables.
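The parse-and-check part of that loop can be sketched roughly as follows. This is a simplified Python illustration, not the actual Perl scripts: the interfaces of parseHgvsName2 and sequenceCheck are not described here, so the functions below are hypothetical stand-ins. The sketch assumes only simple coding substitutions of the form c.123A>G, and assumes a known 0-based offset (cds_start) of the start codon within the reference sequence. Anything else, such as intronic positions or deletions, is set aside as rejected.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Matches only simple coding substitutions such as "c.1222C>T".
SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


class Substitution(NamedTuple):
    cdna_pos: int   # 1-based position counted from the A of the start codon
    ref_base: str   # wild-type base given in the HGVS name
    alt_base: str   # mutant base


def parse_substitution(hgvs_name: str) -> Optional[Substitution]:
    """Hypothetical stand-in for the name-parsing step."""
    match = SUBSTITUTION.match(hgvs_name.strip())
    if match is None:
        return None
    return Substitution(int(match.group(1)), match.group(2), match.group(3))


def wild_type_matches(sub: Substitution, reference_seq: str, cds_start: int) -> bool:
    """Hypothetical stand-in for the sequence check.

    cds_start is the 0-based index of the start codon's A in reference_seq.
    """
    index = cds_start + sub.cdna_pos - 1
    if not 0 <= index < len(reference_seq):
        return False
    return reference_seq[index].upper() == sub.ref_base


def process(
    names: List[str], reference_seq: str, cds_start: int
) -> Tuple[List[Tuple[str, int, str]], List[str]]:
    """Loop over mutation names, keeping those that parse and pass the check.

    Accepted entries are (name, 0-based reference index, mutation type).
    """
    accepted, rejected = [], []
    for name in names:
        sub = parse_substitution(name)
        if sub is None or not wild_type_matches(sub, reference_seq, cds_start):
            rejected.append(name)
            continue
        accepted.append((name, cds_start + sub.cdna_pos - 1, "substitution"))
    return accepted, rejected
```

Collecting the rejected names rather than stopping at the first failure makes it easier to review all mismatches with the source database in one pass. In a fuller script the accepted reference positions would then be translated to chromosome coordinates through the alignment before the table lines are printed.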

When the script has finished running, the data is ready for verification. The positions are put online at a test location in the form of a custom track that can be loaded into the Genome Browser. The mapping can then be confirmed by zooming in and examining the custom track along with the sequence and genes. After I have finished checking the mapping, I ask the source database curators to look it over as well.


[screen shot]
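For readers unfamiliar with custom tracks, the test file is essentially a "track" line followed by BED lines. The Python sketch below shows one way such a file could be written. It is only an illustration: the function name, track name, and coordinates are invented, and the real test track may carry more fields. The parts taken from the Genome Browser's custom track format are the track line attributes (name, description, visibility) and the first four BED columns, with a 0-based start and an exclusive end.

```python
from typing import Iterable, Tuple

# (chromosome, 0-based start, exclusive end, feature name)
Feature = Tuple[str, int, int, str]


def format_custom_track(track_name: str, description: str,
                        features: Iterable[Feature]) -> str:
    """Return custom track text: one track line followed by BED4 lines."""
    lines = [
        f'track name="{track_name}" description="{description}" visibility=pack'
    ]
    for chrom, start, end, name in features:
        # start == end is allowed so that insertions can be zero-length.
        if start < 0 or end < start:
            raise ValueError(f"bad coordinates for {name}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Invented example positions, for illustration only.
    example = [
        ("chr12", 1000, 1001, "c.1A>G"),
        ("chr12", 2005, 2006, "c.46C>T"),
    ]
    print(format_custom_track("testMutations", "Test mutation mapping", example), end="")
```

The resulting text can be saved to a file at the test location and loaded through the Genome Browser's custom track page. Feature names should not contain spaces or tabs, since those would be read as column separators.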

