A few useful bioinformatics sites
- The PROMALS3D multiple sequence and structure alignment server is the best tool I have found for sequence-based structure alignments.
- SyntTax is a fantastic tool for assessing how operons are conserved across strains or species, and it is great for generating vector graphics of genes and operons.
- The EMBOSS palindrome finder is the best tool I’ve been able to find for searching for transcription factor binding sites that consist of inverted repeats separated by a gap. It is still tricky to choose the minimum repeat length, gap size, and number of allowed mismatches so that the results are interpretable.
- RSAT is useful for fetching intergenic regions upstream or downstream of a gene of interest, and it can also fetch the ORF itself. For example, you might want the sequence upstream of spr1672 to look for regulatory elements. By default it fetches the whole region upstream of the gene, but the region boundaries and other options can be tuned. The parent site has lots of other tools that are worth exploring. Update: the site has since been divided by kingdom; go to http://prokaryotes.rsat.eu/ for prokaryotic data.
- ESPript3 is a great tool for making your multiple sequence alignment look pretty, especially if you want to display secondary structure elements from a PDB file across the top.
- ENDscript2 is a great tool for a quick and dirty initial look at a PDB entry. It searches the PDB for homologous structures, aligns them, displays the multiple sequence alignment with secondary structure elements, solvent accessibility, etc., and provides PyMOL scripts or session files showing sequence and structure conservation.
- There are many approaches to homology modelling. I like BLASTing against the PDB, picking my favorite homologous protein as a template, and then using MODELLER in the Bioinformatics Toolkit, which also has many other useful tools. MODELLER can be painfully picky about small details of the input format, but it has given me some nice-looking results.
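To get a feel for how minimum length, gap size, and mismatches interact when searching for inverted repeats, here is a minimal brute-force sketch in Python. It is not the algorithm EMBOSS palindrome uses: it only tests arms of one fixed length, and the function and parameter names (`find_inverted_repeats`, `arm_len`, `min_gap`, `max_gap`, `max_mismatches`) are hypothetical.

```python
# Map each base to its Watson-Crick complement (other characters pass through).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_inverted_repeats(seq: str, arm_len: int, min_gap: int = 0,
                          max_gap: int = 20, max_mismatches: int = 0):
    """Brute-force search for arm1 + gap + arm2, where arm2 ~ revcomp(arm1).

    Returns (start1, end1, start2, end2, mismatches) tuples using 0-based,
    half-open coordinates. Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    hits = []
    # The last possible left arm still leaves room for a right arm with no gap.
    for i in range(len(seq) - 2 * arm_len + 1):
        target = reverse_complement(seq[i:i + arm_len])
        for gap in range(min_gap, max_gap + 1):
            j = i + arm_len + gap
            if j + arm_len > len(seq):
                break  # right arm would run off the end of the sequence
            n = count_mismatches(seq[j:j + arm_len], target)
            if n <= max_mismatches:
                hits.append((i, i + arm_len, j, j + arm_len, n))
    return hits
```

For example, `find_inverted_repeats("TTGACAGGGGTGTCAA", arm_len=6, max_gap=10)` includes the hit `(0, 6, 10, 16, 0)`: TTGACA, a 4 bp gap, then its reverse complement TGTCAA. Every extra base of allowed gap and every extra allowed mismatch adds more chance matches, which is why loose settings quickly bury real sites.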
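If you already have a genome sequence and gene coordinates locally, the core of an upstream-sequence fetch like the RSAT one is slicing plus a reverse complement for minus-strand genes. This is a minimal Python sketch under assumed conventions (1-based inclusive coordinates, a linear genome, a hypothetical `upstream_region` helper with an optional `neighbour` cutoff); it is not RSAT's code.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int, strand: str,
                    length: int, neighbour: Optional[int] = None) -> str:
    """Return up to `length` bases immediately upstream of a gene, 5'->3'.

    start/end: 1-based, inclusive gene coordinates on the forward strand.
    neighbour: optional 1-based coordinate of the adjacent upstream gene's
    nearest end, so the region stops there (a crude "intergenic only" mode).
    Hypothetical helper; circular genomes are not handled.
    """
    if strand == "+":
        lo = start - 1 - length          # 0-based index where the region begins
        if neighbour is not None:
            lo = max(lo, neighbour)      # do not overlap the previous gene
        return genome[max(lo, 0):start - 1]
    if strand == "-":
        hi = end + length                # upstream lies to the right on the forward strand
        if neighbour is not None:
            hi = min(hi, neighbour - 1)  # stop before the next gene's start
        return reverse_complement(genome[end:min(hi, len(genome))])
    raise ValueError("strand must be '+' or '-'")
```

Slicing on the forward strand and reverse-complementing afterwards keeps the output 5'->3' relative to the gene on either strand. For a gene such as spr1672 you would first look up its coordinates and strand in the genome annotation; the sketch does not do that step.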
Do you wish you could restrict BLAST results to genes or organisms that have an NCBI Gene entry, so that you can explore the genomic context? So do I. I haven’t found a solution yet, but restricting the search to organisms listed here as representative genomes tends to help. Collecting their tax_ids from here lets you search with an unwieldy Entrez query like:
txid208435 [ORGN] OR txid862971 [ORGN] OR txid888833 [ORGN] OR txid1123298 [ORGN] OR txid482234 [ORGN] OR txid862969 [ORGN] OR txid873449 [ORGN] OR txid889201 [ORGN] OR txid1123301 [ORGN] OR txid904293 [ORGN] OR txid486410 [ORGN] OR txid1123302 [ORGN] OR txid552526 [ORGN] OR txid864569 [ORGN] OR txid1123303 [ORGN] OR txid637909 [ORGN] OR txid467705 [ORGN] OR txid1123304 [ORGN] OR txid764299 [ORGN] OR txid471872 [ORGN] OR txid1069533 [ORGN] OR txid68892 [ORGN] OR txid871237 [ORGN] OR txid1318633 [ORGN] OR txid591365 [ORGN] OR txid1076934 [ORGN] OR txid764298 [ORGN] OR txid1116231 [ORGN] OR txid1123306 [ORGN] OR txid1123307 [ORGN] OR txid1123308 [ORGN] OR txid1123309 [ORGN] OR txid210007 [ORGN] OR txid1302863 [ORGN] OR txid927666 [ORGN] OR txid1123311 [ORGN] OR txid1123312 [ORGN] OR txid760570 [ORGN] OR txid936154 [ORGN] OR txid981540 [ORGN] OR txid888746 [ORGN] OR txid1123313 [ORGN] OR txid171101 [ORGN] OR txid170187 [ORGN] OR txid873448 [ORGN] OR txid1054460 [ORGN] OR txid373153 [ORGN] OR txid910313 [ORGN] OR txid160490 [ORGN] OR txid699248 [ORGN] OR txid347253 [ORGN] OR txid388919 [ORGN] OR txid1123317 [ORGN] OR txid1074052 [ORGN] OR txid1156433 [ORGN] OR txid391295 [ORGN] OR txid568814 [ORGN] OR txid299768 [ORGN] OR txid1123318 [ORGN] OR txid1282664 [ORGN] OR txid218495 [ORGN] OR txid764291 [ORGN] OR txid904306 [ORGN] OR txid365659 [ORGN]
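Rather than typing a query like that by hand, it can be generated from a list of tax_ids. A minimal Python sketch, assuming you have already collected the tax_ids into a list; `build_organism_query` is a hypothetical helper, not part of any NCBI tool.

```python
def build_organism_query(tax_ids):
    """Join NCBI taxonomy IDs into an Entrez query of the form
    'txidNNN [ORGN] OR txidMMM [ORGN] ...'.
    """
    # Normalise to strings and drop blanks/duplicates, keeping the input order.
    cleaned = (str(t).strip() for t in tax_ids)
    unique = list(dict.fromkeys(t for t in cleaned if t))
    return " OR ".join(f"txid{t} [ORGN]" for t in unique)


if __name__ == "__main__":
    # First three tax_ids from the query above.
    print(build_organism_query([208435, 862971, 888833]))
    # prints: txid208435 [ORGN] OR txid862971 [ORGN] OR txid888833 [ORGN]
```

De-duplicating first matters when tax_ids are pasted together from several lists, since a repeated ID only makes the query longer.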