Learn the basics of bioinformatics such as accessing, analyzing, and interpreting biological data using common databases and tools.
Course participants will learn practical bioinformatics approaches to the study of sequence similarity, phylogenetic analysis, gene expression, homology, polymorphisms, and 3-D structure and function, using an animal photoreceptor family as an example.
Learn to identify DNA polymorphisms that may lead to human disease.
Course participants will learn to identify DNA polymorphisms that may lead to human disease by first learning to analyze genomic data from public databases using the integrated resources of the National Center for Biotechnology Information (NCBI). This course provides practical instruction in integrating a variety of databases and tools so participants can ultimately create a discovery-based pipeline system to facilitate genetic testing, drug design, and disease-based research efforts.
Learn to use NCBI BLAST®, the most commonly used sequence similarity program.
Course participants will learn to BLAST nucleotide and protein sequence data for similarity comparisons ranging from simple to in-depth, using the BLAST programs and interface, including genomic BLAST and population genetics applications.
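As a small illustration of what a similarity comparison measures, the Python sketch below computes percent identity between two sequences that have already been aligned. It is not BLAST and does not perform alignment; the function name, the example sequences, and the choice to skip gap columns are assumptions made for illustration, and tools differ in how they count gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap are skipped entirely
    (an assumed convention; other tools count gap columns differently).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gap column: not counted as a match or a mismatch
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned nucleotide sequences (not taken from any database).
identity = percent_identity("ACGT-ACGT", "ACGTTACGA")
print(f"{identity:.1f}% identity")
```

Here eight columns are compared (the gap column is skipped) and seven of them match.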
Learn to use the Gene database entries as a launching pad for the bioinformatics discovery process.
Course participants will learn to navigate NCBI’s hand-curated Gene database to obtain information about a human gene such as its mRNA and genomic sequence, gene structure (exon-intron locations), and function and phenotypes associated with mutations. Participants will also learn to determine whether SNPs in the coding region of a gene are known to alter the function of the protein product.
Learn to use Perl-based scripts for high-throughput data queries, extraction, and parsing.
Course participants will learn how to use Perl scripts to access, download, and format databank entries in bulk. Once data is extracted and saved in a commonly used format (such as FASTA, XML, or TXT), participants will learn to identify essential annotation fields and use scripting to parse out research-specific information such as chromosome location, gene ID, synonyms, organisms, accession numbers, and more.
In addition, participants will gain a working knowledge of database query syntax, critical for efficient and relevant results.
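The parsing step described above can be sketched briefly. The course itself uses Perl; the example below uses Python purely for illustration, and the function names, the sample records, and the assumption that each header line has the form ">identifier description" are all hypothetical.

```python
from typing import Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading ">"
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def parse_header(header: str) -> dict:
    """Split a header into an identifier and a free-text description.

    Assumes the identifier is everything before the first space.
    """
    identifier, _, description = header.partition(" ")
    return {"id": identifier, "description": description}


# Made-up records for illustration only (not real database entries).
SAMPLE = """>SEQ0001.1 hypothetical opsin-like protein [Example organism]
MNGTEGPNFY
VPFSNKTGVV
>SEQ0002.1 hypothetical kinase [Example organism]
MKKLLPT
"""

records = [{**parse_header(h), "length": len(s)} for h, s in read_fasta(SAMPLE)]
for record in records:
    print(record["id"], record["length"], record["description"])
```

Real databank headers vary by source, so a production script would need rules specific to the format being parsed.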
Learn to visualize and annotate 3-D protein structures.
Course participants will learn to use NCBI's Cn3D program, identify conserved domain(s) present in a protein, search for other proteins containing similar domain(s), explore a 3-D modeling template for the query protein, and find distant sequence homologs that may not be identified in the results of a BLAST search.
The advanced Structures course is recommended for those already familiar with 3-D protein structure databases.
Learn to mine and integrate the search results of biological databases.
Course participants will learn to use Entrez, NCBI's cross-database global search engine, and to integrate the search results of the various databases. The databases covered in this course are PubMed®, Nucleotide and Protein Sequence, Gene Expression, PubChem, Protein Structures, Complete Genomes, Taxonomy, and others. This course will provide tips on effective search strategies for Entrez databases, and the best strategies for accessing the records in various formats.
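One common search strategy is to restrict each term to a field and join terms with Boolean operators. The Python sketch below only assembles such a query string; it does not contact any server. The helper names are hypothetical, and the field names shown are examples only, since the fields available depend on the database being searched.

```python
from typing import Iterable


def field_term(value: str, field: str) -> str:
    """Format a search term restricted to one field, e.g. rhodopsin[Title]."""
    if " " in value:
        value = f'"{value}"'  # quote multi-word values so they act as a phrase
    return f"{value}[{field}]"


def combine(terms: Iterable[str], operator: str = "AND") -> str:
    """Join terms with an uppercase Boolean operator."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {operator} ".join(terms)


# Example search terms chosen for illustration.
query = combine([
    field_term("rhodopsin", "Title"),
    field_term("Homo sapiens", "Organism"),
])
print(query)
```

Building the string programmatically makes it easier to reuse the same field restrictions across many searches.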
Learn how to access, download, and analyze microbial genome sequences and annotations.
Course participants will learn how to access the microbial genome sequences and annotations, navigate through and download the gene and protein datasets, and use the available genomic and comparative genomic analysis tools. The course will address practical discovery questions such as “Are there identifiable genes in microbial genomes that may be horizontally transferred?” and “What are the differences between closely related pathogenic and non-pathogenic bacteria?”
Learn how to get the most out of the PubChem databases of over 20 million unique small molecule structures.
Course participants will learn to navigate the contents and understand the organization of the PubChem databases. Participants will also learn how these data are submitted to and curated by NCBI, and how to find links from a given compound to related data including bioactivity studies, literature abstracts, protein sequences, protein structures, genes, and diseases.
Learn how to find structurally similar chemicals to a given compound.
Course participants will learn to find chemicals with similar structures based on bonding, stereochemistry, molecular parameters, and the presence or absence of substructures or isotopes. Participants will also learn to find sets of compounds with defined bioactivity profiles, and use these sets to guide more in-depth searching for related compounds.
Learn how to use and mine the polymorphism data at NCBI.
Course participants will learn to use the Single Nucleotide Polymorphism database (dbSNP), a primary repository of polymorphism data submitted from individual research and large-scale population studies, such as the HapMap project and the Framingham study. Participants will also learn to navigate through and interpret dbSNP entries, analyze current SNP data with respect to mutant proteins, population genetics, and genotype-to-phenotype correlations, and understand SNP data in the larger context of genomic biology and disease-based research.
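One widely used way to score structural similarity is the Tanimoto coefficient, computed over substructure fingerprints. The course description does not say which measure is taught, so the Python sketch below is only an illustration of the idea; the function name and the fingerprints (sets of "on" bit positions) are hypothetical.

```python
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared features / features in either set."""
    if not a and not b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints: each integer marks a substructure that is present.
fp_query = {1, 4, 7, 9, 12}
fp_candidate = {1, 4, 7, 10}

score = tanimoto(fp_query, fp_candidate)
print(f"Tanimoto similarity: {score:.2f}")
```

The two fingerprints share three features out of six distinct features overall, so the score is 0.5; identical fingerprints score 1.0.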
Learn how to analyze genomes from virulent pathogens.
Course participants will learn about the data and analytical tools available for disease-based research with a particular focus on the HIV and Influenza viruses. The course begins with a review of the typical sequence submission process and the annotation of a viral record, and concludes with a thorough investigation of the analytical tools for SNP discovery and genotyping such as NCBI’s Influenza Virus Resource (IVR) and the Retrovirus Genomes tools, and the CDC and the LANL HIV database pages.