A Tour of the PSI Structural Biology Knowledgebase
The PSI Structural Biology Knowledgebase (PSI SBKB) is designed to turn the products of the Protein Structure Initiative into knowledge that is important for understanding living systems and disease. This "one-stop shop" provides users with the available genetic, structural, functional and experimental information about a particular protein of interest.
This walkthrough will introduce you to the features and search capabilities of the PSI SBKB.
Navigating the SBKB homepage
The PSI SBKB homepage makes many features available from one central place.
Features available on the PSI SBKB homepage
The golden search box is the main entry point for finding information about a protein. You can search by protein or nucleotide sequence, UniProt Accession Code, PDB ID (the identifier of a Protein Data Bank atomic 3D coordinates file), or plain text. These options are described in the second half of this tutorial. The search box is found on every page.
Provides access to our scientific resources, our article E-collection, and information about us and the Protein Structure Initiative.
Central Boxes: Protein ToolBox
Access to structural proteomics tools developed by the SBKB and PSI.
In Functional Sleuth, we present structures determined by the PSI efforts whose functions are still unknown. You can explore the database by taxonomy or PSI:Biology center, and clicking on a structure in a gallery will perform a query to explore what *is* known about each structure.
The SBKB and Nature Publishing Group produced articles describing the latest research. Research highlights provide insights into biology and medicine, spanning from basic molecular biology to infectious diseases and pathogenesis to drug discovery. Reviews of novel bench methods as well as improvements to tried-and-tested techniques also help researchers with their lab work. There are more than 450 articles to learn from in this collection.
Further Information about the Protein Structure Initiative
The PSI SBKB also contains site help and information regarding the PSI program, its mission, and its policies, found in the left navigation menu in the "About" links.
The "About this site" menu contains information about the SBKB: a "getting started" tutorial and classroom exercises made by OpenHelix and the SBKB group, contact information, a site map, terms of use, and references in case you wish to cite the SBKB.
The "About PSI" menu has information on the overall mission and goals of the PSI and its biomedical themes. It also shows active funding opportunities, either to become part of the PSI efforts or to collaborate with current consortia, with links to the NIH/NIGMS announcements and notices.
The PSI centers link gives information about each PSI center and its research projects.
Searching the PSI SBKB
The PSI SBKB can be searched by one-letter code protein sequence, nucleotide sequence, plain text and Protein Data Bank identifier (PDB ID) code. The following section describes how to use these search options.
The PSI SBKB consists of a main searchable database linked with modules (PSI resources) that provide additional information about the query terms.
Searchable by sequence and PDB ID:
Experimental Data Tracking from TargetTrack
Structures from the PDB
Annotations from external biological resources
Protein Model Portal - homology models
Materials Repository - DNA clones
Searchable by text:
Technology Portal - a repository of technical reports and methods provided by the PSI centers, searchable by center and by experimental step.
Publications Portal - a list of all articles published by the PSI centers.
PSI Centers - search text from within the PSI centers web sites
Next, we will discuss searching these features in detail.
Searching by Sequence, UniProt AC, or PDB ID
The PSI SBKB maintains a database of the sequences of PSI protein targets and the sequences of all solved protein structures released by the Protein Data Bank. Sequence searches are performed using the BLASTP program with an E-value cutoff of 10 for sequences less than or equal to 50 amino acids (150 nucleotides) or an E-value cutoff of 0.001 for sequences 51 amino acids or longer. To search for a particular protein sequence, enter the one-letter amino acid sequence in the search form, select the by Sequence radio button and press Search. Nucleotide sequence searches are also supported, using the BLASTX program to determine possible reading frames and displaying closely matched protein sequences.
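The cutoff rule described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the SBKB's actual implementation: the function names are hypothetical, and the nucleotide check is a simple heuristic assumed for the example (a short peptide made only of A, C, G, T, or N would be misclassified).

```python
# Thresholds taken from the text above; everything else is an assumption.
SHORT_PROTEIN_MAX = 50        # amino acids
SHORT_NUCLEOTIDE_MAX = 150    # nucleotides
SHORT_QUERY_EVALUE = 10.0
LONG_QUERY_EVALUE = 0.001

NUCLEOTIDE_LETTERS = set("ACGTUN")


def looks_like_nucleotide(sequence: str) -> bool:
    """Heuristic (assumed): treat a sequence of only A/C/G/T/U/N as nucleotide."""
    return bool(sequence) and all(ch in NUCLEOTIDE_LETTERS for ch in sequence)


def choose_search(sequence: str):
    """Return (program, e_value_cutoff) for a raw query sequence."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    if not seq:
        raise ValueError("empty sequence")
    if looks_like_nucleotide(seq):
        program, short_limit = "blastx", SHORT_NUCLEOTIDE_MAX
    else:
        program, short_limit = "blastp", SHORT_PROTEIN_MAX
    cutoff = SHORT_QUERY_EVALUE if len(seq) <= short_limit else LONG_QUERY_EVALUE
    return program, cutoff
```

Short queries get the permissive cutoff because a short alignment cannot reach a very small E-value even when it matches perfectly; longer queries can afford the stricter threshold.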
An example query is available by selecting the by Sequence radio button, pressing "example query", and then pressing the Search button. These options are highlighted in the figure below.
The PSI SBKB maintains a database of the identifier codes for all experimental structure entries released by the Protein Data Bank. To search for a particular Protein Data Bank entry, enter the structure's 4-character ID code in the search form, select the by PDB id radio button and press Search. An example query (2BEI) is available on the site to explore these features.
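If you prepare PDB IDs in a script before pasting them into the search form, a quick format check can catch typos. The sketch below assumes the classic PDB ID format of one digit (1-9) followed by three letters or digits; the function name is hypothetical, and the check says nothing about whether the entry actually exists.

```python
import re

# Assumed classic PDB ID format: a digit 1-9, then three alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def normalize_pdb_id(text: str) -> str:
    """Return the ID in upper case, or raise ValueError if it is malformed."""
    candidate = text.strip()
    if not PDB_ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"not a 4-character PDB ID: {text!r}")
    return candidate.upper()
```

For instance, `normalize_pdb_id(" 2bei ")` returns `"2BEI"`, the example query used on the site.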
Results of a Sequence, UniProt AC, or PDB ID Search
The results of sequence and PDB ID searches are first displayed as a summary of available records relating to the input query. An example of a Results Summary is shown below.
To view query result details individually, select the DB REPORT tab at the top of the summary page. From this summary, you can view the type of information you seek:
1. Structures - displays a list of experimental structures within the PDB. The structures tab will also show all genetic, structural, and functional annotations attributed to a structure through a "notebook" view (described later)
2. Models - supplied by the Protein Model Portal (http://www.proteinmodelportal.org), displays computational models related to the sequence
3. Targets and protocols - supplied by the experimental data tracking (EDT) database, TargetTrack (http://sbkb.org/tt/), displays information on the experimental progress, status of targets, and methods used for protein production and structure determination. Target sequences will also have annotations, even in the absence of a 3D structure.
4. Materials - supplied by the PSI Materials Repository (http://psimr.asu.edu/), displays DNA clones available for purchase.
The Structures Tab
The Structures tab of the DB Report provides the essential details about any structures matching the input query. If the query results for a sequence search are displayed, then the percent sequence identity (the percentage of exactly matching residues) with the input sequence is displayed for each matching structure entry (I), as well as the E value (E).
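To make the identity value (I) concrete, here is a minimal sketch of how percent identity can be computed from a pairwise alignment. It assumes the common convention of dividing the number of identical columns by the total alignment length, gaps included; the function name and the "-" gap character are assumptions for illustration, and the SBKB takes its values from the BLAST output rather than from code like this.

```python
def percent_identity(aligned_query: str, aligned_subject: str, gap: str = "-") -> float:
    """Percent of alignment columns where both sequences carry the same residue."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("empty alignment")
    identical = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != gap  # a gap never counts as a match
    )
    return 100.0 * identical / len(aligned_query)
```

With the toy alignment `MKT-AY` against `MKTLAF`, four of the six columns are identical, so the function returns roughly 66.7.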
The Structures section presents:
- a link to the RCSB PDB Structure Explorer Page,
- a download option for the PDB format structure data file,
- a thumbnail of the structure, which when clicked, will launch the interactive FirstGlance molecular viewer application, and
- a "post-it" with a list of possible annotation types, which when clicked, launches a rich "notebook" view of all annotations connected to this structure (described later).
Other reference information includes:
- PubMed and DOI for the primary citation (when available),
- Title of the deposited structure (may not be the same as the related publication),
- Structure entry deposition and release dates, and
- Experimental method used to obtain the model.
If the structure was solved by a PSI project then this information is provided along with the associated PSI Target identifier. There is also a glossary of terms available in the upper right hand corner which defines these headings. A glossary is present for each tab.
To view the other reports, click on their tab headings (Models, Protocols, etc.)
The Annotations Notebook
Each protein target and protein structure has many biological descriptions, or annotations, attached to it. The SBKB assembles the annotations from over 150 PSI and other genomic, structural, functional, and evolutionary resources to provide you with most of the information available today about that protein sequence. These annotations are organized into a "notebook", classified by scientific topic: gene-level view, protein-level view, structural view, biological functions, cellular localization, biochemical pathways, medicinal relationships and references.
First, you can quickly get a sense of how many annotations exist through the "quick table". By hovering the mouse over a hot linked chain ID, a quick table will appear showing you if annotations exist for ~35 popular resources. Every database that contains an annotation will be highlighted in green, and clicking on the resource name will take you directly to that record in the main "notebook" view.
The full list of annotations is available in the Notebook view. In the figure of a typical protein-level annotation notebook page below, links are provided to the databases UniProtKB (comprehensive protein database), Pfam (a protein family and motifs database), InterPro (protein family assignment), and Gene3D (predictive structural annotation).
From this view the user can see what annotation databases have data relating to the sequence, and can go directly to the record by following the link.
The Glossary of Terms, available in the top-right corner, defines these headings; in this case, the glossary describes what kind of information each linked database provides.
The Models Tab
Computational Models associated with a query sequence or structure are shown in this section.
In the case of a sequence query, the number of models that have been predicted for this sequence is presented along with a link to the details for each model. In the case of a PDB ID query, the number of computational models that are based on information from this experimental structure is presented.
All of these results are obtained by a remote query to the PSI Protein Model Portal, which collects and maintains this information. In the example below, there are four models available from three modeling databases. To explore, follow the "view" link to go to the PSI Protein Model Portal.
Example: using the same sequence search example,
Step 1: Once you see the results of your search, follow the "view" link.
Step 2: You will be directed to the PMP site, where you can explore the available pre-computed models. The page includes a graphical explanation of how the similar sequences, structures, and models relate to each other, along with domain information in gray. It also lists the protein IDs from UniProt that relate to the sequence. Lastly, it lists the models themselves, each with a small traffic light icon that gives a pictorial clue of model reliability.
The Sequence Summary:
red: your query
blue: the model you are viewing.
this model consists of residues 27-357 of your query sequence.
Reports what protein domains are recognized in your query sequence, with a link to InterPro for further information. In this example, the model is of the GDPD domain of the protein.
The computational model is presented, with information related to its creation. You can also display an interactive view of the model and download its coordinates for further evaluation.
Protein structure models are computational predictions which may contain errors. Based on the sequence identity to the template, a model is assigned to one of three categories of modeling complexity (see PMP for more details).
The target-template alignments provided on the model info pages are generated dynamically by structural superposition of model and template structures using the program MAMMOTH.
The Targets & Protocols Tab
Information about matching protein targets is shown in the Targets tab of the DB report.
The information provides the user with a status summary of the work performed on the target already. Information in this summary includes:
- the TargetID, with a link to the record in TargetDB
- the protein sequence alignment between your query sequence and similar sequences found in the database
- reported target status
- source organism
- and PSI Target Category
The annotations "post-it", quick table, and notebook views described in the Structures section are also available, as well as a Glossary of Terms in the top-right corner that defines these headings.
You can read the full record by clicking on the TargetID in the report (e.g., GO.74365).
The full record includes general information, such as when the latest update occurred, the responsible center, status information, source organism and target sequence.
If the target's experimental structure was successfully determined, a link to the RCSB PDB Structure Explorer page is also given.
Links to domain annotation and function prediction databases are provided, along with calculated biochemical and biophysical parameters for the sequence.
The Materials Tab
The Materials tab provides information about the availability of relevant target DNA clone materials at the PSI:Biology Materials Repository (PSI MR). The PSI MR is a resource that provides an on-line searchable database of archived PSI genetic materials, transfer, storage and maintenance of PSI plasmids in a highly quality-controlled manner at centralized on-site and off-site locations, and the facilities to distribute PSI plasmids and supporting information for research purposes within the U.S. and abroad.
The information provided in this tab:
- the TargetID, with a link to the record in TargetDB
- A link to order the clone
- A link to a detailed record about the target's DNA sequence (DNA insert).
- A link to information about the DNA vector in which the target sequence resides.
Selecting one of the last three links will transfer you to the PSI-MR DNASU web site (http://psimr.asu.edu).
To see further information about this DNA clone and the vector, including antibiotic resistance for positive selection, click on the Clone Details link. An example of a record is shown below.
Searching the PSI SBKB using plain text
The PSI-Nature SBKB maintains a 'plain text' index of all content in web pages and documents at the PSI Center web sites, PSI Technology and Publications Portal, and the Annotations Module.
To search the PSI-Nature SBKB by plain text, enter the appropriate words in the search form, select the by Text radio button and press Search. An example query (the word "membrane") is available by selecting the "by plain text" radio button, selecting the example query link, and pressing the Search button.
The results of the text search are presented as a list of pages containing the input search term (e.g. membrane) as shown below.
In the Site Search, all instances of 'membrane' that occur on the PSI centers web sites are found, including 6 highlights written for the SBKB that discuss membranes and membrane proteins.
Clicking on the Structural Publication tab will show all structural articles that contain the query term; in this case, all structural publications that contain the term membrane.
These records include links to protein structures that contain the search term as well. The PubMed identifier, DOI number, and PubMed Central links to the article are provided when available, and by selecting the "Read More" link, the full citation and abstract of the article will appear.
Clicking on the Methods tab will show all PSI-published articles and reports containing the search term that focus on methodology. By selecting the "Read More" link, the full citation will be shown. In this way, you can search for new methods developed by the PSI efforts to help your own research.
Lastly, explore the site on your own.
This tutorial has walked through all of the features available that you can use towards your own research. With this "one-stop shop", you can find various sorts of assistance, from structural and annotation information about your protein, to reports and protocols about how to obtain it.
If you have any questions or comments, or would like to suggest future features for the PSI SBKB, please contact us at firstname.lastname@example.org.