BioInfo Pakistan
10 September 2009, 7:30 PM
Organizing The Information

“… ORGANISE the information on a LARGE SCALE …”

Redundancy and multiplicity of data

A concept that underpins most research methods in bioinformatics is that much of the data can be grouped together based on biologically meaningful similarities. For example, sequence segments are often repeated at different positions of genomic DNA. Genes can be clustered into those with particular functions (e.g. enzymatic actions) or according to the metabolic pathway to which they belong, although a single gene may actually possess several functions. Going further, distinct proteins frequently have comparable sequences: organisms often have multiple copies of a particular gene through duplication, while different species have equivalent or similar proteins that were inherited when they diverged from each other in evolution.

At a structural level, we predict there to be a finite number of different tertiary structures (estimates range between 1,000 and 10,000 folds), and proteins adopt equivalent structures even when they differ greatly in sequence. As a result, although the number of structures in the PDB has increased exponentially, the rate of discovery of novel folds has actually decreased.

There are common terms to describe the relationship between pairs of proteins or the genes from which they are derived: analogous proteins have related folds but unrelated sequences, while homologous proteins are similar in both sequence and structure. The two categories can be difficult to distinguish, especially if the relationship between the two proteins is remote. Among homologues, it is useful to distinguish between orthologues, proteins in different species that have evolved from a common ancestral gene, and paralogues, proteins that are related by gene duplication within a genome. Normally, orthologues retain the same function, while paralogues evolve distinct but related functions.
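The terminology above can be made concrete with a small Python sketch. This is only an illustration under strong assumptions: the `Protein` record, the `relationship` function and the placeholder labels (`family_1`, `fold_X`, `species_A` and so on) are hypothetical, and a shared `family` label stands in for detectable sequence similarity. In practice the relationship has to be inferred from sequence and structure comparison, and the "same species means paralogue" rule is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """Toy record; every field is a hypothetical label, not real data."""
    name: str
    species: str
    family: str  # stand-in for detectable sequence similarity
    fold: str    # stand-in for the tertiary structure class


def relationship(a: Protein, b: Protein) -> str:
    """Name the relationship between two proteins using the definitions in the text."""
    if a.family == b.family:
        # Homologues: similar in sequence (and, in this toy model, in structure).
        # Simplification: same species -> paralogues, different species -> orthologues.
        return "paralogues" if a.species == b.species else "orthologues"
    if a.fold == b.fold:
        # Related folds but unrelated sequences.
        return "analogues"
    return "unrelated"


p1 = Protein("P1", "species_A", "family_1", "fold_X")
p2 = Protein("P2", "species_A", "family_1", "fold_X")
p3 = Protein("P3", "species_B", "family_1", "fold_X")
p4 = Protein("P4", "species_B", "family_2", "fold_X")
p5 = Protein("P5", "species_B", "family_3", "fold_Y")

for other in (p2, p3, p4, p5):
    print(p1.name, other.name, relationship(p1, other))
```

With these toy records, P1 is reported as a paralogue of P2, an orthologue of P3, an analogue of P4 and unrelated to P5.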
An important concept that arises from these observations is that of a finite “parts list” for different organisms: an inventory of the proteins contained within an organism, arranged according to different properties such as gene sequence, protein fold or function. Taking protein folds as an example, we mentioned that, with a few exceptions, the tertiary structures of proteins adopt one of a limited repertoire of folds. As the number of different fold families is considerably smaller than the number of gene families, categorising the proteins by fold provides a substantial simplification of the contents of a genome. Similar simplifications can be provided by other attributes such as protein function. As such, we expect this notion of a finite parts list to become increasingly common in future genomic analyses.

Clearly, an essential aspect of managing this large volume of data lies in developing methods for assessing similarities between different biomolecules and identifying those that are related. Below, we discuss the major databases that provide access to the primary sources of information, and also introduce some secondary databases that systematically group the data (Table 2). These classifications ease comparisons between genomes and their products, allowing the identification of common themes between those that are related and highlighting features that are unique to some.
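The "parts list" idea described above can be sketched in Python as a simple grouping of an organism's proteins by one attribute at a time. The `toy_genome` records, the `parts_list` helper and all labels are hypothetical and chosen only so that there are fewer folds than families, as the text describes; real inventories would be built from the databases in Table 2.

```python
from collections import defaultdict

# Hypothetical toy inventory; names and labels are placeholders, not real data.
toy_genome = [
    {"name": "P1", "family": "family_1", "fold": "fold_X", "function": "enzyme"},
    {"name": "P2", "family": "family_1", "fold": "fold_X", "function": "enzyme"},
    {"name": "P3", "family": "family_2", "fold": "fold_X", "function": "transport"},
    {"name": "P4", "family": "family_3", "fold": "fold_Y", "function": "enzyme"},
    {"name": "P5", "family": "family_4", "fold": "fold_Y", "function": "signalling"},
]


def parts_list(proteins, attribute):
    """Group protein names by the chosen attribute (e.g. 'fold' or 'function')."""
    groups = defaultdict(list)
    for protein in proteins:
        groups[protein[attribute]].append(protein["name"])
    return dict(groups)


by_family = parts_list(toy_genome, "family")
by_fold = parts_list(toy_genome, "fold")
by_function = parts_list(toy_genome, "function")

# Grouping by fold gives fewer categories than grouping by family,
# which is the simplification the parts-list idea relies on.
print(len(by_family), "families;", len(by_fold), "folds;", len(by_function), "functions")
print(by_fold)
```

The same helper works for any attribute, so one inventory can be viewed by sequence family, by fold or by function without restructuring the data.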

Table 2. List of URLs for the databases that are cited in the review.

Database                             URL

Protein sequence (primary)
  SWISS-PROT                         www.expasy.ch/sprot/sprot-top.html
  PIR-International                  www.mips.biochem.mpg.de/proj/protseqdb

Protein sequence (composite)
  OWL                                www.bioinf.man.ac.uk/dbbrowser/OWL
  NRDB                               www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein

Protein sequence (secondary)
  PROSITE                            www.expasy.ch/prosite
  PRINTS                             www.bioinf.man.ac.uk/dbbrowser/PRINTS/PRINTS.html
  Pfam                               www.sanger.ac.uk/Pfam/

Macromolecular structures
  Protein Data Bank (PDB)            www.rcsb.org/pdb
  Nucleic Acids Database (NDB)       ndbserver.rutgers.edu/
  HIV Protease Database              www.ncifcrf.gov/CRYS/HIVdb/NEW_DATABASE
  ReLiBase                           www2.ebi.ac.uk:8081/home.html
  PDBsum                             www.biochem.ucl.ac.uk/bsm/pdbsum
  CATH                               www.biochem.ucl.ac.uk/bsm/cath
  SCOP                               scop.mrc-lmb.cam.ac.uk/scop
  FSSP                               www2.embl-ebi.ac.uk/dali/fssp

Nucleotide sequences
  GenBank                            www.ncbi.nlm.nih.gov/Genbank
  EMBL                               www.ebi.ac.uk/embl
  DDBJ                               www.ddbj.nig.ac.jp

Genome sequences
  Entrez genomes                     www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome
  GeneCensus                         bioinfo.mbb.yale.edu/genome
  COGs                               www.ncbi.nlm.nih.gov/COG

Integrated databases
  InterPro                           www.ebi.ac.uk/interpro
  Sequence retrieval system (SRS)    www.expasy.ch/srs5
  Entrez                             www.ncbi.nlm.nih.gov/Entrez

Read about the different types of databases on the next page.



Added by: Ansari