This document provides an introduction to computers and bioinformatics. It defines key concepts like what a computer is, computer hardware, software, programming languages, computer networks, the internet, bioinformatics, and important bioinformatics databases and tools. Specifically, it discusses how computers accept data as input, process it, and provide information as output. It also explains how bioinformatics applies information technology to biological data to receive, analyze and retrieve biological information. Important databases mentioned include NCBI, EMBL, SRS and tools like Entrez.
5. What is a Computer?
• In general, a computer is a machine which
accepts data, processes it and returns new
information as output.
Data (input) → Processing → Information (output)
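The input, processing, output cycle can be sketched in a few lines of Python. This is only an illustration: the function name `count_bases` and the sample sequence are made up for this example and do not come from the slides.

```python
from collections import Counter

def count_bases(dna: str) -> dict:
    """Process raw sequence text (data) into base counts (information)."""
    return dict(Counter(dna.upper()))

data = "ATGCGATACG"                 # input: raw data (hypothetical sequence)
information = count_bases(data)     # processing
print(information)                  # output: information derived from the data
```

The raw string alone says little; the counts produced by the processing step are the "information" the slide refers to.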
6. Software
• Software is a set of programs (step-by-step
instructions) telling the computer how to
process data. Software stored permanently in a
device's hardware is called “firmware”.
• Software needs to be installed on a computer,
usually from a CD or USB drive.
• e.g. digital audio editors, MS Office, and operating
systems such as Windows 98, Windows 2000,
Windows XP, and Windows 7.
7. Advantages of Using Computers
• Speed: Computers can carry out instructions in less
than a millionth of a second.
• Accuracy: Computers perform calculations accurately
and consistently, provided the program and input are correct.
• Diligence: Computers can repeat the same task
any number of times without tiring.
• Storage capacity: Computers can store large
volumes of data and information on media such as magnetic disks.
8. Computer Languages
• Commonly used high-level programming languages
are Ada, Visual Basic, C, C++, COBOL, Java, Lisp, and
Pascal.
• Commonly used scripting languages are
Bourne shell, JavaScript, Python, Ruby, PHP, and Perl.
9. What is Perl?
• Larry Wall released Perl in 1987.
• Perl is an interpreted language optimized for
scanning arbitrary text files, extracting information
from these files, and printing reports based on that
information.
• It is also a good language for many system
management tasks.
• In addition, Perl 5 is used for graphics
programming, system administration, network
programming, finance, bioinformatics, and other
applications.
10. Advantages of Perl
• Cost and licensing: Perl is free to use under a
generous license.
• Availability: Perl is generally available on most server
platforms, including the following:
most UNIX variants, MS-DOS, Windows NT,
Windows 95, and OS/2.
11. What is the Internet?
• The Internet is a global system of interconnected
computer networks that use the standard
Internet protocol suite (TCP/IP) to serve several
billion users worldwide.
• The Internet provides many services:
– Email
– World Wide Web (WWW)
– Remote login (Telnet)
– File transfer (FTP)
12. Computer Network
• A computer network is an interconnection of
computers that share resources.
• Resources can be information, processing load,
devices, etc.
13. Types Of Computer Networks
On the basis of size:
• Local Area Network (LAN)
A network of computers in one locality, i.e. in
one room, one building, or a home.
• Wide Area Network (WAN)
A network of computers
spread over a wide geographical area.
14. Benefits of Computer Networks
• Information sharing and device sharing
• Load sharing and mobility
• Fast communication
• Anywhere, anytime banking
15. How to Get Connected?
• We can get connected through a modem, which
transmits data as signals over twisted-pair
copper cables.
16. Through Wi-Fi
• Wi-Fi is a popular technology that allows an
electronic device to exchange data or connect to
the internet wirelessly using radio waves.
17. Browsers
• Browsers are clients that communicate with servers using a
set of standard protocols and conventions.
• A browser contains the software we need to find,
retrieve, view, and send information over the internet.
18. Browsers
• Lynx
Developed at the University of Kansas, USA, to
construct a campus-wide information system.
It provides a text-only interface at low cost.
• Mosaic
Developed in 1993 at NCSA, University of Illinois, USA.
Designed for MS Windows, it provides a single user-friendly
interface to the diverse protocols, data formats, and information
servers available throughout the internet.
19. • Netscape
Developed in 1994 by Netscape Communications Corporation (NCC), California, USA.
It became the most popular package of its time for browsing
information on the internet, and also handled e-mail, audio, video, etc.
• Internet Explorer
Developed in 1995 by Microsoft Corp., Redmond, USA.
Designed to work with PC-based operating systems, it offers
hypermedia browsing, including Java and ActiveX support.
Users navigate by clicking on specific buttons or
pictures, which are known as hyperlinks.
20. • Hyperlinks
are usually highlighted in some
way, either by using a different color from the main
body of the text or by being boxed, etc.
• Each link has a unique address known as a URL
(Uniform Resource Locator).
• HTTP (HyperText Transfer
Protocol) is used to exchange information
over the internet.
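To make the structure of a URL concrete, the sketch below splits one of the addresses used in these slides into its parts with Python's standard `urllib.parse` module. The variable names are illustrative only.

```python
from urllib.parse import urlparse

# URL taken from the Entrez slide of this deck.
url = "http://www.ncbi.nlm.nih.gov/entrez/"
parts = urlparse(url)

print(parts.scheme)   # the protocol used to fetch the resource
print(parts.netloc)   # the server (host) that holds the resource
print(parts.path)     # the location of the resource on that server
```

The scheme tells the browser which protocol to speak (here HTTP), the host identifies the server, and the path identifies the document on it.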
21. • HTML (HyperText Markup Language)
Hypertext documents are written in a standard markup
language known as HTML.
HTML code is strictly text-based, and any associated
graphics or sound for a document exist as
separate files in a common format.
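Because HTML is plain text, a program can read it directly. As a rough sketch, the example below uses Python's standard `html.parser` module to pull the hyperlink addresses out of a small page; the page content and the class name `LinkCollector` are invented for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href target of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A made-up page: the text is in the HTML, the image lives in a separate file.
page = """<html><body>
<p>Visit <a href="http://www.ebi.ac.uk/">EBI</a> or
<a href="http://www.expasy.ch/">ExPASy</a>.</p>
<img src="logo.gif">
</body></html>"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

Note that the `<img>` tag only names a file (`logo.gif`); the graphic itself is stored separately, as the slide describes.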
22. EMBnet
• EMBnet (European Molecular Biology network)
is an international network that aims to enhance
bioinformatics services by bringing together
bioinformatics service providers.
23. EMBnet
• Computers store sequence information as simple rows of
sequence characters called strings. Each character is
stored in binary code. The smallest unit of memory is
the bit, and 1 byte = 8 bits.
• A DNA sequence is usually stored and read in the computer as a
series of 8-bit words in binary format. Each bit has the value 0 or 1,
producing 256 possible combinations per byte.
• A protein sequence appears as a series of 8-bit words
comprising the binary form of the amino acid letters.
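The storage scheme described here can be checked with a short Python sketch. It assumes plain ASCII encoding; the helper name `to_bits` and the four-base sequence are illustrative.

```python
def to_bits(sequence: str) -> list:
    """Return the 8-bit binary form of each character's ASCII code."""
    return [format(byte, "08b") for byte in sequence.encode("ascii")]

dna = "ACGT"  # hypothetical sequence
for base, bits in zip(dna, to_bits(dna)):
    print(base, bits)

# Each of the 8 bits is 0 or 1, so one byte can hold 2**8 distinct values.
combinations_per_byte = 2 ** 8
print(combinations_per_byte)
```

Each base occupies one full byte in this representation, even though four bases could in principle be told apart with only 2 bits.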
24. • Normally DNA and protein sequences are presented as ASCII
(American Standard Code for Information Interchange) text in
FASTA (FAST-All) format.
(1)
>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPT
EAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIRE
AFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
DIDGDGQVNYEEFVQMMTAK*
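A FASTA record is a header line starting with `>` followed by one or more lines of sequence. The following is a minimal reader, written as a sketch rather than a full parser; the function name `parse_fasta` and the short record are hypothetical.

```python
def parse_fasta(text: str) -> dict:
    """Map each header (without '>') to its sequence, joined across lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Drop stray spaces and the trailing '*' terminator, if present.
            records[header].append(line.replace(" ", "").rstrip("*"))
    return {name: "".join(parts) for name, parts in records.items()}

# A short made-up record in the same layout as the example above.
example = """>seq1 hypothetical example
ADQLTEEQ
IAEFKEAF*
"""
print(parse_fasta(example))
```

Joining the lines matters because the line breaks in a FASTA file are only for display; the sequence itself is one continuous string.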
25. TELNET
• It allows a user to log on to a remote computer and
access its facilities. It is useful only for occasional
queries.
• Its disadvantages are the extensive management of
user identification it requires and the load it places on the
remote computer's processing power.
26. Address
• To facilitate communication between nodes, each
computer on the internet is given a unique identifying number
(its IP address).
• It is encoded in dotted decimal format, e.g.
182.181.255.15, which represents a particular machine
(PC).
• The domain name system (DNS) is also implemented, which
makes internet addresses easier for users to read,
e.g. ncbi.nlm.nih.gov, meaning
ncbi = National Center for Biotechnology Information
nlm = National Library of Medicine
nih = National Institutes of Health
gov = US government domain
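Both kinds of address can be taken apart in code. The sketch below uses Python's standard `ipaddress` module on the example address from the slide and splits the domain name into its labels; the variable names are illustrative.

```python
import ipaddress

# Dotted decimal is a readable form of a single 32-bit number.
address = ipaddress.IPv4Address("182.181.255.15")
octets = [int(part) for part in str(address).split(".")]
as_integer = int(address)
print(octets)
print(as_integer)

# A domain name is read right to left, from the most general label
# to the most specific one.
labels = "ncbi.nlm.nih.gov".split(".")
for label in reversed(labels):
    print(label)
```

Each of the four decimal numbers is one byte of the address, which is why none of them can exceed 255.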
27. World Wide Web (www)
• The World Wide Web consists of all the public Web
sites connected to the Internet worldwide, including
the client devices (such as computers and cell
phones) that access Web content.
28. • It was developed at CERN (the European Organization for Nuclear Research)
in 1989 to allow information sharing between international groups. It led to
a medium through which text, images, sounds, and
videos could be delivered on demand to users.
• The WWW greatly enhanced the power of cross-references,
with the guarantee of retrieving the latest
information.
• The first molecular biology web server was ExPASy
(Expert Protein Analysis System), developed in 1993 by
Geneva University Hospital and the University of Geneva.
29. Web pages
• The documents that appear in the web browser
window when we surf the WWW are called web pages.
• Each document displayed on the web is a “web page”,
and all the related pages on a particular server are
collectively called a web site.
• A web site is a collection of related web pages
stored on one computer, and each web site has a
unique address. The most important feature of a site is the link,
which allows a jump to another page anywhere in the
current web site.
31. Nodes
• In communication networks, a node is a connection
point, either a redistribution point or a communication
endpoint.
• EMBnet operates 34 nodes, of which 20 are national
nodes. National nodes have a mandate to provide databases,
software, and online services, including sequence
analysis, protein modeling, genetic mapping, etc.
32. • 8 nodes are designed for user support and training and
undertake research and development.
• These are academic, industrial, or research
centers that have knowledge of specific areas of bioinformatics.
• They are responsible for the maintenance of
biological databases and software.
33. • The remaining 6 sites have been accepted within
EMBnet as associate nodes, which are biocomputing
centers from non-European countries
that serve their user communities with the same
kinds of service as a typical national node.
• Most of them offer up-to-date access to sequence
databases and analysis software
for molecular mapping, genome management,
genetic mapping, and so on.
35. SRS (Sequence Retrieval System)
• It is a network browser for databases in molecular
biology, which evolved to help EMBnet users.
• It allows any flat-file database to be indexed to any
other, so users can retrieve, link, and access
entries from all the interconnected resources.
• The system links nucleic acid, protein sequence,
structure, pattern, and bibliographic databases.
36. • SRS is an integrated system for information retrieval from many
different sequence databases and for feeding the sequences
retrieved into analytic tools such as sequence
comparison and alignment programs.
• It can search a total of 141 databases of protein and
nucleotide sequences, metabolic pathways, 3D
structures and functions, genomes, diseases, and
phenotype information.
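The idea of indexing one flat-file database against another can be illustrated with a toy example. This is not how SRS itself is implemented: the file layout, field codes (`ID`, `DE`, `REF`, `TI`), entries, and function names below are all invented for the sketch.

```python
# Two hypothetical flat-file "databases"; every ID and field here is made up.
SEQUENCE_DB = """ID P1; DE calmodulin; REF R1
ID P2; DE hypothetical protein; REF R2"""

LITERATURE_DB = """ID R1; TI A paper about calmodulin
ID R2; TI A paper about another protein"""

def index_flat_file(text: str) -> dict:
    """Build an index from entry ID to that entry's fields."""
    index = {}
    for line in text.splitlines():
        # Each field is "<code> <value>", and fields are separated by ';'.
        fields = dict(item.strip().split(" ", 1) for item in line.split(";"))
        index[fields["ID"]] = fields
    return index

sequences = index_flat_file(SEQUENCE_DB)
literature = index_flat_file(LITERATURE_DB)

def linked_title(sequence_id: str) -> str:
    """Follow the cross-reference from a sequence entry to its citation."""
    entry = sequences[sequence_id]
    return literature[entry["REF"]]["TI"]

print(linked_title("P1"))
```

Once every database has such an index, a cross-reference in one entry becomes a direct lookup in another database, which is what lets a user move from a sequence to its literature in a single step.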
37. NCBI (The National Center for Biotechnology Information)
• Established in 1988 in the USA as a division of the National
Library of Medicine, located in Bethesda, Maryland.
• Its role is to develop new information technologies to
aid our understanding of the molecular and genetic
processes that underlie health and disease.
• Its specific aims include the creation of automated
systems for storing and analyzing biological information.
38. • The development of advanced methods of
computer-based information processing.
• The facilitation of user access to databases and
software, and the coordination of efforts to gather
biotechnology information worldwide.
• It maintains GenBank, the NIH DNA sequence database.
These data are exchanged with the international nucleotide
databases EMBL and DDBJ.
39. Entrez
• Databases of different kinds are merged together to become
global hubs of knowledge.
• Just like SRS for EMBnet, the Entrez facility evolved at
NCBI to allow retrieval of molecular biology data and
bibliographic citations from NCBI's databases.
• It permits related articles in different databases to be
linked to each other.
40. • It provides access to DNA sequences from GenBank, EMBL,
and DDBJ, and to protein sequences from SWISS-PROT, PIR,
PRF, PDB, and translated protein sequences from the DNA sequence
databases.
• It is the front end to all databases maintained by NCBI
and is extremely easy to use. It is linked to a total of
11 databases.
• It can be accessed through the NCBI website at the
following URL:
http://www.ncbi.nlm.nih.gov/entrez/
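The slides describe only the web front end. For access from a program, NCBI also offers the E-utilities service, which is not mentioned in the source; the sketch below assumes its `esearch.fcgi` endpoint with the `db`, `term`, and `retmax` parameters and only builds the query URL, without contacting the server. The function name `build_esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities search endpoint (not part of the original slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(database: str, term: str, max_results: int = 5) -> str:
    """Build a search URL for one Entrez database without sending it."""
    query = urlencode({"db": database, "term": term, "retmax": max_results})
    return f"{EUTILS_BASE}?{query}"

url = build_esearch_url("protein", "calmodulin")
print(url)
```

Changing the `db` value selects a different Entrez database, which mirrors how the web interface lets a user switch between sequence, structure, and literature searches.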
41. Databases covered by Entrez are listed below
Category: Databases
1. Nucleic acid sequences: Entrez Nucleotides, sequences obtained from GenBank, RefSeq, and PDB
2. Protein sequences: Entrez Protein, sequences obtained from SWISS-PROT, PIR, PRF, PDB, and translations from coding regions in GenBank and RefSeq
3. 3D structure: Entrez Molecular Modeling Database (MMDB)
4. Genomes: Complete genome assemblies from many sources
5. PopSet: From GenBank, sets of DNA sequences that have been collected to analyze the evolutionary relatedness of a population
6. OMIM: Online Mendelian Inheritance in Man
7. Taxonomy: NCBI taxonomy database
8. Books: Bookshelf
9. Probeset: Gene Expression Omnibus (GEO)
10. 3D domain: Domains from the Entrez Molecular Modeling Database
11. Literature: PubMed
42. Retrieval & Application
• The two main reasons for putting data on a
computer are retrieval and discovery.
• Retrieval is the ability to get back out what was put in.
Discovery is more valuable: getting back from the system
more knowledge than was put in.
• This helps in making biological discoveries.
• NCBI uses 4 core data elements: bibliographic
citations, DNA sequences, protein sequences, and 3D structures.
43. Bioseq
• A Bioseq, or biological sequence, is a central element in the
NCBI data model. It contains a single, continuous
molecule of nucleic acid or protein.
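As a loose illustration of the Bioseq idea, the sketch below models a single continuous molecule as a small Python class. It is not NCBI's actual data model definition; the class layout, field names, and validation rules are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bioseq:
    """A toy stand-in for one continuous nucleic acid or protein molecule."""
    identifier: str
    molecule: str   # "nucleic acid" or "protein"
    residues: str   # one unbroken run of residue letters

    def __post_init__(self):
        if self.molecule not in ("nucleic acid", "protein"):
            raise ValueError("molecule must be 'nucleic acid' or 'protein'")
        if not self.residues.isalpha():
            raise ValueError("a Bioseq must be one continuous run of residues")

    def __len__(self):
        return len(self.residues)

example = Bioseq("demo1", "protein", "ADQLTEEQ")  # hypothetical entry
print(len(example))
```

Rejecting gaps and spaces in `residues` is one simple way to enforce the "single, continuous molecule" rule from the slide.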
44. Mirrors & Intranet
Different servers providing the same services are
called mirrors. To access a particular website, type
its URL in the address bar of the
browser.
45. Intranet
• Many academic institutions have an intranet, which
is a local network that can be accessed only
from computers within the institution.
46. • What makes the web so powerful is its network.
• Here are some basic sites for beginners in bioinformatics:
1. http://www.ncbi.nlm.nih.gov/
2. http://www.ebi.ac.uk/
3. http://www.expasy.ch/
4. http://www.embl-heidelberg.de/
5. http://www.gmd.de/welcome.en.html
6. http://links.bmn.com/
47. • Apart from these sites , there are a great number of
specialist sites with biological data which can be
accessed. e.g
• General purpose search engines such as
48. THANK YOU FOR YOUR
ATTENTION
Questions are Welcomed . . .