The document summarizes a biologist's experience of adopting an e-science approach. Before e-science, the biologist took an uncoordinated "spaghetti" approach, using various tools without a unified strategy. Adopting e-science principles such as collaboration, reusable workflows, and web services then enhanced their work by letting experts from different domains combine their expertise. The biologist also reflects on outreach efforts to promote e-science to other researchers.
6. Example: bioinformatics before e-science. Human Transcriptome Map (HTM) (Versteeg et al., Genome Research, 2003)
[Figure: SAGE tag count per transcriptional unit (TU) and SAGE library, shown by TU identifier and position]
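The map shows SAGE tag counts for transcriptional units ordered by chromosomal position. The ridge-detection code on the next slide reads a column named movmed39expr, which suggests that each TU's expression was smoothed with a moving median over 39 neighbouring genes. The sketch below assumes exactly that: it sorts TUs by start position and computes a centred moving median, truncating the window near chromosome ends. The struct fields, the function names and the edge handling are hypothetical choices for illustration, not the author's implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* One transcriptional unit (TU); field names are hypothetical. */
struct tu {
    const char *id;   /* TU identifier */
    long genstart;    /* start position on the chromosome */
    double tags;      /* SAGE tag count (expression) */
};

/* qsort comparator: ascending chromosomal start position */
static int cmp_genstart(const void *a, const void *b)
{
    const struct tu *x = a, *y = b;
    return (x->genstart > y->genstart) - (x->genstart < y->genstart);
}

/* qsort comparator: ascending doubles */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts tus[] by position, then writes a centred moving median of the tag
 * counts into out[] (one value per TU). Near the chromosome ends the window
 * is truncated (an assumption). Returns 0 on success, -1 if malloc fails. */
int moving_median(struct tu *tus, size_t n, size_t window, double *out)
{
    size_t half = window / 2;
    double *buf;

    assert(window % 2 == 1);          /* a centred window needs an odd width */
    qsort(tus, n, sizeof *tus, cmp_genstart);

    buf = malloc(window * sizeof *buf);
    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        size_t lo = i > half ? i - half : 0;
        size_t hi = i + half < n ? i + half : n - 1;
        size_t k = hi - lo + 1;       /* number of TUs in this window */

        for (size_t j = 0; j < k; j++)
            buf[j] = tus[lo + j].tags;
        qsort(buf, k, sizeof *buf, cmp_double);
        /* odd count: middle value; even count (only at the ends): mean of the two middle values */
        out[i] = k % 2 ? buf[k / 2] : (buf[k / 2 - 1] + buf[k / 2]) / 2.0;
    }
    free(buf);
    return 0;
}
```

Calling moving_median(tus, n, 39, out) would give one smoothed value per TU, comparable to movmed39expr under the assumption above. Re-sorting each small window keeps the sketch short; for long chromosomes a two-heap or order-statistics structure would avoid the repeated sorting.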
7. Before e-science: HTM construction and RIDGE detection

```c
/*
 * determines ridges in htm expression table
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>
#include "ridge.h"   /* assumed to define TRUE and FALSE */

int printresults(PGresult *tuples);
int validquery(PGresult *result, const char *querystring);

/* table and chromosome names are assumed to come from a trusted source */
int selecthtm(PGconn *conn, const char *htmtablename, const char *chromname,
              PGresult **htmtable)
{
    char querystring[256];

    /* build the query into the buffer (the destination was missing) */
    snprintf(querystring, sizeof querystring,
             "SELECT * FROM %s WHERE chrom = '%s' ORDER BY genstart",
             htmtablename, chromname);
    *htmtable = PQexec(conn, querystring);   /* hand the result back to the caller */
    return validquery(*htmtable, querystring);
}

/* determines if mincount genes in a row are (part of) a ridge
 * pre:  htmtable is valid and sorted on genstart (ascending)
 * post: TRUE if rows row .. row+mincount-1 all have
 *       movmed39expr >= exprthreshold, otherwise FALSE */
int is_ridge(PGresult *htmtable, int row, double exprthreshold, int mincount)
{
    int col = PQfnumber(htmtable, "movmed39expr");

    if (mincount <= 0)
        return TRUE;
    if (row >= PQntuples(htmtable) || col < 0)
        return FALSE;
    /* PQgetvalue returns text: convert it, and read the current row, not row 0 */
    if (atof(PQgetvalue(htmtable, row, col)) < exprthreshold)
        return FALSE;
    return is_ridge(htmtable, row + 1, exprthreshold, mincount - 1);
}

int main(void)
{
    PGconn *conn;        /* holds database connection */
    PGresult *result;
    const char *querystring = "SELECT * FROM chromosomes";

    conn = PQconnectdb("dbname=htm port=6400 user=mroos password=geheim");
    if (PQstatus(conn) == CONNECTION_BAD) {
        fprintf(stderr, "connection to database failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return EXIT_FAILURE;
    }
    printf("Connection ok\n%s\n", querystring);

    result = PQexec(conn, querystring);
    if (!validquery(result, querystring)) {
        PQclear(result);
        PQfinish(conn);
        return EXIT_FAILURE;
    }
    printresults(result);
    PQclear(result);
    PQfinish(conn);
    return EXIT_SUCCESS;
}

/* prints the first column of at most 10 rows */
int printresults(PGresult *tuples)
{
    int i;

    for (i = 0; i < PQntuples(tuples) && i < 10; i++)
        printf("%d, %s\n", i, PQgetvalue(tuples, i, 0));
    return TRUE;
}

int validquery(PGresult *result, const char *querystring)
{
    if (PQresultStatus(result) != PGRES_TUPLES_OK) {
        fprintf(stderr, "Query %s failed: %s", querystring,
                PQresultErrorMessage(result));
        return FALSE;
    }
    return TRUE;
}
```

IT used: Perl, PostgreSQL, C, MS Excel + VBA, SPSS. No predefined development strategy. No design phase.
[Figure: six boxes labelled "Data"]
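is_ridge above answers one question per call: do the mincount rows starting at a given row all reach the threshold? Finding every ridge that way means calling it from each row, with recursion and a database result set. The sketch below is a hypothetical alternative, not the original code. It takes position-sorted smoothed values in a plain array and, in one pass, reports every maximal run of at least mincount consecutive TUs whose value is at or above the threshold. The names find_ridges and struct ridge are illustrative, and no threshold or mincount values are implied by the slides.

```c
#include <assert.h>
#include <stddef.h>

/* A ridge: indices of the first and last TU of a qualifying run. */
struct ridge {
    size_t first;
    size_t last;
};

/* Scans position-sorted smoothed expression values and records every
 * maximal run of at least mincount consecutive TUs with value >= threshold
 * (the same test is_ridge applies row by row). Returns the number of ridges
 * found; at most maxout of them are written to out[]. */
size_t find_ridges(const double *expr, size_t n, double threshold,
                   size_t mincount, struct ridge *out, size_t maxout)
{
    size_t found = 0;
    size_t i = 0;

    assert(mincount > 0);
    while (i < n) {
        if (expr[i] < threshold) {    /* below threshold: not part of a ridge */
            i++;
            continue;
        }
        size_t start = i;             /* extend the run as far as it goes */
        while (i < n && expr[i] >= threshold)
            i++;
        if (i - start >= mincount) {  /* long enough to count as a ridge */
            if (found < maxout) {
                out[found].first = start;
                out[found].last = i - 1;
            }
            found++;
        }
    }
    return found;
}
```

Because each TU is visited once, the scan is linear in the number of TUs and needs no recursion. Keeping the data access (the SQL query) separate from the detection logic also lets the same function be reused, for example as a step in a workflow.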
14. Biological knowledge extraction (BioAID, 10/06/09)
[Figure: biological question/model, computational experiment, extracted knowledge]
"I want to do it my way": Carole Goble's me-scientist. >17 million citations, +400,000/yr.
46. Why should I adopt e-Science? "I do not believe in e-Science. I only believe in Me-Science."
47. Why adopt e-science? For determined sinners: ‘The seven deadly sins of bioinformatics’ by Carole Goble, http://www.slideshare.net/dullhunk/the-seven-deadly-sins-of-bioinformatics/