Cloudy with a Touch of Cheminformatics

Rajarshi Guha, Tyler Peryea, Dac-Trung Nguyen
NIH Center for Advancing Translational Science

Chemaxon UGM
September 26th, 2012
Wellesley, MA
  
Parallel computing in the cloud
•  Modern cloud vendors make provisioning compute resources easy
    –  Allows one to handle unpredictable loads easily
    –  Pay only for what you need
•  Chemistry applications don't usually have very dynamic loads
•  But large scale resources are an opportunity for large scale (parallel) computations
  
All HPC is not equal

Legacy HPC
•  Use cloud resources in the same way as a local cluster
•  MIT StarCluster makes this easy to do

Cloudy HPC
•  Make use of cloud capabilities
•  Old algorithms, new infrastructure
•  Spot instances, SNS, SQS, SimpleDB, S3, etc.

Big Data HPC
•  Huge datasets
•  Candidates for map-reduce
•  Involves algorithm (re)design

Source: http://www.slideshare.net/chrisdag/mapping-life-science-informatics-to-the-cloud
  
Big data & cheminformatics
•  Computation over large chemical databases
    –  PubChem, ChEMBL, GDB-13, …
•  What types of computations?
    –  Searches (substructure, pharmacophore, …)
    –  QSAR models & predictions over large data
•  Fundamentally, "big chemical data" lets us explore larger chemical spaces
  
Map-Reduce
[Diagram: input splits 0, 1 and 2 each feed a Map task; map outputs are sorted, copied to the reducers and merged; each Reduce task writes one output part (Part 0, Part 1)]

Map:     (K1, V1) → list(K2, V2)
Reduce:  (K2, list(V2)) → list(K3, V3)

Source: Tom White, Hadoop: The Definitive Guide, 3rd Ed., O'Reilly
  
Counting atoms
•  The chemical version of the word counting task

Input to MAP (K1 = arbitrary line number, V1 = SMILES):
    1, Nc1ccc2ncccc2c1N
    2, Cl.CC1CCc2nc3ccccc3c(C)c2C1
    ...
    152366, Nc1ccc2ncccc2c1N

Output of MAP (K2 = atom symbol, V2 = occurrence):
    N 1
    N 1
    N 1
    N 1
    ...

Input to Reduce (K2 = atom symbol, list(V2)):
    N, list(1,1,1,1,...)
    C, list(1,1,1,1,...)

Output of Reduce (K3 = atom symbol, V3 = count):
    N, 100
    C, 5684
    ...

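The atom counting flow can be sketched in plain Java, with no Hadoop dependency, to make the roles of the map and reduce steps concrete. This is only an illustration under stated assumptions: the class name AtomCountSketch and its methods are hypothetical, and the SMILES scanner is deliberately crude (it reads the organic subset and the leading symbol of bracket atoms, and ignores implicit hydrogens). A real job would use a cheminformatics toolkit to parse each molecule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, toolkit-free sketch of the atom counting task.
public class AtomCountSketch {

    /** Map step: emit one atom symbol per explicitly written atom in a SMILES string. */
    static List<String> map(String smiles) {
        List<String> symbols = new ArrayList<>();
        int i = 0;
        while (i < smiles.length()) {
            char c = smiles.charAt(i);
            char next = i + 1 < smiles.length() ? smiles.charAt(i + 1) : '\0';
            if (c == '[') {
                // Bracket atom such as [nH] or [Na+]: skip isotope digits, read the symbol.
                int close = smiles.indexOf(']', i);
                if (close < 0) break; // malformed input: stop scanning this line
                int j = i + 1;
                while (j < close && Character.isDigit(smiles.charAt(j))) j++;
                if (j < close && Character.isLetter(smiles.charAt(j))) {
                    int end = j + 1;
                    if (Character.isUpperCase(smiles.charAt(j)) && end < close
                            && Character.isLowerCase(smiles.charAt(end))) end++;
                    String s = smiles.substring(j, end);
                    symbols.add(s.substring(0, 1).toUpperCase() + s.substring(1));
                }
                i = close + 1;
            } else if ((c == 'C' && next == 'l') || (c == 'B' && next == 'r')) {
                symbols.add("" + c + next); // two-letter organic subset atoms
                i += 2;
            } else if ("BCNOPSFI".indexOf(c) >= 0 || "bcnops".indexOf(c) >= 0) {
                symbols.add(String.valueOf(Character.toUpperCase(c))); // aromatic n counts as N
                i++;
            } else {
                i++; // bonds, ring closure digits, branches, dots
            }
        }
        return symbols;
    }

    /** Shuffle and reduce: group the emitted (symbol, 1) pairs by symbol, then sum each group. */
    static Map<String, Integer> count(List<String> smilesLines) {
        // Shuffle: K2 -> list(V2)
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String smiles : smilesLines) {
            for (String symbol : map(smiles)) {
                grouped.computeIfAbsent(symbol, k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce: K3 -> V3
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int one : e.getValue()) sum += one;
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Nc1ccc2ncccc2c1N", "Cl.CC1CCc2nc3ccccc3c(C)c2C1");
        System.out.println(count(lines));
    }
}
```

On Hadoop the grouping step is done by the framework between the map and reduce phases; it is written out here only so the sketch runs on its own.
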
The Hadoop ecosystem
[Diagram: core stack of Map Reduce Engine, Hadoop Distributed Filesystem and Hadoop Common, surrounded by related projects: Chukwa, Zookeeper, Flume, Pig, HBase, Mahout, Avro, Whirr, Hama, Hive]

Based on http://www.slideshare.net/informaticacorp/101111-part-3-matt-aslett-the-hadoop-ecosystem
  
Cheminformatics on Hadoop
•  Hadoop and Atom Counting
•  Hadoop and SD Files
•  Cheminformatics, Hadoop and EC2
•  Pig and Cheminformatics

But are cheminformatics problems really big enough to justify all of this?
  
Simplifying Hadoop applications
•  Raw Hadoop programs can be tedious to write

SMARTS based substructure search:

    package gov.nih.ncgc.hadoop;

    import chemaxon.formats.MolFormatException;
    import chemaxon.formats.MolImporter;
    import chemaxon.license.LicenseManager;
    import chemaxon.license.LicenseProcessingException;
    import chemaxon.sss.search.MolSearch;
    import chemaxon.sss.search.SearchException;
    import chemaxon.struc.Molecule;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.Iterator;

    /**
     * SMARTS searching over a set of files using Hadoop.
     *
     * @author Rajarshi Guha
     */
    public class SmartsSearch extends Configured implements Tool {
        private final static IntWritable one = new IntWritable(1);
        private final static IntWritable zero = new IntWritable(0);

        /** Emits (molecule name, 1) when the SMARTS pattern matches the molecule, else (molecule name, 0). */
        public static class MoleculeMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private String pattern = null;
            private MolSearch search;
            private final Text matches = new Text();

            public void configure(JobConf job) {
                // Read the license file shipped to this node via the distributed cache.
                try {
                    Path[] licFiles = DistributedCache.getLocalCacheFiles(job);
                    StringBuilder license = new StringBuilder();
                    try (BufferedReader reader = new BufferedReader(new FileReader(licFiles[0].toString()))) {
                        String line;
                        while ((line = reader.readLine()) != null) license.append(line);
                    }
                    LicenseManager.setLicense(license.toString());
                } catch (IOException e) {
                    throw new RuntimeException("Could not read license file", e);
                } catch (LicenseProcessingException e) {
                    throw new RuntimeException("Could not process license", e);
                }

                // Parse the query pattern once per task rather than once per record.
                pattern = job.getStrings("pattern")[0];
                search = new MolSearch();
                try {
                    Molecule queryMol = MolImporter.importMol(pattern, "smarts");
                    search.setQuery(queryMol);
                } catch (MolFormatException e) {
                    throw new RuntimeException("Invalid SMARTS pattern: " + pattern, e);
                }
            }

            public void map(LongWritable key, Text value,
                            OutputCollector<Text, IntWritable> output,
                            Reporter reporter) throws IOException {
                Molecule mol = MolImporter.importMol(value.toString());
                matches.set(mol.getName());
                search.setTarget(mol);
                try {
                    if (search.isMatching()) {
                        output.collect(matches, one);
                    } else {
                        output.collect(matches, zero);
                    }
                } catch (SearchException e) {
                    // Surface search failures instead of silently dropping the record.
                    throw new IOException(e);
                }
            }
        }

        /** Keeps only the matching molecules: passes through every (name, 1) and drops (name, 0). */
        public static class SmartsMatchReducer extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterator<IntWritable> values,
                               OutputCollector<Text, IntWritable> output,
                               Reporter reporter) throws IOException {
                while (values.hasNext()) {
                    if (values.next().compareTo(one) == 0) {
                        result.set(1);
                        output.collect(key, result);
                    }
                }
            }
        }

        public int run(String[] args) throws Exception {
            // Validate arguments before any of them are used.
            if (args.length != 4) {
                System.err.println("Usage: ss <in> <out> <pattern> <license file>");
                return 2;
            }

            // The job class was HeavyAtomCount.class on the slide; this class is used here
            // so that Hadoop locates the jar from the job being run.
            JobConf jobConf = new JobConf(getConf(), SmartsSearch.class);
            jobConf.setJobName("smartsSearch");

            jobConf.setOutputKeyClass(Text.class);
            jobConf.setOutputValueClass(IntWritable.class);

            jobConf.setMapperClass(MoleculeMapper.class);
            jobConf.setCombinerClass(SmartsMatchReducer.class);
            jobConf.setReducerClass(SmartsMatchReducer.class);

            jobConf.setInputFormat(TextInputFormat.class);
            jobConf.setOutputFormat(TextOutputFormat.class);

            jobConf.setNumMapTasks(5);

            FileInputFormat.setInputPaths(jobConf, new Path(args[0]));
            FileOutputFormat.setOutputPath(jobConf, new Path(args[1]));
            jobConf.setStrings("pattern", args[2]);

            // make the license file available via the distributed cache
            DistributedCache.addCacheFile(new Path(args[3]).toUri(), jobConf);

            JobClient.runJob(jobConf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int res = ToolRunner.run(new Configuration(), new SmartsSearch(), args);
            System.exit(res);
        }
    }
  
Pig & Pig Latin
•  Pig Latin programs are much simpler to write and get translated to Hadoop code
•  SQL-like, requires a UDF to be implemented to perform non-standard tasks

SMARTS search in Pig Latin:

    A = load 'medium.smi' as (smiles:chararray);
    B = filter A by gov.nih.ncgc.hadoop.pig.SMATCH(smiles, 'NC(=O)C(=O)N');
    store B into 'output.txt';

UDF for SMARTS search:

    package gov.nih.ncgc.hadoop.pig;

    import chemaxon.formats.MolImporter;
    import chemaxon.sss.search.MolSearch;
    import chemaxon.sss.search.SearchException;
    import chemaxon.struc.Molecule;
    import org.apache.pig.FilterFunc;
    import org.apache.pig.data.Tuple;

    import java.io.IOException;

    public class SMATCH extends FilterFunc {
        // Created up front; the slide left this field null, which would fail on first use.
        private final MolSearch search = new MolSearch();

        public Boolean exec(Tuple tuple) throws IOException {
            if (tuple == null || tuple.size() < 2) return false;
            String target = (String) tuple.get(0);  // SMILES of the molecule being tested
            String query = (String) tuple.get(1);   // SMARTS pattern
            try {
                Molecule queryMol = MolImporter.importMol(query, "smarts");
                search.setQuery(queryMol);
                search.setTarget(MolImporter.importMol(target, "smiles"));
                return search.isMatching();
            } catch (SearchException e) {
                e.printStackTrace();
            }
            return false;
        }
    }

Going beyond chunking?
•  All the preceding use cases are embarrassingly parallel
    –  Chunking the input data and applying the same operation to each chunk
    –  Very nice when you have a big cluster

Are there algorithms in cheminformatics that can employ map-reduce at the algorithmic level?
  
Going beyond chunking?
•  Applications that make use of pairwise (or higher order) calculations could benefit from a map-reduce incarnation
    –  Doesn't necessarily avoid the O(N^2) barrier
    –  Bioisostere identification is one case that could be rephrased as a map-reduce problem
•  Map-Reduce Design Patterns
  
Identifying MMPs
•  First step in identifying bioisosteres is to identify candidate matched molecular pairs
    –  Naïve all pairs comparison
    –  Predefined list of transformations
           •  Birch et al, BMCL, 2009
    –  Fragment intersection
           •  Hussain et al, JCIM, 2010
    –  MCS based approaches (e.g., WizePairZ)
           •  Warner et al, JCIM, 2010
  
Naïve Bioisostere evaluation
N molecules → N(N-1)/2 comparisons
[Figure: a set of molecules, each compared against every other]

Scaffold seeding
[Figure: a seed fragment (scaffold) and the member molecules that contain it]
  
Scaffold seeded bioisosteres
[Figure: two scaffold series; within each series of M members, M(M-1)/2 comparisons]
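
The two comparison counts can be put side by side with a few lines of Java. This is a minimal sketch, not code from the talk: the class and method names are hypothetical, and the scaffold sizes in main are made-up illustrative values. It also shows why one heavily populated scaffold can dominate the seeded total.

```java
// Hypothetical helper comparing all-pairs and scaffold-seeded comparison counts.
public class PairCountSketch {

    /** Naive approach: every molecule against every other, n(n-1)/2 comparisons. */
    static long allPairs(long n) {
        return n * (n - 1) / 2;
    }

    /** Seeded approach: pairs are formed only within each scaffold series, sum of M(M-1)/2. */
    static long seededPairs(long[] membersPerScaffold) {
        long total = 0;
        for (long m : membersPerScaffold) {
            total += allPairs(m);
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative series sizes only (not taken from the data set in the talk).
        long[] sizes = {7, 7, 21, 100};
        long molecules = 0;
        for (long m : sizes) molecules += m;
        System.out.println("all pairs (if no molecule is shared): " + allPairs(molecules));
        System.out.println("seeded pairs: " + seededPairs(sizes));
    }
}
```

Because a molecule can belong to several scaffolds, the seeded total counts some pairs more than once; the sketch ignores that overlap.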
  
Seeded bioisosteres – MR style

MAP
•  Do pairwise MCS analysis on scaffold series
•  For each pair output SMIRKS transform and the pair of SMILES

REDUCE
•  Collect pairs of SMILES for a given SMIRKS
•  Store in DB, or
•  Filter by activity, or
•  …
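
The MAP and REDUCE roles described above can be sketched in plain Java, without Hadoop. Everything named here is an assumption made for illustration: SeededPairsSketch, Emit and toyTransform are hypothetical, and toyTransform is a string-prefix stand-in for the MCS-based step, not real chemistry. It simply treats the shared leading characters of two SMILES as the "scaffold" and labels the remainder as a SMIRKS-like key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// Hypothetical sketch of the seeded map-reduce flow.
public class SeededPairsSketch {

    /** One record emitted by the map step: transform key plus the pair that produced it. */
    static final class Emit {
        final String smirks;
        final String left;
        final String right;

        Emit(String smirks, String left, String right) {
            this.smirks = smirks;
            this.left = left;
            this.right = right;
        }
    }

    /** Map: compare all pairs within one scaffold series and emit (transform, pair). */
    static List<Emit> map(List<String> series, BinaryOperator<String> transformOf) {
        List<Emit> out = new ArrayList<>();
        for (int i = 0; i < series.size(); i++) {
            for (int j = i + 1; j < series.size(); j++) {
                String a = series.get(i);
                String b = series.get(j);
                String transform = transformOf.apply(a, b);
                if (transform != null) out.add(new Emit(transform, a, b));
            }
        }
        return out;
    }

    /** Reduce: collect every pair of SMILES that shares the same transform key. */
    static Map<String, List<String[]>> reduce(List<Emit> emitted) {
        Map<String, List<String[]>> bySmirks = new TreeMap<>();
        for (Emit e : emitted) {
            bySmirks.computeIfAbsent(e.smirks, k -> new ArrayList<>())
                    .add(new String[]{e.left, e.right});
        }
        return bySmirks;
    }

    /** Toy stand-in for the MCS step: strips the common string prefix (NOT a real MCS). */
    static String toyTransform(String a, String b) {
        int p = 0;
        while (p < a.length() && p < b.length() && a.charAt(p) == b.charAt(p)) p++;
        return "[*]" + a.substring(p) + ">>[*]" + b.substring(p);
    }

    public static void main(String[] args) {
        List<Emit> emitted = new ArrayList<>();
        emitted.addAll(map(Arrays.asList("c1ccccc1F", "c1ccccc1Cl"), SeededPairsSketch::toyTransform));
        emitted.addAll(map(Arrays.asList("CCF", "CCCl"), SeededPairsSketch::toyTransform));
        for (Map.Entry<String, List<String[]>> e : reduce(emitted).entrySet()) {
            System.out.println(e.getKey() + " : " + e.getValue().size() + " pair(s)");
        }
    }
}
```

The direction of each transform depends on the order of the two molecules in the series, so a real implementation would need to canonicalise the key before grouping.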
  
Does seeding help?
•  Doesn't bypass the O(N^2) barrier, but does reduce the constant
•  Depends on how many scaffolds there are and the number of members for each scaffold
•  Certainly useful when there are a few members per scaffold
•  Highly populated scaffolds can throw things off

[Plot: log number of pairwise comparisons (1e+05 to 1e+14) versus log number of molecules (1e+03 to 1e+07), for methods all, seeded.7, seeded.21 and seeded.100]

Data
•  Exhaustively fragmented ChEMBL 13
•  Identified scaffolds with N_members / N_scaffold [relation symbol garbled in source] 1.8
•  Ended up with 231,875 scaffolds
    –  Covers 235,693 unique molecules
    –  Average of 7 members per scaffold
    –  95% of scaffolds had < 21 members
    –  99.5% had < 74 members
        •  The 0.05% are a bit problematic

[Plot: log comparisons (1e+02 to 1e+08) by method, All versus Seeded]

Timing experiments
•  Selected 50 scaffolds with 10 or fewer members
•  Configured so as to have ~ 5 maps
•  Effective running time for the entire job is 3.8 min on Hadoop
    –  Only needed 5 of 8 map slots on our "cluster"
•  Takes ~ 6 min without Hadoop

[Plot: time (s, 0 to 200) for each of job numbers 1 to 5]

Timing experiments
•  Selected 1000 scaffolds with 20 or fewer members
    –  Ran with 10 scaffolds / map
•  Hadoop run time was ~ 2 hr
    –  Most maps were fast (< 20 sec)
•  Serial evaluation would be > 7 hr

[Plot: histogram of number of jobs (0 to 15) versus log time (s), 1.0 to 4.0]
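
As a rough worked calculation, the speed-up implied by the two timing experiments is the serial time divided by the Hadoop time. The inputs are the approximate figures quoted on the slides, so the results are only indicative; the symbols S_50 and S_1000 are introduced here for the 50-scaffold and 1000-scaffold runs.

```latex
S_{50} \approx \frac{6\ \text{min}}{3.8\ \text{min}} \approx 1.6,
\qquad
S_{1000} \gtrsim \frac{7\ \text{hr}}{2\ \text{hr}} = 3.5
```

Both values sit well below the 8 map slots available on the cluster.
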
A M-R workflow
•  We're currently focused on just the MMP step as an MR example
•  Could also include the fragmentation step as part of the workflow
    –  But a pre-calculated set of scaffolds is more sensible
•  Store transformations and members in HBase
•  Link with activity data and apply structure & activity filters on candidate pairs
  
What Hadoop is not for
•  Doesn't replace an actual database
•  It's not uniformly fast or efficient
•  Not good for ad hoc or real-time analysis
•  Generally not effective unless dealing with massive datasets
•  Not all algorithms are amenable to the map-reduce method
  
Conclusions
•  Cheminformatics applications can be rehosted or rewritten to take advantage of cloud resources
    –  Remotely hosted
    –  Embarrassingly parallel / chunked
    –  Map/reduce
•  Ability to process larger structure collections lets us explore more chemical space
•  "Big data" isn't really that big in chemistry
  
Conclusions
•  Q: But are cheminformatics problems really big
   enough to justify all of this?
•  A: Yes – virtual libraries, integrating chemical
   structure with other types and scales of data

•  Q: Are there algorithms in cheminformatics that
   can employ map-reduce at the algorithmic level?
•  A: Yes – especially when we consider problems
   with a combinatorial flavor
https://github.com/rajarshi/chem.hadoop
  

Cloudy with a Touch of Cheminformatics

  • 1. Cloudy  with  a  Touch  of   Cheminforma4cs   Rajarshi  Guha,  Tyler  Peryea,  Dac-­‐Trung  Nguyen   NIH  Center  for  Advancing  Transla@onal  Science     Chemaxon  UGM   September  26th,  2012   Wellesley,  MA  
  • 2. Parallel  compu4ng  in  the  cloud   •  Modern  cloud  vendors  make  provisioning   compute  resources  easy   –  Allows  one  to  handle  unpredictable  loads  easily   –  Pay  only  for  what  you  need   •  Chemistry  applica<ons  don’t  usually  have  very   dynamic  loads   •  But  large  scale  resources  are  an  opportunity  for   large  scale  (parallel)  computa<ons  
  • 3. All  HPC  is  not  equal   •  Use  cloud  resources  in   •  Make  use  of  cloud   •  Huge  datasets   the  same  way  as  a  local   capabili<es   •  Candidates  for  map-­‐ cluster   •  Old  algorithms,  new   reduce   •  MIT  StarCluster  makes   infrastructure   •  Involves  algorithm     this  easy  to  do   •  Spot  instances,  SNS,   (re)design   SQS  SimpleDB,  S3,  etc   Legacy   Cloudy   Big  Data   HPC   HPC   HPC   hOp://www.slideshare.net/chrisdag/mapping-­‐life-­‐science-­‐informa<cs-­‐to-­‐the-­‐cloud  
  • 4. Big  data  &  cheminforma4cs   •  Computa<on  over  large  chemical  databases   –  Pubchem,  ChEMBL,  GDB-­‐13,  …   •  What  types  of  computa<ons?   –  Searches  (substructure,  pharmacophore,  ….)   –  QSAR  models  &  predic<ons  over  large  data   •  Fundamentally,  “big  chemical  data”  lets  us   explore  larger  chemical  spaces  
  • 5. Map-­‐Reduce   copy sort Split 0 Map merge Reduce Part 0 Split 1 Map merge Reduce Part 1 Split 2 Map K1,V1 ! list ( K 2 ,V2 ) K 2 , list (V2 ) ! list ( K 3,V3 ) Tom  White,  Hadoop,  The  Defini/ve  Guide.  3rd  Ed.  O’Reilly    
  • 6. Coun4ng  atoms   •  The  chemical  version  of  the  word  coun<ng  task   Arbitrary line Atom list (V2) SMILES (V1) Atom numbers (K1) Occurence (V2) Symbol (K2) Symbol (K2) 1, Nc1ccc2ncccc2c1N N, list(1,1,1,1,...) 2, Cl.CC1CCc2nc3ccccc3c(C)c2C1 N1 C, list(1,1,1,1,...) . N1 . N1 . N1 152366, Nc1ccc2ncccc2c1N MAP   . Reduce   . Atom Count (V3) Symbol (K3) N,100 C,5684 . . .
  • 7. The  Hadoop  ecosystem   Chukwa Zookeeper Flume Pig HBase Mahout Avro Whirr Map Reduce Engine Hama Hadoop Distributed Hive Filesystem Hadoop Common Based  on  hOp://www.slideshare.net/informa<cacorp/101111-­‐part-­‐3-­‐maO-­‐asleO-­‐the-­‐hadoop-­‐ecosystem  
  • 8. Cheminforma4cs  on  Hadoop   •  Hadoop  and  Atom  Coun<ng   •  Hadoop  and  SD  Files   •  Cheminforma<cs,  Hadoop  and  EC2   •  Pig  and  Cheminforma<cs     But  are  cheminforma@cs  problems     really  big  enough  to  jus@fy  all  of  this?  
  • 9. Simplifying Hadoop applications
      • Raw Hadoop programs can be tedious to write
      SMARTS based substructure search:

          package gov.nih.ncgc.hadoop;

          import chemaxon.formats.MolFormatException;
          import chemaxon.formats.MolImporter;
          import chemaxon.license.LicenseManager;
          import chemaxon.license.LicenseProcessingException;
          import chemaxon.sss.search.MolSearch;
          import chemaxon.sss.search.SearchException;
          import chemaxon.struc.Molecule;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.conf.Configured;
          import org.apache.hadoop.filecache.DistributedCache;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.io.IntWritable;
          import org.apache.hadoop.io.LongWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapred.FileInputFormat;
          import org.apache.hadoop.mapred.FileOutputFormat;
          import org.apache.hadoop.mapred.JobClient;
          import org.apache.hadoop.mapred.JobConf;
          import org.apache.hadoop.mapred.MapReduceBase;
          import org.apache.hadoop.mapred.Mapper;
          import org.apache.hadoop.mapred.OutputCollector;
          import org.apache.hadoop.mapred.Reducer;
          import org.apache.hadoop.mapred.Reporter;
          import org.apache.hadoop.mapred.TextInputFormat;
          import org.apache.hadoop.mapred.TextOutputFormat;
          import org.apache.hadoop.util.Tool;
          import org.apache.hadoop.util.ToolRunner;

          import java.io.BufferedReader;
          import java.io.FileReader;
          import java.io.IOException;
          import java.util.Iterator;

          /**
           * SMARTS searching over a set of files using Hadoop.
           *
           * @author Rajarshi Guha
           */
          public class SmartsSearch extends Configured implements Tool {
              private final static IntWritable one = new IntWritable(1);
              private final static IntWritable zero = new IntWritable(0);

              public static class MoleculeMapper extends MapReduceBase
                      implements Mapper<LongWritable, Text, Text, IntWritable> {
                  private String pattern = null;
                  private MolSearch search;

                  // Called once per map task: load the license and parse the query.
                  public void configure(JobConf job) {
                      try {
                          Path[] licFiles = DistributedCache.getLocalCacheFiles(job);
                          BufferedReader reader = new BufferedReader(new FileReader(licFiles[0].toString()));
                          StringBuilder license = new StringBuilder();
                          String line;
                          while ((line = reader.readLine()) != null) license.append(line);
                          reader.close();
                          LicenseManager.setLicense(license.toString());
                      } catch (IOException e) {
                          // license file could not be read
                      } catch (LicenseProcessingException e) {
                          // license could not be processed
                      }
                      pattern = job.getStrings("pattern")[0];
                      search = new MolSearch();
                      try {
                          Molecule queryMol = MolImporter.importMol(pattern, "smarts");
                          search.setQuery(queryMol);
                      } catch (MolFormatException e) {
                          // invalid SMARTS pattern
                      }
                  }

                  final static IntWritable one = new IntWritable(1);
                  Text matches = new Text();

                  // Emits (molecule name, 1) on a match and (molecule name, 0) otherwise.
                  public void map(LongWritable key, Text value,
                                  OutputCollector<Text, IntWritable> output,
                                  Reporter reporter) throws IOException {
                      Molecule mol = MolImporter.importMol(value.toString());
                      matches.set(mol.getName());
                      search.setTarget(mol);
                      try {
                          if (search.isMatching()) {
                              output.collect(matches, one);
                          } else {
                              output.collect(matches, zero);
                          }
                      } catch (SearchException e) {
                          // search failed for this molecule; emit nothing
                      }
                  }
              }

              public static class SmartsMatchReducer extends MapReduceBase
                      implements Reducer<Text, IntWritable, Text, IntWritable> {
                  private IntWritable result = new IntWritable();

                  // Keeps only the molecules that matched.
                  public void reduce(Text key,
                                     Iterator<IntWritable> values,
                                     OutputCollector<Text, IntWritable> output,
                                     Reporter reporter) throws IOException {
                      while (values.hasNext()) {
                          if (values.next().compareTo(one) == 0) {
                              result.set(1);
                              output.collect(key, result);
                          }
                      }
                  }
              }

              public int run(String[] args) throws Exception {
                  // The slide passes HeavyAtomCount.class here; the job class is SmartsSearch.
                  JobConf jobConf = new JobConf(getConf(), SmartsSearch.class);
                  jobConf.setJobName("smartsSearch");

                  jobConf.setOutputKeyClass(Text.class);
                  jobConf.setOutputValueClass(IntWritable.class);

                  jobConf.setMapperClass(MoleculeMapper.class);
                  jobConf.setCombinerClass(SmartsMatchReducer.class);
                  jobConf.setReducerClass(SmartsMatchReducer.class);

                  jobConf.setInputFormat(TextInputFormat.class);
                  jobConf.setOutputFormat(TextOutputFormat.class);

                  jobConf.setNumMapTasks(5);

                  if (args.length != 4) {
                      System.err.println("Usage: ss <in> <out> <pattern> <license file>");
                      System.exit(2);
                  }

                  FileInputFormat.setInputPaths(jobConf, new Path(args[0]));
                  FileOutputFormat.setOutputPath(jobConf, new Path(args[1]));
                  jobConf.setStrings("pattern", args[2]);

                  // make the license file available via dist cache
                  DistributedCache.addCacheFile(new Path(args[3]).toUri(), jobConf);

                  JobClient.runJob(jobConf);
                  return 0;
              }

              public static void main(String[] args) throws Exception {
                  int res = ToolRunner.run(new Configuration(), new SmartsSearch(), args);
                  System.exit(res);
              }
          }
  • 10. Pig & Pig Latin
      • Pig Latin programs are much simpler to write and get translated to Hadoop code
      • SQL-like, requires UDF to be implemented to perform non-standard tasks
      SMARTS search in Pig Latin:

          A = load 'medium.smi' as (smiles:chararray);
          B = filter A by gov.nih.ncgc.hadoop.pig.SMATCH(smiles, 'NC(=O)C(=O)N');
          store B into 'output.txt';

      UDF for SMARTS search:

          package gov.nih.ncgc.hadoop.pig;

          import chemaxon.formats.MolImporter;
          import chemaxon.sss.search.MolSearch;
          import chemaxon.sss.search.SearchException;
          import chemaxon.struc.Molecule;
          import org.apache.pig.FilterFunc;
          import org.apache.pig.data.Tuple;

          import java.io.IOException;

          public class SMATCH extends FilterFunc {
              // The slide initializes this to null, which would fail on first use.
              static MolSearch search = new MolSearch();

              // tuple = (target SMILES, query SMARTS); true when the query matches the target.
              public Boolean exec(Tuple tuple) throws IOException {
                  if (tuple == null || tuple.size() < 2) return false;
                  String target = (String) tuple.get(0);
                  String query = (String) tuple.get(1);
                  try {
                      Molecule queryMol = MolImporter.importMol(query, "smarts");
                      search.setQuery(queryMol);
                      search.setTarget(MolImporter.importMol(target, "smiles"));
                      return search.isMatching();
                  } catch (SearchException e) {
                      e.printStackTrace();
                  }
                  return false;
              }
          }
  • 11. Going beyond chunking?
      • All the preceding use cases are embarrassingly parallel
          – Chunking the input data and applying the same operation to each chunk
          – Very nice when you have a big cluster
      Are there algorithms in cheminformatics that can employ map-reduce at the algorithmic level?
  • 12. Going beyond chunking?
      • Applications that make use of pairwise (or higher order) calculations could benefit from a map-reduce incarnation
          – Doesn’t necessarily avoid the $O(N^2)$ barrier
          – Bioisostere identification is one case that could be rephrased as a map-reduce problem
      • Map-Reduce Design Patterns
  • 13. Identifying MMPs
      • First step in identifying bioisosteres is to identify candidate matched molecular pairs
          – Naïve all pairs comparison
          – Predefined list of transformations
              • Birch et al, BMCL, 2009
          – Fragment intersection
              • Hussain et al, JCIM, 2010
          – MCS based approaches (e.g., WizePairZ)
              • Warner et al, JCIM, 2010
  • 14. Naïve Bioisostere evaluation
      [Figure: all-pairs comparison of N molecules]
      N molecules require $N(N-1)/2$ comparisons
  • 15. Scaffold seeding
      [Figure: a seed fragment and the member molecules that contain it]
  • 16. Scaffold seeded bioisosteres
      [Figure: two scaffold series of M members, each requiring $M(M-1)/2$ comparisons]
  • 17. Seeded bioisosteres – MR style
      MAP
          • Do pairwise MCS analysis on scaffold series
          • For each pair output SMIRKS transform and the pair of SMILES
      REDUCE
          • Collect pairs of SMILES for a given SMIRKS
          • Store in DB, or
          • Filter by activity, or
          • …
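The shape of this MAP and REDUCE split can be sketched in plain Java. Everything here is an illustrative assumption rather than the authors' implementation: `SeededPairsSketch` is a hypothetical class, and `toyTransform` is a string-prefix stand-in for the MCS analysis and SMIRKS generation, which in practice would need a cheminformatics toolkit. The toy only works when the scaffold is written as an identical SMILES prefix and the differing substituents start with different characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

/** Sketch of the seeded matched-pair workflow as a map step and a reduce step. */
final class SeededPairsSketch {

    /**
     * MAP: for one scaffold series, compare every pair of members and emit
     * {transform, smilesA, smilesB}. transformOf is a placeholder for MCS + SMIRKS.
     */
    static List<String[]> map(List<String> series, BinaryOperator<String> transformOf) {
        List<String[]> emitted = new ArrayList<>();
        for (int i = 0; i < series.size(); i++) {
            for (int j = i + 1; j < series.size(); j++) {
                String a = series.get(i);
                String b = series.get(j);
                emitted.add(new String[] {transformOf.apply(a, b), a, b});
            }
        }
        return emitted;
    }

    /** REDUCE: collect all SMILES pairs that share the same transform key. */
    static Map<String, List<String[]>> reduce(List<String[]> emitted) {
        Map<String, List<String[]>> byTransform = new TreeMap<>();
        for (String[] rec : emitted) {
            byTransform.computeIfAbsent(rec[0], k -> new ArrayList<>())
                       .add(new String[] {rec[1], rec[2]});
        }
        return byTransform;
    }

    /** Toy stand-in (not real MCS): strip the common prefix and report the differing tails. */
    static String toyTransform(String a, String b) {
        int p = 0;
        while (p < a.length() && p < b.length() && a.charAt(p) == b.charAt(p)) {
            p++;
        }
        return a.substring(p) + ">>" + b.substring(p);
    }

    public static void main(String[] args) {
        List<String[]> emitted = new ArrayList<>();
        // Two hypothetical scaffold series; each would be handled by its own map call.
        emitted.addAll(map(List.of("c1ccccc1F", "c1ccccc1Cl", "c1ccccc1Br"), SeededPairsSketch::toyTransform));
        emitted.addAll(map(List.of("C1CCCCC1F", "C1CCCCC1Cl"), SeededPairsSketch::toyTransform));
        reduce(emitted).forEach((transform, pairs) ->
                System.out.println(transform + " seen in " + pairs.size() + " pair(s)"));
    }
}
```

Because each series is mapped independently, the series can be distributed across map tasks, while the reduce step is where pairs from different scaffolds that share a transform come together.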
  • 18. Does seeding help?
      • Doesn’t bypass the $O(N^2)$ barrier, but does reduce the constant
      • Depends on how many scaffolds there are and the number of members for each scaffold
      • Certainly useful when there are few members per scaffold
      • Highly populated scaffolds can throw things off
      [Figure: log number of pairwise comparisons vs. log number of molecules, by method (all, seeded.7, seeded.21, seeded.100)]
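The effect of seeding on the number of comparisons can be checked with a few lines of arithmetic. This is a minimal sketch with a hypothetical class name and made-up scaffold sizes, and it assumes no molecule belongs to more than one scaffold, which is a simplification.

```java
import java.util.Collections;
import java.util.List;

/** Compares the all-pairs comparison count with the scaffold-seeded count. */
final class ComparisonCounts {

    /** Number of unordered pairs among n items: n(n-1)/2. */
    static long pairs(long n) {
        return n * (n - 1) / 2;
    }

    /** Seeded count: sum of M(M-1)/2 over scaffolds, where M is the scaffold's member count. */
    static long seeded(List<Integer> membersPerScaffold) {
        long total = 0;
        for (int m : membersPerScaffold) {
            total += pairs(m);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical collection: 1,000 scaffolds with 7 members each (7,000 molecules).
        List<Integer> even = Collections.nCopies(1000, 7);
        System.out.println("all pairs:      " + pairs(7000));
        System.out.println("seeded, even:   " + seeded(even));

        // Same 7,000 molecules, but all of them in one highly populated scaffold.
        List<Integer> skewed = List.of(7000);
        System.out.println("seeded, skewed: " + seeded(skewed));
    }
}
```

With evenly sized small scaffolds the seeded count grows linearly in the number of scaffolds, whereas a single highly populated scaffold brings back the full quadratic cost, which is the "throw things off" case noted above.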
  • 19. Data
      • Exhaustively fragmented ChEMBL 13
      • Identified scaffolds with $N_{\text{members}}$ [symbol illegible] $1.8\,N_{\text{scaffold}}$
      • Ended up with 231,875 scaffolds
          – Covers 235,693 unique molecules
          – Average of 7 members per scaffold
          – 95% of scaffolds had < 21 members
          – 99.5% had < 74 members
      • The 0.05% are a bit problematic
      [Figure: log comparisons by method (All vs. Seeded)]
  • 20. Timing experiments
      • Selected 50 scaffolds with 10 or fewer members
      • Configured so as to have ~5 maps
      • Effective running time for the entire job is 3.8 min on Hadoop
          – Only needed 5 of 8 map slots on our “cluster”
      • Takes ~6 min without Hadoop
      [Figure: time (s) by job number, jobs 1 to 5]
  • 21. Timing experiments
      • Selected 1000 scaffolds with 20 or fewer members
          – Ran with 10 scaffolds / map
      • Hadoop run time was ~2 hr
          – Most maps were fast (< 20 sec)
      • Serial evaluation would be > 7 hr
      [Figure: histogram of number of jobs vs. log time (s)]
  • 22. A M-R workflow
      • We’re currently focused on just the MMP step as an MR example
      • Could also include the fragmentation step as part of the workflow
          – But a pre-calculated set of scaffolds is more sensible
      • Store transformations and members in HBase
      • Link with activity data and apply structure & activity filters on candidate pairs
  • 23. What Hadoop is not for
      • Doesn’t replace an actual database
      • It’s not uniformly fast or efficient
      • Not good for ad hoc or real-time analysis
      • Generally not effective unless dealing with massive datasets
      • Not all algorithms are amenable to the map-reduce method
  • 24. Conclusions
      • Cheminformatics applications can be rehosted or rewritten to take advantage of cloud resources
          – Remotely hosted
          – Embarrassingly parallel / chunked
          – Map/reduce
      • Ability to process larger structure collections lets us explore more chemical space
      • “Big data” isn’t really that big in chemistry
  • 25. Conclusions
      • Q: But are cheminformatics problems really big enough to justify all of this?
      • A: Yes – virtual libraries, integrating chemical structure with other types and scales of data
      • Q: Are there algorithms in cheminformatics that can employ map-reduce at the algorithmic level?
      • A: Yes – especially when we consider problems with a combinatorial flavor