What is So Special About Science Clouds and Why Does It Matter?

November 17, 2013
Robert L. Grossman
University of Chicago
Open Data Group
Open Cloud Consortium

Part 1
Clouds

In 2011, after several years and 15 drafts, NIST developed a definition of a cloud that is now the standard definition.

Essential Characteristics of a Cloud
1.  Self Service
2.  Scale

Self Service

Scale

Cloud Deployment Models
•  Public Clouds
–  Vendors offering cloud services, such as Amazon.
•  Private Clouds
–  Run internally by a company or organization, such as the University of Chicago.
•  Community Clouds
–  Run by a community of organizations (either formally or informally), such as the Open Cloud Consortium.

How do you measure compute capacity for science clouds?

TB? PB? EB?
100’s? 1,000’s? 10,000’s?

Another way: opencompute.org

Think of science clouds as large if you measure them in MW. For example, Facebook’s Prineville Data Center is 30 MW.

What about automatic provisioning and infrastructure management?

This is not a cloud.

This is a cloud.

Commercial Cloud Service Provider (CSP)
15 MW Data Center
•  Monitoring, network security and forensics
•  Automatic provisioning and infrastructure management
•  Accounting and billing
•  Customer Facing Portal
•  Data center network
•  100,000 servers
•  1 PB DRAM
•  100’s of PB of disk
•  ~1 Tbps egress bandwidth
•  25 operators for 15 MW Commercial Cloud

Requirement of a cloud computing infrastructure

Rack / Container Test: The addition of racks / containers of cores and disks is automated and does not require changing the software stack, but afterwards the capacity of the system has increased.

•  For many organizations, system administrators are just performing a service.
•  It’s considered a good practice to outsource the service to the lowest cost provider.

•  At good cloud service providers, development and operations are integrated (devops).
•  SRE/Devops are considered key personnel.

Latency is Difficult

Essential Characteristics of a Cloud
1.  Self Service
2.  Scale
3.  Infrastructure management and automation
4.  Focus on devops

Part 2
Science Clouds

Some Examples of the Sizes of Datasets Produced by Instruments

Discipline          Duration     Size             # Devices
HEP - LHC           10 years     15 PB/year*      One
Astronomy - LSST    10 years     12 PB/year**     One
Genomics - NGS      2-4 years    0.5 TB/genome    1000’s

N.B. This is just the data produced by the instrument itself. The analysis of this data produces significantly more data.

*At full capacity, the Large Hadron Collider (LHC), the world's largest particle accelerator, is expected to produce more than 15 million Gigabytes of data each year. … This ambitious project connects and combines the IT power of more than 140 computer centres in 33 countries. Source: http://press.web.cern.ch/public/en/Spotlight/SpotlightGrid_081008-en.html

**As it carries out its 10-year survey, LSST will produce over 15 terabytes of raw astronomical data each night (30 terabytes processed), resulting in a database catalog of 22 petabytes and an image archive of 100 petabytes. Source: http://www.lsst.org/News/enews/teragrid-1004.html
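
The rates in the table and footnotes are quoted in different units, so a short Python sketch can put them on a common footing. It assumes decimal units (1 PB = 1,000 TB = 1,000,000 GB) and, as an upper bound that is not stated on the slide, that LSST observes every night of the year. The function names are illustrative only.

```python
TB_PER_PB = 1000       # decimal units: 1 PB = 1,000 TB
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 1,000,000 GB


def lhc_pb_per_year(gigabytes_per_year: float = 15e6) -> float:
    """Convert the quoted '15 million Gigabytes' per year into PB per year."""
    return gigabytes_per_year / GB_PER_PB


def lsst_pb(tb_per_night: float, nights_per_year: int = 365, years: int = 1) -> float:
    """Total LSST volume in PB, assuming (hypothetically) observing every night."""
    return tb_per_night * nights_per_year * years / TB_PER_PB


def genomes_pb(n_genomes: int, tb_per_genome: float = 0.5) -> float:
    """Total sequencing volume in PB at the quoted 0.5 TB per genome."""
    return n_genomes * tb_per_genome / TB_PER_PB


if __name__ == "__main__":
    print("LHC PB/year:", lhc_pb_per_year())
    print("LSST raw PB over 10 years:", lsst_pb(15, years=10))
    print("LSST processed PB/year:", lsst_pb(30))
    print("10,000 genomes in PB:", genomes_pb(10_000))
```

Under these assumptions the processed LSST rate of 30 TB per night comes to roughly 11 PB per year, which is in line with the 12 PB/year entry in the table.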
  
Sci CSP services
Data scientist
Science Cloud Service Provider (Sci CSP)

What are some of the important differences between commercial and research-focused Sci CSPs?

Amazon Web Services (AWS)?
•  Scale
•  Simplicity of a credit card
•  Wide variety of offerings.

vs.

Community clouds, science clouds, etc.
•  Lower cost (at medium & large scale)
•  Some data too important to be stored exclusively in commercial cloud
•  Computing over scientific data is a core competency
•  Can support any required governance / security model

It is essential that community science clouds interoperate with public clouds.

Science Clouds vs. Commercial Clouds

POV
–  Science Clouds: Democratize access to data. Integrate data to make discoveries. Long term archive.
–  Commercial Clouds: As long as you pay the bill; as long as the business model holds.

Data & Storage
–  Science Clouds: In addition, data intensive computing & HP storage
–  Commercial Clouds: Internet style scale out and object-based storage

Flows
–  Science Clouds: Large & small data flows
–  Commercial Clouds: Lots of small web flows

Accounting
–  Science Clouds: Essential
–  Commercial Clouds: Essential

Lock in
–  Science Clouds: Moving environment between CSPs essential
–  Commercial Clouds: Lock in is good

Interop
–  Science Clouds: Critical, but difficult
–  Commercial Clouds: Customers will drive to some degree

Essential Services for a Science CSP
•  Support for data intensive computing
•  Support for big data flows
•  Account management, authentication and authorization services
•  Health and status monitoring
•  Billing and accounting
•  Ability to rapidly provision infrastructure
•  Security services, logging, event reporting
•  Access to large amounts of public data
•  High performance storage
•  Simple data export and import services

Sci CSP services
Data scientist
Datascope – Science Cloud Service Provider (Sci CSP)
Cloud Service Operations Center (CSOC)

Part 3
Open Science Data Cloud

             Individual scientists    Community based science     Very large
             & small projects         via Science as a Service    projects
Number       1000’s                   100’s                       10’s
Data Size    Small                    Medium to Large             Very Large
             Public                   Shared community            Dedicated
             infrastructure           infrastructure              infrastructure

The long tail of data science: a few large data science projects, many smaller data science projects.

Commercial Cloud Service Provider (CSP)
15 MW Data Center
•  Monitoring, network security and forensics
•  Automatic provisioning and infrastructure management
•  Accounting and billing
•  Customer Facing Portal
•  Data center network
•  100,000 servers
•  1 PB DRAM
•  100’s of PB of disk
•  ~1 Tbps egress bandwidth
•  25 operators for 15 MW Commercial Cloud

Open Science Data Cloud
•  Compliance & security (OCM)
•  Infrastructure automation & management (Yates)
•  Accounting & billing (Salesforce.com)
•  Customer Facing Portal (Tukey)
•  Data center network
•  Cores & Disks (OpenStack, GlusterFS & Hadoop)
•  ~10-100 Gbps bandwidth
•  6 engineers to operate 0.5 MW Science Cloud

Science Cloud SW & Services
•  Virtual Machine (VM) containing common applications & pipelines
•  Tukey (OSDC portal & middleware v0.2)
•  Yates (infrastructure automation and management v0.1)
•  UDR / UDT for high performance data transport
•  Interoperate with other clouds (upcoming) and proprietary systems (such as Globus Online.)
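
The two diagrams quote staffing against power: 25 operators for a 15 MW commercial cloud and 6 engineers for a 0.5 MW science cloud. A small Python sketch makes the gap explicit. Treating "operators" and "engineers" as comparable roles is an assumption made here for illustration, and the function name is hypothetical.

```python
def staff_per_mw(staff: int, megawatts: float) -> float:
    """Staff needed per MW of data center capacity."""
    if megawatts <= 0:
        raise ValueError("megawatts must be positive")
    return staff / megawatts


# Figures quoted on the slides.
commercial = staff_per_mw(25, 15)   # 25 operators for 15 MW
science = staff_per_mw(6, 0.5)      # 6 engineers for 0.5 MW

if __name__ == "__main__":
    print(f"commercial: {commercial:.2f} staff/MW")
    print(f"science:    {science:.2f} staff/MW")
    print(f"ratio:      {science / commercial:.1f}x")
```

On these numbers the science cloud needs several times more staff per MW, which is one way to read the emphasis on infrastructure automation that follows.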
  
The Open Science Data Cloud (OSDC) is a production 5 PB*, 7500 core, wide area 10G cloud.
*10 PB raw storage.

www.opensciencedatacloud.org

•  U.S. based not-for-profit corporation.
•  Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
•  Manages cloud computing infrastructure to support medical and health care research: Biomedical Commons Cloud.
•  Manages cloud computing testbeds: Open Cloud Testbed.

www.opencloudconsortium.org

•  Companies: Cisco, Yahoo!, Infoblox, …
•  Universities: University of Chicago, Northwestern Univ., Johns Hopkins, Calit2, LLNL, University of Illinois at Chicago, …
•  Federal agencies and labs: NASA, LLNL, …
•  International Partners: AIST (Japan), U. Edinburgh, U. Amsterdam, …

www.opencloudconsortium.org

Science Cloud
•  Earth sciences
•  Biological sciences
•  Social sciences
•  Digital humanities
•  ACL, groups, etc.

Biomedical Cloud
Designed to hold Protected Health Information (PHI), e.g. genomic data, electronic medical records, etc. (HIPAA, FISMA)

What You Get with the OSDC
•  Login with your university credentials via InCommon
•  Launch virtual machines, virtual clusters, access to large Hadoop clusters, etc.
•  Access PB+ of open and protected data
•  Manage files, collections of files, collections of collections
•  Manage users, groups of users
•  Manage accounts, sub-accounts
•  Efficient transfer of large data (UDT, UDR)

Our Point of View
•  We want to develop as little technology and software as possible – we want others to develop software and technology.
•  We focus on providing researchers the ability to compute over large and very large datasets.
•  We need open source solutions.
•  We can interoperate with proprietary solutions.
•  We are working to make interoperation with AWS seamless.
•  Run lights out over multiple data centers connected with 10G (soon 100G) networks.

OSDC Cloud Services Operations Center (CSOC)
•  The OSDC operates a Cloud Services Operations Center (or CSOC).
•  It is a CSOC focused on supporting Science Clouds for researchers.

OSDC Racks

2013 OSDC rack design
•  1 PB / rack
•  1150 cores / rack

•  How quickly can we set up a rack?
•  How efficiently can we operate a rack? (racks/admin)
•  How few changes does our software stack and operations require when we add new racks?
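
The rack figures and the racks/admin metric lend themselves to a quick sizing calculation. The Python sketch below is illustrative only: it assumes every rack follows the 2013 design (1 PB and 1150 cores), which may not hold for older OSDC hardware, and the function names are hypothetical.

```python
import math

PB_PER_RACK = 1        # 2013 OSDC rack design
CORES_PER_RACK = 1150  # 2013 OSDC rack design


def racks_needed(target_pb: float, target_cores: int) -> int:
    """Racks required to meet both a storage target and a core-count target."""
    by_storage = math.ceil(target_pb / PB_PER_RACK)
    by_cores = math.ceil(target_cores / CORES_PER_RACK)
    return max(by_storage, by_cores)


def racks_per_admin(racks: int, admins: int) -> float:
    """The operating-efficiency metric named on the slide."""
    if admins <= 0:
        raise ValueError("admins must be positive")
    return racks / admins


if __name__ == "__main__":
    racks = racks_needed(target_pb=5, target_cores=7500)
    print("racks needed:", racks)
    print("racks per admin with 6 engineers:", racks_per_admin(racks, 6))
```

The calculation shows which resource drives rack count: for a 5 PB, 7500 core target, cores rather than storage determine how many racks of this design would be needed.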
  
Tukey
•  Tukey (based in part on Horizon).
•  We have factored out digital ID service, file sharing, and transport from the Bionimbus and Matsu Projects.

Yates
•  Automated installation of the OSDC software stack on a rack of computers.
•  Based upon Chef
•  Version 0.1

UDR
•  UDT is a high performance network transport protocol
•  UDR = rsync + UDT
•  It is easy for an average systems administrator to keep 100’s of TB of distributed data synchronized.
•  We are using it to distribute c. 1 PB from the OSDC
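
To give a sense of scale for distributing about 1 PB over the 10G links mentioned elsewhere in the talk, here is a back-of-the-envelope Python sketch. It uses decimal units and an `efficiency` parameter (the fraction of link rate actually achieved) that is an assumption introduced here; it does not model UDT or UDR themselves.

```python
def transfer_days(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_tb terabytes over a link of link_gbps gigabits/s.

    efficiency is the assumed fraction of the nominal link rate that is achieved.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_tb * 1e12 * 8                     # decimal TB to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400                       # seconds per day


if __name__ == "__main__":
    print(f"1 PB at 10 Gbps, full rate:   {transfer_days(1000, 10):.1f} days")
    print(f"1 PB at 10 Gbps, 50% of rate: {transfer_days(1000, 10, 0.5):.1f} days")
    print(f"1 PB at 100 Gbps, full rate:  {transfer_days(1000, 100):.1f} days")
```

Even at full line rate a petabyte takes more than a week on a 10G link, which is why sustained throughput of the transport protocol matters at this scale.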
  
Bionimbus Protected Data Cloud

Analyzing Data From The Cancer Genome Atlas (TCGA)

Current Practice
1.  Apply to dbGaP for access to data.
2.  Hire staff, set up and operate a secure compliant computing environment to manage 10 – 100+ TB of data.
3.  Get environment approved by your research center.
4.  Set up analysis pipelines.
5.  Download data from CG-Hub (takes days to weeks).
6.  Begin analysis.

With Protected Data Cloud (PDC)
1.  Apply to dbGaP for access to data.
2.  Use your existing NIH grant eRA credentials to login to the PDC, select the data that you want to analyze, and the pipelines that you want to use.
3.  Begin analysis.

OCC Project Matsu
Clouds to Support Earth Science

matsu.opensciencedatacloud.org

Biomedical Community Cloud

Example: Open Cloud Consortium’s Biomedical Commons Cloud (BCC)

Clouds:
•  Cloud for Public Data
•  Cloud for Controlled Genomic Data
•  Cloud for EMR, PHI, data

Participants:
•  Medical Research Center A
•  Medical Research Center B
•  Medical Research Center C
•  Hospital D
•  Company E

Part 4
Cloud Condos

Cyber Condo Model
•  Research institutions today have access to high performance networks – 10G & 100G.
•  They couldn’t afford access to these networks from commercial providers.
•  Over a decade ago, they got together to buy and light fiber.
•  This changed how we do scientific research.

Cloud Condos
•  The Open Cloud Consortium’s Burnham Facility (in planning) is a Cloud Condo model.
•  This infrastructure provides a sustainable home for large commons of research data (and an infrastructure to compute over it).
•  Please join us.

Some Data Commons Guidelines for the Next Five Years
•  There is a societal benefit when research data is available in data commons operated by a NFP (vs sold exclusively as data products by commercial entities or only offered for download by the USG).
•  Large data commons providers should peer.
•  Data commons providers should develop standards for interoperating.
•  Standards should not be developed ahead of open source reference implementations.
•  We need a period of experimentation as we develop the best technology and practices.
•  The details are hard (consent, publication, IDs, open vs controlled access, sustainability, etc.)

Working with the OSDC - CSP
•  If you have a cloud, please interoperate it with the OSDC.
•  Work with us to design and prototype standards so that Science Clouds and Science Data Commons can interoperate.
–  Data synchronization between two clouds
–  APIs to access data
–  RESTful queries
–  Scattering queries, gathering the results
–  Coordinated analysis
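
The "scattering queries, gathering the results" item can be illustrated with a minimal Python sketch. Everything here is hypothetical: the in-memory `CLOUDS` dictionary stands in for separate data commons, and `query_cloud` stands in for whatever API each cloud would expose. It is not an OSDC interface.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for independent data commons; a real deployment
# would call each cloud's own API instead of reading a local dictionary.
CLOUDS = {
    "cloud_a": [{"id": "s1", "project": "TCGA"}, {"id": "s2", "project": "EO1"}],
    "cloud_b": [{"id": "s3", "project": "TCGA"}],
}


def query_cloud(name: str, project: str) -> list:
    """Query one (simulated) cloud and tag each matching record with its source."""
    return [dict(rec, source=name) for rec in CLOUDS[name] if rec["project"] == project]


def scatter_gather(project: str) -> list:
    """Send the same query to every cloud concurrently and merge the results."""
    names = sorted(CLOUDS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # Scatter: one query per cloud, run in parallel threads.
        partials = pool.map(lambda n: query_cloud(n, project), names)
        # Gather: flatten the per-cloud results, preserving cloud order.
        return [rec for part in partials for rec in part]


if __name__ == "__main__":
    for record in scatter_gather("TCGA"):
        print(record)
```

Tagging each record with its source cloud keeps provenance visible after the merge, which matters when the clouds have different access rules.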
  
OSDC Software Ecosystem

Software and services: Hadoop, AWS, Tukey, Bionimbus, GlusterFS, OpenStack, R, Globus Online, UDT

Participants: CSP A, Medical Research Center B, Hospital D, University E, Startup F, Startup G

Working with the OSDC - Researchers
•  Apply for an account and make a discovery
•  Add data to the OSDC
•  Add your software to the OSDC
•  Suggest someone else’s data to add
•  Suggest someone else’s software to add

Data Commons

Data: TCGA, EO1, social sciences data, 1000 Genomes, census, urban sciences data, EMR, Bookworm, earth cube data

Participants: CSP A, Medical Research Center B, Hospital D, University E, Startup F, Startup G

Questions?

Thank You!

For more information
•  @bobgrossman
•  You can find more information on my blog: rgrossman.com.
•  You can find more of my talks on: slideshare.net/rgrossman

Center for Research Informatics

Major funding and support for the Open Science Data Cloud (OSDC) is provided by the Gordon and Betty Moore Foundation. This funding is used to support the OSDC-Adler, Sullivan and Root facilities.

Additional funding for the OSDC has been provided by the following sponsors:

•  The Bionimbus Protected Data Cloud is supported in part by NIH/NCI through NIH/SAIC Contract 13XS021 / HHSN261200800001E.
•  The OCC-Y Hadoop Cluster (approximately 1000 cores and 1 PB of storage) was donated by Yahoo! in 2011.
•  Cisco provides the OSDC access to the Cisco C-Wave, which connects OSDC data centers with 10 Gbps wide area networks.
•  The OSDC is supported by a 5-year (2010-2016) PIRE award (OISE – 1129076) to train scientists to use the OSDC and to further develop the underlying technology.
•  OSDC technology for high performance data transport is supported in part by NSF Award 1127316.
•  The StarLight Facility in Chicago enables the OSDC to connect to over 30 high performance research networks around the world at 10 Gbps or higher.
•  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, NIH or other funders of this research.

The OSDC is managed by the Open Cloud Consortium, a 501(c)(3) not-for-profit corporation. If you are interested in providing funding or donating equipment or services, please contact us at info@opensciencedatacloud.org.

Please join us!

www.opensciencedatacloud.org
www.opencloudconsortium.org
Weitere ähnliche Inhalte

Was ist angesagt?

Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Robert Grossman
 
Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)Robert Grossman
 
Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11Robert Grossman
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? Robert Grossman
 
Large Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefLarge Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefRobert Grossman
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Robert Grossman
 
Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)Robert Grossman
 
An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)Robert Grossman
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataRobert Grossman
 
OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3Robert Grossman
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pRobert Grossman
 
Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)Robert Grossman
 
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...Robert Grossman
 
Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefRobert Grossman
 
Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Robert Grossman
 
Coding the Continuum
Coding the ContinuumCoding the Continuum
Coding the ContinuumIan Foster
 
Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Robert Grossman
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
Materials Data Facility: Streamlined and automated data sharing,  discovery, ...Materials Data Facility: Streamlined and automated data sharing,  discovery, ...
Materials Data Facility: Streamlined and automated data sharing, discovery, ...Ian Foster
 

Was ist angesagt? (20)

Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)
 
Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)
 
Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
Large Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster ReliefLarge Scale On-Demand Image Processing For Disaster Relief
Large Scale On-Demand Image Processing For Disaster Relief
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
 
Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)
 
An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)
 
My Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big DataMy Other Computer is a Data Center: The Sector Perspective on Big Data
My Other Computer is a Data Center: The Sector Perspective on Big Data
 
OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9p
 
Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)
 
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
 
Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster Relief
 
Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)Bionimbus - An Overview (2010-v6)
Bionimbus - An Overview (2010-v6)
 
Coding the Continuum
Coding the ContinuumCoding the Continuum
Coding the Continuum
 
Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
Materials Data Facility: Streamlined and automated data sharing,  discovery, ...Materials Data Facility: Streamlined and automated data sharing,  discovery, ...
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 

Andere mochten auch

Big Data - Lab A1 (SC 11 Tutorial)
Big Data - Lab A1 (SC 11 Tutorial)Big Data - Lab A1 (SC 11 Tutorial)
Big Data - Lab A1 (SC 11 Tutorial)Robert Grossman
 
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)Robert Grossman
 
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Robert Grossman
 
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsRobert Grossman
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...Robert Grossman
 
Adversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkAdversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkRobert Grossman
 
AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016Robert Grossman
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...Robert Grossman
 


What Are Science Clouds?

  • 1. What is So Special About Science Clouds and Why Does It Matter? November 17, 2013. Robert L. Grossman, University of Chicago, Open Data Group, Open Cloud Consortium
  • 3. In 2011, after several years and 15 drafts, NIST developed a definition of a cloud that is now the standard definition.
  • 4. Essential Characteristics of a Cloud
      1. Self Service
      2. Scale
  • 5. Self Service
  • 7. Cloud Deployment Models
      •  Public Clouds
         –  Vendors offering cloud services, such as Amazon.
      •  Private Clouds
         –  Run internally by company or organization, such as the University of Chicago.
      •  Community Clouds
         –  Run by a community or organizations (either formally or informally), such as the Open Cloud Consortium.
  • 8. How do you measure compute capacity for science clouds? TB? PB? EB? 100’s? 1,000’s? 10,000’s?
  • 9. Another way (opencompute.org): think of science clouds as large if you measure them in MW. For example, Facebook’s Prineville Data Center is 30 MW.
  • 10. What about automatic provisioning and infrastructure management?
  • 11. This is not a cloud.
  • 12. This is a cloud.
  • 13. Commercial Cloud Service Provider (CSP): 15 MW Data Center
      •  Monitoring, network security and forensics
      •  Automatic provisioning and infrastructure management
      •  Accounting and billing
      •  Customer Facing Portal
      •  Data center network
      •  100,000 servers, 1 PB DRAM, 100’s of PB of disk, ~1 Tbps egress bandwidth
      •  25 operators for 15 MW Commercial Cloud
  • 14. Requirement of a cloud computing infrastructure. Rack / Container Test: The addition of racks / containers of cores and disks is automated and does not require changing the software stack, but afterwards the capacity of the system has increased.
  • 15.
      •  For many organizations, system administrators are just performing a service.
      •  It’s considered a good practice to outsource the service to the lowest cost provider.
      •  At good cloud service providers, development and operations are integrated (devops).
      •  SRE/Devops are considered key personnel.
  • 17. Essential Characteristics of a Cloud
      1. Self Service
      2. Scale
      3. Infrastructure management and automation
      4. Focus on devops
  • 18. Part 2: Science Clouds
  • 19. Some Examples of the Sizes of Datasets Produced by Instruments
      Discipline          Duration     Size             # Devices
      HEP - LHC           10 years     15 PB/year*      One
      Astronomy - LSST    10 years     12 PB/year**     One
      Genomics - NGS      2-4 years    0.5 TB/genome    1000’s
      N.B. This is just the data produced by the instrument itself. The analysis of this data produces significantly more data.
      *At full capacity, the Large Hadron Collider (LHC), the world's largest particle accelerator, is expected to produce more than 15 million Gigabytes of data each year. … This ambitious project connects and combines the IT power of more than 140 computer centres in 33 countries. Source: http://press.web.cern.ch/public/en/Spotlight/SpotlightGrid_081008-en.html
      **As it carries out its 10-year survey, LSST will produce over 15 terabytes of raw astronomical data each night (30 terabytes processed), resulting in a database catalog of 22 petabytes and an image archive of 100 petabytes. Source: http://www.lsst.org/News/enews/teragrid-1004.html
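To put the yearly volumes in the table into network terms, the sketch below converts them into average sustained rates. It is only a back-of-the-envelope illustration: decimal units (1 PB = 10^15 bytes), a 365-day year, and the name `sustained_gbps` are assumptions, not something taken from the slides.

```python
# Back-of-the-envelope conversion of yearly data volumes into average
# sustained network rates. Decimal units and a 365-day year are assumptions.

SECONDS_PER_YEAR = 365 * 24 * 3600
BYTES_PER_PB = 10**15


def sustained_gbps(pb_per_year: float) -> float:
    """Average rate in gigabits per second needed to move pb_per_year."""
    bits_per_year = pb_per_year * BYTES_PER_PB * 8
    return bits_per_year / SECONDS_PER_YEAR / 1e9


# Yearly instrument output as quoted on the slide.
instruments = {"HEP - LHC": 15.0, "Astronomy - LSST": 12.0}
rates = {name: sustained_gbps(pb) for name, pb in instruments.items()}

for name, gbps in rates.items():
    print(f"{name}: {gbps:.2f} Gbps averaged over a year")
```

Under these assumptions both instruments average between roughly 3 and 4 Gbps, before counting the additional data that analysis produces.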
  • 20. Data scientist; Sci CSP services; Science Cloud Service Provider (Sci CSP)
  • 21. What are some of the important differences between commercial and research-focused Sci CSPs?
  • 22. Amazon Web Services (AWS)? vs. Community clouds, science clouds, etc.
      •  Amazon Web Services (AWS)
         –  Scale
         –  Simplicity of a credit card
         –  Wide variety of offerings
      •  Community clouds, science clouds, etc.
         –  Lower cost (at medium & large scale)
         –  Some data too important to be stored exclusively in commercial cloud
         –  Computing over scientific data is a core competency
         –  Can support any required governance / security model
      It is essential that community science clouds interoperate with public clouds.
  • 23. Science Clouds vs. Commercial Clouds
      •  POV
         –  Science Clouds: Democratize access to data. Integrate data to make discoveries. Long term archive.
         –  Commercial Clouds: As long as you pay the bill; as long as the business model holds.
      •  Data & Storage
         –  Science Clouds: In addition, data intensive computing & HP storage
         –  Commercial Clouds: Internet style scale out and object-based storage
      •  Flows
         –  Science Clouds: Large & small data flows
         –  Commercial Clouds: Lots of small web flows
      •  Accounting
         –  Science Clouds: Essential
         –  Commercial Clouds: Essential
      •  Lock in
         –  Science Clouds: Moving environment between CSPs essential
         –  Commercial Clouds: Lock in is good
      •  Interop
         –  Science Clouds: Critical, but difficult
         –  Commercial Clouds: Customers will drive to some degree
  • 24. Essential Services for a Science CSP
      •  Support for data intensive computing
      •  Support for big data flows
      •  Account management, authentication and authorization services
      •  Health and status monitoring
      •  Billing and accounting
      •  Ability to rapidly provision infrastructure
      •  Security services, logging, event reporting
      •  Access to large amounts of public data
      •  High performance storage
      •  Simple data export and import services
  • 25. Data scientist; Sci CSP services; Datascope: Science Cloud Service Provider (Sci CSP); Cloud Service Operations Center (CSOC)
  • 26. Part 3: Open Science Data Cloud
  • 27. Number of projects vs. data size
      Number    Projects                                             Data Size          Infrastructure
      1000’s    Individual scientists & small projects               Small              Public infrastructure
      100’s     Community based science via Science as a Service     Medium to Large    Shared community infrastructure
      10’s      Very large projects                                  Very Large         Dedicated infrastructure
  • 28. The long tail of data science: a few large data science projects, many smaller data science projects.
  • 29. Commercial Cloud Service Provider (CSP): 15 MW Data Center
      •  Monitoring, network security and forensics
      •  Automatic provisioning and infrastructure management
      •  Accounting and billing
      •  Customer Facing Portal
      •  Data center network
      •  100,000 servers, 1 PB DRAM, 100’s of PB of disk, ~1 Tbps egress bandwidth
      •  25 operators for 15 MW Commercial Cloud
  • 30. Open Science Data Cloud
      •  Compliance & security (OCM)
      •  Infrastructure automation & management (Yates)
      •  Accounting & billing (Salesforce.com)
      •  Customer Facing Portal (Tukey)
      •  Science Cloud SW & Services; Cores & Disks (OpenStack, GlusterFS & Hadoop)
      •  Data center network, ~10-100 Gbps bandwidth
      •  6 engineers to operate 0.5 MW Science Cloud
      Software and services:
      •  Virtual Machine (VM) containing common applications & pipelines
      •  Tukey (OSDC portal & middleware v0.2)
      •  Yates (infrastructure automation and management v0.1)
      •  UDR / UDT for high performance data transport
      •  Interoperate with other clouds (upcoming) and proprietary systems (such as Globus Online)
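The staffing figures quoted for the two diagrams (25 operators for a 15 MW commercial cloud, 6 engineers for a 0.5 MW science cloud) can be compared per megawatt. The sketch below does only that arithmetic; the `Cloud` class and its field names are hypothetical, introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cloud:
    """Hypothetical record holding the staffing and power figures from the slides."""
    name: str
    operators: int
    power_mw: float

    @property
    def operators_per_mw(self) -> float:
        return self.operators / self.power_mw


commercial = Cloud("Commercial CSP", operators=25, power_mw=15.0)
osdc = Cloud("Open Science Data Cloud", operators=6, power_mw=0.5)

for cloud in (commercial, osdc):
    print(f"{cloud.name}: {cloud.operators_per_mw:.2f} operators per MW")

# How many times more staff per MW the smaller science cloud needs.
staffing_ratio = osdc.operators_per_mw / commercial.operators_per_mw
print(f"Science cloud staff per MW is about {staffing_ratio:.1f}x the commercial figure")
```

Megawatts are used as the common denominator because that is how the deck itself measures cloud size; other denominators (servers, cores, petabytes) would give different ratios.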
  • 31. The Open Science Data Cloud (OSDC) is a production 5 PB*, 7500 core, wide area 10G cloud. *10 PB raw storage. www.opensciencedatacloud.org
  • 32. • U.S. based not-for-profit corporation. • Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud. • Manages cloud computing infrastructure to support medical and health care research: Biomedical Commons Cloud. • Manages cloud computing testbeds: Open Cloud Testbed. www.opencloudconsortium.org
  • 33. • Companies: Cisco, Yahoo!, Infoblox, … • Universities: University of Chicago, Northwestern Univ., Johns Hopkins, Calit2, LLNL, University of Illinois at Chicago, … • Federal agencies and labs: NASA, LLNL, … • International Partners: AIST (Japan), U. Edinburgh, U. Amsterdam, … www.opencloudconsortium.org
  • 34. Science Cloud • Earth sciences • Biological sciences • Social sciences • Digital humanities • ACL, groups, etc. Biomedical Cloud: designed to hold Protected Health Information (PHI), e.g. genomic data, electronic medical records, etc. (HIPAA, FISMA)
  • 35. What You Get with the OSDC • Login with your university credentials via InCommon • Launch virtual machines, virtual clusters, access to large Hadoop clusters, etc. • Access PB+ of open and protected data • Manage files, collections of files, collections of collections • Manage users, groups of users • Manage accounts, sub-accounts • Efficient transfer of large data (UDT, UDR)
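The "files, collections of files, collections of collections" item above describes a recursive structure. A minimal sketch of one way to model it follows; the class and field names (`DataFile`, `Collection`, `members`) are hypothetical and are not taken from the OSDC or Tukey code.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class DataFile:
    name: str
    size_bytes: int


@dataclass
class Collection:
    """A named group whose members are files or other collections."""
    name: str
    members: List[Union["DataFile", "Collection"]] = field(default_factory=list)

    def files(self) -> Iterator[DataFile]:
        # Walk nested collections depth-first and yield every file once.
        for member in self.members:
            if isinstance(member, Collection):
                yield from member.files()
            else:
                yield member

    def total_bytes(self) -> int:
        return sum(f.size_bytes for f in self.files())


# Illustrative data only.
run = Collection("run1", [DataFile("a.bam", 100), DataFile("b.bam", 200)])
study = Collection("study", [run, DataFile("readme.txt", 5)])
print(study.total_bytes())
```

The same nesting pattern would apply to groups of users and to accounts with sub-accounts.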
  • 36. Our Point of View • We want to develop as little technology and software as possible – we want others to develop software and technology. • We focus on providing researchers the ability to compute over large and very large datasets. • We need open source solutions. • We can interoperate with proprietary solutions. • We are working to make interoperation with AWS seamless. • Run lights out over multiple data centers connected with 10G (soon 100G) networks.
  • 37. OSDC Cloud Services Operations Center (CSOC) • The OSDC operates a Cloud Services Operations Center (or CSOC). • It is a CSOC focused on supporting Science Clouds for researchers.
  • 38. OSDC Racks: 2013 OSDC rack design • 1 PB / rack • 1150 cores / rack • How quickly can we set up a rack? • How efficiently can we operate a rack? (racks/admin) • How few changes does our software stack and operations require when we add new racks?
  • 39. Tukey   •  Tukey  (based  in  part  on  Horizon).   •  We  have  factored  out  digital  ID  service,  file   sharing,  and  transport  from  the    Bionimbus  and   Matsu  Projects.  
  • 40. Yates • Automated installation of the OSDC software stack on a rack of computers. • Based upon Chef • Version 0.1
  • 41. UDR   •  UDT  is  a  high  performance  network  transport  protocol   •  UDR  =  rsync  +  UDT     •  It  is  easy  for  an  average  systems  administrator  to  keep   100’s  of  TB  of  distributed  data  synchronized.     •  We  are  using  it  to  distribute  c.  1  PB  from  the  OSDC  
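To put the 1 PB distribution figure in context, here is a back-of-the-envelope estimate of how long such a transfer takes on the 10G and 100G links mentioned elsewhere in the talk. It is a sketch under stated assumptions: decimal units (1 PB = 10^15 bytes), a single fully dedicated link, and an `efficiency` parameter that is a made-up knob for achieved throughput, not a measured UDR or UDT figure.

```python
def transfer_days(size_bytes: float, link_bps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link of link_bps bits per second.

    efficiency is the assumed fraction of line rate actually achieved (0 < e <= 1).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bps * efficiency)
    return seconds / 86_400


PB = 10**15    # decimal petabyte, in bytes
GBPS = 10**9   # bits per second

ideal_10g = transfer_days(1 * PB, 10 * GBPS)
ideal_100g = transfer_days(1 * PB, 100 * GBPS)
half_rate_10g = transfer_days(1 * PB, 10 * GBPS, efficiency=0.5)  # assumed 50% of line rate

print(f"1 PB at 10 Gbps (line rate):  {ideal_10g:.1f} days")
print(f"1 PB at 100 Gbps (line rate): {ideal_100g:.1f} days")
print(f"1 PB at 10 Gbps (50% rate):   {half_rate_10g:.1f} days")
```

Even at full line rate, 1 PB over a 10 Gbps link takes more than nine days, which is why a transport that keeps wide area links full matters at this scale.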
  • 42. Bionimbus Protected Data Cloud
  • 43. Analyzing Data From The Cancer Genome Atlas (TCGA). Current Practice: 1. Apply to dbGaP for access to data. 2. Hire staff, set up and operate secure compliant computing environment to manage 10 – 100+ TB of data. 3. Get environment approved by your research center. 4. Set up analysis pipelines. 5. Download data from CGHub (takes days to weeks). 6. Begin analysis. With Protected Data Cloud (PDC): 1. Apply to dbGaP for access to data. 2. Use your existing NIH grant eRA credentials to login to the PDC, select the data that you want to analyze, and the pipelines that you want to use. 3. Begin analysis.
  • 44. OCC Project Matsu: Clouds to Support Earth Science. matsu.opensciencedatacloud.org
  • 45. Biomedical Community Cloud. Example: Open Cloud Consortium's Biomedical Commons Cloud (BCC). Clouds: cloud for public data; cloud for controlled genomic data; cloud for EMR, PHI, data. Participants: Medical Research Center A, Medical Research Center B, Medical Research Center C, Hospital D, Company E.
  • 47. Cyber Condo Model • Research institutions today have access to high performance networks – 10G & 100G. • They couldn't afford access to these networks from commercial providers. • Over a decade ago, they got together to buy and light fiber. • This changed how we do scientific research.
  • 48. Cloud Condos • The Open Cloud Consortium's Burnham Facility (in planning) is a Cloud Condo model. • This infrastructure provides a sustainable home for large commons of research data (and an infrastructure to compute over it). • Please join us.
  • 49. Some Data Commons Guidelines for the Next Five Years • There is a societal benefit when research data is available in data commons operated by a NFP (vs sold exclusively as data products by commercial entities or only offered for download by the USG). • Large data commons providers should peer. • Data commons providers should develop standards for interoperating. • Standards should not be developed ahead of open source reference implementations. • We need a period of experimentation as we develop the best technology and practices. • The details are hard (consent, publication, IDs, open vs controlled access, sustainability, etc.)
  • 50. Working with the OSDC - CSP • If you have a cloud, please interoperate it with the OSDC. • Work with us to design and prototype standards so that Science Clouds and Science Data Commons can interoperate. – Data synchronization between two clouds – APIs to access data – RESTful queries – Scattering queries, gathering the results – Coordinated analysis
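The "scattering queries, gathering the results" item can be illustrated with a small scatter-gather sketch. Everything here is hypothetical: the site names, the in-memory records, and the `make_site` helper stand in for whatever per-cloud query API such a standard would eventually define, and a real version would issue network requests instead of filtering local lists.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def make_site(records: List[dict]) -> Callable[[str], List[dict]]:
    """Build a stand-in query function for one cloud (hypothetical, in-memory)."""
    def query(project: str) -> List[dict]:
        return [r for r in records if r["project"] == project]
    return query


# Two made-up clouds holding made-up records.
SITES: Dict[str, Callable[[str], List[dict]]] = {
    "cloud_a": make_site([{"project": "TCGA", "id": "a1"}, {"project": "EO1", "id": "a2"}]),
    "cloud_b": make_site([{"project": "TCGA", "id": "b1"}]),
}


def scatter_gather(project: str) -> List[dict]:
    """Send the same query to every site in parallel, then merge the answers."""
    with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
        # Scatter: one concurrent query per site.
        futures = {name: pool.submit(query, project) for name, query in SITES.items()}
        # Gather: collect results in a stable order and tag each with its origin.
        merged: List[dict] = []
        for name in sorted(futures):
            for record in futures[name].result():
                merged.append({**record, "site": name})
    return merged


results = scatter_gather("TCGA")
for row in results:
    print(row)
```

Tagging each record with its originating site keeps provenance visible after the merge, which matters when the participating clouds apply different access rules.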
  • 51. OSDC Software Ecosystem. Components: Hadoop, AWS, Tukey, Bionimbus, GlusterFS, OpenStack, R, Globus Online, UDT. Participants: CSP A, Medical Research Center B, Hospital D, University E, Startup F, Startup G.
  • 52. Working with the OSDC - Researchers • Apply for an account and make a discovery • Add data to the OSDC • Add your software to the OSDC • Suggest someone else's data to add • Suggest someone else's software to add
  • 53. Data Commons. Data sets: TCGA, EO1, social sciences data, 1000 Genomes, census, urban sciences data, EMR, Bookworm, earth cube data. Participants: CSP A, Medical Research Center B, Hospital D, University E, Startup F, Startup G.
  • 56. For more information • @bobgrossman • You can find more information on my blog: rgrossman.com. • You can find more of my talks on: slideshare.net/rgrossman. Center for Research Informatics
  • 57. Major funding and support for the Open Science Data Cloud (OSDC) is provided by the Gordon and Betty Moore Foundation. This funding is used to support the OSDC-Adler, Sullivan and Root facilities. Additional funding for the OSDC has been provided by the following sponsors: • The Bionimbus Protected Data Cloud is supported in part by NIH/NCI through NIH/SAIC Contract 13XS021 / HHSN261200800001E. • The OCC-Y Hadoop Cluster (approximately 1000 cores and 1 PB of storage) was donated by Yahoo! in 2011. • Cisco provides the OSDC access to the Cisco C-Wave, which connects OSDC data centers with 10 Gbps wide area networks. • The OSDC is supported by a 5-year (2010-2016) PIRE award (OISE – 1129076) to train scientists to use the OSDC and to further develop the underlying technology. • OSDC technology for high performance data transport is supported in part by NSF Award 1127316. • The StarLight Facility in Chicago enables the OSDC to connect to over 30 high performance research networks around the world at 10 Gbps or higher. • Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, NIH or other funders of this research. The OSDC is managed by the Open Cloud Consortium, a 501(c)(3) not-for-profit corporation. If you are interested in providing funding or donating equipment or services, please contact us at info@opensciencedatacloud.org.
  • 58. Please join us! www.opensciencedatacloud.org www.opencloudconsortium.org