Hybrid Strategies for Research Data Management

While some early adopters have realized benefits by incorporating clouds into their analysis pipelines, many challenges remain. In this presentation we will highlight the critical issues associated with research data management, and describe alternative approaches for addressing these challenges by optimizing the use of local, distributed and cloud-hosted resources.

Hybrid Strategies for
Research Data Management
Vas Vasiliadis, Computation Institute
vas@ci.uchicago.edu

computationinstitute.org

The Computation Institute
= UChicago + Argonne
= Cross-disciplinary nexus
= Home of the Research Cloud

x10 in 6 years
x10^5 in 6 years

1 PB data in last experiment
Accessed by 800 scientists
worldwide

1.2 PB of climate data
Delivered to 23,000 users

We have exceptional
infrastructure for the 1%

How can the 99% manage
this?

What would a “dropbox for
science” look like?

• Collect • Catalog
• Move • Publish
• Replicate • Search
• Share • Archive
• Analyze • Backup
…among distributed research groups

[Diagram: Staging Store, Ingest Store, Analysis Store, Community Store, Registry, Archive, Mirror]

• Collect • Catalog
• Move • Publish
• Replicate • Search
• Share • Archive
• Analyze • Backup
…as-a-Service

Security
Privacy
Reliability
Scalability
Control

A great user experience

Research Data Management-as-a-Service
[Diagram: Staging Store, Ingest Store, Analysis Store, Community Store, Registry, Archive, Mirror]
SaaS: Globus Transfer, Globus Storage, Globus Collaborate, Globus Catalog
PaaS: Globus Integrate (Globus Nexus, Globus Connect)

Communities using Globus

What does it mean for us as
IT resource managers?

installers → brokers

developers → integrators

GSI-OpenSSH

administrators → curators
(of the user experience)

Cloud? What cloud?
1 : 1 : 0
UX : Dev : Ops

Other innovative science
SaaS projects

Our vision for a 21st century
cyberinfrastructure

To provide more capability for
more people at substantially
lower cost by creatively
aggregating (“cloud”) and
federating (“grid”) resources in a
hybrid world

Thank you to our sponsors

Editor's Notes

Share some thoughts with you. Ask you to think critically about managing research data in what is rapidly becoming a hybrid IT world.
A place where researchers from multiple disciplines come together and engage in research that is fundamentally enabled by computation. More recently we’ve been talking about it as the home of the “research cloud”… I’ll describe what we mean by that throughout this talk.
Examples of areas where we have active projects. Much of our legacy is in the physical sciences. But increasingly we are finding ourselves working in the life sciences…
And the reason is pretty obvious… This chart and others like it are becoming a cliché in next-gen sequencing and big data presentations. >>>> ANIMATE …but the point I want to make is that while Moore’s law translates to a roughly 10x increase in processor power… >>>> ANIMATE …data volumes are growing many orders of magnitude faster. AND MEANWHILE, other resources [money, people] are staying pretty flat. So we have a looming crisis… and we hear that the magic bullet of “the cloud” is going to solve it. As far as cost goes, clouds are helping… but many issues remain.
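
To make the gap between the two growth curves concrete, here is a minimal Python sketch that converts each growth factor into a doubling time. It assumes steady exponential growth, and it reads the slide's second figure as 10^5 over six years; that exponent is an assumption about formatting lost from the slide, not something the notes state.

```python
import math


def doubling_time_years(growth_factor: float, period_years: float) -> float:
    """Years needed to double, given total growth over a period.

    Assumes steady exponential growth across the whole period.
    """
    return period_years * math.log(2) / math.log(growth_factor)


# Figures from the slide; the 1e5 reading is an assumption (see lead-in).
compute_doubling = doubling_time_years(10, 6)   # processor power
data_doubling = doubling_time_years(1e5, 6)     # data volume

print(f"compute doubles every {compute_doubling * 12:.1f} months")  # about 21.7
print(f"data doubles every {data_doubling * 12:.1f} months")        # about 4.3
print(f"gap after 6 years: {1e5 / 10:,.0f}x")
```

Under those assumptions, compute doubles roughly every 22 months while data doubles roughly every 4 months, which is the mismatch the talk describes as a looming crisis.
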
Two examples to illustrate some of these issues… LIGO searches for gravitational waves to explore fundamental physics concepts. It runs three observatories around the world and generated over a petabyte of data in its most recent experiment. It’s not just the volume of data (arguably 1 PB is becoming commonplace)… the real complexity is that this data has to be made available to almost a thousand researchers all over the world… it has to be actively managed for many years while experiments and analyses are run against it. A very complex undertaking. And by the way, their next experiment, Advanced LIGO, will generate a couple of orders of magnitude more data.
Earth System Grid Federation provides data and tools to over 20,000 climate scientists around the world. So what’s notable about these examples? Again, it’s the combination of the amount of data being managed and the number of people that need access to that data. We heard Martin Leach tell us in his keynote that the Broad Institute hit 10 PB of spinning disk last year, and that it’s not a big deal. To a select few, these numbers are routine… And for the projects I just talked about, the IT infrastructure is in place. They have robust production solutions. Built by substantial teams at great expense. Sustained, multi-year efforts. Application-specific solutions, built mostly on common/homogeneous technology platforms.
The obligatory data deluge slide… >>>> ANIMATE So this fellow here is well prepared for the data deluge… but what about the rest of us?
The point is, the 1% of projects are in good shape. >>>> ANIMATE But what about the 99% set? There are hundreds of thousands of small and medium labs around the world that are faced with similar data management challenges. They don’t have the resources to deal with these challenges… So their research suffers… and over time they may become irrelevant. So at the CI we asked ourselves questions about how we can help avert this crisis. And one question that sums up our thinking is…
Many in this room are probably users of Dropbox or similar services for keeping their files synced across multiple machines. >>>> ASK FOR SHOW OF HANDS …confirm majority. Well, the scientific research equivalent is a little different…
We figured it needs to allow researchers to do many or all of these things with their data… and not just with the 2 GB of PowerPoint decks or the 100 GB of family photos and videos… but the petabytes and exabytes of data that will soon be the norm for many. >>>> ANIMATE Again, it’s the large distributed group of collaborating researchers that’s key here.
So how would such a Dropbox for science be used? Let’s look at a very typical scientific data workflow… Data is generated by some instrument (an NGS core in China, or a large telescope in Chile). Since these instruments are in high demand, users have to get their data off the instrument to make way for the next user… so the data is typically moved from a staging area to some type of ingest store. This is usually pretty raw data… so some of it may need to be run through one or more analysis pipelines. At this point we’ve not only distributed the data, we’ve also multiplied it in size. Then we may need to do some post-processing and apply some metadata… before publishing it in a Community Store where other collaborators can access it securely. Perhaps also place a subset of the data in a national Registry for public access. And we’d also like to keep Mirrors of the data for performance and various other reasons. And over time we will end up moving data to an Archive, perhaps a hierarchical storage system. In practice the various stores are probably owned and managed by different organizations: >>>> ANIMATE …Ingest is on my campus at University of Chicago. >>>> ANIMATE …Analysis may be on a public cloud provider because I can’t get enough cycles on demand on campus. >>>> ANIMATE …The Registry is in some vault in Virginia. >>>> ANIMATE …The Community Store is on a private cloud at one of the national labs. And so on… we have to deal with a hybrid storage world.
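
The workflow in this note can be sketched as a small data structure. This is an illustrative Python model only, not part of any Globus software: the `Store` class, the owner labels, and the `FLOWS` list are hypothetical names. Owners are taken from the note where it gives them and left as `None` where it does not, and the sources of the Mirror and Archive flows are assumptions, since the note does not say which store feeds them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Store:
    name: str
    owner: Optional[str]  # None where the notes do not name an owner


STORES = {
    "staging": Store("Staging Store", None),
    "ingest": Store("Ingest Store", "University of Chicago campus"),
    "analysis": Store("Analysis Store", "public cloud provider"),
    "community": Store("Community Store", "national lab private cloud"),
    "registry": Store("Registry", "vault in Virginia"),
    "mirror": Store("Mirror", None),
    "archive": Store("Archive", None),
}

# (source, destination) pairs in the order the note walks through them.
# The last two sources are assumptions made for illustration.
FLOWS = [
    ("staging", "ingest"),
    ("ingest", "analysis"),
    ("analysis", "community"),
    ("community", "registry"),
    ("community", "mirror"),
    ("community", "archive"),
]


def cross_owner_flows(stores, flows):
    """Return flows whose two ends have known, different owners."""
    crossing = []
    for src, dst in flows:
        a, b = stores[src].owner, stores[dst].owner
        if a is not None and b is not None and a != b:
            crossing.append((src, dst))
    return crossing


for src, dst in cross_owner_flows(STORES, FLOWS):
    print(f"{STORES[src].name} -> {STORES[dst].name} crosses organizations")
```

Even with only the four owners the note names, every hop between them crosses an organizational boundary, which is the "hybrid storage world" the note ends on.
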
Beyond the hybrid storage environments, we also have to deal with moving the data reliably, something that sounds pretty mundane… and it is mundane when you’re moving 50 pictures of Fluffy to Picasa… but it’s a little more challenging when you’re moving a petabyte to half a dozen locations around the world. You end up having to become familiar with many tools and techniques. >>>> ANIMATE …some systems will force you to use arcane commands like SCP that require extensive configuration and tuning, and yet still deliver only modest performance and reliability. >>>> ANIMATE …in other cases you’ll find that a hard drive and a FedEx account are the way to go. >>>> ANIMATE …or some custom portal with a convoluted workflow. So we have to deal with a hybrid (and generally poor) user experience.
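
As a rough illustration of what "moving the data reliably" involves at the smallest scale, the Python sketch below copies a file, verifies it with a checksum, and retries on mismatch. It is a local-file stand-in under stated assumptions, not how Globus or any particular transfer tool works; `sha256_of` and `copy_verified` are hypothetical helper names.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src: Path, dst: Path, max_attempts: int = 3) -> int:
    """Copy src to dst and retry until the checksums match.

    Returns the number of attempts used; raises OSError if every
    attempt produces a mismatching copy.
    """
    expected = sha256_of(src)
    for attempt in range(1, max_attempts + 1):
        shutil.copyfile(src, dst)
        if sha256_of(dst) == expected:
            return attempt
    raise OSError(f"checksum mismatch after {max_attempts} attempts: {src} -> {dst}")
```

Calling `copy_verified(Path("fluffy.jpg"), Path("backup/fluffy.jpg"))` is trivial for one photo. The same verify-and-retry loop across a petabyte, several destinations, and several security domains is where the bookkeeping becomes the hard part.
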
And if that wasn’t enough, each of these systems is going to be in a different security domain. >>>> ANIMATE …and you’ll have to deal with multiple identities and security protocols to get the job done. So we have to deal with a hybrid security world. Realization: building a solution is really only feasible for very few among us, certainly not for the typical research lab. So we looked at what’s worked in a number of business application areas like CRM and ERP and decided that…
…for small research groups, the only feasible way to provide all of these capabilities is… >>>> ANIMATE …using a software-as-a-service approach. And what’s interesting is that much of this also applies to larger groups who are starting to question the level of investment they are making in building their own. It’s similar to the debate that many large companies have had about using SaaS vs. in-house software… and we’ve seen that pendulum swing strongly in favor of SaaS.
And when we spoke with IT folks at various research communities they insisted that some things were not up for negotiation
We can deal with that complexity technically, but the key is to deliver a great user experience. We’re trying to serve the needs of the vast majority of researchers who cannot hope to… navigate Amazon’s API… or figure out how to configure an Isilon storage node for their internal cloud.
So a couple of years ago we started building such a solution…
• Transfer: move big data reliably; >4,000 users in just over a year, approaching the 4 PB mark
• Storage: enabling any number of object stores to be used in a consistent manner to replicate, version, and share data
• Collaborate: allow the group to manage their workflows and publish data for internal and external consumption
• Catalog: make metadata part and parcel of the data, not an afterthought
• Integrate: enable groups to access the various services programmatically
• Nexus: provide a federated identity infrastructure which allows users to access the services with their existing accounts at their primary institution… plus a group management service that serves as the basis for sharing of data across all other Globus services
In developing this we started with the user experience… service + multiple UIs for different types of users… a very small, no-maintenance footprint on the endpoints: a drag-and-drop or single-command packaged installation that makes the resource part of the Globus service.
So SaaS is one strategy for dealing with the hybrid world coming our way… but we also need strategies for dealing with our organization. For many years we built up a fairly traditional software development organization: lots of devs, some QA, some ops. We realized that we would need to change our view of what the organization should look like.
The first shift we are experiencing is from being installers to capability brokers. We are less concerned with building a data center or installing and configuring software. There is absolutely still a role for that, but there are a few that have the skills and experience… so we take advantage of that experience and focus instead on selecting various components and spend our time making them easy to use. Again, it’s focusing on the user experience. An example of this is the Globus Storage service. We are working with multiple providers. >>>> talk to UC IT Services deployment et al. Cloud storage providers will keep driving the unit cost of storage down. We believe the value lies in making it trivial to use that storage in the normal course of their work. Other components for Globus Collaborate: Drupal, JIRA, Confluence. And we eat our own dog food… Zendesk for support… using Globus Integrate and Globus Nexus… from the user’s perspective they only have a single account on Globus and can access external services like Zendesk to track their support tickets, post to forums, etc.
We’re also moving from being developers to playing more of an integrator role. Again, there are lots of smart people out there that have figured out the hard bits, for example in identity management and security. We’ve taken that knowledge and packaged it in such a way that shields the user from all of this complexity… they just need to remember their single username/password or campus login or Google account or whatever. >>>> TALK TO FEDERATED IDENTITY
If you truly want to focus on the user experience then you need to build the team as such. We’ve shifted the makeup of our team from dev-heavy to more balanced with respect to UX… and quite a shift away from traditional ops (the devs run their own stuff using simple software like Chef).
