The goal of this talk is to highlight open source opportunities for students, especially the opportunity to earn $5000 through the Google Summer of Code (GSoC) program. I will discuss tips on how to engage with open source communities and the benefits of contributing. I will provide motivating examples of how students can gain significant experience by contributing to challenging distributed systems problems while impacting scientific research. I will focus on a concrete example: the Apache Airavata software suite for Web-based science gateways. I will list some example GSoC topics of interest and provide some recipes for getting accepted and navigating the program successfully.
Learning Open Source through GSOC
1. Science Gateways, Open Source
& Google Summer of Code
Suresh Marru
Apache Software Foundation
Indiana University
2. Acknowledgements
Apache Software Foundation (ASF)
Extreme Science and Engineering
Discovery Environments (XSEDE)
Science Gateways Group, Pervasive
Technology Institute, Indiana
University (SGG)
3. Credits to ….
Science Gateways Group @ IU
Marlon Pierce: Group Lead
Amila Jayasekara
Chathuri Wimalasena
Heshan Suriyaachchi
Jun Wang
Lahiru Gunathilake
Raminder Singh
Saminda Wijeratne
Suresh Marru
Viknes Balasubramanee
Yu (Marie) Ma
4. What will you hear today?
Science Gateways
Web 2.0, Social Networking, Grid & Cloud Computing, Big Data, everything-as-a-service,
churned into real-world scientific research.
Open Source
Hack into Open Source projects – a good way to
cherish doing what you like as opposed to
what you have to.
Google Summer of Code
Reward yourself with $5000 while making a case
for future employment and graduate school
admissions
Apache Airavata
5. Outline
What are Science Gateways?
Getting your way in Open Source
Apache Software Foundation
Google Summer of Code
Interested? Next Steps……
7. What is Google Summer of Code?
Google Summer of Code is a program designed to
encourage college student participation in
open source software development.
8. Key Goals of GSOC
• Inspire young developers to begin participating in
open source development
• Provide students in computer science and related fields
the opportunity to do work related to their academic
pursuits during the summer
• Give students more exposure to real-world software
development scenarios (e.g. distributed development,
software licensing questions, mailing list etiquette, etc.)
• Get more open source code created and released for
the benefit of all
• Help open source projects identify and bring in new
developers and committers
11. GSoC in numbers: Students
The number of students has maxed out and stabilized around 1200.
This is not expected to grow in the near future, which is understandable.
Still, thank you Google!
12. GSoC Win-Win Perspective
• Project Perspective:
o Paid software developer for the summer.
o Attracting a new member into the project
community.
• Student Perspective
o Opportunity to gain (open source) software
development experience.
o Good payment for rewarding work.
o Ability to network and become known within a
structured, distributed setting.
13. What to look for in a project?
Can you engage with the project (not just
the mentor)? Can they guide you with
tutorials and hold your hand early on?
For instance, will you get to experience
“Apache Way”?
Is the project welcoming and
appreciative?
Is there mileage for your extra effort
through long-term commitment?
15. Core Contributions beyond GSOC
Milinda realized he could execute his
GSoC project, but he also had great ideas on
how we could fundamentally improve the
Airavata architecture to make
future extensions easy.
The developer community agreed to the new
architecture:
Simple
Easy extensibility
Airavata has adopted his proposed new
architecture.
16. Enhanced Airavata Architecture
Job Execution Context
Global InHandlers
Provider specific InHandlers
Application specific InHandlers
Provider Logic
Application specific OutHandlers
Provider specific OutHandlers
Global OutHandlers
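The handler layout on this slide can be sketched as a simple chain. This is only an illustration, not Airavata's actual API: the class `JobExecutionContext`, the helpers `make_handler` and `run_job`, and the exact ordering of the handlers (in-handlers from global to application specific, out-handlers in the mirror order) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class JobExecutionContext:
    """Hypothetical context object that every handler reads and updates."""
    application: str
    provider: str
    trace: List[str] = field(default_factory=list)


Handler = Callable[[JobExecutionContext], None]


def make_handler(name: str) -> Handler:
    """Build a toy handler that only records that it ran."""
    def handler(ctx: JobExecutionContext) -> None:
        ctx.trace.append(name)
    return handler


def run_job(ctx: JobExecutionContext,
            in_handlers: List[Handler],
            provider_logic: Handler,
            out_handlers: List[Handler]) -> JobExecutionContext:
    """Run in-handlers, then the provider logic, then out-handlers."""
    for handler in in_handlers:
        handler(ctx)
    provider_logic(ctx)
    for handler in out_handlers:
        handler(ctx)
    return ctx


# Assumed ordering: global -> provider -> application on the way in,
# and the reverse on the way out.
in_chain = [make_handler("global-in"),
            make_handler("provider-in"),
            make_handler("application-in")]
out_chain = [make_handler("application-out"),
             make_handler("provider-out"),
             make_handler("global-out")]

ctx = run_job(JobExecutionContext(application="demo-app", provider="local"),
              in_chain, make_handler("provider-logic"), out_chain)
```

Under these assumptions, a new extension is just another handler added to one of the lists, which is one way to read the "easy extensibility" goal of the new architecture.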
17. Pick what motivates you
Harness your skills and interests
If possible, pick a project that is relevant and “required”
by aligning it with your academic curriculum
As a final year (research) project
As a Masters-level research project
Create an interesting and challenging research
problem
Sense of satisfaction and achievements
Research publications
Presentations at ApacheCon and similar conferences
Committership
18. What does a good mentor look for?
Free & Paid Contributions – the reality
Long term participant in the project (not a
software developer for ~3 months)
Accomplish meaningful research-oriented
goals either within the project or cross-
cutting projects.
Teach open source/community
participation to the next generation
workforce
19. What will you hear today?
Science Gateways
Web 2.0, Social Networking, Grid & Cloud Computing, Big Data, everything-as-a-service,
churned into real-world scientific research.
20. What Is Cyberinfrastructure?
“Cyberinfrastructure consists of computing systems,
data storage systems, advanced instruments and
data repositories, visualization environments, and
people, all linked together by software and high
performance networks to improve research
productivity and enable breakthroughs not otherwise
possible.”
–Craig Stewart, Indiana University
26. XSEDE Vision
The eXtreme Science and
Engineering Discovery
Environment (XSEDE):
enhances the productivity of scientists
and engineers by providing them with
new and innovative capabilities
and thus
facilitates scientific discovery while
enabling transformational
science/engineering and innovative
educational programs
29. What will you hear today?
Open Source
Hack into Open Source projects – a good way to
cherish doing what you like as opposed to
what you have to.
30. The Apache Software Foundation
Apache software powers 65% of web sites worldwide
501(c)(3) non-profit foundation
Reasons for creating ASF
Create legal entity
Protect contributors from liability
Protect Apache assets
Membership: individual
Apache Incubator
Governance and Staffing
Board of Directors
Project Management Committees
ASF Members
Committers
Contributors
Funding
All-volunteer staffing/development resources
Donations
Corporate investment
31. Apache Way:
Beyond Open Source, Open Community
Transparency
Decision-making and actions are observable
Events of interest are published and recorded
Transparency invites collaboration
Meritocratic Governance
Influence on decisions is based on merit
Merit is earned in public
Community based governance
Community
Common interest, Community interest, Common
experience
“Community before code”
Collaboration
Systems supporting communication and coordination:
repositories, trackers, forums, build tools
You can reuse what you can see and influence
More eyeballs means better quality
32. Apache Organization
• Apache is a meritocratic organization
– Merit does not expire. You earn your keep and your credentials
• Start out as Contributor
– Patches, mailing list comments, testing, documentation, etc.
– No commit access
• Move onto Committer
– Commit access, evolve the code
• PMC Members
– Have binding VOTEs on releases/personnel
• Officer (VP, Project)
– PMC Chair
• ASF Member
– Have binding VOTE in the state of the foundation
– Elect Board of Directors
• Director
– Oversight of projects, foundation activities
33. Our experience with Apache ..
Give up control and get back contributions.
Being in Apache by itself doesn’t guarantee sustainability, but it opens
doors for sustainability.
Google Summer of Code has brought in students, increased
documentation, identified confined projects.
We do not have to worry about getting sued by Oracle for using Java
APIs. We stand behind a shield of expert lawyers.
Companies make in-kind contributions; some have concrete plans,
some are just evangelizing. Both are good.
Today’s cyberinfrastructure ecosystem is not in a funding
situation to work on parallel independent implementations.
Shared implementation is hard to achieve, but well-thought-out
architectures can achieve it.
Also encourage multiple implementations and let the communities
sort it out. The winner sustains. Example: Apache Axis2, Apache
CXF
34. Apache Contributions Aren’t Just
Software
• Apache committers and PMC members
aren’t just code writers.
• Successful communities also include
– Important users
– Project evangelists
– Content providers: documentation, tutorials
– Testers, requirements providers, and
constructive complainers (using Jira and mailing lists)
– Anything else that needs doing.
39. Key Airavata Features
Graphical user interface to construct, execute, control,
manage and reuse scientific workflows.
Desktop tools and browser-based web interface
components to manage applications, workflows and
generated data.
Sophisticated server-side tools to register, schedule and
manage scientific applications on high performance
computational resources.
Ability to Interface and interoperate with various external
(third party) data, workflow and provenance
management tools.
40. A Classic Scientific Workflow
Workflows are composite applications built out of
independent parts.
Parts are executables wrapped as network accessible services
The classic example is that codes A, B, and C need to
be executed in a specific sequence.
A, B, C: parallel codes compiled and executable on a cluster,
supercomputer, etc. by schedulers.
A, B, and C do not need to be co-located
A, B, and C may be sequential or parallel
A, B, and C may have data or control dependencies
Data may need to be staged in and out
Some variations on ABC:
Conditional execution branches
Dynamic execution resource binding
Iterations (Do-while, For-Each) over all or parts of the sequence
Triggers, events, data streams
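The classic A, B, C sequence can be sketched as a small dependency graph executed in topological order. This is a minimal sketch using Python's standard `graphlib` module (Python 3.9+); the task functions, the `run_workflow` helper and the toy arithmetic are assumptions for illustration and stand in for real executables wrapped as services.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_workflow(tasks: Dict[str, Callable[[List[int]], int]],
                 dependencies: Dict[str, Set[str]],
                 initial_input: int) -> Dict[str, int]:
    """Run each task after its dependencies, passing their outputs as inputs."""
    results: Dict[str, int] = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = [results[dep] for dep in sorted(dependencies.get(name, ()))]
        # Tasks without dependencies receive the workflow's initial input.
        inputs = upstream if upstream else [initial_input]
        results[name] = tasks[name](inputs)
    return results


# Toy stand-ins for codes A, B and C.
tasks = {
    "A": lambda xs: xs[0] + 1,
    "B": lambda xs: xs[0] * 10,
    "C": lambda xs: xs[0] - 5,
}
# B depends on A's output, C depends on B's output.
dependencies = {"A": set(), "B": {"A"}, "C": {"B"}}

results = run_workflow(tasks, dependencies, initial_input=1)
```

The variations listed on the slide (branches, iterations, dynamic resource binding) would need a richer model than this static graph.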
41. Challenges in Scientific Workflows
Accommodating a wide range of
execution patterns
Iterations: for-each, do-while, dot and
Cartesian products
Interactivity, adaptivity, non-determinism
Accommodating errors and
uncertainties
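The difference between dot-product and Cartesian-product iteration over input lists can be sketched as follows. The function names and the sample parameters are hypothetical, chosen only for this example.

```python
from itertools import product
from typing import List, Sequence, Tuple


def dot_product(*inputs: Sequence) -> List[Tuple]:
    """Pair the i-th element of every input list: one run per position."""
    if len({len(values) for values in inputs}) > 1:
        raise ValueError("dot product needs input lists of equal length")
    return list(zip(*inputs))


def cartesian_product(*inputs: Sequence) -> List[Tuple]:
    """Combine every element with every other: one run per combination."""
    return list(product(*inputs))


temperatures = [280, 300]
pressures = [1.0, 2.0]

dot_runs = dot_product(temperatures, pressures)
cartesian_runs = cartesian_product(temperatures, pressures)
```

With two lists of two values each, the dot product yields two runs and the Cartesian product yields four, which is why the choice of iteration pattern matters for the cost of a parameter study.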
42. NextGen Workflow Systems:
Need for Interactivity Across Layers
Scientific workflow systems and compiled
workflow languages have focused on
modeling, scheduling, data movement,
dynamic service creation and monitoring of
workflows.
Building on these foundations, Airavata
extends to an interactive and flexible workflow
system.
Airavata Workflow Features include:
interactive ways of interfering and steering the
workflow execution
interpreted workflow execution model
high level instruction set
flexibility to execute individual workflow activity and
wait for further analysis.
43. Interactivity Contd.
Derivations during workflow execution
that do not affect the structure of the
workflow:
Dynamic change of workflow inputs, workflow rerun.
Interpreted workflow execution model.
Dynamic change in the point of execution, workflow
smart rerun.
Fault handling and exception models.
Derivations that change the workflow
DAG during runtime:
Reconfiguration of an activity.
Dynamic addition of activities to the workflow.
Dynamic removal or replacement of an activity in the
workflow.
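An interpreted execution model makes these runtime changes possible because the workflow is walked one activity at a time rather than compiled up front. The sketch below is a toy interpreter, not Airavata's workflow engine; the class `InterpretedWorkflow` and its methods are assumptions for illustration.

```python
from typing import Any, Callable, List, Tuple

Activity = Tuple[str, Callable[[Any], Any]]


class InterpretedWorkflow:
    """Toy interpreter: runs one activity per step and allows edits between steps."""

    def __init__(self, activities: List[Activity], initial_input: Any = None):
        self.activities = list(activities)
        self.position = 0            # index of the next activity to run
        self.value = initial_input   # output of the most recent activity
        self.history: List[Tuple[str, Any]] = []

    def step(self) -> bool:
        """Run the next activity; return False when nothing is left."""
        if self.position >= len(self.activities):
            return False
        name, func = self.activities[self.position]
        self.value = func(self.value)
        self.history.append((name, self.value))
        self.position += 1
        return True

    def add_next(self, name: str, func: Callable[[Any], Any]) -> None:
        """Dynamically add an activity right before the next pending one."""
        self.activities.insert(self.position, (name, func))

    def replace(self, name: str, func: Callable[[Any], Any]) -> None:
        """Dynamically replace a pending activity that has not run yet."""
        for index in range(self.position, len(self.activities)):
            if self.activities[index][0] == name:
                self.activities[index] = (name, func)
                return
        raise KeyError(name)

    def run_to_end(self) -> Any:
        while self.step():
            pass
        return self.value


wf = InterpretedWorkflow([("load", lambda _: 3), ("double", lambda x: x * 2)])
wf.step()                                   # run "load", then pause to inspect
wf.add_next("add_one", lambda x: x + 1)     # steer: insert an extra activity
final_value = wf.run_to_end()
```

Pausing after a step, inspecting `wf.value`, and then editing the pending activities corresponds to the "execute an individual activity and wait for further analysis" idea from the previous slide.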
44. Interactivity
Mathematical uncertainty:
PDEs from domain problems do not have analytical solutions, so numerical
methods are used to find solutions.
These solvers may not converge, depending on the method, PDE system, initial
conditions and expected output tolerances.
Statistical techniques lead to nondeterministic results.
Closer observation of computational output ensures acceptability of results.
Domain uncertainty:
Running against a range of parameter values in an attempt to find
the most appropriate input set.
An initial execution provides an estimate of the accuracy of the inputs and
facilitates further refinement.
Outputs are diverse and nondeterministic.
Resource uncertainty:
Failures in distributed systems are the norm rather than the exception.
Transient failures can be retried if the computation is side-effect free/idempotent.
Persistent failures require migration.
Real-time model refinement:
Real-time event processing systems do not have data available prior to
initialization of the model.
Models evolve over time and can take advantage of more and more events
as they become available.
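The resource-uncertainty points (retry transient failures, migrate on persistent ones) can be sketched as a small recovery loop. This is a minimal sketch under stated assumptions: the exception classes, `run_with_recovery`, the resource names and the flaky task are all hypothetical, and retrying is only safe when the task is idempotent.

```python
from typing import Callable, List


class TransientError(Exception):
    """Failure that may disappear on retry (for example, a timeout)."""


class PersistentError(Exception):
    """Failure that will not go away on the same resource."""


def run_with_recovery(task: Callable[[str], str],
                      resources: List[str],
                      max_attempts: int = 3) -> str:
    """Retry transient failures on a resource; migrate on persistent ones."""
    for resource in resources:
        for _ in range(max_attempts):
            try:
                return task(resource)
            except TransientError:
                continue   # same resource, try again (task must be idempotent)
            except PersistentError:
                break      # give up on this resource and migrate to the next
    raise RuntimeError("task failed on all resources")


calls: List[str] = []


def flaky_task(resource: str) -> str:
    """Toy task: cluster-a is down; cluster-b succeeds on its third call."""
    calls.append(resource)
    if resource == "cluster-a":
        raise PersistentError("queue unavailable")
    if calls.count(resource) < 3:
        raise TransientError("timeout")
    return f"done on {resource}"


result = run_with_recovery(flaky_task, ["cluster-a", "cluster-b"])
```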
45. Illustrating Interactivity
Figure (diagram labels only): Uncertainties: Mathematical, Domain, Resource, Model Refinement.
Orchestration-level Interactions and Job-level Interactions: Parametric Sweeps; Provenance; Workflow Steering; Job launch, gliding; Checkpoint/Restart; Asynchronous refinements; Application Steering.
46. Apache Airavata in Action
Domain: Description
Astronomy: Image processing pipeline for One Degree Imager instrument on XSEDE
Astrophysics: Supporting workflow of Dark Energy Survey simulations working group on XSEDE
Bioinformatics: Supported workflow executions on Amazon EC2 for BioVLAB project
Biophysics: Manage large scale data analysis of analytical ultracentrifugation experiments on XSEDE and campus resources
Computational Chemistry: Manage workflows to support computational chemistry parameter studies for ParamChem.org on XSEDE
Nuclear Physics: Workflows for nuclear structure calculations using Leadership Class Configuration Interaction (LCCI) computations on DOE resources
47. What will you hear today?
Google Summer of Code
Reward yourself with $5000 while making a case
for future employment and graduate school
admissions
48. How to crack GSoC?
• Step 1: Engage early
• Step 2: Familiarize yourself with projects
• Step 3: Propose ideas
• Step 4: Win, code, earn… and cherish!
49. Be Part of the project
Community
• Play with different popular open source software ..
• Experiment with the emerging technologies …
• Learn & Engage with a multidisciplinary community..
55. Join the mailing list
Google Group - sgw-gsoc-discuss:
https://groups.google.com/d/forum/sgw-gsoc-discuss
Need more info – smarru@apache.org
Editor's notes
Providing capabilities and services beyond flops. We provide the integrated environment allowing for the coherent use of the various resources and services supported by NSF.
Most popular these days is CIPRES: Phylogeny (Mark Miller)