Big Data and Computer Science Education
1. Big Data Meets Computer Science
Jim Hendler
Tetherless World Professor of Computer, Web and Data Sciences
Director, Rensselaer Institute for Data Exploration and Applications
@jahendler
3. The Rensselaer IDEA
… Across Applications (corresponding to Challenges Identified in the
Rensselaer Plan 2024)
• Healthcare Analytics
• Business Systems
• Built and Natural Environments
• Virtual and Augmented Reality
• Cyber-Resiliency
• Policy, Ethics and Open Government
• Materials Informatics
• Data-driven Physical/Life Sciences
4. The Rensselaer IDEA
Developing a Comprehensive “Data Science” Research Agenda
P. Fox and J. Hendler, The Science of Data Science, Big Data, 2(2), in press
5. The Rensselaer IDEA
Graduate Projects in IDEA
• IDEA and CCI (HPC): technologies to enable
Rensselaer researchers to work with data at larger
scales and in new ways
– Population-scale cognitive computing models for
“human intensive” agent-based simulations
• IDEA and EMPAC (Performing arts center): provide
next generation data exploration tools
– Multi-person data visualization tools for big-data
applications
• IDEA and Watson: New direction in Cognitive
Computation
– How do we go from Question/Answering to Open Web
Data exploration?
• IDEA and CBIS (Ctr for Biotechnology &
Interdisciplinary Studies): Data-driven Informatics
– Can we couple semantics and big data to find new medical
uses for already approved drugs?
6. The Rensselaer IDEA
External Projects and partnerships
Emergency Room Care
Language and Agents
Large-scale Healthcare Analytics
In Discussion Jumpstart (Proposal underway)
Built and Natural Biome data-driven science and engineering
Cognitive Computing Collaborative Research Initiative
7. Campus Data Infrastructure
Metadata
• Title
• Author
• Author Email
• Licence
• Subject
• Keyword
• Data Type
Figure labels: Dataset; CDF
RPI Research Object Registration and Deposit (RPI Object Deposit, RPI-ID Request)
• Allocate a universally accessible RPI-ID
• Register Metadata
• Upload Any Data
RPI Research Collaboration and Community Network (RPI Research Network, RPI-ID Request)
• Share Knowledge
• Join Network
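The registration steps on this slide (allocate an ID, register metadata, upload data) can be pictured with a small Python sketch. Only the metadata field names come from the slide; the class name, the function name, the in-memory registry, and the "rpi-id:" identifier format are all hypothetical stand-ins, not the actual campus system.

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import Dict, List


# Field names mirror the metadata bullets on the slide; everything else
# (class name, function name, identifier format) is hypothetical.
@dataclass
class ResearchObjectMetadata:
    title: str
    author: str
    author_email: str
    licence: str
    subject: str
    keywords: List[str] = field(default_factory=list)
    data_type: str = ""


def register_research_object(metadata: ResearchObjectMetadata,
                             registry: Dict[str, dict]) -> str:
    """Check that every field is filled in, allocate a made-up ID, store the record."""
    record = asdict(metadata)
    missing = [name for name, value in record.items() if not value]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    # Placeholder scheme only; the real RPI-ID format is not given in the slides.
    object_id = "rpi-id:" + uuid.uuid4().hex
    registry[object_id] = record
    return object_id


# Usage: a plain dict stands in for the campus registry.
registry: Dict[str, dict] = {}
example_id = register_research_object(
    ResearchObjectMetadata(
        title="Example dataset",
        author="A. Researcher",
        author_email="researcher@example.org",
        licence="CC-BY-4.0",
        subject="Example subject",
        keywords=["example"],
        data_type="CSV",
    ),
    registry,
)
```

Treating every field as required is a choice made for this sketch; a real deposit service would likely distinguish required from optional metadata.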
8. Requires going Beyond the Database
Discovery
Integrate
Visualize
Explain
Thinking outside the Database box
Strata talk, 2013 - https://www.youtube.com/watch?v=Cob5oltMGMc
9. At new scales (and in new ways)
Fox and Hendler, Changing the Equation on Scientific Visualization,
Science, 2/11 - http://www.sciencemag.org/content/331/6018/705.short
10. A Whole New World
• But what about undergraduate
education
– where do we train the students who can
take on projects needing
• statistics and analytics
• informatics
• data science challenges
• machine learning
• unstructured data
• cognitive computation
• …
11. Computer Science Education?
• Programming is a necessary skill
– not sufficient
• and we mostly teach it wrong…
– (For my heresies about teaching programming, see
“Let’s Help Computer Science Students Crack the
Code,” 3/13, http://chronicle.com/article/Lets-Help-Computer-Science/137649/ )
• The computing environment of today is nothing like
the computing environment of the 70s,
– but the curriculum hasn’t changed much since I was in
school
– but the fundamentals are NOT all the same
– data-oriented computations involve graphs, memory
intensive algorithms, machine learning, …
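As one small illustration of the graph-based computation mentioned on this slide, here is a minimal breadth-first search sketch in Python. The toy graph and the function name are invented for the example and are not taken from any course material.

```python
from collections import deque


def shortest_hops(graph, start):
    """Breadth-first search: fewest edges from start to every reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:  # first visit is the shortest route
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# A made-up undirected graph stored as an adjacency list.
example_graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

hops = shortest_hops(example_graph, "a")
# hops == {"a": 0, "b": 1, "c": 1, "d": 2, "e": 3}
```

The adjacency-list dictionary keeps memory proportional to the number of edges, which is the kind of trade-off that matters once the graph no longer fits comfortably in memory.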
12. Deploying these ideas at RPI
• Innovation in the interdisciplinary Information
Technology Program
– Renamed Information Technology and Web
Science, 2011
• for more on Web Science, see
– Berners-Lee et al., Creating a Science of the World Wide Web,
Science, 2006,
https://www.sciencemag.org/content/313/5788/769.summary;
– Hendler et al., Web Science: An interdisciplinary Approach to
Understanding the Web, CACM, 7/2008,
http://cacm.acm.org/magazines/2008/7/5366-web-science/fulltext
13. IT and Web Science
• First IT academic program in U.S.
• First web science degree program in
U.S.; First undergraduate web science
degree anywhere
• BS in ITWS (20 concentrations) and MS
in IT (10 concentrations)
• PhD in Multi-Disciplinary Sciences
• http://itws.rpi.edu
– I was Director 2008-2012
– Now directed by Peter Fox (whose slides I stole
for this section)
14. Technical Tracks: Courses and Concentrations
Computer Engineering Track
Courses:
1) ECSE-2610 Computer Components and Operations
2) ENGR-2350 Embedded Control
3) ECSE-2660 Computer Architecture, Networking and
Operating Systems
Concentrations:
• Civil Engineering
• Computer Hardware
• Computer Networking (hardware focus)
• Mechanical/Aeronautical Eng.
Computer Science Track
Courses:
1) CSCI-2200 Foundations of Computer Science
2) CSCI-2300 Introduction to Algorithms
3) CSCI-2500 Computer Organization
Concentrations:
• Cognitive Science
• Computer Networking (software focus)
• Information Security
• Machine and Computational Learning
Information Systems Track
Courses:
1) CSCI-2200 Foundations of Computer Science
2) CSCI-2500 Computer Organization
3) Four credits from the following:
• CSCI-2220 Programming in Java (2 credits)
• CSCI-2961 Programming in Python (2 credits)
• CSCI-2300 Introduction to Algorithms (4 credits)
• ITWS-49XX Web Systems Development II (4 credits)
Concentrations:
• Arts
• Communication
• Economics
• Entrepreneurship
• Finance
• Management Information Systems
• Medicine
• Pre-law
• Psychology
• STS
Web Science Track
Courses:
1) CSCI-2200 Foundations of Computer Science
2) CSCI-2500 Computer Organization
3) One of the following:
• CSCI-49XX Web Systems Development II
• Web/Data Course approved by ITWS Curriculum
Committee
Concentrations:
• Data Science
• Science Informatics
• Web Technologies
15. CHANGES TO THE MASTER’S IN INFORMATION TECHNOLOGY PROGRAM
• In Spring 2013 the MS in IT core curriculum was revised
to include Data Analytics.
• Networking core classes were replaced with Data
Analytics core classes: Data Science, Database Mining,
X-informatics, and Data Analytics (a new class offered in
Spring 2014).
• The MS in IT program also added two new
concentrations: Data Science and Analytics and
Information Dominance.
• The Information Dominance concentration was
developed for a new Navy program that will train a
select group of 5-10 naval officers a year in the skills
needed for military cyberspace operations. Two officers
started in Fall 2013 and three began in Spring 2014.
16. IT Core Area Course Number Course Title Term(s) Offered
Database Systems CSCI-4380 Database Systems Fall/Spring
Data Analytics ITWS-6350 Data Science Fall
Software Design and Engineering CSCI-4440 Software Design and Documentation Fall
ITWS-6400 X-Informatics Spring
Management of Technology* ITWS-6300 Business Issues for Engineers and Scientists (Professional Track Only) Fall/Spring
Human Computer Interaction COMM-6420 Foundations of HCI Usability Fall
COMM-696X Human Media Interaction Spring
MS in IT Required Core Courses
* For the research track, replace ITWS-6300 Business Issues for Engineers and Scientists with one of the two semester courses ITWS-6980 Master’s Project or ITWS-6990 Master’s Thesis.
Advanced Core options for students who have previously completed a Core Course
IT Core Area Course Number Course Title Term(s) Offered
Database Systems
CSCI-6390 Database Mining Fall
ITWS-6350 Data Science Fall
ITWS-696X Semantic E-Science Fall
Data Analytics
CSCI-6390 Database Mining Fall
ITWS-6400 X-Informatics Spring
ITWS-696X Data Analytics Spring
Software Design
CSCI-6500 Distributed Computing Over the Internet Fall
ECSE-6780 Software Engineering II Fall
ITWS-696X Semantic E-Science Fall
Management of Technology
MGMT-6080 Networks, Innovation and Value Creation Fall
MGMT-6140 Information Systems for Management Spring
Human Computer Interaction
COMM-6620 Information Architecture Spring
COMM-6770 User-Centered Design Fall
COMM-696X Interactive Media Design Summer
17. Concentration Course Number Course Name Term(s) Offered
Data Science and Analytics
Data and information analytics extends analysis (descriptive and
predictive models to obtain knowledge from data) by using
insight from analyses to recommend action or to guide and
communicate decision-making. Thus, analytics is not so much
concerned with individual analyses or analysis steps, but with an
entire methodology. Key topics include advanced statistical
computing theory, multivariate analysis, and applications of
computer science topics such as data mining, machine
learning, and change detection (uncovering unexpected
patterns in data).
Select two or three of the following courses:
ITWS-6350 Data Science Fall
ITWS-6400 X-Informatics Spring
ITWS-696X Data Analytics Spring
ITWS-696X Semantic E-Science Fall
ITWS-696X Advanced Semantic Technologies* Spring
If only two of the above were chosen, select one more of
the following courses:
COMM-6620 Information Architecture Spring
CSCI-4020 Computer Algorithms Spring
CSCI-4150 Introduction to AI Fall
CSCI-6390 Database Mining Fall
CSCI-4220 or CSCI-6220 Network Programming or Parallel Algorithm Design Spring
ISYE-4220 Optimization Algorithms and Applications Fall
ISYE-6180 Knowledge Discovery with Data Mining Spring
MGMT-696X Technology Foundations for Business Analytics Fall
MGMT-696X Predictive Analytics Using Social Media Spring
Concentration Course Number Course Name Term(s) Offered
Information Dominance
The Information Dominance concentration prepares students for
careers designing, building, and managing secure information
systems and networks. The concentration includes advanced
study in encryption and network security, formal models and
policies for access control in databases and application systems,
secure coding techniques, and other related information
assurance topics. The combination of coursework provides
comprehensive coverage of issues and solutions for utilizing
high assurance systems for tactical decision-making. It
prepares students for careers ranging from secure information
systems analyst, to information security engineer, to field
information manager and chief information officer. It is also
appropriate for all IT professionals who want to enhance their
knowledge of how to use pervasive information in situational
awareness, operations scenarios, and decision-making.
Select two or three of the following courses:
ISYE-6180 Knowledge Discovery with Data Mining Spring
CSCI-6960 Cryptography and Network Security I Fall
ITWS-4370 Information System Security Spring
CSCI-4650 Networking Laboratory I Fall/Spring
MGMT-7760 Risk Management Fall
ISYE-4310 Ethics of Modeling for Industrial Systems Engineering Fall
If only two of the above were chosen, select one more of the
following courses:
CSCI-6390 Database Mining Fall
CSCI-6968 Cryptography and Network Security II Spring
CSCI-4660 Networking Laboratory II Fall/Spring
ECSE-6860 Evaluation Methods for Decision Making Fall
ISYE-6500 Information and Decision Technologies for Industrial and Service Systems Fall/Spring
CSCI-496X Computational Analysis of Social Processes Fall
Two New MS in IT Concentrations
18. Also at RPI
• Data Science Research Center and Data Science
Education Center (dsrc.rpi.edu, 2009)
• http://www.rpi.edu/about/inside/issue/v4n17/datacenter.html
– Over 45: research faculty, post-docs, grad students, staff,
undergraduates…
• Data is one of the Rensselaer Plan’s five thrusts
• Other key faculty
– Fran Berman (Center for Digital Society and RDA)
– Bulent Yener (DSRC Director)
– Peter Fox (ITWS Director)
19. More RPI Curricula
• Environmental Science with Geoinformatics
concentration
• Bio, geo, chem, astro, materials - informatics
• GIS for Science
• Visualization (new summer program)
• Multi-disciplinary science program - PhD in
Data and Web Science
• DATUM: Data in Undergraduate Math! (Bennett)
• Missing – intermediate statistics
• Graphs – significant potential here – must teach!
20. 5-6 years in…
• Science and interdisciplinary from the start!
– Not a question of: do we train scientists to be
technical/data people, or do we train technical
people to learn the science
– It’s a skill/course-level approach that is needed
• We teach methodology and principles over
technology
• Data science must be a skill, as natural as
using instruments or writing/using code
• Team/collaboration aspects are key
• Foundations and theory must be taught
– for data, as well as programming