“The world’s most valuable resource is no longer oil, but data.” - The Economist
Program Overview & Objectives
Big Data has become a catchphrase for data so large that it is not amenable to processing or analysis using traditional database and software techniques; such data is noted for its volume, its variety of data types, and its rapid accumulation. The Big Data Professional Diploma is a multidisciplinary program intended for professionals in diverse fields such as finance, retail, science, engineering, or manufacturing who need to analyze Big Data.
The professional diploma will:
- Examine how to explore and manage the large datasets generated and used in the modern world.
- Introduce practical techniques used in exploratory data analysis and mining, including data preparation, visualization, statistics for understanding data, and grouping and prediction techniques.
- Present approaches used to store, retrieve, and manage data in the real world, including traditional database systems, query languages, and data integrity and quality.
- Examine how to use big data analytics to support data-driven decision making.
- Discuss case studies that examine issues in data capture, organization, storage, retrieval, visualization, and analysis in diverse settings such as drug research, census data, social networking, finance, urban planning, energy, mobility, manufacturing, urban crime, security, and management projects.
Modules are designed to use the most practical approach to deliver the material, using the most effective learning techniques. Participants can expect the following during the course of the program:
- Lecture modules delivered by subject matter experts.
- Use of cutting-edge tools on various topics covered in the program.
- Access to a research repository on the subject.
- Case study discussions.
- Classroom activities.
- Project work.
Program Modules & Timings
- 4 modules, 3 days each
- Duration: September - December 2017
Exact dates will be determined at a later stage based on participants' feedback.
Module 1 (26 - 28 September, 2017) Introduction to data science and its applications
What is data science, what does a data scientist do and where do we use data science?
This module starts with an introduction to different types of data and the specific tasks required when dealing with big data. We will present available technology and data science tools for both software developers and non-developers, discuss the main responsibilities of a data scientist, and review the main application domains related to data science.
- Structured and unstructured data, static and streaming data, data acquisition, storage and management, data mining, web scraping, data cleaning.
- Brief review of algorithms, programming, distributed computing, graph analysis and modeling, statistical analysis, natural language processing, machine learning, deep learning, artificial intelligence, cloud tools, APIs.
- Interpreting data, ethics, presenting and communicating results.
- Data science tools: R, Python, SAS, SPSS, Stata, Matlab, SQL, Hadoop, MapReduce, Cloudera, Hive, Pig, Spark, HTML, Java, C/C++, XML, Ruby, Perl, Julia, OpenRefine, DataCleaner, Data Mining, RapidMiner, Scala, Excel, Tableau, BigML, Plotly, Palladio, TensorFlow, etc.
- Application domains: applications of data science to other disciplines, such as business (business intelligence - BI), criminal justice, health care, industry, Internet of Things (IoT, IIoT), politics, etc.
Module 2 (24 - 26 October, 2017) Programming and visualizations in Data Science
What are some specific algorithms used in big data and what visualization techniques and tools help us present our results in the most efficient way?
This module starts with an overview of different visualization principles, then an introduction to algorithmic thinking and programming, followed by visualizations and specific tools that help a data scientist tell the story behind the data.
- Discussion of different ways to visualize data.
- Algorithms and coding, data mining, dimensionality reduction, streaming algorithms, parallel computing, classification and clustering, machine learning, artificial intelligence.
- Data visual analysis and visualization tools: ggplot in R, Tableau, QlikView, Silk, GIS, JMP, SAS.
Module 3 (28 - 30 November, 2017) Statistics and Predictive Analytics
What are the main statistical approaches and tools in examining big data?
This module focuses on computational and statistical methods used in analyzing big data, presenting specific software for programmers and non-programmers.
- Fundamental statistics (descriptive and inferential statistics, statistical tests, regression, statistical modeling and fitting), Bayesian thinking, pattern recognition and machine learning, Markov chains, time series, neural networks, classification and clustering.
- Statistical tools: R, Python, Matlab, Excel, TensorFlow, Theano/Pylearn2, SAS, SPSS, JMP, BigML, etc.
Module 4 (19 - 21 December, 2017) Data-driven decision making
What are some specific real-life applications of big data in management science?
This module presents everyday problems from areas like management, marketing, business or industry. Participants will learn modeling techniques and will work both independently and in teams on various projects.
- Linear programming and optimization, network models, project scheduling, transportation and assignment problems, simulation, Monte Carlo simulations, decision analysis, performance management, time series and forecasting, Markov processes.
- Software: R, LINDO/LINGO, Matlab, Mathematica, Maple, OptaPlanner, Xpress MP, CPLEX, Gurobi, Tora.
Subject Matter Experts:
Dr. Mihail Barbosu
Dr. Mihail Barbosu completed his Ph.D. in France at Paris 6 University and Paris Observatory. He is Professor in the School of Mathematical Sciences and Director of the Data and Predictive Analytics Center at RIT. Previously he was Head of the School of Mathematical Sciences at RIT and Chair of the Department of Mathematics at State University of New York at Brockport.
Dr. Barbosu’s experience includes Mathematical Modeling, Data and Predictive Analytics, Academic Management, Dynamical Systems and Space Dynamics.
Dr. Hans-Peter Bischof
Dr. Hans-Peter Bischof received his Ph.D. in Computer Science from the University of Osnabrück, Germany. He is Professor and Chair of the Computer Science Master Program at RIT and a member of RIT’s Center for Computational Relativity and Gravitation. His expertise is in Visualization of Scientific Data, Distributed Systems, and High Performance Computing.
Dr. Ernest Fokoué
Dr. Ernest Fokoué earned his Ph.D. in Statistics from the University of Glasgow, United Kingdom. He is an Associate Professor in the School of Mathematical Sciences at RIT; prior to joining RIT, he was a faculty member in the Mathematics Department at Kettering University in Flint, Michigan.
Dr. Fokoué has extensive experience in Statistical Machine Learning and Data Science, with a strong leaning towards the Bayesian Statistical Paradigm and the Regularization Framework of Learning.