The massive amount of data being collected by industries, retailers, and organizations requires knowledgeable professionals who can manage, process, and analyze this information to identify and understand trends and to make meaningful business decisions.
Big data is noted for its volume, variety of data types, and rapid accumulation. The term has become a catchphrase for data collections so large that they are not amenable to processing or analysis using traditional database and software techniques. The advanced certificate in big data analytics is a multidisciplinary program intended for professionals with BS degrees in computing or other diverse fields – such as finance, retail, science, engineering, or manufacturing – where knowledge of data analysis is in demand.
The advanced certificate is also meant for students who would like a formal qualification in this area. The program allows professionals with a bachelor's degree to enhance their career opportunities and professional knowledge with targeted graduate course work in a focused area without making a commitment to an MS program.
The goal of the program is to develop expertise in managing and analyzing big data. The curriculum consists of two required courses and two elective courses selected by the student in topic areas related to big data.
Big data analytics, advanced certificate, typical course sequence
Introduction to Big Data
This course provides a broad introduction to the exploration and management of large datasets being generated and used in the modern world. First, practical techniques used in exploratory data analysis and mining are introduced; topics include data preparation, visualization, statistics for understanding data, and grouping and prediction techniques. Second, approaches used to store, retrieve, and manage data in the real world are presented; topics include traditional database systems, query languages, and data integrity and quality. Case studies will examine issues in data capture, organization, storage, retrieval, visualization, and analysis in diverse settings such as urban crime, drug research, census data, social networking, and space exploration. Big data exploration and management projects, a term paper and a presentation are required.
Big Data Analytics
This course provides a graduate-level introduction to the concepts and techniques used in data mining. Topics include the knowledge discovery process; prototype development and building data mining models; current issues and application domains for data mining; and legal and ethical issues involved in collecting and mining data. Both algorithmic and application issues are emphasized to permit students to gain the knowledge needed to conduct research in data mining and apply data mining techniques in practical applications. Data mining projects, a term paper, and presentations are required.
Database System Implementation
This course provides a broad introduction to database management systems including data modeling, the relational model, and SQL. Database system implementation issues are covered next, where the focus is on data structures and algorithms used to implement database management systems. Topics include physical data organizations, indexing and hashing, query processing and optimization, database recovery techniques, transaction management, concurrency control, and database performance evaluation. Current research topics in database system implementation are also explored. Programming projects, a term paper, and presentations will be required.
Secure Data Management
This course examines policies, methods, and mechanisms for securing enterprise and personal data and ensuring data privacy. Topics include data integrity and confidentiality; access control models; secure database architectures; secure transaction processing; information flow, aggregation, and inference controls; auditing; securing data in contemporary (relational, XML, and other NoSQL) database systems; data privacy; and legal and ethical issues in data protection. Programming projects are required.
This course is an introduction to the study of distributed systems. It covers distributed system architectures such as client-server and peer-to-peer; distributed system design issues such as communication, fault tolerance, coordination, and deadlock; distributed system middleware such as remote method invocation (RMI) and tuple space; and the theory of distributed algorithms such as logical clocks and leader election. Programming projects are required.
Foundations of Parallel Computing
This course is a study of the hardware and software issues in parallel computing. Topics include an introduction to the basic concepts, parallel architectures and network topologies, parallel algorithms, parallel metrics, parallel languages, granularity, applications, and parallel program design and debugging. Students will become familiar with various types of parallel architectures and programming environments.
Data Cleaning and Preparation
This course provides an introduction to the concepts and techniques used in preparing data for subsequent data mining. Topics include the knowledge discovery process; data exploration and its role; data extraction, cleaning, integration and transformation; handling numeric, unstructured, text, web, and other forms of data; and ethical issues underlying data preparation and mining. Data cleaning projects, a term paper, and presentations are required.
Topics in Data Management
This course examines current topics in data management. It is intended to allow faculty to pilot potential new graduate offerings. Specific course details (such as prerequisites, course topics, format, learning outcomes, assessment methods, and resource needs) will be determined by the faculty member(s) who propose a specific topics course in this area. Specific course instances will be identified as belonging to the Data Management cluster, the Security cluster, or both clusters.
This course covers the purpose, scope, capabilities, and processes used in data warehousing technologies for the management and analysis of data. Students will be introduced to the theory of data warehousing, dimensional data modeling, the extract/transform/load process, warehouse implementation, dimensional data analysis, and summary data management. The basics of data mining and importance of data security will also be discussed. Hands-on exercises include implementing a data warehouse.
Data-driven Knowledge Discovery
Rapidly expanding collections of data from all areas of society are becoming available in digital form. Computer-based methods are available to facilitate discovering new information and knowledge that is embedded in these collections of data. This course provides students with an introduction to the use of these data analytic methods, with a focus on statistical learning models, within the context of the data-driven knowledge discovery process. Topics include motivations for data-driven discovery, sources of discoverable knowledge (e.g., data, text, the web, maps), data selection and retrieval, data transformation, computer-based methods for data-driven discovery, and interpretation of results. Emphasis is placed on the application of knowledge discovery methods to specific domains.
To be considered for admission to the advanced certificate in big data analytics, candidates must fulfill the following requirements: