This course introduces students to the problems and issues involved in managing large data sets, focusing on modeling, storing, searching, and transforming large collections of data for analysis. The course covers database management and information retrieval systems, including relational database systems, massively parallel and distributed computation models (e.g., MapReduce/Hadoop), and various NoSQL systems (e.g., key-value, document, column-family, and graph stores) designed to handle extremely large-scale and complex data collections. Emphasis is placed on applying large-scale data management techniques to particular domains. Programming projects are required.
Introduction to Data Science: Management