CS 430: Introduction to Big Data and Data Mining

Class Program
Credits 3

This course provides an introduction to the concepts, techniques, and applications behind data mining, text mining, and web mining on big data sets. It presents techniques for discovering patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. This course is designed for computer science students, business students, and professionals in other fields who require skills in analyzing large data sets, including stream data, sequence data, graph-structured data, social network data, and multirelational data. Topics include data preprocessing, data warehousing, OLAP and data cubes, association and correlation rules, classification, decision trees, clustering, prediction, and anomaly detection. This course will also introduce state-of-the-art Big Data software such as Apache Hadoop. (Spring)