The course covers the basic concepts and principles behind the main data science models and techniques. Descriptive techniques such as clustering and frequent pattern mining are explained in more detail, while for predictive techniques the focus is mainly on the concept of a model, its parameters and hyper-parameters, and the quality and validation of models, including overfitting-underfitting and the bias-variance trade-off. Data quality and pre-processing issues related to various data types and modeling problems are also addressed. Topics covered:
– Introduction to bi-variate and multi-variate analysis
– Data visualization and exploratory data analysis
– Dimensionality reduction and feature selection
– Clustering: k-means, agglomerative, DBSCAN, cluster validation
– Frequent Pattern Mining: itemsets, association rules, quality measures
– Linear Classification and Regression: model, parameters and hyper-parameters, validation, overfitting-underfitting and the bias-variance trade-off
https://neptun.elte.hu/MobilityCourses?Faculty=&Programme=&AcademicTerm=&Published=&SearchText=Introduction+to+data+science
At the end of the course, the learner will be able to apply bi-variate and multi-variate analysis techniques, use data visualization and exploratory data analysis to investigate datasets, select dimensionality reduction and feature selection methods, implement and validate clustering algorithms, discover and assess frequent itemsets and association rules, and develop, tune, and validate linear classification and regression models while addressing overfitting, underfitting, and the bias-variance trade-off.
Learners are expected to have basic knowledge of statistics, linear algebra, and programming, as well as prior familiarity with data handling and basic machine learning concepts.
– Peter Flach (2012). Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Cambridge University Press.
– Jiawei Han, Micheline Kamber, Jian Pei (2011). Data Mining: Concepts and Techniques. Morgan Kaufmann.
– Pang-Ning Tan, Michael Steinbach, Vipin Kumar (2005). Introduction to Data Mining. Addison-Wesley.
The course will combine lectures, practical lab sessions, guided exercises, case studies, and project-based learning. Lectures will introduce the theoretical foundations of data analysis and machine learning methods, while lab sessions will allow learners to apply these techniques to real or simulated datasets using appropriate software tools. Students will work individually and in groups to solve analytical problems, interpret results, and evaluate model performance. The course will also include discussions, demonstrations, and feedback sessions to support critical thinking and practical skill development.
Transcript of records