Introduction to Data Science

Delivery institution

Faculty of Informatics
Data Science and Engineering Department

Instructor(s):

Zakarya Farou

Start date

14 September 2026

End date

18 December 2026

Study field

CHARM priority field

Study level

Study load, ECTS

6

Short description

The course covers the basic concepts and principles behind the main data science models and techniques. Descriptive techniques such as clustering and frequent pattern mining are explained in more detail, while for predictive techniques the focus is mainly on the concept of a model, its parameters and hyper-parameters, and the quality and validation of models, including overfitting, underfitting, and the bias-variance trade-off. Data quality and pre-processing issues related to various data types and modeling problems are also addressed. Topics covered:

– Introduction to bi-variate and multi-variate analysis
– Data visualization and exploratory data analysis
– Dimensionality reduction and feature selection
– Clustering: k-means, agglomerative, DBSCAN, cluster validation
– Frequent pattern mining: itemsets, association rules, quality measures
– Linear classification and regression: model, parameters and hyper-parameters, validation, overfitting and underfitting, and the bias-variance trade-off

Full description

https://neptun.elte.hu/MobilityCourses?Faculty=&Programme=&AcademicTerm=&Published=&SearchText=Introduction+to+data+science

Learning outcomes

At the end of the course, the learner will be able to apply bi-variate and multi-variate analysis techniques, use data visualization and exploratory data analysis to investigate datasets, select dimensionality reduction and feature selection methods, implement and validate clustering algorithms, discover and assess frequent itemsets and association rules, and develop, tune, and validate linear classification and regression models while addressing overfitting, underfitting, and the bias-variance trade-off.

Course requirements

Learners are expected to have basic knowledge of statistics, linear algebra, and programming, as well as prior familiarity with data handling and basic machine learning concepts.

Places available

50

Course literature (compulsory or recommended):

– Peter Flach (2012). Machine Learning: The Art and Science of Algorithms that Make Sense of Data. Cambridge University Press.
– Jiawei Han, Micheline Kamber, Jian Pei (2011). Data Mining: Concepts and Techniques. Morgan Kaufmann.
– Pang-Ning Tan, Michael Steinbach, Vipin Kumar (2005). Introduction to Data Mining. Addison-Wesley.

Planned educational activities and teaching methods:

The course will combine lectures, practical lab sessions, guided exercises, case studies, and project-based learning. Lectures will introduce the theoretical foundations of data analysis and machine learning methods, while lab sessions will allow learners to apply these techniques to real or simulated datasets using appropriate software tools. Students will work individually and in groups to solve analytical problems, interpret results, and evaluate model performance. The course will also include discussions, demonstrations, and feedback sessions to support critical thinking and practical skill development.

Course code

IPM-24ATIDSEG

Language

Assessment method

Assignment

Final certification

Transcript of records

No. Apart from the official course completion certificate, no additional certificate is delivered.

Assessment date

11 January 2027

Modality

Learning management System in use

Canvas, Moodle, Microsoft Teams

Contact hours per week for the student:

2

Specific regular weekly teaching day/time

Monday/14:00-16:00

Time zone