The goal of the course is to give students a thorough grounding in the fundamental techniques and methods of natural language processing (NLP) and machine learning. Throughout the semester, students will learn the Python programming language and apply it to a range of text processing tasks, including web scraping, text preparation, tokenization, and lemmatization. The course covers vector models, probabilistic models, and several types of machine learning models, enabling students to summarize and classify texts and to apply a variety of linguistic and text modeling techniques. In addition to the theoretical material, students will deepen their understanding through practical exercises, preparing them to apply advanced NLP solutions in real-world contexts.
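As a taste of the text-preparation tasks listed above, the sketch below shows tokenization and lemmatization in Python with NLTK, the library used in the required reading. The example sentence and the specific resource downloads are illustrative assumptions rather than part of the official course material.

```python
# Minimal sketch: tokenization and lemmatization with NLTK (assumed setup).
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# One-time downloads of the tokenizer model and the WordNet lexicon.
for resource in ("punkt", "punkt_tab", "wordnet"):
    nltk.download(resource, quiet=True)

text = "The cats were sitting on the mats."          # hypothetical example sentence
tokens = word_tokenize(text.lower())                 # tokenization
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(t) for t in tokens]   # lemmatization (noun POS by default)
print(lemmas)                                        # e.g. 'cats' -> 'cat', 'mats' -> 'mat'
```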
Course Topics and Schedule (based on a 14-week semester):
1) Introduction to Artificial Intelligence and NLP
2) Python I: Basic operations, variables, file reading
3) Python II: If-else statements, strings, collections, lists, list functions, loops
4) Python III: Web Scraping, collections, tuples, dictionaries
5) Vector Models: Text preparation, tokenization, lemmatization
6) Vector Models: TF-IDF, Neural Word Embeddings
7) Probabilistic Models: Markov model, text classification
8) Probabilistic Models: Language models, text generation, poetry writing
9) Probabilistic Models: N-Gram-based word substitution
10) Machine Learning Models: Naive Bayes – How do I determine whether my model is adequate?
11) Machine Learning Models: Logistic Regression – Sentiment Analysis
12) Machine Learning Models: Text summarization
13) Machine Learning Models: Latent Dirichlet Allocation – Topic Modeling, Non-negative Matrix Factorization (NMF)
14) Machine Learning Models: Latent Semantic Analysis (Latent Semantic Indexing)
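To give a sense of how the vector-model and machine-learning weeks in this schedule connect in practice (TF-IDF from week 6 feeding a Naive Bayes classifier from week 10), here is a minimal sketch assuming scikit-learn; the tiny labeled corpus is hypothetical and serves only to show the pipeline.

```python
# Minimal sketch: TF-IDF features + Naive Bayes text classification (scikit-learn assumed).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus with sentiment-style labels.
texts = [
    "great movie, loved the acting",
    "terrible plot and bad acting",
    "wonderful and moving performance",
    "boring, awful and far too long",
]
labels = ["pos", "neg", "pos", "neg"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)                          # train on the toy corpus
print(model.predict(["what a wonderful movie"]))  # expected: ['pos']
```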
At the end of the course, the learner will be able to analyze large text datasets using natural language processing (NLP) and machine learning techniques.
At the end of the course, the learner will be able to apply Python programming skills to perform tasks such as web scraping, text preparation, tokenization, and lemmatization.
At the end of the course, the learner will be able to utilize vector models, probabilistic models, and machine learning models to summarize, classify, and model texts.
At the end of the course, the learner will be able to create language models for text generation and perform sentiment analysis using logistic regression.
At the end of the course, the learner will be able to implement machine learning techniques such as Naive Bayes, Latent Dirichlet Allocation (LDA), and Topic Modeling to explore patterns in textual data.
At the end of the course, the learner will be able to evaluate the effectiveness of NLP models by interpreting the results of machine learning algorithms.
At the end of the course, the learner will be able to collaborate in group projects to develop Python scripts, apply NLP methods, and present findings based on the analysis of real-world data.
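As an illustration of the topic-modeling outcome above, the sketch below assumes scikit-learn's LatentDirichletAllocation; the four toy documents and the choice of two topics are assumptions made purely for demonstration.

```python
# Minimal sketch: topic modeling with Latent Dirichlet Allocation (scikit-learn assumed).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical mini-corpus mixing two rough themes (politics and sport).
docs = [
    "the parliament passed a new law on taxes",
    "the team won the football match yesterday",
    "the court ruled on the new tax law",
    "the striker scored twice in the final match",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)                    # bag-of-words counts
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_words = [terms[j] for j in topic.argsort()[-4:]]   # four strongest words per topic
    print(f"topic {i}: {top_words}")
```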
Prerequisites: none.
Required Reading:
Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python. O'Reilly Media.
Antić, Z. (2021). Python Natural Language Processing Cookbook. Packt Publishing Ltd.
Wittgenstein, L. (1998). Philosophical Investigations. Atlantisz, Budapest.
Chomsky, N. (1968). Linguistic Contributions to the Study of Mind: Future. In Language and Mind.
Gadamer, H.G. (2003). Truth and Method: Outline of a Philosophical Hermeneutics. Osiris, Budapest (Sapientia Humana series).
Recommended Reading:
Shannon, C. (2001). A Mathematical Theory of Communication. ACM SIGMOBILE Mobile Computing and Communications Review.
Barrios, F., López, F., Argerich, L., & Wachenchauzer, R. (2016). Variations of the Similarity Function of TextRank for Automated Summarization.
Steinberger, J., & Jezek, K. (2004). Using Latent Semantic Analysis in Text Summarization and Summary Evaluation.
Metsis, V., Androutsopoulos, I., & Paliouras, G. (2006). Spam Filtering with Naive Bayes – Which Naive Bayes?
Aizawa, A. (2003). An Information-Theoretic Perspective of TF-IDF Measures.
Mihalcea, R., & Tarau, P. (2004). TextRank: Bringing Order into Texts.
Gong, Y., & Liu, X. (2001). Generic Text Summarization Using Relevance Measure and Latent Semantic Analysis.
Ramadhan, W.P., Novianty, S.A., & Setianingsih, S.C. (2017). Sentiment Analysis Using Multinomial Logistic Regression.
Blei, D.M., Ng, A.Y., & Jordan, M.I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research.
Interactive Sessions:
Sessions that encourage discussion, analysis, and practical exploration of NLP techniques and machine learning models. Students will engage with real-world examples, case studies, and academic papers.
Group Work:
Collaborative projects where students will work in teams to apply machine learning methods to analyze text datasets. This includes creating Python scripts, performing data processing tasks, and presenting their findings.
Practical Lab Sessions:
Hands-on programming exercises conducted in a computer lab setting. Students will implement the techniques learned in the lectures, such as web scraping, tokenization, text classification, and summarization, using Python.
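As a small example of the lab work described above, the following sketch covers the web-scraping step, assuming the commonly used requests and BeautifulSoup libraries; the URL is a placeholder, not an address used in the course.

```python
# Minimal sketch: fetching a page and extracting its paragraph text (requests + BeautifulSoup assumed).
import requests
from bs4 import BeautifulSoup

url = "https://example.com"                      # placeholder target page
html = requests.get(url, timeout=10).text        # download the raw HTML
soup = BeautifulSoup(html, "html.parser")

# Keep only the visible paragraph text for later tokenization and analysis.
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
print(paragraphs[:3])
```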
Presentations:
Students will present their group projects to the class, offering an opportunity to develop presentation skills and receive feedback from peers and the instructor.
Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.