Syllabus

Title
0903 Data Analytics
Instructors
PD Mag.Dr. Gertraud Malsiner-Walli, M.Stat., Dr. Lucas Kook
Type
PI
Weekly hours
2
Language of instruction
English
Registration
09/03/24 to 09/29/24
Registration via LPIS
Dates
Day Date Time Room
Tuesday 10/15/24 08:00 AM - 11:00 AM D5.0.002
Thursday 10/17/24 04:00 PM - 06:00 PM TC.-1.61 (P&S)
Tuesday 10/22/24 08:00 AM - 11:00 AM D5.0.002
Thursday 10/24/24 04:00 PM - 06:00 PM TC.-1.61 (P&S)
Tuesday 10/29/24 08:00 AM - 11:00 AM D5.0.002
Thursday 10/31/24 04:00 PM - 06:00 PM TC.-1.61 (P&S)
Tuesday 11/05/24 08:00 AM - 11:00 AM D5.0.002
Thursday 11/07/24 04:00 PM - 06:00 PM TC.-1.61 (P&S)
Tuesday 11/19/24 08:00 AM - 11:30 AM D5.0.002
Tuesday 11/26/24 08:00 AM - 11:30 AM D5.0.002
Contents

This course provides a comprehensive introduction to statistical learning using the R programming language. Through hands-on coding exercises, students will gain practical experience in various statistical learning methods. The course is designed to equip students with the skills to perform data analysis and build predictive models using R and RStudio. In a data analytics project (40% of the final grade), students will select a dataset of their choice, apply the methods learned throughout the course, and create a comprehensive, reproducible report using RMarkdown.

Lecture 1: What is Statistical Learning?

Topics:
  • Introduction to statistical learning
  • Distinction between supervised and unsupervised learning
  • Applications and importance of statistical learning in various fields
  • Introduction to R and RStudio
  • Setting up RMarkdown for reproducible reports

Readings:
  • ISLR Chapter 1: Introduction
  • R for Data Science by Hadley Wickham and Garrett Grolemund: Chapter 1

Lecture 2: Regression and Classification

Topics:
  • Linear regression
  • Logistic regression
  • Performance measures for regression and classification
  • Hands-on coding exercises in R
  • Creating reproducible reports for regression and classification analyses

Readings:
  • ISLR Chapter 3: Linear Regression
  • ISLR Chapter 4: Classification
  • R for Data Science: Chapters 3-5 (Data Wrangling and Visualization)

Lecture 3: Tree-Based Methods

Topics:
  • Decision trees
  • Bagging and random forests
  • Model interpretation and evaluation
  • Hands-on coding exercises in R
  • Creating reproducible reports for tree-based methods

Readings:
  • ISLR Chapter 8: Tree-Based Methods
  • R for Data Science: Chapters 10-12 (Model Building)

Lecture 4: Unsupervised Learning

Topics:
  • Clustering methods: K-means, hierarchical clustering
  • Principal Component Analysis (PCA)
  • Applications and interpretation of unsupervised learning methods
  • Hands-on coding exercises in R
  • Creating reproducible reports for unsupervised learning methods

Readings:
  • ISLR Chapter 10: Unsupervised Learning
  • R for Data Science: Chapters 13-15 (Unsupervised Learning)

Course Materials

  • Textbook: An Introduction to Statistical Learning with Applications in R (ISLR) by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
  • Supplementary Reading: R for Data Science by Hadley Wickham and Garrett Grolemund
  • Software: R, RStudio
Learning outcomes

By the end of this course, students will be able to:

  1. Understand the Foundations of Statistical Learning:
    • Explain the basic concepts and importance of statistical learning.
    • Differentiate between supervised and unsupervised learning methods.
  2. Perform Regression and Classification:
    • Implement linear regression models using R.
    • Develop logistic regression models to solve classification problems.
    • Evaluate and interpret the performance of regression and classification models.
  3. Apply Tree-Based Methods:
    • Construct decision tree models for classification and regression tasks.
    • Utilize advanced tree-based techniques such as bagging and random forests.
    • Analyze and interpret the results of tree-based methods.
  4. Implement Unsupervised Learning Techniques:
    • Perform clustering using K-means and hierarchical clustering methods.
    • Conduct Principal Component Analysis (PCA) for dimensionality reduction.
    • Interpret and visualize the results of unsupervised learning methods.
  5. Utilize R and RStudio for Data Analysis:
    • Efficiently use R and RStudio for statistical learning tasks.
    • Create reproducible reports and documents using RMarkdown.
    • Integrate data preprocessing, visualization, and modeling workflows in R.
  6. Carry Out an Independent Data Analytics Project:
    • Employ best practices for reproducible research in data science.
    • Present findings from statistical learning models clearly and concisely.
    • Write detailed and coherent reports that convey the methodology and results of statistical analyses.
  7. Engage in Hands-On Learning:
    • Gain practical experience through hands-on coding exercises and assignments.
    • Apply theoretical knowledge to real-world data sets and problems.
    • Work collaboratively on projects and participate in class discussions.

By mastering these outcomes, students will be well-prepared to apply statistical learning techniques in various domains, conduct rigorous data analyses, and continue advancing their skills in data science and machine learning.

Attendance requirements

Students may miss at most one unit. This rule also applies when the course is held online. Attendance at the final presentation of the data analytics project is mandatory.

Teaching/learning method(s)

The course begins with a presentation of the theoretical foundations of machine learning methods, followed by an introduction to R for data science.

Data analytics project: Students will select a dataset of their choice, apply the methods learned throughout the course, and create a comprehensive, reproducible report using RMarkdown. The data analytics project should include data preprocessing, exploratory data analysis, model building, evaluation, and a summary of findings. After the first two lectures, student groups must present a "dataset pitch" (their choice of dataset for the data analytics project). At the end of the course, students will present the results of their data analytics project.

Assessment
  • Homework (30%)
  • Data Analytics Project (40%)
  • Final Exam (30%)
Prerequisites for participation and waiting lists

Please be aware that, for all courses in this SBWL, registration is only possible for students who have successfully completed the entry course (Einstieg in die SBWL: Data Science).

Note that for courses within the SBWL “Data Science” we can only accept students who are enrolled in one of WU’s bachelor programmes and qualify for starting an SBWL; in particular, we cannot accept students from other courses and programmes who are enrolled at WU only as ‘Mitbeleger’.

Readings

Please log in with your WU account to use all functionalities of read!t. For off-campus access to our licensed electronic resources, remember to activate your VPN connection. If you encounter any technical problems or have questions regarding read!t, please feel free to contact the library at readinglists@wu.ac.at.

Availability of lecturer(s)

Last edited: 2024-06-19


