2118 Data Processing 1
Univ.Prof. Dr. Axel Polleres, Assoz.Prof PD Dr. Stefan Sobernig
  • Course type
  • Semester hours
  • Language of instruction
17.09.2020 to 20.09.2020
Registration via LPIS
Notes on the course
Curriculum item(s), Bachelor
Day Date Time Room
Tuesday 06.10.2020 10:00 - 13:30 Online session
Tuesday 06.10.2020 14:00 - 15:30 Online session
Tuesday 13.10.2020 10:00 - 13:30 Online session
Tuesday 13.10.2020 14:00 - 15:30 Online session
Tuesday 20.10.2020 10:30 - 14:00 Online session
Tuesday 20.10.2020 14:00 - 15:30 Online session
Tuesday 27.10.2020 14:00 - 17:30 Online session
Tuesday 27.10.2020 18:00 - 19:30 Online session
Tuesday 03.11.2020 11:30 - 14:00 Online session
Tuesday 03.11.2020 14:00 - 15:30 Online session
Tuesday 10.11.2020 14:00 - 17:30 Online session
Tuesday 10.11.2020 18:00 - 19:30 Online session
Tuesday 01.12.2020 09:00 - 16:00 Online session

Course procedure under restricted campus operations

  • The course (lectures and tutorials) will be held in distance mode at the scheduled time slots. We will schedule Web conference sessions at each time slot, one for the respective lecture, one for the subsequent tutorial.
  • All other details (esp.: participation rules, tasks and assessments) remain as described in the other sections of this course description.
If the overall situation permits at that time, and all precautionary measures can be maintained, we will try to organize extracurricular outdoor meetings to compensate a little for the distance mode.
Presence and participation during the Web conference sessions will be recorded by a mix of mechanisms (e.g., Clicker surveys, cold calls during the tutorials).

Course contents

This fast-paced class is intended to bring students interested in data science up to speed:

We start with an introduction to the field of "Data Science" and to the overall Data Science Process.

The primary focus of the rest of the course is on gaining fundamental knowledge of data processing, that is, the preparation, cleansing and storage of data, which typically takes up the largest part of any data science project. We will learn how to deal with different data formats, how to use methods and tools to integrate data from various sources, and how to resolve quality issues within raw data, such as duplicates, encoding errors and missing values.
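
To give a flavor of the quality issues mentioned above, the following is a minimal sketch using only the Python standard library. It is not taken from the course material: the sample data, the column names and the function `clean_rows` are hypothetical, and dropping incomplete rows is only one of several possible policies for missing values.

```python
import csv
import io

# Hypothetical raw input (all values are made up): inconsistent
# whitespace and capitalization, one duplicate, one missing value.
RAW_CSV = """city,count
Vienna,100
 vienna ,100
Graz,
Linz,200
"""


def clean_rows(text):
    """Normalize, de-duplicate and drop incomplete rows of CSV text."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Normalize: trim whitespace and unify capitalization.
        city = (row["city"] or "").strip().title()
        count = (row["count"] or "").strip()
        if not city or not count:
            continue  # missing value: skip the row (one possible policy)
        key = (city, count)
        if key in seen:
            continue  # duplicate once the values are normalized
        seen.add(key)
        cleaned.append({"city": city, "count": int(count)})
    return cleaned


rows = clean_rows(RAW_CSV)
print(rows)
```

The second "vienna" row only becomes recognizable as a duplicate after normalization, which is why cleansing steps are usually applied before de-duplication.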

The integrated data can then be used for further data analytics tasks (cf. course 2 in this SBWL).

The students will practice approaches and techniques using the Python programming language in an interactive environment.

All course material will be available at:

    Learning outcomes

    Overall, students will gain fundamental knowledge of how to deal with different data formats and how to use methods and tools to integrate data from various sources. This includes:
    * Hands-on experience in processing and preparing data for data science tasks with Python.
    * An understanding of how to use the Python standard library to write programs and to access the various data science tools.
    * Working knowledge of the Python tools ideally suited for data science tasks, including:
        * Accessing data (e.g., tabular (CSV), tree-shaped (JSON) and graph-shaped (RDF) data, as well as databases)
        * Cleansing and normalizing data
        * Sorting, filtering and grouping data
        * Tools and algorithms for data transformation
        * Connecting to a database system, loading data into it, and using indexing techniques for faster access to the stored data
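
As an illustration of how several of the listed steps can fit together, here is a minimal sketch using only the Python standard library. It is an assumption-laden example rather than course material: the JSON records, the table name `measurement` and the index name are hypothetical, and SQLite (via the built-in `sqlite3` module) merely stands in for whatever database system is actually used.

```python
import json
import sqlite3
from collections import defaultdict

# Hypothetical tree-shaped (JSON) input; all names and values are made up.
RAW_JSON = """
[
  {"station": "A", "year": 2018, "value": 2.0},
  {"station": "A", "year": 2019, "value": 3.5},
  {"station": "B", "year": 2019, "value": 1.0},
  {"station": "A", "year": 2020, "value": 4.5}
]
"""

records = json.loads(RAW_JSON)

# Filter, sort and group in plain Python.
recent = [r for r in records if r["year"] >= 2019]
recent.sort(key=lambda r: (r["station"], r["year"]))
by_station = defaultdict(list)
for r in recent:
    by_station[r["station"]].append(r["value"])

# Load the filtered records into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (station TEXT, year INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurement VALUES (:station, :year, :value)", recent
)
# An index on the lookup column can speed up access on larger tables.
conn.execute("CREATE INDEX idx_measurement_station ON measurement (station)")
conn.commit()

totals = conn.execute(
    "SELECT station, SUM(value) FROM measurement GROUP BY station ORDER BY station"
).fetchall()
conn.close()

print(dict(by_station))
print(totals)
```

On a table this small the index makes no measurable difference; it is included only to show where indexing enters the workflow.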

    Attendance requirements

    The attendance of at least 80% of the course units is a mandatory criterion.

    Presence in the first lesson is required.


    The course will focus on in-class code walkthroughs to present high-quality, well-commented code that students can later reference.
    The course will balance group and individual assignments.
    Students will be able to apply newly learned concepts and methods directly in class, using real-world Open Data sources.

    Assessment

    The assessment will be based on the following:

    • 10% quizzes or clicker questions
    • 85% homework assignments (individual as well as group assignments)
    • 5% entry exam / peer grading

    The attendance of at least 80% of the course units is a mandatory criterion.

    Availability of lecturer(s)

    If you have questions on the course or on the homework, proceed as follows:

    • All general questions should be posted to the dedicated forum.
    • Send an email to the DP1 team at with the subject line containing: "[Data Processing 1]"


    Last edited: 14.10.2020