Lecture 0: Course Introduction#

Gittu George, January 9 2024

Teaching squad#

Instructor#

  • I am Gittu George, Ph.D

  • I am a full-time lecturer in the Department of Computer Science.

  • Email Me: ggeorg02@cs.ubc.ca

  • Office Hours: Tue 2-3 pm

  • My research interests are at the intersection of computer science and genomics.

  • I primarily teach Computer Science and Master of Data Science students.

  • I re-developed this course based on the syllabus from Winter 2020.

Today's Agenda#

  • Course Overview

  • Data management in a big data environment

    • What is big data?

    • Which tool to use?

    • How big is big data?

  • Introduction to cloud computing

Course Overview#

My Goals for the Course#

  • To think critically about databases as part of an analytic workflow

  • To learn how to design, use, and understand the inner workings of SQL-based databases

  • To take you from level zero to intermediate with NoSQL databases (document and graph-based databases)

  • To work with the data to find the tools best suited to answering the questions you pose

What this course is not about#

  • SQL or Python programming

  • Cloud computing

Course plan#

| Date | Topic | Assessments due |
|---|---|---|
| January 9 | Introduction to Big Data & cloud computing | |
| January 11 | Introduction to RDS and interaction with AWS | |
| January 16 | Faster SQL (Indexing) | |
| January 18 | (de)Normalization & Data Warehousing | |
| January 23 | Introduction to NoSQL and Graph Databases | |
| January 25 | Querying Graph Databases (Part 1) | |
| January 30 | Querying Graph Databases (Part 2) | |
| February 1 | Document Databases Intro | |
| February 6 | Querying Document Databases | |
| February 8 | Class Conclusions / Special Topics | |

Course Model#

../_images/l2.png

Individual Assignments (50 %)#

Assignment 1 (16 %)#

Introduces you to AWS and to working with Postgres in a Jupyter notebook.

  • Think about data in the context of a research problem.

  • Set up your AWS account.

  • Launch your database in AWS.

  • Use database dumps to set up your database.

  • Apply knowledge in indexing and warehousing to efficiently answer your questions in SQL.

Assignment 2 (17 %)#

  • An introduction to graph databases using the initial Twitter data.

  • Using the graph to answer questions about networks of interaction.

  • Producing interactive plots to represent knowledge.

  • Practice on CQL.

Assignment 3 (17 %)#

  • An introduction to document databases.

  • Practice on MQL.

Worksheets (10 %)#

  • Every lecture will have a worksheet (in addition to the assignments) that will help you to prepare for your assignments and practice what you learned in class.

Conclusion#

  • Ask yourself whether you are comfortable with the course logistics.

  • Join Piazza if you haven’t done so; the link is on Canvas and in the announcements. Questions about lectures, logistics, and assignments should be asked on Piazza. Make sure you attach the correct labels so that we can tell the questions apart.

  • If there is anything else, please feel free to reach out to me at ggeorg02@cs.ubc.ca

  • I will release lecture notes before Monday morning so that you can look through them before coming to class.

  • Check the deadlines on Canvas.

  • Join iClicker Cloud.

  • Make sure you have completed the necessary installations.

../_images/ready.png