Lecture 0: Course Introduction#
Gittu George, January 9 2024
Teaching squad#
Instructor#
I am Gittu George, Ph.D.
I am a full-time Lecturer in the Department of Computer Science.
Email Me: ggeorg02@cs.ubc.ca
Office Hours: Tue 2-3 pm
My research interests are at the intersection of computer science and genomics.
I primarily teach Computer Science and Master of Data Science students.
I re-developed this course based on the syllabus from Winter 2020.
Today's Agenda#
Course Overview
Data management in a big data environment
What is big data?
Which tool to use?
How big is big data?
Introduction to cloud computing
Course Overview#
My Goals for the Course#
To think critically about databases as part of an analytic workflow
To learn how to design, use, and understand the inner workings of SQL-based databases
To take you from level zero to intermediate with NoSQL databases (document and graph-based databases)
To work with the data to find the tools best suited to answering the questions you pose
What this course is not about#
SQL or Python programming
Cloud computing
Course plan#
Date | Topic | Assessments due |
---|---|---|
January 9 | Introduction to Big Data & cloud computing | |
January 11 | Introduction to RDS and interaction with AWS | |
January 16 | Faster SQL (Indexing) | |
January 18 | (de)Normalization & Data Warehousing | |
January 23 | Introduction to NoSQL and Graph Databases | |
January 25 | Querying Graph Databases (Part 1) | |
January 30 | Querying Graph Databases (Part 2) | |
February 1 | Document Databases Intro | |
February 6 | Querying Document Databases | |
February 8 | Class Conclusions / Special Topics | |
Course Model#
Individual Assignments (50 %)#
Assignment 1 (16 %)#
Introduces you to AWS and to working with Postgres in a Jupyter notebook.
Think about data in the context of a research problem.
Set up your AWS account.
Launch your database in AWS.
Use database dumps to set up your database.
Apply your knowledge of indexing and warehousing to answer your questions efficiently in SQL.
Assignment 2 (17 %)#
An introduction to graph databases using the initial Twitter data.
Using the graph to answer questions about networks of interaction.
Producing interactive plots to represent knowledge.
Practice with CQL.
Assignment 3 (17 %)#
An introduction to document databases.
Practice with MQL.
Worksheets (10 %)#
Every lecture will have a worksheet (in addition to the assignments) that will help you to prepare for your assignments and practice what you learned in class.
Conclusion#
Ask yourself whether you are comfortable with the course logistics.
Join Piazza if you haven’t done so; the link is on Canvas and in the announcements. Questions about lectures, logistics, and assignments should be asked on Piazza. Make sure you attach the correct labels so that we can distinguish questions.
If there is anything else, please feel free to reach out to me at ggeorg02@cs.ubc.ca.
I will release lecture notes before Monday morning so that you can look through them before coming to class.
Check the deadlines on Canvas.
Join iClicker Cloud.
Make sure you have completed the necessary installations.