Views
Contents
Views#
Gittu George, January 12 2023
Views#
Whether we want to use VIEWS depends on the plotting or querying frequency. For example, complex queries with changing data that will be run multiple times (or have varying “WHERE” statements) can use a VIEW.
Check out how much time this query is taking to run. Before all that lets initiate the connection to the database.
import os
import psycopg2
##Make sure you import and load your .env file
from dotenv import load_dotenv
load_dotenv()
conString = {'host':os.environ.get('DB_HOST'),
'dbname':os.environ.get('DB_NAME'),
'user':os.environ.get('DB_USER'),
'password':os.environ.get('DB_PASS'),
'port':os.environ.get('DB_PORT')}
print(conString["port"])
conn = psycopg2.connect(**conString)
cur = conn.cursor()
%%time
query="""SELECT tw.id,
unnest(regexp_matches(tw.text, '\$[A-Z]+\M', 'g')) AS substring
FROM import.tweets AS tw;"""
cur.execute(query)
cur.fetchmany(5)
Let’s create a view
%%time
query="""CREATE OR REPLACE VIEW import.tickertweets AS
SELECT tw.id, unnest(regexp_matches(tw.text, '\$[A-Z]+\M', 'g')) AS substring
FROM import.tweets AS tw;"""
cur.execute(query)
conn.commit()
Let’s see the query time when we call it from the VIEW.
%%time
query="""SELECT * FROM import.tickertweets;"""
cur.execute(query)
cur.fetchmany(5)
So, a TABLE is stored data. A VIEW is a stored query. A VIEW’s query is stored to be cached (saved to memory) for faster retrieval and optimized for speed.
Complex queries with fixed data that will be run multiple times (or have varying “WHERE” statement) can use a MATERIALIZED VIEW:
%%time
query="""DROP MATERIALIZED VIEW import.tickertweetsmat CASCADE;"""
cur.execute(query)
conn.commit()
%%time
query="""CREATE MATERIALIZED VIEW import.tickertweetsmat AS
SELECT tw.id,
unnest(regexp_matches(tw.text, '\$[A-Z]+\M', 'g')) AS substring
FROM import.tweets AS tw;"""
cur.execute(query)
conn.commit()
Let’s see the query time when we call it from the VIEW.
%%time
query="""SELECT * FROM import.tickertweetsmat;"""
cur.execute(query)
cur.fetchmany(5)
A TABLE is stored data.
A VIEW is a stored query.
A MATERIALIZED VIEW is a query committed to stored data.
When to use what?
Choose a table when the underlying data don’t change (except for INSERTs)
Choose a MATERIALIZED VIEW when the query is slow, and you can tolerate some lag between UPDATE/INSERTs and updated values.
Choose a VIEW when you reuse the query and want to simplify calling it.
Check out these yourself?
How much time does it take to create a view vs. materialized view?
How much time does it take to query? You can use these queries to find that out.
How much space does it take? You can use these queries to find that out.
%%time
query="""SELECT pg_size_pretty (pg_relation_size('import.tickertweets'));"""
cur.execute(query)
print(cur.fetchone())
query="""SELECT pg_size_pretty (pg_relation_size('import.tickertweetsmat'));"""
cur.execute(query)
print(cur.fetchone())
Summary#
Views can be an important tool in managing long-running queries for visualization (and for other purposes)
Tradeoffs exist with respect to storage space and the rate at which data needs to be refreshed.
View |
Materialized View |
---|---|
Low Disk Space |
High Disk Space |
Slower Read Time |
Fast Read Time |
Can be refreshed |
Can be refreshed |