Views#

Gittu George, January 15 2024

Views#

Whether we want to use VIEWS depends on the plotting or querying frequency. For example, complex queries with changing data that will be run multiple times (or have varying “WHERE” statements) can use a VIEW.

Check out how much time this query is taking to run. Before all that lets initiate the connection to the database.

import os
import psycopg2

##Make sure you import and load your .env file
from dotenv import load_dotenv
load_dotenv()

conString = {'host':os.environ.get('DB_HOST'),
             'dbname':os.environ.get('DB_NAME'),
             'user':os.environ.get('DB_USER'),
             'password':os.environ.get('DB_PASS'),
             'port':os.environ.get('DB_PORT')}
print(conString["port"])
conn = psycopg2.connect(**conString)
cur = conn.cursor()
%%time
query="""SELECT tw.id, 
       unnest(regexp_matches(tw.text, '\$[A-Z]+\M', 'g')) AS substring
FROM import.tweets AS tw;"""
cur.execute(query)
cur.fetchmany(5)

Let’s create a view

%%time
query="""CREATE OR REPLACE VIEW import.tickertweets AS
SELECT tw.id, unnest(regexp_matches(tw.text, '\$[A-Z]+\M', 'g')) AS substring
FROM import.tweets AS tw;"""
cur.execute(query)
conn.commit()

Let’s see the query time when we call it from the VIEW.

%%time
query="""SELECT * FROM import.tickertweets;"""
cur.execute(query)
cur.fetchmany(5)

So, a TABLE is stored data. A VIEW is a stored query. A VIEW’s query is stored to be cached (saved to memory) for faster retrieval and optimized for speed.

Complex queries with fixed data that will be run multiple times (or have varying “WHERE” statement) can use a MATERIALIZED VIEW:

%%time
query="""DROP MATERIALIZED VIEW import.tickertweetsmat CASCADE;"""
cur.execute(query)
conn.commit()
%%time
query="""CREATE MATERIALIZED VIEW import.tickertweetsmat AS
SELECT tw.id, 
  unnest(regexp_matches(tw.text, '\$[A-Z]+\M', 'g')) AS substring
FROM import.tweets AS tw;"""
cur.execute(query)
conn.commit()

Let’s see the query time when we call it from the VIEW.

%%time
query="""SELECT * FROM import.tickertweetsmat;"""
cur.execute(query)
cur.fetchmany(5)
  • A TABLE is stored data.

  • A VIEW is a stored query.

  • A MATERIALIZED VIEW is a query committed to stored data.

When to use what?

  • Choose a table when the underlying data don’t change (except for INSERTs)

  • Choose a MATERIALIZED VIEW when the query is slow, and you can tolerate some lag between UPDATE/INSERTs and updated values.

  • Choose a VIEW when you reuse the query and want to simplify calling it.

Check out these yourself?

  • How much time does it take to create a view vs. materialized view?

  • How much time does it take to query? You can use these queries to find that out.

  • How much space does it take? You can use these queries to find that out.

%%time
query="""SELECT pg_size_pretty (pg_relation_size('import.tickertweets'));"""
cur.execute(query)
print(cur.fetchone())
query="""SELECT pg_size_pretty (pg_relation_size('import.tickertweetsmat'));"""
cur.execute(query)
print(cur.fetchone())

Summary#

  • Views can be an important tool in managing long-running queries for visualization (and for other purposes)

  • Tradeoffs exist with respect to storage space and the rate at which data needs to be refreshed.

View

Materialized View

Low Disk Space

High Disk Space

Slower Read Time

Fast Read Time

Can be refreshed

Can be refreshed