Using psycopg2 package#

Let first install the package using conda

%%sh
conda install -c anaconda psycopg2 

Let’s make sure that we can import it.

import psycopg2

Now we have psycopg2 imported into our notebook. psycopg2 package manages the interaction between python and our database. So the first takeaway message here is that psycopg2 has nothing to do with jupyter notebook; rather, it’s tied to python. So we use jupyter notebook as an interactive programming environment to query in python. Later if time permits, we will talk about SQL magics, which is a specific way of interacting with databases from jupyter notebook.

Okay! Let’s now get to the steps to connect to a database using psycopg2.

  • Create a connection

    This connection allows communication with the database; It opens up a network connection.

  • Create a cursor

    The cursor is an address for the memory on the database management server to say this is what we are looking at. This python object helps you execute the query and fetch the results from the database. You can read more about cursors here.

  • Formulate your query

    Formulate the SQL query that you want to execute in the database.

  • Execute

    Pass your query to execute() method of cursor object, run the query in the database.

  • Fetch/commit/rollback

    The query that we performed in execute doesn’t return the query right away. To return it, we need to perform a fetch. If the query is to write something, then we need to commit it. If some transaction went wrong, then we need to roll back it.

Let’s now check out these by creating a ticker table, loading data to it, and doing some querying. Let’s first create schema import,

Note

Below you need to replace the conString values with your host,dbname,user, password, and port you used while creating your RDS instance.

# Create a connection
conString = {'host':'mbandtweet.xxxxx.amazonaws.com',
             'dbname':'postgres',
             'user':'postgres',
             'password':'password',
             'port':5432}
conn = psycopg2.connect(**conString)
# Create a cursor
cur = conn.cursor()
# - Formulate your query
query = """CREATE SCHEMA IF NOT EXISTS classwork"""
# - Execute
cur.execute(query)
# - commiting.
conn.commit()

In the above code we created schema import. You probably already know about it, but if you want to read more on when & why use schema, check this out. Let’s create table,

## Here we create the table tickers
cur.execute("""CREATE TABLE IF NOT EXISTS classwork.tickers(
               actsymbol text PRIMARY KEY,
               securityname text,
               exchange text,
               cqssymbol text,
               etf text,
               roundlotsize text,
               testissue text,
               nasdaqsymbol text)""")
conn.commit()

Reading data from database#

As discussed before, an execute doesn’t return the data right away, and we need to perform a fetch. There are mainly 3 flavors of fetch

  • fetchone

  • fetchmany

  • fetchall

query = """SELECT * FROM classwork.tickers"""
cur.execute(query)
row = cur.fetchone()
print(row)
query = """SELECT * FROM classwork.tickers"""
cur.execute(query)
row = cur.fetchmany(5)
print(row)
query = """SELECT * FROM classwork.tickers LIMIT 5"""
cur.execute(query)
row = cur.fetchall()
print(row)

Sometimes you might not need everything that is returned from a database. For example, you might want to transform each row returned from the database. In these cases fetching each row at a time will be of help.

query = """SELECT * FROM classwork.tickers LIMIT 5"""
cur.execute(query)
for rows in cur:
    print(rows[1])

But in general most of the cases, you can go with fetchall, provided you write an efficient SQL query to execute and get just the columns and rows that you are interested in.

Before we move to the next topic, let me show you how rollback works. Say, for instance, your query ends up failing for some reason.

query = """SELECT * FROM classwrk.tickers LIMIT 5"""
cur.execute(query)
row = cur.fetchall()
print(row)

You realized it and corrected it.

query = """SELECT * FROM classwork.tickers LIMIT 5"""
cur.execute(query)
row = cur.fetchall()
print(row)

But this query, even though it’s correct it won’t end up going through and will get this error saying your current transaction is aborted

../_images/error.png

You need to do a rollback() to back to the previous stable state and then execute your query.

conn.rollback()
query = """SELECT * FROM classwork.tickers LIMIT 5"""
cur.execute(query)
row = cur.fetchall()
print(row)

Important

Whenever you get an error, as we showed before, InFailedSqlTransaction: current transaction is aborted, commands ignored until end of transaction block , make sure you do a rollback.

Tip

It’s not a bad idea to develop your SQL query in pgadmin, toad or any other GUI interface and then bring it in once you know it’s ready.