Using psycopg2 package#
Let first install the package using conda
%%sh
conda install -c anaconda psycopg2
Let’s make sure that we can import it.
import psycopg2
Now we have psycopg2
imported into our notebook. psycopg2
package manages the interaction between python and our database. So the first takeaway message here is that psycopg2 has nothing to do with jupyter notebook; rather, it’s tied to python. So we use jupyter notebook as an interactive programming environment to query in python. Later if time permits, we will talk about SQL magics,
which is a specific way of interacting with databases from jupyter notebook.
Okay! Let’s now get to the steps to connect to a database using psycopg2.
Create a connection
This connection allows communication with the database; It opens up a network connection.
Create a cursor
The cursor is an address for the memory on the database management server to say this is what we are looking at. This python object helps you execute the query and fetch the results from the database. You can read more about cursors here.
Formulate your query
Formulate the SQL query that you want to execute in the database.
Execute
Pass your query to execute() method of cursor object, run the query in the database.
Fetch/commit/rollback
The query that we performed in execute doesn’t return the query right away. To return it, we need to perform a fetch. If the query is to write something, then we need to commit it. If some transaction went wrong, then we need to roll back it.
Let’s now check out these by creating a ticker table, loading data to it, and doing some querying. Let’s first create schema import,
Note
Below you need to replace the conString
values with your host
,dbname
,user
, password
, and port
you used while creating your RDS instance.
# Create a connection
conString = {'host':'mbandtweet.xxxxx.amazonaws.com',
'dbname':'postgres',
'user':'postgres',
'password':'password',
'port':5432}
conn = psycopg2.connect(**conString)
# Create a cursor
cur = conn.cursor()
# - Formulate your query
query = """CREATE SCHEMA IF NOT EXISTS classwork"""
# - Execute
cur.execute(query)
# - commiting.
conn.commit()
In the above code we created schema import. You probably already know about it, but if you want to read more on when & why use schema, check this out. Let’s create table,
## Here we create the table tickers
cur.execute("""CREATE TABLE IF NOT EXISTS classwork.tickers(
actsymbol text PRIMARY KEY,
securityname text,
exchange text,
cqssymbol text,
etf text,
roundlotsize text,
testissue text,
nasdaqsymbol text)""")
conn.commit()
Reading data from database#
As discussed before, an execute doesn’t return the data right away, and we need to perform a fetch. There are mainly 3 flavors of fetch
fetchone
fetchmany
fetchall
query = """SELECT * FROM classwork.tickers"""
cur.execute(query)
row = cur.fetchone()
print(row)
query = """SELECT * FROM classwork.tickers"""
cur.execute(query)
row = cur.fetchmany(5)
print(row)
query = """SELECT * FROM classwork.tickers LIMIT 5"""
cur.execute(query)
row = cur.fetchall()
print(row)
Sometimes you might not need everything that is returned from a database. For example, you might want to transform each row returned from the database. In these cases fetching each row at a time will be of help.
query = """SELECT * FROM classwork.tickers LIMIT 5"""
cur.execute(query)
for rows in cur:
print(rows[1])
But in general most of the cases, you can go with fetchall, provided you write an efficient SQL query to execute and get just the columns and rows that you are interested in.
Before we move to the next topic, let me show you how rollback works. Say, for instance, your query ends up failing for some reason.
query = """SELECT * FROM classwrk.tickers LIMIT 5"""
cur.execute(query)
row = cur.fetchall()
print(row)
You realized it and corrected it.
query = """SELECT * FROM classwork.tickers LIMIT 5"""
cur.execute(query)
row = cur.fetchall()
print(row)
But this query, even though it’s correct it won’t end up going through and will get this error saying your current transaction is aborted
You need to do a rollback()
to back to the previous stable state and then execute your query.
conn.rollback()
query = """SELECT * FROM classwork.tickers LIMIT 5"""
cur.execute(query)
row = cur.fetchall()
print(row)
Important
Whenever you get an error, as we showed before, InFailedSqlTransaction: current transaction is aborted, commands ignored until end of transaction block
, make sure you do a rollback
.
Tip
It’s not a bad idea to develop your SQL query in pgadmin, toad or any other GUI interface and then bring it in once you know it’s ready.