Lecture 6: CQL 1#

Gittu George, January 19 2023

Attribution: This notebook is developed using some materials provided by neo4j.

Todays Agenda#

  • Creating nodes & relationships.

  • Creating constraints.

  • Querying nodes and relationships.

  • Working with patterns in queries.

  • Loading data using dumps.

  • Class activity:

Learning objectives#

  • Loading data and creating nodes and relationships

  • Using CQL to query the graph database

Here is a good cheat sheet that you can use.

Creating nodes & relationships#

We learned how to define

  • Nodes

  • Labels

  • Relationships

  • Relationship types

  • Properties

Here is the simplified syntax for creating a node:

CREATE (node)

e.g.:

// We are creating nodes
CREATE (git:PERSON {name: 'Gittu', age: 21, profession:'Postdoctoral Fellow'})
CREATE (mik:PERSON {name: 'Mike', age: 21,profession:'Associate Professor'})
CREATE (tom:PERSON {name: 'Tom', age: 24,profession:'Postdoctoral Fellow'})
CREATE (pan:Animal {name: 'Panda',method_of_movement: 'Climb',height:20,mainly_found: 'China'})
CREATE (pol:Animal {name: 'Polar Bear',method_of_movement: 'Walk',height:20,mainly_found: 'Canada'})
CREATE (kan:Animal {name: 'Kangaroo',method_of_movement: 'Hop',height:30,mainly_found: 'Australia'})

// We are creating a relationship that connects these nodes

CREATE (git)-[:LIKE {percent: 100}]->(pan),
(mik)-[:LIKE {percent: 90}]->(pol),
(tom)-[:HATE {percent: 100}]->(pol),
(tom)-[:LIKE {percent: 50}]->(kan),
(pol)-[:LIKE {percent: 100}]->(git)

With this, we finished creating our dummy graph data in the cloud. To see what we have created, we need to query the database.

Let’s start with how to examine the data model,

CALL db.schema.visualization()

Loading movie dataset#

Let’s create better dummy data in neo4j to learn how to query the database using cypher. We will use a movie dataset that neo4j provides. Let’s load the data first. Please download this file and copy it to your neo4j query space.

Warning

Please make sure you only run this one time, Because every time it’s running, it creates nodes and relationships again.

Creating constraints#

Use CONSTRAINT ON & ASSERT to create constraints.

  • Uniqueness constraint - Ensures that a value for a property is unique for all nodes of that type.

CREATE CONSTRAINT ON (n:Person) ASSERT (n.name) IS UNIQUE
CREATE CONSTRAINT ON (n:Movie) ASSERT (n.title) IS UNIQUE
  • Existence constraint - Ensures that when a node or relationship is created or modified, it must have certain properties set.

CREATE CONSTRAINT ExistsMovieTagline ON (m:Movie) ASSERT exists(m.tagline)
  • Node key - Ensures that a set of values for properties of a node of a given type is unique.

CREATE CONSTRAINT UniqueNameBornConstraint
ON (p:Person) ASSERT (p.name, p.born) IS NODE KEY

For retrieving constraints defined for the graph

CALL db.constraints()

For dropping constraints

DROP CONSTRAINT constraintname

Inspecting schema#

Let’s inspect the schema for the movie data.

CALL db.schema.visualization()

Includes nodes (entities,objects,things) such as:

  • Movie

  • Person

People are related to Movies using these relationships:

  • DIRECTED

  • ACTED_IN

  • WROTE

  • REVIEWED

Querying nodes and relationships#

MATCH & RETURN#

Use the MATCH to match the nodes you ask for, and RETURN to retrieve the nodes.

Here we retrieve all the nodes.

MATCH (n) // returns all nodes in the graph
RETURN n

As you can remember, (n) here stands for all nodes. Now, what if we want to retrieve only the nodes with Person labels?

MATCH (p:Person) // returns all Person nodes in the graph
RETURN p

You can add whatever nodes you are interested in based on properties. Here we are returning nodes filtered on a property value.

MATCH (m:Movie {released: 2003, tagline: 'Free your mind'})
RETURN m

Until now, we have been returning nodes. If we want to return the properties in those nodes

MATCH (p:Person {born: 1965})
RETURN p.name, p.born

You can specify aliases or column headings using AS

MATCH (p:Person {born: 1965})
RETURN p.name, p.born AS `birth year`

Now let’s add in some relationships as well to return relationships.

MATCH (p:Person)-[rel:ACTED_IN]->(m:Movie {title: 'The Matrix'})
RETURN p, rel, m

We can also have nodes with no variables or anonymous nodes. In the above example, we are interested in only the persons who acted in ‘The Matrix’.

MATCH (p:Person)-[:ACTED_IN]->(:Movie {title: 'The Matrix'})
RETURN p.name

You can also query on multiple relationships

MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN|DIRECTED]->(m:Movie)
RETURN p.name, m.title

We can also have anonymous relationships.

MATCH (p:Person)--(m:Movie {title: 'The Matrix'})
RETURN p, m
MATCH (m:Movie)<--(p:Person {name: 'Tom Hanks'})
RETURN p, m

type() returns the type of a relationship.

MATCH (p:Person)-[rel]->(:Movie {title:'The Matrix'})
RETURN p.name, type(rel)

Traversing multiple relationships.

Since we have a graph, we can traverse through nodes to obtain relationships further into the traversal. In addition, you can assign a variable to the path and return the path as follows:

MATCH path = (p:Person)-[:FOLLOWS]->(:Person)-[:FOLLOWS]->(:Person {name:'Jessica Thompson'})
RETURN path

Let’s try adding some properties for relationships;

MATCH (p:Person)-[:REVIEWED {rating: 65}]->(:Movie {title: 'The Da Vinci Code'})
RETURN p.name

WHERE to Filter Queries#

Just like in SQL, we use WHERE to filter queries.

We already know how to write this query.

MATCH (p:Person)-[:ACTED_IN]->(m:Movie {released: 2008})
RETURN p, m

Can we write this using WHERE?

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released = 2008
RETURN p, m

Important

You can only refer to named nodes or relationships in a WHERE clause, so you must specify a variable for any node or relationship you are testing in the WHERE clause.

If we can write the same thing without using WHERE, then why do we need WHERE?

With the WHERE clause, you can specify potentially complex conditions for the query.

Here are some examples;

Using AND & OR in WHERE clause.

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released = 2008 OR m.released = 2009
RETURN p, m
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released >= 2003 AND m.released <= 2004
RETURN p.name, m.title, m.released

Important

IS NOT NULL, we can use in testing the existence of a property or a pattern. For the opposite of IS NOT NULL we use IS NULL.

Let’s test existance of a property using IS NOT NULL.

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name='Jack Nicholson' AND m.tagline IS NOT NULL
RETURN m.title, m.tagline

Using STARTS WITH in WHERE clause.

MATCH (p:Person)-[:ACTED_IN]->()
WHERE p.name STARTS WITH 'Michael'
RETURN p.name

Let’s use toLower() to convert a property to lower case.

MATCH (p:Person)-[:ACTED_IN]->()
WHERE toLower(p.name) STARTS WITH 'michael'
RETURN p.name

Note

Converting a property to lower case may not be performing well, as if an index has been created for this property, it will not be used at runtime.

See also

There are other string matching operators like ENDS WITH, CONTAINS

We can also test it with regular expressions.

MATCH (p:Person)
WHERE p.name =~'Tom.*'
RETURN p.name

Here =~ is used to filter words

Note

If you specify a regular expression. The index will never be used. In addition, the property value must fully match the regular expression.

Here is an example where we want to return all Person nodes of people who wrote movies:

MATCH (p:Person)-[:WROTE]->(m:Movie)
RETURN p.name, m.title

Next, we modify this query to exclude people who directed that particular movie: (Let’s test non existance of a pattern using NOT exists())

MATCH (p:Person)-[:WROTE]->(m:Movie)
WHERE NOT exists( (p)-[:DIRECTED]->(m) )
RETURN p.name, m.title

Here is another example where we want to find Gene Hackman and the movies he acted in with another person who also directed the movie: ((Let’s test existance of a pattern using exists()))

MATCH (gene:Person)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(other:Person)
WHERE gene.name= 'Gene Hackman'
AND exists( (other)-[:DIRECTED]->(m) )
RETURN gene, other, m

In this example, we only want to retrieve Person nodes of people born in 1965 or 1970: (Using IN)

MATCH (p:Person)
WHERE p.born IN [1965, 1970]
RETURN p.name as name, p.born as yearBorn

In the following example you are using IN to see if there is ‘Neo’ in all the roles in ‘ACTED_IN’ relationship. (WHERE ‘Neo’ IN r.roles)

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE 'Neo' IN r.roles AND m.title='The Matrix'
RETURN p.name

iclicker 1#

Working with Patterns in Queries#

  • Traversal in a MATCH clause

We already understand this query from the last class, but understanding how the graph engine performed this traversal can help write an efficient CQL query.

MATCH (follower:Person)-[:FOLLOWS]->(reviewer:Person)-[:REVIEWED]->(m:Movie)
WHERE m.title = 'The Replacements'
RETURN follower.name, reviewer.name

Specifying multiple MATCH patterns#

MATCH (a:Person)-[:ACTED_IN]->(m:Movie),
(m)<-[:DIRECTED]-(d:Person)
WHERE m.released = 2000
RETURN a.name, m.title, d.name

Note

When multiple patterns are specified in a MATCH clause, no relationship is traversed more than once.

We can also specify the above query using a single pattern.

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
WHERE m.released = 2000
RETURN a.name, m.title, d.name
  • Why multiple match patterns?

Even though both queries mentioned above mean the same, there are situations when you can’t represent it in a single match pattern, especially when looking at complex queries. For e.g.:

Suppose we want to retrieve the movies that Meg Ryan acted in and their respective directors and the other actors that acted in these movies. Here is the query to do this:

MATCH (meg:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person),
(other:Person)-[:ACTED_IN]->(m)
WHERE meg.name = 'Meg Ryan'
RETURN m.title as movie, d.name AS director , other.name AS `co-actors`

Note

You can specify aliases or column headings using AS; you will be seeing it a lot in all upcoming queries.

Tip

It’s a good practice to use AS wherever you can, as it can guide your thoughts and print results in a meaningful way.

Multiple MATCH clauses,#

MATCH (valKilmer:Person)-[:ACTED_IN]->(m:Movie)
MATCH (actor:Person)-[:ACTED_IN]->(m)
WHERE valKilmer.name = 'Val Kilmer'
RETURN m.title as movie , actor.name

But we can also write the same using multiple patterns,

MATCH (valKilmer:Person)-[:ACTED_IN]->(m:Movie),
(actor:Person)-[:ACTED_IN]->(m)
WHERE valKilmer.name = 'Val Kilmer'
RETURN m.title as movie , actor.name as `Actor name`

A best practice is to traverse as few nodes as possible so in this example, using multiple MATCH patterns is best.

Important

Usually, the multiple MATCH clauses are used when dealing with Subqueries using WITH. We will discuss it soon.

Specifying varying length paths#

Syntax:

//Retrieve all paths of any length with the relationship, :RELTYPE from nodeA to nodeB and beyond
(nodeA)-[:RELTYPE*]->(nodeB) 
//Retrieve the paths of length 2 with the relationship, :RELTYPE from nodeA to nodeB:
(nodeA)-[:RELTYPE*2]->(nodeB)
// Retrieve the paths of length 3 with the relationship, :RELTYPE from nodeA to nodeB:
(nodeA)-[:RELTYPE*3]->(nodeB)

Experiment with the below query by varying depths, and see how it behaves.

MATCH (follower:Person)-[:FOLLOWS*3]-(p:Person)
WHERE follower.name = 'Paul Blythe'
RETURN p.name

Now what if we want to retrieve the paths of lengths 1, 2, or 3 with the relationship, :RELTYPE from nodeA to nodeB, nodeB to nodeC, as well as, nodeC to _nodeD) (up to three hops):

Syntax:

(nodeA)-[:RELTYPE*1..3]->(nodeB)
MATCH (follower:Person)-[:FOLLOWS*1..3]-(p:Person)
WHERE follower.name = 'Paul Blythe'
RETURN p.name

Finding the shortest path#

This shortestPath() function is very useful in many cases where there are many ways to reach from node A to node B.

MATCH p = shortestPath((m1:Movie)-[*]-(m2:Movie))
WHERE m1.title = 'A Few Good Men' AND
m2.title = 'The Matrix'
RETURN p

Note

Here we specify * for the relationship. This means to use any relationship for the traversal.

Working with dumps.#

You can use the GUI interface to load dumps. You need to do this for your assignment 3.

Go to your Neo4j Aura web console. Click on Import Database –> Select a .dump file –> select the file from your local file system

../_images/dump1.png

Additional Info (OPTIONAL):

You don’t want to know this, and I am not expecting you to know this, but I am putting this notebook here just as a reference if you want to get an idea of the steps I took to load our Twitter data into the graph database. But if we load the entire Twitter data, then the number of nodes will shoot up as we got lots of user IDs and we got limitations on the number of nodes that we load to our Neo4J aura, and hence I did a sampling so that we can make sure that our nodes are under the limit.

I loaded all this data for you and took a dump. You will be using the CQL that we will be learning in our class to work on the assignments.

Can you?#

Write Cypher statements to:

  • Retrieve nodes from the graph.

  • Filter nodes retrieved using property values of nodes.

  • Retrieve property values from nodes in the graph.

  • Filter nodes retrieved using relationships.

  • Specify multiple MATCH patterns.

  • Specify multiple MATCH clauses.

Write Cypher WHERE clauses to test:

  • Equality.

  • Multiple values.

  • Ranges.

  • Labels.

  • Existence of a property.

  • String values.

  • Regular expressions.

  • Patterns in the graph.

  • Inclusion in a list.

Let’s check out the reference card again to see the covered topics.

Class activity#

Note

Check worksheet 6 in canvas for detailed instructions on activity and submission instructions.

  • Load movie data to your Neo4j

  • Practice CQL.

Deleting Nodes and Relationships (OPTIONAL)#

MATCH (p:Person)
WHERE p.name = 'Jane Doe'
DELETE p

Tip

The most efficient way to delete a node and its corresponding relationships is to specify DETACH DELETE. When you specify DETACH DELETE for a node, the relationships to and from the node are deleted, then the node is deleted.

Can you try deleting this node? Can you delete it? Why can’t you?

MATCH (p:Person)
WHERE p.name = 'Liam Neeson'
DELETE p
MATCH (p:Person)
WHERE p.name = 'Liam Neeson'
DETACH DELETE p

More on CREATE (OPTIONAL- Additional information)#

We already saw about creating nodes and relationships. This section shows you that you can create relationships with existing nodes.

To demonstrate, let’s create some nodes.

CREATE (m:Movie:Action {title: ' Batman Begins'})
RETURN m.title

CREATE
(:Person {name: 'Michael Caine', born: 1933}),
(:Person {name: 'Liam Neeson', born: 1952}),
(:Person {name: 'Katie Holmes', born: 1978}),
(:Person {name: 'Benjamin Melniker', born: 1913})

If you want to add more labels to a node, you can query for the node you want to add, and after that, use SET to set the label.

Syntax for SET

SET x:Label // adding one label to node referenced by the variable x
SET x:Label1:Label2 // adding two labels to node referenced by the variable x
MATCH (m:Movie)
WHERE m.title = 'Batman Begins'
SET m:Fantasy
RETURN labels(m)

Syntax: Removing labels from a node

// remove the label from the node referenced by the variable x
REMOVE x:Label
// remove the two labels from the node referenced by the variable x
REMOVE x:Label1, x:Label2
MATCH (m:Action)
REMOVE m:Action, m:Fantasy
RETURN labels(m)

Syntax: Adding properties to a node

SET x.propertyName = value
SET x.propertyName1 = value1 , x.propertyName2 = value2
SET x = {propertyName1: value1, propertyName2: value2}
SET x += {propertyName1: value1, propertyName2: value2}
MATCH (m:Movie)
WHERE m.title = 'Batman Begins'
SET m.released = 2005, m.lengthInMinutes = 140
RETURN m
MATCH (m:Movie)
WHERE m.title = 'Batman Begins'
SET m = {title: 'Batman Begins',
released: 2005,
lengthInMinutes: 140,
videoFormat: 'DVD',
grossMillions: 206.5}
RETURN m

Retrieving properties of a node

MATCH (m:Movie)
WHERE m.title = 'Batman Begins'
RETURN properties(m)

Updating properties - JSON-style

MATCH (m:Movie)
WHERE m.title = 'Batman Begins'
SET m += { grossMillions: 300,
awards: 66}
RETURN m

Syntax: Removing properties from a node

REMOVE x.propertyName
SET x.propertyName = null
MATCH (m:Movie)
WHERE m.title = 'Batman Begins'
SET m.grossMillions = null
REMOVE m.videoFormat
RETURN m

Creating Relationships

MATCH (a:Person), (m:Movie)
WHERE a.name = 'Katie Holmes' AND m.title = 'Batman Begins'
CREATE (a)-[rel:ACTED_IN {roles: ['Rachel','Rachel Dawes']}->(m)
RETURN a.name, rel, m.title

The SET and REMOVE syntax is the same as what we discussed with nodes.

MATCH (a:Person), (m:Movie)
WHERE a.name = 'Christian Bale' AND m.title = 'Batman Begins'
CREATE (a)-[rel:ACTED_IN]->(m)
SET rel.roles = ['Bruce Wayne','Batman']
RETURN a, rel, m
MATCH (a:Person)-[rel:ACTED_IN]->(m:Movie)
WHERE a.name = 'Christian Bale' AND m.title = 'Batman Begins'
REMOVE rel.roles
RETURN a, rel, m