Neo4j - Cypher Query Language

Cypher is a declarative graph query language that allows for expressive and efficient querying and updating of the graph store.It is designed to be a humane query language, suitable for both developers and operations professionals.

- clarity on what to retrieve from a graph, not on how to retrieve it.

- constructs are based on English prose and neat iconography.

- Design goal: make simple things easy, and the complex things possible.

Important Resources
Neo4j Docs: http://neo4j.com/docs/stable/cypher-query-lang.html

Tutorials:

http://www.tutorialspoint.com/neo4j/neo4j_cql_introduction.htm

https://www.airpair.com/neo4j/posts/getting-started-with-neo4j-and-cypher

http://www.lynda.com/Neo4j-tutorials/introduction-Cypher-query-language/155604/175275-4.html

Videos:

https://www.youtube.com/watch?v=UJ81zWBMguc&list=PLAWPhrZnH759YHRieMBzsQRvr56JcYx5l

Introduction

Structure

Borrows structure from SQL - queries are built using various clauses.
Clauses are chained together, and the they feed intermediate result sets between each other.
query which finds a user called John and John’s friends


MATCH (john {name: 'John'})-[:friend]->()-[:friend]->(fof)

RETURN john.name, fof.name

Read and Update

Cypher query part can’t both match and update the graph at the same time.
If query only performs read, cypher will be lazy and not actually match the pattern until results are asked.
In update queries, all reading will be done before a write actually happens.
WITH statement - barrier between a plan and the finished execution of that plan. Here we specify how we want to aggregate data before filtering happens.

MATCH (n {name: 'John'})-[:FRIEND]-(friend)
WITH n, count(friend) as friendsCount
WHERE friendsCount > 3
RETURN n, friendsCount

MATCH (n {name: 'John'})-[:FRIEND]-(friend)
WITH n, count(friend) as friendsCount
SET n.friendCount = friendsCount
RETURN n.friendsCount

Returning Data

RETURN clause - Period symbol at the end of the query.

Has three sub-clauses - SKIP/LIMIT and ORDER BY

Transactions

Any query that updates the graph, runs in a transaction.
Existing transaction in running context is used. If absent, new one is created.
updating query will always either fully succeed, or not succeed at all.

Uniqueness

While matching pattern, Neo4j makes sure to not include matches where the same graph relationship is found multiple times in a single pattern.
Example: looking for a user’s friends of friends should not return said user. It happens in Facebook.

MATCH (user:User { name: 'Adam' })-[r1:FRIEND]-()-[r2:FRIEND]-(friend_of_a_friend)
RETURN friend_of_a_friend.name AS fofName

If it is desired that we want to have the user name returned as well, we can spread matching over multiple MATCH clauses.

MATCH (user:User { name: 'Adam' })-[r1:FRIEND]-(friend)
MATCH (friend)-[r2:FRIEND]-(friend_of_a_friend)
RETURN friend_of_a_friend.name AS fofName

Parameters

Cypher supports querying with parameters. It works with use of { }.
Parameters can be used for:

Literals and Expressions in WHERE clause. WHERE n.name = { name }
Start clause.
index queries and
Node/relationship ids.

Examples:

WHERE n.name =~ { regex }
WHERE n.name STARTS WITH { name }
CREATE ({ props })
SET n = { props }
SKIP { s } LIMIT { l }

Syntax

Values: Supported values are all distinct and are as under:

Numeric
String
Boolean
Nodes
Relationship
Paths
Maps from Strings to other values
Collection of any other type of value.

Expressions

Escape sequences.
Case Expressions

Simple case

CASE test
WHEN value THEN result
[WHEN ...]
[ELSE default]
END

Generic case

CASE
WHEN predicate THEN result
[WHEN ...]
[ELSE default]
END

Identifiers

When we reference parts of a pattern or a query, we name them. These names are called identifiers.
If multiple query parts are chained together using WITH, identifiers have to be listed in the WITH clause to be carried over to the next part.

Operators

Mathematical: +, -, *, / and %, ^
Comparison: =, <>, <, >, <=, >=, IS NULL, IS NOT NULL, STARTS WITH, ENDS WITH and CONTAINS
Boolean: AND, OR, XOR, NOT
String: +, =~ (regular expression matching)
Collections: + (Concatenation)

Comments: //this is a comment
Patterns: Describes the shape of data we want to fetch.

Pattern for nodes: e.g. (a)
Pattern for related nodes: e.g. (a)-->(b), (a)-->(b)<--(c), (a)-->()<--(c)
Labels: e.g. (a:User:Admin)-->(b)
Specifying properties: e.g. (a { name: "Andres", sport: "Brazilian Ju-Jitsu" })
Describing relationship: e.g. (a)-[r:TYPE1|TYPE2]->(b)
Path: e.g. p = (a)-[*3..5]->(b)

Collections: created by using brackets and separating the elements with commas

RETURN range(0,10)[3] //returns 3
RETURN range(0,10)[-3] //returns 8
RETURN range(0,10)[0..3] //returns [0,1,2]
RETURN range(0,10)[0..-5] //returns [0,1,2,3,4,5]
RETURN range(0,10)[-5..]
RETURN range(0,10)[..4]

Working with NULL: a missing unknown value

expression NULL = NULL yields NULL and not TRUE
Logical operation with NULL: treated as unknown.
IN operator: NULL IN [] is FALSE. Otherwise we always get NULL

General Clauses

RETURN

return node, return relationship
return all elements (RETURN *)
Column alias (RETURN a.age AS age)
Optional properties (RETURN a.age)
Other expressions e.g. RETURN a.age > 30, "I'm a literal",(a)-->()
Unique results e.g. RETURN DISTINCT b

ORDER BY: sub-clause following RETURN or WITH
LIMIT: e.g. LIMIT 3, LIMIT toInt(3 * rand())+ 1
SKIP: e.g. SKIP 3, SKIP 1 LIMIT 2, SKIP toInt(3*rand())+ 1
WITH: manipulate the output before it is passed on to the following query parts

Filter using ORDER BY, LIMIT and aggregate functions

MATCH (david { name: "David" })--(otherPerson)-->()
WITH otherPerson, count(*) AS foaf
WHERE foaf > 1
RETURN otherPerson

MATCH (n)
WITH n
ORDER BY n.name DESC LIMIT 3
RETURN collect(n.name)

UNWIND: expands a collection into a sequence of rows.

```
UNWIND[1,2,3] AS x
RETURN x
```

UNION: combines the result of multiple queries.

Allow duplicates

MATCH (n:Actor)
RETURN n.name AS name
UNION ALL MATCH (n:Movie)
RETURN n.title AS name

Do not allow duplicates

MATCH (n:Actor)
RETURN n.name AS name
UNION
MATCH (n:Movie)
RETURN n.title AS name

USING: influences the decisions of the planner when building an execution plan for a query

Query using one or more index hints

MATCH (m:German)-->(n:Swede)
USING INDEX m:German(surname)
USING INDEX n:Swede(surname)
WHERE m.surname = 'Plantikow' AND n.surname = 'Taylor'
RETURN m

Label scan

MATCH (m:German)
USING SCAN m:German
WHERE m.surname = 'Plantikow'
RETURN m

Reading Clauses

MATCH: primary way of getting data into the current set of bindings

-- is used without regard for relationship.
--> mentions that relationship is interesting.

OPTIONAL MATCH

searches for the pattern described in it, while using NULLs for missing parts of the pattern.
e.g. OPTIONAL MATCH (a)-->(x), OPTIONAL MATCH (a)-[r:ACTS_IN]->()

WHERE

adds constraints to the patterns in a MATCH or OPTIONAL MATCH clause or filters the results of a WITH clause.

START: Find starting points through legacy indexes.

```
START n=node:nodes(name = "A")
RETURN n
```

Aggregation

COUNT: count the number of rows

count nodes
Group count relationship types
count entities
count not null values

statistics

sum
average
percentileDisc: e.g. RETURN percentileDisc(n.property, 0.5) //0.5 = median
percentileCont
stdev
max
min

collect: collects all the values into a list. It will ignore NULLs.
DISTINCT

Writing Clauses

CREATE
MERGE: use of ON CREATE and ON MATCH
SET: update labels on nodes or properties on nodes and relationships.
DELETE - deletes node or relationship.
REMOVE - removes a property or label.
FOREACH - Iterates the nodes

MATCH p =(begin)-[*]->(END )
WHERE begin.name='A' AND END .name='D'
FOREACH (n IN nodes(p)| SET n.marked = TRUE )

CREATE UNIQUE: mix of MATCH and CREATE — it will match what it can, and create what is missing.
PERIODIC COMMIT: USING PERIODIC COMMIT 500

Functions

Predicates: boolean functions that return true or false for a given set of input.

ALL: syntax - ALL(identifier in collection WHERE predicate)
ANY: syntax - ANY(identifier in collection WHERE predicate)
NONE: syntax - NONE(identifier in collection WHERE predicate)
SINGLE: syntax - SINGLE(identifier in collection WHERE predicate)
EXISTS: syntax - EXISTS( pattern-or-property )

Scalar functions

TYPE: returns the string representation of relationship type. TYPE( relationship )
COALESCE: returns first non-NULL values in the list of expressions passed to it.

COALESCE( expression [, expression]* )

STARTNODE: returns the start node of a relationship

MATCH (x:foo)-[r]-()
RETURN startNode(r)

ENDNODE: returns the end node of a relationship
others: SIZE, LENGTH, ID, HEAD, LAST, TIMESTAMP, TOINT, TOFLOAT

Collection functions

NODES: returns all nodes in the path

MATCH p=(a)-->(b)-->(c)
WHERE a.name='Alice' AND c.name='Eskil'
RETURN nodes(p)

RELATIONSHIPS: returns all relationships in the path

MATCH p=(a)-->(b)-->(c)
WHERE a.name='Alice' AND c.name='Eskil'
RETURN relationships(p)

KEYS: returns a collection of propety names attached to a node/relationship

MATCH (a)
WHERE a.name='Alice'
RETURN keys(a)

LABELS: returns a collection of labels attached to a node

MATCH (a)
WHERE a.name='Alice'
RETURN labels(a)

EXTRACT: returns a collection of properties

syntax:EXTRACT( identifier in collection | expression )

MATCH p=(a)-->(b)-->(c)
WHERE a.name='Alice' AND b.name='Bob' AND c.name='Daniel'
RETURN extract(n IN nodes(p)| n.age) AS extracted

FILTER: returns all the elements in a collection that comply to a predicate

syntax: FILTER(identifier in collection WHERE predicate)

MATCH (a)
WHERE a.name='Eskil'
RETURN a.array, filter(x IN a.array WHERE size(x)= 3)

TAIL: returns all but the first element in a collection.
RANGE: returns numerical values in range with a non zero step value.

syntax: RANGE( start, end [, step] )
```
RETURN range(0,10), range(2,18,3)
```

REDUCE: case of an accumulator

syntax: REDUCE( accumulator = initial, identifier in collection | expression )

MATCH p=(a)-->(b)-->(c)
WHERE a.name='Alice' AND b.name='Bob' AND c.name='Daniel'
RETURN reduce(totalAge = 0, n IN nodes(p)| totalAge + n.age) AS reduction

Mathematical functions:operate on numerical expressions only, and will return an error if used on any other values
String functions: operate on string expressions only, and will return an error if used on any other values. TOSTRING() accepts numbers as well.

Schema

Neo4j 2.0 introduced an optional schema for the graph, based around the concept of labels.
Labels are used in the specification of indexes, and for defining constraints on the graph.
Together, indexes and constraints are the schema of the graph.
Indexes

Create an index
```
CREATE INDEX ON :Person(name)
```
Drop an index
```
DROP INDEX ON :Person(name)
```
Use an index

In WHERE clause with equality and also with inequality

MATCH (person:Person)
WHERE person.name > 'B'
RETURN person

with IN

MATCH (person:Person)
WHERE person.name IN ['Andres', 'Mark']
RETURN person

with STARTS WITH

MATCH (person:Person)
WHERE person.name STARTS WITH 'And'
RETURN person

while checking for the existence of a property

MATCH (p:Person)
WHERE HAS (p.name)
RETURN p

Constraints

enforces data integrity with the use of constraints.
Constraints can be applied to either nodes or relationships.
Unique Node property constraints

CREATE CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE

DROP CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE

Node Property Existence constraint

CREATE CONSTRAINT ON (book:Book) ASSERT exists(book.isbn)

DROP CONSTRAINT ON (book:Book) ASSERT exists(book.isbn)

Relationship property existence

CREATE CONSTRAINT ON ()-[like:LIKED]-() ASSERT exists(like.day)

Statistics

statistical information that Neo4j keeps is:

The number of nodes with a certain label.
Selectivity per index.
The number of relationships by type.
The number of relationships by type, ending or starting from a node with a specific label.

We can configure some options and also sample using shell.

Query Tuning

Each Cypher query gets optimized and transformed into an execution plan by the Cypher execution engine. To minimize the resources used for this, make sure to use parameters instead of literals when possible.
How are queries executed?

Each query is turned into an execution plan by the execution planner.
Execution planning strategies:

Rule: This planner has rules that are used to produce execution plans. The planner considers available indexes, but does not use statistical information to guide the query compilation.
Cost: This planner uses the statistics service in Neo4j to assign cost to alternative plans and picks the cheapest one

Can be configured by using query.planner.version (2.2 version has cost planner for some queries)
or by adding CYPHER planner=cost or CYPHER planner=rule before the query.

How to profile a query?

We can look at the execution plan in following ways:

EXPLAIN: Append Cypher statement with EXPLAIN. It does not run the statement but shows the plan.
PROFILE: If you want to run the statement and see which operators are doing most of the work, use PROFILE. This will run your statement and keep track of how many rows pass through each operator, and how much each operator needs to interact with the storage layer to retrieve the necessary data

Execution Plans

Neo4j breaks down the work of executing a query into small pieces called operators. Each operator is responsible for a small part of the overall query. The operators are connected together in a pattern called a execution plan.
Rows: The number of rows that the operator produced. Only available if the query was profiled.
EstimatedRows: cost-based compiler uses this estimate to choose a suitable execution plan.
DBHits: Each operator will ask the Neo4j storage engine to do work such as retrieving or updating data. A database hit is an abstract unit of this storage engine work.
http://www.kennybastani.com/2014/07/understanding-how-neo4j-cypher-queries.html
http://thought-bytes.blogspot.in/2013/01/optimizing-neo4j-cypher-queries.html

Pages

Potpourri

Neo4j - Cypher Query Language

Satyam Shandilya

No comments:

Post a Comment

About Me

Popular Posts

Labels Cloud

Social

Instagram