Neo4j - Topics, Resources and Quick Notes

In this post, I'll try to put together the Neo4j topics worth reading and the resources for them. I hope it can act as a trigger to learn Neo4j and as a quick recap whenever needed.

Important Resources

Books

Learning Neo4j by Rik Van Bruggen.

Definitive Guide - Graph Databases for RDBMS developers by Michael Hunger

Neo4j Docs
http://neo4j.com/docs/stable/introduction.html

Tutorials
https://www.lynda.com/Neo4j-tutorials/Up-Running-Neo4j/155604-2.html
http://www.tutorialspoint.com/neo4j/
https://www.airpair.com/neo4j/posts/getting-started-with-neo4j-and-cypher
http://technoracle.blogspot.in/2012/04/getting-started-with-neo4j-beginners.html

Videos

https://www.youtube.com/watch?v=UJ81zWBMguc&list=PLAWPhrZnH759YHRieMBzsQRvr56JcYx5l
https://www.devcasts.io/tag/neo4j/

Sample source code

https://github.com/imsatyam/neo4j_basic

Graph and Graph Theory

History

Seven Bridges of Königsberg, the problem solved by Euler in 1736
Eulerian path
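
As a small illustration of Euler's result: a connected undirected graph has an Eulerian path only if zero or two of its vertices have odd degree. The sketch below counts odd-degree vertices for the Königsberg layout (four land masses, seven bridges). The class name, method names and the labels A to D are my own choices for the example, and the check assumes the graph is connected.

```java
import java.util.HashMap;
import java.util.Map;

class EulerianPathCheck {

    // The seven bridges between four land masses (labels A-D are arbitrary).
    static final String[][] KOENIGSBERG = {
        {"A", "B"}, {"A", "B"}, {"A", "C"}, {"A", "C"},
        {"A", "D"}, {"B", "D"}, {"C", "D"}
    };

    /** Counts the vertices with odd degree in an undirected multigraph. */
    static int oddDegreeCount(String[][] edges) {
        Map<String, Integer> degree = new HashMap<>();
        for (String[] edge : edges) {
            degree.merge(edge[0], 1, Integer::sum);
            degree.merge(edge[1], 1, Integer::sum);
        }
        int odd = 0;
        for (int d : degree.values()) {
            if (d % 2 != 0) {
                odd++;
            }
        }
        return odd;
    }

    /** Assumes a connected graph: a path exists iff 0 or 2 vertices have odd degree. */
    static boolean hasEulerianPath(String[][] edges) {
        int odd = oddDegreeCount(edges);
        return odd == 0 || odd == 2;
    }

    public static void main(String[] args) {
        // All four land masses have odd degree, so this prints false.
        System.out.println(hasEulerianPath(KOENIGSBERG));
    }
}
```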

Field of Study

Social Science: Interaction, influence and idea sharing between people.
Biology: Graphs describe metabolic pathways.
Computer Science: Path finding algorithms to analyze effect of change in design of artifacts.
Flow: Flow network, Maximum flow problem
Route Problems: Hamiltonian path problem, route inspection problem, shortest-path problem (Dijkstra's algorithm, A*), travelling salesman problem.
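
To make the shortest-path item concrete, here is a minimal sketch of Dijkstra's algorithm in plain Java. It is not tied to Neo4j in any way; the adjacency-array representation, the class name and the sample graph are assumptions made for the example, and edge weights must be non-negative.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class DijkstraSketch {

    /**
     * graph[u] lists the outgoing edges of vertex u as {target, weight} pairs.
     * Returns the shortest distance from source to every vertex;
     * unreachable vertices keep Long.MAX_VALUE.
     */
    static long[] shortestDistances(int[][][] graph, int source) {
        long[] dist = new long[graph.length];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[source] = 0;

        // Queue entries are {distance, vertex}, smallest distance first.
        PriorityQueue<long[]> queue =
                new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        queue.add(new long[] {0, source});

        while (!queue.isEmpty()) {
            long[] entry = queue.poll();
            int u = (int) entry[1];
            if (entry[0] > dist[u]) {
                continue; // stale entry, a shorter route was already found
            }
            for (int[] edge : graph[u]) {
                int v = edge[0];
                long candidate = dist[u] + edge[1];
                if (candidate < dist[v]) {
                    dist[v] = candidate;
                    queue.add(new long[] {candidate, v});
                }
            }
        }
        return dist;
    }

    /** A made-up directed graph with four vertices, used only for illustration. */
    static int[][][] sampleGraph() {
        return new int[][][] {
            {{1, 4}, {2, 1}},  // 0 -> 1 (cost 4), 0 -> 2 (cost 1)
            {{3, 1}},          // 1 -> 3 (cost 1)
            {{1, 2}, {3, 5}},  // 2 -> 1 (cost 2), 2 -> 3 (cost 5)
            {}                 // 3 has no outgoing edges
        };
    }

    public static void main(String[] args) {
        // Prints [0, 3, 1, 4]
        System.out.println(Arrays.toString(shortestDistances(sampleGraph(), 0)));
    }
}
```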

Graph Database

An online DBMS with CRUD operations working on a graph data model.
Generally built for OLTP systems.
Engineered with transactional integrity and operational availability in mind.

Properties

Graph storage:

Native storage: designed specifically to store and manage graphs.
Non-native storage: graph data is kept in a relational or object-oriented store, which is generally slower for graph traversals.

Graph Processing Engine: Native graph processing, a.k.a. index-free adjacency, is the most efficient way to process graphs because connected nodes physically point to each other.
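
A rough way to picture index-free adjacency is a node object that holds direct references to its neighbours, so a traversal only follows pointers and never consults a global index. The sketch below is a simplified in-memory model of that idea, not Neo4j's actual storage format; the class, field and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class IndexFreeAdjacencySketch {

    /** A node that keeps direct references to its neighbours. */
    static final class Node {
        final String name;
        final List<Node> neighbours = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        /** Adds an undirected connection by storing a reference on both sides. */
        void connect(Node other) {
            neighbours.add(other);
            other.neighbours.add(this);
        }
    }

    /**
     * Names of nodes reachable in two hops but not in one,
     * found purely by following neighbour references.
     */
    static Set<String> friendsOfFriends(Node start) {
        Set<String> result = new LinkedHashSet<>();
        for (Node friend : start.neighbours) {
            for (Node candidate : friend.neighbours) {
                if (candidate != start && !start.neighbours.contains(candidate)) {
                    result.add(candidate.name);
                }
            }
        }
        return result;
    }
}
```

The cost of each hop depends on how many neighbours a node has, not on the total size of the graph, which is the property the term refers to.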

Advantages

Minutes-to-millisecond performance.
Accelerated development cycles.
Extreme business responsiveness.
Enterprise ready (ACID, availability, horizontal read scalability, storage of billions of entities).

Common Use Cases

Fraud Detection, Real-time recommendation engines, Master Data Management, Identity and Access Management, Graph based search

Where not to use?

Large set-oriented queries - RDBMS is better.
Simple aggregate-oriented queries - Document database is better.

Neo4j Database

Well suited to network-oriented data (organized in complex networks and deep trees) and semi-structured data.
Neo4j can run as an embedded persistence engine.

Installation and Getting Started

http://neo4j.com/docs/stable/server-installation.html

Data Model http://neo4j.com/developer/guide-data-modeling/

Best Practices

Design for queryability. Start from the questions the application must answer, for example:

"As an employee, I want to know who in the company I work for has similar skills to me, so that we can exchange knowledge."

Align relationships with use cases.
Look for n-ary relationships.
Granulate nodes.
Use in-graph indexes when appropriate.
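
The employee user story above maps to a two-hop traversal: from a person to their skills, then from each skill to the other people who have it. Below is a plain-Java sketch of that shape, with the graph held in two maps. In Neo4j this would normally be a Cypher query; the class name, method names and the idea of a "has skill" relationship are assumptions for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class SimilarSkillsSketch {

    // person -> skills and skill -> people: both directions of a "has skill" link.
    private final Map<String, Set<String>> skillsByPerson = new HashMap<>();
    private final Map<String, Set<String>> peopleBySkill = new HashMap<>();

    void addSkill(String person, String skill) {
        skillsByPerson.computeIfAbsent(person, k -> new HashSet<>()).add(skill);
        peopleBySkill.computeIfAbsent(skill, k -> new HashSet<>()).add(person);
    }

    /** Maps each colleague to the number of skills shared with the given person. */
    Map<String, Integer> colleaguesWithSharedSkills(String person) {
        Map<String, Integer> shared = new TreeMap<>();
        Set<String> ownSkills =
                skillsByPerson.getOrDefault(person, Collections.<String>emptySet());
        for (String skill : ownSkills) {
            for (String colleague : peopleBySkill.get(skill)) {
                if (!colleague.equals(person)) {
                    shared.merge(colleague, 1, Integer::sum);
                }
            }
        }
        return shared;
    }
}
```

Because the model was designed around the question, answering it needs no joins or scans, only a walk along the relationships that the question names.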

Pitfalls

Rich properties
Node representing multiple concepts e.g. country, language and currency.
Unconnected graph.
Dense node pattern: the "Madonna and her fans" problem.

Cypher Query language

This is a vast topic in itself.
I have tried to cover more of it at http://www.i-satyam.blogspot.in/2016/03/neo4j-cypher-query-language.html

References

Capabilities

Data Security: Neo4j does not deal with data encryption explicitly, but supports all means built into the Java programming language and the JVM to protect data by encrypting it before storing.
Data Integrity: The transactional architecture ensures that data is protected and provides for fast recovery from an unexpected failure.
Data Integration: Event-based synchronization, periodic synchronization, periodic full export/import of data.
Availability and Reliability: Cold Spare, Hot Spare, High Availability Cluster
Capacity: File Size, Read Speed, Write Speed, Data Size

Transaction Management

The default isolation level is read-committed.
The Neo4j Java API enables explicit locking of nodes and relationships, which makes it possible to simulate the effects of higher isolation levels by obtaining and releasing locks explicitly.
Default Locking Behavior:

When adding, changing or removing a property on a node or relationship a write lock will be taken on the specific node or relationship.
When creating or deleting a node a write lock will be taken for the specific node.
When creating or deleting a relationship a write lock will be taken on the specific relationship and both its nodes.

Handling Deadlock

The TransactionTemplate class can retry a transaction, for example up to 5 times with a 3-second backoff:

TransactionTemplate template = new TransactionTemplate(  ).retries( 5 ).backoff( 3, TimeUnit.SECONDS );

We can also use our own retry-loop code.
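
A hand-written retry loop might look like the sketch below. It is deliberately independent of the Neo4j API: TransientDeadlockException is a hypothetical stand-in for whatever exception the database reports on deadlock, and runWithRetries is a name invented for the example. In real code the supplier would open, run and commit a transaction.

```java
import java.util.function.Supplier;

class RetryLoopSketch {

    /** Hypothetical stand-in for the exception thrown when a deadlock is detected. */
    static class TransientDeadlockException extends RuntimeException {
        TransientDeadlockException(String message) {
            super(message);
        }
    }

    /**
     * Runs the work once and, on deadlock, up to `retries` more times,
     * sleeping `backoffMillis` between attempts. Rethrows the last failure.
     */
    static <T> T runWithRetries(Supplier<T> work, int retries, long backoffMillis) {
        if (retries < 0) {
            throw new IllegalArgumentException("retries must be >= 0");
        }
        TransientDeadlockException last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                return work.get();
            } catch (TransientDeadlockException e) {
                last = e; // remember the failure and try again
            }
            if (attempt < retries) {
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while backing off", ie);
                }
            }
        }
        throw last;
    }
}
```

Only the deadlock exception triggers a retry; any other exception propagates immediately, since repeating the work would not help.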

Creating unique nodes

A single-threaded environment ensures it, since no two threads can try to create the same entity at once.
Unique constraints combined with Cypher (MERGE) can also help with this.
With a legacy index, put-if-absent (putIfAbsent) guarantees uniqueness.
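
The put-if-absent idea can be shown with an analogy in plain Java, where a ConcurrentHashMap plays the role of the index. This is not the Neo4j index API; UserNode, getOrCreate and the username key are hypothetical names for the sketch. Of several threads racing to create the same user, exactly one candidate ends up in the index and every caller gets that one back.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

class UniqueNodeSketch {

    /** Hypothetical node type, identified by a unique username. */
    static final class UserNode {
        final long id;
        final String username;

        UserNode(long id, String username) {
            this.id = id;
            this.username = username;
        }
    }

    // Stands in for the index that enforces uniqueness on username.
    private final ConcurrentMap<String, UserNode> index = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Returns the existing node for the username, or creates and registers one. */
    UserNode getOrCreate(String username) {
        UserNode existing = index.get(username);
        if (existing != null) {
            return existing;
        }
        UserNode candidate = new UserNode(nextId.incrementAndGet(), username);
        // putIfAbsent is atomic: it returns the winner if another thread got there first.
        UserNode winner = index.putIfAbsent(username, candidate);
        return winner != null ? winner : candidate;
    }

    int size() {
        return index.size();
    }
}
```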

And there is a lot more to learn! Hope this kick-starts the learning.


Satyam Shandilya
