In this post, I'll try to put together the Neo4j topics worth reading and the resources for them. I hope it can act as a trigger to learn Neo4j and serve as a quick recap whenever needed.
Important Resources
Books
Learning Neo4j by Rik Van Bruggen.
The Definitive Guide to Graph Databases for the RDBMS Developer by Michael Hunger
Tutorials
https://www.lynda.com/Neo4j-tutorials/Up-Running-Neo4j/155604-2.html
http://www.tutorialspoint.com/neo4j/
https://www.airpair.com/neo4j/posts/getting-started-with-neo4j-and-cypher
http://technoracle.blogspot.in/2012/04/getting-started-with-neo4j-beginners.html
Videos
https://www.youtube.com/watch?v=UJ81zWBMguc&list=PLAWPhrZnH759YHRieMBzsQRvr56JcYx5l
https://www.devcasts.io/tag/neo4j/
Sample source code
- Graph and Graph Theory
- History
- Seven Bridges of Königsberg, the problem solved by Euler
- Eulerian path
- Field of Study
- Social Science: Interaction, influence and idea sharing between people.
- Biology: graphs describe metabolic pathways.
- Computer Science: Path finding algorithms to analyze effect of change in design of artifacts.
- Flow: Flow network, Maximum flow problem
- Route Problems: Hamiltonian path problem, route inspection problem, shortest-path problem, travelling salesman problem, Dijkstra's algorithm, A*
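To make the Königsberg and Eulerian path bullets concrete, here is a minimal Python sketch. It relies on the classic result that a connected graph has an Eulerian path only if zero or two of its nodes have an odd number of edges. The labels A to D for the land masses and the function names are my own choices for illustration.

```python
from collections import defaultdict

# The seven bridges as a multigraph: each pair is one bridge between two
# land masses. A and B are the river banks, C and D the islands
# (labels chosen for this sketch).
BRIDGES = [
    ("A", "C"), ("A", "C"),
    ("B", "C"), ("B", "C"),
    ("A", "D"), ("B", "D"),
    ("C", "D"),
]


def degrees(edges):
    """Count how many edge ends touch each node."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def is_connected(edges):
    """Check that every node with an edge is reachable from any other."""
    if not edges:
        return True
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(adjacency))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == set(adjacency)


def has_eulerian_path(edges):
    """An Eulerian path needs a connected graph with 0 or 2 odd-degree nodes."""
    odd = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return is_connected(edges) and len(odd) in (0, 2)
```

Calling `has_eulerian_path(BRIDGES)` returns `False` because all four land masses have an odd number of bridges, which is exactly why the walk is impossible.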
- Graph Database
- An online DBMS with CRUD operations that work on a graph data model.
- Generally built for OLTP systems.
- Engineered with transactional integrity and operational availability in mind.
- Properties
- Graph storage:
- Native storage: designed specifically to store and manage graphs.
- Non-native storage: graph data kept in a relational or object-oriented store, which is generally slower for traversals.
- Graph Processing Engine: native graph processing (a.k.a. index-free adjacency) is the most efficient way to process graphs, because connected nodes physically point to each other.
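The index-free adjacency idea is easier to see in code. Below is a rough in-memory Python sketch, not how Neo4j stores anything on disk: each node keeps direct references to its neighbours, so a traversal just follows pointers, while the contrasting helper has to search a global edge table on every hop. The class, function and person names are all assumptions made for this illustration.

```python
class Node:
    """A node that points directly at its neighbours (index-free adjacency)."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # direct references to adjacent Node objects

    def connect(self, other):
        # Undirected edge: both ends keep a reference to each other.
        self.neighbours.append(other)
        other.neighbours.append(self)


def friends_of_friends(node):
    """Two-hop traversal that only follows object references."""
    result = set()
    for friend in node.neighbours:
        for candidate in friend.neighbours:
            if candidate is not node and candidate not in node.neighbours:
                result.add(candidate.name)
    return result


def neighbours_via_edge_table(name, edge_table):
    """Non-native style: each hop scans a global table of (a, b) pairs."""
    return {b if a == name else a for a, b in edge_table if name in (a, b)}


alice, bob, carol, dave = (Node(n) for n in ("alice", "bob", "carol", "dave"))
alice.connect(bob)
alice.connect(dave)
bob.connect(carol)
bob.connect(dave)
```

With the pointer-based version, the cost of a hop depends on the number of neighbours of the current node, not on the total size of the graph. The edge-table version touches the whole table (or an index over it) for every hop.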
- Advantages
- Minutes-to-millisecond performance.
- Accelerated development cycles.
- Extreme business responsiveness.
- Enterprise ready (ACID, availability, horizontal read scalability, storage of billions of entities)
- Common Use Cases
- Fraud Detection, Real-time recommendation engines, Master Data Management, Identity and Access Management, Graph based search
- Where not to use?
- Large set-oriented queries - RDBMS is better.
- Simple aggregate-oriented queries - Document database is better.
- Neo4j Database
- Suited to network-oriented data (organized in complex networks and deep trees) and semi-structured data.
- Neo4j can be used as an embedded persistence engine.
- Installation and Getting Started
- Data Model http://neo4j.com/developer/guide-data-modeling/
- Best Practices
- Design for queryability.
- Example user story: "As an employee, I want to know who in my company has skills similar to mine so that we can exchange knowledge."
- Align relationships with use cases.
- Look for n-ary relationships.
- Granulate nodes.
- Use in-graph indexes when appropriate.
- Pitfalls
- Rich properties
- Nodes representing multiple concepts, e.g. country, language and currency.
- Unconnected graph.
- Dense node pattern, e.g. the "Madonna and her fans" problem.
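To show what "design for queryability" means for the employee user story above, here is a small in-memory Python sketch. In Neo4j this would be a Cypher query over person, company and skill nodes. The dictionaries below merely stand in for WORKS_FOR and HAS_SKILL relationships, and every name and data value is made up for the example.

```python
# Hypothetical data: each dict plays the role of one relationship type.
WORKS_FOR = {"Ian": "Acme", "Jim": "Acme", "Kate": "Acme", "Lee": "Globex"}
HAS_SKILL = {
    "Ian": {"Java", "Neo4j", "Python"},
    "Jim": {"Neo4j", "Python"},
    "Kate": {"Java"},
    "Lee": {"Java", "Neo4j", "Python"},
}


def colleagues_with_similar_skills(person):
    """Return (colleague, shared skills) pairs for people in the same company."""
    company = WORKS_FOR[person]
    my_skills = HAS_SKILL[person]
    matches = []
    for other, other_company in WORKS_FOR.items():
        if other == person or other_company != company:
            continue
        shared = my_skills & HAS_SKILL.get(other, set())
        if shared:
            matches.append((other, sorted(shared)))
    # Most shared skills first, ties broken by name.
    return sorted(matches, key=lambda match: (-len(match[1]), match[0]))
```

The point of the exercise: because the user story asks about people, companies and skills, those three become separate nodes connected by relationships, rather than skills being buried in a property string on the person.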
- Cypher Query language
- This is a vast topic in itself.
- I have tried to cover more of it at http://www.i-satyam.blogspot.in/2016/03/neo4j-cypher-query-language.html
- References
- Capabilities
- Data Security: Neo4j does not deal with data encryption explicitly, but supports all means built into the Java programming language and the JVM to protect data by encrypting it before storing.
- Data Integrity: the transactional architecture ensures that data is protected and provides for fast recovery from an unexpected failure.
- Data Integration: event-based synchronization, periodic synchronization, periodic full export/import of data.
- Availability and Reliability: Cold Spare, Hot Spare, High Availability Cluster
- Capacity: File Size, Read Speed, Write Speed, Data Size
- Transaction Management
- The default isolation level is read-committed.
- The Neo4j Java API enables explicit locking of nodes and relationships, which makes it possible to simulate higher isolation levels by acquiring and releasing locks explicitly.
- Default Locking Behavior:
- When adding, changing or removing a property on a node or relationship, a write lock is taken on that node or relationship.
- When creating or deleting a node, a write lock is taken on that node.
- When creating or deleting a relationship, a write lock is taken on that relationship and on both of its nodes.
- Handling Deadlock
- Use the TransactionTemplate class, a helper that retries the transaction for us.
- We can also use our own retry-loop code.
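A hand-written retry loop for deadlocks could look roughly like the Python sketch below. It is a generic pattern, not Neo4j's API: `DeadlockDetected` is a hypothetical stand-in for whatever error the driver raises, and `run_with_retries` and its parameters are names I made up for the example.

```python
import time


class DeadlockDetected(Exception):
    """Hypothetical stand-in for the error raised when a deadlock is detected."""


def run_with_retries(work, max_attempts=5, base_delay=0.01):
    """Run work(); retry with a growing pause if it fails with a deadlock."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except DeadlockDetected:
            if attempt == max_attempts:
                raise  # give up and let the caller see the failure
            time.sleep(base_delay * attempt)  # linear back-off before retrying


# Demo: a unit of work that hits a deadlock twice, then succeeds.
attempts = {"count": 0}


def flaky_transaction():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise DeadlockDetected()
    return "committed"


result = run_with_retries(flaky_transaction)
```

The important design detail is that the whole unit of work is re-run from the start, since the failed transaction has been rolled back and its locks released.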
- Creating unique nodes
- Single thread: if all creation happens on one thread, no two threads can try to create the same entity at the same time.
- Unique constraints together with Cypher can also help with this.
- With a legacy index, putIfAbsent guarantees uniqueness.
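The putIfAbsent idea behind the last bullet can be sketched in plain Python. This is only an in-memory illustration of the get-or-create pattern, not the Neo4j index API: `UniqueIndex`, `put_if_absent` and `get_or_create_user` are hypothetical names, and a lock stands in for whatever the database does internally.

```python
import threading


class UniqueIndex:
    """In-memory stand-in for an index with put-if-absent semantics."""

    def __init__(self):
        self._entries = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, value):
        """Store value unless key already exists; return the earlier value or None."""
        with self._lock:
            existing = self._entries.get(key)
            if existing is None:
                self._entries[key] = value
            return existing


def get_or_create_user(index, username):
    """Return the single node for username, creating it only if it is missing."""
    candidate = {"username": username}
    existing = index.put_if_absent(username, candidate)
    # If someone else won the race, discard our candidate and use theirs.
    return candidate if existing is None else existing
```

Because the check and the insert happen as one atomic step, concurrent callers asking for the same username all end up with the same node, and the losers simply throw away their candidate.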
And there is a lot more to learn! Hope this kick-starts the learning.