Getting Started with Neo4j in .NET with Neo4jClient Library

I have been looking into Neo4j, a graph database, for a while, and here is what impressed me the most while working with it through the Neo4jClient .NET library.
13 December 2015
7 minute read

I am really in love with the side project I am working on now. It is broken down into little "micro" applications (a.k.a. microservices), uses multiple data storage technologies and is brought together through Docker. As a result, the entire solution feels very natural, unrestricted and manageable.

One part of this solution requires answering a question that involves going very deep into the data hierarchy. To illustrate what I mean, have a look at the graph below:

[Figure: movies with only agency employees]

Here, we have an agency which has acquired some actors. We also have some movies which employed some actors. You can model this in various data storage systems in various ways, but the question I want to answer is the following: "What are the movies which employed all of their actors from Agency-A?" Even thinking about the query you would write in T-SQL for this one is enough to melt your brain. It doesn’t mean that SQL Server, MySQL, etc. are bad data storage systems. It’s just that this type of question is not among their strengths.

Enter Neo4j

Neo4j is an open-source graph database implemented in Java and, according to Wikipedia, accessible from software written in other languages using the Cypher query language through a transactional HTTP endpoint. In Neo4j, your data set consists of nodes and the relationships between them, which you interact with through Cypher. Cypher is a very powerful, declarative, SQL-inspired language for describing patterns in graphs. What stands out most when working with Cypher is the relationships: they are first-class citizens. Consider the following Cypher query, taken from the movie sample in the Neo4j web client:

You can bring up this movie sample by running ":play movie graph" in the Neo4j web client and walking through it.

MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(tomHanksMovies) RETURN tom,tomHanksMovies

This lists all Tom Hanks movies, and even if you have never seen Cypher before, reading it from left to right tells you pretty much what it does. The interesting part here is the ACTED_IN relationship inside the query. At this point, you may think this is not a big deal, as it probably translates to the T-SQL query below:

SELECT * FROM Movies m
INNER JOIN MovieActors ma ON ma.MovieId = m.Id
WHERE ma.ActorId = 1; -- assuming Tom Hanks is stored with Id 1
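To see why the pattern reads so naturally, it helps to think of relationships as plain (start node, end node) pairs. Here is a minimal in-memory sketch in C#, not Neo4j code, with hypothetical sample data (the Movie-1, Movie-2 and Actor-X names are made up): matching the single-hop ACTED_IN pattern boils down to filtering the relationships by their start node, which is what the join above does as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: each tuple is one (person)-[:ACTED_IN]->(movie) relationship.
var actedIn = new List<(string Person, string Movie)>
{
    ("Tom Hanks", "Movie-1"),
    ("Tom Hanks", "Movie-2"),
    ("Actor-X", "Movie-1"),
    ("Actor-X", "Movie-3"),
};

// (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(tomHanksMovies):
// keep the relationships that start at Tom Hanks and return their end nodes.
var tomHanksMovies = actedIn
    .Where(rel => rel.Person == "Tom Hanks")
    .Select(rel => rel.Movie)
    .ToList();

Console.WriteLine(string.Join(", ", tomHanksMovies)); // Movie-1, Movie-2
```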

However, you will start to see the power as the questions get more interesting. For example, let’s find Tom Hanks’ co-actors from every movie he acted in (again, from the same sample):

MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors) RETURN coActors.name

In a relational database, this already takes a self-join through the MovieActors table, and the query gets harder to read with every extra hop; with Cypher, it is dead easy. You can start to see that in Neo4j, it’s all about building up nodes and declaring the relationships to get the answer to your question.
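The same in-memory view (again a C# sketch with made-up Movie-n and Actor-X data, not how Neo4j executes the query) shows what the extra hop adds: find Tom Hanks’ movies first, then walk the ACTED_IN relationships backwards from those movies. Cypher never uses the same relationship twice within one pattern, so Tom Hanks does not come back as his own co-actor; the sketch approximates that by skipping his rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical ACTED_IN relationships as (person, movie) pairs.
var actedIn = new List<(string Person, string Movie)>
{
    ("Tom Hanks", "Movie-1"),
    ("Tom Hanks", "Movie-2"),
    ("Actor-X", "Movie-1"),
    ("Actor-Y", "Movie-1"),
    ("Actor-Y", "Movie-2"),
    ("Actor-Z", "Movie-3"),
};

// First hop, (tom)-[:ACTED_IN]->(m): the movies Tom Hanks acted in.
var tomMovies = actedIn
    .Where(rel => rel.Person == "Tom Hanks")
    .Select(rel => rel.Movie)
    .ToHashSet();

// Second hop, (m)<-[:ACTED_IN]-(coActors): everyone else who acted in those movies.
// Like the Cypher query (no DISTINCT), a co-actor appears once per shared movie.
var coActors = actedIn
    .Where(rel => tomMovies.Contains(rel.Movie) && rel.Person != "Tom Hanks")
    .Select(rel => rel.Person)
    .ToList();

Console.WriteLine(string.Join(", ", coActors)); // Actor-X, Actor-Y, Actor-Y
```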

Neo4j in .NET

As Neo4j communicates over HTTP, you can find a client implementation in pretty much every ecosystem, and .NET is no exception. The amazing people at Readify maintain the Neo4jClient OSS project. It’s extremely easy to use, and the library has very good documentation. I especially liked the part where they document the thread-safety concerns of GraphClient. It was the first thing I wanted to find out, and there it was.

Going back to the example I mentioned at the beginning of this post, I tried to handle it through the .NET client. Let’s walk through what I did.

You can find the sample below in my DotNetSamples GitHub repository.

First, I initialized the GraphClient and made some adjustments:

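using System;
using Neo4jClient;                    // GraphClient
using Newtonsoft.Json.Serialization;  // CamelCasePropertyNamesContractResolver

var client = new GraphClient(new Uri("http://localhost:7474/db/data"), "neo4j", "1234567890")
{
    // Serialize C# properties as camelCase (Name -> name) in nodes and parameters.
    JsonContractResolver = new CamelCasePropertyNamesContractResolver()
};

client.Connect(); // must be called before sending any Cypher queries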
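The contract resolver is the reason the C# Name property ends up as name on the nodes, which is presumably why the Cypher later in this post can filter on { name:"Agency-A" }. Showing Newtonsoft.Json’s CamelCasePropertyNamesContractResolver directly would need the NuGet package, so this sketch swaps in System.Text.Json’s JsonNamingPolicy.CamelCase, which applies the same Name to name mapping. It illustrates the naming convention only, not what Neo4jClient actually sends to the server.

```csharp
using System;
using System.Text.Json;

// Stand-in for Newtonsoft.Json's CamelCasePropertyNamesContractResolver:
// the CamelCase naming policy turns the C# property "Name" into the JSON key "name".
var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };

// An anonymous object shaped like the post's Agency class (only a Name property is assumed).
var json = JsonSerializer.Serialize(new { Name = "Agency-A" }, options);

Console.WriteLine(json); // {"name":"Agency-A"}
```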

I started by creating the agency:

var agencyA = new Agency { Name = "Agency-A" };
// CREATE a node labeled Agency, taking its properties from the agencyA parameter.
client.Cypher
    .Create("(agency:Agency {agencyA})")
    .WithParam("agencyA", agencyA)
    .ExecuteWithoutResultsAsync()
    .Wait();

Next, I created the actors and the ACQUIRED relationships between the agency and some of them (in the case below, only the odd-numbered actors):

for (int i = 1; i <= 5; i++)
{
    var actor = new Person { Name = $"Actor-{i}" };

    if ((i % 2) == 0)
    {
        // Even-numbered actors: a standalone Person node, not tied to any agency.
        client.Cypher
            .Create("(actor:Person {newActor})")
            .WithParam("newActor", actor)
            .ExecuteWithoutResultsAsync()
            .Wait();
    }
    else
    {
        // Odd-numbered actors: find Agency-A, then create the actor
        // together with the ACQUIRED relationship in a single CREATE.
        client.Cypher
            .Match("(agency:Agency)")
            .Where((Agency agency) => agency.Name == agencyA.Name)
            .Create("(agency)-[:ACQUIRED]->(actor:Person {newActor})")
            .WithParam("newActor", actor)
            .ExecuteWithoutResultsAsync()
            .Wait();
    }
}
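Everything that follows depends on which actors ended up ACQUIRED by Agency-A, so here is the same odd/even branching on its own, as a small C# sketch with no Neo4j calls: odd numbers go through the MATCH + CREATE path, even numbers become standalone Person nodes.

```csharp
using System;
using System.Linq;

// Same numbering as the loop above: Actor-1 .. Actor-5.
var actors = Enumerable.Range(1, 5)
    .Select(i => (Number: i, Name: $"Actor-{i}"))
    .ToList();

// else branch: odd-numbered actors are created through (agency)-[:ACQUIRED]->(actor).
var acquiredByAgencyA = actors.Where(a => a.Number % 2 != 0).Select(a => a.Name).ToList();

// if branch: even-numbered actors are created without any relationship.
var standalone = actors.Where(a => a.Number % 2 == 0).Select(a => a.Name).ToList();

Console.WriteLine($"ACQUIRED: {string.Join(", ", acquiredByAgencyA)}"); // ACQUIRED: Actor-1, Actor-3, Actor-5
Console.WriteLine($"Standalone: {string.Join(", ", standalone)}");      // Standalone: Actor-2, Actor-4
```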

Then, I created the movies:

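using System.Linq; // Enumerable.Range, Select

// Letters a..z; only the first three are used below.
char[] chars = Enumerable.Range('a', 'z' - 'a' + 1).Select(i => (char)i).ToArray();
for (int i = 0; i < 3; i++)
{
    // Movie-a, Movie-b and Movie-c, created as standalone Movie nodes.
    var movie = new Movie { Name = $"Movie-{chars[i]}" };

    client.Cypher
        .Create("(movie:Movie {newMovie})")
        .WithParam("newMovie", movie)
        .ExecuteWithoutResultsAsync()
        .Wait();
}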
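As a side note, the 26-letter array is only used for its first three letters. A small plain C# sketch, nothing Neo4j-specific, that checks the generated names against a simpler alternative: offsetting from 'a' produces the same Movie-a, Movie-b and Movie-c without the lookup table.

```csharp
using System;
using System.Linq;

// The original approach: build the whole alphabet, then index into it.
char[] chars = Enumerable.Range('a', 'z' - 'a' + 1).Select(i => (char)i).ToArray();
var fromArray = Enumerable.Range(0, 3).Select(i => $"Movie-{chars[i]}").ToList();

// Alternative: offset from 'a' directly; no intermediate array needed.
var fromOffset = Enumerable.Range(0, 3).Select(i => $"Movie-{(char)('a' + i)}").ToList();

Console.WriteLine(string.Join(", ", fromOffset)); // Movie-a, Movie-b, Movie-c
```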

Lastly, I related the existing movies and actors through the EMPLOYED relationship:

// Movie-a employed Actor-1 and Actor-5 (both acquired by Agency-A).
client.Cypher
    .Match("(movie:Movie)", "(actor1:Person)", "(actor5:Person)")
    .Where((Movie movie) => movie.Name == "Movie-a")
    .AndWhere((Person actor1) => actor1.Name == "Actor-1")
    .AndWhere((Person actor5) => actor5.Name == "Actor-5")
    .Create("(movie)-[:EMPLOYED]->(actor1), (movie)-[:EMPLOYED]->(actor5)")
    .ExecuteWithoutResultsAsync()
    .Wait();

// Movie-b employed Actor-1, Actor-3 and Actor-5 (all acquired by Agency-A).
client.Cypher
    .Match("(movie:Movie)", "(actor1:Person)", "(actor3:Person)", "(actor5:Person)")
    .Where((Movie movie) => movie.Name == "Movie-b")
    .AndWhere((Person actor1) => actor1.Name == "Actor-1")
    .AndWhere((Person actor3) => actor3.Name == "Actor-3")
    .AndWhere((Person actor5) => actor5.Name == "Actor-5")
    .Create("(movie)-[:EMPLOYED]->(actor1), (movie)-[:EMPLOYED]->(actor3), (movie)-[:EMPLOYED]->(actor5)")
    .ExecuteWithoutResultsAsync()
    .Wait();

// Movie-c employed Actor-2 (not with Agency-A) and Actor-5.
client.Cypher
    .Match("(movie:Movie)", "(actor2:Person)", "(actor5:Person)")
    .Where((Movie movie) => movie.Name == "Movie-c")
    .AndWhere((Person actor2) => actor2.Name == "Actor-2")
    .AndWhere((Person actor5) => actor5.Name == "Actor-5")
    .Create("(movie)-[:EMPLOYED]->(actor2), (movie)-[:EMPLOYED]->(actor5)")
    .ExecuteWithoutResultsAsync()
    .Wait();
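The three statements differ only in which actors they connect to which movie. As a sketch of how that repetition could be trimmed, a hypothetical helper (BuildEmployedPatterns is my own name, not part of Neo4jClient) can derive the Match(...) arguments and the Create(...) text from a list of actor numbers. It only builds strings; the typed Where/AndWhere lambdas are left out, because in the code above each lambda’s parameter name matches the identifier it filters (actor1, actor3 and so on), so those would still have to be written per actor.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Neo4jClient: builds the Match(...) arguments and the
// Create(...) text used above for one movie and the numbers of the actors it employed.
(string[] MatchPatterns, string CreatePattern) BuildEmployedPatterns(params int[] actorNumbers)
{
    var matchPatterns = new[] { "(movie:Movie)" }
        .Concat(actorNumbers.Select(n => $"(actor{n}:Person)"))
        .ToArray();

    var createPattern = string.Join(", ",
        actorNumbers.Select(n => $"(movie)-[:EMPLOYED]->(actor{n})"));

    return (matchPatterns, createPattern);
}

// Movie-b employed Actor-1, Actor-3 and Actor-5.
var movieB = BuildEmployedPatterns(1, 3, 5);

Console.WriteLine(string.Join(", ", movieB.MatchPatterns));
Console.WriteLine(movieB.CreatePattern);
```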

After running this, I had a data set to play with. I jumped back to the web client and ran the query below to retrieve the relationships:

MATCH (agency:Agency)-[:ACQUIRED]->(actor:Person)<-[:EMPLOYED]-(movie:Movie)
RETURN agency, actor, movie

One of the greatest features of the web client is that it can show your query result as a graph. How cool is that? You can see exactly how similar the result below is to the graph I put together above:

[Figure: the query result rendered as a graph in the Neo4j web client]

Of course, we can run the same query through the .NET client and grab the results:

// Match the whole agency -> actor <- movie path; .Results runs the query and maps each row.
var results = client.Cypher
    .Match("(agency:Agency)-[:ACQUIRED]->(actor:Person)<-[:EMPLOYED]-(movie:Movie)")
    .Return((agency, actor, movie) => new
    {
        Agency = agency.As<Agency>(),
        Actor = actor.As<Person>(),
        Movie = movie.As<Movie>()
    }).Results;
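It is worth keeping in mind what this pattern returns: one row per matched path, not one row per movie. A small in-memory C# sketch of the data set created above (relationship lists standing in for the graph, no Neo4j involved) makes that visible. Movie-c still shows up through Actor-5, and Actor-2 simply never appears, which is why this query alone cannot answer the question from the beginning of the post. Row order here follows the lists; Neo4j makes no ordering promise without ORDER BY.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The data set created above, as plain relationship lists.
var acquired = new List<(string Agency, string Actor)>
{
    ("Agency-A", "Actor-1"), ("Agency-A", "Actor-3"), ("Agency-A", "Actor-5"),
};
var employed = new List<(string Movie, string Actor)>
{
    ("Movie-a", "Actor-1"), ("Movie-a", "Actor-5"),
    ("Movie-b", "Actor-1"), ("Movie-b", "Actor-3"), ("Movie-b", "Actor-5"),
    ("Movie-c", "Actor-2"), ("Movie-c", "Actor-5"),
};

// (agency)-[:ACQUIRED]->(actor)<-[:EMPLOYED]-(movie): join both lists on the shared actor.
// Every match becomes one row, like each anonymous object in the Results above.
var rows = (from acq in acquired
            join emp in employed on acq.Actor equals emp.Actor
            select (acq.Agency, acq.Actor, emp.Movie))
           .ToList();

foreach (var row in rows)
{
    Console.WriteLine($"{row.Agency} -> {row.Actor} <- {row.Movie}");
}
```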

Going Beyond

However, how can we answer my "What are the movies which employed all of their actors from Agency-A?" question? As I am very new to Neo4j, I struggled a lot with this. In fact, I was not even sure whether it was possible in Neo4j. I asked it as a question on Stack Overflow (as every stuck developer does), and Christophe Willemsen gave an amazing answer which literally blew my mind. I warn you now: the query below is a bit complex, and I am still going through it piece by piece to understand it, but it does the trick:

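// Agency-A actors per movie (movies without any Agency-A actor produce no rows)
MATCH (agency:Agency { name:"Agency-A" })-[:ACQUIRED]->(actor:Person)<-[:EMPLOYED]-(movie:Movie)
WITH DISTINCT movie, collect(actor) AS actors
// Count everyone each of those movies employed
MATCH (movie)-[:EMPLOYED]->(allemployees:Person)
WITH movie, actors, count(allemployees) AS c
// Keep the movie only if every employee is one of the collected Agency-A actors
WHERE c = size(actors)
RETURN movie.name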
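Walking through the query piece by piece helped me, so here is a sketch of the same three steps over in-memory copies of the data set (plain C# with LINQ, mirroring the logic rather than how Neo4j executes it): collect the Agency-A actors per movie, count all employees of each of those movies, and keep the movies where both numbers match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The data set created earlier in the post: Agency-A acquired the odd-numbered actors.
var acquiredByAgencyA = new HashSet<string> { "Actor-1", "Actor-3", "Actor-5" };
var employed = new List<(string Movie, string Actor)>
{
    ("Movie-a", "Actor-1"), ("Movie-a", "Actor-5"),
    ("Movie-b", "Actor-1"), ("Movie-b", "Actor-3"), ("Movie-b", "Actor-5"),
    ("Movie-c", "Actor-2"), ("Movie-c", "Actor-5"),
};

// First MATCH + WITH ... collect(actor) AS actors: Agency-A employees grouped per movie.
// Movies without a single Agency-A employee produce no group, just as they produce no rows.
var agencyActorsPerMovie = employed
    .Where(rel => acquiredByAgencyA.Contains(rel.Actor))
    .GroupBy(rel => rel.Movie)
    .ToDictionary(g => g.Key, g => g.Select(rel => rel.Actor).ToList());

// Second MATCH + count(allemployees) AS c, then WHERE c = size(actors):
// keep a movie only when all of its employees are among the collected Agency-A actors.
var moviesWithOnlyAgencyActors = agencyActorsPerMovie
    .Where(entry => employed.Count(rel => rel.Movie == entry.Key) == entry.Value.Count)
    .Select(entry => entry.Key)
    .OrderBy(name => name, StringComparer.Ordinal)
    .ToList();

Console.WriteLine(string.Join(", ", moviesWithOnlyAgencyActors)); // Movie-a, Movie-b
```

Comparing counts is enough because the collected actors are always a subset of a movie’s employees, so, assuming no duplicate relationships, equal numbers mean nobody outside Agency-A was employed.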

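The result is as you would expect: Movie-a and Movie-b. Movie-c drops out because it also employed Actor-2, who was not acquired by Agency-A.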

[Figure: query result]

Still Dipping My Toes

I am hooked, but this doesn’t mean that Neo4j is the solution to my problems. I am still evaluating it by implementing a few features on top of it. There are a few questions I haven’t been able to answer yet:

  • How does this scale with large data sets?
  • Can I shard the data across servers?
  • What are the hosted options?
  • What is the story on geo location queries?

However, the architecture of my solution allows me to evaluate this type of technology. In the worst-case scenario, Neo4j will not work for me, but I will be able to replace it with something else (though I doubt that will be the case).

Resources