Getting Started with Neo4j in .NET with Neo4jClient Library

I have been looking into Neo4j, a graph database, for a while and here is what impressed me the most while trying to work with it through the Neo4jClient .NET library.
2015-12-13 19:07
Tugberk Ugurlu


I am really in love with the side project I am working on now. It is broken down into little "micro" applications (a.k.a. microservices), uses multiple data storage technologies and is brought together through Docker. As a result, the entire solution feels very natural, unrestricted and manageable.

One part of this solution requires answering a question which involves going very deep into the data hierarchy. To illustrate what I mean, have a look at the graph below:

movies-with-only-agency-employees-2

Here, we have an agency which has acquired some actors. Also, we have some movies which employed some actors. You can model this in various data storage systems in various ways but the question I want to answer is the following: "What are the movies which employed all of its actors from Agency-A?". Even thinking about the query you would write in T-SQL for this one is enough to melt your brain. It doesn’t mean that SQL Server, MySQL, etc. are bad data storage systems. It’s just that this type of question is not among those data storage systems' strengths.

Enter: Neo4j

Neo4j is an open-source graph database implemented in Java and accessible from software written in other languages using the Cypher query language through a transactional HTTP endpoint (according to Wikipedia). In Neo4j, your data set consists of nodes and relationships between these nodes, which you can interact with through the Cypher query language. Cypher is a very powerful, declarative, SQL-inspired language for describing patterns in graphs. The biggest thing that stands out when working with Cypher is the relationships: they are first-class citizens in the language. Consider the following Cypher query, which is taken from the movie sample in the Neo4j web client:

You can bring up this movie sample by just running ":play movie graph" from the Neo4j web client and walking through it.

MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(tomHanksMovies) RETURN tom,tomHanksMovies

This will list all Tom Hanks movies. Even if you just read it from left to right, you will pretty much understand what it does. The interesting part here is the ACTED_IN relationship inside the query. You may think at this point that this is not a big deal as it probably translates to the T-SQL query below:

SELECT * FROM Movies m
INNER JOIN MovieActors ma ON ma.MovieId = m.Id
WHERE ma.ActorId = 1;

However, you will start seeing the power as the questions get interesting. For example, let’s find Tom Hanks’ co-actors from every movie he acted in (again, from the same sample):

MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors) RETURN coActors.name

It’s just mind-blowingly complicated to retrieve this from a relational database but with Cypher, it is dead easy. You can start to see that it’s all about building up nodes and declaring the relationships to get the answer to your question in Neo4j.

Neo4j in .NET

As Neo4j communicates through HTTP, you can pretty much find a client implementation in every ecosystem and .NET is not an exception. The amazing people at Readify maintain the Neo4jClient OSS project. It’s extremely easy to use and the library has very good documentation. I especially liked the part where they have documented the thread safety concerns of GraphClient. It is the first thing I wanted to find out and there it was.

Going back to my example which I mentioned at the beginning of this post, I tried to handle this through the .NET Client. Let’s walk through what I did.

You can find the below sample under my DotNetSamples GitHub repository.

First, I initialized the GraphClient and made some adjustments:

var client = new GraphClient(new Uri("http://localhost:7474/db/data"), "neo4j", "1234567890")
{
    JsonContractResolver = new CamelCasePropertyNamesContractResolver()
};

client.Connect();
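The snippets that follow rely on Agency, Person and Movie types which are not shown in this post. A minimal sketch, assuming they are plain classes with a single Name property (the only property the code here touches):

```csharp
// Hypothetical model classes: the post only ever reads and writes Name,
// so that is the only property assumed here.
public class Agency
{
    public string Name { get; set; }
}

public class Person
{
    public string Name { get; set; }
}

public class Movie
{
    public string Name { get; set; }
}
```

Because the client is configured with CamelCasePropertyNamesContractResolver, a Name property is written to the graph as name, which is why the raw Cypher queries later in this post filter on name rather than Name.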

I started with creating the agency.

var agencyA = new Agency { Name = "Agency-A" };
client.Cypher
    .Create("(agency:Agency {agencyA})")
    .WithParam("agencyA", agencyA)
    .ExecuteWithoutResultsAsync()
    .Wait();

Next, I created the actors and the ACQUIRED relationship between the agency and some of the actors (in the case below, only the odd-numbered actors):

for (int i = 1; i <= 5; i++)
{
    var actor = new Person { Name = $"Actor-{i}" };

    if ((i % 2) == 0)
    {
        client.Cypher
            .Create("(actor:Person {newActor})")
            .WithParam("newActor", actor)
            .ExecuteWithoutResultsAsync()
            .Wait();
    }
    else
    {
        client.Cypher
            .Match("(agency:Agency)")
            .Where((Agency agency) => agency.Name == agencyA.Name)
            .Create("(agency)-[:ACQUIRED]->(actor:Person {newActor})")
            .WithParam("newActor", actor)
            .ExecuteWithoutResultsAsync()
            .Wait();
    }
}

Then, I created the movies:

char[] chars = Enumerable.Range('a', 'z' - 'a' + 1).Select(i => (Char)i).ToArray();
for (int i = 0; i < 3; i++)
{
    var movie = new Movie { Name = $"Movie-{chars[i]}" };

    client.Cypher
        .Create("(movie:Movie {newMovie})")
        .WithParam("newMovie", movie)
        .ExecuteWithoutResultsAsync()
        .Wait();
}

Lastly, I related the existing movies and actors through the EMPLOYED relationship.

client.Cypher
    .Match("(movie:Movie)", "(actor1:Person)", "(actor5:Person)")
    .Where((Movie movie) => movie.Name == "Movie-a")
    .AndWhere((Person actor1) => actor1.Name == "Actor-1")
    .AndWhere((Person actor5) => actor5.Name == "Actor-5")
    .Create("(movie)-[:EMPLOYED]->(actor1), (movie)-[:EMPLOYED]->(actor5)")
    .ExecuteWithoutResultsAsync()
    .Wait();

client.Cypher
    .Match("(movie:Movie)", "(actor1:Person)", "(actor3:Person)", "(actor5:Person)")
    .Where((Movie movie) => movie.Name == "Movie-b")
    .AndWhere((Person actor1) => actor1.Name == "Actor-1")
    .AndWhere((Person actor3) => actor3.Name == "Actor-3")
    .AndWhere((Person actor5) => actor5.Name == "Actor-5")
    .Create("(movie)-[:EMPLOYED]->(actor1), (movie)-[:EMPLOYED]->(actor3), (movie)-[:EMPLOYED]->(actor5)")
    .ExecuteWithoutResultsAsync()
    .Wait();

client.Cypher
    .Match("(movie:Movie)", "(actor2:Person)", "(actor5:Person)")
    .Where((Movie movie) => movie.Name == "Movie-c")
    .AndWhere((Person actor2) => actor2.Name == "Actor-2")
    .AndWhere((Person actor5) => actor5.Name == "Actor-5")
    .Create("(movie)-[:EMPLOYED]->(actor2), (movie)-[:EMPLOYED]->(actor5)")
    .ExecuteWithoutResultsAsync()
    .Wait();

After running this, I have a data set that I can play with. I jumped back to the web client and ran the query below to retrieve the relations:

MATCH (agency:Agency)-[:ACQUIRED]->(actor:Person)<-[:EMPLOYED]-(movie:Movie)
RETURN agency, actor, movie

One of the greatest features of the web client is that you can view your query result in a graph representation. How cool is that? You can clearly see the similarity between the result below and the graph I put together above:

image

Of course, we can run the same query through the .NET client and grab the results:

var results = client.Cypher
    .Match("(agency:Agency)-[:ACQUIRED]->(actor:Person)<-[:EMPLOYED]-(movie:Movie)")
    .Return((agency, actor, movie) => new
    {
        Agency = agency.As<Agency>(),
        Actor = actor.As<Person>(),
        Movie = movie.As<Movie>()
    }).Results;

Going Beyond

However, how can we answer my "What are the movies which employed all of its actors from Agency-A?" question? As I am very new to Neo4j, I struggled a lot with this. In fact, I was not even sure whether this was possible in Neo4j. I asked this as a question on Stack Overflow (as every stuck developer does) and Christophe Willemsen gave an amazing answer which literally blew my mind. I warn you now: the query below is a bit complex and I am still going through it piece by piece to understand it, but it does the trick:

MATCH (agency:Agency { name:"Agency-A" })-[:ACQUIRED]->(actor:Person)<-[:EMPLOYED]-(movie:Movie)
WITH DISTINCT movie, collect(actor) AS actors
MATCH (movie)-[:EMPLOYED]->(allemployees:Person)
WITH movie, actors, count(allemployees) AS c
WHERE c = size(actors)
RETURN movie.name

The result is as you would expect:

image
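To unpack what the query above is doing, here is the same idea as a small in-memory sketch in C#. It is not how Neo4j evaluates the query; it only mirrors the logic: for each movie, count the employed actors that were acquired by the agency and keep the movie when that count equals the total number of actors it employed. The AgencyOnlyMovies class and its parameter names are made up for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AgencyOnlyMovies
{
    // acquiredActors:  names of the actors the agency has acquired.
    // employedByMovie: movie name -> names of all actors that movie employed.
    public static List<string> Find(
        ISet<string> acquiredActors,
        IDictionary<string, string[]> employedByMovie)
    {
        return employedByMovie
            // A movie with no actors never matches the first MATCH clause.
            .Where(movie => movie.Value.Length > 0)
            // size(actors) = c: agency actors in the movie == all actors in the movie.
            .Where(movie =>
                movie.Value.Count(actor => acquiredActors.Contains(actor)) == movie.Value.Length)
            .Select(movie => movie.Key)
            .OrderBy(name => name)
            .ToList();
    }
}
```

Fed with the data set created earlier (Actor-1, Actor-3 and Actor-5 acquired by Agency-A; Movie-a employing Actor-1 and Actor-5, Movie-b employing Actor-1, Actor-3 and Actor-5, Movie-c employing Actor-2 and Actor-5), Find returns Movie-a and Movie-b. Movie-c drops out because Actor-2 was never acquired by the agency.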

Still Dipping My Toes

I am hooked but this doesn’t mean that Neo4j is the solution to my problems. I am still evaluating it by implementing a few features on top of it. There are a few questions which I haven’t been able to answer exactly yet:

  • How does this scale with large data sets?
  • Can I shard the data across servers?
  • What are the hosted options?
  • What is the story on geo location queries?

However, the architecture I have in my solution allows me to evaluate these types of technologies. In the worst case scenario, Neo4j will not work for me and I will be able to replace it with something else (although I doubt that will be the case).

Resources

Integration Testing with MongoDB with MongoDB.Testing Library

I have put together a library, MongoDB.Testing, which makes it easy to stand up a MongoDB server, create a random database and clean up the resources afterwards. Here is how you can start using it.
2015-12-05 21:06
Tugberk Ugurlu


Considering the applications we produce today (small, targeted, "micro" applications), I value integration tests way more than unit tests (along with acceptance tests). They provide much more realistic testing of your application, with the only downside being that it is hard to pinpoint which part of your code is the problem when you have failures. I have been writing integration tests for .NET based HTTP applications which use MongoDB as the data storage system in some parts, and I pulled a helper out into a library which makes it easy to stand up a MongoDB server, create a random database and clean up the resources afterwards. The library is called MongoDB.Testing and it’s on NuGet and GitHub. Usage is pretty simple and there are also a few samples I have put together.

Install the library into your testing project through NuGet:

Install-Package MongoDB.Testing -pre

Write a mongod.exe locator:

public class MongodExeLocator : IMongoExeLocator
{
    public string Locate()
    {
        return @"C:\Program Files\MongoDB\Server\3.0\bin\mongod.exe";
    }
}

Finally, integrate this into your tests:

[Test]
public async Task HasEnoughRating_Should_Throw_When_The_User_Is_Not_Found()
{
    using (MongoTestServer server = MongoTestServer.Start(27017, new MongodExeLocator()))
    {
        // ARRANGE
        var collection = server.Database.GetCollection<UserEntity>("users");
        var service = new MyCounterService(collection);
        await collection.InsertOneAsync(new UserEntity
        {
            Id = ObjectId.GenerateNewId().ToString(),
            Name = "foo",
            Rating = 23
        });

        // ACT, ASSERT
        Assert.Throws<InvalidOperationException>(
            () => service.HasEnoughRating(ObjectId.GenerateNewId().ToString()));
    }
}

That’s basically all. MongoTestServer.Start will do the following for you:

  • Starts a mongod instance and exposes it through the specified port.
  • Creates a randomly named MongoDB database on the started instance and exposes it through the MongoTestServer instance returned from MongoTestServer.Start method.
  • Cleans up the resources, kills the mongod.exe instance when the MongoTestServer instance is disposed.

If you are doing a similar sort of testing with MongoDB, give this a shot. I want to improve this based on the needs. So, make sure to file issues and send some lovely pull requests.

My Talk on Profiling .NET Server Applications from Umbraco UK Festival 2015

I was at Umbraco UK Festival 2015 in London a few weeks ago to give a talk on Profiling .NET Server Applications and the session is now available to watch.
2015-11-11 13:13
Tugberk Ugurlu


I was at Umbraco UK Festival 2015 in London a few weeks ago to give a talk on Profiling .NET Server Applications. It was a really great experience for me as this was my first time presenting on this topic, which I love and enjoy very much. Also, the conference venue was a church, which made it really interesting for a presentation, and I would be lying if I told you that I didn’t feel like a deacon up there on the stage :) The fantastic news is that all sessions were recorded and all of them are available to watch now, including my Profiling .NET Server Applications talk:

You can find the slides under my Speaker Deck account and I also encourage you to download the free e-book which gives you 52 quick tips and tricks for .NET application performance:

image

Finally, here are some further resources to look at if the talk was interesting for you:

ASP.NET 5 Identity MongoDB Implementation

ASP.NET Identity will have a new version with ASP.NET 5, which is going to be version 3.0.0, and I gave it a shot to implement an ASP.NET Identity MongoDB data store.
2015-11-05 20:04
Tugberk Ugurlu


As with everything, ASP.NET Identity will have a new version with ASP.NET 5, which is going to be version 3.0.0. There are some changes to the interfaces but they are not as drastic as others. By default, it provides an Entity Framework implementation which I assume is going to be compatible with any data storage system that can plug into Entity Framework (which is good). However, you need to provide a custom implementation if you want to support a data storage system which doesn’t support Entity Framework. MongoDB is one of them and I gave it a shot to implement an ASP.NET Identity MongoDB store. The result was really good:

mongodb-aspnet-identity

The library is available on NuGet as the Dnx.Identity.MongoDB package and it supports the beta8 runtime release. For now, it’s part of a sample project I am working on but it will probably make it into its own repository soon. I also have a sample here which is a fork of the original Identity sample. You can look at this commit to see what I had to do to make it support my custom provider, which is not that bad if you ignore the dependency injection dance I had to do. That’s because my implementation of the ASP.NET Identity library doesn’t support IRoleStore. Don’t worry, you will not need this as you already have IUserClaimStore. Also, there is an open issue to change the dependency injection hook a bit so that IRoleStore would be optional.

Here are a few more details about this ASP.NET 5 Identity MongoDB implementation:

  • Currently, there is no documentation.
  • MongoUserStore is thread safe. You can register this as a singleton.
  • At the moment, there is no logging and nice exception handling for the implementation.
  • The implementation only supports dnx452 or above and it doesn’t support corefx as MongoDB .NET Client has no support for that.
  • The library doesn’t support SetUserNameAsync as the user name is also the Id of the user. So, you cannot change the Id.
  • The implementation requires you to pass an implementation of IMongoDatabase through the constructor and it persists data into a collection which is named "users". There is going to be a way to change the name of the collection soon in upcoming releases.
  • E-mail uniqueness is ensured through a unique MongoDB index. However, this will not function properly if you shard the users collection.
  • The implementation doesn’t persist changes unless you call one of the UpdateAsync, CreateAsync or DeleteAsync methods (which is what UserManager does).

Give it a try and you can file issues here for now and send pull requests. Enjoy :)

ASP.NET 5 and Log Correlation by Request Id

ASP.NET 5 is full of big new features and enhancements but besides these, I am mostly impressed by its little, tiny features such as log correlation, which is provided out of the box. Let me quickly show you what it is in this post.
2015-10-28 00:44
Tugberk Ugurlu


ASP.NET 5 is full of big new features and enhancements like being able to run on multiple operating systems, incredible CLI tools, hassle-free building for multiple framework targets, build only dependencies and many more. Besides these, I am mostly impressed by the little, tiny features of ASP.NET 5 because these generally tend to be ignored in this type of re-architecture work. One of these little features is log correlation. Let me quickly show you what it is and why it made me smile.

BIG ASS CAUTION! At the time of this writing, I am using the DNX 1.0.0-beta8 version. As things are moving really fast in this new world, it’s very likely that the things explained here will have changed by the time you read this post. So, be aware of this and try to explore what has changed to figure out what the corresponding new things are.

Also, inside this post I am referencing a lot of things from the ASP.NET GitHub repositories. In order to be sure that the links won’t break in the future, I’m referring to them through permanent links to the files on GitHub. So, these links point to the files as of the latest commit at the time of this writing, and the files themselves may well have changed since then. Read the "Getting permanent links to files" post to figure out what this actually is.

Brief Introduction to Logging in ASP.NET 5 World

If you want to skip this part, you can directly go to "Log Correlation" section below.

As you probably know, ASP.NET 5 also has great support for logging. The nicest thing about this new logging abstraction is that it’s the only logging abstraction which every provided library and framework relies on. So, when you enable logging in your application, you will enable it in all components (which is perfect)! Here is a sample from my MVC 6 application. I am just adding MVC to the pipeline here, enabling logging by hooking in Serilog and configuring it to write the logs to the console:

using System;
using Microsoft.AspNet.Builder;
using Microsoft.Framework.DependencyInjection;
using Microsoft.Framework.Logging;
using Serilog;

namespace LoggingCorrelationSample
{
    public class Startup
    {
        public Startup(ILoggerFactory loggerFactory)
        {
            var serilogLogger = new LoggerConfiguration()
                .WriteTo
                .TextWriter(Console.Out)
                .MinimumLevel.Verbose()
                .CreateLogger();

            loggerFactory.MinimumLevel = LogLevel.Debug;
            loggerFactory.AddSerilog(serilogLogger);
        }

        public void ConfigureServices(IServiceCollection services)
        {
            services.AddMvc();
        }

        public void Configure(IApplicationBuilder app)
        {
            app.UseMvc();
        }
    }
}

When I run the application and hit a valid endpoint, I will see a bunch of things being logged to the console:

image

Remember, I haven’t logged anything myself yet. It’s just the stuff I hooked in, which was already relying on the ASP.NET 5 logging infrastructure. This doesn’t mean I can’t though. Hooking into logging is super easy since an instance of ILoggerFactory is already inside the DI system. Here is an example class which I have for my application and it is responsible for getting the cars (forgive the stupid example here but I am sure you will get the idea):

public class CarsContext : IDisposable
{
    private readonly ILogger _logger;

    public CarsContext(ILoggerFactory loggerFactory)
    {
        _logger = loggerFactory.CreateLogger<CarsContext>();
        _logger.LogDebug("Constructing CarsContext");
    }

    public IEnumerable<string> GetCars()
    {
        _logger.LogInformation("Found 3 cars.");
        
        return new[]
        {
            "Car 1",
            "Car 2",
            "Car 3"
        };
    }
    
    public void Dispose()
    {
        _logger.LogDebug("Disposing CarsContext");
    }
}

I will register this class so that it can get the dependencies it needs and also, it can be injected into other places:

public void ConfigureServices(IServiceCollection services)
{
    services.AddMvc();
    services.AddScoped<CarsContext, CarsContext>();
}

Finally, I will use it inside my controller:

public class CarsController : Controller
{
    private readonly CarsContext _carsContext;
    
    public CarsController(CarsContext carsContext)
    {
        _carsContext = carsContext;
    }
    
    [Route("cars")]
    public IActionResult Get()
    {
        var cars = _carsContext.GetCars();
        return Ok(cars);
    }
}

Just seeing how beautifully things are coming together is really great! When I run the application and hit the /cars endpoint now, I will see my logs appearing alongside the framework and library logs:

image

The same goes for your middleware. You can naturally hook into the logging system from your middleware thanks to first-class middleware DI support.

public class RequestUrlLoggerMiddleware 
{
    private readonly RequestDelegate _next;
    private readonly Microsoft.Framework.Logging.ILogger _logger;
    
    public RequestUrlLoggerMiddleware(RequestDelegate next, ILoggerFactory loggerFactory) 
    {
        _next = next;
        _logger = loggerFactory.CreateLogger<RequestUrlLoggerMiddleware>();
    }
    
    public Task Invoke(HttpContext context)
    {
        _logger.LogInformation("{Method}: {Url}", context.Request.Method, context.Request.Path);
        return _next(context);
    }
}

Notice that we have a log message template rather than the actual log message here. This is another great feature of the new logging system, which is pretty much the same as what Serilog has had for a long time.

When we run this, we should see the middleware log appear, too:

image
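To make the message template idea from the middleware above a bit more concrete, here is a small self-contained sketch. It is not how ASP.NET 5 or Serilog implement templates; MessageTemplateSketch and its two methods are hypothetical names used only to illustrate the principle: named placeholders are paired with the arguments by position, so a sink can store them as separate fields, and the readable text is rendered from the same values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MessageTemplateSketch
{
    // Matches named placeholders such as {Method} or {Url}.
    private static readonly Regex Placeholder = new Regex(@"\{(\w+)\}");

    // Pairs each placeholder name with the argument at the same position.
    public static Dictionary<string, object> Capture(string template, params object[] args)
    {
        var properties = new Dictionary<string, object>();
        var index = 0;
        foreach (Match match in Placeholder.Matches(template))
        {
            if (index >= args.Length)
            {
                break;
            }

            properties[match.Groups[1].Value] = args[index++];
        }

        return properties;
    }

    // Produces the human-readable text by substituting the captured values.
    // Placeholders without a captured value are left untouched.
    public static string Render(string template, IDictionary<string, object> properties)
    {
        return Placeholder.Replace(template, match =>
        {
            object value;
            return properties.TryGetValue(match.Groups[1].Value, out value)
                ? Convert.ToString(value)
                : match.Value;
        });
    }
}
```

With the template "{Method}: {Url}" and the arguments "GET" and "/cars", Capture yields the properties Method and Url, and Render produces "GET: /cars". Keeping the properties separate from the rendered text is what lets a structured sink index and query them individually.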

Log Correlation

Without doing anything else first, let me also write logs to Elasticsearch by pulling in the Serilog Elasticsearch sink and hooking it in. After hitting the same endpoint, I have the result below inside my Elasticsearch index:

image

You can see that each log message has got richer and we can see new things like RequestId, which allows you to correlate your logs per request. This information is being logged because the hosting layer starts a new log scope for each request. RequestId is particularly useful when you see unexpected behavior with an HTTP request and you want to find out what was happening during that request. In order to take advantage of this, you should send the RequestId alongside your response (ideally among the response headers). Below is a sample middleware which you can hook into your pipeline in order to add RequestId to your response:

public class RequestIdMiddleware
{
    private readonly RequestDelegate _next;

    public RequestIdMiddleware(RequestDelegate next)
    {
        _next = next;
    }

    public async Task Invoke(HttpContext context)
    {
        var requestIdFeature = context.Features.Get<IHttpRequestIdentifierFeature>();
        if (requestIdFeature?.TraceIdentifier != null)
        {
            context.Response.Headers["RequestId"] = requestIdFeature.TraceIdentifier;
        }

        await _next(context);
    }
}

Note that IHttpRequestIdentifierFeature is the way to get hold of the RequestId in beta8 but in upcoming versions, it’s likely to change to HttpContext.TraceIdentifier.
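Hooking the middleware into the pipeline could look like the sketch below. It assumes the Startup class and the RequestIdMiddleware from earlier in this post, and it needs an ASP.NET 5 beta8 project to compile. The ordering is the part worth noticing: the middleware writes the header before calling the next delegate, so it should be registered ahead of MVC.

```csharp
using Microsoft.AspNet.Builder;

public class Startup
{
    public void Configure(IApplicationBuilder app)
    {
        // Registered before MVC so the RequestId header is set
        // before MVC starts writing the response.
        app.UseMiddleware<RequestIdMiddleware>();
        app.UseMvc();
    }
}
```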

If you look at the response headers now, you should see the RequestId header there:

HTTP/1.1 200 OK
Date: Wed, 28 Oct 2015 00:32:22 GMT
Content-Type: application/json; charset=utf-8
Server: Kestrel
RequestId: 0b66784c-eb98-4a53-9247-8563fad85857
Transfer-Encoding: chunked

Assuming that I have problems with this request and I have been handed the RequestId, I should be able to see what happened in that request by running a simple query on my Elasticsearch index:

image

That’s pretty much it and as mentioned, this is one of those tiny features which was always possible but painful to get right. If you are also interested, you can find the full source code of the sample under my GitHub repository.
