A while ago, I have written up on Graphs and gave a few examples about their application for real world problems. I absolutely love graphs as they are so powerful to model the data for several key computer science problems. In this post, I want to talk about one of the most common graph algorithms, Depth-first search (DFS) and how and where it could be useful.
What is Depth-First Search (DFS)?
DFS is a specific algorithm for traversing and searching a graph data structure. Depending on the type of graph, the algorithm might differ. However, the idea is actually quite simple for a Directed Acyclic Graph (DAG):
- You start with a source vertex (let's call it "S")
- You visit the first neighbour vertex of that node (let's call this "N")
- You do the same for "N" and you keep going till you end up at a leaf vertex (L) (which is a vertex that has no edges to another vertex)
- Then you visit the second neighbour of L's parent vertex.
- You would be once you exhaust all the vertices.
I must admit that this is a bit simplified version of the algorithm even for a DAG. For instance, we didn't touch on the fact that we might end up actually visiting the same vertex multiple times if we don't take this into account in our algorithm. There is a really good visualization of this algorithm here where you can observe how the algorithm works in a visual way through a logical graph representation.
Application of Depth-First Search
There are various applications of DFS which are used to solve particular problems such as Topological Sorting and detecting cycle in a graph. There are also occasions where DFS is used as part of another known algorithm to solve a real world problem. One example to that is the Tarjan’s Algorithm to find Strongly Connected Components.
This is also a good resource which lists out different real world applications of DFS.
Other Graph Traversal Algorithms
As you might guess, DFS is not the only known algorithm in order to traverse a graph data structure. Breadth-First Search (BFS) is a another most known graph traversal algorithm which has the similar semantics to DFS but instead of going in depth on a vertex, it prefers visit the all the neighbors of the current vertex. Bidirectional search is another one of the traversal algorithms which is mainly used to find a shortest path from an initial vertex to a goal vertex in a directed graph.
Lately, I wanted to spend a little bit time on going back to fundamental computer science concepts. Hopefully, I will be able to write about these while I am looking into them in order to offload the knowledge from my brain to the magic hands of the Web :) I am going to start with Graphs, specifically Depth First Traversal (a.k.a. Depth First Search or DFS) and Breadth First Traversal (a.k.a Breadth First Search or BFS). However, this post is only about the definition of Graph and its application in software systems.
What is a Graph?
I am sure you are capable of Googling what a Graph is and ironically maybe that’s why you are reading this sentence now. However, I am not going to put the fancy explanation of a Graph here. Wikipedia already has a great definition on a Graph which can be useful to start with.
Let’s start with a picture:
This is a graph and there are some unique characteristics of this which makes it a graph.
- Vertices (a.k.a. Nodes): Each circle with a label inside the above picture is called a vertex or node. They are fundamental building blocks of a graph.
- Edges (a.k.a. Arc, Line, Link, Branch): A line that joins two vertices together is called as edge. An edge could be in three forms: undirected and directed. We will get to what these actually mean.
At this point you might be asking what is the difference between a graph and a tree? A tree is actually a graph with some special constraints applied to. A few of these that I know:
- A tree cannot contain a cycle but a graph can (see the A, B and E nodes and their edges inside the above picture).
- A tree always has a specific root node, whereas you don’t have this concept with a graph.
- A tree can only has one edge between its two nodes whereas we can have unidirectional and bidirectional edges between nodes within a graph
I am sure there are more but I believe these are the ones that matter the most.
As we can see with the tree example, graphs comes in many forms. There are many types of graphs and each type has its own unique characteristics and real world use cases. Undirected and directed graphs are two of these types as I briefly mentioned while explaining the edges. I believe the best example to describe the difference between them is to have a look at the fundamental concept of Facebook and Twitter.
Application of Graphs
Graphs are amazing, I absolutely love the concept of a graph! Everyone interacts with a system everyday which somehow makes use of graphs. Facebook, Google Maps, Foursquare, the fraud check system that your bank applies are all making use of a graph and there are many, many more. One application of graph concept which I love is a recommendation engine. There are many forms of this but a very basis one is called Collaborative Filtering. At its basis, it works under a notion of “Steve and Mark liked BMW, Mercedes and Toyota, you like BMW and Toyota, and you may like Mercedes, too?”.
There are some really good graph databases with their own query languages as well. One that I love about is Neo4j which uses Cypher query language to make its data available to be consumed. On their web site, there are a few key applications of Neo4j listed and they are fundamentally real world applications of the graph concept.
You can also come across some interesting problems in the space of mathematics which has solutions based on a type of graph like Seven Bridges of Königsberg problem (and I think this problem is the cornerstone in the history of graph theory).
It's very common that you get asked about your proudest achievement. I wanted to put mine here publicly so that I would have a place to direct people to. So, here it is :)
My proudest achievement to this day dates back to 2010. I was working at a local Travel Agency in Turkey while still studying Travel Management at the university and we had a Web site for our customers to book their airport transfers from/to their hotels by paying online. However, the application didn't allow our customers to book additional services with extra cost such as baby booster seat. In addition to this, we were unable to reflect our pricing accurately for particular conditions due to the limitations on the system. At the time, I was working at the reservations and booking department but I had a huge interest on software development, especially on web applications.
Lots of things happened after this and I achieved so much more such as being part of several successful teams to create valuable software products, being published, having the Microsoft MVP award for 5 years in a row, speaking at lots of international conferences, maintaining a successful blog for 7+ years and many more. However, nothing was able to beat that because it was a unique opportunity to be able to fight for something I truly believed in. Besides that, having a true leader as your manager is a unique opportunity. He trusted me and my skills, and when looking back at this now, it's very clear to see that I would never have become a good software developer without this trust and my confident in myself.
What's Your Proudest Achievement?
Well, it's your turn. Hopefully I encouraged you to share yours publicly as well. Please share yours as a comment here, preferably by linking to your blog post which you are about to write :)
If you are a software engineer, this is a very common question you will get to ask yourself a lot. This is going to be especially very frequent if you are being part of the recruitment process in your company. As you may know, I work at Redgate, and we have a good culture for development teams. Besides that, common characteristics of a good engineer with examples and counter examples for each engineering role are defined, too. This is a really good guidance for the employer to reflect their culture for a particular role. It’s also good for the employees to understand where they are on being an effective employee.
I got inspired by this and I wanted to share the list of principals I value and look for within a software engineer. Obvious disclaimer: this is not the list of principals that my employer values even if the most of them are pretty similar. As we got the disclaimer out of our way, let's see these principals:
- Knows the fundamental concepts, data structures and common algorithms rather than only being too good with a programming language or a specific framework w/o understanding the basics. In other words, know the basics and be polyglot.
- Has good communication skills - both verbal and written. Without this, it's impossible to be a good software engineer.
- Being pragmatic - Works incrementally and balances delivering value frequently with delivering high quality.
- Iterates fast - Values Continuous Integration (CI) and Continuous Delivery (CD), makes their code fail fast, enforce consistency and keep master branch releasable. Your release process should as easy as adding a git tag as a valid semantic version.
- Cares for sustainability - Strives for producing code which will sustain for years, even decades. Not one-off, works-now-who-knows-when-it-will-stop-working ones.
- Knows the business - Cares to understand the business domain and strives for establishing an ubiquitous language between the software product team and stakeholders.
- Strives for THE BEST UX - Makes user experience the part of the product completeness.
- Being a team player - Works with their peers, gets/gives code review from/to them. Develops their skills while they are developing their own. Should strive for being transparent to the team all the time.
- Knows the metrics but also has a vision - Should know the metrics and how to get them to make decisions. However, they should have a vision at the same time, too. They should not have the "Let’s ask the users” mindset as the default approach for product feature decisions. Remember, good artists copy; great artists steal! The problem you are trying to solve has been probably solved within the same or a different context. Find that, bend it and apply differently.
- Disagrees and commits when needed - Should not be shy about getting their opinions out and pursue them. However, they should also know that a decision has to be made, and when that’s the case, they should commit fully and try to get the best out of it even if it’s not the decision they wanted to see.
- Values open source and contributing back to software community - Has a blog, gives talks at conferences or user groups, contributes back to open source projects. Simply shares what they are proud of with others openly.
There are probably more but these are the most important ones that I care about and value at a very high level. However, I wonder what yours are, too. Therefore, please share them with me here by send me a comment.
I went to a restaurant today and one particular thing struck me there. It made me really interested in writing this brain dump blog post. It was about the fact that restaurants and software applications have a lot in common in terms of how they (should) function. One common characteristic I know is coming from very clever and kind Steve Sanderson on his talk on asynchronous web applications and I won’t go into details on that. I would like to focus on the other common functional characteristic I noticed today but before that, here is a brief background of the story :)
After I walked through the door and sit on a table, the next thing I did was order beers and some onion rings. Then, I went to my table with beers and nearly less than 2 minutes later, the onion rings were in front of me. All looks perfect. These were well-cooked, yummy onion rings.
After a few sips and finishing the onion rings, I ordered my main course which was a medium-well cooked, 20oz steak. It took about 10 to 15 minutes for that to arrive. At the end of the day, I was happy. It was all OK and I didn’t mind waiting for the steak more than onion rings because it made sense.
Why am I explaining all these? Because this is how most of the software applications should function, especially the HTTP based applications. The typical software application has to deal with data which is represented through the domain of the application. One way or the other, software application users interact with this data. The important fact here is that not all of this data is the same; just like the onion rings and steak were not the same for the restaurant. So, you shouldn’t make the assumption of serving all of the data in the same manner through your software application.
Onion Rings Case
Let’s go back to my restaurant example. It was obvious to me that the restaurant fried the rings before they are being ordered. They were doing this because they know the general order rate for the onion rings. So, they were making a judgment call and were frying the rings before they are being ordered. This raises the the risk of serving the onion rings to each customer with a different heat and freshness level. However, this doesn’t matter that much because:
- The restaurant has an expected number in mind and they were not going crazy about this (like frying millions of onion rings). So, they were holding the balance.
- The first priority here was to serve good product. Second thing here is to solve the problem of serving the rings fast and spending the least amount of staff resources. All of these two were possible to achieve at the same time.
The same concept they applied there should be applied in your applications and I bet most of us doing this already. In some cases, users just don’t care about the accuracy and the freshness of the application data. Take the Foursquare example. Do I care if I don’t see the newly-created coffee shop inside my search results when I try to find the nearest coffee shops? You mostly don’t as it probably gives you other alternatives which definitely helps you at the time. Would you care if the search result was returned after 30 seconds of waiting? I bet you would be most certainly pissed about it.
If the onion ring case is a very well success, why didn’t they apply the same concept for the steak and serve it as fast as rings? The answer is very simple: serving the steak fast doesn’t make the customer happy if the steak is not cooked in a way the customer wants. The customer wants to tune the 'doneness' level of the steak and as long as they have it, it won’t matter if the serving time takes longer than the onion rings. Similar case applies to software applications, too. For instance, I would like to see my orders on amazon accurately all the time. Making them inaccurate just to serve the data fast would make me unhappy because I don’t care about how fast you are as long as it is a tolerable amount.
Let’s not pretend that every piece of data for our software application is the same. Well-tune the requirements, consider the trade offs and come up with a solid, functional plan to serve your data instead of blindly getting them out through the same door. If you are interested in this, make sure to read on Eventually Consistency. Also, looking into Domain Driven Design is another thing I would recommend.
Finally, pointing to something new or making a point on what we generally do wrong is not the idea of the post here. It was about explaining the concept with a unique example.