Social Network using Titan Db: Part 2

Following the first part of this post here we’re going to give answers to more complex questions. The previously described questions have been a little bit changed:

  1. What are my followers with ages between 20 and 25 years old?
  2. How many people from Argentina is following me? And who are they?
  3. What are my followers since determined date?
  4. What’s the first follower of whom I have started following first?
  5. Can I reach the president of my country by common connections of my country? If so, show his information and the path.

Connections

Changes in the model are more than necessary since we need more information about people in our network and also about the relationships, so this is how the the connections look like now:

social-network-complex-case

Persons (represented as vertices) have now information about name, age, country and profession (also an Id that is not represented in the graphic). Relationships “following” / “followedBy” (represented as edges) have now a timestamp so we can find out since when someone is following you.

So for example, Gabi has started following Damian on July 1st of 2015 and Damian has started following him back on August 14th of 2015.

Answers by example

Based on the information in the diagram shown above we can choose someone and try to answer the questions by just looking at it in detail, so we can say:

  1. What are Gabi’s followers with ages between 20 and 25 years old? Ivan (21) and Guille (24).
  2. How many people from Argentina is following Gabi? And who are they? 3 people: Damian, Ivan and Guille.
  3. What are Gabi’s followers since January 2017? Ivan (March 2017) and Guille (February 2017).
  4. What’s the first follower of whom Mike has started following first? Mike has first started following Guille (July 2010) and Guille’s first follower is Gabi (January 2007).
  5. Can I reach the president of Argentina by Gabi’s common connections of his country? If so , show his information and the path. Yes, Capusotto. Path: Gabi –> Damian –> Gustavo –> Capusotto.

Solution

Now, that was easy because there’s only 9 people in our network, but as soon as it starts growing it’s going to be really difficult to tell but that’s not a problem at all, we can solve it using Titan Db.

This is how we create a person now that we have more information:

def createPerson(person: Person): Option[Person] = findPerson(person.id) match {
  case Some(v)  => None
  case None     =>
    g + (PersonLabel, PersonId          -> person.id,
                      PersonName        -> person.name,
                      PersonAge         -> person.age,
                      PersonCountry     -> person.country,
                      PersonProfession  -> person.profession)
    g.tx().commit()
    Some(person)
}
Filters

In order to filter out our results we will need to make use of the org.apache.tinkerpop.gremlin.process.traversal.P class that represents a Predicate. This is how we can find the followers with ages between 20 and 25 years old using P:

def findFollowersWithAgeRange(personId: PersonIdentifier, from: Int, to: Int): List[Person] = {
  g.V.has(PersonId, personId.id)
    .outE(FollowedBy)
    .inV()
    .has(PersonAge, P.gte(from)).has(PersonAge, P.lte(to))
    .map(_.mapPerson)
    .toList()
}

Basically what we are doing above is applying the filters just where we need them. Firstly we look up on all the vertices for a Person with a particular Id and once we get it, we continue traversing the graph on all the outgoing edges labelled with “FollowedBy” followed by an incoming vertex where the property “PersonAge” has a value between 20 and 25. Finally we map all the fields to our Person case class using a custom implicit conversion, see full source code (link at the end of the post).

Now finding the followers since determined date is quite similar, except this time we apply the predicate on the outgoing edges:

def findFollowersSince(personId: PersonIdentifier, since: Instant): List[Person] = {
 g.V.has(PersonId, personId.id)
   .outE(FollowedBy)
   .has(TimestampKey, P.gte(since.toEpochMilli))
   .inV()
   .map(_.mapPerson)
   .toList()
}

And in order to find the first follower of whom we has first started following we don’t need to use filters but sorted data by Timestamp (using orderBy) and iteration (for comprehension):

def findFirstFollowerOfTopOne(personId: PersonIdentifier): Option[(Person, Person)] = {
 val result = for {
   f <- g.V.has(PersonId, personId.id).outE(Following).orderBy(TimestampKey.value).inV()
   g <- g.V(f).hasLabel(PersonLabel).outE(FollowedBy).orderBy(TimestampKey.value).inV()
 } yield (f.mapPerson, g.mapPerson)
 result.headOption()
}

Note that in the second statement when we get the variable “g” we are not traversing the entire graph again but we are starting from a particular vertex “f”, using the g.V.apply(…) method.

Loops

In the end we have to find a way to our president by following common connections of our country. For this purpose we’ll use loops to traverse the graph in different directions given a predicate (repeat / until):

private def findPresident(personId: Long, country: String) = {
 g.V.has(PersonId, personId)
   .repeat(_.outE(Following).inV.has(PersonCountry, P.eq(country)).simplePath)
   .until(_.has(PersonProfession, "President"))
}

def findPresidentByCommonConnections(personId: PersonIdentifier, country: Country): Option[Person] = {
 findPresident(personId.id, country.value).map(_.mapPerson).headOption()
}

Now to get the path to our president we need to do as shown below:

def findPathToPresident(personId: PersonIdentifier, country: Country): List[Person] = {
 findPresident(personId.id, country.value)
   .path()
   .headOption()
   .toList
   .flatMap(_.persons)
}

implicit class PathOps(path: Path) {
 import scala.collection.JavaConversions._

 def persons: List[Person] = {
   path.objects.toList.collect {
     case v: Vertex => v.mapPerson
   }
 }
}

We get the first possible path to the president (headOption) and then we map every vertex in the path as a Person. This is done by a custom implicit conversion shown above in the implicit class PathOps. Note that we have to use JavaConversions to use the collect method.

This technique is well described on the Michael Pollmeier’s blog post.

Running the examples

There’s no main application in our project, only unit tests. So the only way to verify how the app works is by running the tests. My favorite way is to execute ‘sbt test’ on a terminal.

Alternatively, I created a gist with a main app where our flow chart is translated into code.

Final thoughts

There will be no more code following this post but something that I would like to do in the future is to benchmark the app just to see things like performance, reliability and latency, among others. Since we can integrate Titan Db with Cassandra and Spark for example, it would be really interesting to see how far we can go. But that would probably be another post, not sure yet.

As always, the full source code of the given example can be found on GitHub.

I hope you find this post interesting and feel free to comment about your experience with Titan Db, I’d like to hear about people using it in production!

Greetings from Cambodia! 🙂
Gabriel.

3 comments

  1. Pingback: Social Network using Titan Db: Part 1 | PartialFlow

Leave a comment