Sunday, May 13, 2018

Analysis of Flocking Patterns and Relations

Analysing the movement patterns of humans is an interesting problem. It is similar to the flocking and swarming behaviour seen in species such as birds, and Brownian motion can be used to model such collective movements. Representing the raw data from the movement of humans across locations in tabular form, or in a relational model, is inefficient. Consequently, modelling such data on a cloud platform that lacks a graph-based model is limiting; BigQuery is one example. The advantage of using BigQuery is that Google takes care of the infrastructure and the massive data streams, and it is NoSQL at the storage layer. It is still tabular in nature, however, and a relational model is not the most apt way to model location data and movement patterns, or to make sense of that information for further use in recommendation engines and such.
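To make the Brownian motion idea concrete, here is a minimal sketch in plain Python. It assumes 2-D positions and models the "collective" part as a correlated Brownian motion: at every step all agents share one Gaussian increment (the group moving together) and each agent adds its own smaller Gaussian increment. The function name `simulate_flock` and the parameters `sigma_group` and `sigma_individual` are hypothetical, chosen only for illustration; this is not a model taken from any particular paper.

```python
import math
import random


def simulate_flock(n_agents, n_steps, dt=1.0, sigma_group=1.0,
                   sigma_individual=0.2, seed=None):
    """Simulate 2-D positions of n_agents under a correlated Brownian motion.

    Each step, every agent receives the same shared Gaussian increment
    (the flock drifting together) plus its own independent Gaussian
    increment. Returns n_steps + 1 snapshots; each snapshot is a list of
    (x, y) tuples, one per agent.
    """
    rng = random.Random(seed)
    # Brownian increments have variance proportional to dt, so std dev scales with sqrt(dt).
    scale = math.sqrt(dt)
    positions = [(0.0, 0.0) for _ in range(n_agents)]
    history = [list(positions)]
    for _ in range(n_steps):
        # Shared increment applied to every agent in this step.
        gx = rng.gauss(0.0, sigma_group * scale)
        gy = rng.gauss(0.0, sigma_group * scale)
        positions = [
            (x + gx + rng.gauss(0.0, sigma_individual * scale),
             y + gy + rng.gauss(0.0, sigma_individual * scale))
            for x, y in positions
        ]
        history.append(positions)
    return history


history = simulate_flock(n_agents=5, n_steps=100, seed=42)
print(len(history), len(history[-1]))  # 101 5
```

Setting `sigma_individual` close to zero makes the agents move almost in lockstep, while raising it relative to `sigma_group` makes the group disperse, which is one simple knob for exploring how tightly a "flock" holds together.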

Looking at the various graph databases available at present, OrientDB looks very capable and performant compared to the well-known Neo4j. AllegroGraph is appealing if we use RDF and SPARQL, or Prolog for reasoning, but it does not support Gremlin. Prolog arguably makes up for that, though Gremlin support would have been a nice addition. The advantage of using an Apache TinkerPop-enabled data system is that we can use any TinkerPop-enabled backend, such as OrientDB or Neo4j, or a processing engine like Apache Spark, without having to learn each one's own DSL. The Gremlin graph traversal language is to graph databases what SQL is to relational datastores, and it makes working with such systems a pleasant experience instead of having to fiddle with a different DSL for every datastore one encounters.
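As a rough illustration of why a graph model suits movement data, the sketch below keeps hypothetical `visited` edges between people and locations in plain Python dictionaries and walks them the way a Gremlin traversal would: from a person out to the locations they visited, in to other people who visited those locations, and out again to those people's locations, then counts the results. No TinkerPop server or Gremlin driver is used; the sample data and the names `VISITS`, `build_graph` and `recommend` are assumptions made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical movement data: (person, location) visit events.
VISITS = [
    ("alice", "cafe"), ("alice", "park"),
    ("bob", "cafe"), ("bob", "museum"),
    ("carol", "park"), ("carol", "museum"), ("carol", "library"),
    ("dave", "gym"),
]


def build_graph(visits):
    """Index 'visited' edges in both directions, like out/in adjacency in a property graph."""
    out_edges = defaultdict(set)  # person -> locations visited
    in_edges = defaultdict(set)   # location -> persons who visited it
    for person, location in visits:
        out_edges[person].add(location)
        in_edges[location].add(person)
    return out_edges, in_edges


def recommend(person, out_edges, in_edges):
    """Rank locations visited by people who share at least one location with `person`.

    Loosely corresponds to a Gremlin traversal of the shape
    out('visited').in('visited').out('visited') followed by filtering and
    groupCount(): the starting person and their own locations are excluded.
    """
    own = out_edges.get(person, set())
    counts = Counter()
    for location in own:                        # out('visited')
        for other in in_edges[location]:        # in('visited')
            if other == person:
                continue
            for candidate in out_edges[other]:  # out('visited')
                if candidate not in own:
                    counts[candidate] += 1
    return counts.most_common()


out_e, in_e = build_graph(VISITS)
print(recommend("alice", out_e, in_e))  # [('museum', 2), ('library', 1)]
```

In a relational model the same question needs repeated self-joins on a visits table, whereas in a graph each hop is a single traversal step, which is the practical difference the comparison with SQL is pointing at.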