Monday, June 18, 2018

Conference Summary - Building Data Products at Uber

This is my summary of the HasGeek Open House talk on Building Data Products at Uber, given by Hari Subramanian on the 15th of this month.

1. Data size is in petabytes.
2. Results obtained in staging are not quite the same as those from the same model in production, due to various factors.
3. For deep learning, TensorFlow is used. Results obtained on AWS and GCP differ.
4. They have built their own BI tools for visualisation.
5. Hive is extended in-house. Hive and Spark overlap to a certain extent. A few MapReduce jobs are still in use, which is why Hive is still used.
6. Uber uses its own datacenter.

The talk was a high-level overview of how Uber uses ML.