At a conference for developers at Facebook headquarters on Thursday, engineers working for the social networking giant revealed that the company is using a new homemade query engine called Presto to do fast interactive analysis on its already enormous 250-petabyte-and-growing data warehouse. More than 850 Facebook employees now use Presto every day, scanning 320 TB of data daily.
Facebook created Hive several years ago to give Hadoop some data warehouse and SQL-like capabilities, but Hive is showing its age in terms of speed because it relies on MapReduce. Scanning an entire dataset could take anywhere from many minutes to hours, which isn’t ideal if you’re trying to ask and answer questions in a hurry.
With Presto, however, simple queries can run in a few hundred milliseconds, while more complex ones will run in a few minutes. It runs in memory and never writes to disk, making it possible to analyze the data warehouse, which is 4,000 times bigger than it was four years ago.
Read the entire article on Gigaom.