Oct 10, 2016

Supercomputers in the Cloud: Blowing Past the Billion Row Benchmark


The old cliché that records are meant to be broken certainly applies to the tech industry. They’re milestones that reset the bar for what’s possible but, more importantly, serve as a barometer of where things are headed.

So today we’re announcing a new “world record” for in-memory database performance. With some help from our friends at IBM, NVIDIA and Bitfusion, we were able to scale up to 64 GPUs across 32 servers to filter, query and aggregate a 40 billion row dataset in just 271 milliseconds.

To put that in context, together with our partners, we can scan 147 billion rows a second.

We call this speed at scale, and it is the type of performance that will define analytics in the petabyte age.

“It isn’t just about big data anymore; it has more to do with fast data,” said Marc Jones, Director + Distinguished Engineer, IBM Cloud Platform. “The objection that a data problem is just too big, or takes too long to run, or is too complex, is the problem MapD is addressing head on. These benchmarks showcase an inflection point where GPUs will become the engine for the next generation of enterprise computing applications.”

The dataset in question was the US Flight Data set covering 1987 to 2008, which has 128M rows and was replicated 312 times to reach roughly 40 billion rows. The queries used were:

  1. Query Id is Q001 : query is 'select count(*) from flights2'
  2. Query Id is Q002 : query is 'select carriername, count(*) from flights2 group by carriername'
  3. Query Id is Q003 : query is 'select carriername, avg(arrdelay) from flights2 group by carriername'

Query 2 came back in 271 milliseconds; dividing the roughly 40 billion rows by that time gives the figure of 147 billion rows a second.
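For readers who want to check the arithmetic, here is a minimal sketch that recomputes the headline numbers from the figures quoted in this post. The constant and function names are ours, chosen for illustration; the 128M base row count is the rounded value given above, so the results are approximate. The per-GPU figure is a simple average that assumes the work was spread evenly across all 64 GPUs, which the benchmark itself does not state.

```python
# Figures quoted in the post (the base row count is rounded).
BASE_ROWS = 128_000_000       # US Flight Data set, 1987 to 2008
REPLICATION_FACTOR = 312      # number of copies of the base dataset
QUERY_SECONDS = 0.271         # reported time for Query 2
GPU_COUNT = 64                # GPUs used in the benchmark


def total_rows(base_rows: int, copies: int) -> int:
    """Rows in the benchmark table after replication."""
    return base_rows * copies


def rows_per_second(rows: int, seconds: float) -> float:
    """Average scan rate over the whole query."""
    return rows / seconds


rows = total_rows(BASE_ROWS, REPLICATION_FACTOR)
rate = rows_per_second(rows, QUERY_SECONDS)

# Assumption: work is divided evenly across GPUs (a simple average).
rate_per_gpu = rate / GPU_COUNT

print(f"total rows:        {rows / 1e9:.2f} billion")
print(f"scan rate:         {rate / 1e9:.1f} billion rows/s")
print(f"average per GPU:   {rate_per_gpu / 1e9:.2f} billion rows/s")
```

The computed rate lands at roughly 147 billion rows a second whether the exact product (just under 40 billion rows) or the rounded 40 billion is used.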

These benchmarks underscore the fact that our purpose-built GPU database is in a league of its own. Everything we do is viewed through the lens of speed and scalability. Will it make us faster? Will it make us faster across twice our current capabilities?

The work we have done with our partners, which lets us easily pool and scale GPUs across multiple nodes into a single system, is a significant breakthrough and a game changer.

We can now give customers “supercomputer”-like performance on a simple pay-as-you-go billing model, thanks to the cloud.

Bitfusion put it best in their viral tweet on the subject.

The ability to explore data and run queries in near real-time gives companies in industries like financial services, government, telecommunications, retail and adtech the types of tools to compete more effectively, respond more rapidly and tackle challenges they previously considered too hard for their legacy compute platforms.

While we have written about the inflection point in compute, the truth is that it is happening before our eyes. This data point is further evidence of the dawn of the Age of the GPU.


HEAVY.AI (formerly OmniSci) is the pioneer in GPU-accelerated analytics, redefining speed and scale in big data querying and visualization. The HEAVY.AI platform is used to find insights in data beyond the limits of mainstream analytics tools. Originating from research at MIT, HEAVY.AI is a technology breakthrough, harnessing the massive parallel computing of GPUs for data analytics.