Venkat Krishnamurthy

Jun 26, 2018

Announcing MapD 4.0: Geospatial Data Types, Improved Rendering Engine and Query Engine Advances

The MapD analytics platform exists at the intersection of two trends that are now fundamental prerequisites for data analytics: interactivity and scale. Interactivity is the ability to ask and answer questions at the speed of thought, and scale is the ability to understand ever-increasing volumes of high-velocity data.

We’ve never thought about the MapD platform as ‘just’ a superfast SQL engine that can leverage GPUs or ‘just’ an immersive, interactive visualization solution. MapD has always been both, and we care about creating technology for a unified MapD experience around interactivity at scale.

With that in mind, we’re thrilled to announce MapD 4.0, which gives analysts transformational new ways to interact with very large datasets, especially those that include spatial data with temporal data (or more accurately, spatio-temporal data). Interacting with geospatial data on a map, and/or as a time series chart, provides context and visual signals that we can understand clearly and intuitively.

This version makes major strides in interactive geospatial analytics and also addresses the key needs of our growing enterprise and government customer base. Without a doubt, this is a significant release in terms of new features across the entire MapD platform. Here are some of the key highlights:

New in MapD Core

Geospatial Types and Functions

If you’ve seen any of our interactive demos, starting from the original Tweetmap, you understand that data of any sort is created (and exists) in space and time. These dimensions provide the most natural and useful context to any analytics (“Where did it happen, and when?”).

With MapD 4.0, we’re taking the first big step towards making geospatial analytics available to any user, from an experienced GIS analyst or data scientist who can write complex SQL queries over location-enriched data, to a business user of MapD Immerse who wants to go a step beyond mainstream BI tools to derive insight quickly and visually from the same data.

MapD 4.0 adds native support for geospatial data types and related functions. Specifically, we’ve added support for the most commonly used planar geometry types - POINT, LINESTRING, POLYGON and MULTIPOLYGON (also referred to as vector data in GIS parlance). This means you can create tables with geospatial data as follows:

CREATE TABLE geo1 (
p1 POINT,
l1 LINESTRING,
poly1 POLYGON,
mpoly1 MULTIPOLYGON);
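
To make the column types concrete, here is a minimal sketch of adding one row to the table above. It assumes geospatial values can be supplied as WKT (well-known text) string literals in an INSERT statement, and the coordinates are made-up sample values rather than real data:

```sql
-- Hypothetical sample row for the geo1 table defined above.
-- Each geospatial value is written as a WKT string literal (an assumption
-- about how literals are accepted; check the MapD documentation).
INSERT INTO geo1 VALUES (
  'POINT(20 20)',
  'LINESTRING(40 0, 40 40)',
  'POLYGON((40 0, 60 0, 60 20, 40 20, 40 0))',
  'MULTIPOLYGON(((0 0, 20 0, 20 20, 0 20, 0 0)), ((30 30, 35 30, 35 35, 30 35, 30 30)))'
);
```

Note that each polygon ring repeats its first coordinate at the end, which is how WKT closes a ring.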

We’ve added a number of functions to support SQL queries of these data types for analytics, in the following categories:

Constructors - ST_GeomFromText, ST_GeogFromText
Geometry Editors - ST_Transform, ST_SetSRID
Accessors - ST_XMin, ST_XMax, ST_YMin, ST_YMax, ST_StartPoint, ST_EndPoint, ST_PointN, ST_NPoints, ST_NRings, ST_SRID
Joins - ST_Distance, ST_Contains

In MapD 4.0, users can write SQL like the following, to process geospatial data with the same GPU-accelerated parallel processing power already possible with other data types:

SELECT ST_Contains(mpoly2, ST_GeomFromText('POINT(-71.064544 42.28787)',
4326)) FROM geo;
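
The other function categories follow the same pattern. The sketch below runs a few accessors and a distance calculation against the geo1 table created earlier, then shows a point-in-polygon join. The trips and zones tables and their columns are hypothetical names used only for illustration, and the argument types each function accepts should be confirmed in the MapD documentation:

```sql
-- Bounding box and structure of the shapes stored in geo1.
SELECT ST_XMin(poly1), ST_XMax(poly1), ST_YMin(poly1), ST_YMax(poly1),
       ST_NRings(poly1), ST_NPoints(l1)
FROM geo1;

-- Planar distance from each stored point to a fixed location.
SELECT ST_Distance(p1, ST_GeomFromText('POINT(0 0)')) FROM geo1;

-- Point-in-polygon join: count drop-offs per zone.
-- trips(dropoff POINT) and zones(name, boundary POLYGON) are hypothetical tables.
SELECT z.name, COUNT(*) AS dropoffs
FROM trips t, zones z
WHERE ST_Contains(z.boundary, t.dropoff)
GROUP BY z.name;
```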

UPDATE and DELETE

While a major part of the focus for MapD Core in version 4.0 was geospatial capabilities, we also heeded requests from our biggest users for full CRUD (CREATE, READ, UPDATE, DELETE) capabilities, particularly to support their operational analytics use cases.

As of 4.0, MapD supports standard SQL syntax for both UPDATE and DELETE, so you can run commands like:

UPDATE flights SET arrdelay = arrdelay - 2;
or
DELETE FROM nyc_taxi WHERE boroname = 'Staten Island';

These commands are targeted more at bulk update/deletes that you would encounter in an analytics workflow, rather than the single-row high-throughput updates that an OLTP database would handle.

Also, both UPDATE and DELETE are integrated with the MapD RBAC model (see below), so users can only perform these actions if they’ve been specifically granted privileges to do so.

Role Based Access Control

With MapD 4.0, we’re addressing another major long-standing user request: support for role-based access control (RBAC). MapD Core now supports Users and Roles as first-class entities, and also provides an integrated permissions model for database objects.

Specifically, MapD Core supports access control on databases, tables, views and dashboards. Here’s an example sequence of administrative actions that are now possible with support for Object Permissions.
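
As an illustrative sketch rather than an exact listing, such a sequence might look like the following. The role, user, and password are hypothetical, the flights table is the one used in the UPDATE example earlier, and the statement syntax is an assumption that should be checked against the MapD documentation:

```sql
-- Create a role and a user, then attach the user to the role.
-- "analysts", "jane", and the password are hypothetical placeholders.
CREATE ROLE analysts;
CREATE USER jane (password = 'ChangeMe123');
GRANT analysts TO jane;

-- Grant object-level privileges to the role instead of to individual users.
GRANT SELECT ON TABLE flights TO analysts;
GRANT UPDATE ON TABLE flights TO analysts;

-- Privileges can be withdrawn later without touching the user account.
REVOKE UPDATE ON TABLE flights FROM analysts;
```

Granting to roles rather than to users keeps administration manageable as the number of analysts grows, since a new user only needs the appropriate role.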

With the permissions model now in place, we’re also working on supporting external directory services for authorization as well as single sign-on support using standard frameworks like SAML 2.0. Watch out for these in the near future.

Query Engine Performance Enhancements

As always, SQL query performance is the calling card of MapD as a platform. In MapD 4.0, we’ve made several major performance improvements to MapD Core’s SQL engine. For example, thanks to better handling, large projection queries (those that select a significant number of columns from a table) are anywhere from 2-5x faster than in earlier versions. We also realized performance gains from fragment skipping, as well as significant speedups on sharded joins. We will publish benchmarks that target real-world use cases shortly.
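
For readers unfamiliar with sharded joins, here is a rough sketch of the kind of setup they apply to. The tables and columns are hypothetical, and the SHARD KEY and shard_count syntax is an assumption to verify in the documentation. The idea is that both tables are sharded on the join column with the same shard count, so matching rows end up in corresponding shards:

```sql
-- Two hypothetical tables sharded on the same key with the same shard count.
CREATE TABLE customers (
  account_id BIGINT,
  name TEXT ENCODING DICT(32),
  SHARD KEY (account_id)
) WITH (shard_count = 4);

CREATE TABLE orders (
  order_id BIGINT,
  account_id BIGINT,
  amount DOUBLE,
  SHARD KEY (account_id)
) WITH (shard_count = 4);

-- A join on the shard key; each shard of orders only needs to be
-- matched against the corresponding shard of customers.
SELECT c.name, SUM(o.amount) AS total_amount
FROM orders o
JOIN customers c ON o.account_id = c.account_id
GROUP BY c.name;
```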

As usual, you can find all the details on these in the MapD documentation.

New in MapD Render

We made major improvements to MapD Render (our rendering engine based on the Vega visualization grammar). Our rendering engine now allows for truly interactive exploration of large geospatial datasets with millions or billions of rows, and is seamlessly integrated with the new geo data types and functions now supported in MapD Core.

It is easier to see the effect of this work than describe it with words. The video example below shows how MapD Immerse, in MapD 4.0, enables interactive visual exploration of billions of NYC taxi rides and the million New York City buildings where they ended.

As you can see, we’re rendering over a million building footprints in a Choropleth layer, while visually cross-filtering over a billion geolocated points of taxi pickup/dropoff data, all within milliseconds.

Rendering Engine Improvements

We made significant improvements in both scale and performance in the MapD rendering engine to support MapD 4.0. Now, MapD can easily render (and re-draw) more than a million complex POLYGONs in under one second (as you can see in the example above) on a single node. The point map and heatmap chart types in Immerse bring these and other related rendering optimizations to the end user.

Rendering Support for Polygons

The foundational performance and scalability work that went into MapD Render allows users to upload and explore large, complex shape-based datasets. The following example comes from the Eviction Lab dataset repository. On a choropleth map, the user can see evictions in the US, zooming down from the national level to every block group in Lansing, Michigan. They can then cross-filter on time to see trends.

We’re currently working on extending this to distributed configurations of MapD, and also adding the LINESTRING geospatial data type. In MapD, you’ll soon be able to see and analyze large trip trajectory datasets like those in the following image:

Vega Transforms

In MapD 4.0, our rendering engine also supports Vega transforms. Before now, the Vega renderer would use an additional pre-flight query to gather statistics to be used in the rendered output (e.g., aggregates). While this query was fast, it is no longer necessary with Vega transform support, resulting in a simplified Vega rendering spec and improved performance. Further, the transforms can be used for computing standard statistics like mean, standard deviation, and quantiles for display in Vega-rendered charts. We’ll be improving this further in coming versions.

What’s New in MapD Immerse

Shared Dashboards

Using the Object Permissions capability we added in MapD Core, we’ve defined Dashboards as a distinct object type. This allows the users of MapD Immerse to share dashboards with other users or roles, for greater collaboration with the necessary security guardrails.

Sharing dashboards is as simple as opening a share dialog from within the dashboard and selecting from the list of users and roles.

Shared dashboards are read-only in 4.0, and for security purposes, a user receiving a shared dashboard has to have permissions on the underlying data in order to view the dashboard. In upcoming releases, we’re working on adding more than view-only interaction for those who receive shared dashboards and we’ll be making permission administration easier.
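
In SQL terms, the requirement above means a share involves two kinds of grants. The following is a minimal sketch, assuming dashboards are addressed by a numeric id in GRANT statements. The id 42 and the marketing role are hypothetical, and the exact syntax available in 4.0 should be confirmed in the documentation:

```sql
-- Let members of the (hypothetical) marketing role view dashboard 42.
GRANT VIEW ON DASHBOARD 42 TO marketing;

-- The dashboard only renders for them if they can also read the
-- table it is built on, so grant that separately.
GRANT SELECT ON TABLE nyc_taxi TO marketing;
```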

Importing Geospatial Data

In addition to integrating geospatial capabilities into MapD Core and the MapD rendering engine, we’ve made major improvements in how users can import geospatial data from various specialized formats from within Immerse. Geospatial data can be loaded as usual from local files and archives as well as S3 buckets, and we’ve significantly improved auto-detection of geospatial data formats.
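
The same kind of import can also be scripted outside Immerse. The sketch below assumes the COPY FROM command accepts a geo option for geospatial files, plus s3_* options for S3 sources. The table names, file paths, bucket, and credentials are all placeholders, and the option names should be checked against the MapD documentation:

```sql
-- Load a local GeoJSON file into a (hypothetical) buildings table.
COPY buildings FROM '/data/nyc_buildings.geojson' WITH (geo='true');

-- Load a zipped shapefile from S3; bucket, region, and keys are placeholders.
COPY evictions FROM 's3://example-bucket/evictions.zip'
WITH (geo='true',
      s3_region='us-east-1',
      s3_access_key='YOUR_ACCESS_KEY',
      s3_secret_key='YOUR_SECRET_KEY');
```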

Enhancements to Geo Charts

The Choropleth chart (which you saw above) has undergone major enhancements in 4.0. It is now completely rendered server-side, which allows it to be up to 100x more performant (we have customers already looking to render 10 million shapes for real-time interaction). In addition, we’ve streamlined the UI to simplify the loading and usage of geospatial data files with this chart.

We’ve also made major performance improvements in rendering point map and geo heatmap charts, allowing the chart to easily handle datasets of a billion points or more. In addition, we’ve simplified the UI workflow involved in these charts with geospatial datasets.

Formatting for Chart Measures

We’ve added initial support for allowing user-specified formats for measures in various charts. From within the chart editor, you can select from a number of predefined format specifiers as well as prefixes and apply them to chart measures.

Wrapping Up

With more than 250 product improvements and bug fixes, MapD 4.0 represents a major step forward for the platform. This would not have been possible without our amazing engineering team, our committed customers, and the open-source contributor community that continues to drive us forward.

MapD Cloud is up to date with 4.0. Sign up for a free two-week trial to begin using MapD 4.0 in a matter of minutes. As always, you can download the Community Edition of MapD, or contact sales@mapd.com or your MapD rep for our fully featured Enterprise Edition. Open source MapD Core is available in our updated GitHub repo.

Venkat Krishnamurthy

Venkat heads up Product Management at HEAVY.AI. He joined OmniSci from the CTO office at Cray, the supercomputing pioneer, where he was responsible for leading Cray’s push into Analytics and AI. Earlier, he was Senior Director at YarcData, a pioneering graph analytics startup, where he bootstrapped product and data science/engineering teams. Prior to YarcData, he was a Director of Product Management at Oracle, where he led the launch of the Oracle Financial Services Data Platform, and earlier spent several years at Goldman Sachs, where he led one of the earliest successful projects utilizing machine learning in Operational Risk incident classification. Venkat is a graduate of Carnegie Mellon University and the Indian Institute of Technology Chennai, and also a certified Financial Risk Manager.