Dr. Michael Flaxman & Emily Fang
Oct 7, 2019

Billion-Row Geospatial Datasets Finally Have a Platform



OmniSci has always seen great value in participating in conferences for free and open source software, which connect technologies with passionate, contributing users across the world. On August 26, 2019, the OmniSci Community team hopped on a plane to Bucharest, Romania, where FOSS4G 2019 International was being held.

In addition to hosting an exhibitor booth, we were represented on stage by our VP of Global Community, Aaron Williams, who delivered a keynote titled “Using GPU-acceleration to Interact with OpenStreetMap at Planet-Scale.”

In his keynote, Aaron gave the audience an exploratory view of the explosive growth in the quantity of geospatial data, and how this is fueling the need to join geospatial data with traditional data more frequently. He explained how those new scale requirements demand a massive improvement in speed, and how GPUs can be used to accelerate the querying of geospatial data using an open source in-GPU-memory SQL database. As the final clincher, he provided an interactive analysis of the entire global OpenStreetMap dataset as an example of what is now possible.

You can watch the recording of his presentation here: https://www.youtube.com/watch?v=_r4IqjGqGEY&t=87s

Our Reflections on FOSS4G

This year’s FOSS4G highlighted some exciting new developments in both open data and tools. We are starting to see a community consensus on both the need to scale FOSS4G efforts and on some technical paths towards that end. In particular, there was an emphasis on “analysis-ready data” and the use of machine learning to extract insight from earth observation (EO) datasets. How best to combine these new workflows with both conventional “authoritative” data and with crowdsourcing remains an important open question, but we saw multiple innovative initiatives in both of those areas.

There were multiple conversations about how to get EO data “analysis ready” and what that means to different stakeholders. “Data cube” initiatives (such as https://www.opendatacube.org/) are one tactic, focused on getting stacks of imagery data into harmonized formats which are conceptually straightforward to use (even if potentially huge). These can be considered extensions of the last couple of years of work on Cloud-Optimized GeoTIFFs (https://www.cogeo.org/), adding a time dimension. The SpatioTemporal Asset Catalogs or STACs initiative (https://stacspec.org/) is another front, concentrating on standardizing queries for EO data across multiple satellites and providers.
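To make the idea of a standardized EO query concrete, here is a minimal sketch of a search request body in the style of a STAC API item search, using the `bbox`, `datetime`, `collections` and `limit` fields. The helper name `build_stac_search`, the collection id `sentinel-2-l2a` and the bounding box around Bucharest are illustrative assumptions, not something shown at the conference.

```python
import json
from datetime import date


def build_stac_search(bbox, start, end, collections, limit=10):
    """Build a JSON-serializable search body (hypothetical helper)."""
    west, south, east, north = bbox
    return {
        # Bounding box as [west, south, east, north] in lon/lat degrees
        "bbox": [west, south, east, north],
        # Closed time interval written as "start/end" in UTC
        "datetime": f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z",
        "collections": list(collections),
        "limit": limit,
    }


body = build_stac_search(
    bbox=(25.9, 44.3, 26.3, 44.6),    # assumed area of interest
    start=date(2019, 8, 1),
    end=date(2019, 8, 31),
    collections=["sentinel-2-l2a"],   # assumed collection id
)
print(json.dumps(body, indent=2))
```

The point of the sketch is that the same small request body could, in principle, be sent to any compliant catalog, regardless of which satellite or provider sits behind it.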

STACs is built on Web Feature Services using WFS3 (recently rebranded as OGC Open Geo API). At the OmniSci booth, we showed some preliminary work on using OmniSci as a Web Feature Server using WFS3. As an example, we showed how you could build a very fast geocoder using WFS3 and the open address dataset. A simple Flask implementation achieved lookup times of less than 10 milliseconds per address. This was done as freestanding code, but Geodesign Technologies is currently working to refactor the code as an open source provider plugin for the PyGeoAPI (https://pygeoapi.io/) framework. More generally, WFS3 is showing great promise in allowing interoperability not only within conventional GIS tools but also with more general business and web technologies. For example, since WFS3 and our demo are fully REST based, the geocoding service could be embedded in a spreadsheet.
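The shape of such a REST geocoder can be sketched without a server or database. The example below is not the booth demo. It is a minimal stand-in, assuming a tiny in-memory address table, that answers a lookup with a GeoJSON FeatureCollection for a request path mirroring the `/collections/.../items` pattern of WFS3. The collection name `addresses`, the `street` and `number` query parameters, the base URL and the sample coordinates are all made up for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical sample records: (street, number) -> (longitude, latitude)
ADDRESSES = {
    ("main st", "100"): (-122.4000, 37.7900),
    ("main st", "102"): (-122.4002, 37.7901),
}


def items_url(base, street, number):
    """Build a request URL for the hypothetical 'addresses' collection."""
    query = urlencode({"street": street, "number": number, "limit": 1})
    return f"{base}/collections/addresses/items?{query}"


def geocode(url):
    """Answer an items request with a GeoJSON FeatureCollection."""
    params = parse_qs(urlsplit(url).query)
    street = params["street"][0].strip().lower()
    number = params["number"][0].strip()
    features = []
    hit = ADDRESSES.get((street, number))
    if hit is not None:
        lon, lat = hit
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"street": street, "number": number},
        })
    # An empty feature list signals "no match" rather than an error
    return {"type": "FeatureCollection", "features": features}


url = items_url("http://localhost:5000", "Main St", "100")
result = geocode(url)
print(url)
print(result["features"][0]["geometry"]["coordinates"])
```

Because the whole request is an ordinary HTTP GET URL and the response is plain JSON, any tool that can fetch a URL, including a spreadsheet, could consume a service of this shape.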

These “data” initiatives are of course only the start of the workflows required, so our team was happy to be able to attend two workshops demonstrating downstream FOSS tools. The first was “Solaris,” an initiative to simplify applying cutting-edge machine learning methods to EO data, with outputs as vector features. This was a significant advance which allowed us to get through a full geoML analysis in a few hours. The example application used high-resolution imagery to detect buildings, which globally is still a major need and challenge (particularly for informal settlements).

A second workshop we attended showed how to use FOSS tools for hydrological modeling (Silvia Franceschi - FOSS tools for modelling natural hazards: the HortonMachine library). This was interesting in that hydrological problems are worldwide and increasing under both population pressure and climate change. It seems that the need for good terrain data and hydrologically-correct terrain models is universal. But given only that, you can make some pretty reasonable first-principles models of important hydrological characteristics.

Example of hydrologically-correct terrain model generation and analysis with HortonMachine tools.

So where is this leading? At OmniSci, we see the FOSS4G universe as expanding significantly.  From a base of classical desktop GIS, we have a broader pipeline emerging:

EO Datasets -> Analysis-Ready-Data Cubes (ARD) -> geoML Tools -> Continuously-Updated Feature Geometries -> Scalable Spatial Analysis Tools -> Decision Support Dashboards

In other words, we see an emerging future in which sensors of all kinds (remote and local) will provide data for interpretation by ML tools into raster and vector features which are kept continuously up to date.  Some of those will require review by humans, likely with crowdsourcing, while the vast majority will be automatically committed. These massive spatial and temporal datasets will live primarily in the cloud, but with advanced interoperability and query tools, end users should not have to care because they will be able to manipulate them as if local.

This is more web and service-oriented than conventional workflows, and ultimately serves end users beyond the conventional GIS community by providing timely information in dashboard form rather than as conventional static print or slippy maps. At each transfer/product stage above, OGC standards efforts seem critical, since there is a need to leverage expertise still focused within these steps, but simultaneously to keep data updates moving. There is impressive value to be unlocked when analysis and data visualization tools are continuously provided with fresh data.

There is still a long way to go before such pathways are smooth, reliable and standard. But as partisans of “geodesign,” which seeks to build decision support tools based on actionable data, we’re glad to see so many pieces coming into place. FOSS4G continues to be a great global showplace where folks representing all of these areas come together and work on interoperable solutions.

Dr. Michael Flaxman, Founder of Geodesign Technologies and GIS Consultant at HEAVY.AI

Dr. Michael Flaxman's primary research interest is in participatory tools for spatial simulation modeling as applied to the planning and design of cities and regions. He has served on the faculties of MIT, Harvard and the University of Oregon. Dr. Flaxman has practiced GIS-based planning in 17 countries, including one year as a Fulbright fellow in Canada. He previously served as industry manager for Architecture, Engineering and Construction at ESRI, the world’s largest developer of GIS technology. Dr. Flaxman received his doctorate in design from Harvard University in 2001 and holds a master’s in Community and Regional Planning from the University of Oregon and a bachelor’s in biology from Reed College.

Emily Fang, Community & Events Manager at HEAVY.AI

Emily is a Community & Events Manager at HEAVY.AI. She specializes in the strategy of community partner and technical events, to facilitate developer outreach and to grow the usage of HEAVY.AI's open source technology. She has worked in event programming, marketing, and business development within the tech and travel industry. Prior to that, she was a Community Specialist for Google Hardware and worked in partnerships at Booking.com.