Venkat Krishnamurthy

Jan 19, 2021

Announcing OmniSci 5.5

NOTE: OMNISCI IS NOW HEAVY.AI

We’re very happy to announce OmniSci 5.5, our final release for 2020. With several new capabilities in OmniSciDB and Immerse, and a major new OmniSci Render feature, this release sets the stage for more innovation in the new year across the entire platform. Without further delay, let’s dive right in!

OmniSciDB

Looking back at 2020, our engineering team made major strides in improving the robustness, performance and overall enterprise readiness of OmniSciDB, the core analytical SQL engine of the OmniSci platform. Through releases 5.2 to 5.4, we identified and addressed several intermittent issues related to memory pressure in GPU environments. We also added initial support for parallel executors in the query engine as part of a larger roadmap to better support intensive multi-user workflows. In addition, we made notable performance gains in core query processing and execution - both on CPU and GPU platforms.

In OmniSci 5.5, we added many new features to the query engine focusing both on enterprise readiness and better ecosystem integration.

Apache Arrow over the wire for query results

Since its origin in 2016, Apache Arrow has quickly emerged as a major building block for modern analytical systems infrastructure - from databases to processing engines. The Arrow project provides high-performance building blocks for representing and transporting data used in analytics workflows, and also a multitude of efficient language bindings.

OmniSci was one of the earliest adopters of Apache Arrow as a way to interface with other systems. However, this initial capability was restricted to shared-memory usage of Arrow (that is, it required the OmniSciDB server and the consuming client to both exist on the same machine). With 5.5, we’re thrilled to announce that we now support results in Arrow format over the wire, in addition to our existing Thrift-based result set serialization. The performance we’re seeing is extremely promising, even without any of the optimizations we’re planning to undertake in the new year.

We’ve added support for this to our MapDConnector JavaScript API as of 5.5, and are working on integrating it into pymapd and our other APIs in subsequent releases.

Here is a simple example of using the queryDFAsync call in MapDConnector to get results back in Arrow format.

We’re also working together with Dominik Moritz on updating the Falcon project he created for high-speed linked brushing/navigation to use Arrow from OmniSci.

In addition to the above, our partner Intel added support for low-overhead Arrow-based ingest to OmniSciDB. This will soon allow upstream systems to provide data to OmniSci as Arrow buffers. For example, Snowflake already supports this in their outbound JDBC and Python connectors. We plan to make this available more widely in the new year.

Query interrupt capability

Our customers love OmniSci for how it allows them to ask complex questions of their biggest datasets interactively with standard SQL. Yet, sometimes even the most well-intentioned analytic queries can go awry, and when you’re dealing with billion-row datasets on shared infrastructure, this can adversely impact cluster performance.

With 5.5, we have added the ability to interrupt queries within a session, controlled by a set of flags: enable-runtime-query-interrupt turns the query interrupt capability on, while pending-query-interrupt-freq and running-query-interrupt-freq define how frequently the query engine checks for interrupts on pending and running queries, respectively.

For now, this will kill all queries associated with a specific session, with Immerse being the common use case. Closing an Immerse dashboard can now be configured to kill all outstanding queries associated with that dashboard (equivalently, the session corresponding to the dashboard). Administrators can use omnisql to kill queries for multiple sessions.
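To make the role of the check frequency concrete, here is a minimal toy model in Python. It is only a sketch and not how OmniSciDB implements interrupts: the `Session` class, the `run_query` function and the row-based polling interval are all hypothetical names and assumptions chosen for illustration (the real flags are server settings, and their units are not modeled here).

```python
class Session:
    """Toy stand-in for a client session (for example, one Immerse dashboard)."""

    def __init__(self, name):
        self.name = name
        self.interrupt_requested = False


def run_query(session, total_rows, check_every, interrupt_at=None):
    """Process rows, polling the session's interrupt flag every `check_every` rows.

    `interrupt_at` simulates the row at which an interrupt request arrives.
    Returns (rows_processed, was_interrupted).
    """
    processed = 0
    for row in range(total_rows):
        if row == interrupt_at:
            # Simulate the dashboard being closed while the query is running.
            session.interrupt_requested = True
        if row % check_every == 0 and session.interrupt_requested:
            return processed, True
        processed += 1
    return processed, False


coarse = run_query(Session("dashboard-a"), 1000, check_every=100, interrupt_at=250)
fine = run_query(Session("dashboard-b"), 1000, check_every=10, interrupt_at=250)
untouched = run_query(Session("dashboard-c"), 1000, check_every=100)
print(coarse, fine, untouched)  # (300, True) (250, True) (1000, False)
```

In this toy model, a coarser check interval means the query keeps working for longer after the interrupt is requested, while a finer interval stops it sooner at the cost of more frequent checks.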

We’re continuing to refine this capability to check for interrupts at other key points in the query lifecycle beyond kernel execution, and to make it generally more seamless, and ultimately transparent, to the end user.

Concurrent queries and Update/Delete operations

Building further on the roadmap to support multi-user scenarios, OmniSciDB now supports the ability to run SELECT queries concurrently with UPDATE and DELETE operations.

For now, this ability is restricted to single-node (i.e., non-distributed) installations of OmniSci. We’ll remove this restriction in an upcoming version.

Storage and IO improvements

In 5.5, we are rolling out a significant change to the storage layout of the OmniSciDB engine. Specifically, we added a MAX_ROLLBACK_EPOCHS parameter, providing the ability to limit the number of rollback states we keep on disk (this was previously unlimited, and while it remains unlimited by default, a user with sufficient permissions can now override it with any number of rollback epochs).

For example, to cap a table at a maximum of ten rollback epochs, which also limits the amount of associated metadata stored, one can run:

ALTER TABLE foo SET MAX_ROLLBACK_EPOCHS = 10;

This change has multiple benefits. First, it allows OmniSciDB to support more frequent inserts and updates, a pattern seen in streaming workflows that regularly append small batches of data (e.g. once every 15 minutes), or in scenarios involving rolling updates. Second, it limits the overhead involved in accessing table metadata. This addresses a problem where tables that were in use for multiple years accumulated a lot of metadata that in turn affected query performance, particularly on storage with lower IOPS (for example, on lower performance tiers of cloud storage).
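As a rough illustration of why the cap matters for frequent small appends, the following Python sketch counts how many epochs would be retained with and without a limit. It is a toy model under stated assumptions, not OmniSciDB's storage code: `simulate_writes` is a hypothetical helper, each append is assumed to create exactly one epoch, and the cap is assumed to retain exactly that many most recent epochs.

```python
from collections import deque


def simulate_writes(num_writes, max_rollback_epochs=None):
    """Return the epochs still retained after `num_writes` small appends.

    deque(maxlen=None) is unbounded, mirroring the unlimited default.
    """
    retained = deque(maxlen=max_rollback_epochs)
    for epoch in range(1, num_writes + 1):
        retained.append(epoch)  # the oldest epoch is dropped once the cap is hit
    return list(retained)


writes_per_year = 4 * 24 * 365  # one append every 15 minutes for a year
unlimited = simulate_writes(writes_per_year)
capped = simulate_writes(writes_per_year, max_rollback_epochs=10)
print(writes_per_year, len(unlimited), len(capped), capped[0])
# 35040 35040 10 35031
```

Under these assumptions, a single year of 15-minute appends leaves tens of thousands of retained states in the unlimited case, while the capped table only ever tracks the ten most recent ones.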

A key caveat here is that this change is not backwards compatible: once upgraded, you cannot roll back to a version prior to 5.5. The migration to the new layout is itself automatically performed as part of the upgrade. So (and this is extremely critical!), please plan accordingly, and make backups of your data directories before upgrading to 5.5. Enterprise customers can consult their support contact for guidance in this process.

Other notable fixes in the storage and IO area include a VALIDATE command that identifies out-of-sync epochs. In addition, we now have a SHOW TABLE DETAILS command that displays detailed, low-level storage information for a table.

Other improvements in OmniSciDB

5.5 includes several other notable improvements in OmniSciDB.

We continue to improve our support for User-Defined Functions (UDFs) and User-Defined Table Functions (UDTFs) in OmniSci. In 5.5, UDFs now support NONE-encoded strings. We also added support for aggregates in UDTFs, as well as for multiple table inputs into the same UDTF.

We now support inserting a subset of columns into an existing table, and changing the order of inserted columns. Both are available in our JDBC and binary load interfaces. Customers find this useful during append operations to existing tables with several columns, where they don’t want to specify all the columns in order to insert just a few. We are working to add this feature for SQL INSERT statements, and it will be available in a future release.
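Conceptually, a subset insert maps the supplied columns, in whatever order they arrive, onto the table’s own column order. The Python sketch below shows that mapping only; it does not use the JDBC or binary load interfaces. The `build_full_row` helper and the example column names are hypothetical, and filling unspecified columns with None (standing in for NULL) is an assumption.

```python
def build_full_row(table_columns, insert_columns, values):
    """Expand a partial, arbitrarily ordered row into table column order.

    Columns that were not supplied are filled with None (standing in for NULL).
    """
    unknown = set(insert_columns) - set(table_columns)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    if len(insert_columns) != len(values):
        raise ValueError("column/value count mismatch")
    supplied = dict(zip(insert_columns, values))
    return [supplied.get(col) for col in table_columns]


table_columns = ["id", "ts", "lat", "lon", "speed"]
# Only two of the five columns are supplied, and not in table order.
row = build_full_row(table_columns, ["speed", "id"], [42.5, 7])
print(row)  # [7, None, None, None, 42.5]
```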

OmniSci Render

An oft-overlooked feature of OmniSci’s architecture is the ability to execute queries on CPUs as a fallback mechanism. The OmniSciDB engine detects the presence/absence of GPUs and automatically switches execution accordingly, and this can also be switched manually from omnisql, our command line utility.

However, this has so far been an all-or-nothing proposition. Both the query engine and rendering engine run on GPUs by default, so if you run OmniSci on a single GPU machine, the memory is split between query execution, rendering and the data itself. On Nvidia GPUs with lower GPU memory capacity, this may result in memory pressure sooner because of larger datasets or complex queries. On the other hand, switching to CPU means that the query engine switches to CPU execution while the rendering engine is disabled, since we lack support today for CPU-based rendering.

With OmniSci 5.5, we’ve made a major improvement in this regard. We decoupled the query engine from the rendering engine so that you can execute queries completely in CPU mode, while the GPU is used exclusively for rendering. To do this, you can either set cpu-only=’true’ on render-enabled builds, or use a special CPU-only render-enabled build (currently the latter is more performant, so please speak to your support contact if you’d like access).

Of course, this means some performance tradeoff, but it opens up greater flexibility in terms of infrastructure choices, particularly at smaller scales. For example, some AWS g4 instances pair a single Nvidia T4 GPU (16GB GPU RAM) with up to 96 cores and 256GB of DRAM. Depending on your workload, you can use these instances to get greater headroom from your infrastructure and lower your total cost of ownership (TCO).

This is just the beginning - we have a lot more in store on rendering improvements in the new year, following a major overhaul of our rendering architecture to leverage the revolutionary Vulkan framework!

OmniSci Immerse

Meanwhile, we added some notable new features in Immerse as well. In 5.5, you can now annotate data on the (new) Combo Chart. You can enable, add, edit and delete annotations and use them to illustrate key points about your data. Note that these annotations are attached to the data, so during filter operations they move into or out of view depending on whether the data underlying the annotation is itself visible.

We also made several improvements to the new combo chart. Of note, it now supports the often-used zoom feature from the original combo chart.

Also, we added the ability to migrate an older combo/bar/histogram/stacked bar chart to the new combo chart (currently at an individual chart level). Our goal is to have the infrastructure of the new combo chart, based on Vega, be the foundation for all these existing, discrete chart types.

In addition, Immerse includes numerous smaller features and performance improvements, including additional formatting controls related to margins between charts and attribute formatting in popups.

Data Science

Last July, we launched an experimental preview of OmniSci for Mac, as a way to scale down the complete OmniSci platform to run on a desktop or laptop. Given the ongoing interest we’ve seen, we continue to update the preview with each release and also extend the license expiration (it’s definitely a long preview!)

With 5.5, we’re also bringing our data science capabilities and tools to the Mac Preview! We’ll have a detailed blog about this very soon, but you will be able to use your Mac Preview for exploring data with Immerse, or use our integrated data science tools for Python to dive in deeper using JupyterLab, Ibis, Altair, Holoviews, Prophet and other tools.

Wrapping Up

As always, you can try OmniSci Enterprise, experiment with the OmniSci Mac Preview or get the code for OmniSciDB on GitHub. Read the release notes and documentation for 5.5, and give us your feedback on our community forums.

Also, please stay tuned for what promises to be an exciting 2021 in our mission to make analytics instant, powerful and effortless for everyone.
