Venkat Krishnamurthy
Apr 22, 2021

Announcing OmniSci Version 5.6

Try HeavyIQ Conversational Analytics on 400 million tweets

Download HEAVY.AI Free, a full-featured version available for use at no cost.

GET FREE LICENSE

NOTE: OMNISCI IS NOW HEAVY.AI

We’re very happy to announce the release of OmniSci 5.6. 

Following our earlier announcements of OmniSci Free, and updates to the OmniSci Desktop Edition for Mac, OmniSci 5.6 is one of the most significant releases in our product’s history. In particular, we added a host of powerful new features in Immerse, as well as made incredible strides in performance on our geospatial operations within the OmniSciDB engine among a long list of notable features, bug fixes and performance improvements. You can download OmniSci Free today and try the new release for yourself here.

Let’s dive in and take a deeper look!

OmniSci Immerse

Parameterized data visualization

From inception, Immerse has leveraged the power of high-performance SQL in OmniSciDB under the hood to allow interactive data exploration at scale. Crossfilter, interactive zooming and panning and cohort analytics on spatiotemporal data are all trademark features of Immerse that allow users to go beyond just static charting and truly explore their data at the speed of their curiosity, while Immerse handles the generation of SQL under the hood. We’ve heard customers - whether they are business analysts or data scientists, point out how powerful this ‘no-code’ experience allows them to go from loading large, billion-row datasets into OmniSci to useful insights on an Immerse dashboard, literally in a matter of seconds. 

Yet, there are times when you want to go deeper than this. Often, you may need to further customize your charts or dashboard, or create more structured explorations such as comparative or ‘what-if’ type scenario analysis. Immerse allows this too - when setting up any new chart, you can already reach under the hood and create custom SQL expressions for measures and dimensions. While this capability is already powerful and widely used, we’ve always known that we could go further to allow users more dynamic ways to provide and control data inputs.

One way to approach this, is to think of any data visualization you’re building in terms of a functions (in programming terms), that can take different inputs and produce corresponding different results, which in this case are basically the charts themselves. In OmniSci, this analogy is even more apt because charts essentially wrap a SQL template in the background. That is, the underlying query template created by a chart is itself a function which can take a variety of user-specified inputs or parameters. 

Parameters are somewhat similar to filters today, in that a user can switch filter values at runtime. The key difference from filters though, is that because parameters are lower-level, they can actually change the underlying SQL query template itself in significant and useful ways.

In OmniSci 5.6, we are thrilled to finally land this incredibly powerful capability within Immerse. With 5.6, you can parametrize custom measures, dimensions and soon, other key parts of the chart and dashboard (including chart titles). This is enabled by new UI elements that allow you to define parameters, their corresponding expressions and subsequently to drive them from the newly added parameter side panel. 

Here’s a quick example below with data from GitHub. Notice how you can parametrize the value used to construct a bubble chart, and subsequently pick it from a dropdown list. More generally, parameters are typed - so that you automatically get sliders for numeric parameters, drop down selectors as appropriate for column-based parameters and so on. 


A common use case for parameters is beyond their use in a single dashboard - in particular, to keep the value of a chosen parameter consistent across different dashboards. Immerse 5.6 allows this too - you can link or unlink parameters so that their value stays consistent across different charts.

Improved custom SQL dialog

As part of the work to support parameterization, Immerse in 5.6 gets a new custom SQL dialog for measures and dimensions, replacing the previous inline editor. This allows you to pick column names, supports sql validation and the use of parameters within the dialog.



Angle orientation in pointmap charts

Immerse is the most powerful tool yet when it comes to analyze large spatiotemporal data. A common scenario is datasets that contain GPS trace data for moving entities - commonly vehicles, which often includes additional information like heading and speed. With 5.6, it is now possible to use heading information in rendered point maps to show direction. You can choose a wedge (automatically), or arrow symbol to visualize direction, making for richer visualizations. 


Approximate Median as aggregation type

Many of our customers have long requested support for Median as an aggregation type in addition to the ones available in Immerse today. Much more on this later in the OmniSciDB section, but with 5.6, Immerse supports a Median aggregation type. Please use this with care on charts, especially on large datasets. The median operation is computationally intensive (even in its approximate form), and we’re working on optimizing this capability in OmniSciDB in follow-on releases.



Other improvements

Besides the above highlighted features, there are several smaller but notable improvements in Immerse 5.6. 

Improved error handling - Earlier, a ‘broken’ chart on a dashboard as a result of a query issue would cause dashboard-level errors. As of 5.6, these errors are localized to the chart container. Here’s a (contrived) example where we change the name of a column underlying a chart - observe how the error message is now shown in place in the chart container.



Configurable start of week - Another long standing request from customers relates to configuring the day to represent the start of the week. You can now set this up in servers.json via the ui/start_of_week flag. 

Combo zoom/pan performance We’ve also made notable performance improvements in the new combo chart, in particular on zoom/pan operations.

Finally, Immerse 5.6 has a long list of bug fixes, for which you can read the details in the release notes.

OmniSciDB

Release 5.6 sees a raft of notable foundational improvements to storage, and query performance in the OmniSciDB engine. Let’s take each in turn:

Query Engine

Approximate median operator

As mentioned in the Immerse section, OmniSciDB now supports an approximate MEDIAN operator in aggregation queries, based on the t-digest based quantile estimation algorithm

Note that this implementation is currently focused only on the median operation, and restricted to single-node, and will execute on CPU, but we are working to remove these limitations soon in upcoming releases, and subsequently add support for general quantiles besides median as well. Also, the median operation involves expensive computation so please use this operator carefully especially within Immerse, when running against very large datasets.

Significant performance improvements in core geospatial joins

With 5.6, the performance of key geospatial joins (ST_CONTAINS, ST_INTERSECTS) has been significantly improved via optimized algorithms and a hash framework for range joins, which is now on default for point-in-polygon joins. 

Depending on the dataset cardinalities and the complexity of the polygons  involved, this can range anywhere from 10-100x speed up in certain cases. We are now working to further extend these improvements to other parts of the join framework in the 6.0 timeframe, including for 1-D range joins such as on time series data.

We plan to share the details of this in a dedicated engineering blog post, so stay tuned!

Further improvements in query interrupt functionality

Building on the initial work in OmniSci 5.5 to support interrupt of long-running queries, 5.6 adds support for non-kernel interrupt, which refers to lifecycle stages of query processing around the core query kernel. This allows interrupting load operations such as LOAD, CTAS, ITAS and also during result set reduction of queries. The corresponding enable-non-kernel-time-query-interrupt, is on by default.

Partial column specification for INSERT and INSERT FROM SELECT 

Beginning with 5.6, the query engine supports partial column specification for INSERT and INSERT FROM SELECT. NULL values will be added to columns not specified in the statements.

Additional query engine features/improvements

OmniSciDB 5.6 adds support for CURRENT_TIMESTAMP, CURRENT_DATE and CURRENT_TIME functions.
 

Storage and IO performance improvements

Performance and efficiency improvements around metadata management 

Continuing work done in OmniSci 5.5, there are several key improvements in the storage subsystem that contribute to both performance and stability overall. 

First, we now default max_rollback_epochs to a default value of 3 on new CREATE TABLE commands. Originally landed in 5.5, this caps the number of metadata epochs maintained by the system, and makes for faster startup time especially on tables that are actively updated/inserted to. Further, you can set and maintain this for existing tables as well with ALTER TABLE <table> SET MAX_ROLLBACK_EPOCHS=<value> command. 

Next, OmniSciDB now allows automatic storage metadata updates and vacuuming. On update and delete queries, OmniSciDB will update metadata, vacuum, and re-use space as specified. This is configurable to be automatic via the enable-auto-metadata-update server flag. Note that any subsequent disabling of this flag can cause stale metadata accumulation and reduced query performance. The default threshold is when at least 10% of a fragment contains deleted rows and can be overridden with the vacuum-min-selectivity server option. Finally, you can initiate a manual compaction of unreclaimed space by using the OPTIMIZE command with a vacuum option (Please do not run optimize excessively during peak query load, given its potentially notable impact on performance)

Overall beginning with 5.6, the system will set and use appropriate defaults for the above flags. Also, system startup time should improve significantly for installations as a result of the above changes, by avoiding extra metadata reads.

For existing installations we recommend the following sequence of steps

  1.  Use ALTER TABLE <table> SET MAX_ROLLBACK_EPOCHS=3; on existing tables to cap the space usage. 
  2. Run OPTIMIZE TABLE [<table>] WITH (VACUUM='true'); to reclaim space. 

In order to manually configure the above flags appropriately, it is better to first profile the kinds of queries that are frequently executed on the system over an extended period, particularly updates and deletes. Once you have this information, you can combine these flags to appropriately tune for the best combination of values for the above. 

Storage and IO performance improvements

Here are some smaller yet notable enhancements and performance improvements. 

  • Loading high cardinality dictionary-encoded string columns should be significantly faster because of new optimizations in how the loader resizes hash tables
  • Support for narrowing casts (e.g. from DOUBLE to FLOAT types) in SQLImporter
  • Improved load performance through reduced contention on systems with large core counts, along with a maximum cap on 32 threads dedicated to the load process.

Wrapping up

A big thank you to our always incredible engineering team for pulling off a release of this scope and scale to kick off 2021 across both Immerse and OmniSciDB.

You can download OmniSci 5.6 for free from our downloads page. We’ll shortly update the Mac preview version as well. We recommend you backup your data directories for prior versions as a matter of good practice before any upgrade. Enterprise customers can reach out to OmniSci Support for help with any questions. 

Please visit our community forums to let us know your valuable feedback and for answers to any questions!

Venkat Krishnamurthy

Venkat heads up Product Management at HEAVY.AI. He joined OmniSci from the CTO office at Cray, the supercomputing pioneer, where he was responsible for leading Cray’s push into Analytics and AI. Earlier, he was Senior Director at YarcData, a pioneering graph analytics startup, where he bootstrapped product and data science/engineering teams. Prior to YarcData, he was a Director of Product Management at Oracle, where he led the launch of the Oracle Financial Services Data Platform, and earlier spent several years at Goldman Sachs, where he led one of the earliest successful projects utilizing machine learning in Operational Risk incident classification. Venkat is a graduate of Carnegie Mellon University and the Indian Institute of Technology Chennai, and also a certified Financial Risk Manager.