HEAVY.AI Big Data Integration

The HeavyDB open source database acts as a hot cache for analytical datasets and is capable of ingesting millions of records a second.

Seamless Big Data Integration Tools

Today’s data managers face a growing ecosystem of data sources and warehouses, which makes big data integration more complex than ever. Your data lives in many data warehouses and data lakes; it flows in continually through streams or rests as point-in-time files. Regardless of the source, HEAVY.AI big data integration tools can ingest millions of records per second into the HeavyDB open source SQL engine.

Streaming Big Data Integration

Modern big data integration and processing tools must integrate with a wide variety of data sources and networks. Streaming data originates from sensors, network logs, social media, and web clickstreams from all over the globe. For large organizations, these sources together can produce billions of records per week. Streaming ingest engines, such as Apache Kafka, organize and distribute this information before finally funneling it into storage.

Although many big data integration platforms offer automated streaming data analytics tools, only HEAVY.AI can ingest this volume of data and make it available for interactive exploration by business analysts. HEAVY.AI provides an easy-to-use utility for Kafka data integration that connects to a Kafka topic, consumes messages in real time, and rapidly loads them into a HEAVY.AI target table.
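To make the consume-then-load pattern concrete, here is a minimal Python sketch of micro-batching: grouping incoming messages into fixed-size batches so that each load call writes many rows at once. It is an illustration only and does not describe how the HEAVY.AI Kafka utility is implemented. The function names micro_batches and parse_csv_message are hypothetical, and a plain list of byte strings stands in for messages that a real Kafka consumer would deliver.

```python
from itertools import islice


def micro_batches(messages, batch_size):
    """Yield lists of up to batch_size items from any iterable of messages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(messages)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the stream is exhausted
        yield batch


def parse_csv_message(raw, delimiter=","):
    """Decode one delimited message (bytes) into a list of column strings."""
    return raw.decode("utf-8").rstrip("\n").split(delimiter)


# Stand-in for messages read from a topic (assumption: one delimited row per message).
stream = [b"1,sensor-a,20.5", b"2,sensor-b,21.0", b"3,sensor-a,19.8"]

# Each batch would be handed to whatever loader writes rows to the target table.
batches = [
    [parse_csv_message(message) for message in batch]
    for batch in micro_batches(stream, batch_size=2)
]
```

Larger batches generally reduce per-call overhead at the cost of higher latency before rows become queryable; the right size depends on the workload.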

Integrating Big Data with Data Warehouses

Most of the world’s data is at rest, stored in data warehouses, enterprise databases, or Hadoop data lakes. The vast majority of this data has never been explored or analyzed, and it represents an incredible amount of untapped insight. HEAVY.AI supports batch import of data at rest through the following methods:

For Delimited Files:

  • Load delimited files such as CSV or TSV into HeavyDB using heavysql.
  • HeavyDB can import compressed files in TAR, ZIP, 7-ZIP, RAR, GZIP, BZIP2, or TGZ formats.
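As a rough illustration of a delimited-file import, the Python sketch below builds a HeavyDB COPY FROM statement as a string, which could then be run through the HeavyDB SQL client. The helper build_copy_statement, the table names, and the file paths are hypothetical; the delimiter and header options follow the COPY FROM syntax as commonly documented, so verify them against the documentation for your HeavyDB version.

```python
def build_copy_statement(table, path, delimiter=",", header=True):
    """Return a COPY FROM statement for loading a delimited file into a table."""
    # Allow only simple identifiers so the table name cannot alter the statement.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unsafe table name: {table!r}")
    # Reject quotes in the path rather than guessing at escaping rules.
    if "'" in path:
        raise ValueError("path must not contain single quotes")
    # A tab delimiter is written as the two-character escape \t in the statement.
    delim = "\\t" if delimiter == "\t" else delimiter
    options = f"delimiter = '{delim}', header = '{str(header).lower()}'"
    return f"COPY {table} FROM '{path}' WITH ({options});"


# Hypothetical table and file names, for illustration only.
csv_statement = build_copy_statement("flights", "/data/flights.csv")
tsv_statement = build_copy_statement("logs", "/data/logs.tsv", delimiter="\t", header=False)
print(csv_statement)
print(tsv_statement)
```

Building the statement separately from running it keeps the sketch independent of any particular client or driver.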

From Data Lakes or Data Warehouses:

  • Pull data from Apache Hadoop Distributed File System (HDFS) or from structured data warehouses with Apache Sqoop.

Get the HEAVY.AI Whitepaper

Learn more about putting an end to indexing, down-sampling, and pre-aggregating data.
