Hive vs. HBase

Summary

Hive and HBase are two different Hadoop-based technologies:

  • Hive is an SQL-like engine that compiles queries into MapReduce jobs.

  • HBase is a NoSQL key/value database on Hadoop.

  • Hive is suited to analytical queries, while HBase is suited to real-time querying.
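
The difference between the two access patterns can be sketched with a toy in-memory table. This is only an illustration and uses neither the Hive nor the HBase API: the sample rows and the names `get` and `spend_by_country` are hypothetical. A point lookup by row key stands in for HBase-style real-time querying, and a full scan with aggregation stands in for a Hive-style analytical query.

```python
from bisect import bisect_left

# Toy stand-in for a key/value table: rows kept sorted by row key.
# The data and all names below are hypothetical, for illustration only.
rows = sorted([
    ("user#001", {"country": "DE", "spend": 120}),
    ("user#002", {"country": "US", "spend": 75}),
    ("user#003", {"country": "DE", "spend": 30}),
])
keys = [k for k, _ in rows]


def get(row_key):
    """Point lookup by row key (the real-time access pattern)."""
    i = bisect_left(keys, row_key)
    if i < len(keys) and keys[i] == row_key:
        return rows[i][1]
    return None


def spend_by_country():
    """Full scan with aggregation (the analytical access pattern)."""
    totals = {}
    for _, cols in rows:
        totals[cols["country"]] = totals.get(cols["country"], 0) + cols["spend"]
    return totals


print(get("user#002"))
print(spend_by_country())
```

The lookup touches a single row, while the aggregate has to read every row, which is why the first pattern suits low-latency serving and the second suits batch analysis.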

NoSQL and Big Data Processing

  1. RDBMS

  2. NoSQL

  3. HBase: an open-source, distributed, column-oriented database built on top of HDFS and modeled on Google's Bigtable

  4. Hive: a data warehousing application for Hadoop

    • Query language is HiveQL (HQL), a variant of SQL

    • Tables are stored on HDFS as flat files

  5. Hive + HBase. Reasons to use Hive on HBase:

    • A lot of data sits in HBase because of its use in real-time environments, but it is never used for analysis

    • Gives people who don’t code (e.g. business analysts) access to HBase data that is usually queried only through MapReduce

    • Provides a more flexible storage solution: rows can be updated live by either a Hive job or an application, and the change is immediately visible to the other
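
The last reason, live updates that are immediately visible to both sides, can be sketched with a toy shared table. This is a simplified model under stated assumptions, not the real Hive or HBase API: `table`, `app_put`, and `analyst_query` are hypothetical names, and the "family:qualifier" column naming only imitates the HBase convention.

```python
# Toy model: one shared table keyed by row key, with "family:qualifier" columns.
# This is NOT the HBase or Hive API; every name here is hypothetical.
table = {}


def app_put(row_key, column, value):
    """Application-side write: update a single cell in place."""
    table.setdefault(row_key, {})[column] = value


def analyst_query(column, predicate):
    """Analyst-side read: scan every row and filter on one column."""
    return sorted(
        key
        for key, cols in table.items()
        if column in cols and predicate(cols[column])
    )


def is_shipped(status):
    return status == "shipped"


app_put("order#1", "d:status", "open")
app_put("order#2", "d:status", "shipped")
before = analyst_query("d:status", is_shipped)

# Live update from the application; no export or reload step in between.
app_put("order#1", "d:status", "shipped")
after = analyst_query("d:status", is_shipped)

print(before)
print(after)
```

Because both sides read and write the same store, the second query sees the updated row without any copy step, which is the property the list above attributes to running Hive on top of HBase.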

References