Hive vs. HBase#
Summary#
Hive and HBase are two different Hadoop based technologies
Hive is an SQL-like engine that runs MapReduce jobs
HBase is a NoSQL key/value database on Hadoop.
Hive can be used for analytical queries while HBase for real-time querying.
NoSQL and Big Data Processing#
RDBMS
NoSQL
HBase is an open-source, distributed, column-oriented database built on top of HDFS based on BigTable!
Hive: data warehousing application in Hadoop
Query language is HQL, variant of SQL
Tables stored on HDFS as flat files
Hive + HBase. Reasons to use Hive on HBase:
A lot of data sitting in HBase due to its usage in a real-time environment, but never used for analysis
Give access to data in HBase usually only queried through MapReduce to people that don’t code (business analysts)
When needing a more flexible storage solution, so that rows can be updated live by either a Hive job or an application and can be seen immediately to the other