Tuesday, July 28, 2015

HBase - Querying through Hive Tables


In my previous blog post, we discussed how to load and process data using Apache Pig and finally load it into an HBase table. In this article we are going to explore how to query that HBase table using Hive. This is an easy way to give the end users in your organization access to the data.

Many of you may ask why we store the data in HBase and then query it through Hive, rather than storing the data directly in Hive tables. The reason is simple: when the data grows huge, ad hoc queries against Hive tables become very slow, and sometimes it may make no sense to query Hive tables at all. HBase, by contrast, promises fast random access to rows by their row key.


Now let's get to the technical part.

1. Let us create a Hive table using the HBaseStorageHandler, as shown below:

hive> CREATE EXTERNAL TABLE customers(
    >  customer_id string,
    >  last_name string,
    >  first_name string,
    >  age int,
    >  skill_type string,
    >  skill_set string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key, cust_info:last_name, cust_info:first_name, cust_info:age, cust_prof:skill_type, cust_prof:skill_set')
    > TBLPROPERTIES ('hbase.table.name' = 'cust_master');
OK
Time taken: 14.419 seconds


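The key here is to map each Hive table column to the corresponding column in the HBase table, with the right column family and qualifier, as shown above in the WITH SERDEPROPERTIES section. The entries in hbase.columns.mapping are matched to the Hive columns by position, and the special :key entry maps the first Hive column to the HBase row key.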
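Since the mapping is positional, it is easy to get it out of step with the column list. Here is a small Python sketch that pairs the two lists and checks that they line up. It is only an illustration and not part of Hive or HBase: the helper name pair_columns is made up, the column names and mapping string are retyped from the table definition above, and stripping whitespace around each entry is an assumption of this sketch.

```python
# Illustrative check only; pair_columns is a hypothetical helper,
# not something provided by Hive or HBase.

# Hive columns, in the order they appear in the CREATE TABLE statement.
hive_columns = [
    "customer_id", "last_name", "first_name",
    "age", "skill_type", "skill_set",
]

# The value given to 'hbase.columns.mapping'.
columns_mapping = (
    ":key, cust_info:last_name, cust_info:first_name, "
    "cust_info:age, cust_prof:skill_type, cust_prof:skill_set"
)


def pair_columns(hive_cols, mapping):
    """Pair each Hive column with its HBase mapping entry by position."""
    # Assumption of this sketch: whitespace around entries is ignored.
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_cols):
        raise ValueError(
            f"{len(hive_cols)} Hive columns but {len(entries)} mapping entries"
        )
    if entries.count(":key") != 1:
        raise ValueError("expected exactly one ':key' entry for the row key")
    return list(zip(hive_cols, entries))


if __name__ == "__main__":
    for hive_col, hbase_col in pair_columns(hive_columns, columns_mapping):
        print(f"{hive_col:<12} -> {hbase_col}")
```

Running it lists each Hive column next to the HBase row key or family:qualifier it reads from, which is a quick sanity check before you run the CREATE statement.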


2. Now you are ready to query the data:

hive> select first_name,last_name,skill_type from customers where age >40;
OK
craig woods Tech Skills
lee persons Soft-Managerial Skills
Time taken: 0.463 seconds, Fetched: 2 row(s)



Hope this helps. See you next time with another interesting snippet.




