HBase
Architecture: Master-Slave Architecture
3 Major components:
-> Region Servers - Responsible for serving data to clients. Equivalent to DataNodes in HDFS.
-> Zookeeper - Maintains cluster state. A Zookeeper ensemble is usually configured to maintain the state.
-> HMaster - The master process in the cluster. Its responsibilities:
Assign regions
Load balancing
Fault tolerance
Health monitoring
Region Servers, which run on the DataNode machines, send heartbeats to the Zookeeper nodes. The HMaster listens to the Region Servers' heartbeats via Zookeeper. If no heartbeat is received within the configured Zookeeper session timeout, the HMaster treats that Region Server as down.
Only 1 HMaster is active at any time; if the active HMaster goes down, a standby HMaster becomes active.
- HBase tables are horizontally divided into regions. Default region size = 1 GB (configurable via hbase.hregion.max.filesize).
- A single Region Server can host multiple regions, of the same table or of different tables.
- Recommended maximum number of regions per Region Server = ~1000.
- Regions of the same table can live on the same Region Server or on different Region Servers.
- Initially a table's regions are allocated on the same Region Server; later, for better load balancing, newly allocated regions are moved to other Region Servers.
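Region boundaries can also be requested up front when creating a table, so the regions start out spread across Region Servers. In the hbase shell it looks like this (the split keys and the families from the emp example are illustrative):

```shell
hbase(main):001:0> create 'emp', 'personal data', 'professional data', {SPLITS => ['g', 'n', 't']}
```

This pre-splits the table into four regions instead of starting with a single 1 GB region that splits later.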
Writing Process to HBase:
Key components:
- WAL (Write-Ahead Log) - A client write is first appended to the WAL. The WAL is not where the data is ultimately stored; it exists for fault tolerance. If an error occurs before the data is persisted, HBase can always replay the WAL to recover.
- MemStore (sometimes called memcache) - After the WAL append, the record is written to the MemStore, a sorted in-memory buffer of all new and edited records. Once the MemStore limit is reached, its contents are flushed into an HFile and the MemStore becomes empty. Each flush creates another HFile, so multiple HFiles accumulate over time. A region has one MemStore per column family, so a single region can have multiple MemStores.
- HFile - The files where the actual data is stored, kept in HDFS.
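The WAL → MemStore → HFile flow above can be sketched as a toy simulation. Class names, the flush threshold, and the data here are illustrative, not real HBase classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the HBase write path: every put is appended to the WAL,
// then inserted into the sorted in-memory MemStore; when the MemStore
// reaches its flush threshold it is written out as an immutable "HFile".
public class WritePathSketch {
    final List<String> wal = new ArrayList<>();                  // write-ahead log
    final TreeMap<String, String> memstore = new TreeMap<>();    // sorted by row key
    final List<TreeMap<String, String>> hfiles = new ArrayList<>();
    final int flushThreshold;

    WritePathSketch(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void put(String rowKey, String value) {
        wal.add(rowKey + "=" + value);      // 1. durable log entry first
        memstore.put(rowKey, value);        // 2. then the sorted memstore
        if (memstore.size() >= flushThreshold) {
            hfiles.add(new TreeMap<>(memstore));  // 3. flush an immutable HFile
            memstore.clear();
        }
    }

    public static void main(String[] args) {
        WritePathSketch region = new WritePathSketch(2);
        region.put("row1", "raju");
        region.put("row2", "rani");   // threshold hit -> flush to an HFile
        region.put("row3", "ravi");
        System.out.println(region.hfiles.size() + " hfile(s), "
                + region.memstore.size() + " row(s) in memstore, "
                + region.wal.size() + " wal entries");
    }
}
```

Note that every put lands in the WAL regardless of flushes, which is why a crashed server can rebuild its MemStore by replaying the log.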
Hadoop is bad for small files, so minor compaction comes into the picture.
Minor Compaction: merges several smaller HFiles into one larger HFile. (A major compaction goes further: it merges all HFiles of a store into one and discards deleted cells.)
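The merge step that compaction performs can be sketched minimally, assuming each small HFile is modeled as a sorted map (toy code, not the real HBase implementation):

```java
import java.util.List;
import java.util.TreeMap;

// Toy sketch of compaction: several small sorted "HFiles" are merged into
// one larger sorted file. For cells with the same row key, the newer file
// (later in the list) wins, mirroring how the latest version is kept.
public class CompactionSketch {
    static TreeMap<String, String> compact(List<TreeMap<String, String>> hfiles) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> hfile : hfiles) {
            merged.putAll(hfile);  // later (newer) files overwrite older values
        }
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, String> older = new TreeMap<>();
        older.put("row1", "raju");
        older.put("row2", "rani");
        TreeMap<String, String> newer = new TreeMap<>();
        newer.put("row2", "rani-updated");
        newer.put("row3", "ravi");
        // Two small files become one sorted file with the newest value per row
        System.out.println(compact(List.of(older, newer)));
    }
}
```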
https://www.edureka.co/blog/hbase-architecture/
Writing data to HBase via
1. Inserting data to HBase table via hbase shell
Put command: put '<table name>','<row key>','<column family:column name>','<value>'
Eg:
hbase(main):005:0> put 'emp','1','personal data:name','raju'
0 row(s) in 0.6600 seconds
hbase(main):006:0> put 'emp','1','personal data:city','hyderabad'
0 row(s) in 0.0410 seconds
hbase(main):007:0> put 'emp','1','professional data:designation','manager'
0 row(s) in 0.0240 seconds
hbase(main):007:0> put 'emp','1','professional data:salary','50000'
0 row(s) in 0.0240 seconds
Read data:
get command: get '<table name>','<row key>'
eg:
hbase(main):012:0> get 'emp', '1'
Read specific column: get '<table name>', '<row key>', {COLUMN => '<column family:column name>'}
eg:
hbase(main):015:0> get 'emp', '1', {COLUMN => 'personal data:name'}
Read complete table data: scan 'emp'
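scan also accepts optional bounds, which matters for large tables; for example (the option names are standard hbase shell scan options, the values are illustrative):

```shell
hbase(main):016:0> scan 'emp', {STARTROW => '1', STOPROW => '5', LIMIT => 10}
```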
Update Data: update an existing cell value using the put command
put '<table name>','<row key>','<column family:column name>','<new value>'
Eg:
hbase(main):002:0> put 'emp','1','personal data:city','Delhi'
Delete data using the delete command: delete '<table name>','<row key>','<column family:column name>'
Drop HBase table: disable it and then drop
hbase(main):018:0> disable 'emp'
0 row(s) in 1.4580 seconds
hbase(main):019:0> drop 'emp'
2. Spark and Java APIs.
The API class used to insert data into HBase is Put.
Sample Put code:
// Required imports (classic HBase client API):
// import org.apache.hadoop.conf.Configuration;
// import org.apache.hadoop.hbase.HBaseConfiguration;
// import org.apache.hadoop.hbase.client.HTable;
// import org.apache.hadoop.hbase.client.Put;
// import org.apache.hadoop.hbase.util.Bytes;

// Instantiating the Configuration class
Configuration config = HBaseConfiguration.create();

// Instantiating the HTable class for the 'emp' table
HTable hTable = new HTable(config, "emp");

// Instantiating the Put class; the constructor accepts a row key
Put p = new Put(Bytes.toBytes("row2"));

// Adding values using addColumn(); it accepts
// column family, column qualifier (column name), and value
p.addColumn(Bytes.toBytes("personal"),
    Bytes.toBytes("name"), Bytes.toBytes("raju2"));
p.addColumn(Bytes.toBytes("personal"),
    Bytes.toBytes("city"), Bytes.toBytes("hyderabad2"));
p.addColumn(Bytes.toBytes("professional"), Bytes.toBytes("designation"),
    Bytes.toBytes("manager2"));
p.addColumn(Bytes.toBytes("professional"), Bytes.toBytes("salary"),
    Bytes.toBytes("60000"));

// Saving the Put instance to the HTable
hTable.put(p);
hTable.close();
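Reading a row back via the same Java API uses the Get class. A minimal sketch reusing the hTable handle from above (this needs a running cluster, so treat it as illustrative):

```java
// Instantiating the Get class; the constructor accepts a row key
Get g = new Get(Bytes.toBytes("row2"));

// Fetching the row and reading one cell from the result
Result result = hTable.get(g);
byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
System.out.println(Bytes.toString(value));
```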
A bulkPut() is also available (via the hbase-spark JavaHBaseContext), which does bulk insertion of data:
// Each string encodes rowKey,columnFamily,qualifier,value
List<String> list = new ArrayList<String>();
list.add("1," + columnFamily + ",a,1");
list.add("2," + columnFamily + ",a,2");
list.add("3," + columnFamily + ",a,3");
list.add("4," + columnFamily + ",a,4");
list.add("5," + columnFamily + ",a,5");

// Distribute the records as a Spark RDD
JavaRDD<String> rdd = jsc.parallelize(list);
Configuration conf = HBaseConfiguration.create();
JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf);

// bulkPut applies PutFunction to each record and writes the resulting Puts
hbaseContext.bulkPut(rdd,
    TableName.valueOf(tableName),
    new PutFunction());

// Converts one comma-separated record into a Put
public static class PutFunction implements Function<String, Put> {
  private static final long serialVersionUID = 1L;

  public Put call(String v) throws Exception {
    String[] cells = v.split(",");
    Put put = new Put(Bytes.toBytes(cells[0]));
    put.addColumn(Bytes.toBytes(cells[1]), Bytes.toBytes(cells[2]),
        Bytes.toBytes(cells[3]));
    return put;
  }
}
3. Bulk upload
using the ImportTsv tool from the command line.
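A typical invocation looks like this (the column mapping, family name f, and input path are illustrative; ImportTsv is the standard HBase MapReduce tool for loading TSV files):

```shell
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,f:name,f:city \
  emp /path/to/emp.tsv
```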