Friday, April 7, 2017

ACID Implementation in HIVE

If we have to work with hive tables in the transactional mode we have to use two characteristics below:

– bucketing
– table property transactional=true
– Ambari – Hive – Configs – ACID Transactions = ON



We can test with these commands:


--Sets for update the engine and vectorized processing
set hive.execution.engine=tez;
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;

--------------------------
--Target table to create
--------------------------
drop table tbl1;
create table tbl1
(
f1 int,
f2 string
)
clustered by (f2) into 1 buckets
stored as orc tblproperties ("transactional"="true");


----------------------------------------------------
--Simple load using the transactional way
----------------------------------------------------
insert into table tbl1 values (1, 'line1');
insert into table tbl1 values (2, 'line2');
insert into table tbl1 values (3, 'line3');

--------------------------
--First Result
--------------------------
Select * from tbl1;

1 line1
2 line2
3 line3
Time taken: 0.798 seconds, Fetched: 3 row(s)


--------------------------
--Simple update
--------------------------
update tbl1 set
f1 = 200
where f1 = 2;


--------------------------
--Second Result
--------------------------
select * from tbl1 ;

1 line1
200 line2
3 line3


--------------------------
--Simple delete
--------------------------
delete from tbl1 where f1 = 3;

--------------------------
--Third Result
--------------------------
select * from tbl1 ;
1 line1
200 line2

--------------------------------------------------------------
ACID functionality in HIVE
--------------------------------------------------------------
Add the below properties in "hive-site.xml" file & restart hive server

<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.enforce.bucketing</name>
<value>true</value>
</property>
<property>
<name>hive.compactor.initiator.on</name>
<value>true</value>
</property>

<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
<name>hive.compactor.worker.threads</name>
<value>2</value>
</property>




CREATE HIVE TABLE WITH CLUSTERED BY, ORC, TBLPROPERTIES
--------------------------------------------------------------
CREATE TABLE IF NOT EXISTS student_acid
( name string, id int, course string, year int )
CLUSTERED BY (name) INTO 4 BUCKETS
STORED AS ORC
LOCATION '/hive/kalyan/student_acid'
TBLPROPERTIES ('transactional' = 'true')
;


INSERT INTO TABLE student_acid VALUES
('arun', 1, 'mca', 1),
('anil', 2, 'mca', 1),
('sudheer', 3, 'mca', 2),
('santosh', 4, 'mca', 2)
;

UPDATE student_acid
SET year = 3, course = 'mech'
WHERE id = 4 ;

DELETE FROM student_acid WHERE name = 'anil';

Note: In cloudera this is restricted and gives the below error.

FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.

Documentation in Cloudera states, Hive ACID is not supported

Hive ACID is an experimental feature and Cloudera does not currently support it.

1 comment:

  1. I’m going to read this. I’ll be sure to come back. thanks for sharing. and also This article gives the light in which we can observe the reality. this is very nice one and gives indepth information. thanks for this nice article... แก้กรดไหลย้อน

    ReplyDelete