This can be achieved with out ORC file format and transaction=false, can be achieved only when the table is a partitioned table. This is a 2 step process:
1. Create data set with Updated entries using Union of non-updated records and New record in the partition.
select tbl2.street,tbl2.city,tbl2.zip,tbl2.state,tbl2.beds,tbl2.baths,tbl2.sq__ft,tbl2.sale_date,tbl2.price,tbl2.latitude,tbl2.longitude,tbl2.type from (select * from samp_tbl_part where type = "Multi-Family") tbl1 JOIN (select * from samp_tbl where type = "Multi-Family") tbl2 ON tbl1.zip=tbl2.zip ///New Record
UNION ALL
select tbl1.* from (select * from samp_tbl_part where type = "Multi-Family") tbl1 LEFT JOIN (select * from samp_tbl where type = "Multi-Family") tbl2 on tbl1.zip=tbl2.zip where tbl2.zip is NULL; ////Non-updated records
2. Insert overwrite the partition.
Eg:
CREATE EXTERNAL TABLE `samp_tbl_part`(
`street` string,
`city` string,
`zip` string,
`state` string,
`beds` string,
`baths` string,
`sq__ft` string,
`sale_date` string,
`price` string,
`latitude` string,
`longitude` string)
PARTITIONED BY (
`type` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://quickstart.cloudera:8020/user/hive/sampledata/realestate_part';
220 OLD AIRPORT RD AUBURN 95603 CA 2 2 960 Mon May 19 00:00:00 EDT 2008 285000 38.939802 -121.054575 Multi-Family
398 LINDLEY DR SACRAMENTO 95815 CA 4 2 1744 Mon May 19 00:00:00 EDT 2008 416767 38.622359 -121.457582 Multi-Family
8198 STEVENSON AVE SACRAMENTO 95828 CA 6 4 2475 Fri May 16 00:00:00 EDT 2008 159900 38.465271 -121.40426 Multi-Family
1139 CLINTON RD SACRAMENTO 95825 CA 4 2 1776 Fri May 16 00:00:00 EDT 2008 221250 38.585291 -121.406824 Multi-Family
7351 GIGI PL SACRAMENTO 95828 CA 4 2 1859 Thu May 15 00:00:00 EDT 2008 170000 38.490606 -121.410173 Multi-Family
CREATE EXTERNAL TABLE `samp_tbl`(
`street` string,
`city` string,
`zip` string,
`state` string,
`beds` string,
`baths` string,
`sq__ft` string,
`type` string,
`sale_date` string,
`price` string,
`latitude` string,
`longitude` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'line.delim'='\n',
'serialization.format'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://quickstart.cloudera:8020/user/hive/sampledata/realestate'
1139 CLINTON RD SACRAMENTO 95825 FL 4 2 1776 Multi-Family Fri May 16 00:00:00 EDT 2008 221250 38.585291 -121.406824
Complete Insert statement:
INSERT OVERWRITE TABLE samp_tbl_part partition (type) select tbl2.street,tbl2.city,tbl2.zip,tbl2.state,tbl2.beds,tbl2.baths,tbl2.sq__ft,tbl2.sale_date,tbl2.price,tbl2.latitude,tbl2.longitude,tbl2.type from (select * from samp_tbl_part where type = "Multi-Family") tbl1 JOIN (select * from samp_tbl where type = "Multi-Family") tbl2 ON tbl1.zip=tbl2.zip
UNION ALL
select tbl1.* from (select * from samp_tbl_part where type = "Multi-Family") tbl1 LEFT JOIN (select * from samp_tbl where type = "Multi-Family") tbl2 on tbl1.zip=tbl2.zip where tbl2.zip is NULL;
really nice information
ReplyDeleteSpark and Scala Online Training
GOOD EXPLAIN
ReplyDeleteBig Data and Hadoop Online Training
APTRON Gurgaon teach you everything about Hadoop right from the earliest starting point to the very advanced level so you can be efficient with your work in challenging situations.
ReplyDeleteFor More Info: Hadoop Course in Gurgaon
Thanks for your information ,We are also providing Hadoop online training in our institute NareshIT. We are having the best and well trained experienced faculty to train you the Hadoop. By joining in this course you will get complete knowledge about Hadoop. we are giving 100% guidance in placement for our students.
ReplyDeleteWebsite : https://nareshit.com/hadoop-online-training/
Thank you for sharing this amazing blog.Keep updating more posts.
ReplyDeletehadoop admin online training
big data online course