This can be achieved with out ORC file format and transaction=false, can be achieved only when the table is a partitioned table. This is a 2 step process:
1. Create data set with Updated entries using Union of non-updated records and New record in the partition.
select tbl2.street,tbl2.city,tbl2.zip,tbl2.state,tbl2.beds,tbl2.baths,tbl2.sq__ft,tbl2.sale_date,tbl2.price,tbl2.latitude,tbl2.longitude,tbl2.type from (select * from samp_tbl_part where type = "Multi-Family") tbl1 JOIN (select * from samp_tbl where type = "Multi-Family") tbl2 ON tbl1.zip=tbl2.zip ///New Record
UNION ALL
select tbl1.* from (select * from samp_tbl_part where type = "Multi-Family") tbl1 LEFT JOIN (select * from samp_tbl where type = "Multi-Family") tbl2 on tbl1.zip=tbl2.zip where tbl2.zip is NULL; ////Non-updated records
2. Insert overwrite the partition.
Eg:
CREATE EXTERNAL TABLE `samp_tbl_part`(
`street` string,
`city` string,
`zip` string,
`state` string,
`beds` string,
`baths` string,
`sq__ft` string,
`sale_date` string,
`price` string,
`latitude` string,
`longitude` string)
PARTITIONED BY (
`type` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://quickstart.cloudera:8020/user/hive/sampledata/realestate_part';
220 OLD AIRPORT RD AUBURN 95603 CA 2 2 960 Mon May 19 00:00:00 EDT 2008 285000 38.939802 -121.054575 Multi-Family
398 LINDLEY DR SACRAMENTO 95815 CA 4 2 1744 Mon May 19 00:00:00 EDT 2008 416767 38.622359 -121.457582 Multi-Family
8198 STEVENSON AVE SACRAMENTO 95828 CA 6 4 2475 Fri May 16 00:00:00 EDT 2008 159900 38.465271 -121.40426 Multi-Family
1139 CLINTON RD SACRAMENTO 95825 CA 4 2 1776 Fri May 16 00:00:00 EDT 2008 221250 38.585291 -121.406824 Multi-Family
7351 GIGI PL SACRAMENTO 95828 CA 4 2 1859 Thu May 15 00:00:00 EDT 2008 170000 38.490606 -121.410173 Multi-Family
CREATE EXTERNAL TABLE `samp_tbl`(
`street` string,
`city` string,
`zip` string,
`state` string,
`beds` string,
`baths` string,
`sq__ft` string,
`type` string,
`sale_date` string,
`price` string,
`latitude` string,
`longitude` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'line.delim'='\n',
'serialization.format'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://quickstart.cloudera:8020/user/hive/sampledata/realestate'
1139 CLINTON RD SACRAMENTO 95825 FL 4 2 1776 Multi-Family Fri May 16 00:00:00 EDT 2008 221250 38.585291 -121.406824
Complete Insert statement:
INSERT OVERWRITE TABLE samp_tbl_part partition (type) select tbl2.street,tbl2.city,tbl2.zip,tbl2.state,tbl2.beds,tbl2.baths,tbl2.sq__ft,tbl2.sale_date,tbl2.price,tbl2.latitude,tbl2.longitude,tbl2.type from (select * from samp_tbl_part where type = "Multi-Family") tbl1 JOIN (select * from samp_tbl where type = "Multi-Family") tbl2 ON tbl1.zip=tbl2.zip
UNION ALL
select tbl1.* from (select * from samp_tbl_part where type = "Multi-Family") tbl1 LEFT JOIN (select * from samp_tbl where type = "Multi-Family") tbl2 on tbl1.zip=tbl2.zip where tbl2.zip is NULL;