Primary key creation in Hive.
Hive does not enforce primary key constraints, so a unique (surrogate) key column has to be generated. There are 2 common ways to create one in Hive:
OPTION 1:
Using the reflect UDF
Eg: select reflect("java.util.UUID", "randomUUID"), name, deptnum, branch, year from studentdata;
Disadvantage: mappers execute in parallel, so there is a chance of repeated UUIDs.
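The reflect call above invokes java.util.UUID.randomUUID() once per row. As a rough sketch outside Hive (plain Python standard library, not the Hive runtime), each call draws an independent random version-4 UUID:

```python
import uuid

# Each call produces a fresh random (version 4) UUID, analogous to
# java.util.UUID.randomUUID() being invoked by reflect() in each mapper.
keys = [str(uuid.uuid4()) for _ in range(5)]
print(keys)
```

Each generated key is a 36-character string; independent random draws are why the values are not coordinated across parallel mappers.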
OPTION 2:
Creating a column by concatenating 2 columns to form a meaningful key
Eg: select name, deptnum, hash(concat(name, year)) from studentdata;
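For ASCII strings, Hive's hash() of a string follows the same h = 31*h + char recurrence as Java's String.hashCode. A minimal Python sketch of the key derivation (the function names and sample inputs here are illustrative, not from the original post):

```python
def java_string_hashcode(s: str) -> int:
    """Java String.hashCode: h = 31*h + char, with 32-bit signed overflow."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def row_key(name: str, year: int) -> int:
    # Mirrors hash(concat(name, year)) from the Hive query above
    return java_string_hashcode(f"{name}{year}")
```

Note that two rows with the same name and year produce the same key, so the concatenated columns must be unique in combination for this to act as a primary key.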
Steps:
1. Create a temporary external table with all the required fields; this table does not have a primary key.
Eg: create External table studentdata(name string, dept INT, course string, year int) ...
2. Now create the main table:
create table studentdata_PK(rowid_pk string, name string, dept INT, course string, year int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '\n'
> stored as AVRO
> location "/user/Leela/Hive/Student_AVRO";
3. Insert data into the main table (OPTION 2, using the concatenated key, is implemented here). The columns are listed explicitly rather than using *, so the select output lines up with the 5 columns of studentdata_PK:
insert into studentdata_PK select concat(name, year), name, dept, course, year from studentdata;
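The three steps amount to prepending a derived key to each staged row. A minimal Python simulation of the insert-select (the sample rows are invented for illustration):

```python
# Staged rows from the external table: (name, dept, course, year)
staged = [
    ("Leela", 101, "CS", 2015),
    ("Ravi", 102, "EC", 2016),
]

# Mimics: insert into studentdata_PK
#         select concat(name, year), name, dept, course, year from studentdata;
keyed = [(f"{name}{year}", name, dept, course, year)
         for (name, dept, course, year) in staged]
print(keyed)
```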
Primary Key/Unique id creation in Spark
Get the existing max value from the sink table, then number the new rows starting after it:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, max, row_number}
import org.apache.spark.sql.types.LongType

// Fetch the current maximum key from the sink table
val p = sinkdf.agg(max(colName)).rdd.map(x => x.mkString).collect
val maxval: String = p(0)

// row_number() over the window gives 1, 2, 3, ...; adding the sink's max
// continues the sequence without collisions
val windowSpec = Window.orderBy("ID")
srcdf.withColumn("Random_IDN", row_number().over(windowSpec) + lit(maxval).cast(LongType))
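The arithmetic in the Spark snippet (a 1-based row_number offset by the sink table's current max) can be sketched in plain Python; assign_ids and the sample values below are illustrative, not part of the original code:

```python
def assign_ids(sink_max, new_rows):
    """Continue numbering after the sink table's current max ID.
    row_number() is 1-based, so the first new row gets sink_max + 1."""
    return [(sink_max + i, row) for i, row in enumerate(new_rows, start=1)]

print(assign_ids(100, ["alice", "bob"]))
```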