Friday, April 7, 2017

A use case of MapReduce implementation

USECase 1:

Transferring data from RDBMS to HBase:

Case: Transactional Data is in RDBMS

Input in RDBMS- THis data is in multiple RDBMS tables and it has to be converged to HDFS,

Step 1:

1. Open MySql and create tables,


mysql -u root -p
 
(By default the password is empty and I have changed it to cloudera)
 
create table trans1(accno int, amount int);
insert into table trans1 values(101, 3000);
insert into trans1 values (102, 2000);
 
create table trans2(accno int, amount int) values(101, 5000);
insert into trans2 values(101, 5000);
insert into trans2 values(102, 1000);
 
create table trans3(accno int, amount int);
insert into trans3 values(103, 1000);
 

2. Importing data via Sqoop

 
////////This will import the table data to Transaction_db2 by creating directory Transaction_db2 //////

sqoop import --connect jdbc:mysql://localhost/Leela_db --username root --password cloudera \
--table trans2 --m 1 \
--target-dir 'hdfs://quickstart.cloudera:8020/user/Leela/Hive/Transaction_db2'
 
//////////NOw we need to append the table data trans3 to the existing file, so use --append keyword
///////THis will create 2 more files in the same directory.
////THis means each table data will be copied as seperate files////

sqoop import --connect jdbc:mysql://localhost/Leela_db --username root --password cloudera \
--table trans3 --m 1 \
--target-dir 'hdfs://quickstart.cloudera:8020/user/Leela/Hive/Transaction_db2' --append
 
sqoop import --connect jdbc:mysql://localhost/Leela_db --username root --password cloudera \
--table trans3 --m 1 \
--target-dir 'hdfs://quickstart.cloudera:8020/user/Leela/Hive/Transaction_db2' \
--append --query="select accno,amount from trans2 and \$CONDITIONS"

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ TO append the new table values to tthe existing file
///////Need to use --check-column <columnname> --incremental append


sqoop import --connect jdbc:mysql://localhost:3306/Leela_db --username=root \
--password=cloudera --query="select * from trans3 where accno > 100 AND \$CONDITIONS" \
--append --target-dir 'hdfs://quickstart.cloudera:8020/user/Leela/Hive/Transaction_db2' \
--num-mappers 1 --check-column "accno" --incremental append --last-value 102
 
AccNo    amt
101    3000
102    2000
103    1000
101    5000
102    1000

Output in Hbase
101  3000,5000
102  2000,1000
103  1000

No comments:

Post a Comment