Tuesday, July 11, 2017

Redshift Part 1

Architecture:


Leader Node and compute nodes.

Compute nodes are similar to data nodes in Hadoop; this is where the data is stored.

Data is stored on the compute nodes in the form of slices (each node is split into slices, and each slice gets its share of the node's memory and disk).
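A quick way to see how nodes and slices are laid out on a running cluster is the STV_SLICES system table. A minimal sketch (assumes you are already connected to a cluster):

-- List the slices on each compute node; STV_SLICES has one row per slice.
select node, slice
from stv_slices
order by node, slice;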

Video 18: Data loading

Services that can be used for loading data:
AWS Data Pipeline
RDS sync


Video 19: Data distribution concepts


Even distribution - Rows are spread round-robin across the slices, regardless of column values.
Key Distribution - Rows are distributed based on the values in one column (the DISTKEY), so rows with the same key land on the same slice.
All Distribution - Similar to the distributed cache in Hadoop: a full copy of the table is placed on every compute node and is used by all the slices.
Ideally, an All-distribution table should be heavily used yet small, relatively static (not frequently changed), and frequently joined. A sketch of the three styles follows below.
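A minimal sketch of how the three styles are declared (the table and column names here are made up for illustration, not from the course):

-- EVEN: rows are spread round-robin across slices
create table orders_even (o_id integer, store_id integer) diststyle even;

-- KEY: rows with the same store_id land on the same slice
create table orders_key (o_id integer, store_id integer) distkey(store_id);

-- ALL: a full copy on every compute node; suits small, static, frequently joined tables
create table dim_store (store_id integer, store_name varchar(100)) diststyle all;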

Video 20 & 21: Redshift Basic usage


Expressions:
sum(list), sum(colname)
Operators:
+, *, &, XOR, || (|| is the same as concat)
e.g.: fname || ' ' || lastname || ':' || avg(scores)
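Expanding the e.g. above into a full statement (student_scores and its columns are hypothetical names, not from the course):

-- Label each student as "first last: average", casting the numeric average to text for concatenation.
select fname || ' ' || lastname || ': ' || cast(avg(scores) as varchar) as student_avg
from student_scores
group by fname, lastname;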

The syntax is almost the same as standard SQL, with a few minor differences:

create table tble1(
    o_id integer not null,
    o_time timestamp not null,
    store_id integer,
    product varchar not null,        -- no array types in Redshift
    primary key(o_id))
distkey(store_id)                    -- key distribution based on store_id
compound sortkey(store_id, o_id);    -- sorts by store_id and then o_id
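Once the table is created, the distribution and sort keys can be double-checked via the PG_TABLE_DEF system view (a quick sketch, assuming tble1 is in a schema on the search_path):

-- Which columns of tble1 are the distkey / part of the sortkey?
select "column", type, distkey, sortkey
from pg_table_def
where tablename = 'tble1';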

insert into tble1 values (1234, '2015-06-25 19:48:50', 885, 'jkhg');   -- single quotes for strings; the epoch 1435261730 written as a timestamp literal

insert into tble1 values (1234, '2015-06-25 19:48:50', 885, 'jkhg'), (1235, '2015-06-25 19:48:50', 888, 'k;sg');   -- multiple rows in one insert
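A basic query over the table above, just to round out the usage notes:

-- Count orders per store, biggest stores first.
select store_id, count(*) as order_count
from tble1
group by store_id
order by order_count desc;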

Copying data from other sources:

copy [table_name] from '[emr|dynamodb|s3://path/to/asset]' credentials

COPY can load JSON, CSV, and other delimited formats.
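A minimal sketch of a COPY from S3; iam_role is an alternative to the credentials string above, and the bucket, file paths, role ARN, and region here are placeholders, not values from the course:

-- Load a CSV file from S3 into tble1, authenticating with an IAM role.
copy tble1
from 's3://my-bucket/path/orders.csv'
iam_role 'arn:aws:iam::123456789012:role/myRedshiftRole'
csv
region 'us-east-1';

-- Same idea for JSON input; 'auto' maps JSON keys to column names.
copy tble1
from 's3://my-bucket/path/orders.json'
iam_role 'arn:aws:iam::123456789012:role/myRedshiftRole'
json 'auto'
region 'us-east-1';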

Video 23: Creating a Redshift cluster


A 4-node cluster would cost about $1/hour. After creation, if it is turned off and on 3 times within an hour, the charge would be $3, since each start is billed for at least a full hour. So use the cluster for at least an hour after turning it on.

Creating cluster.

Followed the free trial path.

user: admin
password: Leelaadmin1

User name Password Access key ID Secret access key Console login link
------------------------------------------------------------------------------------------------------------------------------------
vish-admin }[!J6dh[My-d AKIAITSNVUXG3ZX3DN3A doU9Y/RSrL7EFoh7RP0LvQPPYWave8/o0Pvf4Qra https://303343717363.signin.aws.amazon.com/console
leela2-user NdtwdD$!aYe] AKIAJJXKTMBPEHGQQUTQ WXEEoxC4gAfIQcKswXFP/ShHkP/EmKbubTHBrVBJ https://303343717363.signin.aws.amazon.com/console
leela-admin O8+ipsq{UONX AKIAJR65X2FNTIQPLZVQ UACFV4mzBdssys9eJDRdGOSUPO4l6nbMb9fH8/Px https://303343717363.signin.aws.amazon.com/console


Launch Cluster

Step 2: Open a new link for VPC & Security group

Step 3: Create users.

Open a new Amazon console link -> Identity & Access Management

Create users -> Policies -> attach policies

Step 4: Connecting using SQL Workbench/J

In the Redshift page -> Connect Client -> download the JDBC JAR file. (http://docs.aws.amazon.com/redshift/latest/mgmt/configure-jdbc-connection.html#download-jdbc-driver)

Install SQL Workbench/J -> load the downloaded JAR file as the driver to connect to the Redshift cluster.

Connect with the cluster user ID and password, which are:
user: admin
password: Leelaadmin1
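The JDBC URL follows the standard Redshift format; the endpoint and database name below are placeholders, and the real URL can be copied from the cluster's configuration page:

jdbc:redshift://examplecluster.abc123xyz789.us-east-1.redshift.amazonaws.com:5439/dev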

The link below has some common commands for working with Redshift.
https://drive.google.com/file/d/0BzG0wQkWbKpLVVllLXJUTTcyT2s/view?usp=sharing

Amazon documentation: http://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-run-copy.html

