PySpark Notes

A few points:
1. By default, PySpark resolves input paths against HDFS, not the local file system. To read a local file, prefix the path with file:// (in cluster mode the file must exist on every worker node); see the sketch below.
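A minimal sketch of the two path schemes (the file names here are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("path-schemes").getOrCreate()

# Default: a bare path is resolved by the cluster's default file system (HDFS on most clusters)
hdfs_df = spark.read.csv("hdfs:///data/input.csv", header=True)

# Local file system: the file:// scheme forces a local read
local_df = spark.read.csv("file:///tmp/input.csv", header=True)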
To Install Packages
pip Installation
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py --user
Packages Installation
pip3 install <PACKAGENAME>
pip3 install requests --user
pip3 install pandas
Requests Example:
The requests package is useful when working with REST APIs.
import requests

# client_id_value, client_secret_value and url are assumed to be defined elsewhere
payload = f"grant_type=client_credentials&client_id={client_id_value}&client_secret={client_secret_value}"
headers = {
    'Content-Type': "application/x-www-form-urlencoded",
    'Accept': "application/json;v=1"
}
response = requests.request("POST", url, data=payload, headers=headers, verify=False)
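A short follow-up on using the response, assuming the endpoint returns a JSON body with an access_token field (as OAuth client-credentials endpoints typically do):

# Raise on HTTP error status codes, then parse the JSON body
response.raise_for_status()
token = response.json().get("access_token")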
Pandas Examples:
import pandas as pd

pdf = pd.DataFrame(df_list)  # df_list: a list of rows/records, assumed defined elsewhere
pdf.to_csv(PATH, sep='\u0001', index=False, header=False)
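\u0001 (Ctrl-A) is Hive's default field delimiter, which is why it is used here. A minimal round-trip sketch with hypothetical data and file name:

import pandas as pd

pdf = pd.DataFrame({'Name': ['a', 'b'], 'Id': [1, 2]})
pdf.to_csv('/tmp/out.dat', sep='\u0001', index=False, header=False)

# Read it back with the same delimiter; header=None because none was written
back = pd.read_csv('/tmp/out.dat', sep='\u0001', header=None, names=['Name', 'Id'])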
Snowflake Connection:
import snowflake.connector
import pandas as pd

# connect() takes keyword arguments; the values are assumed to be defined elsewhere
sfconnec = snowflake.connector.connect(user=user,
                                       password=pwd,
                                       account=account,
                                       warehouse=warehouse,
                                       database=dbname,
                                       schema=schema,
                                       role=role)
cursor = sfconnec.cursor()  # cursor() is a method call
sqldata = cursor.execute("SELECT * FROM TBL1").fetchall()  # fetchall() returns a list of tuples
tableschema = ['Name', 'Address', 'Id']
panda_df = pd.DataFrame(sqldata, columns=tableschema)  # converts to a pandas DataFrame
Transpose/Pivot using Pandas:
The advantage of using pandas is that it is powerful and ships with numerous functions. One such example is the transpose/unpivot operation, where columns are converted to rows. We do not need to write any complex logic for this: the pandas 'melt' function does the job for us (see the worked sketch below).
Eg:
import pandas as pd
tablecols = ['Name', 'Address', 'Id']
melted = pd.melt(csv_df, id_vars=tablecols)
Later, rename the generated 'variable' and 'value' columns as needed.
source: https://www.journaldev.com/33398/pandas-melt-unmelt-pivot-function
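A small worked example (hypothetical data) showing what melt produces:

import pandas as pd

csv_df = pd.DataFrame({
    'Id': [1, 2],
    'Q1': [10, 30],
    'Q2': [20, 40],
})

# Keep 'Id' fixed; every other column becomes a (variable, value) row pair
melted = pd.melt(csv_df, id_vars=['Id'])
print(melted)
#    Id variable  value
# 0   1       Q1     10
# 1   2       Q1     30
# 2   1       Q2     20
# 3   2       Q2     40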
Steps for AWS Lambda creation:
1. Create Function
2. Add Tags
3. Add VPC Settings
4. Create Layer
5. Start writing the function
To create a layer
pip3 install -U pandas -t ./python
Now the pandas library is saved under ./python. Zip the python/ directory (Lambda expects Python layer packages under a top-level python/ folder in the archive) and upload the zip as a layer to the Lambda function.
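A minimal handler sketch to verify the layer is picked up (the DataFrame contents and return shape are just illustrative):

import pandas as pd  # resolved from the attached layer

def lambda_handler(event, context):
    # Trivial use of pandas just to prove the import works inside Lambda
    df = pd.DataFrame({'a': [1, 2, 3]})
    return {'rows': len(df)}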
Python Iteration through a list
numbers = (1, 2, 3, 4)

def addition_op(n):
    return n + n

out_num = map(addition_op, numbers)
out_nums2 = map(lambda x: x + x, numbers)
print(list(out_num))  # map() is lazy; wrap it in list() to materialize: [2, 4, 6, 8]
Python alternative to Scala-style map(x => x)
df.select("text").show()
df.select("text").rdd.map(lambda x: str(x).upper()).collect()
+-----+
| text|
+-----+
|hello|
|world|
+-----+
["ROW(TEXT='HELLO')", "ROW(TEXT='WORLD')"]
Example2:
df.show()
+-----+----------+
| text|upper_text|
+-----+----------+
|hello| HELLO|
|world| WORLD|
+-----+----------+
dd = df.rdd.map(lambda x: (x.text, x.upper_text))
dd.collect()
[('hello', 'HELLO'), ('world', 'WORLD')]
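For simple column transforms, the DataFrame API avoids the RDD round-trip altogether; a sketch using the built-in upper function:

from pyspark.sql.functions import upper, col

# Same result as the RDD map, but stays in the DataFrame world
df.withColumn("upper_text", upper(col("text"))).show()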
Parallelism in Python (alternative to Futures in Scala)
Use ThreadPoolExecutor.
Eg:
from concurrent.futures import ThreadPoolExecutor

def get_Tbl1_data():
    run_snow(conn, Tbl1_sql)  # conn and Tbl1_sql are assumed to be defined elsewhere

# The with-block waits for all submitted tasks to finish before exiting
with ThreadPoolExecutor(max_workers=4) as exe:
    exe.submit(get_Tbl1_data)
    exe.submit(get_Tbl2_data)
    exe.submit(get_Tbl3_data)
    exe.submit(get_Tbl4_data)
    exe.submit(get_Tbl5_data)
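If the per-table results are needed, submit() returns Future objects; a sketch collecting them as they finish (function names carried over from above):

from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=4) as exe:
    futures = [exe.submit(fn) for fn in (get_Tbl1_data, get_Tbl2_data, get_Tbl3_data)]
    for fut in as_completed(futures):
        result = fut.result()  # re-raises any exception from the worker thread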
Snowflake Connection (via SQLAlchemy):
Used sqlalchemy for connecting to Snowflake from Python.
Eg:
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

engine = create_engine(URL(account='accountname', user='user', password='password', warehouse='warehousename', database='databasename', schema='schemaname'))
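A short usage sketch: with the engine in hand, pandas can query Snowflake directly (table name hypothetical):

import pandas as pd

# read_sql runs the query through the SQLAlchemy engine and returns a DataFrame
df = pd.read_sql("SELECT * FROM TBL1", engine)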
Install multiple packages in Python
pip3 install --upgrade -r /path/requirements.txt --user
requirements.txt
pandas==1.1.2
snowflake-sqlalchemy==1.2.3