Below are the steps to create a Spark Scala SBT project in IntelliJ:
1. Open IntelliJ (via Run as Administrator on Windows) and create a New Project of type Scala with sbt.
If this option is not available, open IntelliJ, go to Settings -> Plugins, search for the Scala plugin and install it.
Also install the sbt plugin from the same Plugins window.
2. Select a Scala version that is compatible with your Spark version. For example, Spark 2.3 is built against Scala 2.11, not 2.12, so for Spark 2.3 select Scala 2.11 (here, 2.11.8 was selected).
A sample project is available at https://drive.google.com/open?id=19YpCwLzuFZSqBReaceVOFS-BwlArOEpf
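The setup above boils down to a small sbt build definition. A minimal sketch of such a build.sbt is shown below — the Spark coordinates are the standard ones, but the project name and exact patch versions are illustrative assumptions:

```scala
// build.sbt — minimal sketch for a Spark 2.3 / Scala 2.11 project
name := "demo"
version := "0.1"
scalaVersion := "2.11.8"

// "provided" scope: the cluster supplies the Spark jars at runtime,
// so they are needed only for compilation, not inside the assembled JAR.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.3.0" % "provided"
)
```

The `%%` operator appends the Scala binary version (`_2.11`) to the artifact name, which is why the built JAR is named `demo_2.11-0.1.jar` as in the spark-submit example later.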
Debugging Spark Application
Remote Debugging
http://www.bigendiandata.com/2016-08-26-How-to-debug-remote-spark-jobs-with-IntelliJ/
1. Generate the JAR file and copy it to a location on the cluster.
2. On the cluster, export the debug options so the JVM waits for a debugger to attach:
export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=4000
3. In IntelliJ, go to Run -> Edit Configurations -> select Remote and create a configuration with port number 4000 and the hostname of the machine the JAR was copied to.
4. Submit the Spark application, e.g.: spark-submit --class com.practice.SayHello demo_2.11-0.1.jar
5. Click Debug in IntelliJ for the configuration created in step 3; this connects the debugger to the Spark application.
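The class passed to spark-submit above could be as simple as the following sketch. Only the class name com.practice.SayHello comes from the example; the body is illustrative, and the `package com.practice` declaration is omitted here for brevity:

```scala
object SayHello {
  def greeting: String = "Hello from the driver"

  def main(args: Array[String]): Unit = {
    // With suspend=y in SPARK_SUBMIT_OPTS, the driver JVM halts before
    // main() runs until the IntelliJ debugger attaches on port 4000.
    // Set a breakpoint on the next line to verify the remote session works.
    println(greeting)
  }
}
```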
To write data to Hive tables from a Spark DataFrame, two steps are needed:
1. In spark-submit, pass the Hive site file: --files /etc/spark/conf/hive-site.xml
2. Enable Hive support on the SparkSession with enableHiveSupport(), e.g.:
val spark = SparkSession.builder.appName("Demo App").enableHiveSupport().getOrCreate()
Sample Code:
import java.text.SimpleDateFormat
import java.sql.Date
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.udf
import spark.implicits._

// UDF that parses a yyyy-MM-dd string and re-formats it
val date_add = udf((x: String) => {
  val sdf = new SimpleDateFormat("yyyy-MM-dd")
  val result = new Date(sdf.parse(x).getTime)
  sdf.format(result)
})

val dfraw2 = dfraw.withColumn("ingestiondt", date_add($"current_date"))
dfraw2.write.format("parquet")
  .mode(SaveMode.Append)
  .partitionBy("ingestiondt")
  .option("path", "s3://ed-raw/cdr/table1")
  .saveAsTable("db1.table1")
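Since the body of the date_add UDF is plain JVM date handling, its logic can be checked without a SparkSession. A minimal sketch — the object and method names here are made up for illustration:

```scala
import java.text.SimpleDateFormat
import java.sql.Date

object DateUtil {
  // Same logic as the date_add UDF body: parse a yyyy-MM-dd string,
  // convert it to java.sql.Date, and re-format it.
  def normalizeDate(x: String): String = {
    val sdf = new SimpleDateFormat("yyyy-MM-dd")
    val result = new Date(sdf.parse(x).getTime)
    sdf.format(result)
  }
}
```

Note that SimpleDateFormat is not thread-safe, which is why the UDF constructs a fresh instance inside the closure rather than sharing one across executor threads.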