Below are the steps to create a Spark Scala SBT project in IntelliJ:
1. Open IntelliJ (via Run as Administrator on Windows) and create a New Project of type Scala with sbt.
If this option is not available, open IntelliJ, go to Settings -> Plugins, search for the Scala plugin and install it.
Also install the sbt plugin from the same Plugins window.
2. Select a Scala version that is compatible with your Spark version. For example, Spark 2.3 is built against Scala 2.11, not 2.12, so for Spark 2.3 select Scala 2.11 (here, 2.11.8 was selected).
A sample project is available at https://drive.google.com/open?id=19YpCwLzuFZSqBReaceVOFS-BwlArOEpf
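The setup above boils down to a small sbt build definition. A minimal sketch of such a build.sbt is shown below — the Spark coordinates are the standard ones, but the project name and exact patch versions are illustrative assumptions:

```scala
// build.sbt — minimal sketch for a Spark 2.3 / Scala 2.11 project
name := "demo"
version := "0.1"
scalaVersion := "2.11.8"

// "provided" scope: the cluster supplies the Spark jars at runtime,
// so they are needed only for compilation, not inside the assembled JAR.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.3.0" % "provided"
)
```

The `%%` operator appends the Scala binary version (`_2.11`) to the artifact name, which is why the built JAR is named `demo_2.11-0.1.jar` as in the spark-submit example later.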
Debugging Spark Application
Remote Debugging
http://www.bigendiandata.com/2016-08-26-How-to-debug-remote-spark-jobs-with-IntelliJ/
1. Generate the JAR file and copy it to a location on the cluster.
2. On the cluster, export the debug options so the JVM waits for a debugger to attach:
export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=4000
3. In IntelliJ, go to Run -> Edit Configurations -> select Remote and create a configuration with port number 4000 and the hostname of the machine the JAR was copied to.
4. Submit the Spark application, e.g.: spark-submit --class com.practice.SayHello demo_2.11-0.1.jar
5. Click Debug in IntelliJ for the configuration created in step 3; this connects the debugger to the Spark application.
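The class passed to spark-submit above could be as simple as the following sketch. Only the class name com.practice.SayHello comes from the example; the body is illustrative, and the `package com.practice` declaration is omitted here for brevity:

```scala
object SayHello {
  def greeting: String = "Hello from the driver"

  def main(args: Array[String]): Unit = {
    // With suspend=y in SPARK_SUBMIT_OPTS, the driver JVM halts before
    // main() runs until the IntelliJ debugger attaches on port 4000.
    // Set a breakpoint on the next line to verify the remote session works.
    println(greeting)
  }
}
```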
To write data to Hive tables from a Spark DataFrame, two steps are needed:
1. In spark-submit, pass the Hive site file: --files /etc/spark/conf/hive-site.xml
2. Enable Hive support on the SparkSession with enableHiveSupport(), e.g.:
val spark = SparkSession.builder.appName("Demo App").enableHiveSupport().getOrCreate()
Sample Code:
import java.text.SimpleDateFormat
import java.sql.Date
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.udf
import spark.implicits._

// UDF that parses a yyyy-MM-dd string and re-formats it
val date_add = udf((x: String) => {
  val sdf = new SimpleDateFormat("yyyy-MM-dd")
  val result = new Date(sdf.parse(x).getTime)
  sdf.format(result)
})

val dfraw2 = dfraw.withColumn("ingestiondt", date_add($"current_date"))
dfraw2.write.format("parquet")
  .mode(SaveMode.Append)
  .partitionBy("ingestiondt")
  .option("path", "s3://ed-raw/cdr/table1")
  .saveAsTable("db1.table1")
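Since the body of the date_add UDF is plain JVM date handling, its logic can be checked without a SparkSession. A minimal sketch — the object and method names here are made up for illustration:

```scala
import java.text.SimpleDateFormat
import java.sql.Date

object DateUtil {
  // Same logic as the date_add UDF body: parse a yyyy-MM-dd string,
  // convert it to java.sql.Date, and re-format it.
  def normalizeDate(x: String): String = {
    val sdf = new SimpleDateFormat("yyyy-MM-dd")
    val result = new Date(sdf.parse(x).getTime)
    sdf.format(result)
  }
}
```

Note that SimpleDateFormat is not thread-safe, which is why the UDF constructs a fresh instance inside the closure rather than sharing one across executor threads.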