Performance: Many articles claim that "Spark Scala is 10 times faster than pySpark", but from Spark 2.x onwards this statement no longer holds. pySpark used to be buggy and poorly supported, but it has improved considerably in recent releases. However, for batch jobs that process large volumes of data, Spark Scala still gives better performance.
Library Stack:
Pandas integration is an advantage of pySpark.
Python's visualization libraries complement pySpark; comparable libraries are not available in Scala.
Python comes with several well-known libraries for data analysis, machine learning, and natural language processing.
Learning Python is generally considered easier than learning Scala.
Scala supports powerful concurrency through libraries such as Akka's actors. It also provides Futures that run on an ExecutionContext.
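The Future and execution-context model mentioned for Scala has a rough counterpart in Python's standard library, `concurrent.futures`. The sketch below is in Python, not Scala, and only illustrates the general idea of submitting work to an executor and collecting results from futures. It is not a Spark or Akka example, and the names `word_count`, `count_all` and the sample inputs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def word_count(text: str) -> int:
    """Hypothetical unit of work: count whitespace-separated words."""
    return len(text.split())


def count_all(texts: list[str], max_workers: int = 4) -> dict[str, int]:
    """Run word_count over texts concurrently and collect one result per input."""
    results: dict[str, int] = {}
    # The executor plays a role loosely similar to a Scala ExecutionContext:
    # it owns the thread pool on which submitted tasks run.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() returns a Future immediately, without waiting for the result.
        futures = {pool.submit(word_count, t): t for t in texts}
        # as_completed() yields each Future as soon as it has finished.
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results


if __name__ == "__main__":
    sample = ["spark with scala", "pyspark", "big data batch jobs"]
    print(count_all(sample))
```

Threads are used here only to keep the sketch short. For CPU-bound work in Python, `ProcessPoolExecutor` from the same module is usually the more appropriate choice.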
It is also true that one should not use Python for commercial projects. Python is not a programming language; it is a scripting language, and the code you write is visible to everyone. It is better to use Python with C/C++ extensions to hide your critical intellectual property.
Thank you so much, Sir, for this valuable information. It will definitely help us take the right approach to learning Python.
Hi Leela Mam,
I'm an SSIS developer and I want to switch my career to big data. Which should I choose, pySpark or Spark with Scala? I feel Python is easy to learn. Kindly suggest.
Rahul,
I would suggest starting with pySpark.
That's not correct. Scala Spark and pySpark give the same performance. The only difference is that pySpark doesn't support the Dataset API, while Scala and Java do, so there is a small performance increase with them. That's it.
Regards
Venu