Thursday, March 4, 2021

Spark Scala vs PySpark

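Performance: Many articles claim that "Spark Scala is 10 times faster than PySpark", but from Spark 2.x onwards this statement is no longer true in practice. For DataFrame and SQL operations, PySpark and Scala code are optimized by the same Catalyst engine and executed on the JVM, so the gap is small; the overhead that remains comes mostly from Python UDFs, which have to move data between the JVM and Python worker processes. PySpark used to be buggy and poorly supported, but it has improved considerably in recent releases. However, for batch jobs that process very large volumes of data, Spark Scala still gives better performance.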


Library Stack:

PySpark's integration with pandas (for example, DataFrame.toPandas() and pandas UDFs) is an advantage.

Python's visualization libraries complement PySpark, whereas comparable libraries are not available in Scala.

Python has several well-known libraries for data analysis, as well as libraries for machine learning and natural language processing.


Python is generally considered easier to learn than Scala.


Scala supports powerful concurrency through primitives such as Akka actors, and its standard library provides Futures that run on an ExecutionContext.
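The Future/ExecutionContext point can be illustrated with a minimal sketch that uses only the Scala standard library (no Akka and no Spark). It starts two computations that may run in parallel on the global ExecutionContext and combines their results. The names FutureSketch and sumAsync are hypothetical, and using the global ExecutionContext is an assumption made for brevity, not a recommendation for production code.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureSketch {
  // Hypothetical helper: sums the integers lo..hi inside a Future,
  // which is scheduled on the implicitly imported global ExecutionContext.
  def sumAsync(lo: Int, hi: Int): Future[Long] =
    Future {
      (lo to hi).map(_.toLong).sum
    }

  def main(args: Array[String]): Unit = {
    // Both Futures are created (and start running) before they are combined,
    // so the two halves of the work can execute concurrently.
    val left  = sumAsync(1, 50)
    val right = sumAsync(51, 100)

    // Combine the results without blocking a thread.
    val total: Future[Long] = for {
      a <- left
      b <- right
    } yield a + b

    // Block only here, at the edge of the program, to print the result.
    val result = Await.result(total, 5.seconds)
    println(result) // prints 5050
  }
}
```

Composing the two Futures with for/yield keeps the combination non-blocking; Await.result is called only once, in main, so the demonstration can print a value.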


8 comments:

  1. This comment has been removed by the author.

  2. It is also true that one should not use only Python for commercial projects. Python is not a programming language; it is a scripting language, and the code you write is visible to everyone. It is better to use Python with a C/C++ extension to hide your critical intellectual property.

  3. Thank you so much Sir for this valuable information. It will definitely help us to take up the right approach in Python learning.

  4. Hi Leela Mam,
    I'm an SSIS developer and I want to switch my career to big data. Which should I choose, PySpark or Spark with Scala? I feel Python is easy to learn; kindly suggest.

  5. The business world of today, post-COVID-19, is disrupted. This calls for a thorough analysis and management of costs. As you gather data and use it to figure out inefficiencies, controlling costs becomes easy.

  6. Data Science, Artificial Intelligence, and Machine Learning are the biggest buzzwords when it comes to technology. Many people dream of getting a job in analytics because of the high salaries and stable careers.

  7. That's not correct. Spark with Scala and PySpark get the same performance. The only difference is that PySpark doesn't support the Dataset API, while Scala and Java do, which gives a small performance increase. That's it.
    Regards
    Venu