Need to Prepare
Hadoop and Spark by Leela Prasad
Monday, February 17, 2025
Sunday, September 22, 2024
Classes and Object Oriented Python
In Python, you define a class by using the class keyword followed by a name and a colon. Then you use .__init__() to declare which attributes each instance of the class should have:
# dog.py
class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age
In the body of .__init__(), there are two statements using the self variable:
self.name = name creates an attribute called name and assigns the value of the name parameter to it.
self.age = age creates an attribute called age and assigns the value of the age parameter to it.
To instantiate this Dog class, you need to provide values for name and age. If you don’t, then Python raises a TypeError:
>>> Dog()
Traceback (most recent call last):
...
TypeError: __init__() missing 2 required positional arguments: 'name' and 'age'
To pass arguments to the name and age parameters, put values into the parentheses after the class name:
>>> miles = Dog("Miles", 4)
>>> buddy = Dog("Buddy", 9)
When you instantiate the Dog class, Python creates a new instance of Dog and passes it to the first parameter of .__init__(). This essentially removes the self parameter, so you only need to worry about the name and age parameters.
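As a small sketch of the behaviour described above, the following creates an instance, reads its attributes, and catches the TypeError raised when the arguments are missing. The helper name try_create is hypothetical and exists only for this illustration.

```python
class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age


def try_create(*args):
    """Hypothetical helper: return a Dog, or None if the arguments don't fit."""
    try:
        return Dog(*args)
    except TypeError:
        # Raised when name and/or age are missing, or too many values are given
        return None


miles = try_create("Miles", 4)
nobody = try_create()

print(miles.name, miles.age)  # Miles 4
print(nobody)                 # None
```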
What is the use of self in Python
When working with classes in Python, the term “self” refers to the instance of the class that is currently being used. It is customary to use “self” as the first parameter in instance methods of a class. Whenever you call a method of an object created from a class, the object is automatically passed as the first argument using the “self” parameter. This enables you to modify the object’s properties and execute tasks unique to that particular instance.
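A minimal sketch of what this means in practice, using a Dog class like the one in this post with a single extra method: calling a method on an instance behaves the same as calling the function on the class and passing the instance explicitly as self.

```python
class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def description(self):
        return f"{self.name} is {self.age} years old"


miles = Dog("Miles", 4)

# Usual form: Python passes miles as self automatically
via_instance = miles.description()

# Explicit form: the instance is handed over as the first argument
via_class = Dog.description(miles)

print(via_instance)               # Miles is 4 years old
print(via_instance == via_class)  # True
```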
The .__init__() method plays a role similar to a constructor in C++ or Java: it runs when an instance is created and sets up that instance's initial attributes.
Instance methods are functions that you define inside a class and can only call on an instance of that class. Just like .__init__(), an instance method always takes self as its first parameter.
# dog.py
class Dog:
    species = "Canis familiaris"

    def __init__(self, name, age):
        self.name = name
        self.age = age

    # Instance method
    def description(self):
        return f"{self.name} is {self.age} years old"

    # Another instance method
    def speak(self, sound):
        return f"{self.name} says {sound}"
Creating an object and calling its methods:
>>> miles = Dog("Miles", 4)
>>> miles.description()
'Miles is 4 years old'
>>> miles.speak("Woof Woof")
'Miles says Woof Woof'
>>> miles.speak("Bow Wow")
'Miles says Bow Wow'
Inheritance
# dog.py
# ...
class JackRussellTerrier(Dog):
    def speak(self, sound="Arf"):
        return f"{self.name} says {sound}"
# ...
>>> miles = JackRussellTerrier("Miles", 4)
>>> miles.speak()
'Miles says Arf'
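The subclass overrides only .speak(); everything else is inherited from Dog. The following self-contained sketch repeats both classes from this post so it can run on its own, and checks the inherited method, the inherited class attribute, and the instance's relationship to the parent class.

```python
class Dog:
    species = "Canis familiaris"

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def description(self):
        return f"{self.name} is {self.age} years old"

    def speak(self, sound):
        return f"{self.name} says {sound}"


class JackRussellTerrier(Dog):
    # Only speak() is overridden; everything else comes from Dog
    def speak(self, sound="Arf"):
        return f"{self.name} says {sound}"


miles = JackRussellTerrier("Miles", 4)

print(miles.description())     # Miles is 4 years old
print(miles.species)           # Canis familiaris
print(isinstance(miles, Dog))  # True
print(miles.speak("Grrr"))     # Miles says Grrr
```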
# dog.py
# ...
class JackRussellTerrier(Dog):
    def speak(self, sound="Arf"):
        return super().speak(sound)
# ...
When you call super().speak(sound) inside JackRussellTerrier, Python searches the parent class, Dog, for a .speak() method and calls it with the variable sound.
Garbage Collection in Python
# Importing gc module
import gc

# Returns the number of objects it has collected and deallocated
collected = gc.collect()

# Prints the number of objects collected (0 if there was nothing to collect)
print("Garbage collector: collected", "%d objects." % collected)
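The snippet above often reports 0 objects because nothing is waiting to be collected. As a hedged sketch, assuming CPython, the following creates a reference cycle, which reference counting alone cannot free, so that gc.collect() has something to find. The Node class and the collect_after_cycle function are hypothetical names used only for this illustration, and the exact count can vary between Python versions.

```python
import gc


class Node:
    """Hypothetical class used only to build a reference cycle."""

    def __init__(self, name):
        self.name = name
        self.partner = None


def collect_after_cycle():
    """Create two objects that reference each other, drop them, and collect."""
    gc.collect()   # start from a clean slate
    gc.disable()   # keep the automatic collector out of the way for the demo
    try:
        a = Node("a")
        b = Node("b")
        a.partner = b
        b.partner = a   # a and b now keep each other alive
        del a, b        # no outside references remain, only the cycle
        return gc.collect()
    finally:
        gc.enable()


collected = collect_after_cycle()
print("Garbage collector: collected %d objects." % collected)
```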
Sunday, September 15, 2024
Databricks
Databricks provides a free Community Edition that can be used to explore its capabilities or to try things out in its notebooks. Both Python and Scala are supported.
Filesystem: Its filesystem is called DBFS (Databricks File System).
Saturday, September 14, 2024
Pytest
pipenv install pytest
Now, I have a simple function in a file
t.py
def square(x: float):
    return x * x
t_test.py
import t

def test_square():
    assert t.square(5) == 25
Now, enhancing the test so that it runs against multiple cases:
t_test.py
import t
import pytest

@pytest.mark.parametrize(
    ('input_n', 'expected'),
    (
        (5, 25),
        (3., 9.),
    )
)
def test_square(input_n, expected):
    assert t.square(input_n) == expected
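Roughly speaking, the parametrized test above runs the same assertion once per tuple and reports each case separately. The plain-Python sketch below imitates that idea without pytest. It is not how pytest is implemented, and run_cases is a hypothetical helper name used only here.

```python
def square(x: float):
    return x * x


# The same (input, expected) pairs used in the parametrized test
cases = [
    (5, 25),
    (3., 9.),
]


def run_cases(func, cases):
    """Hypothetical helper: return one (input, passed) pair per case."""
    return [(input_n, func(input_n) == expected) for input_n, expected in cases]


results = run_cases(square, cases)
for input_n, passed in results:
    print(input_n, "PASSED" if passed else "FAILED")
```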
Now, adding a test class:
t_test.py
import t
import pytest

@pytest.mark.parametrize(
    ('input_n', 'expected'),
    (
        (5, 25),
        (3., 9.),
    )
)
def test_square(input_n, expected):
    assert t.square(input_n) == expected

class TestSquare:
    def test_square(self):
        assert t.square(3) == 9
Pipenv
Pipenv is a Python virtualenv management tool that supports a multitude of systems and nicely bridges the gaps between pip, python (using system python, pyenv or asdf) and virtualenv. Linux, macOS, and Windows are all first-class citizens in pipenv.
Pipenv is a recommended way to install Python packages and use a virtual environment. When you use the pip package manager that is bundled with Python outside a virtual environment, everything gets installed globally, so the projects you create do not each get an encapsulated environment. For example, a Spark project and an ML project might need entirely different packages. Pipenv lets us create a virtual environment per project and easily add or remove packages specific to that project's needs.
Pipenv automatically creates and manages a virtualenv for your projects, as well as adds/removes packages from your Pipfile as you install/uninstall packages. It also generates a project Pipfile.lock, which is used to produce deterministic builds.
Pipenv is primarily meant to provide users and developers of applications with an easy method to arrive at a consistent working project environment.
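To see the isolation described above from the Python side, a small sketch like the following can report whether the current interpreter is running inside a virtual environment. The function name in_virtualenv is hypothetical, and the check assumes a Python 3 interpreter where sys.base_prefix is available.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

Run from within pipenv shell, this would be expected to report True; run with the system Python, it would normally report False.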
A few useful commands:
pip install --user pipenv
pipenv --version
pipenv shell
pipenv install -r requirements.txt
pipenv run pip freeze
pipenv graph