Sunday, September 15, 2024

Databricks

 


Databricks provides a free Community Edition that can be used to explore its capabilities or to try things out in its notebooks. Both Python and Scala are supported.


Filesystem: Its filesystem is called DBFS (Databricks File System).


df.write.partitionBy("Location").mode("overwrite").parquet("Table1")

To view the files that were written (DBFS is used much like HDFS, S3, or GCS on GCP):

dbutils.fs.ls("/Table1/")
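To make the listing easier to read, here is a rough sketch of the directory layout that partitionBy("Location") produces: one sub-directory per distinct value, named Location=<value>. This is plain Python standing in for Spark, so it runs without a cluster. The helper write_partitioned, the sample rows, and the file name part-00000.csv are made up for illustration and are not part of any Databricks API.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitioned(rows, root, partition_col):
    """Write rows as CSV files under root/<partition_col>=<value>/ directories."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_col]].append(row)

    for value, members in groups.items():
        part_dir = Path(root) / f"{partition_col}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # As Spark does, leave the partition column out of the data file;
        # its value is carried by the directory name instead.
        fields = [k for k in members[0] if k != partition_col]
        with open(part_dir / "part-00000.csv", "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(members)

    # Return the partition directory names, similar to what a listing would show.
    return sorted(p.name for p in Path(root).iterdir())


rows = [
    {"Name": "A", "Location": "NY"},
    {"Name": "B", "Location": "LA"},
    {"Name": "C", "Location": "NY"},
]

with tempfile.TemporaryDirectory() as tmp:
    layout = write_partitioned(rows, Path(tmp) / "Table1", "Location")
    print(layout)  # ['Location=LA', 'Location=NY']
```

On Databricks, the dbutils.fs.ls call above should list directories of the same Location=<value> shape, alongside any metadata files Spark writes.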

Saturday, September 14, 2024

Pytest

 

pipenv install pytest


Now, I have a simple function in a file

t.py

def square(x: float):
    return x * x


t_test.py

import t

def test_square():
    assert t.square(5) == 25


Now, enhance the test so that it runs against multiple cases with pytest.mark.parametrize.


t_test.py

import t

import pytest

@pytest.mark.parametrize(
    ('input_n', 'expected'),
    (
        (5, 25),
        (3., 9.),
    )
)
def test_square(input_n, expected):
    assert t.square(input_n) == expected
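Since square accepts floats, exact == comparisons can fail on values that binary floating point cannot represent exactly. A minimal sketch of the same parametrized test using pytest.approx follows. The square function is copied inline so the block runs on its own, and the test name and input values are picked just for illustration.

```python
import pytest


def square(x: float) -> float:
    # Copied from t.py so this example is self-contained.
    return x * x


@pytest.mark.parametrize(
    ("input_n", "expected"),
    (
        (0.1, 0.01),
        (1.1, 1.21),
        (-2.5, 6.25),
    ),
)
def test_square_approx(input_n, expected):
    # 0.1 * 0.1 evaluates to 0.010000000000000002, so a plain == 0.01 would fail.
    # pytest.approx compares within a small relative tolerance instead.
    assert square(input_n) == pytest.approx(expected)
```

For integer inputs such as (5, 25), the exact comparison is fine and approx is not needed.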


Now, add a test class to group related tests:


t_test.py

import t

import pytest


@pytest.mark.parametrize(
    ('input_n', 'expected'),
    (
        (5, 25),
        (3., 9.),
    )
)
def test_square(input_n, expected):
    assert t.square(input_n) == expected


class TestSquare:
    def test_square(self):
        assert t.square(3) == 9
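A class becomes more useful once it groups several related cases. The sketch below is one possible way to extend it. The method names and the extra cases are my own additions, and square is again copied inline so the block is self-contained.

```python
import pytest


def square(x: float) -> float:
    # Copied from t.py so this example is self-contained.
    return x * x


class TestSquare:
    # pytest collects classes whose names start with "Test"
    # and runs every method whose name starts with "test_".

    def test_positive(self):
        assert square(3) == 9

    def test_negative(self):
        assert square(-3) == 9

    def test_zero(self):
        assert square(0) == 0

    def test_rejects_string(self):
        # Multiplying a str by a str raises TypeError.
        with pytest.raises(TypeError):
            square("3")
```

If these live in t_test.py, running pytest t_test.py::TestSquare should execute only the tests in this class.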


Pipenv

 

Pipenv is a Python virtualenv management tool that supports a multitude of systems and nicely bridges the gaps between pip, python (using system python, pyenv or asdf) and virtualenv. Linux, macOS, and Windows are all first-class citizens in pipenv.


Pipenv is a recommended way to install Python packages inside a virtual environment. When you use the pip package manager bundled with Python on its own, everything is installed globally, so there is no encapsulated environment per project. For example, a Spark project and an ML project might need entirely different packages. Pipenv lets us create a virtual environment for each project and add or remove packages to match that project's needs.


Pipenv automatically creates and manages a virtualenv for your projects, as well as adds/removes packages from your Pipfile as you install/uninstall packages. It also generates a project Pipfile.lock, which is used to produce deterministic builds.

Pipenv is primarily meant to provide users and developers of applications with an easy method to arrive at a consistent working project environment.
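One quick way to see this isolation in practice is to ask the interpreter whether it is running from a virtual environment. The snippet below is a small sketch using only the standard library. The function name in_virtualenv is my own, and the check relies on sys.prefix differing from sys.base_prefix, which is how environments created by venv and recent virtualenv releases behave.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    base = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```

Saved as, say, check_env.py (a hypothetical file name), running it with pipenv run python check_env.py should report True, while running it with the system Python would normally report False.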


A few useful commands:

pip install --user pipenv

pipenv --version

pipenv shell

pipenv install -r requirements.txt

pipenv run pip freeze

pipenv graph