In the terminal,
sudo -u hdfs hadoop fs -mkdir /user/Leela
Note: The parent directory Leela must be created first, and the Hive directory under it afterwards.
sudo -u hdfs hadoop fs -mkdir /user/Leela/Hive
OR
sudo -u hdfs hadoop fs -mkdir hdfs://quickstart.cloudera:8020/user/Leela/Hive/Student
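If the parent directories do not exist yet, the -p flag (available on Hadoop 2.x and later clients, as on the quickstart VM) creates the whole path in one step:
sudo -u hdfs hadoop fs -mkdir -p /user/Leela/Hive/Student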
To list the files in a created directory,
hadoop fs -ls Leela
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2017-07-07 11:34 Leela/Nifi1_out
To Upload a file to HDFS, use
hadoop fs -put /home/cloudera/Desktop/Spark/Words2 /user/Leela/Hive
OR
hadoop fs -put file:///home/cloudera/Desktop/Hadoop/HIve/hive_inputs/student.txt hdfs://quickstart.cloudera:8020/user/Leela/Hive/Student
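hadoop fs -copyFromLocal is an equivalent upload command restricted to local sources; a sketch reusing the paths above:
hadoop fs -copyFromLocal /home/cloudera/Desktop/Spark/Words2 /user/Leela/Hive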
To delete a file in HDFS, use
hadoop fs -rm hdfs://quickstart.cloudera:8020/user/Pig/demo.txt
OR
hadoop fs -rm /user/hadoop/Leela/Nifi1_out/1.txt
followed by,
hadoop fs -expunge
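-expunge empties the HDFS trash. Alternatively, the -skipTrash flag deletes a file immediately without moving it to the trash at all, for example:
hadoop fs -rm -skipTrash /user/hadoop/Leela/Nifi1_out/1.txt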
To delete an (empty) directory in HDFS, use
sudo -u hdfs hadoop fs -rmdir hdfs://quickstart.cloudera:8020/user/Leela/Hive/Student
To recursively delete a directory that has files in it, use -rm -r, eg:
hadoop fs -rm -r /user/Leela/Hive
followed by hadoop fs -expunge
To view the contents of an HDFS file,
hadoop fs -cat /user/hadoop/Leela/Nifi1_out/1.txt
To copy a local directory to HDFS, overwriting the destination if it already exists (-f),
hdfs dfs -put -f /home/cloudera/Employee/newdir1 /user/cloudera
To merge the files in an HDFS directory and save the result to the local filesystem,
hdfs dfs -getmerge /user/cloudera/newdir1 /home/cloudera/Employee/newdir1/MergedEmployee.txt
Change permissions (664 = read/write for owner and group, read-only for others).
hdfs dfs -chmod 664 /user/cloudera/newdir1/MergedEmployee.txt
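Symbolic modes also work; for instance, adding group write permission to the same (illustrative) file:
hdfs dfs -chmod g+w /user/cloudera/newdir1/MergedEmployee.txt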
To find the Hadoop installation directory,
whereis hadoop
To print the Hadoop home directory,
echo $HADOOP_HOME
This command lists the directories together with the users who created them:
hadoop fs -ls
su root
Give the password as cloudera.
su hdfs //switch to the hdfs user if the directory was created by the hdfs user
Note: Switch to the owner of the directory to get permissions on it.
Provide the permissions
hadoop fs -chmod -R 777 /user/Leela
hadoop fs -chmod -R 777 /user
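If changing ownership is preferable to opening permissions up to 777, -chown (run as the hdfs superuser; the cloudera user here is illustrative) reassigns the owner:
sudo -u hdfs hadoop fs -chown -R cloudera /user/Leela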
hadoop fs -ls xml_test //here xml_test is the directory name (hadoop dfs is deprecated in favour of hadoop fs)
In case of a "NameNode is in safe mode" error, use the command below:
sudo -u hdfs hdfs dfsadmin -safemode leave
To Kill an existing Job
/usr/lib/hadoop/bin/hadoop job -kill job_1490524209136_0004
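On newer releases the hadoop job command is deprecated; the mapred CLI exposes the same operation:
mapred job -kill job_1490524209136_0004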
To see the list of running Hadoop daemons (JVM processes),
sudo jps
To get the list of Hadoop directories:
hadoop fs -ls /
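-ls also takes an -R flag to walk a directory tree recursively, eg:
hadoop fs -ls -R /user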
PuTTY Usage
Give the host name along with the user id:
eg: ubuntu@ec2-13-126-209-11.ap-south-1.compute.amazonaws.com
OR
ubuntu@13.126.197.33
For cloud instances, first generate a keypair and use PuTTYgen to convert it into a PPK file.
In PuTTY, under Connection -> SSH -> Auth, provide the path of the generated PPK file and click Open.
The session can also be saved for later reuse.
The PPK file applies to cloud instances; in the case of a datacentre, a username and password have to be entered in the terminal.
WinSCP Usage: To transfer files to a datanode,
provide the host name as above,
Eg: ubuntu@13.126.71.73
Click Advanced -> SSH -> Authentication and provide the PPK file path.
To install the AWS CLI (on Ubuntu/Debian):
sudo apt install awscli
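Once installed, a quick sanity check and credential setup (keys and region are prompted for) can be done with:
aws --version
aws configure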
To get the NameNode's host name,
hdfs getconf -namenodes
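hdfs getconf can also read individual configuration keys; for example, the default filesystem URI:
hdfs getconf -confKey fs.defaultFS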
To view the contents of an HDFS file,
hadoop fs -cat /user/cert/problem/solution/part-m-00000
Copy a file from source to destination
hadoop fs -cp /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2
Move a file from source to destination.
hdfs dfs -mkdir /user/cloudera/problem2
hdfs dfs -mv /user/cloudera/products /user/cloudera/problem2/products
Display the aggregate length of a file.
hdfs dfs -du /user/cloudera/problem2/products/part-m-00000
Display the size of sub-directories in the parent directory (-h displays sizes in human-readable units).
hdfs dfs -du -h /user/cloudera/problem2/products
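Adding -s summarises the whole directory into a single total instead of listing each entry:
hdfs dfs -du -s -h /user/cloudera/problem2/products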
Display last few lines of a file.
hdfs dfs -tail /user/cloudera/problem2/products/part-m-00000
Commands for operating on services in Cloudera, like Hive, Hue, etc.
To get the list of YARN applications running on the cluster,
yarn application -list
To kill a running YARN application,
eg:
yarn application -kill application_1542646340855_17805
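To inspect an application's state before killing it, -status takes the same application id:
yarn application -status application_1542646340855_17805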
A few points:
Scripts for all the services on the machine live under /etc/init.d. /etc/init.d contains the scripts used by the System V init tools (SysVinit). This is the traditional service management package for Linux, containing the init program (the first process that runs when the kernel has finished initializing) as well as some infrastructure to start and stop services and configure them. Specifically, files in /etc/init.d are shell scripts that respond to start, stop and restart commands.
To start all Hadoop services, use:
for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x start ; done
This will start all services whose names start with hadoop-.
To start an individual service, look up its name under /etc/init.d/ and run, for example, sudo service zookeeper-server start.
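The same loop pattern, with stop in place of start, shuts the services down again:
for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x stop ; done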
List the running YARN applications on the node:
yarn application -list
Status of all the running services
service --status-all
Restart HMaster
sudo service hbase-master restart
Note: In case of the error "HBase master daemon is dead and pid file exists [FAILED]", restart HMaster using the above command.
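To confirm the daemon came back up, the standard service status check applies here as well:
sudo service hbase-master status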
Command to start the ZooKeeper service:
sudo service zookeeper-server start
To kill an existing running YARN application:
yarn application -kill <APPLICATIONID>
eg:
yarn application -kill application_1542646340855_17805