Multi-Node Cluster#

Hadoop Master: 192.168.100.213 (tc213)
Hadoop Slave: 192.168.100.211 (tc211)
Hadoop Slave: 192.168.100.212 (tc212)

Installing Java#

export JAVA_HOME=/usr/local/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin

Creating User Account#

# useradd hadoop
# passwd hadoop

Mapping the nodes#

vi /etc/hosts

192.168.100.213 tc213
192.168.100.211 tc211
192.168.100.212 tc212
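
A missing or mistyped entry in /etc/hosts usually shows up later as a daemon that cannot reach its peers. The following is a minimal sketch of a check that every node name has an entry; the function name `missing_hosts` is hypothetical and not part of Hadoop.

```shell
# Print every expected hostname that has no entry in the given hosts file.
# Usage: missing_hosts <hosts-file> <hostname>...
# (missing_hosts is an illustrative helper, not a Hadoop tool.)
missing_hosts() {
  local hosts_file=$1; shift
  local name
  for name in "$@"; do
    if ! awk -v h="$name" '
          /^[[:space:]]*#/ { next }            # skip whole-line comments
          {
            for (i = 2; i <= NF; i++) {        # field 1 is the IP, the rest are names
              if ($i ~ /^#/) break             # stop at a trailing comment
              if ($i == h) found = 1
            }
          }
          END { exit (found ? 0 : 1) }' "$hosts_file"; then
      echo "$name"
    fi
  done
}
```

Running `missing_hosts /etc/hosts tc213 tc211 tc212` on each node prints nothing when all three names are mapped.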

Installing Hadoop#

# cd /usr/local
# wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
# tar -xzf hadoop-2.9.2.tar.gz
# mv hadoop-2.9.2 hadoop
# chown -R hadoop /usr/local/hadoop
# cd /usr/local/hadoop/

Configuring Hadoop#

core-site.xml#

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://tc213:9000/</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>file:/home/hadoop/cluster</value>
	</property>
</configuration>
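
To confirm that an edited file holds the value you expect, a small reader like the one below can help. It is only a sketch: `get_prop` is a hypothetical helper, and it assumes each `<name>` and `<value>` sits on its own line, as in the file above. It is not a general XML parser.

```shell
# Print the <value> of the named property from a Hadoop *-site.xml file.
# Usage: get_prop <file> <property-name>
# Assumes one <name> or <value> element per line (illustrative helper only).
get_prop() {
  awk -v key="$2" '
    /<name>/ {
      n = $0
      gsub(/.*<name>|<\/name>.*/, "", n)       # keep only the text inside <name>
    }
    /<value>/ {
      v = $0
      gsub(/.*<value>|<\/value>.*/, "", v)     # keep only the text inside <value>
      if (n == key) print v
      n = ""                                   # reset for the next property
    }
  ' "$1"
}
```

For example, `get_prop /usr/local/hadoop/etc/hadoop/core-site.xml fs.defaultFS` should print `hdfs://tc213:9000/` for the configuration shown here. On an installed node, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves.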

hdfs-site.xml#

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>3</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/home/hadoop/cluster/dfs/data</value>
		<final>true</final>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/home/hadoop/cluster/dfs/name</value>
		<final>true</final>
	</property>
</configuration>

mapred-site.xml#

<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>

yarn-site.xml#

<configuration>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>tc213</value>
	</property>
</configuration>

hadoop-env.sh#

export JAVA_HOME=/usr/local/jdk1.8.0_181
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

Installing Hadoop on Slave Servers#

$ scp -r hadoop tc211:/usr/local/hadoop
$ scp -r hadoop tc212:/usr/local/hadoop
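
With more than a couple of slaves, the copy is easier to run as a loop. The sketch below wraps the same `scp` command; the function name `distribute_hadoop` and the `DRY_RUN` variable are hypothetical conveniences, and the destination path simply mirrors the commands above.

```shell
# Copy the Hadoop directory to each slave, or only print the commands
# when DRY_RUN=1. Usage: distribute_hadoop <source-dir> <host>...
# (distribute_hadoop and DRY_RUN are illustrative names.)
distribute_hadoop() {
  local src=$1; shift
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "scp -r $src $host:/usr/local/hadoop"
    else
      scp -r "$src" "$host:/usr/local/hadoop"
    fi
  done
}
```

Setting `DRY_RUN=1` before calling `distribute_hadoop hadoop tc211 tc212` lets you review the commands before anything is transferred.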

Configuring Hadoop on Master Server#

Configuring Master Node#

vi /usr/local/hadoop/etc/hadoop/masters
tc213

Configuring Slave Node#

vi /usr/local/hadoop/etc/hadoop/slaves
tc211
tc212
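
The number of hosts in the slaves file bounds how many replicas HDFS can place: with dfs.replication set to 3 and only two DataNodes, blocks are reported as under-replicated. The sketch below compares the two numbers; `count_workers` and `check_replication` are hypothetical helper names.

```shell
# Count the hosts listed in a slaves file (blank lines and comments ignored).
count_workers() {
  grep -cvE '^[[:space:]]*(#|$)' "$1"
}

# Compare a replication factor with the number of listed DataNodes.
# Usage: check_replication <replication> <slaves-file>
# Returns non-zero when there are fewer DataNodes than replicas requested.
check_replication() {
  local replication=$1 workers
  workers=$(count_workers "$2")
  if [ "$replication" -gt "$workers" ]; then
    echo "under-replicated: dfs.replication=$replication but only $workers DataNode(s)"
    return 1
  fi
  echo "ok: dfs.replication=$replication, $workers DataNode(s)"
}
```

For this cluster, `check_replication 3 /usr/local/hadoop/etc/hadoop/slaves` reports the mismatch; lowering dfs.replication to 2 or adding a third DataNode would resolve it.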

Format Name Node on Hadoop Master#

hdfs namenode -format
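
Formatting erases existing HDFS metadata, so it is worth checking whether the NameNode directory was already formatted before running the command again. A formatted directory contains a `current/VERSION` file; the sketch below tests for it. The function name `namenode_is_formatted` is hypothetical.

```shell
# Succeed if the given NameNode directory already holds formatted metadata.
# Usage: namenode_is_formatted <name-dir>
# (namenode_is_formatted is an illustrative helper, not a Hadoop command.)
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Example guard (commented out so that nothing is formatted by accident):
# if namenode_is_formatted /home/hadoop/cluster/dfs/name; then
#   echo "NameNode already formatted, skipping"
# else
#   hdfs namenode -format
# fi
```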

Starting Hadoop Services#

mkdir -p /home/hadoop/cluster/dfs/name
mkdir -p /home/hadoop/cluster/dfs/data

cd $HADOOP_HOME/sbin
./start-all.sh

Results#

cluster	hostname	IP	jps
Master	tc213	192.168.100.213	NameNode, SecondaryNameNode, ResourceManager
Slave	tc211	192.168.100.211	DataNode, NodeManager
Slave	tc212	192.168.100.212	DataNode, NodeManager
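
The table can be checked mechanically by comparing the `jps` output on each node with the daemons expected there. The following is a sketch under the assumption that `jps` prints one `<pid> <class name>` pair per line; `missing_daemons` is a hypothetical helper name.

```shell
# Print each expected daemon that does not appear in the given jps output.
# Usage: missing_daemons "<jps output>" <daemon>...
# (missing_daemons is an illustrative helper.)
missing_daemons() {
  local jps_output=$1; shift
  local daemon
  for daemon in "$@"; do
    # Compare the whole second column so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$jps_output" |
      awk -v d="$daemon" '$2 == d { ok = 1 } END { exit (ok ? 0 : 1) }' ||
      echo "$daemon"
  done
}
```

On the master you might run `missing_daemons "$(jps)" NameNode SecondaryNameNode ResourceManager`, and on a slave `missing_daemons "$(jps)" DataNode NodeManager`; empty output means every expected daemon is running.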

Master#

tail -f -n 99 /usr/local/hadoop/logs/hadoop-hadoop-namenode-tc213.log
tail -f -n 99 /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-tc213.log
tail -f -n 99 /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-tc213.log

Slave#

tail -f -n 99 /usr/local/hadoop/logs/hadoop-hadoop-datanode-tc211.log
tail -f -n 99 /usr/local/hadoop/logs/yarn-hadoop-nodemanager-tc211.log

Adding a New DataNode in the Hadoop Cluster#

Networking#

IP address : 192.168.100.214 
netmask : 255.255.255.0
hostname : tc214

Adding User and SSH Access#

Add a User#

useradd hadoop
passwd hadoop

Execute the following on the master#

mkdir -p $HOME/.ssh
chmod 700 $HOME/.ssh
ssh-keygen -t rsa -P '' -f $HOME/.ssh/id_rsa
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
chmod 644 $HOME/.ssh/authorized_keys
Copy the public key to the hadoop user's $HOME directory on the new slave node:
scp $HOME/.ssh/id_rsa.pub hadoop@192.168.100.214:/home/hadoop/

Execute the following on the slaves#

su hadoop
ssh -X hadoop@192.168.100.214
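
For key-based login to work, the public key copied from the master still has to be appended to the hadoop user's authorized_keys file on the new node. The sketch below does that without adding the key twice; `install_pubkey` is a hypothetical helper name, and the stricter 600 permission is a choice made here, not something the steps above require.

```shell
# Append a public key to <ssh-dir>/authorized_keys unless it is already there.
# Usage: install_pubkey <pubkey-file> <ssh-dir>
# (install_pubkey is an illustrative helper.)
install_pubkey() {
  local pubkey_file=$1 ssh_dir=$2
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
  touch "$ssh_dir/authorized_keys"
  # -x matches whole lines, -F treats the key as a fixed string, not a regex.
  grep -qxF -f "$pubkey_file" "$ssh_dir/authorized_keys" ||
    cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"
}
```

On the new node this would be called as `install_pubkey /home/hadoop/id_rsa.pub /home/hadoop/.ssh`, after which `ssh hadoop@192.168.100.214` from the master should no longer ask for a password.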

Set Hostname of New Node#

vi /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=tc214

vi /etc/hosts

192.168.100.214 tc214

Start the DataNode on New Node#

Start HDFS on a newly added slave node by using the following command#

$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager

Check the output of jps command on a new node. It looks as follows.#

$ jps
7141 DataNode
10312 Jps

Removing a DataNode from the Hadoop Cluster#

Change cluster configuration: hdfs-site.xml

<property>
	<name>dfs.hosts.exclude</name>
	<value>/home/hadoop/cluster/hdfs_exclude</value>
</property>

List the hosts to decommission, one per line, in /home/hadoop/cluster/hdfs_exclude
```
192.168.100.214
```
Force a configuration reload: $HADOOP_HOME/bin/hdfs dfsadmin -refreshNodes
Check the decommission status, and shut the node down only once it is reported as decommissioned: $HADOOP_HOME/bin/hdfs dfsadmin -report

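Once the node has been decommissioned, remove it from the excludes file and run -refreshNodes again.

Hadoop 2.x has no TaskTracker; the YARN NodeManager takes its place and can be stopped or started on the node at any time:

$HADOOP_HOME/sbin/yarn-daemon.sh stop nodemanager
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager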

Results#

Name: 192.168.100.214:50010 (tc214)
Hostname: tc214
Decommission Status : Decommission in progress

Name: 192.168.100.214:50010 (tc214)
Hostname: tc214
Decommission Status : Decommissioned
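
Because decommissioning can take a while, it is convenient to pull the status of a single host out of the report rather than reading the whole output. The sketch below filters text in the format shown above; `decommission_status` is a hypothetical helper name.

```shell
# Read `hdfs dfsadmin -report` output on stdin and print the decommission
# status of one host. Usage: ... | decommission_status <hostname>
# (decommission_status is an illustrative helper.)
decommission_status() {
  awk -v host="$1" '
    $1 == "Hostname:" { current = $2 }           # remember which node we are in
    /^Decommission Status/ && current == host {
      sub(/^Decommission Status *: */, "")       # keep only the status text
      print
    }
  '
}
```

A typical call would be `$HADOOP_HOME/bin/hdfs dfsadmin -report | decommission_status tc214`, repeated until it prints `Decommissioned`.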

http://192.168.100.213:50070/dfshealth.html#tab-datanode
hadoop-daemon.sh stop datanode
yarn-daemon.sh stop nodemanager
