Parallel processing: Set up Hadoop for a multi-node cluster - MapReduce on Eclipse

- Reference links:

(Ubuntu)

https://medium.com/@jootorres_11979/how-to-set-up-a-hadoop-3-2-1-multi-node-cluster-on-ubuntu-18-04-2-nodes-567ca44a3b12 

https://www.linode.com/docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/

(Windows)

https://gist.github.com/addingama/f665914340ec26f7efa80e86f53622e1 

(MapReduce on Eclipse)

https://www.shabdar.org/hadoop-java/138-how-to-create-and-run-eclipse-project-with-a-mapreduce-sample.html

http://users.encs.concordia.ca/~moa_ali/Comp6521Lab/preparing_hadoop_in_eclipse_luna/

(download)

https://archive.apache.org/dist/hadoop/core/

 

Lab notes:

/usr/local/hadoop/share/hadoop/mapreduce
/usr/local/hadoop/share/hadoop/common/lib
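These are the directories whose jars the Eclipse guides above add to the project's build path. A quick way to confirm the jars are in place (assuming the default /usr/local/hadoop install used in the Ubuntu guide):

ls /usr/local/hadoop/share/hadoop/mapreduce/*.jar
ls /usr/local/hadoop/share/hadoop/common/lib/*.jar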

 

Prepare
>> hdfs namenode -format
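
On the Ubuntu cluster (the Windows steps are below), the daemons can then be started from the master node; this assumes $HADOOP_HOME/sbin is on the PATH:

>> start-dfs.sh
>> start-yarn.sh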

 


Running
Run the command prompt as Administrator:
>> cd C:\hadoop-2.10.0\sbin
>> start-all.cmd

Make sure these daemons are running (a quick jps check is sketched below):
Hadoop NameNode
Hadoop DataNode
YARN ResourceManager
YARN NodeManager
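
One way to verify is jps, which lists the running Java processes (assuming the JDK's jps is on the PATH); the output should include NameNode, DataNode, ResourceManager and NodeManager:
>> jps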

>> localhost:8088 to open the ResourceManager UI
>> localhost:8042 to open the NodeManager UI
>> localhost:50070 to check the health of the NameNode
>> localhost:50075 to check the DataNode
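
The same health information is also available from the command line; a quick check from any node with the Hadoop client configured:
>> hdfs dfsadmin -report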

 

 

export HADOOP_CLASSPATH=$(hadoop classpath)
echo $HADOOP_CLASSPATH 
hadoop fs -mkdir /WordCountTutorial
hadoop fs -mkdir /WordCountTutorial/Input
hadoop fs -put '/home/osboxes/Desktop/WordCountTutorial/input_data/input.txt' /WordCountTutorial/Input
cd /home/osboxes/Desktop/WordCountTutorial/
>> Create the jar file
hadoop jar '/home/osboxes/Desktop/WordCountTutorial/WordCount.jar' WordCount /WordCountTutorial/Input /WordCountTutorial/Output
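
After the job finishes, the word counts can be inspected directly in HDFS (reducer output files are conventionally named part-r-*, so a wildcard is used here):

hadoop fs -ls /WordCountTutorial/Output
hadoop fs -cat /WordCountTutorial/Output/part-*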

 


export HADOOP_CLASSPATH=$(hadoop classpath)
echo $HADOOP_CLASSPATH 
hadoop fs -mkdir /MatrixMultiply
hadoop fs -mkdir /MatrixMultiply/Input
hadoop fs -put '/home/osboxes/Desktop/MatrixMultiply/input_data/M' /MatrixMultiply/Input
hadoop fs -put '/home/osboxes/Desktop/MatrixMultiply/input_data/N' /MatrixMultiply/Input
cd /home/osboxes/Desktop/MatrixMultiply/
>> Create the jar file
hadoop jar '/home/osboxes/Desktop/MatrixMultiply/MatrixMultiply.jar' /MatrixMultiply/Input /MatrixMultiply/Output
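
As with WordCount, the result can be checked once the job completes:

hadoop fs -ls /MatrixMultiply/Output
hadoop fs -cat /MatrixMultiply/Output/part-*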


Delete files:
hadoop fs -rm -R /MatrixMultiply/Input/M.txt
hadoop fs -rm -R /MatrixMultiply/Output/_temporary
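
A MapReduce job fails if its output directory already exists, so before re-running a job the whole output directory is usually removed:

hadoop fs -rm -R /MatrixMultiply/Output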

Re-initialize the SSH connections (passwordless login between nodes):
ssh-keygen -t rsa
ssh-copy-id hadoopuser@hadoop-master
ssh-copy-id hadoopuser@hadoop-slave1
ssh-copy-id hadoopuser@hadoop-slave2
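
Passwordless login can then be verified from the master; each command should connect and return without asking for a password:
ssh hadoopuser@hadoop-slave1 exit
ssh hadoopuser@hadoop-slave2 exit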