# Hadoop 2.7.3 Installation

This post records the installation and basic configuration of Hadoop 2.7.3.

## Installation Environment

- jdk1.8
- hadoop 2.7.3
- CentOS release 6.7 (Final) * 3

| hostname | ip |
|---|---|
| master | 172.168.170.84 |
| slave1 | 172.168.170.88 |
| slave2 | 172.168.170.89 |
## Required Software

- JDK installation (download link)
- ssh installation

Hadoop uses ssh for login authentication between the nodes of the cluster, and passwordless ssh login must also be set up.

```bash
sudo apt-get install ssh
# On CentOS (the environment above): sudo yum install openssh-server openssh-clients
```

- rsync installation

Ubuntu 12.10 already ships with rsync.

```bash
sudo apt-get install rsync
# On CentOS: sudo yum install rsync
```

- hadoop download

Download the matching hadoop release from the official mirrors.
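For example, a minimal sketch of fetching and unpacking the release (the exact mirror URL is an assumption; archive.apache.org retains all past releases):

```bash
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
tar -xzf hadoop-2.7.3.tar.gz    # unpack; the install section below moves it into place
```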
## Installing Hadoop

- Create the hadoop group and user, then log back into Linux as the hadoop user.

```bash
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
# On CentOS: sudo groupadd hadoop && sudo useradd -g hadoop hadoop
```

- Extract hadoop into /home/hadoop/local/opt.
- Configure the hadoop environment variables (typically appended to the hadoop user's ~/.bashrc):
```bash
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk.x86_64/
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=$HOME/local/opt/hadoop-2.7.3
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```
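To apply the variables and sanity-check the setup, something like this (assuming the exports above went into ~/.bashrc):

```bash
source ~/.bashrc
hadoop version    # should print "Hadoop 2.7.3"
```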
- Enter the hadoop-2.7.3/etc/hadoop directory and edit the core-site.xml file:
```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/local/var/hadoop/tmp</value>
  </property>
</configuration>
```

- Edit the hdfs-site.xml file:
```xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
    <description>HTTP address of the SecondaryNameNode web UI, for checking HDFS status in a browser</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/local/var/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/local/var/hadoop/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Each block is stored as 1 replica</description>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
```
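The local paths referenced in core-site.xml and hdfs-site.xml must be writable by the hadoop user, so it is safest to create them up front on every node (paths as configured above):

```bash
mkdir -p /home/hadoop/local/var/hadoop/tmp
mkdir -p /home/hadoop/local/var/hadoop/hdfs/namenode
mkdir -p /home/hadoop/local/var/hadoop/hdfs/datanode
```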
- Edit the mapred-site.xml file. This configures MapReduce jobs; since hadoop 2.x runs on the YARN framework, a distributed deployment requires setting the mapreduce.framework.name property to yarn. mapred.map.tasks and mapred.reduce.tasks set the number of map and reduce tasks, respectively.
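Note that Hadoop 2.7.3 ships only a template for this file, so create it from the template first:

```bash
cd $HADOOP_CONF_DIR
cp mapred-site.xml.template mapred-site.xml
```

Then fill it in as below.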
```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
```
- Edit the yarn-site.xml file. The yarn.nodemanager.resource.memory-mb property caps the memory YARN may allocate on each NodeManager (8192 MB here), so set it to fit the node's actual RAM:

```xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
</configuration>
```
- Edit the slaves file:

```
slave1
slave2
```

- Edit the hosts file to give each node a name:

```
127.0.0.1 localhost
172.168.170.84 master
172.168.170.88 slave1
172.168.170.89 slave2
```

- Set up passwordless ssh login between the nodes.
On the master node, generate a key pair and append the public key to the .ssh/authorized_keys file:

```bash
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```

Then send the master's /etc/hosts file and .ssh/authorized_keys file to slave1 and slave2:

```bash
# Note: overwriting /etc/hosts on the slaves requires root privileges there
scp /etc/hosts hadoop@slave1:/etc/hosts
scp /etc/hosts hadoop@slave2:/etc/hosts
scp /home/hadoop/.ssh/authorized_keys hadoop@slave1:/home/hadoop/.ssh/authorized_keys
scp /home/hadoop/.ssh/authorized_keys hadoop@slave2:/home/hadoop/.ssh/authorized_keys
```

Once that is done, test the passwordless login:

```bash
ssh slave1
ssh slave2
```
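If the login still prompts for a password, sshd is usually rejecting over-permissive key files; tightening the permissions on each node tends to fix it:

```bash
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```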
- Copy the hadoop-2.7.3 directory to slave1 and slave2:

```bash
scp -r /home/hadoop/local/opt/hadoop-2.7.3 hadoop@slave1:/home/hadoop/local/opt/
scp -r /home/hadoop/local/opt/hadoop-2.7.3 hadoop@slave2:/home/hadoop/local/opt/
```
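Since rsync was installed earlier, it can also replace scp -r when the tree has to be re-synced after a config change; only the differences are transferred:

```bash
rsync -az /home/hadoop/local/opt/hadoop-2.7.3/ hadoop@slave1:/home/hadoop/local/opt/hadoop-2.7.3/
rsync -az /home/hadoop/local/opt/hadoop-2.7.3/ hadoop@slave2:/home/hadoop/local/opt/hadoop-2.7.3/
```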
## Starting Hadoop
- On the master node, as the hadoop user, format the NameNode:

```bash
hdfs namenode -format
# Watch the console output; "Exiting with status 0" means the format succeeded.
# On errors, delete the temporary files under the var directory first, then rerun the command.
```

- Start hadoop:
```bash
# Start HDFS
start-dfs.sh
# Start the YARN distributed compute framework
start-yarn.sh
```
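The jps listing below also expects a JobHistoryServer on the master; start-dfs.sh and start-yarn.sh do not launch it, so start it separately:

```bash
mr-jobhistory-daemon.sh start historyserver
```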
- Use the jps command to check how the hadoop cluster is running.

On the master node:

```
Jps
NameNode
ResourceManager
SecondaryNameNode
JobHistoryServer
```

On the slave nodes:

```
Jps
DataNode
NodeManager
```

- Check the cluster status at the following URLs (50070 is the HDFS NameNode web UI, 8088 the YARN ResourceManager UI):
```
http://172.168.170.84:50070
http://172.168.170.84:8088
```
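As a final smoke test, the bundled example job exercises HDFS and YARN end to end (jar path as shipped with 2.7.3):

```bash
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 10
```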