1. Overview
- This document describes how to set up a highly available (HA) Hadoop HDFS cluster.
- The HDFS HA cluster depends on ZooKeeper. For the ZooKeeper cluster setup, see the previous article: 01. ZooKeeper Cluster Setup.
- ZooKeeper endpoints:
- zookeeper001.local.com:2181
- zookeeper002.local.com:2181
- zookeeper003.local.com:2181
2. Server Information
| Hostname | IP | CPU | Memory | Disk | OS |
| --- | --- | --- | --- | --- | --- |
| hdfs001.local.com | 172.21.0.21 | 4 cores | 8 GiB | 200 GiB | CentOS 7.x |
| hdfs002.local.com | 172.21.0.22 | 4 cores | 8 GiB | 200 GiB | CentOS 7.x |
| hdfs003.local.com | 172.21.0.23 | 4 cores | 8 GiB | 200 GiB | CentOS 7.x |
- Commands to set the hostnames:
# Run on 172.21.0.21
$ sudo hostnamectl set-hostname hdfs001.local.com
# Run on 172.21.0.22
$ sudo hostnamectl set-hostname hdfs002.local.com
# Run on 172.21.0.23
$ sudo hostnamectl set-hostname hdfs003.local.com
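- Optional check: printing the hostname again confirms the change took effect, e.g. on hdfs001:
# The static hostname should now be the FQDN set above
$ hostnamectl status
$ hostname -f
hdfs001.local.com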
3. Create the Deployment Account
# Create hdfsuser on each of hdfs001, hdfs002 and hdfs003
# Create the account
$ sudo useradd -m -s /bin/bash -r hdfsuser
# Set its password
$ sudo passwd hdfsuser
# Grant the account sudo privileges
$ sudo vim /etc/sudoers
hdfsuser ALL=(ALL) NOPASSWD: ALL
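- Optional check: switching to the new account and running a command through sudo should succeed without a password prompt.
# Verify the account and passwordless sudo
$ su - hdfsuser
$ sudo whoami
root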
4. Adjust Security Settings
- Run the following on hdfs001, hdfs002 and hdfs003.
- Disable SELinux
# Disable temporarily
$ sudo setenforce 0
# Disable permanently
$ sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
- Stop and disable the firewall
$ sudo systemctl stop firewalld
$ sudo systemctl disable firewalld
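- Optional check: both changes can be verified on the spot.
# SELinux should report Permissive now (Disabled after a reboot)
$ getenforce
Permissive
# firewalld should no longer be running
$ systemctl is-active firewalld
inactive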
5. Configure Passwordless SSH Between Servers
- Run the following on all three servers (hdfs001, hdfs002, hdfs003).
- Edit /etc/hosts and add the hostname mappings:
$ sudo vim /etc/hosts
172.21.0.11 zookeeper001.local.com
172.21.0.12 zookeeper002.local.com
172.21.0.13 zookeeper003.local.com
172.21.0.21 hdfs001.local.com
172.21.0.22 hdfs002.local.com
172.21.0.23 hdfs003.local.com
- Generate the server key pair
# Switch to the hdfsuser account
$ su - hdfsuser
# Press Enter at every prompt
$ ssh-keygen
- Copy the public key to every server
$ ssh-copy-id hdfs001.local.com
$ ssh-copy-id hdfs002.local.com
$ ssh-copy-id hdfs003.local.com
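- Optional check: logging in from any node to another should no longer ask for a password.
# Should print the remote hostname without a password prompt
$ ssh hdfs002.local.com hostname
hdfs002.local.com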
6. Deploy the Java Environment
- Extract jdk1.8.0_151.tar.gz and move it into /usr/java
$ sudo mkdir -p /usr/java
$ sudo tar -xvf jdk1.8.0_151.tar.gz
$ sudo mv jdk1.8.0_151 /usr/java
- Edit /etc/profile.d/javaenv.sh to configure the Java environment variables
$ sudo vim /etc/profile.d/javaenv.sh
#!/bin/bash
#java
export JAVA_HOME=/usr/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
- Apply the configuration
$ source /etc/profile.d/javaenv.sh
- Verify
$ java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
7. Set Up the HDFS Cluster
7.1 Configuration
- Extract hadoop-3.3.3.tar.gz and move it into /opt/software
$ tar -xvf hadoop-3.3.3.tar.gz
# Create the installation directory
$ sudo mkdir /opt/software
$ sudo chown -R hdfsuser:hdfsuser /opt/software
# Move the extracted directory into /opt/software
$ mv hadoop-3.3.3 /opt/software
- Edit /etc/profile.d/hadoopenv.sh and add the Hadoop environment variables
$ sudo vim /etc/profile.d/hadoopenv.sh
#!/bin/bash
#hadoop
export HADOOP_PREFIX=/opt/software/hadoop-3.3.3
export HADOOP_HOME=/opt/software/hadoop-3.3.3
export HADOOP_HDFS_HOME=/opt/software/hadoop-3.3.3
export HADOOP_CONF_DIR=/opt/software/hadoop-3.3.3/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=/opt/software/hadoop-3.3.3/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
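- Optional check: after saving the script, sourcing it and asking Hadoop for its version confirms the variables are picked up.
$ source /etc/profile.d/hadoopenv.sh
$ hadoop version
Hadoop 3.3.3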
- Edit hadoop-env.sh
$ cd /opt/software/hadoop-3.3.3/etc/hadoop
$ vim hadoop-env.sh
# Add the following; the Hadoop daemons run as the current system user
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HDFS_NAMENODE_USER="hdfsuser"
export HDFS_DATANODE_USER="hdfsuser"
export HDFS_SECONDARYNAMENODE_USER="hdfsuser"
export YARN_RESOURCEMANAGER_USER="hdfsuser"
export YARN_NODEMANAGER_USER="hdfsuser"
- Edit core-site.xml. The default filesystem points to the nameservice ns1, which is served by the two NameNodes hdfs001.local.com and hdfs002.local.com (defined later in hdfs-site.xml), so the metadata service stays available if one of them fails.
$ cd /opt/software/hadoop-3.3.3/etc/hadoop
$ vim core-site.xml
<configuration>
<!-- Set the HDFS nameservice to ns1 -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1/</value>
</property>
<!-- Proxy-user settings for hdfsuser -->
<property>
<name>hadoop.proxyuser.hdfsuser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfsuser.groups</name>
<value>*</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/hdata/hadoop_data/temDir</value>
</property>
<!-- ZooKeeper quorum used for HA coordination -->
<property>
<name>ha.zookeeper.quorum</name>
<value>zookeeper001.local.com:2181,zookeeper002.local.com:2181,zookeeper003.local.com:2181</value>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
<description>The FileSystem for hdfs: uris.</description>
</property>
</configuration>
- Edit hdfs-site.xml
$ cd /opt/software/hadoop-3.3.3/etc/hadoop
$ vim hdfs-site.xml
<configuration>
<!-- HDFS nameservice; must match the value in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<!-- ns1 has two NameNodes: nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>hdfs001.local.com:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>hdfs001.local.com:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>hdfs002.local.com:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>hdfs002.local.com:50070</value>
</property>
<!-- JournalNode URIs where the NameNodes keep the shared edit log -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hdfs001.local.com:8485;hdfs002.local.com:8485;hdfs003.local.com:8485/ns1</value>
</property>
<!-- Local directory where each JournalNode stores its data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/hdata/hadoop_data/journaldata</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Failover proxy provider used by HDFS clients -->
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; list one method per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- sshfence requires passwordless SSH; point this at the hdfsuser key -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hdfsuser/.ssh/id_rsa</value>
</property>
<!-- sshfence connection timeout in milliseconds -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<!-- Number of block replicas -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hdata/hadoop_data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hdata/hadoop_data/datanode</value>
</property>
</configuration>
- Edit mapred-site.xml
$ cd /opt/software/hadoop-3.3.3/etc/hadoop
$ vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- Edit yarn-site.xml
$ cd /opt/software/hadoop-3.3.3/etc/hadoop
$ vim yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- ResourceManager cluster id -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- Logical ids of the two ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Hostnames of the ResourceManagers -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hdfs001.local.com</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hdfs002.local.com</value>
</property>
<!-- ZooKeeper ensemble address -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>zookeeper001.local.com:2181,zookeeper002.local.com:2181,zookeeper003.local.com:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<!-- NodeManager memory; keep this within the node's physical RAM (8 GiB on these servers) -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>6144</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>6144</value>
</property>
</configuration>
- Configure the authorized account: edit hadoop-policy.xml and change the * in the following entry to hdfsuser
$ cd /opt/software/hadoop-3.3.3/etc/hadoop
$ vim hadoop-policy.xml
<property>
<name>security.client.protocol.acl</name>
<value>hdfsuser</value>
</property>
- Edit the workers file
$ cd /opt/software/hadoop-3.3.3/etc/hadoop
$ vim workers
hdfs001.local.com
hdfs002.local.com
hdfs003.local.com
- Copy hadoop-3.3.3 to /opt/software on the other nodes (see the copy sketch below) and create the data directories on every node
$ sudo mkdir /hdata
$ sudo chown -R hdfsuser /hdata
$ mkdir -p /hdata/hadoop_data/{namenode,datanode,temDir,journaldata}
- On the other nodes, add the Hadoop environment variables under /etc/profile.d as above, then source them
$ source /etc/profile.d/hadoopenv.sh
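- A minimal copy sketch, run from hdfs001.local.com as hdfsuser, assuming /opt/software has already been created and chowned to hdfsuser on hdfs002 and hdfs003 as in 7.1:
# Copy the configured Hadoop installation to the other two nodes
$ scp -r /opt/software/hadoop-3.3.3 hdfs002.local.com:/opt/software/
$ scp -r /opt/software/hadoop-3.3.3 hdfs003.local.com:/opt/software/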
7.2 First Start
- Start the JournalNodes
# Run on hdfs001, hdfs002 and hdfs003
$ cd /opt/software/hadoop-3.3.3/sbin
$ ./hadoop-daemon.sh start journalnode
# Check that the JournalNode process is running
$ jps
1539 JournalNode
- Format HDFS
# Format the NameNode on hdfs001.local.com
$ hdfs namenode -format
# After formatting, start the NameNode on hdfs001.local.com
$ hdfs --daemon start namenode
- Start the standby NameNode
# On hdfs002.local.com, sync the formatted metadata from hdfs001.local.com
$ hdfs namenode -bootstrapStandby
# Then start the NameNode on hdfs002.local.com
$ hdfs --daemon start namenode
- Format the ZKFC znode (run on hdfs001.local.com only)
# Answer y at the interactive prompt
$ hdfs zkfc -formatZK
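- Optional check: the resulting znode can be inspected from the ZooKeeper side, assuming the zkCli.sh shipped with the ZooKeeper installation from the previous article is available.
# /hadoop-ha should now contain the ns1 nameservice
$ zkCli.sh -server zookeeper001.local.com:2181 ls /hadoop-ha
[ns1]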
- Start HDFS on hdfs001.local.com
$ cd /opt/software/hadoop-3.3.3/sbin
$ ./start-dfs.sh
- Start YARN (optional; skip it if the Hadoop compute layer is not needed)
$ cd /opt/software/hadoop-3.3.3/sbin
$ ./start-yarn.sh
- At this point the Hadoop HA cluster is up. Common HDFS commands can be used to test it, for example the smoke test below.
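- A minimal smoke test, run as hdfsuser on any node (the paths and file names here are only examples):
# Create a directory, upload a file, and read it back
$ hdfs dfs -mkdir -p /tmp/smoketest
$ hdfs dfs -put /etc/hosts /tmp/smoketest/
$ hdfs dfs -ls /tmp/smoketest
$ hdfs dfs -cat /tmp/smoketest/hosts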
7.3 Subsequent Starts
- Start HDFS on hdfs001.local.com
$ cd /opt/software/hadoop-3.3.3/sbin
$ ./start-dfs.sh
- Start YARN on hdfs001.local.com as needed
$ cd /opt/software/hadoop-3.3.3/sbin
$ ./start-yarn.sh
7.4 Check Cluster Status
- Get the NameNode states
# State of hdfs001.local.com
$ hdfs haadmin -getServiceState nn1
# State of hdfs002.local.com
$ hdfs haadmin -getServiceState nn2
- Check whether the cluster is in safe mode
$ hdfs dfsadmin -safemode get
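- For a fuller health check, the DataNode report and the NameNode web UIs configured above (http://hdfs001.local.com:50070 and http://hdfs002.local.com:50070) can also be consulted.
# Show cluster capacity and the list of live DataNodes
$ hdfs dfsadmin -report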