Hadoop Distributed Installation

Cluster plan:

hostname  os       ip             role
master    centos7  192.168.3.100  NameNode, ResourceManager
master1   centos7  192.168.3.101  SecondaryNameNode
slave1    centos7  192.168.3.102  DataNode, NodeManager
slave2    centos7  192.168.3.103  DataNode, NodeManager
slave3    centos7  192.168.3.104  DataNode, NodeManager

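Base environment setup

1. Set the hostname (on each node, using that node's name from the cluster plan)
hostnamectl set-hostname newhostname
2. Disable the firewall and SELinux
systemctl stop firewalld; systemctl disable firewalld
setenforce 0; sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
3. Configure the hosts file (/etc/hosts on every node)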
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.3.100 master
192.168.3.101 master1
192.168.3.102 slave1
192.168.3.103 slave2
192.168.3.104 slave3
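
Because the same five entries have to be present on every node, the append can be made idempotent so that re-running it does not duplicate lines. The following is a minimal sketch, not part of the original steps: add_host_entry is a hypothetical helper name, and the demo writes to a scratch file created with mktemp instead of /etc/hosts.

```shell
#!/bin/sh
# Hypothetical helper: append "IP hostname" to a hosts file unless a line
# already ends with that hostname.
add_host_entry() {
    # $1 = hosts file, $2 = IP address, $3 = hostname
    if ! grep -q "[[:space:]]$3\$" "$1" 2>/dev/null; then
        printf '%s %s\n' "$2" "$3" >> "$1"
    fi
}

# Demo against a scratch file; point HOSTS_FILE at /etc/hosts (as root)
# only after checking the result.
HOSTS_FILE=$(mktemp)
while read -r ip name; do
    add_host_entry "$HOSTS_FILE" "$ip" "$name"
done <<EOF
192.168.3.100 master
192.168.3.101 master1
192.168.3.102 slave1
192.168.3.103 slave2
192.168.3.104 slave3
EOF
cat "$HOSTS_FILE"
```

Calling the helper a second time with an entry that is already present leaves the file unchanged.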
4. Set up passwordless SSH
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

vi ~/.ssh/config
host *master*
  StrictHostKeyChecking no
host *slave*
  StrictHostKeyChecking no
host localhost
  StrictHostKeyChecking no
host 127.0.0.1
  StrictHostKeyChecking no
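
The commands above authorize the key only on the machine where it was generated, while start-dfs.sh and start-yarn.sh on master need to log in to every other node as well. One way to distribute the key is ssh-copy-id, sketched below. The root account and the DRY_RUN switch are assumptions added for illustration; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Node names come from the cluster plan; the root account is an assumption.
NODES="master master1 slave1 slave2 slave3"
SSH_USER="${SSH_USER:-root}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

copy_key_to_nodes() {
    for node in $NODES; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh-copy-id -i $HOME/.ssh/id_rsa.pub $SSH_USER@$node"
        else
            # Prompts once for the remote password, then installs the key.
            ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$SSH_USER@$node"
        fi
    done
}

copy_key_to_nodes
```

Run it with DRY_RUN=0 on master once the printed commands look right.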
5. Install Java
rpm -ivh jdk-8u171-linux-x64.rpm

Set the Java environment variables

vi /etc/profile
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
6. Install Hadoop

Extract the archive to the target directory

tar xvzf hadoop-2.7.6.tar.gz -C /usr/local/
cd /usr/local
mv hadoop-2.7.6 hadoop

Set the Hadoop environment variables

vi /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Reload the profile in the current shell

source /etc/profile
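
As an alternative to editing /etc/profile twice, the Java and Hadoop variables can live in a single file under /etc/profile.d/, which /etc/profile on CentOS 7 sources for login shells. This is only a sketch: the file name hadoop.sh and the write_env_file helper are assumptions, and the demo writes to a scratch file instead of the real location.

```shell
#!/bin/sh
# Write the Java and Hadoop variables from the steps above into one file.
write_env_file() {
    # $1 = target file, e.g. /etc/profile.d/hadoop.sh (name is an assumption)
    cat > "$1" <<'EOF'
export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/usr/local/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
EOF
}

# Demo: write to a scratch file and load it into the current shell.
ENV_FILE=$(mktemp)
write_env_file "$ENV_FILE"
. "$ENV_FILE"
echo "HADOOP_HOME=$HADOOP_HOME"
```

Keeping the exports in one file also makes it easy to copy the same environment to the other nodes.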

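Hadoop configuration files

Cluster (fully distributed) mode requires changes to five configuration files in /usr/local/hadoop/etc/hadoop: slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. See the official Hadoop documentation for further settings. The commands below are run from /usr/local/hadoop.

1. slaves
    vi etc/hadoop/slaves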

    slave1
    slave2
    slave3

2. core-site.xml
    vi etc/hadoop/core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://master:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/usr/local/hadoop/tmp</value>
            <description>Abase for other temporary directories.</description>
        </property>
    </configuration>

3. hdfs-site.xml
    vi etc/hadoop/hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.namenode.http-address</name>
            <value>master:50070</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>master1:50090</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
        <property>  
            <name>dfs.namenode.name.dir</name>  
            <value>file:/usr/local/hadoop/tmp/dfs/name</value>
        </property>  
        <property>  
            <name>dfs.datanode.data.dir</name>  
            <value>file:/usr/local/hadoop/tmp/dfs/data</value>  
        </property>  
    </configuration>


4. yarn-site.xml
    vi etc/hadoop/yarn-site.xml

    <configuration>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>master</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>

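5. mapred-site.xml
    Hadoop 2.7.6 ships only mapred-site.xml.template, so copy it first.
    cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
    vi etc/hadoop/mapred-site.xml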

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>
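
A typo in a property name fails silently, so it can help to read the values back before copying the files to the other nodes. The sketch below is a hypothetical helper (get_property is not a Hadoop command) that assumes one tag per line, as in the files above; it is not a general XML parser. The demo runs on a scratch copy of the core-site.xml settings.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> that follows <name>KEY</name>.
# Assumes one tag per line; not a general XML parser.
get_property() {
    # $1 = XML file, $2 = property name
    awk -v key="$2" '
        /<name>/ { gsub(/.*<name>|<\/name>.*/, ""); found = ($0 == key); next }
        /<value>/ && found { gsub(/.*<value>|<\/value>.*/, ""); print; exit }
    ' "$1"
}

# Demo on a scratch copy of the core-site.xml settings.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
    </property>
</configuration>
EOF
get_property "$SAMPLE" fs.defaultFS
get_property "$SAMPLE" hadoop.tmp.dir
```

Once Hadoop is on the PATH, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves, which is the more authoritative check.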

Copy the files to the other nodes

scp -r /usr/local/hadoop slave1:/usr/local/
scp -r /usr/local/hadoop slave2:/usr/local/
scp -r /usr/local/hadoop slave3:/usr/local/
scp -r /usr/local/hadoop master1:/usr/local/
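
The four copies can also be written as a loop that stops at the first failure. This is a sketch under the assumption that passwordless SSH to every node already works; the DRY_RUN switch is added for illustration, and by default the script only prints the commands.

```shell
#!/bin/sh
# Every node except master, which already holds the configured tree.
TARGETS="master1 slave1 slave2 slave3"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

distribute_hadoop() {
    for node in $TARGETS; do
        cmd="scp -r /usr/local/hadoop $node:/usr/local/"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            # Stop at the first node that cannot be reached.
            $cmd || { echo "copy to $node failed" >&2; return 1; }
        fi
    done
}

distribute_hadoop
```

Run it with DRY_RUN=0 on master to perform the copies.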

Format the NameNode on the master node

hdfs namenode -format

Start HDFS on the master node

start-dfs.sh

Start YARN on the master node

start-yarn.sh
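
After both start scripts finish, each node should be running the daemons assigned to it in the cluster plan. The sketch below maps host names to expected daemons and compares them with the output of jps, the process lister that ships with the JDK. The helper names are hypothetical, and check_node assumes that jps is on the PATH of a non-interactive SSH session, which may not hold on every setup.

```shell
#!/bin/sh
# Expected daemons per host, taken from the cluster plan.
expected_daemons() {
    case "$1" in
        master)  echo "NameNode ResourceManager" ;;
        master1) echo "SecondaryNameNode" ;;
        slave*)  echo "DataNode NodeManager" ;;
        *)       return 1 ;;
    esac
}

# Compare the jps output of one node with its expected daemons.
# Assumes jps is on the PATH of a non-interactive SSH session.
check_node() {
    running=$(ssh "$1" jps 2>/dev/null)
    for daemon in $(expected_daemons "$1"); do
        echo "$running" | grep -qw "$daemon" || { echo "$1: $daemon missing"; return 1; }
    done
    echo "$1: ok"
}

# Print the expectation table; call check_node <host> on a live cluster.
for node in master master1 slave1 slave2 slave3; do
    echo "$node: $(expected_daemons "$node")"
done
```

The -w flag keeps NameNode from matching inside SecondaryNameNode.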

Web interfaces

NameNode            http://192.168.3.100:50070        Default HTTP port is 50070
SecondaryNameNode   http://192.168.3.101:50090        Default HTTP port is 50090
ResourceManager     http://192.168.3.100:8088         Default HTTP port is 8088