Cluster layout:
namenode: resourcemanager, JobHistoryServer, HistoryServer
datanode: nodemanager
Docker host machine (VirtualBox): IntelliJ IDEA, spark client, hdfs client
sudo route -n add 172.18.0.0/24 192.168.99.100
docker network create --subnet=172.18.0.0/16 hadoopnet
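The route command lets the host reach the container subnet through the docker-machine VM at 192.168.99.100 (a /24 route is enough for the 172.18.0.x addresses used below), and docker network create sets up the user-defined bridge the containers will join. A minimal verification sketch, assuming the container IPs used later in this post:

# The bridge network should report the 172.18.0.0/16 subnet
docker network inspect hadoopnet

# Once the containers below are running, the host should reach them directly
ping -c 3 172.18.0.11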
Note: each node's folder contains a startup script and a shared data volume that is mounted for HDFS storage.
Note: the local hadoop directory is mounted into the container as the hadoop/etc/hadoop configuration directory.
docker run --name namenode \
  --hostname namenode \
  --network hadoopnet \
  --ip 172.18.0.11 \
  -d \
  -v $PWD/data:/opt/tmp \
  -v /Users/wangsen/hadoop/datanode/hadoop:/opt/hadoop-2.7.3/etc/hadoop \
  -v $PWD/spark-2.1.1-bin-hadoop2.7:/opt/spark \
  --rm dbp/hadoop
dbp/hadoop is the name of the Docker image; a total of three shared volumes (folders) are mounted.
Spark was not installed when the image was built; Hadoop was baked into the image when dbp/hadoop was created from the Dockerfile. Here Spark is provided by mounting a volume instead; you could also commit the container, or rebuild from the Dockerfile, to produce an image that already contains Spark.
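A short sketch of both options mentioned above; the image tag dbp/hadoop-spark is just an illustrative name, not something defined in this post:

# Confirm the mounted Spark and Hadoop config directories are visible in the container
docker exec namenode ls /opt/spark /opt/hadoop-2.7.3/etc/hadoop

# Alternatively, snapshot the running container into an image that already contains Spark
docker commit namenode dbp/hadoop-spark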
docker run --name datanode1 \
  --hostname datanode1 \
  --network hadoopnet \
  --ip 172.18.0.13 \
  -d \
  -v $PWD/data:/opt/tmp \
  -v /Users/wangsen/hadoop/datanode/hadoop:/opt/hadoop-2.7.3/etc/hadoop \
  --rm dbp/hadoop

docker run --name datanode2 \
  --hostname datanode2 \
  --network hadoopnet \
  --ip 172.18.0.14 \
  -d \
  -v $PWD/data:/opt/tmp \
  -v /Users/wangsen/hadoop/datanode/hadoop:/opt/hadoop-2.7.3/etc/hadoop \
  --rm dbp/hadoop
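With the three containers running, a quick sanity check (the Dockerfile below installs iputils-ping, and a user-defined bridge network resolves container names):

# All three containers should be listed as running
docker ps

# The namenode container should reach both datanodes by name
docker exec namenode ping -c 2 datanode1
docker exec namenode ping -c 2 datanode2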
## Configure HDFS paths (core-site.xml and hdfs-site.xml)
fs.defaultFS hdfs://namenode:9000
dfs.replication 3
dfs.namenode.name.dir /opt/tmp
dfs.datanode.data.dir /opt/tmp
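A sketch of bringing HDFS up with these settings, using the standard Hadoop 2.7.3 scripts; it assumes the slaves file in the mounted config directory lists datanode1 and datanode2:

# Format the namenode metadata directory (first start only)
docker exec namenode /opt/hadoop-2.7.3/bin/hdfs namenode -format

# Start HDFS across the cluster and confirm both datanodes registered
docker exec namenode /opt/hadoop-2.7.3/sbin/start-dfs.sh
docker exec namenode /opt/hadoop-2.7.3/bin/hdfs dfsadmin -report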
## Configure YARN (yarn-site.xml)
yarn.nodemanager.aux-services mapreduce_shuffle
yarn.resourcemanager.address namenode:18040
yarn.resourcemanager.scheduler.address namenode:18030
yarn.resourcemanager.resource-tracker.address namenode:18025
yarn.resourcemanager.admin.address namenode:18141
yarn.resourcemanager.webapp.address namenode:18088
yarn.log-aggregation-enable true
yarn.log.server.url http://namenode:19888/jobhistory/logs
yarn.nodemanager.vmem-check-enabled false
yarn.nodemanager.pmem-check-enabled false
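A sketch of starting YARN plus the JobHistoryServer listed in the cluster layout above (standard Hadoop 2.7.3 scripts, run in the namenode container):

# Start the resourcemanager/nodemanagers and the MapReduce job history server (port 19888)
docker exec namenode /opt/hadoop-2.7.3/sbin/start-yarn.sh
docker exec namenode /opt/hadoop-2.7.3/sbin/mr-jobhistory-daemon.sh start historyserver

# Both datanode containers should report as active nodemanagers
docker exec namenode /opt/hadoop-2.7.3/bin/yarn node -list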
Export HADOOP_CONF_DIR (for example in spark-env.sh) so the Spark client picks up the HDFS/YARN configuration:
export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop
## Configure the Spark UI so job results can be viewed through the YARN history service
## hdfs:///tmp/spark/events is a path on HDFS that stores Spark run information
spark.master=local
spark.yarn.historyServer.address=namenode:18080
spark.history.ui.port=18080
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs:///tmp/spark/events
spark.history.fs.logDirectory=hdfs:///tmp/spark/events
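A sketch of exercising this configuration with the Spark 2.1.1 distribution mounted at /opt/spark; the example jar path follows the standard layout of spark-2.1.1-bin-hadoop2.7:

# The event log directory must exist in HDFS before any job is submitted
docker exec namenode /opt/hadoop-2.7.3/bin/hdfs dfs -mkdir -p /tmp/spark/events

# Start the Spark history server; its UI listens on port 18080 as configured above
docker exec namenode /opt/spark/sbin/start-history-server.sh

# Submit a test job to YARN; it should afterwards show up in the history UI
docker exec namenode /opt/spark/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark/examples/jars/spark-examples_2.11-2.1.1.jar 100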
Modify JAVA_HOME, filling in the absolute path to the JDK (in this image, /usr/lib/jvm/java-8-openjdk-amd64).
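A minimal sketch, assuming the value lives in hadoop-env.sh; because the config directory is a mounted volume, a change made from one container is shared by all of them:

# Point hadoop-env.sh at the JDK path that the Dockerfile bakes into the image
docker exec namenode sed -i \
  's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64|' \
  /opt/hadoop-2.7.3/etc/hadoop/hadoop-env.sh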
If you want to follow the author's approach and build your own Spark Docker cluster, you can start by building the image from the Dockerfile:
FROM ubuntu:16.04
MAINTAINER wsn
RUN apt-get update
RUN apt-get install -y openjdk-8-jdk
RUN apt-get install -y vim
RUN apt install -y net-tools
RUN apt install -y iputils-ping
RUN apt-get install -y openssh-server
RUN mkdir /var/run/sshd
RUN echo 'root:root' | chpasswd
RUN sed -ri 's/^PermitRootLogin\s+.*/PermitRootLogin yes/' /etc/ssh/sshd_config
RUN sed -ri 's/UsePAM yes/#UsePAM yes/g' /etc/ssh/sshd_config
RUN sed -ri 's/# StrictHostKeyChecking ask/StrictHostKeyChecking no/' /etc/ssh/ssh_config
RUN mkdir /root/.ssh
RUN ssh-keygen -t rsa -P "" -f /root/.ssh/id_rsa
RUN cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
ENV JRE_HOME /usr/lib/jvm/java-8-openjdk-amd64/jre
ENV PATH /opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/usr/lib/jvm/java-8-openjdk-amd64/bin:$PATH
ENV CLASSPATH ./:/usr/lib/jvm/java-8-openjdk-amd64/lib:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib
ADD hadoop-2.7.3.tar.gz /opt/
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
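A sketch of building the image; the hadoop-2.7.3.tar.gz archive must sit next to the Dockerfile so the ADD instruction can pick it up:

# Build the base image referenced by all the docker run commands above
docker build -t dbp/hadoop .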
Reprinted from: http://bdqyx.baihongyu.com/