
Hadoop Big Data Platform Distributed Cluster (Building a Hadoop Cluster)



Author: Linux-1874

Source: https://www.cnblogs.com/qiuhom-1874/

HDFS Architecture

Note: As the HDFS architecture diagram above shows, when a client accesses a file on HDFS, it first asks the NameNode for the file's metadata; the NameNode (nn) then tells the client where on the DataNodes the file's data lives, and the client fetches the corresponding data from those DataNodes in turn and reassembles it into the complete file. One concept deserves attention here: a DataNode stores file data split according to the file size and the block size. For example, given a 100 MB file and a DataNode (dn) block size of 10 MB, the file is split into 10 blocks of 10 MB each, and those 10 blocks are stored on different DataNodes; each block is additionally replicated on other DataNodes, so a file's blocks end up stored across many DataNodes in a distributed, redundant fashion. The NameNode mainly maintains which nodes hold which files' data and which files' blocks each DataNode holds (the latter is reported to the nn periodically by the DataNodes). You can think of the NameNode as keeping two internal tables: one records on which DataNodes each file's blocks are stored (file-centric), and the other records which files' blocks each DataNode stores (node-centric). From this description it is easy to see that if the nn goes down, every file stored on HDFS becomes unreachable, which is why in production we use ZooKeeper (zk) to make the nn highly available. Also note that HDFS is not a kernel-level file system in its own right; it relies on the local Linux file system underneath.

MapReduce Computation Process

Note: As shown above, MapReduce first splits the given input into multiple pieces (before the split, programmer-written code divides the input and extracts it into key-value pairs), then starts multiple mappers to run the map computation on them. The outputs of the mappers are then merged by a combiner (also implemented by the programmer, who defines the merge rule), which combines values sharing the same key according to that rule. The results are then passed through a partitioner (again written by the programmer; it associates the map output with the appropriate reducer) and sent to the different reducers for computation, and each reducer finally produces a single, definitive result. In short: a mapper reads key-value pairs and emits new key-value pairs, so new KV pairs are produced. A combiner merges the current mapper's output pairs that share the same key; how they are merged, and by what rule, is defined by the programmer, so the combiner is programmer-written code. Essentially a combiner reads KV pairs and emits KV pairs without producing new KV pairs. A partitioner dispatches the combiner's merged key-value pairs to the reducers; which reducer a pair is sent to, and how many reducers handle the work, is decided by the programmer. Finally, the reducers fold the values and generate the resulting key-value pairs.

Hadoop v1 vs. v2 Architecture

Note: In the Hadoop v1 architecture, all computation ran on top of MapReduce, which played two roles at once: cluster resource manager and data processing engine. In Hadoop v2 the architecture becomes HDFS + YARN + a collection of jobs; those jobs correspond to what MapReduce ran in v1, but unlike in v1, MapReduce in v2 is responsible only for data computation and no longer for cluster resource management, which is handled by YARN. In v2, computation jobs run on top of YARN, while HDFS plays the same role in both v1 and v2: storing files.

Hadoop v2 Job Resource Scheduling

Note: The ResourceManager (rm) receives a job request from a client and, based on the status reports that each DataNode's NodeManager (nm) sends periodically, decides which nm should run the job. Once the rm has selected an nm, it sends the job to it; that nm starts an ApplicationMaster (am) container internally, which acts as the controller for this job. The am then needs containers to run the tasks, so it requests them from the rm, and the rm launches one or more containers on the appropriate NodeManagers. Finally, the results from the containers are sent back to the am, which returns them to the rm, and the rm returns them to the client. In this flow, the rm mainly receives node status information from the NodeManagers, schedules resources, and collects job results from the AMs to feed back to clients; the nm mainly manages the resources on its node and reports status information to the rm; the am mainly manages a job's resource requests and returns the job's results to the rm.

The Hadoop Ecosystem

Note: The diagram above shows the Hadoop v2 ecosystem. HDFS and YARN are Hadoop's core components; the various workloads that run on top of them all depend on Hadoop and must also support calling the MapReduce interface.

II. Hadoop Cluster Deployment

Environment

Name    Roles        IP
node01  nn, snn, rm  192.168.0.41
node02  dn, nm       192.168.0.42
node03  dn, nm       192.168.0.43
node04  dn, nm       192.168.0.44

Synchronize the clocks on all nodes
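The post does not show the commands for this step; a minimal sketch using chrony, the CentOS 7 default (an assumed choice, ntpd works just as well):

yum install -y chrony
systemctl enable --now chronyd
# confirm the node is tracking a time source
chronyc sources -v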

Configure /etc/hosts on every node to resolve each node's hostname
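Based on the address plan above (the .test.org domain matches the FQDN visible in the ssh-keygen output later and is otherwise an assumption), each node's /etc/hosts would carry entries like:

192.168.0.41 node01.test.org node01
192.168.0.42 node02.test.org node02
192.168.0.43 node03.test.org node03
192.168.0.44 node04.test.org node04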

Install the JDK on all nodes

yum install -y java-1.8.0-openjdk-devel

Note: the jps command is only available once the -devel package is installed.

Verify that the JDK installed correctly and the version is right, and locate the java command
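For example:

java -version
which java
# resolve the real JDK path behind the alternatives symlink
readlink -f $(which java)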

Add the JAVA_HOME environment variable

Verify that the JAVA_HOME variable is configured correctly
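A minimal sketch covering both steps; the JDK path below is the usual location for java-1.8.0-openjdk on CentOS 7 and is an assumption, so substitute whatever readlink reported above:

cat > /etc/profile.d/java.sh <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export PATH=$PATH:$JAVA_HOME/bin
EOF
source /etc/profile.d/java.sh
# verify
echo $JAVA_HOME
$JAVA_HOME/bin/java -version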

Create a directory to hold the Hadoop installation package

mkdir /bigdata

With that, the base environment is ready; next, download the Hadoop binary package.

[root@node01 ~]# wget https://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
--2020-09-27 22:50:16--  https://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
Resolving mirror.bit.edu.cn (mirror.bit.edu.cn)... 202.204.80.77, 219.143.204.117, 2001:da8:204:1205::22
Connecting to mirror.bit.edu.cn (mirror.bit.edu.cn)|202.204.80.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 366447449 (349M) [application/octet-stream]
Saving to: ‘hadoop-2.9.2.tar.gz’

100%[============================================================================>] 366,447,449  1.44MB/s   in 2m 19s

2020-09-27 22:52:35 (2.51 MB/s) - ‘hadoop-2.9.2.tar.gz’ saved [366447449/366447449]

[root@node01 ~]# ls
hadoop-2.9.2.tar.gz
[root@node01 ~]#

Extract hadoop-2.9.2.tar.gz into the /bigdata/ directory and symlink the extracted directory to hadoop
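For example (the hadoop link name matches the HADOOP_HOME exported below):

tar xf hadoop-2.9.2.tar.gz -C /bigdata/
ln -sv /bigdata/hadoop-2.9.2 /bigdata/hadoop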

Export the Hadoop environment variables

[root@node01 ~]# cat /etc/profile.d/hadoop.sh
export HADOOP_HOME=/bigdata/hadoop
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
[root@node01 ~]#

Create a hadoop user and set its password to admin

[root@node01 ~]# useradd hadoop
[root@node01 ~]# echo "admin" |passwd --stdin hadoop
Changing password for user hadoop.
passwd: all authentication tokens updated successfully.
[root@node01 ~]#

Set up passwordless SSH logins between all nodes for the hadoop user

[hadoop@node01 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:6CNhqdagySJXc4iRBVSoLENddO7JLZMCsdjQzqSFnmw hadoop@node01.test.org
The key's randomart image is:
+---[RSA 2048]----+
| o*==o .|
| o=Bo o|
|=oX.|
| E =.oo.|
|o.o B.oBS.|
|.o * =. o|
|=.o o|
|oo. .|
|                 |
+----[SHA256]-----+
[hadoop@node01 ~]$ ssh-copy-id node01
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'node01 (192.168.0.41)' can't be established.
ECDSA key fingerprint is SHA256:lE8/Vyni4z8hsXaa8OMMlDpu3yOIRh6dLcIr oE57oE.
ECDSA key fingerprint is MD5:14:59:02:30:c0:16:b8:6c:1a:84:c3:0f:a7:ac:67:b3.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@node01's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'node01'"
and check to make sure that only the key(s) you wanted were added.

[hadoop@node01 ~]$ scp -r ./.ssh node02:/home/hadoop/
The authenticity of host 'node02 (192.168.0.42)' can't be established.
ECDSA key fingerprint is SHA256:lE8/Vyni4z8hsXaa8OMMlDpu3yOIRh6dLcIr oE57oE.
ECDSA key fingerprint is MD5:14:59:02:30:c0:16:b8:6c:1a:84:c3:0f:a7:ac:67:b3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node02,192.168.0.42' (ECDSA) to the list of known hosts.
hadoop@node02's password:
id_rsa            100% 1679   636.9KB/s   00:00
id_rsa.pub        100%  404   186.3KB/s   00:00
known_hosts       100%  362   153.4KB/s   00:00
authorized_keys   100%  404   203.9KB/s   00:00
[hadoop@node01 ~]$ scp -r ./.ssh node03:/home/hadoop/
The authenticity of host 'node03 (192.168.0.43)' can't be established.
ECDSA key fingerprint is SHA256:lE8/Vyni4z8hsXaa8OMMlDpu3yOIRh6dLcIr oE57oE.
ECDSA key fingerprint is MD5:14:59:02:30:c0:16:b8:6c:1a:84:c3:0f:a7:ac:67:b3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node03,192.168.0.43' (ECDSA) to the list of known hosts.
hadoop@node03's password:
id_rsa            100% 1679   755.1KB/s   00:00
id_rsa.pub        100%  404   165.7KB/s   00:00
known_hosts       100%  543   350.9KB/s   00:00
authorized_keys   100%  404   330.0KB/s   00:00
[hadoop@node01 ~]$ scp -r ./.ssh node04:/home/hadoop/
The authenticity of host 'node04 (192.168.0.44)' can't be established.
ECDSA key fingerprint is SHA256:lE8/Vyni4z8hsXaa8OMMlDpu3yOIRh6dLcIr oE57oE.
ECDSA key fingerprint is MD5:14:59:02:30:c0:16:b8:6c:1a:84:c3:0f:a7:ac:67:b3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node04,192.168.0.44' (ECDSA) to the list of known hosts.
hadoop@node04's password:
id_rsa            100% 1679   707.0KB/s   00:00
id_rsa.pub        100%  404   172.8KB/s   00:00
known_hosts       100%  724   437.7KB/s   00:00
authorized_keys   100%  404   165.2KB/s   00:00
[hadoop@node01 ~]$

Verify: from node01, ssh into node02, node03, and node04 to confirm the logins are now passwordless.

Create the data directories /data/hadoop/hdfs/{nn,snn,dn} and change their owner and group to hadoop
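A sketch of the commands this step implies:

mkdir -p /data/hadoop/hdfs/{nn,snn,dn}
chown -R hadoop.hadoop /data/hadoop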

Go into the Hadoop installation directory, create its logs directory, and change the owner and group of the installation directory to hadoop
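And likewise for the installation directory:

cd /bigdata/hadoop
mkdir logs
# the trailing slash makes chown -R descend through the hadoop symlink
chown -R hadoop.hadoop /bigdata/hadoop/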

Note: all of the steps above must be repeated on every node.

Configure Hadoop's core-site.xml

Note: Hadoop's configuration files use XML syntax. <property> and </property> form a pair of tags; inside them, the name tag names the key of the option being configured, and the value tag supplies the corresponding value. The configuration here sets the default file system address; hdfs://node01:8020 is the address through which the HDFS file system is accessed.

The complete configuration:

[root@node01 hadoop]# cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node01:8020</value>
    <final>true</final>
  </property>
</configuration>
[root@node01 hadoop]#

Configure hdfs-site.xml

Note: this configuration mainly specifies the HDFS-related directories, the web UI address and port, and the replica count.

The complete configuration:

[root@node01 hadoop]# cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node01:50090</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>node01:50070</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/hdfs/dn</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
  <property>
    <name>fs.checkpoint.edits.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
</configuration>
[root@node01 hadoop]#

Configure mapred-site.xml

Note: this configuration mainly sets the MapReduce framework to yarn. There is no mapred-site.xml by default; we need to turn mapred-site.xml.template into mapred-site.xml. Note that I did this by copying the template to the new file name, which leaves the copy owned by root, so don't forget to change the owner and group back to hadoop.
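A sketch of that rename-by-copy, assuming the stock etc/hadoop configuration directory of the 2.9.2 tarball (an assumed path):

cd /bigdata/hadoop/etc/hadoop
# copying leaves the new file owned by root, so hand it back to hadoop
cp mapred-site.xml.template mapred-site.xml
chown hadoop.hadoop mapred-site.xml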

The complete configuration:

[root@node01 hadoop]# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
[root@node01 hadoop]#

Configure yarn-site.xml

Note: this configuration mainly sets the RM- and NM-related addresses for the YARN framework and specifies the relevant handler and scheduler classes.

The complete configuration:

[root@node01 hadoop]# cat yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>node01:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node01:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>node01:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>node01:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>node01:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
</configuration>
[root@node01 hadoop]#

Configure the slaves file

[root@node01 hadoop]# cat slaves
node02
node03
node04
[root@node01 hadoop]#

Copy the configuration files to the other nodes
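One way to push them out, assuming the same etc/hadoop directory exists on every node (path is an assumption based on the stock layout):

cd /bigdata/hadoop/etc/hadoop
for node in node02 node03 node04; do
    scp ./* ${node}:/bigdata/hadoop/etc/hadoop/
done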

That completes the Hadoop configuration.

Next, switch to the hadoop user and format HDFS:

hdfs namenode -format

Note: if hdfs namenode -format finishes with a message saying the storage directory has been successfully formatted, the HDFS format succeeded.

Start the HDFS cluster
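The original post shows this step as screenshots; a minimal sketch of one common way, run as the hadoop user on node01 with the sbin scripts on PATH via the hadoop.sh profile above:

# start the NameNode, SecondaryNameNode, and the DataNodes listed in slaves
start-dfs.sh
# then list the running JVM processes on each node to confirm
jps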

Note: HDFS consists mainly of the namenode, secondarynamenode, and datanode processes; as long as the corresponding process has come up on each node, there is not much to worry about.

At this point the HDFS cluster is up and running.

Verify: upload /etc/passwd to the /test directory on HDFS and see whether the upload succeeds.
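A sketch of the upload:

hdfs dfs -mkdir /test
hdfs dfs -put /etc/passwd /test/
hdfs dfs -ls /test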

Note: the /etc/passwd file has been uploaded to the /test directory on HDFS.

Verify: view the passwd file under /test on HDFS and check whether its content matches the /etc/passwd file.
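For example:

hdfs dfs -cat /test/passwd
# or compare the two directly; no output means they match
diff <(hdfs dfs -cat /test/passwd) /etc/passwd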

Note: the content of /test/passwd on HDFS is identical to that of /etc/passwd.

Verify: on a dn node, look at the files in the corresponding data directory and check whether the block content matches /etc/passwd.

[root@node02 ~]# tree /data
/data
└── hadoop
    └── hdfs
        ├── dn
        │   ├── current
        │   │   ├── BP-157891879-192.168.0.41-1601224158145
        │   │   │   ├── current
        │   │   │   │   ├── finalized
        │   │   │   │   │   └── subdir0
        │   │   │   │   │       └── subdir0
        │   │   │   │   │           ├── blk_1073741825
        │   │   │   │   │           └── blk_1073741825_1001.meta
        │   │   │   │   ├── rbw
        │   │   │   │   └── VERSION
        │   │   │   ├── scanner.cursor
        │   │   │   └── tmp
        │   │   └── VERSION
        │   └── in_use.lock
        ├── nn
        └── snn

13 directories, 6 files
[root@node02 ~]# cat /data/hadoop/hdfs/dn/current/BP-157891879-192.168.0.41-1601224158145/current/scanner.cursor
tmp/
[root@node02 ~]# cat /data/hadoop/hdfs/dn/current/BP-157891879-192.168.0.41-1601224158145/current/finalized/subdir0/subdir0/blk_1073741825
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:997:User for polkitd:/:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
chrony:x:998:996::/var/lib/chrony:/sbin/nologin
hadoop:x:1000:1000::/home/hadoop:/bin/bash
[root@node02 ~]#

Note: the passwd file we uploaded can indeed be found under the dn directory on the DataNode.

Verify: do the other nodes hold the same file? Is the number of replicas what we specified?

Note: node03 and node04 have the same directories and files, which shows that the replica count of 3 we configured has taken effect.

Start the YARN cluster
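As with HDFS, one common way, assuming the bundled sbin scripts are used:

# start the ResourceManager here and the NodeManagers on the slaves
start-yarn.sh
# confirm with jps on each node
jps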

Note: the nm processes have started on the worker nodes, and the rm on the master node started normally as well.

Visit ports 50070 and 8088 on the nn and see whether the corresponding web pages load.

Note: 50070 is the HDFS web address; this page shows the storage status of HDFS and lets you operate on the files stored in it.

Note: 8088 is the YARN cluster management address; this page shows the status of running computation jobs, the cluster configuration, logs, and so on.

Verify: run a computation job on YARN that counts the words in /test/passwd, and see whether the job actually runs.

[hadoop@node01 hadoop]$ yarn jar /bigdata/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[hadoop@node01 hadoop]$ yarn jar /bigdata/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount
Usage: wordcount <in> [<in>...] <out>
[hadoop@node01 hadoop]$ yarn jar /bigdata/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/passwd /test/passwd-word-count
20/09/28 00:58:01 INFO client.RMProxy: Connecting to ResourceManager at node01/192.168.0.41:8032
20/09/28 00:58:01 INFO input.FileInputFormat: Total input files to process : 1
20/09/28 00:58:01 INFO mapreduce.JobSubmitter: number of splits:1
20/09/28 00:58:01 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
20/09/28 00:58:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1601224871685_0001
20/09/28 00:58:02 INFO impl.YarnClientImpl: Submitted application application_1601224871685_0001
20/09/28 00:58:02 INFO mapreduce.Job: The url to track the job: http://node01:8088/proxy/application_1601224871685_0001/
20/09/28 00:58:02 INFO mapreduce.Job: Running job: job_1601224871685_0001
20/09/28 00:58:08 INFO mapreduce.Job: Job job_1601224871685_0001 running in uber mode : false
20/09/28 00:58:08 INFO mapreduce.Job:  map 0% reduce 0%
20/09/28 00:58:14 INFO mapreduce.Job:  map 100% reduce 0%
20/09/28 00:58:20 INFO mapreduce.Job:  map 100% reduce 100%
20/09/28 00:58:20 INFO mapreduce.Job: Job job_1601224871685_0001 completed successfully
20/09/28 00:58:20 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=1144
		FILE: Number of bytes written=399079
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1053
		HDFS: Number of bytes written=1018
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=2753
		Total time spent by all reduces in occupied slots (ms)=2779
		Total time spent by all map tasks (ms)=2753
		Total time spent by all reduce tasks (ms)=2779
		Total vcore-milliseconds taken by all map tasks=2753
		Total vcore-milliseconds taken by all reduce tasks=2779
		Total megabyte-milliseconds taken by all map tasks=2819072
		Total megabyte-milliseconds taken by all reduce tasks=2845696
	Map-Reduce Framework
		Map input records=22
		Map output records=30
		Map output bytes=1078
		Map output materialized bytes=1144
		Input split bytes=95
		Combine input records=30
		Combine output records=30
		Reduce input groups=30
		Reduce shuffle bytes=1144
		Reduce input records=30
		Reduce output records=30
		Spilled Records=60
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=87
		CPU time spent (ms)=620
		Physical memory (bytes) snapshot=444997632
		Virtual memory (bytes) snapshot=4242403328
		Total committed heap usage (bytes)=285212672
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=958
	File Output Format Counters
		Bytes Written=1018
[hadoop@node01 hadoop]$

View the output the job generated

[hadoop@node01 hadoop]$ hdfs dfs -ls -R /test
-rw-r--r--   3 hadoop supergroup        958 2020-09-28 00:32 /test/passwd
drwxr-xr-x   - hadoop supergroup          0 2020-09-28 00:58 /test/passwd-word-count
-rw-r--r--   3 hadoop supergroup          0 2020-09-28 00:58 /test/passwd-word-count/_SUCCESS
-rw-r--r--   3 hadoop supergroup       1018 2020-09-28 00:58 /test/passwd-word-count/part-r-00000
[hadoop@node01 hadoop]$ hdfs dfs -cat /test/passwd-word-count/part-r-00000
Management:/:/sbin/nologin	1
Network	1
SSH:/var/empty/sshd:/sbin/nologin	1
User:/var/ftp:/sbin/nologin	1
adm:x:3:4:adm:/var/adm:/sbin/nologin	1
bin:x:1:1:bin:/bin:/sbin/nologin	1
bus:/:/sbin/nologin	1
chrony:x:998:996::/var/lib/chrony:/sbin/nologin	1
daemon:x:2:2:daemon:/sbin:/sbin/nologin	1
dbus:x:81:81:System	1
for	1
ftp:x:14:50:FTP	1
games:x:12:100:games:/usr/games:/sbin/nologin	1
hadoop:x:1000:1000::/home/hadoop:/bin/bash	1
halt:x:7:0:halt:/sbin:/sbin/halt	1
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin	1
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin	1
message	1
nobody:x:99:99:Nobody:/:/sbin/nologin	1
ntp:x:38:38::/etc/ntp:/sbin/nologin	1
operator:x:11:0:operator:/root:/sbin/nologin	1
polkitd:/:/sbin/nologin	1
polkitd:x:999:997:User	1
postfix:x:89:89::/var/spool/postfix:/sbin/nologin	1
root:x:0:0:root:/root:/bin/bash	1
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown	1
sshd:x:74:74:Privilege-separated	1
sync:x:5:0:sync:/sbin:/bin/sync	1
systemd-network:x:192:192:systemd	1
tcpdump:x:72:72::/:/sbin/nologin	1
[hadoop@node01 hadoop]$

Check the job's status information on the 8088 page.

That completes the Hadoop v2 cluster setup.

