Hadoop MapReduce: Running WordCount
Views: 5211
Published: 2019-06-14


Set up a Hadoop cluster (or single-node) environment and start it; the MapReduce daemons must be running.

1. Assume the following environment variables are configured

```shell
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
```

2. Create two test files and upload them to HDFS

```shell
[hadoop@centos-one temp]$ cat file01
Hello World Bye World
[hadoop@centos-one temp]$ cat file02
Hello Hadoop Goodbye Hadoop
[hadoop@centos-one temp]$ ../hadoop-2.6.0/bin/hdfs dfs -mkdir /wordcount
[hadoop@centos-one temp]$ ../hadoop-2.6.0/bin/hdfs dfs -mkdir /wordcount/input
[hadoop@centos-one temp]$ ../hadoop-2.6.0/bin/hdfs dfs -put file* /wordcount/input
```

To delete an HDFS directory, use:

```shell
../hadoop-2.6.0/bin/hdfs dfs -rm -r /temp
```

3. Write the WordCount class

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emit (word, 1) for every whitespace-separated token in the line.
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    // Sum all counts for a word and emit (word, total).
    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```
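Before submitting to a cluster, the tokenize-then-sum logic of the mapper and reducer can be exercised locally with plain Java collections. The sketch below is not part of the original post: the class `LocalWordCount` and its `count` method are my own stand-ins that mimic `TokenizerMapper` plus `IntSumReducer` on in-memory strings.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the MapReduce job: same
// tokenize-and-sum flow, but on plain Java collections.
public class LocalWordCount {

    // "Map" step: tokenize each line; "reduce" step: sum counts per word.
    // A TreeMap keeps words sorted, like Hadoop's sorted reducer input.
    public static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Contents of file01 and file02 from step 2.
        Map<String, Integer> counts =
            count("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Running `main` prints the same five `word<TAB>count` lines as the cluster job produces in step 5.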

4. Compile WordCount.java and package it as a jar

```shell
$ bin/hadoop com.sun.tools.javac.Main WordCount.java
$ jar cf wc.jar WordCount*.class
```

5. Run the MapReduce program

```shell
[hadoop@centos-one temp]$ ../hadoop-2.6.0/bin/hadoop jar wc.jar WordCount /wordcount/input /wordcount/output
```

Note that the job fails if the output directory /wordcount/output already exists; remove it first when rerunning.

View the results:

```shell
../hadoop-2.6.0/bin/hdfs dfs -cat /wordcount/output/part-r-00000
Bye	1
Goodbye	1
Hadoop	2
Hello	2
World	2
```
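The default TextOutputFormat writes one `key<TAB>value` line per record, which is what part-r-00000 contains above. As a small illustration (the class `OutputParser` is my own, not from the post), such lines can be parsed back into a map:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parse TextOutputFormat lines ("word<TAB>count"),
// such as those in part-r-00000, back into a map.
public class OutputParser {

    public static Map<String, Integer> parse(String... lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            // Split on the first tab only: the key itself could contain tabs
            // only if a custom separator were configured.
            int tab = line.indexOf('\t');
            counts.put(line.substring(0, tab),
                       Integer.parseInt(line.substring(tab + 1)));
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            parse("Bye\t1", "Goodbye\t1", "Hadoop\t2", "Hello\t2", "World\t2");
        System.out.println(counts.get("Hadoop")); // prints 2
    }
}
```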

 

Reprinted from: https://www.cnblogs.com/yanpengfei/p/4303914.html
