Example MapReduce code in Hadoop


This article walks through example MapReduce code in Hadoop. The walkthrough is quite detailed and should be a useful reference; if you are interested, read it to the end!
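The example is the classic WordCount job, split across three classes: a mapper that emits the pair (word, 1) for every word in a line, a reducer that sums those 1s per word, and a runner that configures and submits the job. As a quick illustration of the data flow (hypothetical sample input, not from the original article), given a source file containing:

hello world
hello hadoop

the finished job would write:

hadoop	1
hello	2
world	1

(Hadoop's default TextOutputFormat separates key and value with a tab.)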

package cn.itheima.bigdata.hadoop.mr.wordcount;

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
   
   @Override
   protected void map(LongWritable key, Text value, Context context)
       throws IOException, InterruptedException {

     // Get the contents of one line of the input file
     String line = value.toString();
     // Split the line into an array of words (split on spaces)
     String[] words = StringUtils.split(line, " ");
     // For each word, emit the key-value pair (word, 1)
     for (String word : words) {
       context.write(new Text(word), new LongWritable(1));
     }
   }
}
package cn.itheima.bigdata.hadoop.mr.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
   
   
   // key: a word, e.g. "hello"; values: the 1s emitted for that word, {1,1,1,1,1,...}
   @Override
   protected void reduce(Text key, Iterable<LongWritable> values, Context context)
       throws IOException, InterruptedException {
     
     // Running total for this word
     long count = 0;
     for (LongWritable value : values) {
       count += value.get();
     }

     // Emit the (word, count) key-value pair
     context.write(key, new LongWritable(count));
   }
}
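Because the reduce step is just an associative, commutative sum, the same class can also serve as a combiner that pre-aggregates (word, 1) pairs on the map side and shrinks shuffle traffic. This is an optional addition, not part of the original article's job setup; it would be one extra line in the runner:

     // Optional: reuse the reducer as a map-side combiner (not in the original setup)
     wcjob.setCombinerClass(WordCountReducer.class);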

package cn.itheima.bigdata.hadoop.mr.wordcount;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Describes a job (which mapper class to use, which reducer class, where the
 * input files are, where the output goes, etc.) and then submits the job to
 * the hadoop cluster.
 * @author duanhaitao@itcast.cn
 *
 */
//cn.itheima.bigdata.hadoop.mr.wordcount.WordCountRunner
public class WordCountRunner {

   public static void main(String[] args) throws Exception {
     Configuration conf = new Configuration();
     // Name the jar containing the job code; this must be set before
     // Job.getInstance(conf) copies the configuration, or it has no effect
     conf.set("mapreduce.job.jar", "wcount.jar");
     Job wcjob = Job.getInstance(conf);

     // Let hadoop locate the jar containing wcjob's classes via this class
     wcjob.setJarByClass(WordCountRunner.class);
     
     
     // Which mapper class wcjob uses
     wcjob.setMapperClass(WordCountMapper.class);
     // Which reducer class wcjob uses
     wcjob.setReducerClass(WordCountReducer.class);
     
     // Key/value types output by wcjob's mapper
     wcjob.setMapOutputKeyClass(Text.class);
     wcjob.setMapOutputValueClass(LongWritable.class);
     
     // Key/value types output by wcjob's reducer
     wcjob.setOutputKeyClass(Text.class);
     wcjob.setOutputValueClass(LongWritable.class);
     
     // Path of the raw input data to process
     FileInputFormat.setInputPaths(wcjob, new Path("hdfs://192.168.88.155:9000/wc/srcdata"));
   
     // Path where the processed results are written
     FileOutputFormat.setOutputPath(wcjob, new Path("hdfs://192.168.88.155:9000/wc/output"));
     
     boolean res = wcjob.waitForCompletion(true);

     System.exit(res ? 0 : 1);
   }
}
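One caveat before submitting: FileOutputFormat fails the job at startup if the output directory already exists, so remove it before a re-run, for example:

hadoop fs -rm -r hdfs://192.168.88.155:9000/wc/output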

Package the classes into mr.jar (one way is sketched below) and put it on the hadoop server:
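A minimal packaging sketch, assuming the compiled .class files sit under a bin/ directory (exporting a runnable JAR from your IDE works just as well):

jar cvf mr.jar -C bin/ .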

[root@hadoop02 ~]# hadoop jar /root/Desktop/mr.jar cn.itheima.bigdata.hadoop.mr.wordcount.WordCountRunner
Java HotSpot(TM) Client VM warning: You have loaded library /home/hadoop/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
15/12/05 06:07:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/05 06:07:07 INFO client.RMProxy: Connecting to ResourceManager at hadoop02/192.168.88.155:8032
15/12/05 06:07:08 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/12/05 06:07:09 INFO input.FileInputFormat: Total input paths to process : 1
15/12/05 06:07:09 INFO mapreduce.JobSubmitter: number of splits:1
15/12/05 06:07:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1449322432664_0001
15/12/05 06:07:10 INFO impl.YarnClientImpl: Submitted application application_1449322432664_0001
15/12/05 06:07:10 INFO mapreduce.Job: The url to track the job: http://hadoop02:8088/proxy/application_1449322432664_0001/
15/12/05 06:07:10 INFO mapreduce.Job: Running job: job_1449322432664_0001
15/12/05 06:07:22 INFO mapreduce.Job: Job job_1449322432664_0001 running in uber mode : false
15/12/05 06:07:22 INFO mapreduce.Job:  map 0% reduce 0%
15/12/05 06:07:32 INFO mapreduce.Job:  map 100% reduce 0%
15/12/05 06:07:39 INFO mapreduce.Job:  map 100% reduce 100%
15/12/05 06:07:40 INFO mapreduce.Job: Job job_1449322432664_0001 completed successfully
15/12/05 06:07:41 INFO mapreduce.Job: Counters: 49
  File System Counters
    FILE: Number of bytes read=635
    FILE: Number of bytes written=212441
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=338
    HDFS: Number of bytes written=223
    HDFS: Number of read operations=6
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
  Job Counters
    Launched map tasks=1
    Launched reduce tasks=1
    Data-local map tasks=1
    Total time spent by all maps in occupied slots (ms)=7463
    Total time spent by all reduces in occupied slots (ms)=4688
    Total time spent by all map tasks (ms)=7463
    Total time spent by all reduce tasks (ms)=4688
    Total vcore-seconds taken by all map tasks=7463
    Total vcore-seconds taken by all reduce tasks=4688
    Total megabyte-seconds taken by all map tasks=7642112
    Total megabyte-seconds taken by all reduce tasks=4800512
  Map-Reduce Framework
    Map input records=10
    Map output records=41
    Map output bytes=547
    Map output materialized bytes=635
    Input split bytes=114
    Combine input records=0
    Combine output records=0
    Reduce input groups=30
    Reduce shuffle bytes=635
    Reduce input records=41
    Reduce output records=30
    Spilled Records=82
    Shuffled Maps =1
    Failed Shuffles=0
    Merged Map outputs=1
    GC time elapsed (ms)=211
    CPU time spent (ms)=1350
    Physical memory (bytes) snapshot=221917184
    Virtual memory (bytes) snapshot=722092032
    Total committed heap usage (bytes)=137039872
  Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
  File Input Format Counters
    Bytes Read=224
  File Output Format Counters
    Bytes Written=223
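Once the job has completed, the word counts can be read straight from HDFS; each reducer writes one part file, named part-r-00000 by default:

[root@hadoop02 ~]# hadoop fs -cat hdfs://192.168.88.155:9000/wc/output/part-r-00000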

That is everything in "Example MapReduce code in Hadoop". Thanks for reading! We hope the content shared here is helpful; for more on related topics, follow the 丸趣 TV industry news channel!