Saturday, February 20, 2016

Compression Settings in Mapreduce

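1. To read compressed input files and write compressed output
   
     Compressed input needs no extra setting: the input format picks the codec from the file extension (here .gz). To compress the job output, add the following to the driver class: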

     // GzipCodec and CompressionCodec are in org.apache.hadoop.io.compress
     jobConf.setBoolean("mapred.output.compress", true);
     jobConf.setClass("mapred.output.compression.codec", GzipCodec.class, CompressionCodec.class);

    Running the program over compressed input:

    % hadoop jar MaxTempWithCompression input/tanu/input.txt.gz output

    % gunzip -c output/part-r-00000.gz
         1949    111
         1950    20
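
    The gunzip check above works because the job output is ordinary gzip data, so a local copy of a part file could also be read from plain Java. The following is only a minimal sketch under stated assumptions: the class name GzipPartReader and the gzip helper are hypothetical, the sample records are copied from the listing above, and the key and value are assumed to be separated by a tab. It compresses the sample in memory so that it runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of Hadoop or of the original job.
public class GzipPartReader {

    // Decompress a gzip stream and return its lines, much like `gunzip -c`.
    public static List<String> readGzipLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(in), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Demo-only helper: gzip a string in memory to stand in for a part file.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Sample records from the listing above; the tab separator is an assumption.
        byte[] compressed = gzip("1949\t111\n1950\t20\n");
        for (String line : readGzipLines(new ByteArrayInputStream(compressed))) {
            System.out.println(line);
        }
    }
}
```

    To read a real part file instead, a java.io.FileInputStream opened on the local copy could be passed to readGzipLines in place of the in-memory stream.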

2. To compress the mapper output
 
     jobConf.setCompressMapOutput(true);
     jobConf.setMapOutputCompressorClass(GzipCodec.class);
