Streaming
Hadoop Streaming
hadoop jar /home/hadoop/hadoop-streaming-2.9.2.jar \
-input /input_dir \
-output myOutputDir \
-mapper /bin/cat \
-reducer /usr/bin/wc
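The job above can be approximated on a single machine before it is submitted to a cluster. The following is a minimal sketch, not the way Hadoop itself runs the job: it assumes a single reducer, uses `sort` as a stand-in for the shuffle phase, and feeds in a few made-up lines in place of the files under /input_dir.

```shell
#!/bin/sh
# Local approximation of the streaming job above (single reducer assumed).
MAPPER=/bin/cat        # same executable passed to -mapper
REDUCER=/usr/bin/wc    # same executable passed to -reducer

# Hypothetical sample input standing in for the files under /input_dir.
sample=$(printf 'banana\napple\ncherry\n')

# Map, sort the mapper output as the shuffle phase would, then reduce.
result=$(printf '%s\n' "$sample" | "$MAPPER" | sort | "$REDUCER")

# wc reports line, word, and byte counts for everything the reducer received.
echo "$result"
```

On a real cluster each reducer runs its own copy of wc, so the output directory holds one set of counts per reducer rather than a single total.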
Streaming Command Options
hadoop command [genericOptions] [streamingOptions]
| Parameter | Optional/Required | Description |
|---|---|---|
| -input directoryname or filename | Required | Input location for mapper |
| -output directoryname | Required | Output location for reducer |
| -mapper executable or JavaClassName | Optional | Mapper executable. If not specified, IdentityMapper is used as the default |
| -reducer executable or JavaClassName | Optional | Reducer executable. If not specified, IdentityReducer is used as the default |
| -file filename | Optional | Make the mapper, reducer, or combiner executable available locally on the compute nodes |
| -inputformat JavaClassName | Optional | The class you supply should return key/value pairs of the Text class. If not specified, TextInputFormat is used as the default |
| -outputformat JavaClassName | Optional | The class you supply should take key/value pairs of the Text class. If not specified, TextOutputFormat is used as the default |
| -partitioner JavaClassName | Optional | Class that determines which reducer a key is sent to |
| -combiner streamingCommand or JavaClassName | Optional | Combiner executable for map output |
| -cmdenv name=value | Optional | Pass environment variable to streaming commands |
| -inputreader | Optional | For backwards-compatibility: specifies a record reader class (instead of an input format class) |
| -verbose | Optional | Verbose output |
| -lazyOutput | Optional | Create output lazily. For example, if the output format is based on FileOutputFormat, the output file is created only on the first call to Context.write |
| -numReduceTasks | Optional | Specify the number of reducers |
| -mapdebug | Optional | Script to call when a map task fails |
| -reducedebug | Optional | Script to call when a reduce task fails |
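To illustrate the -cmdenv row in the table, here is a small sketch of a mapper that reads its configuration from an environment variable. The variable name MIN_LEN and the mapper itself are hypothetical. In a real job the value would be supplied with `-cmdenv MIN_LEN=4`, whereas here it is exported by hand so the mapper can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical mapper: emits "word<TAB>1" for every input word that is at
# least MIN_LEN characters long. MIN_LEN is an assumed name, not a Hadoop
# setting; under Hadoop Streaming it would arrive via -cmdenv MIN_LEN=4.
mapper() {
  tr -s '[:space:]' '\n' |
    awk -v min="${MIN_LEN:-1}" 'length($0) >= min { print $0 "\t1" }'
}

# Local check: set the variable the way -cmdenv would, then pipe text in.
MIN_LEN=4
export MIN_LEN
result=$(printf 'to be or not to be that is the question\n' | mapper)
printf '%s\n' "$result"
```

Keeping the threshold in the environment rather than hard-coding it lets the same script be reused across jobs with different settings.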
Specifying a Java Class as the Mapper/Reducer
hadoop jar /home/hadoop/hadoop-streaming-2.9.2.jar \
-input /input_dir \
-output myOutputDir \
-inputformat org.apache.hadoop.mapred.KeyValueTextInputFormat \
-mapper org.apache.hadoop.mapred.lib.IdentityMapper \
-reducer /usr/bin/wc
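The example above pairs IdentityMapper with KeyValueTextInputFormat, which treats the text before the first tab on each line as the key and the remainder as the value. A line with no tab becomes a key with an empty value. The sketch below imitates that split in plain shell to make the behavior concrete. It is an approximation for illustration only, not the Hadoop implementation, and the function name split_kv is made up.

```shell
#!/bin/sh
# Imitates how KeyValueTextInputFormat divides a line: text before the first
# tab is the key, the rest is the value; a line with no tab is all key.
TAB=$(printf '\t')

split_kv() {
  while IFS= read -r line; do
    case $line in
      *"$TAB"*) key=${line%%"$TAB"*}; value=${line#*"$TAB"} ;;
      *)        key=$line; value= ;;
    esac
    printf 'key=[%s] value=[%s]\n' "$key" "$value"
  done
}

# Hypothetical sample records; the last one contains two tabs.
result=$(printf 'apple\tred fruit\nplain line\na\tb\tc\n' | split_kv)
printf '%s\n' "$result"
```

Only the first tab separates key from value, so any later tabs stay inside the value.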