Streaming#
Hadoop Streaming#
hadoop jar /home/hadoop/hadoop-streaming-2.9.2.jar \
-input /input_dir \
-output myOutputDir \
-mapper /bin/cat \
-reducer /usr/bin/wc
Streaming Command Options#
hadoop command [genericOptions] [streamingOptions]
Parameter | Optional/Required | Description |
---|---|---|
-input directoryname or filename | Required | Input location for mapper |
-output directoryname | Required | Output location for reducer |
-mapper executable or JavaClassName | Optional | Mapper executable. If not specified, IdentityMapper is used as the default |
-reducer executable or JavaClassName | Optional | Reducer executable. If not specified, IdentityReducer is used as the default |
-file filename | Optional | Make the mapper, reducer, or combiner executable available locally on the compute nodes |
-inputformat JavaClassName | Optional | Class you supply should return key/value pairs of Text class. If not specified, TextInputFormat is used as the default |
-outputformat JavaClassName | Optional | Class you supply should take key/value pairs of Text class. If not specified, TextOutputformat is used as the default |
-partitioner JavaClassName | Optional | Class that determines which reduce a key is sent to |
-combiner streamingCommand or JavaClassName | Optional | Combiner executable for map output |
-cmdenv name=value | Optional | Pass environment variable to streaming commands |
-inputreader | Optional | For backwards-compatibility: specifies a record reader class (instead of an input format class) |
-verbose | Optional | Verbose output |
-lazyOutput | Optional | Create output lazily. For example, if the output format is based on FileOutputFormat, the output file is created only on the first call to Context.write |
-numReduceTasks | Optional | Specify the number of reducers |
-mapdebug | Optional | Script to call when map task fails |
-reducedebug | Optional | Script to call when reduce task fails |
Specifying a Java Class as the Mapper/Reducer#
hadoop jar /home/hadoop/hadoop-streaming-2.9.2.jar \
-input /input_dir \
-output myOutputDir \
-inputformat org.apache.hadoop.mapred.KeyValueTextInputFormat \
-mapper org.apache.hadoop.mapred.lib.IdentityMapper \
-reducer /usr/bin/wc