Pig Execution

Execution Modes

  • Local Mode

  • Tez Local Mode

  • Spark Local Mode

  • MapReduce Mode

  • Tez Mode

  • Spark Mode

/* local mode */
$ pig -x local ...
 
/* Tez local mode */
$ pig -x tez_local ...
 
/* Spark local mode */
$ pig -x spark_local ...

/* mapreduce mode */
$ pig ...
or
$ pig -x mapreduce ...

/* Tez mode */
$ pig -x tez ...

/* Spark mode */
$ pig -x spark ...
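
As a sketch of how these flags might be wrapped in a script, the shell function below accepts one of the six mode names listed above and prints the matching command line. The name pig_cmd is hypothetical and not part of Pig. It only prints the command rather than running it, so it can be tried without a Pig installation.

```shell
#!/bin/sh
# pig_cmd [MODE [ARGS...]]  (hypothetical helper, not part of Pig)
# Prints the pig command line for one of the six execution modes.
pig_cmd() {
    _mode=${1:-mapreduce}        # mapreduce is what pig uses when -x is omitted
    [ "$#" -gt 0 ] && shift      # drop MODE, keep the remaining arguments
    case "$_mode" in
        local|tez_local|spark_local|mapreduce|tez|spark)
            set -- pig -x "$_mode" "$@"
            echo "$*"
            ;;
        *)
            echo "unknown execution mode: $_mode" >&2
            return 1
            ;;
    esac
}
```

For example, `pig_cmd tez_local id.pig` prints `pig -x tez_local id.pig`, and an unrecognized mode name returns a non-zero status.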

Pig Execution Mechanisms

  • Interactive Mode (Grunt shell) − You can run Apache Pig interactively using the Grunt shell: enter Pig Latin statements at the prompt and view the results with the Dump operator.

  • Batch Mode (Script) − You can run Apache Pig in batch mode by writing the Pig Latin statements in a single script file, by convention with the .pig extension, and passing that file to the pig command.

  • Embedded Mode (UDF) − Apache Pig lets you define your own functions (User Defined Functions) in programming languages such as Java and call them from your scripts.

Invoking the Grunt Shell

/* local mode */
$ ./pig -x local

/* mapreduce mode */
$ ./pig -x mapreduce

Batch Mode

/* local mode */
$ pig -x local sample_script.pig

/* mapreduce mode */
$ pig -x mapreduce sample_script.pig

Example: extract the user IDs from a copy of /etc/passwd.

$ mkdir ~/pig
$ cp /etc/passwd ~/pig
$ cd ~/pig
$ vi id.pig

/* id.pig */

A = load 'passwd' using PigStorage(':');  -- load the passwd file
B = foreach A generate $0 as id;  -- extract the user IDs
store B into 'id.out';  -- write the results to a directory named id.out

$ pig -x local id.pig
$ cat id.out/part-m-00000
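
The walkthrough above can be collected into one shell script. This is a sketch under stated assumptions: it works in a temporary directory with a three-line sample file in passwd format (the entries are made up) instead of the real /etc/passwd, and it only invokes pig when the command is found on the PATH. Field $0 of a colon-delimited line is the text before the first colon, so `cut -d: -f1` gives an independent list of IDs to compare against Pig's output.

```shell
#!/bin/sh
# Sketch: run the id.pig example in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample data in /etc/passwd format (colon-separated fields).
cat > passwd <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
EOF

# The same Pig Latin script as in the walkthrough.
cat > id.pig <<'EOF'
A = load 'passwd' using PigStorage(':');  -- load the passwd file
B = foreach A generate $0 as id;          -- extract the user IDs
store B into 'id.out';                    -- write the results to the id.out directory
EOF

# Independent cross-check: the first colon-separated field of each line.
cut -d: -f1 passwd > expected_ids.txt

if command -v pig >/dev/null 2>&1; then
    pig -x local id.pig
    # Compare the part files written by the job with the cut output.
    cat id.out/part-m-* | diff - expected_ids.txt && echo "Pig output matches"
else
    echo "pig not found; expected IDs are:"
    cat expected_ids.txt
fi
```

Working in a scratch directory keeps the relative paths 'passwd' and 'id.out' in the script valid without touching the home directory.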

Tips

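/* list the command-line options */
$ pig -help

/* set the log file location in pig.properties */
$ vi $PIG_HOME/conf/pig.properties
pig.logfile=/tmp/pig-err.log

/* follow the log while a script runs */
$ tail -f -n 99 /tmp/pig-err.log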
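
If editing pig.properties by hand is inconvenient, the log file setting could also be applied from a script. The function below is a minimal sketch: set_prop is a hypothetical helper, not part of Pig. It removes any existing definition of the key, including one commented out with `#`, and appends the new value, so running it twice leaves a single entry.

```shell
#!/bin/sh
# set_prop FILE KEY VALUE  (hypothetical helper, not part of Pig)
# Sets KEY=VALUE in a Java-style properties file. Any existing line that
# defines KEY (active or commented out with '#') is dropped first.
set_prop() {
    _file=$1
    _key=$2
    _value=$3
    [ -f "$_file" ] || : > "$_file"                  # create the file if missing
    _tmp=$(mktemp) || return 1
    # Escape dots so that "pig.logfile" matches literally in the regex.
    _key_re=$(printf '%s' "$_key" | sed 's/\./\\./g')
    grep -v -E "^[[:space:]]*#?[[:space:]]*${_key_re}[[:space:]]*=" "$_file" > "$_tmp" || true
    printf '%s=%s\n' "$_key" "$_value" >> "$_tmp"
    cat "$_tmp" > "$_file"                           # keep the file's permissions
    rm -f "$_tmp"
}
```

With the paths from the tips above, a call might look like `set_prop "$PIG_HOME/conf/pig.properties" pig.logfile /tmp/pig-err.log`.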

References