Pig Latin Basics#
Pig Latin – Data Model#
A bag is a collection of tuples.
A tuple is an ordered set of fields.
A field is a piece of data.
Pig Latin – Statemets#
grunt> Student_data = LOAD 'student_data.txt' USING PigStorage(',')as
( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );
Pig Latin – Data types#
| S.N. | Data Type | Description | Example |
|---|---|---|---|
| 1 | int | Represents a signed 32-bit integer. | 8 |
| 2 | long | Represents a signed 64-bit integer. | 5L |
| 3 | float | Represents a signed 32-bit floating point. | 5.5F |
| 4 | double | Represents a 64-bit floating point. | 10.5 |
| 5 | chararray | Represents a character array (string) in Unicode UTF-8 format. | ‘tutorials point’ |
| 6 | Bytearray | Represents a Byte array (blob). | |
| 7 | Boolean | Represents a Boolean value. | true/ false. |
| 8 | Datetime | Represents a date-time. | 1970-01-01T00:00:00.000+00:00 |
| 9 | Biginteger | Represents a Java BigInteger. | 60708090709 |
| 10 | Bigdecimal | Represents a Java BigDecimal | 185.98376256272893883 |
| 11 | Tuple | A tuple is an ordered set of fields. | (raja, 30) |
| 12 | Bag | A bag is a collection of tuples. | {(raju,30),(Mohhammad,45)} |
| 13 | Map | A Map is a set of key-value pairs. | [ ‘name’#’Raju’, ‘age’#30] |
Pig Latin – Arithmetic Operators#
Suppose a = 10 and b = 20.
| Operator | Description | Example |
|---|---|---|
| + | Addition − Adds values on either side of the operator | a + b will give 30 |
| − | Subtraction − Subtracts right hand operand from left hand operand | a − b will give −10 |
| * | Multiplication − Multiplies values on either side of the operator | a * b will give 200 |
| / | Division − Divides left hand operand by right hand operand | b / a will give 2 |
| % | Modulus − Divides left hand operand by right hand operand and returns remainder | b % a will give 0 |
| ? : | Bincond − Evaluates the Boolean operators. It has three operands as shown below.variable x = (expression) ? value1 if true : value2 if false. | b = (a == 1)? 20: 30; if a = 1 the value of b is 20. if a!=1 the value of b is 30. |
| CASE WHEN THEN ELSE END | Case − The case operator is equivalent to nested bincond operator. | CASE f2 % 2 WHEN 0 THEN 'even' WHEN 1 THEN 'odd' END |
Pig Latin – Comparison Operators#
| Operator | Description | Example |
|---|---|---|
| == | Equal − Checks if the values of two operands are equal or not; if yes, then the condition becomes true. | (a = b) is not true |
| != | Not Equal − Checks if the values of two operands are equal or not. If the values are not equal, then condition becomes true. | (a != b) is true. |
| > | Greater than − Checks if the value of the left operand is greater than the value of the right operand. If yes, then the condition becomes true. | (a > b) is not true. |
| < | Less than − Checks if the value of the left operand is less than the value of the right operand. If yes, then the condition becomes true. | (a < b) is true. |
| >= | Greater than or equal to − Checks if the value of the left operand is greater than or equal to the value of the right operand. If yes, then the condition becomes true. | (a >= b) is not true. |
| <= | Less than or equal to − Checks if the value of the left operand is less than or equal to the value of the right operand. If yes, then the condition becomes true. | (a <= b) is true. |
| matches | Pattern matching − Checks whether the string in the left-hand side matches with the constant in the right-hand side. | f1 matches '.tutorial.' |
Pig Latin – Type Construction Operators#
| Operator | Description | Example |
|---|---|---|
| () | Tuple constructor operator − This operator is used to construct a tuple. | (Raju, 30) |
| {} | Bag constructor operator − This operator is used to construct a bag. | {(Raju, 30), (Mohammad, 45)} |
| [] | Map constructor operator − This operator is used to construct a tuple. | [name#Raja, age#30] |
Pig Latin – Relational Operations#
| Operator | Description |
|---|---|
| LOAD | To Load the data from the file system (local/HDFS) into a relation. |
| STORE | To save a relation to the file system (local/HDFS). |
| FILTER | To remove unwanted rows from a relation. |
| DISTINCT | To remove duplicate rows from a relation. |
| FOREACH, GENERATE | To generate data transformations based on columns of data. |
| STREAM | To transform a relation using an external program. |
| JOIN | To join two or more relations. |
| COGROUP | To group the data in two or more relations. |
| GROUP | To group the data in a single relation. |
| CROSS | To create the cross product of two or more relations. |
| ORDER | To arrange a relation in a sorted order based on one or more fields (ascending or descending). |
| LIMIT | To get a limited number of tuples from a relation. |
| UNION | To combine two or more relations into a single relation. |
| SPLIT | To split a single relation into two or more relations. |
| DUMP | To print the contents of a relation on the console. |
| DESCRIBE | To describe the schema of a relation. |
| EXPLAIN | To view the logical, physical, or MapReduce execution plans to compute a relation. |
| ILLUSTRATE | To view the step-by-step execution of a series of statements. |