Pig Latin Basics

Pig Latin – Data Model

  • A bag is an unordered collection of tuples; a relation is an outer bag.

  • A tuple is an ordered set of fields.

  • A field is a single piece of data: an atomic value such as an int or a chararray, or a nested tuple, bag, or map.
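A minimal sketch of how the three levels nest. It assumes the comma-separated file student_data.txt with the five columns used in the LOAD statement of this tutorial; the aliases student_data and by_city are illustrative. The loaded relation is a bag of tuples, and grouping it yields tuples whose second field is itself an inner bag:

```pig
-- student_data is a relation, i.e. a bag that holds one tuple per input line.
student_data = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- Every tuple of by_city has two fields:
--   group        : the city value (a simple chararray field)
--   student_data : an inner bag with all tuples that share that city
by_city = GROUP student_data BY city;

-- Prints the nested schema: a bag of tuples, each containing a bag.
DESCRIBE by_city;
```

The lines can be saved to a script file or typed one at a time at the grunt> prompt.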

Pig Latin – Statements

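Statements are the basic constructs of Pig Latin. They work with relations, may include expressions and schemas, and each one ends with a semicolon. The following LOAD statement reads the comma-separated file student_data.txt into the relation Student_data and declares its schema:

grunt> Student_data = LOAD 'student_data.txt' USING PigStorage(',') AS
   ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );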
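Statements only describe a data flow: Pig checks each one as it is entered but reads no data until an output statement such as DUMP or STORE asks for results. A short sketch of that behavior, assuming the same student_data.txt; the alias names is illustrative:

```pig
-- Single-line comments start with --; block comments use /* ... */.
Student_data = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

/* Project two fields; no data has been read yet. */
names = FOREACH Student_data GENERATE firstname, city;

-- DUMP triggers execution and prints each tuple of names on the console.
DUMP names;
```

Relation and field names such as Student_data and firstname are case sensitive, while keywords such as LOAD and FOREACH are not.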

Pig Latin – Data types

S.N.  Data Type   Description                                        Example
1     int         Signed 32-bit integer.                             8
2     long        Signed 64-bit integer.                             5L
3     float       Signed 32-bit floating point.                      5.5F
4     double      Signed 64-bit floating point.                      10.5
5     chararray   Character array (string) in Unicode UTF-8 format.  'tutorials point'
6     bytearray   Byte array (blob).
7     boolean     Boolean value.                                     true / false
8     datetime    Date-time.                                         1970-01-01T00:00:00.000+00:00
9     biginteger  Java BigInteger.                                   60708090709
10    bigdecimal  Java BigDecimal.                                   185.98376256272893883
11    tuple       Ordered set of fields.                             (raja, 30)
12    bag         Collection of tuples.                              {(raju,30),(Mohammad,45)}
13    map         Set of key-value pairs.                            ['name'#'Raju', 'age'#30]
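The sketch below declares one field of each type in a LOAD schema. It assumes a hypothetical tab-separated file typed_data.txt (PigStorage() with no argument splits on tabs, so the commas inside tuples and bags do not clash with the field delimiter); all field names are illustrative. The FOREACH line shows the literal notation from the Example column:

```pig
-- Hypothetical input file; the comments show the text form PigStorage expects.
typed = LOAD 'typed_data.txt' USING PigStorage() AS (
    small_num:int,
    big_num:long,
    ratio_f:float,
    ratio_d:double,
    label:chararray,
    payload:bytearray,
    active:boolean,
    created:datetime,
    huge:biginteger,
    price:bigdecimal,
    person:tuple(name:chararray, age:int),          -- e.g. (raja,30)
    people:bag{t:tuple(name:chararray, age:int)},   -- e.g. {(raju,30),(Mohammad,45)}
    props:map[]                                     -- e.g. [name#Raju,age#30]
);

-- Constants: L marks a long, F marks a float, a plain decimal is a double.
consts = FOREACH typed GENERATE 8 AS int_val, 5L AS long_val, 5.5F AS float_val,
    10.5 AS double_val, 'tutorials point' AS text_val;

DESCRIBE typed;
DESCRIBE consts;
```

Descriptive aliases are used on purpose: single letters such as l, L, f, F, e and E are reserved words in Pig Latin.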

Pig Latin – Arithmetic Operators

Suppose a = 10 and b = 20.

Operator  Description                                           Example
+         Addition: adds the two operands.                      a + b gives 30
-         Subtraction: left operand minus right operand.        a - b gives -10
*         Multiplication: multiplies the two operands.          a * b gives 200
/         Division: divides the left operand by the right.      b / a gives 2
%         Modulus: remainder after dividing left by right.      b % a gives 0
? :       Bincond: condition ? value_if_true : value_if_false.  (a == 1 ? 20 : 30) gives 30
CASE      Case: equivalent to a nested bincond.                 CASE f2 % 2 WHEN 0 THEN 'even' WHEN 1 THEN 'odd' END

Bincond takes three operands: a Boolean condition, the value returned when it is true, and the value returned when it is false. With a = 10, (a == 1 ? 20 : 30) gives 30; it would give 20 only if a were 1. CASE ... WHEN ... THEN ... ELSE ... END expresses the same kind of choice more readably: in the CASE example, f2 % 2 selects 'even' when it is 0 and 'odd' when it is 1.
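A minimal sketch that evaluates every operator in the table inside one FOREACH. It assumes a hypothetical file numbers.txt whose only line is 10,20, so that a = 10 and b = 20; the output aliases are illustrative, and the CASE expression tests b instead of the field f2 used in the table. The comments give the values these operands should produce:

```pig
nums = LOAD 'numbers.txt' USING PigStorage(',') AS (a:int, b:int);

results = FOREACH nums GENERATE
    a + b AS total,                -- 30
    a - b AS difference,           -- -10
    a * b AS product,              -- 200
    b / a AS quotient,             -- 2 (int / int is integer division)
    b % a AS remainder,            -- 0
    (a == 1 ? 20 : 30) AS pick,    -- 30, because a is not 1
    (CASE b % 2
        WHEN 0 THEN 'even'
        WHEN 1 THEN 'odd'
     END) AS parity;               -- 'even', because 20 is even

DUMP results;                      -- (30,-10,200,2,0,30,even)
```

Wrapping the bincond and the CASE expression in parentheses keeps them from being mis-parsed when they appear inside a longer GENERATE list.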

Pig Latin – Comparison Operators

Using the same values a = 10 and b = 20:

Operator  Description                                        Example
==        Equal: true if the operands are equal.             (a == b) is false
!=        Not equal: true if the operands differ.            (a != b) is true
>         Greater than: true if the left operand is larger.  (a > b) is false
<         Less than: true if the left operand is smaller.    (a < b) is true
>=        Greater than or equal to: true if left >= right.   (a >= b) is false
<=        Less than or equal to: true if left <= right.      (a <= b) is true
matches   Pattern matching: true if the left-hand string     f1 matches '.*tutorial.*'
          matches the regular expression on the right.

The pattern is a Java regular expression and must match the whole string, which is why the example surrounds tutorial with .* on both sides.
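Comparison operators return booleans, so they are most often used as FILTER conditions. A short sketch, assuming student_data.txt with the columns from the LOAD statement above; the city value 'Chennai' and all aliases are only illustrations:

```pig
students = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

first_three = FILTER students BY id <= 3;
not_chennai = FILTER students BY city != 'Chennai';
-- Conditions combine with AND, OR and NOT.
mid_range   = FILTER students BY id >= 2 AND id < 5;
-- matches needs the regex to cover the whole value, so 'R.*' keeps
-- every first name that starts with R.
r_names     = FILTER students BY firstname matches 'R.*';

DUMP r_names;
```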

Pig Latin – Type Construction Operators

Operator  Description                             Example
()        Tuple constructor: constructs a tuple.  (Raju, 30)
{}        Bag constructor: constructs a bag.      {(Raju, 30), (Mohammad, 45)}
[]        Map constructor: constructs a map.      [name#Raja, age#30]
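A sketch of the constructors used inside FOREACH ... GENERATE, assuming student_data.txt with the columns from the LOAD statement above; the aliases person, people and props are illustrative. The map is built with the built-in TOMAP function, which takes alternating key and value arguments and does the same job as []:

```pig
students = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

built = FOREACH students GENERATE
    (firstname, id)     AS person,   -- tuple constructor: one tuple per row
    {(firstname, id)}   AS people,   -- bag constructor: a bag holding that tuple
    TOMAP('city', city) AS props;    -- map with the single key 'city'

DESCRIBE built;
DUMP built;
```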

Pig Latin – Relational Operations

Operator              Description
LOAD                  To load data from the file system (local/HDFS) into a relation.
STORE                 To save a relation to the file system (local/HDFS).
FILTER                To remove unwanted rows from a relation.
DISTINCT              To remove duplicate rows from a relation.
FOREACH ... GENERATE  To compute new fields from the columns of each tuple in a relation.
STREAM                To transform a relation using an external program.
JOIN                  To join two or more relations.
COGROUP               To group the data in two or more relations.
GROUP                 To group the data in a single relation.
CROSS                 To create the cross product of two or more relations.
ORDER BY              To sort a relation by one or more fields (ascending or descending).
LIMIT                 To get a limited number of tuples from a relation.
UNION                 To combine two or more relations into a single relation.
SPLIT                 To split a single relation into two or more relations.
DUMP                  To print the contents of a relation on the console.
DESCRIBE              To describe the schema of a relation.
EXPLAIN               To view the logical, physical, or MapReduce execution plans used to compute a relation.
ILLUSTRATE            To view the step-by-step execution of a series of statements.
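The relational operators chain naturally into a script. The sketch below assumes student_data.txt with the columns from the LOAD statement above and writes its result to a hypothetical output directory top_cities; every alias is illustrative:

```pig
students = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

with_city  = FILTER students BY city IS NOT NULL;  -- drop rows without a city
deduped    = DISTINCT with_city;                   -- remove duplicate rows
by_city    = GROUP deduped BY city;                -- one tuple per city
counts     = FOREACH by_city GENERATE group AS city, COUNT(deduped) AS n;
by_size    = ORDER counts BY n DESC;               -- largest groups first
top3       = LIMIT by_size 3;                      -- keep the first three

-- SPLIT divides a relation by condition; UNION puts the pieces back together.
SPLIT deduped INTO low_ids IF id <= 3, high_ids IF id > 3;
recombined = UNION low_ids, high_ids;

-- JOIN attaches each city's student count to the student rows.
joined     = JOIN deduped BY city, counts BY city;

DESCRIBE joined;  -- field names are prefixed, e.g. deduped::city, counts::city
EXPLAIN top3;     -- prints the execution plans without writing any output
STORE top3 INTO 'top_cities' USING PigStorage(',');
```

Saved as a script (for example a hypothetical students.pig), it can be run with pig -x local students.pig, which reads and writes the local file system instead of HDFS.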
