Pig Latin Basics#
Pig Latin – Data Model#
A bag is a collection of tuples.
A tuple is an ordered set of fields.
A field is a piece of data.
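The nesting of these three notions is easiest to see after a GROUP, since grouping produces tuples that contain a bag. The following is a minimal sketch, assuming a hypothetical comma-separated file `student_data.txt` with the five columns named in the schema; the relation names are illustrative only.

```pig
-- Assumption: student_data.txt exists and holds lines such as
--   1,Raju,Reddy,9848022337,Hyderabad
students = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray,
        phone:chararray, city:chararray);

-- Each tuple in students has five fields.
-- Grouping yields one tuple per city; its second field is a bag
-- holding all student tuples for that city.
by_city = GROUP students BY city;

-- Print the schema to see the field / tuple / bag nesting.
DESCRIBE by_city;
```

In the schema that DESCRIBE prints, `group` is a plain field, while `students` is a bag whose elements are the original five-field tuples.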
Pig Latin – Statements#
grunt> Student_data = LOAD 'student_data.txt' USING PigStorage(',') AS
( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );
Pig Latin – Data types#
S.N. | Data Type | Description | Example |
---|---|---|---|
1 | int | Represents a signed 32-bit integer. | 8 |
2 | long | Represents a signed 64-bit integer. | 5L |
3 | float | Represents a signed 32-bit floating point. | 5.5F |
4 | double | Represents a 64-bit floating point. | 10.5 |
5 | chararray | Represents a character array (string) in Unicode UTF-8 format. | 'tutorials point' |
6 | bytearray | Represents a byte array (blob). | |
7 | boolean | Represents a Boolean value. | true / false |
8 | datetime | Represents a date-time. | 1970-01-01T00:00:00.000+00:00 |
9 | biginteger | Represents a Java BigInteger. | 60708090709 |
10 | bigdecimal | Represents a Java BigDecimal. | 185.98376256272893883 |
11 | tuple | A tuple is an ordered set of fields. | (Raju, 30) |
12 | bag | A bag is a collection of tuples. | {(Raju,30),(Mohammad,45)} |
13 | map | A map is a set of key-value pairs. | ['name'#'Raju', 'age'#30] |
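As a rough illustration of how these types appear in practice, the sketch below declares several of them in a LOAD schema and applies an explicit cast. The file name `events.txt` and all field names are hypothetical, and the `boolean` and `datetime` types assume a reasonably recent Pig release.

```pig
-- Assumption: events.txt is comma-separated, e.g.
--   7,10.5,click,true,1970-01-01T00:00:00.000+00:00
events = LOAD 'events.txt' USING PigStorage(',')
    AS (id:long, score:double, label:chararray,
        seen:boolean, at:datetime);

-- Cast a double to an int and format the datetime as a string.
typed = FOREACH events GENERATE
    id,
    (int)score AS score_int,
    ToString(at, 'yyyy-MM-dd') AS day;

DESCRIBE typed;
```

Fields loaded without a declared type default to `bytearray`, which is why declaring the schema up front is usually worthwhile.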
Pig Latin – Arithmetic Operators#
Suppose a = 10 and b = 20.
Operator | Description | Example |
---|---|---|
+ | Addition − Adds values on either side of the operator | a + b will give 30 |
- | Subtraction − Subtracts right hand operand from left hand operand | a - b will give -10 |
* | Multiplication − Multiplies values on either side of the operator | a * b will give 200 |
/ | Division − Divides left hand operand by right hand operand | b / a will give 2 |
% | Modulus − Divides left hand operand by right hand operand and returns remainder | b % a will give 0 |
? : | Bincond − Evaluates a Boolean expression and returns one of two values. It has three operands: x = (expression) ? value_if_true : value_if_false. | b = (a == 1) ? 20 : 30; if a == 1, the value of b is 20; otherwise the value of b is 30. |
CASE WHEN THEN ELSE END | Case − The case operator is equivalent to nested bincond operator. | CASE f2 % 2 WHEN 0 THEN 'even' WHEN 1 THEN 'odd' END |
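The operators in the table are normally used inside a FOREACH ... GENERATE expression. Below is a small sketch, assuming a hypothetical file `numbers.txt` whose lines hold two comma-separated integers; the `CASE` expression assumes a Pig version that supports it.

```pig
-- Assumption: numbers.txt contains lines such as "10,20"
nums = LOAD 'numbers.txt' USING PigStorage(',') AS (a:int, b:int);

calc = FOREACH nums GENERATE
    a + b AS total,
    a - b AS diff,
    a * b AS product,
    b / a AS quotient,      -- integer division, since both fields are int
    b % a AS remainder,
    ((a == 1) ? 20 : 30) AS picked,
    (CASE b % 2 WHEN 0 THEN 'even' WHEN 1 THEN 'odd' END) AS parity;

DUMP calc;
```

For the input line `10,20` this yields the tuple `(30,-10,200,2,0,30,even)`, matching the values in the table.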
Pig Latin – Comparison Operators#
Suppose a = 10 and b = 20.
Operator | Description | Example |
---|---|---|
== | Equal − Checks if the values of two operands are equal or not; if yes, then the condition becomes true. | (a == b) is not true. |
!= | Not Equal − Checks if the values of two operands are equal or not. If the values are not equal, then condition becomes true. | (a != b) is true. |
> | Greater than − Checks if the value of the left operand is greater than the value of the right operand. If yes, then the condition becomes true. | (a > b) is not true. |
< | Less than − Checks if the value of the left operand is less than the value of the right operand. If yes, then the condition becomes true. | (a < b) is true. |
>= | Greater than or equal to − Checks if the value of the left operand is greater than or equal to the value of the right operand. If yes, then the condition becomes true. | (a >= b) is not true. |
<= | Less than or equal to − Checks if the value of the left operand is less than or equal to the value of the right operand. If yes, then the condition becomes true. | (a <= b) is true. |
matches | Pattern matching − Checks whether the string on the left-hand side matches the regular expression constant on the right-hand side. | f1 matches '.*tutorial.*' |
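Comparison operators typically appear in the condition of a FILTER statement. The sketch below assumes the same hypothetical `student_data.txt` file and schema used in the LOAD statement earlier; the literal values are made up for illustration.

```pig
students = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray,
        phone:chararray, city:chararray);

-- Equality and inequality on single fields
in_chennai  = FILTER students BY city == 'Chennai';
not_first   = FILTER students BY id != 1;

-- Range check combined with a pattern match
-- (matches uses Java regular expressions and must match the whole string)
late_r_names = FILTER students BY (id >= 3) AND (firstname matches 'R.*');

DUMP late_r_names;
```

Because `matches` tests the entire string, a pattern such as `'R.*'` is needed to select names starting with R; the bare pattern `'R'` would only match the one-character string "R".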
Pig Latin – Type Construction Operators#
Operator | Description | Example |
---|---|---|
() | Tuple constructor operator − This operator is used to construct a tuple. | (Raju, 30) |
{} | Bag constructor operator − This operator is used to construct a bag. | {(Raju, 30), (Mohammad, 45)} |
[] | Map constructor operator − This operator is used to construct a map. | [name#Raju, age#30] |
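Inside a FOREACH ... GENERATE, the three constructors can build complex values from existing fields. This is a minimal sketch, assuming the hypothetical `student_data.txt` file and schema from the LOAD statement earlier and a Pig version that supports the constructor syntax.

```pig
students = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray,
        phone:chararray, city:chararray);

-- () builds a tuple from two fields
as_tuple = FOREACH students GENERATE (firstname, city);

-- {} builds a bag; here the bag holds a single two-field tuple
as_bag = FOREACH students GENERATE {(firstname, city)};

-- [] builds a map with firstname as the key and city as the value
as_map = FOREACH students GENERATE [firstname, city];

DESCRIBE as_tuple;
DESCRIBE as_bag;
DESCRIBE as_map;
```

The built-in functions `TOTUPLE`, `TOBAG`, and `TOMAP` offer the same results in function-call form.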
Pig Latin – Relational Operations#
Operator | Description |
---|---|
LOAD | To load data from the file system (local/HDFS) into a relation. |
STORE | To save a relation to the file system (local/HDFS). |
FILTER | To remove unwanted rows from a relation. |
DISTINCT | To remove duplicate rows from a relation. |
FOREACH, GENERATE | To generate data transformations based on columns of data. |
STREAM | To transform a relation using an external program. |
JOIN | To join two or more relations. |
COGROUP | To group the data in two or more relations. |
GROUP | To group the data in a single relation. |
CROSS | To create the cross product of two or more relations. |
ORDER | To arrange a relation in a sorted order based on one or more fields (ascending or descending). |
LIMIT | To get a limited number of tuples from a relation. |
UNION | To combine two or more relations into a single relation. |
SPLIT | To split a single relation into two or more relations. |
DUMP | To print the contents of a relation on the console. |
DESCRIBE | To describe the schema of a relation. |
EXPLAIN | To view the logical, physical, or MapReduce execution plans to compute a relation. |
ILLUSTRATE | To view the step-by-step execution of a series of statements. |
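Several of the operators above are usually chained into a pipeline, each statement taking the previous relation as input. The sketch below counts students per city and keeps the three largest groups; it assumes the hypothetical `student_data.txt` file from earlier, and the output directory name `city_counts` is also an assumption.

```pig
students = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray,
        phone:chararray, city:chararray);

-- FILTER: drop rows with no city
with_city = FILTER students BY city IS NOT NULL;

-- GROUP + FOREACH: one output tuple per city with its student count
by_city = GROUP with_city BY city;
counts  = FOREACH by_city GENERATE group AS city, COUNT(with_city) AS n;

-- ORDER + LIMIT: largest cities first, keep the top three
ranked = ORDER counts BY n DESC;
top3   = LIMIT ranked 3;

-- DUMP prints to the console; STORE writes to the file system
DUMP top3;
STORE ranked INTO 'city_counts' USING PigStorage(',');
```

Pig evaluates these statements lazily: nothing runs until a DUMP or STORE is reached, so DESCRIBE and EXPLAIN can be used on any intermediate relation to inspect the schema or plan without triggering the job.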