Pig Load & Store Functions
Function | Description |
---|---|
PigStorage() | Loads and stores structured text files whose fields are separated by a delimiter. |
TextLoader() | Loads unstructured text into Pig, one line per tuple. |
BinStorage() | Loads and stores data in a machine-readable binary format. |
Handling Compression | Pig Latin can load and store compressed data. |
PigStorage()
Syntax: PigStorage(field_delimiter)
grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt' USING PigStorage(',')
as ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );
grunt> STORE student INTO 'hdfs://localhost:9000/pig_output/' USING PigStorage(',');
$ hdfs dfs -ls 'hdfs://localhost:9000/pig_output/'
$ hdfs dfs -cat 'hdfs://localhost:9000/pig_output/part-m-00000'
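To try the PigStorage example above, an input file with the same five columns is needed. The following is a minimal sketch, not the original data set: the rows are hypothetical, and `awk -F','` is used only as a rough local stand-in for how a comma delimiter splits each line into fields.

```shell
#!/bin/sh
# Hypothetical sample rows; only the column layout (id, firstname,
# lastname, phone, city) is taken from the LOAD schema above.
workdir=$(mktemp -d)
cat > "$workdir/student_data.txt" <<'EOF'
1,Asha,Rao,5550100,Pune
2,Ben,Lee,5550101,Austin
3,Chen,Wu,5550102,Taipei
EOF

# Split every line on ',' and collect the distinct field counts.
# A single value of 5 means each row matches the five-column schema.
field_counts=$(awk -F',' '{ print NF }' "$workdir/student_data.txt" | sort -u)
echo "fields per row: $field_counts"

# Print the fifth field, which the schema names `city`.
cities=$(awk -F',' '{ print $5 }' "$workdir/student_data.txt" | paste -sd' ' -)
echo "cities: $cities"

# With a running cluster, the file could then be uploaded, for example:
#   hdfs dfs -put "$workdir/student_data.txt" hdfs://localhost:9000/pig_data/
```

Checking the field count before uploading is a cheap way to catch rows that would otherwise load with missing values.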
TextLoader()
Syntax: TextLoader()
grunt> details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING TextLoader();
grunt> dump details;
BinStorage()
Syntax: BinStorage()
grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
as (id:int, firstname:chararray, age:int, city:chararray);
grunt> STORE student_details INTO 'hdfs://localhost:9000/pig_output/mydata' USING BinStorage();
$ hdfs dfs -ls hdfs://localhost:9000/pig_output/mydata/
grunt> result = LOAD 'hdfs://localhost:9000/pig_output/mydata/part-m-00000' USING BinStorage();
grunt> Dump result;
Handling Compression
$ gzip employee.txt
$ hdfs dfs -put ~/pig/employee.txt.gz hdfs://localhost:9000/pig_data/
grunt> data = LOAD 'hdfs://localhost:9000/pig_data/employee.txt.gz' USING PigStorage(',');
grunt> data = LOAD 'hdfs://localhost:9000/pig_data/employee.txt.gz' USING TextLoader();
grunt> STORE data INTO 'hdfs://localhost:9000/pig_output/data.bz' USING PigStorage(',');
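As far as the Pig documentation describes it, PigStorage and TextLoader choose the codec from the file extension: `.gz` for gzip and `.bz` or `.bz2` for bzip. The sketch below shows one way to prepare a gzip input locally and confirm that it decompresses to the original bytes before uploading it. The rows and the temporary directory are hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Hypothetical employee rows, used only to have something to compress.
printf '%s\n' '1,Asha,30' '2,Ben,41' > "$workdir/employee.txt"

# -c writes the compressed bytes to stdout, so the original file is kept.
gzip -c "$workdir/employee.txt" > "$workdir/employee.txt.gz"

# Decompress to stdout and compare byte-for-byte with the original.
if gzip -dc "$workdir/employee.txt.gz" | cmp -s - "$workdir/employee.txt"; then
  roundtrip=ok
else
  roundtrip=mismatch
fi
echo "gzip round trip: $roundtrip"

# With a running cluster, the compressed file could then be uploaded:
#   hdfs dfs -put "$workdir/employee.txt.gz" hdfs://localhost:9000/pig_data/
```

Keeping the `.gz` suffix on the uploaded file matters here, because the extension is what signals the compression format to the loader.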