HDFS¶
Features of HDFS¶
- Highly Scalable
- Replication
- Fault tolerance
- Distributed data storage
- Portable
Where to use HDFS¶
- Very Large Files
- Streaming Data Access
- Commodity Hardware
Where not to use HDFS¶
- Low Latency data access
- Lots Of Small Files
- Multiple Writes
HDFS Architecture¶
[Figure: HDFS architecture]
[Figure: HDFS read path]
[Figure: HDFS write path]
Starting HDFS¶
hdfs namenode -format
start-dfs.sh
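Formatting the NameNode erases existing file system metadata, so it is normally done once, before the first start. The sketch below shows one possible guard. It assumes the NameNode metadata directory (the value of dfs.namenode.name.dir) is passed as an argument and that a formatted directory contains current/VERSION. The function name format_if_needed and the HDFS_CMD override are hypothetical names introduced for illustration. It calls hdfs namenode -format, the current spelling of the format command.

```shell
#!/bin/sh
# Format the NameNode only when its metadata directory has never been formatted.
# $1 = NameNode metadata directory (assumed to match dfs.namenode.name.dir)
# HDFS_CMD is a hypothetical override for the hdfs client; it defaults to "hdfs".
format_if_needed() {
    if [ -f "$1/current/VERSION" ]; then
        # A VERSION file means the directory already holds metadata.
        echo "already formatted: $1"
    else
        "${HDFS_CMD:-hdfs}" namenode -format && echo "formatted: $1"
    fi
}

# Example (the path is illustrative only):
#   format_if_needed /var/hadoop/dfs/name && start-dfs.sh
```

Checking for the VERSION file first keeps a repeated run of a setup script from wiping an existing file system.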
Listing Files in HDFS¶
$HADOOP_HOME/bin/hadoop fs -ls <args>
Inserting Data into HDFS¶
hadoop fs -mkdir -p /user/input
hadoop fs -put /home/hadoop/input/README.txt /user/input
hadoop fs -ls /user/input
Retrieving Data from HDFS¶
hadoop fs -cat /user/input/README.txt
hadoop fs -get /user/input/README.txt /home/hadoop/
Shutting Down HDFS¶
stop-dfs.sh
HDFS Basic File Operations¶
- Copying data from the local file system to HDFS:
hadoop fs -copyFromLocal /usr/home/Desktop/data.txt /user/test
- Copying data from HDFS to the local file system:
hadoop fs -copyToLocal /user/test/data.txt /usr/bin/data_copy.txt
- Comparing the two local files to confirm they are identical:
md5 /usr/bin/data_copy.txt /usr/home/Desktop/data.txt
- Recursive deleting:
hadoop fs -rm -r <arg>
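The copy and compare steps above can be combined into a single round-trip check. This is a minimal sketch, not a definitive script: hdfs_roundtrip, files_match, and the HADOOP_CMD override are hypothetical names, and the byte-for-byte comparison uses cmp -s in place of md5 because cmp is available on both Linux and BSD systems.

```shell
#!/bin/sh
# Return 0 when two local files contain exactly the same bytes.
files_match() {
    cmp -s "$1" "$2"
}

# Copy a local file into HDFS, copy it back out, and compare both local copies.
# $1 = local source file
# $2 = HDFS destination path
# $3 = local path for the copy fetched back from HDFS
# HADOOP_CMD is a hypothetical override for the client; it defaults to "hadoop".
hdfs_roundtrip() {
    "${HADOOP_CMD:-hadoop}" fs -copyFromLocal "$1" "$2" || return 1
    "${HADOOP_CMD:-hadoop}" fs -copyToLocal "$2" "$3" || return 1
    files_match "$1" "$3"
}

# Example (paths are illustrative only):
#   hdfs_roundtrip ./data.txt /user/test/data.txt ./data_copy.txt && echo "files match"
```

The function returns a non-zero status if either copy fails or if the files differ, so it can be used directly in an if statement or with &&.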
Other HDFS Commands¶
put <localSrc> <dest>
copyFromLocal <localSrc> <dest>
moveFromLocal <localSrc> <dest>
get [-crc] <src> <localDest>
cat <filename>
moveToLocal <src> <localDest>
setrep [-R] [-w] rep <path>
touchz <path>
test -[ezd] <path>
stat [format] <path>
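The test and stat commands are most useful inside scripts, where the exit status of test can drive a decision. The sketch below is one way to use them, under stated assumptions: hdfs_ensure_dir, hdfs_describe, and the HADOOP_CMD override are hypothetical names, and the stat format string uses %r (replication), %b (size in bytes), and %n (name).

```shell
#!/bin/sh
# HADOOP_CMD is a hypothetical override for the client; it defaults to "hadoop".

# Create an HDFS directory only if it does not already exist.
# "fs -test -d" exits with status 0 when the path is a directory.
hdfs_ensure_dir() {
    if "${HADOOP_CMD:-hadoop}" fs -test -d "$1"; then
        echo "exists: $1"
    else
        "${HADOOP_CMD:-hadoop}" fs -mkdir -p "$1" && echo "created: $1"
    fi
}

# Print replication factor, size in bytes, and name for an HDFS path.
hdfs_describe() {
    "${HADOOP_CMD:-hadoop}" fs -stat "%r %b %n" "$1"
}

# Example (paths are illustrative only):
#   hdfs_ensure_dir /user/test
#   hdfs_describe /user/test/data.txt
```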