HDFS

Features of HDFS

  • Highly Scalable

  • Replication

  • Fault tolerance

  • Distributed data storage

  • Portable

Where to use HDFS

  • Very Large Files

  • Streaming Data Access

  • Commodity Hardware

Where not to use HDFS

  • Low-latency data access

  • Lots of small files

  • Multiple writers or arbitrary in-place modifications
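The small-files limitation comes from the NameNode holding the whole namespace in memory. The sketch below gives a rough feel for the cost. It assumes about 150 bytes of heap per namespace object (file or block), a commonly quoted rule of thumb rather than a measured value; the function name estimate_heap_mb is made up for this example.

```shell
#!/bin/sh
# Rough NameNode heap estimate for a given number of files.
# ASSUMPTION: ~150 bytes of heap per namespace object (one per file,
# one per block). This is a rule of thumb, not a measured figure.
BYTES_PER_OBJECT=150

# estimate_heap_mb FILES BLOCKS_PER_FILE -> prints whole megabytes
estimate_heap_mb() {
    files=$1
    blocks_per_file=$2
    objects=$(( files + files * blocks_per_file ))
    echo $(( objects * BYTES_PER_OBJECT / 1024 / 1024 ))
}

# 10 million small files, each occupying a single block
estimate_heap_mb 10000000 1   # prints 2861
```

Under that assumption, ten million one-block files need close to 3 GB of NameNode heap regardless of how little data they hold, which is why packing data into fewer, larger files is usually preferred.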

HDFS Architecture

[Figure: HDFS architecture]

HDFS Read Image:

[Figure: HDFS read]

HDFS Write Image:

[Figure: HDFS write]

Starting HDFS

hadoop namenode -format
start-dfs.sh
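
Formatting the NameNode erases any existing file system metadata, so it should normally run only once, before the first start. A minimal sketch of a guarded startup follows. It assumes the NameNode directory is at the stock default location (/tmp/hadoop-<user>/dfs/name); NAME_DIR and needs_format are names made up for this example, and NAME_DIR should be set to whatever dfs.namenode.name.dir points to on your cluster.

```shell
#!/bin/sh
# Sketch: start HDFS, formatting the NameNode only on first use.
# ASSUMPTION: NAME_DIR matches dfs.namenode.name.dir in hdfs-site.xml;
# the default below is the stock location when nothing is configured.
NAME_DIR="${NAME_DIR:-/tmp/hadoop-$(id -un)/dfs/name}"

# A formatted NameNode directory contains current/VERSION.
needs_format() {
    [ ! -f "$1/current/VERSION" ]
}

if command -v start-dfs.sh >/dev/null 2>&1; then
    if needs_format "$NAME_DIR"; then
        hadoop namenode -format    # destroys existing metadata
    fi
    start-dfs.sh
    jps                            # lists the running Java daemons
else
    echo "Hadoop scripts not found on PATH; nothing started."
fi
```

After a successful start, the jps listing should include NameNode and DataNode processes.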

Listing Files in HDFS

$HADOOP_HOME/bin/hadoop fs -ls <args>

Inserting Data into HDFS

  1. hadoop fs -mkdir -p /user/input

  2. hadoop fs -put /home/hadoop/input/README.txt /user/input

  3. hadoop fs -ls /user/input
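
The three steps above can be chained so that a failure stops the sequence. This is only a sketch: SRC, DEST and upload_plan are names introduced here, and the script falls back to printing the commands when hadoop or the source file is not available.

```shell
#!/bin/sh
# Sketch: upload one local file into an HDFS directory.
# SRC and DEST default to the paths used in the steps above.
SRC="${SRC:-/home/hadoop/input/README.txt}"
DEST="${DEST:-/user/input}"

# Print the three commands as text so they can be reviewed first.
upload_plan() {
    printf 'hadoop fs -mkdir -p %s\n' "$2"
    printf 'hadoop fs -put %s %s\n' "$1" "$2"
    printf 'hadoop fs -ls %s\n' "$2"
}

if command -v hadoop >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # Each step runs only if the previous one succeeded.
    hadoop fs -mkdir -p "$DEST" &&
    hadoop fs -put "$SRC" "$DEST" &&
    hadoop fs -ls "$DEST"
else
    echo "Dry run (hadoop or $SRC not available):"
    upload_plan "$SRC" "$DEST"
fi
```

Note that -put refuses to overwrite a file that already exists at the destination; adding -f forces the overwrite.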

Retrieving Data from HDFS

  1. hadoop fs -cat /user/input/README.txt

  2. hadoop fs -get /user/input/README.txt /home/hadoop/
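
For large files it is often more practical to preview a few lines with -cat before copying the whole file with -get. The sketch below shows one way to do that; HDFS_FILE, LOCAL_DIR and local_target are names made up for this example.

```shell
#!/bin/sh
# Sketch: preview an HDFS file, then copy it to a local directory.
HDFS_FILE="${HDFS_FILE:-/user/input/README.txt}"
LOCAL_DIR="${LOCAL_DIR:-/home/hadoop}"

# Where -get places the file when the destination is a directory.
local_target() {
    echo "${2%/}/$(basename "$1")"
}

if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -cat "$HDFS_FILE" | head -n 5     # first five lines only
    hadoop fs -get "$HDFS_FILE" "$LOCAL_DIR/" &&
        ls -l "$(local_target "$HDFS_FILE" "$LOCAL_DIR")"
else
    echo "hadoop not found; would fetch $HDFS_FILE to $(local_target "$HDFS_FILE" "$LOCAL_DIR")"
fi
```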

Shutting Down the HDFS

stop-dfs.sh

HDFS Basic File Operations

  1. Copy a file from the local file system to HDFS: hadoop fs -copyFromLocal /usr/home/Desktop/data.txt /user/test

  2. Copy the file from HDFS back to the local file system: hadoop fs -copyToLocal /user/test/data.txt /usr/bin/data_copy.txt

  3. Compare the two files to confirm they are identical: md5 /usr/bin/data_copy.txt /usr/home/Desktop/data.txt

    • Recursive delete: hadoop fs -rm -r <arg> (the older -rmr form is deprecated)
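
The round trip above can be scripted so the comparison is done automatically instead of by eye. This is a sketch under a few assumptions: /user/test already exists as an HDFS directory, the local copy goes to a hypothetical writable path (/tmp/data_copy.txt), and the helper names file_digest and same_content are made up here. The digest helper uses md5sum where available and falls back to md5 -q.

```shell
#!/bin/sh
# Sketch: copy a file into HDFS and back, then compare checksums.

# Print only the MD5 digest of a file (md5sum on Linux, md5 on BSD/macOS).
file_digest() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | cut -d ' ' -f 1
    else
        md5 -q "$1"
    fi
}

# Succeeds when both files have the same digest.
same_content() {
    [ "$(file_digest "$1")" = "$(file_digest "$2")" ]
}

LOCAL="${LOCAL:-/usr/home/Desktop/data.txt}"
COPY="${COPY:-/tmp/data_copy.txt}"    # hypothetical writable location

if command -v hadoop >/dev/null 2>&1 && [ -f "$LOCAL" ]; then
    hadoop fs -copyFromLocal "$LOCAL" /user/test &&
    hadoop fs -copyToLocal /user/test/data.txt "$COPY" &&
    if same_content "$LOCAL" "$COPY"; then
        echo "match"
    else
        echo "MISMATCH"
    fi
else
    echo "hadoop or $LOCAL not available; skipping round trip"
fi
```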

HDFS Other commands

  • put <localSrc> <dest>

  • copyFromLocal <localSrc> <dest>

  • moveFromLocal <localSrc> <dest>

  • get [-crc] <src> <localDest>

  • cat <filename>

  • moveToLocal <src> <localDest>

  • setrep [-R] [-w] rep <path>

  • touchz <path>

  • test -[ezd] <path>

  • stat [format] <path>
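
A few of the commands listed above are easier to understand in use. The sketch below exercises test, stat and setrep against a single path. The path P and the helper describe_test_flag are made up for this example, and the replication value 2 is arbitrary.

```shell
#!/bin/sh
# Sketch: inspect a path with test and stat, then change its replication.
P="${P:-/user/test/data.txt}"    # hypothetical path

# Meaning of each flag accepted by `hadoop fs -test`.
describe_test_flag() {
    case "$1" in
        -e) echo "path exists" ;;
        -z) echo "file is zero length" ;;
        -d) echo "path is a directory" ;;
        *)  echo "unknown flag" ;;
    esac
}

if command -v hadoop >/dev/null 2>&1; then
    # -test reports its result through the exit status (0 means true).
    for flag in -e -z -d; do
        if hadoop fs -test "$flag" "$P"; then
            echo "$P: $(describe_test_flag "$flag")"
        fi
    done
    hadoop fs -stat "%n %r %o" "$P"    # name, replication, block size
    hadoop fs -setrep 2 "$P"           # request two replicas
else
    echo "hadoop not found; nothing inspected"
fi
```

Adding -w to setrep makes the command wait until the requested replication is reached, which can take a long time or never finish on a cluster with fewer DataNodes than the requested factor.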
