Flume Introduction

What is Flume?

Apache Flume is a data ingestion service for collecting, aggregating, and transporting large amounts of streaming data, such as log files and events, from various sources to a centralized data store.

Flume is a reliable, distributed, and configurable tool. It is designed principally to copy streaming log data from web servers to the Hadoop Distributed File System (HDFS).
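To make the web-server-to-HDFS use case concrete, here is a minimal sketch of a Flume agent configuration. It assumes a single agent that tails a web server access log and writes the events to HDFS. The agent and component names (a1, r1, c1, k1), the log path, and the NameNode address are all hypothetical placeholders, not values from this article.

```properties
# Hypothetical agent "a1" with one source, one channel, and one sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: follow a web server access log (path is an assumption).
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between the source and the sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: write events to HDFS as plain text (NameNode address is an assumption).
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode.example.com:8020/flume/weblogs
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.channel = c1
```

Assuming this is saved as weblog.conf (a hypothetical file name), the agent could be started with something like `bin/flume-ng agent --conf conf --conf-file weblog.conf --name a1`. The memory channel is chosen here only for brevity: it loses buffered events if the agent stops, so a file channel is the usual choice when durability matters.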

[Figure: Apache Flume overview]

Architecture

[Figure: Flume architecture]

Flume Event

[Figure: Flume event]

Flume Agent

[Figure: Flume agent]

Data Flow

[Figure: Flume data flow]

References