Introduction to Big Data

Due to the advent of new technologies, devices, and communication channels such as social networking sites, the amount of data produced by mankind is growing rapidly every year.

Big data is a collection of large datasets that cannot be processed using traditional computing techniques.

It is not a single technique or tool; rather, it has become a complete subject involving various tools, techniques, and frameworks.

Big data involves the data produced by different devices and applications.

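Big data is characterized by huge volume, high velocity, and a wide variety of data. This data falls into three types: structured (for example, relational tables), semi-structured (for example, XML or JSON documents), and unstructured (for example, free text, PDFs, and media logs).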
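To make the three types (structured, semi-structured, and unstructured) concrete, here is a small Python sketch using made-up sample records. A CSV table stands in for structured data, JSON for semi-structured data (XML would serve equally well), and a log-style sentence for unstructured data; the variable names and sample values are illustrative only.

```python
import csv
import io
import json

# Structured: every record follows one fixed schema, like rows in a relational table.
structured = list(csv.reader(io.StringIO("id,name,age\n1,Alice,30\n2,Bob,25\n")))

# Semi-structured: self-describing records (here JSON) whose fields can differ per record.
semi_structured = [
    json.loads('{"id": 1, "name": "Alice", "tags": ["admin"]}'),
    json.loads('{"id": 2, "name": "Bob", "address": {"city": "Paris"}}'),
]

# Unstructured: free text with no schema; any structure has to be extracted by processing it.
unstructured = "Server restarted at midnight after a disk error was reported."

if __name__ == "__main__":
    print(structured[0])                             # ['id', 'name', 'age']
    print([sorted(rec) for rec in semi_structured])  # [['id', 'name', 'tags'], ['address', 'id', 'name']]
    print(len(unstructured.split()))                 # 10 whitespace-separated tokens
```

The structured rows can be validated against a single header, the JSON records carry their own (differing) field names, and the free text only yields structure after further processing such as tokenizing.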

Polyglot Persistence

ETL Tools

ETL (Extract, Transform and Load)
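ETL moves data from source systems into a target store in three stages: extract the raw records, transform them (clean, filter, convert types), and load them into the destination. As a minimal sketch, assuming a small CSV export as the source and an in-memory SQLite table as the target (both stand-ins chosen for illustration, as are the names `RAW_CSV`, `extract`, `transform`, `load`, and `sales`), the stages could look like this in Python:

```python
import csv
import io
import sqlite3

# Hypothetical raw source: CSV text exported from some upstream system.
RAW_CSV = """id,name,amount
1, alice ,10.50
2,BOB,
3,carol,7.25
"""

def extract(raw_text):
    """Extract: read rows from the raw CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Transform: drop rows with a missing amount, tidy names, convert types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        cleaned.append((int(row["id"]), row["name"].strip().title(), float(row["amount"])))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    print(conn.execute("SELECT * FROM sales ORDER BY id").fetchall())
    # [(1, 'Alice', 10.5), (3, 'Carol', 7.25)]
```

Dedicated ETL tools handle far larger volumes and many more source types, but keeping the three stages as separate functions mirrors the same extract, transform, load split.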

Data Warehouse or OLAP

MPP (Massively Parallel Processing)

Hadoop

Hadoop is an Apache open-source framework, written in Java, that allows users to store and process big data in a distributed environment across clusters of computers using simple programming models.

It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
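Hadoop's simple programming model for processing is MapReduce: a map step turns each input record into key/value pairs, the framework groups (shuffles) the pairs by key, and a reduce step combines each group into a result. The sketch below is a single-process Python simulation of the classic word count, not Hadoop's own Java API; the function names are hypothetical. On a real cluster, many mappers would run in parallel on the machines that store the input data, and the shuffle would move intermediate pairs across the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # Each line stands in for an input split that would go to a separate mapper.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    # Sorting keys mimics the sorted order in which reducers receive their keys.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

if __name__ == "__main__":
    docs = ["Hadoop stores big data", "Hadoop processes big data"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

Because each map call looks at only one record and each reduce call at only one key, the work can be split across machines without the programmer writing any coordination code, which is what lets the same program scale from one server to many.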

Architecture

At its core, Hadoop has two major layers, namely: