The amount of data being produced from varied sources is increasing rapidly. This growth can be attributed to the instrumentation of modern society and of individuals, which leads to the production and storage of vast amounts of data. Because this data is huge, highly varied and produced at a rapid rate, traditional systems fail to manage it, and this gave rise to the buzzword Big Data. Big Data is a term that refers to the explosion of varied data produced from disparate sources [1]. It is characterized by five features or attributes, i.e. high volume, variety, veracity, visibility and velocity. Since this kind of data is beyond the management scope of traditional systems, mining it requires analytics solutions that can extract insights from both structured and unstructured data. At present, it is therefore instrumental to combine big data and analytics into a single discipline termed big data analytics. Analytics involves examining data to derive meaningful insights, such as hidden patterns and trends, that can in turn help organizations make important business decisions and develop new business models. The data deluge poses challenges in processing data and extracting useful information from it. It also requires skills for managing and analyzing huge data sets.


Cloud computing is a well-suited solution for handling big data and hosting big data workloads. It has revolutionized the way computing resources are used by offering features such as pay-per-use pricing, rapid elasticity and dynamic scalability, and it provides users with the illusion of infinite storage and compute capacity. Cloud resources can be used privately through a private cloud or shared through a public cloud such as Amazon EC2 or Microsoft Azure. The cloud is therefore a scalable technology with low upfront investment costs. As a result, the value proposition of using the cloud as a platform for analytics is strong, and it is well suited for carrying out scalable data analytics. Hadoop is an open-source framework that can be used for handling big data. It can play a significant role in unlocking new insights from data and can readily handle the flood of large unstructured data sets coming from sources such as sensors, mobile devices and social media.

This paper describes how Hadoop can be used as a technology on the cloud to meet users' big data needs and discusses the proposed Hadoop-based workflow for handling big data. We also present a case study of an analysis of movie data that extracts useful information from it, including the number of movies released within a given period and the number of movies with a certain rating, among other information.
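To make the two queries mentioned above concrete, the following is a minimal sketch of how they could be expressed in the MapReduce style that Hadoop uses: a map step emits (key, 1) pairs and a reduce step sums the values for each key. It runs in plain Python rather than on a Hadoop cluster, and the record layout (title, release year, integer rating), the sample records, the key names and the functions map_phase and reduce_phase are all hypothetical assumptions for illustration, not the dataset or code used in the case study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record layout (assumption): (title, release_year, rating on an integer scale)
Movie = Tuple[str, int, int]


def map_phase(movies: Iterable[Movie], start: int, end: int,
              target_rating: int) -> Iterator[Tuple[str, int]]:
    """Emit (key, 1) pairs, in the way a Hadoop mapper emits key/value pairs."""
    for _title, year, rating in movies:
        if start <= year <= end:
            yield ("released_in_period", 1)
        if rating == target_rating:
            yield ("rating_equals_target", 1)


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the values for each key; the in-memory dict stands in for
    Hadoop's shuffle/sort step followed by the reducer."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical sample records, not real case-study data.
movies = [
    ("Movie A", 1995, 4),
    ("Movie B", 2001, 3),
    ("Movie C", 2010, 4),
]

# Count movies released in 1990-2000 and movies rated exactly 4.
counts = reduce_phase(map_phase(movies, 1990, 2000, 4))
print(counts)
```

On an actual cluster, the same mapper and reducer logic would typically run as separate processes (for example via Hadoop Streaming), with Hadoop grouping the emitted keys between the two phases; the single-process version above only illustrates the data flow.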


The rest of this paper is organized as follows: Section 2 surveys related approaches to big data analytics, and Section 3 discusses Hadoop as a platform for meeting big data needs and requirements. Section 4 presents our proposed workflow for carrying out big data analytics, and Section 5 discusses our case study on analytics of movie data. Finally, Section 6 concludes the paper and outlines future directions.