BIG DATA ANALYTICS MADE EASY WITH RHADOOP
Adarsh V. Rotte, Gururaj Patwari, Suvarnalata Hiremath
Abstract: The volume of data held by organizations and moving across networks grows day by day, and so does the difficulty of processing and analyzing it. Data at this scale is generally termed Big Data. Analyzing it is necessary for obtaining insights and better guidance for applications. R is an efficient tool for analytics: an open source programming language and software suite, developed by Ross Ihaka and Robert Gentleman, that data scientists and statisticians use for data analysis, statistical computing, and data visualization. Apache Hadoop is an open source Java framework for processing and querying Big Data on large clusters of commodity hardware. It has two main components: HDFS (Hadoop Distributed File System) for storing Big Data and MapReduce for processing it. The strength of R lies in its ability to analyze data using a rich library of packages, but it falls short when working on Big Data. The strength of Hadoop, on the other hand, is storing and processing Big Data. Processing Big Data in memory is difficult because RAM cannot hold such a large amount of data. The options are therefore to run the analysis on limited chunks of the data, also known as sampling, or to combine the analytical power of R with the storage and processing power of Hadoop. The latter leads to an ideal solution: RHadoop.
Keywords: Big Data, R, MapReduce, HDFS, rhbase, rmr, ravro, plyrmr, rhdfs, Thrift Server
DOI: https://doi.org/10.15623/ijret.2015.0417003
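The MapReduce model named in the abstract can be illustrated with a minimal, single-process sketch. This is not RHadoop or the Hadoop API: the function names `map_reduce`, `word_mapper`, and `sum_reducer` are hypothetical, and everything runs in memory on one machine, whereas Hadoop distributes the same three phases (map, shuffle, reduce) across a cluster and reads its input from HDFS.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy in-memory version of the map, shuffle, and reduce phases."""
    # Map + shuffle: apply the mapper to each record and group values by key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit a (word, 1) pair for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    """Add up the counts collected for one word."""
    return sum(values)


# Hypothetical input; on a real cluster these lines would come from HDFS.
lines = ["big data with R", "big data with Hadoop"]
counts = map_reduce(lines, word_mapper, sum_reducer)
print(counts)
```

In this sketch the mapper and reducer are ordinary functions passed in by the caller, which mirrors the general idea of writing the analysis logic separately from the framework that moves and groups the data.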