- Perform exploratory data analyses at scale and generate meaningful results
- Work with multiple techniques to uncover hidden patterns in your data
- Get practical coverage of R while working with Spark, Hadoop, Storm, and more
Big data analytics is the process of examining large data sets that contain a variety of data types. The R programming language is one of the leading languages of data science; it boasts powerful and popular packages to tackle nearly all the problems in big data.
The book will begin with a brief introduction to the Big Data world and current industry standards in Big Data analysis. It will progress to a gentle introduction to the R language by presenting its development, language structure, applications in research and business, and its traditional shortcomings. The book will further provide readers with a revision of major R functions for data management and transformations and will eventually present a number of third-party packages allowing High Performance Computing with R.
The book will then introduce cloud-based Big Data solutions (Amazon EC2, Windows Azure and HDInsight, Google Cloud Platform, etc.) and provide guidance on R connectivity with relational (SQL-based, e.g. MySQL) and non-relational (NoSQL) databases such as Cassandra and MongoDB. It will then cover industry-standard Big Data tools such as the Apache Hadoop ecosystem (including HBase) and will thoroughly explain its HDFS and MapReduce frameworks.
The next few chapters will address other Big Data tools and the most recent third-party packages that make R compatible with Spark, Docker and Storm for fast and streaming data processing, as well as visualization techniques using ggplot, shiny and rCharts.
What you will learn
- Learn about the current state of Big Data processing using the R programming language and its powerful statistical capabilities
- Easily deploy Big Data analytics platforms with selected Big Data tools supported by R in a cost-effective and time-saving manner
- Apply the R language to real-world Big Data problems, e.g. electricity consumption across various socio-demographic indicators and near real-time Twitter sentiment analysis for specific keywords
- Explore the compatibility of R with Spark, Docker and Storm
About the Author
Simon Walkowiak is a cognitive neuroscientist and a Managing Director of Mind Project Ltd, a Big Data and Predictive Analytics consultancy based in London (United Kingdom). As a former Data Curator at the UK Data Service (UKDS, University of Essex), Europe's largest socio-economic data repository, Simon has extensive experience in processing and managing large-scale data sets such as censuses, sensor and smart meter readings, telecommunication data, and well-known governmental and social surveys such as the British Social Attitudes Survey, Labour Force Surveys, Understanding Society and the National Travel Survey, as well as many other socio-economic data sets collected and deposited at the UKDS by Eurostat, the World Bank, the Office for National Statistics, the Department of Transport, NatCen and the International Energy Agency, to mention just a few. Simon has delivered numerous data science and R training courses at public institutions and international companies, and he has also taught a course in “Big Data Methods in R” at major UK universities and at the prestigious Big Data Summer School organised by the Institute of Analytics and Data Science.