Analyzing Data with Microsoft R-20773
Analyzing Big Data with Microsoft R |
OverviewThe main purpose of the course is to give students the ability to use Microsoft R Server to create and run an analysis on a large dataset, and show how to utilize it in Big Data environments, such as a Hadoop or Spark cluster, or a SQL Server database. Audience profile The primary audience for this course is people who wish to analyze large datasets within a big data environment. The secondary audience are developers who need to integrate R analyses into their solutions. Prerequisites In addition to their professional experience, students who attend this course should have: • Programming experience using R, and familiarity with common R packages • Knowledge of common statistical methods and data analysis best practices. • Basic knowledge of the Microsoft Windows operating system and its core functionality. Working knowledge of relational databases. At course completion After completing this course, students will be able to: • Explain how Microsoft R Server and Microsoft R Client work • Use R Client with R Server to explore big data held in different data stores • Visualize data by using graphs and plots • Transform and clean big data sets • Implement options for splitting analysis jobs into parallel tasks • Build and evaluate regression models generated from big data • Create, score, and deploy partitioning models generated from big data • Use R in the SQL Server and Hadoop environments Course Outline Module 1: Microsoft R Server and R Client Explain how Microsoft R Server and Microsoft R Client work. Lessons • What is Microsoft R server • Using Microsoft R client • The ScaleR functions Lab : Exploring Microsoft R Server and Microsoft R Client • Using R client in VSTR and RStudio • Exploring ScaleR functions • Connecting to a remote server Module 2: Exploring Big Data At the end of this module the student will be able to use R Client with R Server to explore big data held in different data stores. Lessons • Understanding ScaleR data sources • Reading data into an XDF object • Summarizing data in an XDF object Lab : Exploring Big Data • Reading a local CSV file into an XDF file • Transforming data on input • Reading data from SQL Server into an XDF file • Generating summaries over the XDF data Module 3: Visualizing Big Data Explain how to visualize data by using graphs and plots. Lessons • Visualizing In-memory data • Visualizing big data Lab : Visualizing data • Using ggplot to create a faceted plot with overlays • Using rxlinePlot and rxHistogram Module 4: Processing Big Data Explain how to transform and clean big data sets. Lessons • Transforming Big Data • Managing datasets Lab : Processing big data • Transforming big data • Sorting and merging big data • Connecting to a remote server Module 5: Parallelizing Analysis Operations Explain how to implement options for splitting analysis jobs into parallel tasks. Lessons • Using the RxLocalParallel compute context with rxExec • Using the revoPemaR package Lab : Using rxExec and RevoPemaR to parallelize operations • Using rxExec to maximize resource use • Creating and using a PEMA class Module 6: Creating and Evaluating Regression Models Explain how to build and evaluate regression models generated from big data Lessons • Clustering Big Data • Generating regression models and making predictions Lab : Creating a linear regression model • Creating a cluster • Creating a regression model • Generate data for making predictions • Use the models to make predictions and compare the results Module 7: Creating and Evaluating Partitioning Models Explain how to create and score partitioning models generated from big data. Lessons • Creating partitioning models based on decision trees. • Test partitioning models by making and comparing predictions Lab : Creating and evaluating partitioning models • Splitting the dataset • Building models • Running predictions and testing the results • Comparing results Module 8: Processing Big Data in SQL Server and Hadoop Explain how to transform and clean big data sets. Lessons • Using R in SQL Server • Using Hadoop Map/Reduce • Using Hadoop Spark Lab : Processing big data in SQL Server and Hadoop • Creating a model and predicting outcomes in SQL Server • Performing an analysis and plotting the results using Hadoop Map/Reduce • Integrating a sparklyr script into a ScaleR workflow |