Analyzing Data with Microsoft R-20773

SKU R-20773
$1 595.00
Product Details

Analyzing Big Data with Microsoft R

Overview

The main purpose of this course is to teach students how to use Microsoft R Server to create and run analyses on large datasets, and how to use it in big data environments such as a Hadoop or Spark cluster or a SQL Server database.

Audience profile

The primary audience for this course is people who wish to analyze large datasets within a big data environment.

The secondary audience is developers who need to integrate R analyses into their solutions.

Prerequisites

In addition to their professional experience, students who attend this course should have:

• Programming experience using R, and familiarity with common R packages.

• Knowledge of common statistical methods and data analysis best practices.

• Basic knowledge of the Microsoft Windows operating system and its core functionality.

• Working knowledge of relational databases.

At course completion

After completing this course, students will be able to:

• Explain how Microsoft R Server and Microsoft R Client work

• Use R Client with R Server to explore big data held in different data stores

• Visualize data by using graphs and plots

• Transform and clean big data sets

• Implement options for splitting analysis jobs into parallel tasks

• Build and evaluate regression models generated from big data

• Create, score, and deploy partitioning models generated from big data

• Use R in the SQL Server and Hadoop environments

Course Outline

Module 1: Microsoft R Server and R Client

Explain how Microsoft R Server and Microsoft R Client work.

Lessons

• What is Microsoft R Server?

• Using Microsoft R Client

• The ScaleR functions

Lab: Exploring Microsoft R Server and Microsoft R Client

• Using R Client in VSTR and RStudio

• Exploring ScaleR functions

• Connecting to a remote server

Module 2: Exploring Big Data

Explain how to use R Client with R Server to explore big data held in different data stores.

Lessons

• Understanding ScaleR data sources

• Reading data into an XDF object

• Summarizing data in an XDF object

Lab: Exploring Big Data

• Reading a local CSV file into an XDF file

• Transforming data on input

• Reading data from SQL Server into an XDF file

• Generating summaries over the XDF data

Module 3: Visualizing Big Data

Explain how to visualize data by using graphs and plots.

Lessons

• Visualizing in-memory data

• Visualizing big data

Lab: Visualizing data

• Using ggplot to create a faceted plot with overlays

• Using rxLinePlot and rxHistogram

Module 4: Processing Big Data

Explain how to transform and clean big data sets.

Lessons

• Transforming big data

• Managing datasets

Lab: Processing big data

• Transforming big data

• Sorting and merging big data

• Connecting to a remote server

Module 5: Parallelizing Analysis Operations

Explain how to implement options for splitting analysis jobs into parallel tasks.

Lessons

• Using the RxLocalParallel compute context with rxExec

• Using the RevoPemaR package

Lab: Using rxExec and RevoPemaR to parallelize operations

• Using rxExec to maximize resource use

• Creating and using a PEMA class

Module 6: Creating and Evaluating Regression Models

Explain how to build and evaluate regression models generated from big data.

Lessons

• Clustering big data

• Generating regression models and making predictions

Lab: Creating a linear regression model

• Creating a cluster

• Creating a regression model

• Generating data for making predictions

• Using the models to make predictions and comparing the results

Module 7: Creating and Evaluating Partitioning Models

Explain how to create and score partitioning models generated from big data.

Lessons

• Creating partitioning models based on decision trees

• Testing partitioning models by making and comparing predictions

Lab: Creating and evaluating partitioning models

• Splitting the dataset

• Building models

• Running predictions and testing the results

• Comparing results

Module 8: Processing Big Data in SQL Server and Hadoop

Explain how to use R in the SQL Server and Hadoop environments.

Lessons

• Using R in SQL Server

• Using Hadoop MapReduce

• Using Spark on Hadoop

Lab: Processing big data in SQL Server and Hadoop

• Creating a model and predicting outcomes in SQL Server

• Performing an analysis and plotting the results using Hadoop MapReduce

• Integrating a sparklyr script into a ScaleR workflow