Defined
 Supervised learning algorithm
 Performs a regression task, modeling a target (dependent variable) as a function of a vector of independent variables
 For linear regression, the goal is to find the best-fit regression line relating the independent variable(s) to the dependent variable
 Minimizes the error between predicted values and observed values in the training data
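Fitting a best-fit line by minimizing mean squared error can be sketched in a few lines of NumPy (a minimal illustration on hypothetical toy data, not SageMaker-specific code):

```python
import numpy as np

# Toy data: y is roughly 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# np.polyfit solves the least-squares problem (minimum MSE) in closed form.
slope, intercept = np.polyfit(x, y, deg=1)

predictions = slope * x + intercept
mse = np.mean((predictions - y) ** 2)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, MSE={mse:.3f}")
```

The fitted slope and intercept recover the underlying 2x + 1 relationship up to the noise level.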
Use Cases
 Optimizing pricing for product line
 Predicting whether a customer will default on a loan
 Predicting whether a patient has cancer based on image scan data
 Predicting user churn
 Sales forecasting
 Predict whether a voter will select a candidate or not
 Predict house prices
SageMaker Algorithms
Linear Learner
 Input is a set of high-dimensional vectors including a numeric target, or label
 Target is a real number
 Learns a linear function and maps a vector to an approximation of the target
 A good model trained with Linear Learner optimizes a continuous objective: mean square error, cross entropy loss, or absolute error
 Requires a data matrix with rows representing observations and columns representing features
 Also requires a target column with a label for each observation
 Important Hyperparameters

feature_dim: number of features in the input
predictor_type: type of the target variable (regressor for regression problems)
loss: specifies the loss function (auto, squared_loss, absolute_loss, etc.)

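The choice of loss determines how prediction errors are penalized: squared loss penalizes large errors quadratically, so outliers dominate, while absolute loss is more robust to them. A minimal NumPy comparison (toy numbers, not SageMaker code):

```python
import numpy as np

def squared_loss(y_true, y_pred):
    # Mean of squared residuals; large errors dominate.
    return np.mean((y_true - y_pred) ** 2)

def absolute_loss(y_true, y_pred):
    # Mean of absolute residuals; more robust to outliers.
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([1.0, 2.0, 3.0, 100.0])   # last point is an outlier
y_pred = np.array([1.1, 1.9, 3.2, 4.0])

print(squared_loss(y_true, y_pred))   # blown up by the single outlier
print(absolute_loss(y_true, y_pred))  # grows only linearly with it
```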
XGBoost
 Implementation of the gradient boosted trees algorithm
 Supervised learning algorithm for predicting a target by combining the estimates from a set of simpler models
 Requires a data matrix with rows representing observations and columns representing features
 Also requires a target column with a label for each observation
 Can differentiate the importance of features through weights
 Example use case: predict income based on census data
 Important Hyperparameters

num_round: number of rounds the training runs
objective: learning task and learning objective (e.g. reg:logistic, reg:squarederror)

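The core boosting idea, combining estimates from a set of simpler models, can be sketched in pure NumPy with depth-1 trees (stumps) as the simple models and squared error as the objective. This is a hypothetical toy illustration of the technique, not the actual XGBoost library:

```python
import numpy as np

def fit_stump(x, residuals):
    """Fit a depth-1 regression tree: one threshold, two leaf means."""
    best = None
    for t in np.unique(x):
        left, right = residuals[x <= t], residuals[x > t]
        if left.size == 0 or right.size == 0:
            continue
        pred = np.where(x <= t, left.mean(), right.mean())
        err = np.sum((residuals - pred) ** 2)
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, t, lmean, rmean = best
    return lambda xs: np.where(xs <= t, lmean, rmean)

def boost(x, y, num_round=20, eta=0.5):
    """Each round fits a stump to the current residuals (the negative
    gradient of squared error) and adds a shrunken copy to the ensemble."""
    pred = np.zeros_like(y)
    for _ in range(num_round):
        stump = fit_stump(x, y - pred)
        pred = pred + eta * stump(x)
    return pred

x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x)
pred = boost(x, y)
print("final MSE:", np.mean((y - pred) ** 2))
```

Each round corrects the errors left by the rounds before it, which is why increasing num_round drives training error down.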
K-Nearest Neighbors
 Finds the k closest points to the sample point and predicts the average of their target values
 Index-based algorithm
 Objective: build a k-NN index to allow efficient determination of distances between points
 Training constructs the index
 Use dimensionality reduction to avoid the “curse of dimensionality”
 Example use case: predict student absenteeism using student grades, demographic, social, and school related features
 Important Hyperparameters

feature_dim: number of features in the input
k: number of nearest neighbors
predictor_type: regressor for regression problems
sample_size: number of data points to be sampled from the training dataset
dimension_reduction_target: target dimension to reduce to

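The k-NN regression prediction step can be sketched in a few lines of NumPy (hypothetical toy data; SageMaker's implementation additionally builds an index so neighbor lookups stay efficient at scale):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict the target for x_query as the mean target of its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                     # indices of the k closest points
    return y_train[nearest].mean()

X_train = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.0, 10.0])

# Nearest three points to 1.2 are 1.0, 2.0, and 0.0; their targets average to 1.0.
print(knn_predict(X_train, y_train, np.array([1.2]), k=3))  # → 1.0
```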
Factorization Machines
 Extension of the linear model designed for high-dimensional sparse datasets
 Typically used for sparse datasets such as click prediction and item recommendation
 Continuous objective: Root Mean Square Error
 Example use case: analyze the images of handwritten digits
 Important Hyperparameters

feature_dim: number of features in the input
num_factors: dimensionality of factorization
predictor_type: regressor for regression problems

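A Factorization Machine extends the linear model with pairwise feature interactions whose weights are factorized into num_factors-dimensional latent vectors; the pairwise sum can be computed in O(k·n) instead of O(n²). A minimal NumPy sketch on hypothetical random parameters, checking the efficient form against the naive pairwise sum:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """FM prediction: y = w0 + w.x + sum_{i<j} <V[i], V[j]> x_i x_j,
    using the O(k*n) identity 0.5 * sum_f ((V[:,f].x)^2 - V[:,f]^2 . x^2)."""
    linear = w0 + w @ x
    interactions = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
    return linear + interactions

rng = np.random.default_rng(0)
n, k = 6, 3                      # n features, k latent factors (num_factors)
x = rng.normal(size=n)
w0, w = 0.1, rng.normal(size=n)
V = rng.normal(size=(n, k))

# Naive O(n^2) pairwise computation for comparison.
naive = w0 + w @ x + sum(
    (V[i] @ V[j]) * x[i] * x[j] for i in range(n) for j in range(i + 1, n)
)
print(fm_predict(x, w0, w, V), naive)   # the two agree
```

On sparse data only the nonzero x_i contribute, which is why FMs stay tractable for click prediction and recommendation.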
Labs
This article was created by Oscaner and is licensed under the Creative Commons Attribution 4.0 International License.
Unless marked as a repost or with a source, articles on this site are original or translated; please credit the author when reposting.
