Defined
- Supervised learning algorithm
- Performs a regression task: models the prediction of a target (dependent variable) from a vector of independent variables
- For linear regression, the goal is to find the best-fit regression line relating the independent variable(s) to the dependent variable
- Minimizes the error between predicted and observed values in the training data (see the minimal sketch below)
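To make the definition concrete, here is a minimal sketch (plain NumPy, independent of SageMaker) that fits a best-fit line by minimizing squared error; the toy data points are invented for illustration.

```python
import numpy as np

# Toy training data: one independent variable x, one dependent variable y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with an intercept column; lstsq minimizes ||Xw - y||^2
X = np.column_stack([np.ones_like(x), x])
(intercept, slope), *_ = np.linalg.lstsq(X, y, rcond=None)

# Error between predicted and observed values in the training data
predictions = intercept + slope * x
mse = np.mean((predictions - y) ** 2)
print(f"y ~= {intercept:.2f} + {slope:.2f}x, MSE = {mse:.4f}")
```

For this convex objective, the closed-form least-squares solution and an iterative optimizer (as used by SageMaker Linear Learner) converge to the same line.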
Use Cases
- Optimizing pricing for a product line
- Predicting whether a customer will default on a loan
- Predicting whether a patient has cancer based on image scan data
- Predicting user churn
- Sales forecasting
- Predicting whether a voter will select a candidate
- Predicting house prices
SageMaker Algorithms
Linear Learner
- Input is a set of high-dimensional vectors along with a numeric target, or label
- Target is a real number
- Learns a linear function and maps a vector to an approximation of the target
- A good Linear Learner model optimizes a continuous objective, such as mean square error, cross entropy loss, or absolute error
- Requires a data matrix with rows representing observations and columns representing feature dimensions
- Also requires a target column containing the label for each observation
- Important Hyperparameters (see the configuration sketch below)
    - `feature_dim`: number of features in the input
    - `predictor_type`: type of the target variable (`regressor` for regression problems)
    - `loss`: specifies the loss function (`auto`, `squared_loss`, `absolute_loss`, etc.)
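A minimal sketch of setting these hyperparameters with the SageMaker Python SDK's `LinearLearner` estimator; the role ARN, instance settings, and training arrays are placeholder assumptions, not prescribed values.

```python
import sagemaker
from sagemaker import LinearLearner

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder execution role

# predictor_type and loss map to the hyperparameters above;
# feature_dim is inferred from the training records at fit time.
linear = LinearLearner(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    predictor_type="regressor",
    loss="squared_loss",
    sagemaker_session=session,
)

# train_features / train_labels would be float32 numpy arrays prepared beforehand;
# record_set converts them to the recordIO-protobuf format the algorithm expects.
# linear.fit(linear.record_set(train_features, train_labels))
```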
XGBoost
- Implementation of the gradient boosted trees algorithm
- Supervised learning algorithm for predicting a target by combining the estimates from a set of simpler models
- Requires a data matrix with rows representing observations and columns representing feature dimensions
- Also requires a target column containing the label for each observation
- Can differentiate the importance of features through weights
- Example use case: predict income based on census data
- Important Hyperparameters (see the configuration sketch below)
    - `num_round`: number of rounds the training runs
    - `objective`: learning task and learning objective (e.g. `reg:logistic`, `reg:squarederror`)
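A hedged sketch of configuring the built-in XGBoost algorithm via the SageMaker Python SDK; the role ARN, S3 path, container version, and hyperparameter values are illustrative assumptions.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder execution role

# Look up the built-in XGBoost container image for the current region
container = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

xgb = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

# num_round and objective are the hyperparameters discussed above
xgb.set_hyperparameters(num_round=100, objective="reg:squarederror")

# CSV training data with the target in the first column (placeholder S3 path)
# xgb.fit({"train": TrainingInput("s3://my-bucket/train/", content_type="text/csv")})
```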
K-Nearest-Neighbors
- Finds the k closest points to a sample point and returns the average of their target values as the prediction
- Index-based algorithm
- Objective: build a k-NN index that allows efficient determination of distances between points
- Training constructs the index
- Uses dimensionality reduction to avoid the “curse of dimensionality”
- Example use case: predict student absenteeism using student grades, demographic, social, and school related features
- Important Hyperparameters (see the configuration sketch below)
    - `feature_dim`: number of features in the input
    - `k`: number of nearest neighbors
    - `predictor_type`: `regressor` for regression problems
    - `sample_size`: number of data points to be sampled from the training dataset
    - `dimension_reduction_target`: target dimension to reduce to
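A minimal configuration sketch for the SageMaker `KNN` estimator covering the hyperparameters above; the role ARN, instance settings, and hyperparameter values (`k=10`, `sample_size=5000`, reduction to 50 dimensions) are illustrative assumptions.

```python
import sagemaker
from sagemaker import KNN

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder execution role

# k, sample_size, predictor_type, and dimension_reduction_target map to the
# hyperparameters above; dimension_reduction_type selects the reduction method.
knn = KNN(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    k=10,
    sample_size=5000,
    predictor_type="regressor",
    dimension_reduction_type="sign",
    dimension_reduction_target=50,
    sagemaker_session=session,
)

# Training builds the index from recordIO-protobuf records
# knn.fit(knn.record_set(train_features, train_labels))
```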
Factorization Machines
- Extension of a linear model, designed for high-dimensional sparse datasets
- Typically used for sparse tasks such as click prediction and item recommendation
- Continuous objective: root mean square error (RMSE)
- Example use case: analyze the images of handwritten digits
- Important Hyperparameters (see the configuration sketch below)
    - `feature_dim`: number of features in the input
    - `num_factors`: dimensionality of factorization
    - `predictor_type`: `regressor` for regression problems
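A minimal configuration sketch for the SageMaker `FactorizationMachines` estimator; the role ARN, instance settings, and `num_factors=64` are illustrative assumptions.

```python
import sagemaker
from sagemaker import FactorizationMachines

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder execution role

# num_factors and predictor_type map to the hyperparameters above;
# feature_dim is inferred from the training records at fit time.
fm = FactorizationMachines(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    num_factors=64,
    predictor_type="regressor",
    sagemaker_session=session,
)

# Factorization Machines expect sparse, high-dimensional recordIO-protobuf input
# fm.fit(fm.record_set(train_features, train_labels))
```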
Labs