- Migrate data from its source into a machine learning repository
- Several AWS services help move data:
  - AWS Data Pipeline
  - AWS Database Migration Service (DMS)
  - AWS Glue
  - Amazon SageMaker
  - Amazon Athena
AWS Data Pipeline
- Copy data using pipeline activities
- Schedule regular data movement and data processing activities
- Integrates with on-premises and cloud-based storage systems
- Deliver data where you want it, in the format you choose
AWS DMS
- Move data between databases
  - Same engine (homogeneous): MySQL to MySQL
  - Different engines (heterogeneous): Aurora to DynamoDB
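
A DMS replication task selects the tables to move through a JSON table-mapping document. The sketch below builds one selection rule in Python. The schema name `sales` and the helpers `selection_rule` and `table_mappings` are hypothetical names used for illustration; the resulting string is the kind of value a replication task accepts as its table mappings.

```python
import json


def selection_rule(rule_id, schema, table="%"):
    """Build one DMS selection rule that includes the matching tables.

    "%" is the wildcard, so the default selects every table in the schema.
    """
    return {
        "rule-type": "selection",
        "rule-id": str(rule_id),
        "rule-name": f"include-{schema}",
        "object-locator": {"schema-name": schema, "table-name": table},
        "rule-action": "include",
    }


def table_mappings(rules):
    """Serialize a list of rules into a table-mapping JSON document."""
    return json.dumps({"rules": rules}, indent=2)


# "sales" is an assumed schema name, not one from these notes.
mappings_json = table_mappings([selection_rule(1, "sales")])
print(mappings_json)
```

Keeping the rules as plain dictionaries makes it easy to add further selection or transformation rules before serializing.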
AWS Glue
- Extract, Transform, and Load (ETL)
- Determine data types and schema (crawlers infer them and populate the Glue Data Catalog)
- Can run your data engineering algorithms
  - Feature selection
  - Data cleansing
- Can run on demand, on a schedule, or on events
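
To make the two data engineering steps concrete, here is a minimal plain-Python sketch of data cleansing followed by a very simple form of feature selection. It is a stand-in for the logic only and does not use the Glue API (a real Glue job would more typically express this in PySpark). The sample records and the function names are assumptions made for illustration.

```python
def drop_incomplete_rows(rows):
    """Data cleansing: keep only rows where no value is None or empty."""
    return [
        row for row in rows
        if all(value is not None and value != "" for value in row.values())
    ]


def drop_constant_columns(rows):
    """Feature selection: remove columns whose value never varies."""
    if not rows:
        return rows
    keep = [col for col in rows[0] if len({row[col] for row in rows}) > 1]
    return [{col: row[col] for col in keep} for row in rows]


# Hypothetical sample data: one row has a missing age, and "country"
# carries no information because it is the same everywhere.
raw = [
    {"age": 34, "country": "US", "income": 52000},
    {"age": None, "country": "US", "income": 48000},
    {"age": 51, "country": "US", "income": 61000},
]

clean = drop_constant_columns(drop_incomplete_rows(raw))
print(clean)
```

Cleansing runs first so that a missing value is not mistaken for a second distinct value when checking whether a column varies.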
Amazon SageMaker
- Use Jupyter notebooks
  - scikit-learn
  - pandas
Amazon Athena
- Run SQL queries on S3 data
- Needs a data catalog such as the one created by Glue
- Use SQL to transform your data in preparation for use in ML models
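
As a sketch of what such a SQL transform could look like, the snippet below prepares the arguments for Athena's `start_query_execution` call in boto3. The database `ml_catalog_db`, the table `orders`, its columns, and the S3 results location are all hypothetical. The request is built separately from the call so it can be inspected without AWS credentials.

```python
def build_athena_request(database, query, output_location):
    """Assemble keyword arguments for Athena's start_query_execution."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so building a request needs no AWS setup

    athena = boto3.client("athena")
    return athena.start_query_execution(**request)["QueryExecutionId"]


# Assumed table and columns: cast a text column to a number, fill a
# missing category, and drop rows without a target value.
QUERY = """
SELECT customer_id,
       CAST(total AS DOUBLE) AS total,
       COALESCE(region, 'unknown') AS region
FROM orders
WHERE total IS NOT NULL
"""

request = build_athena_request(
    "ml_catalog_db", QUERY, "s3://example-bucket/athena-results/"
)
print(request["QueryExecutionContext"])
```

Calling `run_query(request)` would submit the query, and Athena would write the result files to the given S3 location.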
Use Cases
- Move data to S3 for your machine learning model
  - From an EMR cluster: AWS Data Pipeline
  - From DynamoDB: AWS Glue
  - From Redshift: AWS Data Pipeline or AWS Glue
  - From an on-premises database: AWS Database Migration Service (DMS)
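
The use cases above amount to a lookup from data source to service, which can be written down as a small Python table for revision purposes. The dictionary keys and the `services_for` helper are illustrative names, and the mapping only restates the pairings listed in these notes.

```python
# Source system -> service(s) these notes suggest for moving data to S3.
SERVICES_BY_SOURCE = {
    "emr": ["AWS Data Pipeline"],
    "dynamodb": ["AWS Glue"],
    "redshift": ["AWS Data Pipeline", "AWS Glue"],
    "on-prem database": ["AWS Database Migration Service"],
}


def services_for(source):
    """Return the suggested services for a source, ignoring case."""
    key = source.strip().lower()
    if key not in SERVICES_BY_SOURCE:
        raise KeyError(f"no suggestion recorded for source: {source!r}")
    return SERVICES_BY_SOURCE[key]


print(services_for("Redshift"))
```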
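This article was written by Oscaner and is licensed under the Creative Commons Attribution 4.0 International License.
Unless marked as reposted or attributed to another source, all articles on this site are original or translated by this site. Please credit the author before reposting.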