Key Facts
- A fully managed ETL service for categorizing, cleaning, enriching, and moving your data
- Glue components:
  - Central metadata repository: the Glue Data Catalog
  - ETL engine that automatically generates Python or Scala code
  - Flexible scheduler that handles dependency resolution, job monitoring, and retries
- Serverless
- Can convert semi-structured schemas to relational schemas on the fly
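To make the last point concrete, here is a minimal pure-Python sketch of flattening one semi-structured record into relational rows. Nested objects become dotted column names, and each array is split out into a child table that points back to its parent row. It is only loosely modeled on the idea behind Glue's Relationalize transform and is not Glue's API: the function names, the assumed `id` join key, and the `<root>_<column>` child-table naming are hypothetical choices for illustration.

```python
from typing import Any


def flatten(obj: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted column names; list values are kept as-is."""
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat


def relationalize_record(
    record: dict[str, Any], root: str = "root"
) -> dict[str, list[dict[str, Any]]]:
    """Split one JSON-like record into a root table plus one child table per array.

    Hypothetical helper. Assumes the record has an 'id' field used as the join key.
    Each array element becomes a row in '<root>_<column>' carrying 'parent_id' and
    its position 'index'. Arrays nested inside array elements are left unsplit.
    """
    flat = flatten(record)
    tables: dict[str, list[dict[str, Any]]] = {root: []}
    root_row: dict[str, Any] = {}
    for column, value in flat.items():
        if isinstance(value, list):
            child = f"{root}_{column.replace('.', '_')}"
            rows = tables.setdefault(child, [])
            for index, item in enumerate(value):
                row: dict[str, Any] = {"parent_id": record["id"], "index": index}
                if isinstance(item, dict):
                    row.update(flatten(item))  # object elements become columns
                else:
                    row["value"] = item  # scalar elements go in a single column
                rows.append(row)
        else:
            root_row[column] = value
    tables[root].append(root_row)
    return tables


if __name__ == "__main__":
    record = {"id": 1, "name": "Ada", "address": {"city": "London"}, "tags": ["a", "b"]}
    for table, rows in relationalize_record(record).items():
        print(table, rows)
```

Splitting arrays into child tables (rather than repeating the parent row per element) keeps the root table at one row per record, which is the usual trade-off when mapping nested data onto relational schemas.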
Terminology
- Data Catalog: persistent metadata store
- Classifier: determines the schema of your data
- Connection: the properties required to connect to a data store
- Crawler: connects to a data store and steps through a prioritized list of classifiers to determine the schema
- Database: a set of associated Data Catalog table definitions
- Data store: repository for persistently storing data
- Data source: data store used as input to a transformation
- Data target: data store that a transformation writes to
- Job: the logic that performs the ETL work
- Table: metadata definition that represents your data
- Transform: code logic that changes your data into a different format
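The crawler/classifier relationship can be sketched as a loop over classifiers in priority order. The selection rule below follows, as we understand it, the behavior AWS describes for crawlers: stop at the first classifier that reports certainty 1.0, otherwise take the result with the highest certainty, and fall back to `UNKNOWN` if nothing scores above 0.0. The classifier functions, their heuristics, and the `Classification` type are hypothetical stand-ins, not Glue's built-in classifiers.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Classification:
    name: str  # classification string, e.g. "json" or "csv"
    certainty: float  # 0.0 (no match) to 1.0 (certain match)


# A classifier inspects a sample of the data and returns its verdict.
Classifier = Callable[[str], Classification]


def json_classifier(sample: str) -> Classification:
    """Hypothetical classifier: certain if every non-empty line parses as a JSON object."""
    lines = [line for line in sample.splitlines() if line.strip()]
    try:
        ok = bool(lines) and all(isinstance(json.loads(line), dict) for line in lines)
    except json.JSONDecodeError:
        ok = False
    return Classification("json", 1.0 if ok else 0.0)


def csv_classifier(sample: str) -> Classification:
    """Hypothetical heuristic: partial certainty if all lines share the same comma count."""
    lines = [line for line in sample.splitlines() if line.strip()]
    counts = {line.count(",") for line in lines}
    if lines and len(counts) == 1 and counts != {0}:
        return Classification("csv", 0.8)
    return Classification("csv", 0.0)


def crawl(sample: str, classifiers: list[Classifier]) -> str:
    """Step through classifiers in priority order and pick a classification."""
    best: Optional[Classification] = None
    for classify in classifiers:
        result = classify(sample)
        if result.certainty >= 1.0:
            return result.name  # fully certain: stop early
        if best is None or result.certainty > best.certainty:
            best = result  # remember the most certain partial match so far
    if best is None or best.certainty <= 0.0:
        return "UNKNOWN"
    return best.name


if __name__ == "__main__":
    print(crawl("name,age\nAda,36", [json_classifier, csv_classifier]))
```

The order of the list matters only for ties and for early exit: a classifier placed first that is fully certain short-circuits everything after it.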
Components
- Console: define and orchestrate ETL workflows
- Data Catalog: persistent metadata store
- Crawlers and classifiers: crawlers scan data and classify it
- ETL operations: autogenerate Python or Scala code using the metadata in the Data Catalog
- Jobs system: managed infrastructure to orchestrate your ETL workflow
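The orchestration side (dependency resolution and retries, as listed under Key Facts) can be pictured with a small local model: jobs run in topological order of their dependencies, and a failing job is retried a fixed number of times before the whole run stops. This is a hypothetical sketch, not the Glue triggers or workflows API; `run_workflow`, `max_retries`, and the job names are made up for illustration, and it assumes every dependency is itself a key in `jobs`.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_workflow(
    jobs: dict[str, Callable[[], None]],
    depends_on: dict[str, set[str]],
    max_retries: int = 2,
) -> list[str]:
    """Run jobs after their dependencies, retrying each failed job up to max_retries times.

    Returns a log of attempts; raises RuntimeError if a job exhausts its retries.
    Assumes every name in depends_on also appears as a key in jobs.
    """
    log: list[str] = []
    # graphlib expects a mapping of node -> predecessors (the jobs it depends on).
    order = TopologicalSorter({name: depends_on.get(name, set()) for name in jobs})
    for name in order.static_order():
        for attempt in range(1, max_retries + 2):  # first try plus max_retries retries
            try:
                jobs[name]()
                log.append(f"{name}: succeeded on attempt {attempt}")
                break
            except Exception as exc:
                log.append(f"{name}: attempt {attempt} failed ({exc})")
        else:
            # The loop never hit `break`, so every attempt failed.
            raise RuntimeError(f"job {name!r} failed after {max_retries + 1} attempts")
    return log


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_transform() -> None:
        calls["n"] += 1
        if calls["n"] == 1:
            raise OSError("transient")  # fail once, then succeed

    steps = {"crawl": lambda: None, "transform": flaky_transform, "load": lambda: None}
    for line in run_workflow(steps, {"transform": {"crawl"}, "load": {"transform"}}):
        print(line)
```

Running jobs strictly one at a time keeps the sketch simple; a real scheduler would also start independent jobs in parallel as soon as their predecessors finish.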
Labs
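This article was written by Oscaner and is licensed under the Creative Commons Attribution 4.0 International License.
Unless marked as reposted or credited to another source, all articles on this site are original or translated; please credit the author before reposting.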