Creating and running an ETL job in AWS Glue
- Last Updated: December 19, 2020
Once you have created a connection in AWS Glue, you can create and run an ETL job.
Take the following steps to create and run an ETL job in AWS Glue Studio.
- Sign in to the AWS Glue Studio Console.
- Click the Create and manage jobs icon.
- Under Create job, select Source and target added to the graph; then, provide values for the following drop-down menus:
- Source: Select Progress DataDirect Cloud Connector for Salesforce.
- Target: Select the S3 bucket.
Note: You can also select a Glue Data Catalog target when that workflow becomes available.
- Click Create.
- On the Visual tab, select the Progress DataDirect Cloud Connector for Salesforce node.
- Select your connection from the Connection drop-down menu.
- Choose one of the following:
- Enter table name: Enter the table name against which you want to execute the ETL job.
- Write a query: Specify a query to be used to return the data against which you want to execute the ETL job.
Note: Optionally, you can use the Filter predicate field, schema editor, Partition column field, or Data type casting feature to further modify the data that is to be returned.
- Select the Data target node. Then provide values for the following fields:
- Format: Select the target format of your output.
- Compression Type: Optionally, select the format to use when compressing output.
- S3 Target Location: The target location for output.
- Select the Job details tab. Provide values for the following fields:
- Name: Provide the name of your job.
- Description: Optionally, provide a description of your job.
- IAM Role: The role assumed by the job with permission to access your data sources. For more information, see Updating IAM Role permissions.
The remaining fields on this tab should be configured to suit your environment.
- Click Save; then, click Run.
- Select Run details to view the status of your run.
- After the run has succeeded, navigate to your job:
- Open the AWS Management Console.
- Under Storage, select S3.
- Select your bucket from the list; then, select your job from the Objects list.
- Download your job using the Object actions menu.
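
The run, status check, and download steps above can also be scripted once the job has been saved in AWS Glue Studio. The following is a minimal sketch, not part of the documented procedure: it assumes the AWS SDK for Python (boto3) client methods `start_job_run`, `get_job_run`, `list_objects_v2`, and `download_file`, and the helper names `run_job_and_wait` and `download_output` are hypothetical. Both helpers take the client as an argument so that they can be exercised without AWS access.

```python
import os
import time

# Run states after which a Glue job run no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_job_and_wait(glue, job_name, poll_seconds=30):
    """Start a saved Glue job and block until the run reaches a terminal state."""
    run_id = glue.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        run = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        if run["JobRunState"] in TERMINAL_STATES:
            return run_id, run["JobRunState"]
        time.sleep(poll_seconds)


def download_output(s3, bucket, prefix, dest_dir):
    """Download the objects under the S3 target location; return the local paths."""
    # A single list_objects_v2 call returns at most 1,000 keys,
    # which this sketch assumes is enough for one job's output.
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for obj in listing.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):  # skip folder placeholder objects
            continue
        local_path = os.path.join(dest_dir, os.path.basename(key))
        s3.download_file(bucket, key, local_path)
        paths.append(local_path)
    return paths
```

With boto3 installed and credentials configured, the helpers could be called as `run_job_and_wait(boto3.client("glue"), job_name)` followed by `download_output(boto3.client("s3"), bucket, prefix, dest_dir)`, where the job name is the value entered on the Job details tab and the bucket and prefix correspond to the S3 Target Location.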