
Constructs

The core constructs available in the framework are detailed in this section.

Available Constructs

SDLF Constructs

Construct               Description
sdlf-foundations        Data lake storage layers with S3 and Lake Formation
sdlf-team               Dedicated policies and permissions
sdlf-dataset            Glue database and crawler
sdlf-pipeline           Common interface used in sdlf-stage-* constructs
sdlf-stageA             EventBridge-triggered Lambda function
sdlf-stageB             EventBridge-triggered Glue function
sdlf-stage-dataquality  EventBridge-triggered Glue Data Quality Evaluation
sdlf-monitoring         CloudTrail, S3 Storage Lens
sdlf-cicd               GitOps infrastructure to deploy all SDLF constructs

VPC Networking

All SDLF constructs can work in VPC environments where outbound Internet access is constrained. They consume networking details from specific SSM string parameters:

  • /SDLF/VPC/VpcId (vpc-xxx)
  • /SDLF/VPC/SecurityGroupIds (sg-xxx,sg-yyy)
  • /SDLF/VPC/SubnetIds (subnet-xxx,subnet-yyy)
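
As an illustration of how these parameters might be consumed, the sketch below reads the three parameters and splits the comma-separated values into lists. The helper names (parse_vpc_parameters, fetch_vpc_parameters) and the keys of the returned dictionary are hypothetical and not part of SDLF. The boto3 call assumes credentials that allow reading these parameters.

```python
from typing import Dict, List, Union

VPC_PARAMETER_NAMES = (
    "/SDLF/VPC/VpcId",
    "/SDLF/VPC/SecurityGroupIds",
    "/SDLF/VPC/SubnetIds",
)


def parse_vpc_parameters(raw: Dict[str, str]) -> Dict[str, Union[str, List[str]]]:
    """Turn raw SSM string values into a VPC ID and lists of IDs."""
    missing = [name for name in VPC_PARAMETER_NAMES if name not in raw]
    if missing:
        raise KeyError(f"missing SSM parameters: {missing}")

    def split(value: str) -> List[str]:
        # "sg-xxx, sg-yyy" -> ["sg-xxx", "sg-yyy"]; empty items are dropped
        return [item.strip() for item in value.split(",") if item.strip()]

    return {
        "vpc_id": raw["/SDLF/VPC/VpcId"].strip(),
        "security_group_ids": split(raw["/SDLF/VPC/SecurityGroupIds"]),
        "subnet_ids": split(raw["/SDLF/VPC/SubnetIds"]),
    }


def fetch_vpc_parameters() -> Dict[str, Union[str, List[str]]]:
    """Read the parameters from SSM Parameter Store (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parser above works without boto3

    response = boto3.client("ssm").get_parameters(Names=list(VPC_PARAMETER_NAMES))
    raw = {p["Name"]: p["Value"] for p in response["Parameters"]}
    return parse_vpc_parameters(raw)
```

Keeping the parsing separate from the API call makes the splitting logic easy to check without access to an AWS account.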

Warning

SDLF does not create the VPC infrastructure itself - the VPC, subnets, security groups and VPC endpoints must be created ahead of time.

The following VPC endpoints are necessary for the current set of constructs, with private DNS names enabled on the interface endpoints:

  • com.amazonaws.{region}.athena (interface)
  • com.amazonaws.{region}.codepipeline (interface)
  • com.amazonaws.{region}.dynamodb (gateway)
  • com.amazonaws.{region}.ec2messages (interface)
  • com.amazonaws.{region}.events (interface)
  • com.amazonaws.{region}.glue (interface)
  • com.amazonaws.{region}.kms (interface)
  • com.amazonaws.{region}.lambda (interface)
  • com.amazonaws.{region}.logs (interface)
  • com.amazonaws.{region}.s3 (gateway)
  • com.amazonaws.{region}.secretsmanager (interface)
  • com.amazonaws.{region}.sns (interface)
  • com.amazonaws.{region}.sqs (interface)
  • com.amazonaws.{region}.ssm (interface)
  • com.amazonaws.{region}.ssmmessages (interface)
  • com.amazonaws.{region}.states (interface)
  • com.amazonaws.{region}.sts (interface)
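
The list above can be expanded for a given region and compared against the endpoints that already exist. This is a minimal sketch: the function names are hypothetical, and the service names of existing endpoints are assumed to be collected separately.

```python
from typing import Iterable, List, Tuple

INTERFACE_SERVICES = (
    "athena", "codepipeline", "ec2messages", "events", "glue", "kms",
    "lambda", "logs", "secretsmanager", "sns", "sqs", "ssm",
    "ssmmessages", "states", "sts",
)
GATEWAY_SERVICES = ("dynamodb", "s3")


def required_endpoints(region: str) -> List[Tuple[str, str]]:
    """Return (service name, endpoint type) pairs for the given region."""
    endpoints = [
        (f"com.amazonaws.{region}.{service}", "interface")
        for service in INTERFACE_SERVICES
    ]
    endpoints += [
        (f"com.amazonaws.{region}.{service}", "gateway")
        for service in GATEWAY_SERVICES
    ]
    return sorted(endpoints)


def missing_endpoints(region: str, existing_service_names: Iterable[str]) -> List[str]:
    """List the required service names that have no endpoint yet."""
    existing = set(existing_service_names)
    return [name for name, _ in required_endpoints(region) if name not in existing]
```

The existing service names could, for example, be taken from the ServiceName field returned by the EC2 describe_vpc_endpoints API.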

The security groups used for interface endpoints must allow inbound access from the security groups provided to SDLF constructs. The security groups provided to SDLF constructs must allow outbound access to the security groups used for interface endpoints.
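
The two rules can be expressed as permission entries in the shape accepted by the EC2 security group APIs. This is a sketch under one assumption that the text above does not state: traffic to interface endpoints is HTTPS on TCP port 443. The function names are hypothetical.

```python
from typing import Dict, List

HTTPS_PORT = 443  # assumption: interface endpoints are reached over HTTPS


def _https_rule(peer_group_ids: List[str]) -> Dict:
    """Build one TCP/443 permission entry that references peer security groups."""
    return {
        "IpProtocol": "tcp",
        "FromPort": HTTPS_PORT,
        "ToPort": HTTPS_PORT,
        "UserIdGroupPairs": [{"GroupId": group_id} for group_id in peer_group_ids],
    }


def endpoint_ingress_rule(sdlf_group_ids: List[str]) -> Dict:
    """Inbound rule for an endpoint security group: allow the SDLF security groups."""
    return _https_rule(sdlf_group_ids)


def sdlf_egress_rule(endpoint_group_ids: List[str]) -> Dict:
    """Outbound rule for an SDLF security group: allow the endpoint security groups."""
    return _https_rule(endpoint_group_ids)
```

Each entry can be passed in the IpPermissions list of the boto3 EC2 client calls authorize_security_group_ingress and authorize_security_group_egress respectively.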

sdlf-datalakeLibrary

sdlf-datalakeLibrary is a Python library for interacting with the data lake, in particular with the SSM parameters published by the different modules and the DynamoDB tables created in sdlf-foundations. If sdlf-cicd is used, a Lambda layer containing sdlf-datalakeLibrary is built and used in sdlf-stageA and sdlf-stageB.

Transformations

The constructs described above are infrastructure code. Transformations, on the other hand, are the application code run within the steps of an SDLF pipeline. They include instructions to:

  • Make an API call to another service (on or outside the AWS platform)
  • Store dataset and pipeline execution metadata in a catalog
  • Collect logs and store them in Elasticsearch
  • ... any other logic

Once transformations and other application code are pushed to the team repository, they go through a CodePipeline and can be tested before entering production.

Note

An SDLF team can define and manage its transformations from the sdlf-main-{domain}-{team} repository if using sdlf-cicd.

Note

Transformations decouple an SDLF pipeline from a dataset. This means that a single pipeline can process multiple datasets.
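
One way to picture this decoupling is a single pipeline step that looks up the transformation to run by dataset name. The sketch below is illustrative only: the registry, the decorator, the dataset names and the function signatures are all hypothetical and do not reflect the interface SDLF actually uses.

```python
from typing import Callable, Dict

# A transformation takes a bucket and an object key and returns an output location.
Transformation = Callable[[str, str], str]

TRANSFORMATIONS: Dict[str, Transformation] = {}


def register(dataset: str) -> Callable[[Transformation], Transformation]:
    """Associate a transformation with a dataset name (hypothetical registry)."""
    def decorator(func: Transformation) -> Transformation:
        TRANSFORMATIONS[dataset] = func
        return func
    return decorator


@register("legislators")
def transform_legislators(bucket: str, key: str) -> str:
    return f"s3://{bucket}/processed/{key}"


@register("orders")
def transform_orders(bucket: str, key: str) -> str:
    # Dataset-specific logic lives here; the pipeline code stays unchanged.
    return f"s3://{bucket}/processed/{key.lower()}"


def run_pipeline_step(dataset: str, bucket: str, key: str) -> str:
    """The same pipeline step serves every dataset that has a transformation."""
    try:
        transformation = TRANSFORMATIONS[dataset]
    except KeyError:
        raise ValueError(f"no transformation registered for dataset {dataset!r}") from None
    return transformation(bucket, key)
```

In this sketch, adding a dataset means registering one more function, while run_pipeline_step itself does not change.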