data:image/s3,"s3://crabby-images/443b4/443b49d90341289cc9800d3fb640b8a722f1b92d" alt="SQL Server 2017 Integration Services Cookbook"
Introduction
Now, let's go for the real stuff! This chapter will cover many topics that will lay out the foundations of a simple and effective ETL solution. Over the years, I have seen many SSIS implementations and one of the goals of this chapter is to give the readers the following:
- A simple but effective SSIS framework
- SSIS development best practices
- New data source integrations
All remaining chapters assume that we want to load a data warehouse that is a star schema with its staging area.
The source (operational) database used is AdventureWorksLT, an old well-known database. The following diagram describes the source database that we're going to use:
data:image/s3,"s3://crabby-images/121b1/121b17d8ca3bc1aae710fa9c70e83860281f2dc3" alt=""
From this database, we'll insert data in a staging area and finally into a data warehouse. The staging area and the data warehouse will be separated in schemas in a database that we'll manage using SSDT.
The following diagram is the representation that describes the staging schema of the AdventureWorksLTDW database:
data:image/s3,"s3://crabby-images/1a495/1a4955da4c60bb93c8453b904516e7213524b201" alt=""
Not all tables are copied in the staging area and three sections have been identified:
- Orders: These tables contain order information as well as dates related information. In the data warehouse section, these sections have their own tables.
- Customer: These tables contain information related to a customer and their addresses. In the data warehouse, these tables are grouped.
- Product: These tables contain the product information such as model, description in multiple languages, and so on. Like customer data, these tables are grouped in the data warehouse.
Once in the staging area, the data will be copied into a star schema database representation. The tables are in the same database, but have a separate schema called DW. The following diagram shows the DW tables:
data:image/s3,"s3://crabby-images/4da62/4da62e6bd085b697573ac293ec1c8ae15b9dc85a" alt=""
The goal of the data warehouse is to ease data consumptions. It's easy to understand by most users and data is categorized into areas (tables) that represent the subjects that the end users will base their analysis on.
The customer information has been regrouped into two tables, DimCustomer and DimAddress. The DimAddress table has two links to the fact table. These relationships represent the multiple addresses, two in our case: the billing and shipping address of the customer.
The product information has been flattened into one dimension: DimProduct. Although the base model allowed for more than one language when it comes to the product descriptions, only two are retained in the dimension: French and English - EnglishDesctiption and FrenchDescription.
The orders tables have been merged into one fact table: FactOrders. The DimTime dimension has been added to allow better querying of the orders using various dates: order, shipped, and due dates. The SalesOrderNumber and PurchaseOrderNumber are considered derived dimensions and stay in the fact table. We don't have enough information that can be derived from these columns and they are strongly tied to the facts.
The remaining dimension, DimOrderProvenance, has no source in the AdventureWorksLT operational database. It has been added and is managed by Master Data Services, another service that comes with SQL Server 2016 Developer Edition. We'll talk about this service and this dimension later in this book.
In the next few recipes, we'll deploy these databases and the ETL's (SSIS packages) that load these tables.