Star Schema and Snowflake Schema in Data Warehousing
What are Star Schema and Snowflake Schema in a Data Warehouse?
Data Modeling Plays an important role in managing a database as it logically describes the database in terms of various parameters such as how data will be stored, what constraints to follow, and how data objects are related to each other. In Data Warehouse also, Data Modeling is crucial for data storage in a data warehouse.
Unlike Relational Data Models in DBMS, the Data Warehouse is based on Multidimensional Modeling, in which Star Schema and Snowflake Schema are used for defining the data storage in a Data Warehouse. This article covers the Star and Snowflake Schema in detail. But before that, let’s see what is Multidimensional modeling in a Data Warehouse.
Multidimensional Modeling
In this modeling technique, the data is organized in terms of either “Facts” or “Dimensions.” The Facts represent the numerical transaction data like the amount paid for the product while Dimensions represent the perspectives around which data is stored like the date of payment.
Thus, a Fact Table stores the numerical data while a Dimension Table stores the fields that describe the data. For implementing Multidimensional Modeling in Data Warehouse, Star Schema and Snowflake Schema are used.
For example, consider the following table which shows the data of a student enrolled in a course.
In this table, the Facts are:
- 2 students enrolled
- The total amount paid by students is 2350
On the other hand, the Dimensions which describe the Facts are:
- Student Names
- Course Name
- Date of payment, etc
Now, using Facts and Dimensions, Multidimensional Modeling organizes data in terms of a Data Cube which is based on “Course Details” in the following way.
You might be confused here because all the columns of the table are not represented in the Data Cube. But, the fact is a Data Cube in a Data Warehouse can be n-dimensional. Thus, any number of dimensions (columns of the table) can be represented in the multidimensional data model.
Types of Schema in Multidimensional Modeling
A Schema is the Structural description of the data stored in the database, which is of two types in Data Warehousing. Following is a detailed description of the Star Schema and Snowflake Schema.
Star Schema
Star Schema is the most elementary form of the Dimensional Model. It organizes the data in the form of Facts and Dimensions. It contains:
- A Fact Table in the center has a foreign key column referring to the various Dimensional Tables in the schema. A Fact Table is related to the Dimensional Table as a many-to-one relationship.
- Multiple Dimension Tables each with its primary key column. Dimension Tables are not related to each other. Instead, they are related to the Fact Table.
The organization of Fact Tables and Dimension Tables resembles a Star like structure. That’s what it is called Star Schema. The diagram below shows a general Star Schema.
Snowflake Schema
Snowflake Schema is similar to Star Schema with one difference that each Dimension can contain its sub-dimensions which are called its levels. It is the expansion of the Star Schema through which each Dimension Table can be associated with its Sub-dimension Tables. For example, consider the below diagram which shows a two-dimensional Snowflake Schema with 3 levels of Sub-dimensions.
Thus, we have discussed the concept of the Star Schema and Snowflake Schema. Now, let’s see the difference between the two.
Difference between Star Schema and Snowflake Schema
The following table shows the key differences between the Star and Snowflake Schema in the Data Warehouse.
Star Schema | Snowflake Schema |
|
|
|
|
|
|
|
|
|
|
|
|
Therefore, both the Star Schema and Snowflake Schema have their utility for Multidimensional Data Modeling in Data Warehousing.