Three-Part Series: Article 3
In the first article of this series I questioned whether the continued comparison of Relational Models to Star Schema models is still relevant with the entrance of Snowflake Computing’s data warehouse/data lake service into the marketplace. We began by exploring the origins of the relational model.
In my second article we came to understand the foundational architectural elements of both the business architecture model and the star schema.
Here, in this final article, I explain the technology limits that gave rise to the Star Schema and how the arrival of MPP technology began to change the equation of model selection. We conclude with Snowflake’s disruption of the MPP marketplace and why it has made our model comparison irrelevant.
SMP Technology Limits and the MPP Value Proposition
The star schema gained prominence in the 1990s, as the need for data warehousing grew, because of the vertical scaling limitations of SMP servers. SMP class servers do not scale linearly as the number of processors increases.
As processors are added to compensate for increasing data volumes, performance gains rapidly diminish.
It wasn’t until the MPP platform came to market that this changed, providing linear scaling capability. Today, the best known MPP data warehouse vendor in the marketplace sells their platform based on its ability to virtualize the information and analytics architecture with MPP processing power. This information virtualization is performed against a relationally modeled data foundation, typically based on one of the vendor’s industry models.
By taking this approach, the business data foundation is easily extended. Flexibility is gained by fulfilling new and changing requirements through creating and altering a semantically defined information architecture. This contrasts with developing or changing a physically implemented infrastructure, using fixed ETL processing code.
Semantic architecture creation and change (database views and/or BI semantic models) is inherently faster and less costly when applied against a business architecture data foundation. We avoid the complexity of multi-hop code streams, orchestration and scheduling. This is even more significant when we must conform multiple functionally overlapping sources to a common business model of information.
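As a minimal sketch of the idea, the example below uses Python's built-in sqlite3 module as a stand-in for a data warehouse. The table names, columns and sample rows are hypothetical. A small relational foundation is loaded once; a reporting requirement is met by defining a view, and a changed requirement is met by defining another view, with no additional load or transformation job.

```python
import sqlite3


def build_foundation() -> sqlite3.Connection:
    """Create a tiny relational data foundation (hypothetical tables and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            region      TEXT NOT NULL
        );
        CREATE TABLE sales_order (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            order_date  TEXT NOT NULL,
            amount      REAL NOT NULL
        );
        INSERT INTO customer VALUES (1, 'Acme', 'East'), (2, 'Globex', 'West');
        INSERT INTO sales_order VALUES
            (10, 1, '2017-01-05', 100.0),
            (11, 1, '2017-02-10', 50.0),
            (12, 2, '2017-01-20', 75.0);
    """)
    return conn


def define_semantic_layer(conn: sqlite3.Connection) -> None:
    """Express reporting requirements as views over the foundation."""
    conn.executescript("""
        -- Original requirement: total sales by region.
        CREATE VIEW sales_by_region AS
        SELECT c.region, SUM(o.amount) AS total_amount
        FROM sales_order o
        JOIN customer c ON c.customer_id = o.customer_id
        GROUP BY c.region;

        -- Changed requirement: the same measure broken out by month.
        -- Only a view definition is added; no data is moved or reloaded.
        CREATE VIEW sales_by_region_month AS
        SELECT c.region,
               substr(o.order_date, 1, 7) AS order_month,
               SUM(o.amount) AS total_amount
        FROM sales_order o
        JOIN customer c ON c.customer_id = o.customer_id
        GROUP BY c.region, order_month;
    """)


if __name__ == "__main__":
    conn = build_foundation()
    define_semantic_layer(conn)
    for row in conn.execute("SELECT * FROM sales_by_region ORDER BY region"):
        print(row)
```

The trade-off is that every query against a view does its joining and aggregation at runtime, which is why this approach depends on the scaling characteristics of the platform underneath it.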
Businesses that don’t yet know their information or analytics requirements can work from the relational model as those needs are solidified.
As the business architecture data foundation is implemented, information and analytics capability aligned with the business architecture can be synchronized within the shortest of business cycles. Semantic information and analytics organization becomes far more business-activity focused rather than a technical coding exercise.
Limitations of Legacy MPP Technology
MPP databases do have limitations, though, even when legacy technology is implemented in the cloud. Coupling compute with fixed-node data storage limits the range of vertical query scaling to the number of nodes the infrastructure provides. Even in cloud implementations, where MPP still relies on distributing data across nodes, increasing vertical scale means adding nodes and then redistributing the data. Fixed infrastructure also limits horizontal scaling during peak periods of use.
Evidence of these limitations appears when materialized views are used to pre-organize information to improve performance. In these instances, the materialized views store the results of some of the processing that would otherwise occur at analytics query runtime.
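The following sketch illustrates that trade-off using Python's built-in sqlite3 module. SQLite has no materialized views, so a table created with CREATE TABLE ... AS SELECT is used here as a stand-in, and the table names and rows are hypothetical. It shows that the stored result saves the aggregation work at query time but goes stale until it is refreshed.

```python
import sqlite3


def make_connection() -> sqlite3.Connection:
    """Create a small detail table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales_order (
            order_id INTEGER PRIMARY KEY,
            region   TEXT NOT NULL,
            amount   REAL NOT NULL
        );
        INSERT INTO sales_order VALUES
            (1, 'East', 100.0), (2, 'East', 50.0), (3, 'West', 75.0);
    """)
    return conn


def refresh_summary(conn: sqlite3.Connection) -> None:
    """Rebuild the stored aggregate (the stand-in for a materialized view)."""
    conn.executescript("""
        DROP TABLE IF EXISTS sales_by_region_mv;
        CREATE TABLE sales_by_region_mv AS
        SELECT region, SUM(amount) AS total_amount
        FROM sales_order
        GROUP BY region;
    """)


def read_runtime(conn: sqlite3.Connection):
    """Aggregate the detail rows at query time."""
    sql = ("SELECT region, SUM(amount) FROM sales_order "
           "GROUP BY region ORDER BY region")
    return conn.execute(sql).fetchall()


def read_summary(conn: sqlite3.Connection):
    """Read the pre-computed result without touching the detail rows."""
    sql = "SELECT region, total_amount FROM sales_by_region_mv ORDER BY region"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = make_connection()
    refresh_summary(conn)
    print(read_summary(conn) == read_runtime(conn))  # stored copy matches

    conn.execute("INSERT INTO sales_order VALUES (4, 'West', 25.0)")
    print(read_summary(conn) == read_runtime(conn))  # stored copy is now stale

    refresh_summary(conn)  # extra processing needed to catch up
    print(read_summary(conn) == read_runtime(conn))
```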
The cost of this MPP infrastructure is also a significant limiting factor, prompting many organizations to remain in the SMP world and pay the price of implementing and managing star schemas.
Snowflake Computing is Disrupting the MPP Paradigm
Snowflake Computing has designed their cloud-built data warehouse from the ground up, utilizing all the capabilities of cloud infrastructure. The product breaks all the barriers in scaling, capability and cost of traditional MPP technology. And it eliminates most of the technical implementation and administration burden found with traditional RDBMS and big data technologies. In other words, it does what technology is supposed to do: it greatly improves your data and analytics capability while greatly reducing your infrastructure, implementation and administration costs, all while making it easier for your business analysts to utilize your data warehouse and data lake together (without data engineers or DBAs).
Snowflake decouples its MPP compute from storage with node-less data distribution for dynamic and unlimited scaling. The extent of vertical scaling can be set just ahead of query run-time, unbound by limits of fixed infrastructure and data node partitioning. And horizontal scaling can be as dynamic as needed to automatically respond to workload changes as concurrency increases and decreases.
Snowflake eliminates infrastructure cost limitations by providing a secure, resilient, built-for-the-cloud data warehouse. You pay only for the data you store, not the number of disks needed for data distribution. And you pay for just the compute you use for the time you use it, rather than paying for a fixed infrastructure that is underutilized for much of the 24 hours a day, 365 days a year that you own or lease it. To learn more, read the case study: Reduced upgrade costs with Snowflake Evaluation.
Your data lake and data warehouse can be co-located and work together cooperatively and seamlessly. Semi-structured data is stored, and metadata-based database views allow it to be queried directly with SQL. These database views can be joined directly with data warehouse content. Both the loading of semi-structured data and the creation of these views are well within the capability of business analysts possessing SQL skills.
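The pattern can be sketched with Python's built-in sqlite3 module, again as a stand-in only. Snowflake has its own native handling of semi-structured data, which is not shown here; this example uses SQLite's json_extract function and assumes a SQLite build that includes the JSON functions. The tables, JSON fields and rows are hypothetical. Raw JSON documents are stored as they arrive, a view projects them into columns, and that view is joined to a conventional warehouse table with plain SQL.

```python
import sqlite3


def build_lake_and_warehouse() -> sqlite3.Connection:
    """Load a warehouse table and raw JSON events (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Structured warehouse content.
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');

        -- Semi-structured content stored as raw JSON text.
        CREATE TABLE raw_events (payload TEXT NOT NULL);
        INSERT INTO raw_events VALUES
            ('{"customer_id": 1, "event": "login", "device": {"os": "ios"}}'),
            ('{"customer_id": 2, "event": "purchase", "device": {"os": "android"}}');

        -- View that exposes the JSON fields as ordinary columns.
        CREATE VIEW events_v AS
        SELECT json_extract(payload, '$.customer_id') AS customer_id,
               json_extract(payload, '$.event')       AS event,
               json_extract(payload, '$.device.os')   AS device_os
        FROM raw_events;
    """)
    return conn


def query_joined(conn: sqlite3.Connection):
    """Join the semi-structured view directly to warehouse content."""
    sql = """
        SELECT c.name, e.event, e.device_os
        FROM events_v e
        JOIN customer c ON c.customer_id = e.customer_id
        ORDER BY c.name
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in query_joined(build_lake_and_warehouse()):
        print(row)
```

Everything in the sketch is SQL apart from the thin Python wrapper, which is consistent with the point that an analyst with SQL skills can do this work.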
This is all achieved without the technical burden and administrative requirements we see in Hadoop® and SMP/MPP data warehouse platforms. The single administration task that remains is security administration.
If a relational model can answer any business question asked of it, and the Snowflake data warehouse built for the cloud can scale to answer any question asked of the relational model, and do so at a fraction of the typical MPP or even SMP infrastructure costs, then is the comparison of 3NF vs star schema relevant any longer?
Clarity’s Industry Models and modeling methods mirror your business architecture, in all its functional detail. Because we never generalize your business, our models don't create the “enterprise” limitations that many vendor industry models create. You will clearly understand your business through your data because our models mirror the actual architecture of your business.
This level of business architecture detail not only ensures that you will understand your business through your data, but it also defines the foundation for your data governance. It ensures that your data architecture and information/analytics architecture are instantly unified with your data governance.
Harnessing the unbounded scalability of Snowflake, we deliver a semantically defined information and analytics architecture that enables business insight and decision making. And because that architecture is semantically defined, it will adapt to the demands of your business cycle, without the cost of fixed processing infrastructure, and it will allow incorporation of your data lake in that architecture. Learn more about Clarity’s partnership with Snowflake.
Written by Don Gooldy
Senior principal and data architect with 24 years of database design and system architecture experience, with 15 of those years leading Business Intelligence/Data Warehousing efforts. His solutions architect qualifications are grounded in a foundation of business aligned data architecture fundamentals.