Buzzwords such as Big Data, In-Memory, Hot Data and Cold Data inundate our world, but what do these terms really mean, and how do they impact your landscape?
- Big Data: large volume of data, both structured and unstructured. However, it is not simply the volume of data that is important, but the valuable information that can be extracted.
- In-Memory: as opposed to a spinning disk, an in-memory platform keeps data in main memory (RAM); because no physical disk movement is involved, access is significantly faster.
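The latency difference can be illustrated with a minimal, hypothetical sketch (not from the source; absolute timings vary by hardware): looking up a value already resident in memory versus re-reading and parsing it from a file on disk.

```python
import json
import tempfile
import time

# Hypothetical dataset: 100,000 key-value records held in memory.
records = {f"key{i}": i * 2 for i in range(100_000)}

# "Disk" tier: persist the same records to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(records, f)
    path = f.name

# In-memory lookup: the data is already in RAM.
start = time.perf_counter()
mem_value = records["key99999"]
mem_time = time.perf_counter() - start

# Disk lookup: the file must be read and parsed first.
start = time.perf_counter()
with open(path) as f:
    disk_value = json.load(f)["key99999"]
disk_time = time.perf_counter() - start

assert mem_value == disk_value  # same answer, very different latency
print(f"in-memory: {mem_time:.6f}s, disk: {disk_time:.6f}s")
```

On any typical machine the in-memory lookup is orders of magnitude faster, which is the core trade an in-memory platform makes: pay for RAM, get speed.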
- Hot Data: Data that needs to be accessed quickly to support frequent and/or time sensitive operational decisions.
- Example: Most current and active datasets supporting real-time operational reporting
- Cold Data: Data that is important to be saved and accessed only as needed.
- Example: Historical and less frequently accessed data used by BI applications for trend analysis. Other examples include machine sensor data, social media and weblogs.
In-Memory is traditionally used synonymously with Hot Data, and Big Data with Cold Data, but this relationship doesn’t always hold true.
What are SAP HANA® and HADOOP®?
SAP HANA is an in-memory, columnar storage platform that can be deployed both on premise and in the cloud. Some commonly believed myths regarding HANA include:
- Implementation of HANA takes a long time
- FALSE: standing up a HANA instance has never been easier; cloud instances can be up and running in less than an hour.
- HANA is a proprietary product and doesn’t allow for integration
- FALSE: HANA is a very open platform, allowing for integration to SAP and non-SAP products using a variety of standard connectors.
- HANA is a database
- FALSE: HANA offers features beyond a database, such as the ability to host applications and integration tools for connecting to SAP and non-SAP sources.
HADOOP is an open-source framework built to efficiently store, process and retrieve enormous amounts of data and information across a cluster (a collection of machines).
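Hadoop’s core processing model, MapReduce, splits work into a map phase and a reduce phase that run in parallel across the cluster. A minimal, single-machine sketch of that model in plain Python (illustrative only; a real Hadoop job distributes splits across HDFS on many nodes):

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1

def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values for each key."""
    return {word: sum(counts) for word, counts in groups.items()}

# Each string stands in for a split of a large file stored across the cluster.
splits = ["big data big insight", "data drives insight"]
mapped = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'big': 2, 'data': 2, 'insight': 2, 'drives': 1}
```

Because each map task touches only its own split, the same logic scales from two strings to petabytes of files simply by adding machines, which is what makes the framework suited to enormous data sets.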
Some commonly believed myths regarding HADOOP include:
- HADOOP is the only answer to Big Data
- FALSE: even though HADOOP and Big Data are often used interchangeably, there are other tools and solutions, such as Teradata and Vertica.
- HADOOP is only about storing large data sets
- FALSE: though data volume does come into play, clients often choose HADOOP for the nature of the data itself, such as its ability to store both structured and unstructured data.
- HADOOP is just one product
- FALSE: HADOOP is an ecosystem of many components; an overview of the ecosystem is given below:
Hot vs. Cold Data
Of course, data doesn’t actually have a temperature. The terms hot and cold refer to the frequency with which the data is accessed.
Traditionally, in-memory platforms (e.g., HANA) are ideal when speed-to-insight is important, such as for real-time data updates and analytics, provided the data volume is not excessively large and the cost is justifiable given the business need.
Environments such as HADOOP, with high-capacity disk storage, a distributed file system and NoSQL stores, are better suited for storing unstructured, high-volume data, which can be costly and time-consuming for a relational database to ingest. Longer runtimes for data analysis and processing are acceptable as a tradeoff for minimizing storage costs.
Regardless of the classification of hot or cold, it is imperative that the data be accessible in a timely fashion when users need it. Additionally, users need not be aware of where the data is physically stored, nor should they need to modify their queries based on that information.
Even though the user doesn’t need to know where the data is stored, IT still needs to answer the question “what data goes where?” The short answer is, it depends, but the chart below presents some of the dimensions to consider when evaluating data sets:
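As one illustration of such an evaluation, here is a simple rule-based tiering sketch. The field names and thresholds are hypothetical assumptions, not from the source; each organization should tune them as part of its governance methodology.

```python
from dataclasses import dataclass

@dataclass
class DataSet:
    name: str
    accesses_per_day: float   # how frequently users query it
    age_days: int             # age of the newest records
    latency_sensitive: bool   # does it back real-time operational decisions?

def assign_tier(ds: DataSet) -> str:
    """Classify a data set as hot (in-memory) or cold (Big Data platform).

    The thresholds below are illustrative placeholders only.
    """
    if ds.latency_sensitive or ds.accesses_per_day >= 100:
        return "hot"   # e.g., an in-memory platform such as HANA
    if ds.age_days > 365 and ds.accesses_per_day < 1:
        return "cold"  # e.g., a Big Data platform such as HADOOP
    return "review"    # borderline: evaluate independently

datasets = [
    DataSet("current_orders", accesses_per_day=500, age_days=1, latency_sensitive=True),
    DataSet("weblogs_2014", accesses_per_day=0.2, age_days=900, latency_sensitive=False),
]
for ds in datasets:
    print(ds.name, "->", assign_tier(ds))
```

The point of the "review" branch is that segmentation is not automatic: data sets that match neither rule cleanly must be evaluated independently, as the surrounding text recommends.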
In addition to transactional data, governance also needs to be in place for master data (attributes), hierarchies and texts.
Based on experience, performance is better when non-transactional data is replicated (duplicated) into each of the disparate source systems, so that joins can be performed before data is federated to the destination.
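For example, when master-data attributes are replicated alongside the transactions, the join can be done locally and only the enriched result federated onward. The record layout below is hypothetical:

```python
# Transactional records in a source system.
transactions = [
    {"order_id": 1, "material": "M-100", "qty": 5},
    {"order_id": 2, "material": "M-200", "qty": 3},
]

# Master data (attributes) replicated into the same source system,
# so the join does not have to cross systems at federation time.
materials = {
    "M-100": {"description": "Widget", "plant": "P01"},
    "M-200": {"description": "Gadget", "plant": "P02"},
}

# Join prior to federation: enrich each transaction with its attributes.
enriched = [{**t, **materials[t["material"]]} for t in transactions]
for row in enriched:
    print(row)
```

Doing the join where both data sets live avoids shipping the master-data lookup across systems for every query, which is the performance rationale for the duplication.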
Automatically allocating Hot Data to an in-memory platform and Cold Data to a Big Data platform is not always the right answer. Even though traditionally this may be the case, each data element needs to be evaluated independently and, following a governance methodology, segmented into the appropriate location. Duplication of data is not necessarily a bad thing; there are instances, such as for performance, where Master Data may need to be stored multiple times across source systems.
The Clarity Difference
At Clarity Insights, we use data strategy, engineering, science and visualization to help companies transform insight into action specifically for the unique demands of SAP. We start all projects by understanding our client’s business strategy, then understanding how data can drive success. This way we are always focused on business outcomes, which helps ensure business buy-in. We help you embed the solution in your processes, and use change management to obtain adoption. We also focus heavily on knowledge transfer to our clients, ensuring they are empowered to take action faster and with more confidence.
Our SAP practitioners:
- Have extensive knowledge and practical experience in both SAP HANA and Big Data platforms, such as HADOOP
- Take a holistic approach to data segmentation, ensuring data integrity and one version of the truth
- Work with clients on building out a roadmap and strategy to get from current to future state architecture
- Collaborate with our customers to identify use cases, quantify business benefits and determine a feasible roadmap for transition
- Provide a technology-agnostic review to ensure we recommend the best tool to satisfy requirements
These are just some of the reasons that more than 80% of Clarity’s customers hire us for additional engagements. We have been trusted partners to the most exacting, data-intensive organizations in the nation for years. Check out our SAP HANA use case.