To understand augmented data management, it’s first necessary to know what is involved in basic data management. “Data management” is an overly broad term relating to activities ranging from database administration to disaster recovery. A better term might be “data quality management.” If the operative phrase in data processing is “garbage in, garbage out,” data quality is all about making sure that you’re not working with “garbage” data in any of your operations.
One example of managing for data quality is Master Data Management (MDM). MDM is a process that ensures a business uses just one correct version of its data. In a financial services company, for example, MDM will make sure that John A. Smith does not get a trading account statement meant for John A. Smith, Jr., John B. Smith or John A. Smyth. It creates a “single version of the truth” for the financial firm.
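To make the name-confusion risk concrete, here is a minimal Python sketch that flags near-identical customer names for review. It is an illustration only: the `normalize`, `similarity` and `review_candidates` helpers, the 0.8 threshold and the sample list are assumptions made for this example, and real MDM tools rely on far richer matching rules (addresses, account numbers, dates of birth).

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity between two normalized names (0.0 to 1.0)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def review_candidates(names, threshold=0.8):
    """Return pairs of names similar enough that a data steward should check them."""
    pairs = []
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            score = similarity(left, right)
            if score >= threshold:  # threshold is an assumed value
                pairs.append((left, right, round(score, 2)))
    return pairs


# Hypothetical customer list built from the names in the example above
customers = [
    "John A. Smith",
    "John A. Smith, Jr.",
    "John B. Smith",
    "John A. Smyth",
    "Maria Lopez",
]

candidates = review_candidates(customers)
for left, right, score in candidates:
    print(f"Review before merging: {left!r} vs {right!r} (similarity {score})")
```

The sketch only flags pairs. It never merges them, because a high similarity score can mean either a duplicate record or two different people with similar names.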
Other areas of data quality management involve overseeing metadata, clean data integration and accurate database administration. This means dealing with database change management, communication protocols and technology enablement to support all these data quality management functions.
For instance, if a healthcare insurance company or hospital system wants to drive decisions based on insights, it will want to be confident that the underlying data is clean and of high quality. To make this happen, the organization will need to apply remedial measures to resolve any significant data quality issues. It will also want to track data lineage and develop a business glossary to maintain common enterprise definitions of trusted data.
Right now, much of this work is manual in nature. Augmented data management addresses this difficulty by adding Artificial Intelligence (AI) capabilities, Machine Learning (ML) and automation to standard data management practices. Gartner expects such augmentation to have an effect on every aspect of data management. For example, with metadata, augmentation could convert it from limited use in audits and reporting to serving as the heart of more dynamic systems.
As the report says, augmentation will change metadata from passive to active. In the financial setting, metadata is often useful in understanding how well a trading system is performing, e.g. is trade execution slower on one stock exchange versus another? Metadata can reveal patterns such as these.
2019 Market Guide for Data and Analytics by Gartner
What Augmented Data Management Makes Possible
Gartner expects augmented data management to make possible a host of new data management capabilities. For example, with augmented data management, emergent metadata can be inferred from data utilization as well as from use cases and users, i.e. the way people use data is a valuable piece of information in and of itself.
If data is only accessed by investment managers, for instance, then perhaps the data is related to the investment management process, even if the official “descriptive metadata” in the database does not contain this knowledge. Inferred metadata thus represents an advance over descriptive metadata, which may no longer be synchronized with actual data capture and write processes.
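The idea of inferring metadata from usage can be sketched in a few lines of Python. Everything here is hypothetical: the access log, the role-to-domain mapping, the `infer_domains` function and the 80% dominance threshold are assumptions chosen for illustration, not part of any Gartner specification or vendor product.

```python
from collections import Counter, defaultdict

# Hypothetical access log: (dataset name, role of the user who queried it)
ACCESS_LOG = [
    ("positions_daily", "investment_manager"),
    ("positions_daily", "investment_manager"),
    ("positions_daily", "investment_manager"),
    ("campaign_results", "marketing_analyst"),
    ("campaign_results", "investment_manager"),
    ("campaign_results", "marketing_analyst"),
]

# Hypothetical mapping from user role to business domain
ROLE_TO_DOMAIN = {
    "investment_manager": "investment management",
    "marketing_analyst": "marketing",
}


def infer_domains(access_log, min_share=0.8):
    """Tag a dataset with a domain only when that domain dominates its usage."""
    usage = defaultdict(Counter)
    for dataset, role in access_log:
        usage[dataset][ROLE_TO_DOMAIN.get(role, "unknown")] += 1

    inferred = {}
    for dataset, counts in usage.items():
        domain, hits = counts.most_common(1)[0]
        share = hits / sum(counts.values())
        # Leave the tag empty (None) when usage is too mixed to call
        inferred[dataset] = domain if share >= min_share else None
    return inferred


inferred = infer_domains(ACCESS_LOG)
for dataset, domain in inferred.items():
    print(dataset, "->", domain or "no dominant domain")
```

In this sketch a dataset queried only by investment managers is tagged with that domain, while a dataset with mixed usage is left untagged rather than guessed at.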
On another front, augmented data management is projected to lead to the development of an established data fabric based on active utilization of all metadata types. This will occur by means of processes for inventorying metadata as well as the automatic discovery of semantics, taxonomy and ontology. Adding onto the inferred investment management data example, a financial firm might similarly infer associations with other functional groups by means of metadata, e.g. trading operations, marketing, client management and so forth.
The data fabric can then extend the reach of inferred data types by integrating their sources with relevant systems using APIs and microservices. Gartner projects that such data fabrics will be deployed in new, customized environments that require consistency of performance and persistence of operations.
Dynamic data identification, another aspect of augmented data management, allows data assets to be evaluated “in stream.” This way, data managers can develop related event models based on cumulative information about the data assets. From there, they can define processing requirements that recommend data for operational and/or analytics use cases.
Augmented data management will help organizations that need an easy way to discover the data they have and what it means in terms of value and integrity. This is made possible through the utilization of existing system statistics. Such data fusion processes can track which assets are used by each use case. This enables the formation of a knowledge and utilization graph that can answer questions such as: which data is related to investment management, and do the investment management databases connect adequately with these newly discovered sources of data? As new data assets arise, fusion engines analyze their similarity to other data assets. The process determines the assets’ affinity to existing data/use cases, culminating in an alert to other automated systems that new data is available for inclusion.
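One simple way to picture the similarity step is to compare the column names of a new asset against those of known assets. The Python sketch below uses Jaccard similarity for this. It is an assumption made for illustration: the article does not say which measure fusion engines use, and the asset names, columns, `closest_asset` function and 0.5 threshold are all hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Share of column names two assets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical catalog of known assets and their column names
KNOWN_ASSETS = {
    "investment_positions": {"account_id", "ticker", "quantity", "as_of_date"},
    "marketing_contacts": {"contact_id", "email", "campaign_id"},
}


def closest_asset(new_columns, known_assets, threshold=0.5):
    """Return the most similar known asset, or None if nothing clears the threshold."""
    best_name, best_score = None, 0.0
    for name, columns in known_assets.items():
        score = jaccard(new_columns, columns)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# A newly discovered asset, described only by its column names
new_asset = {"account_id", "ticker", "quantity", "price"}
match, score = closest_asset(new_asset, KNOWN_ASSETS)
if match is not None:
    print(f"New asset resembles {match} (similarity {score:.2f}); notify downstream systems")
```

A production system would likely compare value distributions, data types and usage statistics as well, not just column names.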
How Does Augmented Data Management Affect Your Organization and Skills?
Augmented data management is likely to have a significant impact. Gartner projects a 45% reduction in manual data management tasks over the coming three years as a result of augmentation. They further predict that AI-enabled automation will reduce the need for specialized data management personnel by 20% by 2023. In particular, they foresee the following changes coming in data management through augmentation:
- Automating some data engineering tasks
- Alerting data engineers to potential errors and issues in data, as well as suggesting alternative interpretations of data
- Creating automated system responses to errors
- Increasing capabilities for use of available data, open data and partner data as well as other data assets that are now hard to assess for utilization
- Automating data interrogation, mimicking data discovery and evaluating how new assets will conform to known or existing models
Gartner also expects augmented data management to have an impact on distributed data management. For instance, continuous monitoring of the capacity and rate utilization of data management environments might lead to potential redistribution of resources, including across hybrid cloud data infrastructure.
On a related front, new forms of optimization and performance management may eliminate manual determinations regarding when to make copies of data, whether they are intermediate, temporary or permanent. The goal would be to enhance operational or analytical performance. Augmentation might also include new kinds of policy-based decision engines. These could capture regulatory requirements for metadata configurations and then assign data to “hot” or “warm” use cases, archiving or purging for the sake of legal and audit compliance. A financial firm, for example, might be constrained by regulations regarding what data it needs to retain for a given period of time. Healthcare companies, on the other hand, may need to tag their data assets based on sensitive data classifications so that patient- or member-related sensitive information is appropriately managed and disseminated.
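A policy-based decision engine of this kind can be pictured as a small rule table plus a function that applies it. The Python sketch below is a simplified illustration under stated assumptions: the `RetentionPolicy` fields, the data classes and every day count are hypothetical and do not reflect actual regulatory retention periods.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    hot_days: int      # idle time up to which data stays on fast storage
    warm_days: int     # idle time up to which data stays on cheaper online storage
    retain_days: int   # age after which the data may be purged


# Hypothetical policies; the numbers are placeholders, not legal guidance
POLICIES = {
    "trade_record": RetentionPolicy(hot_days=90, warm_days=365, retain_days=7 * 365),
    "web_log": RetentionPolicy(hot_days=30, warm_days=90, retain_days=365),
}


def decide(data_class: str, created: date, last_accessed: date, today: date) -> str:
    """Return 'hot', 'warm', 'archive' or 'purge' for one data asset."""
    policy = POLICIES[data_class]
    age = (today - created).days
    idle = (today - last_accessed).days
    if age > policy.retain_days:
        return "purge"      # past the retention period
    if idle <= policy.hot_days:
        return "hot"
    if idle <= policy.warm_days:
        return "warm"
    return "archive"        # must still be retained, but is rarely used


today = date(2024, 1, 1)
print(decide("trade_record", date(2022, 1, 1), date(2023, 12, 15), today))
print(decide("web_log", date(2022, 1, 1), date(2022, 2, 1), today))
```

Keeping the rules in a data structure rather than in code paths is the design choice that matters here: when a regulation changes, only the policy table needs to change.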
Augmented Data Management Use Cases
Augmented data management uses machine learning and AI to drive change in a number of established data management use cases, including:
- Data quality – Augmented data management techniques can extend data cleansing, profiling, and linking. They can also identify and semantically reconcile master data in multiple data sources.
- MDM – ML can enable the creation of new record-matching and merging algorithms.
- Data integration – Augmentation simplifies the data integration development process by recommending or automating repetitive integration workflows.
- Database Management System (DBMS) – Augmented techniques can automate storage management, indexes, partitions, database tuning, patching, upgrading, security and configuration.
Moving Forward with Augmented Data Management
Augmented data management capabilities are starting to appear in data management solutions. Gartner recommends getting started with augmentation by exploring capacity planning tools that include dynamic hardware/infrastructure provisioning. This can result in needing fewer skilled personnel with expertise in balancing physical infrastructure with logical data management requirements. It is also now time to develop a strategy for isolating data assets that are needed in multicloud and on-premises scenarios for potential replication/copy instances.
Clarity Insights can advise your organization on how to make the most of coming augmentations in data management. To learn more, let's talk.
Written by Clarity Insights
Clarity Insights is a strategic partner to the nation's leading data-driven brands.