A meeting last week prodded me to organize my thoughts on how pragmatic data governance has evolved and what impact DevOps and agility have on data governance. When working with aspiring DevOps organizations, we work tirelessly on the cultural aspects – agility and continuous delivery are as much a mindset as a process. That culture extends to successful data governance.
The relevance of data governance continues to expand due to:
- A focus on analytics and being "data driven"
- Big Data investments and the specter of the "data swamp"
- Continued importance of traditional governance and master data management
Firms looking to digital transformation as a competitive advantage understand how critical high quality data is to their understanding and ability to execute.
Control Based Data Governance
We need to get a handle on this data! – Everyone
When looking to improve data quality and better stratify data by value, the natural reaction is to reach for control-based processes. So, we create quality processes with control gates to ensure "bad" data can't flow into trusted systems.
- Completeness (field fill rates) Goes Up – required fields are present with approved values
- Validity and Accuracy Increase – values are reliable since they match approved values and ranges
- Consistency Improves – formats are relevant to purpose and consistent across sources
- Completeness (row level) and Timeliness drop as data is held at quality gates
- Reliability can actually drop as gaps in the big picture appear due to quality hold
- Run cost jumps as stewards dedicate time to clearing quality backlogs
- Agility drops and Time to Value lengthens as projects comply with high-rigor governance processes
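A control gate of this kind can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the field names, the approved country codes, and the functions `gate_failures` and `run_gate` are all hypothetical. It shows the trade-off in the list above, where every released row is complete and valid, but anything that fails is held back for a steward.

```python
# Hypothetical rules for illustration; a real gate would load these
# from governance metadata rather than hard-coding them.
REQUIRED_FIELDS = ("customer_id", "country")
APPROVED_VALUES = {"country": {"US", "CA", "MX"}}


def gate_failures(row: dict) -> list[str]:
    """Return the reasons a row fails the gate (an empty list means it passes)."""
    failures = []
    # Completeness: every required field must be filled.
    for name in REQUIRED_FIELDS:
        if row.get(name) in (None, ""):
            failures.append(f"missing {name}")
    # Validity: filled fields must match the approved values.
    for name, allowed in APPROVED_VALUES.items():
        value = row.get(name)
        if value not in (None, "") and value not in allowed:
            failures.append(f"unapproved {name}: {value}")
    return failures


def run_gate(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split rows into those released to the trusted system and those held."""
    released, held = [], []
    for row in rows:
        (held if gate_failures(row) else released).append(row)
    return released, held


rows = [
    {"customer_id": 1, "country": "US"},
    {"customer_id": 2, "country": "ZZ"},
    {"customer_id": None, "country": "CA"},
]
released, held = run_gate(rows)
print(len(released), "released;", len(held), "held")
```

In this toy run, two of three rows wait in the steward backlog, which is the row-level completeness and timeliness cost described above.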
Grow and Transform costs are often mixed. Better quality data speeds delivery on new initiatives, but the rigor of control-based governance (intentionally) expands the requirements of developing new data assets.
In an environment where the acceptable pace of innovation matches the high-rigor processes, the traditional model generates the most accurate, best understood data assets. However, more and more, the speed of innovation is the overriding concern.
A more nuanced approach to governance recognizes the limits of control in the enterprise. Even with the highest-level sponsors and lavish funding, the scale of the problem inevitably leads to sluggish processes.
Maturity-based governance works within this reality by equipping stakeholders with best-in-class (enterprise) tools and guiding them from local, potentially low-quality data towards widely available, high-quality data according to their perception of business value.
The key characteristics in this model are:
- Business stakeholders drive maturity advances in their data according to the value they perceive. Data governance teams simply facilitate.
- All maturity levels get access to the firm's investment in data tools, and moving up in maturity doesn't mean starting over from scratch.
- Transparency via discovery enables cross-department collaboration in advancing data maturity.
Implementing Maturity-Based Governance
Maturity-based governance is in practice a largely self-service framework. Stakeholders create and blend data with little oversight from "above" and must be well supported. We look to other self-service models as we lay out our principles for implementing maturity-based governance:
- Best-in-class tools must be deployed at all levels of maturity. Some tools can be desktop deployed. Some must implement some kind of row-level security. Others need to implement "labs" or "sandboxes."
- Transparency and consistency are the foundations of trust. Automation in deployment and documentation is the only practical way to achieve these. This is where DevOps principles come into play – ensuring that data moving to the next maturity level is accorded the rights and responsibilities of that new level.
- Agility in enterprise systems makes maturity-advancing investments feasible. If stakeholders incur the cost of "cleaning up" their data, they must reap the benefits in a timely fashion. And timely today is measured in days and hours, not months and quarters. Again, DevOps principles are the enabler.
- Train early and often to ensure that stakeholders can participate effectively and produce accurate conclusions from their data.
A Brief Case Study
What could this look like in practice? Here's a composite study:
A firm decides to implement a three-tier maturity-based governance model with exploratory, departmental, and enterprise tiers.
The firm's metadata repository (glossary) is updated with maturity tier information about each cataloged element, initially according to the system type (existing enterprise systems = enterprise maturity). The documentation module of this system is used to publish in real time.
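As a rough sketch of that initial tagging step, the snippet below assigns a starting tier to each cataloged element from its system type. Only the "enterprise system = enterprise maturity" rule comes from the case study; the other system types, the fallback to the lowest tier, and the names `Tier`, `INITIAL_TIER_BY_SYSTEM`, and `seed_tier` are assumptions made for illustration.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Maturity tiers, ordered so that a higher value means more mature."""
    EXPLORATORY = 1
    DEPARTMENTAL = 2
    ENTERPRISE = 3


# Assumed mapping: only the "enterprise" entry is stated in the case study.
INITIAL_TIER_BY_SYSTEM = {
    "enterprise": Tier.ENTERPRISE,
    "departmental": Tier.DEPARTMENTAL,
    "desktop": Tier.EXPLORATORY,
}


def seed_tier(system_type: str) -> Tier:
    """Pick a starting tier; unknown system types start at the lowest tier."""
    return INITIAL_TIER_BY_SYSTEM.get(system_type, Tier.EXPLORATORY)


# A tiny, made-up catalog: element name -> system type it lives in.
catalog = {"gl.account_balance": "enterprise", "sales_forecast.xlsx": "desktop"}
tiers = {name: seed_tier(kind) for name, kind in catalog.items()}
print({name: tier.name for name, tier in tiers.items()})
```

Defaulting unknown sources to the lowest tier is a conservative choice: data earns its way up rather than starting trusted.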
The self-service aspects of their BI environment are activated in support of this initiative:
- Self-Service Data Preparation/Blending
- Self-Service Visualizations
Security is implemented in these environments using a combination of LDAP organizational information and metadata cataloged element maturity. Exploratory data is available only to the owner, departmental available to anyone in the department, and enterprise according to the enterprise data security policy.
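The access rule above could look something like the following. This is a sketch under stated assumptions: `User.department` stands in for whatever organizational attribute the LDAP directory provides, `Element` stands in for a metadata catalog entry, and `enterprise_policy_allows` is a placeholder for the firm's enterprise data security policy, which the case study does not detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    uid: str
    department: str  # assumed to come from LDAP organizational data


@dataclass(frozen=True)
class Element:
    name: str
    tier: str        # "exploratory", "departmental", or "enterprise"
    owner: str       # uid of the stakeholder who created the element
    department: str


def enterprise_policy_allows(user: User, element: Element) -> bool:
    """Placeholder for the enterprise data security policy (not specified here)."""
    return True


def can_read(user: User, element: Element) -> bool:
    """Apply the tier-based visibility rule from the case study."""
    if element.tier == "exploratory":
        return user.uid == element.owner
    if element.tier == "departmental":
        return user.department == element.department
    if element.tier == "enterprise":
        return enterprise_policy_allows(user, element)
    return False  # unknown tiers are denied by default


alice = User("alice", "finance")
bob = User("bob", "finance")
carol = User("carol", "marketing")
draft = Element("churn_draft", "exploratory", owner="alice", department="finance")
shared = Element("churn_model", "departmental", owner="alice", department="finance")
print(can_read(bob, draft), can_read(bob, shared), can_read(carol, shared))
```

Because the rule is driven entirely by catalog metadata, changing an element's tier changes who can see it without any per-report configuration.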
CICD practices are implemented to update the security controls on each data element as elements are introduced, retired, or move up (or down) in maturity. These practices extend to:
- Data Repositories – data elements (columns and tables) are incorporated into existing models via modeling automation (for simple changes) or by professional data architects. These changes are applied by the CICD pipeline.
- ETL – previously self-service data preparation jobs are promoted to new environments and scheduled.
- Analytics – OLAP and statistical models move from laptops to servers and refreshes are scheduled.
- Visualization – Reports and dashboards are published in departmental or enterprise directories.
Remember that the quality of each of these artifacts is addressed prior to moving up the maturity scale. Stakeholders make the decision to advance and perform or procure the steps needed to meet the next tier's quality standards.
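One way to picture a promotion step in such a pipeline is a check that refuses to advance an element until it meets the next tier's standards. The sketch below is illustrative only: the thresholds in `STANDARDS`, the `fill_rate` and `has_glossary_entry` attributes, and the `promote` function are hypothetical, not the firm's actual criteria.

```python
TIERS = ["exploratory", "departmental", "enterprise"]

# Hypothetical quality standards an element must meet to ENTER each tier.
STANDARDS = {
    "departmental": {"min_fill_rate": 0.90, "needs_glossary_entry": False},
    "enterprise": {"min_fill_rate": 0.99, "needs_glossary_entry": True},
}


def promote(element: dict) -> dict:
    """Return a copy of the element at the next tier.

    Raises ValueError listing the unmet standards if it does not qualify.
    """
    index = TIERS.index(element["tier"])
    if index == len(TIERS) - 1:
        raise ValueError("already at the highest tier")
    target = TIERS[index + 1]
    rules = STANDARDS[target]

    unmet = []
    if element["fill_rate"] < rules["min_fill_rate"]:
        unmet.append(
            f"fill rate {element['fill_rate']} below {rules['min_fill_rate']}"
        )
    if rules["needs_glossary_entry"] and not element["has_glossary_entry"]:
        unmet.append("glossary entry required")
    if unmet:
        raise ValueError("; ".join(unmet))

    # In a real pipeline, this is the point where security controls,
    # schedules, and publishing locations would be updated for the new tier.
    return {**element, "tier": target}


element = {
    "name": "churn_score",
    "tier": "exploratory",
    "fill_rate": 0.95,
    "has_glossary_entry": False,
}
promoted = promote(element)
print(promoted["tier"])
```

Calling `promote(promoted)` again would raise, because this element does not yet meet the assumed enterprise thresholds; the stakeholder decides whether closing that gap is worth the investment.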
Steps Towards Maturity
Maturity-based governance is a holistic program. Delivering value to self-determining stakeholders is both complex and critical. How do you implement a holistic program incrementally? Here's how we at Analysts proceed:
- Prepare with Automation – even limited (but well positioned) automation will provide value in either governance model, and automating first provides early ROI.
- Firm Up Metadata – governance revolves around metadata, and transparency will bring value even while data quality is low.
- Pilot Self-Service – nurture your future cheerleaders by deploying and training on self-service tools and practices. But take care not to wait too long before...
- Introduce Maturity-Based Governance – the concept and principles and the impact on the organization (empowerment!).
- Expand Automation – use a value model to guide continuing investments and continually improve transparency and responsiveness.
- Refine the Model – early on, two tiers may be all that's possible, with many manual steps in the promotion process. Later, a model more closely reflecting the business may grow too many tiers.
Looking forward, we're excited to incorporate more data discovery, predictive analytics, and probabilistic MDM into the mix to speed the maturity-based governance process.
Do you have ideas or stories about these practices? We'd love to be a part of your governance journey.