Data Companies, Data Culture, Data Driven and the fable of Startups

During my good years in the financial business, I learned that everything you do not adopt while things are small will turn into a big nightmare in the future. This is no different for Data Governance and Privacy processes regardless of the size of the company.

 

In an established company with a few decades behind it, the technological environment and the various legacy systems, built on different technologies and methodologies throughout its history, certainly increase the complexity of structuring and adopting new capabilities, methodologies and processes, for example when adapting to GDPR’s Privacy standards.

 

In a newborn company this level of complexity does not exist. However, in the world of Startups I see countless founders who misuse the concept of the Startup journey to justify how rudimentary their Data Governance and Privacy processes are. Those same founders commonly and proudly proclaim that they are a Data Company and preach a Data Driven culture.

 

But is it possible to be a Data Company or have a Data Driven culture without a minimally structured Data Governance process? I don’t think so. Why don’t founders take advantage of the fact that they are starting from scratch to set up a minimally structured Data Governance process and comply with Privacy requirements?

 

I cannot answer that question either. But with the experience I acquired as CTO of an Italian startup over the last two years, I can say that adopting Data Governance and adapting to GDPR standards after your applications have already gone live is as difficult as it is in a large corporation.

 

When I talk about Data Governance, I don’t mean that all the DAMA chapters should be fully structured from the beginning, nor that there is a need for the rigidity and high degree of formalism of Information Architecture processes. The first step toward becoming a Data Company with a Data Driven culture is simply to adopt a Data Catalog with broad connectivity and machine learning capabilities to define metadata, lineage and glossary terms.

 

But what is a Data Catalog? A Data Catalog is a repository of technical and business metadata about an organization’s data and information sources. It spans five key areas: intelligence, collaboration, guided navigation, active governance, and broad and deep connectivity. Together, these areas allow you to easily search, interpret and analyze data assets in one friendly interface, and to exercise authority over those assets by identifying the ones that are sensitive to Privacy and applying appropriate masking and access rules.
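To make the idea concrete, here is a minimal Python sketch of what a single catalog entry could hold and how a Privacy flag could drive masking. The class, its field names and the masking rule are illustrative assumptions, not the data model of any particular catalog product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogEntry:
    """One data asset as a catalog might describe it (hypothetical fields)."""
    name: str                 # technical name, e.g. a column
    source: str               # system where the asset lives
    business_term: str        # glossary term linked to the asset
    steward: str              # person accountable for the asset
    sensitive: bool = False   # True if the asset holds personal data
    upstream: List[str] = field(default_factory=list)  # lineage: origin assets


def mask(value: str, entry: CatalogEntry) -> str:
    """Hide all but the last two characters of values from sensitive assets."""
    if not entry.sensitive:
        return value
    visible = value[-2:] if len(value) > 2 else ""
    return "*" * (len(value) - len(visible)) + visible


# Example entries (made-up names, for illustration only).
email = CatalogEntry(
    name="customer.email",
    source="crm_db",
    business_term="Customer e-mail address",
    steward="data.steward@example.com",
    sensitive=True,
    upstream=["signup_form.email"],
)
country = CatalogEntry(
    name="customer.country",
    source="crm_db",
    business_term="Country of residence",
    steward="data.steward@example.com",
)
```

With entries like these, an access layer can call `mask(value, entry)` before showing data, so the Privacy decision lives in the catalog metadata rather than in each application.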

 

Adopting a Data Catalog to support a Data Governance and Privacy strategy spreads a Data Driven culture much more quickly, while covering Privacy, Risk and Compliance requirements in a controlled and complete manner. Through a curation process, it becomes possible to assign stewards to data assets and to create glossaries and business terms associated with technical metadata. A Data Catalog will naturally speed up the development of new services and products and bring greater agility and accuracy to the treatment of information relevant to strategic decision making.

 

A company does not need to be large to adopt Data Governance processes. By the same token, a company that does not manage its data properly cannot call itself a Data Company. Having data is not enough. It is necessary to know the data and to actually use it to add even more value for your customers.

 

By Andre Luiz Coelho da Silva

Recent articles

Soda introduces new reporting API

Soda Introduces Operational Analytics Dashboards to Gather Insights on Platform Usage & Data Quality Efforts

It’s a stubbornly persistent struggle for organizations to ensure that their tech investments deliver on their promise, with the intended value and benefit. And with an increased investment in new data quality management solutions, we’ve heard many data teams ask for a standard way of reporting on their data quality efforts with Soda. Data teams usually want to answer the following:

  1. Are our users using the data quality monitoring solution?
  2. Are our users adopting it into their way of working and adding data quality checks before they use data?
  3. Are our data quality standards improving?
  4. How broad is our test coverage?

I’m excited to introduce a new Reporting API for Soda Cloud that enables you to build dashboards using Key Performance Indicators that help you understand Soda’s impact on ensuring data quality in your organization. Use the API to assess tool adoption and analyze the health and test coverage of your datasets.

An example of a dataset quality overview dashboard using the Soda Cloud Reporting API

To get organizations going, we’ve released several API endpoints that we think will answer some of the common questions that our users need to answer.

To understand user adoption, you can gather data that gives you insight into:

  • Sign-ups: Who has signed up for Soda Cloud in your organization?
  • Sign-ins: Who is using Soda Cloud every day?
  • Daily account activity: How active is your team within Soda?
  • Scans run: Which Soda scans has your team been running and how many tests have been executed?

Is Soda having an impact and helping us discover data issues before there is a downstream impact?

  • Alerts sent: How many alerts has Soda Cloud sent to notify users of a data issue?

To get a sense of whether your data is meeting data quality agreements, you can gather data on:

  • Dataset health: How healthy is each dataset, based on the number of tests that are passed during each Soda scan?
  • Test results: What are the results of all our tests and how have these changed over time? 
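As a rough illustration of the dataset health idea, the sketch below computes the share of passed tests per dataset from a list of test results. The record layout (`dataset`, `passed`) and the pass-ratio formula are assumptions made for this example; the fields and the calculation that the Reporting API actually uses may differ, so check the API documentation.

```python
def dataset_health(test_results):
    """Return the percentage of passed tests for each dataset.

    `test_results` is a list of dicts with the hypothetical keys
    "dataset" (str) and "passed" (bool), one dict per executed test.
    """
    counts = {}
    for result in test_results:
        passed, total = counts.get(result["dataset"], (0, 0))
        counts[result["dataset"]] = (passed + int(result["passed"]), total + 1)
    return {name: round(100 * p / t, 1) for name, (p, t) in counts.items()}


# Made-up results from a single scan.
sample_results = [
    {"dataset": "orders", "passed": True},
    {"dataset": "orders", "passed": True},
    {"dataset": "orders", "passed": True},
    {"dataset": "orders", "passed": False},
    {"dataset": "customers", "passed": True},
    {"dataset": "customers", "passed": False},
]
health = dataset_health(sample_results)
```

Running the same function on the results of successive scans gives a simple series that shows whether a dataset is getting healthier over time.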

We also provide endpoints that enable you to understand how good your testing coverage is, where datasets are failing, and which specific rules and metrics are causing the majority of data quality problems:

  • Tests: A list of all of the tests and the datasets to which they apply.
  • Datasets: A list of all of the datasets that Soda Cloud accesses, including the last scan time and test failure counts. 
  • Dataset coverage: A score derived from the number of tests that apply to a dataset compared to the number of tests applied to all other datasets accessed by Soda Cloud.
A preview of the API documentation
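The dataset coverage description above does not spell out a formula. One plausible reading, shown here purely as an assumption, is to compare the number of tests on each dataset with the best-tested dataset. The function name and the input shape are hypothetical.

```python
def coverage_scores(tests_per_dataset):
    """Score each dataset from 0.0 to 1.0 relative to the best-tested one.

    `tests_per_dataset` maps a dataset name to the number of tests
    that apply to it (hypothetical input shape).
    """
    most_tests = max(tests_per_dataset.values(), default=0)
    if most_tests == 0:
        return {name: 0.0 for name in tests_per_dataset}
    return {
        name: round(count / most_tests, 2)
        for name, count in tests_per_dataset.items()
    }


# Made-up test counts.
scores = coverage_scores({"orders": 8, "customers": 2, "events": 0})

# Datasets ordered from least to most covered.
least_tested_first = sorted(scores, key=scores.get)
```

Sorting by score, as in the last line, puts the least-tested datasets first so that they can be reviewed before the others.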

To ensure that you have real-time visibility to improve your data’s quality and reliability, all platform impact endpoints run in real-time. All endpoints relating to adoption and usage are refreshed daily.

Working with poor quality data is time consuming and resource intensive. Using these endpoints, you can build data quality dashboards that help you quickly understand where data quality is falling short.

Improving data quality improves business

Here’s another advantage: a data quality dashboard is a powerful communication tool to show the impact and business value that your initiatives and programs are delivering and helps elevate the visibility and importance of data quality. It’s a great mechanism to get everyone interested in data and its quality, because every organization that is data-enabled knows that good data quality is fundamental to the success of their business.

By providing a simple way for everyone to understand where data is falling short and where data is making a difference, we are bringing everyone closer to the data and increasing their trust in it. That’s what we’re particularly proud of at Soda, making data quality a team sport.

The Reporting API enables you to build dashboards that provide insights that are useful and interesting either to a head of data or business operations, while providing data engineers and analysts with the granularity they need to prioritize their data reliability efforts.

Organizations need data quality management that addresses the needs of all users and incorporates them into a unified platform that has the ability to meet their diverse needs, in environments they are familiar with.

Know Your Datasets and Gauge Adoption

Assessing the quality of data within a dataset helps organizations get ahead of silent data issues and provide end-to-end transparency.

  • Which datasets are least tested?
  • Are critical datasets healthy?
  • Which data issues should we tackle first?
  • What can we learn about our data quality issues in order to improve and deliver trusted data to the organization?
  • How often is our team running scans?
  • How are people on our team using Soda Cloud?

Get Started

The Soda Cloud Reporting API is available as of November 23, 2021, to all registered users of Soda Cloud. Access our API documentation at docs.soda.io.

With the help of a data engineer collaborating with an analytics engineer in your team, you’ll be able to use the API to build reporting dashboards in under an hour. Our docs have a step-by-step guide to get you going.

As a data engineer, once you’ve captured the data you need from the Reporting API, move it into the storage or compute systems that power your existing reporting or visualization tools.
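The storage step could look something like the sketch below, which writes already-fetched records into a SQLite table that a reporting tool can query. It deliberately leaves out the HTTP call, since endpoint paths and authentication should come from the official API documentation. The table name, column names and record layout are assumptions made for this example.

```python
import sqlite3


def store_health_records(conn, records):
    """Upsert dataset health records into a local reporting table.

    `records` is a list of dicts with the hypothetical keys
    "dataset", "scan_date" (ISO date string) and "health" (float).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dataset_health ("
        "dataset TEXT, scan_date TEXT, health REAL, "
        "PRIMARY KEY (dataset, scan_date))"
    )
    # INSERT OR REPLACE keeps one row per dataset and day, so re-running
    # the load for the same day does not create duplicates.
    conn.executemany(
        "INSERT OR REPLACE INTO dataset_health "
        "VALUES (:dataset, :scan_date, :health)",
        records,
    )
    conn.commit()
```

Keying the table on dataset and scan date makes the load safe to repeat, which is useful when a scheduled job is retried.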

Engage your analytics engineer to transform the data, bake it into the business logic, and create beautiful dashboards that answer the questions that will help your team understand and trust the data.

If you’re new to Soda, sign up for a free Soda Cloud account today. You’ll need a Soda Cloud account connected to an instance of Soda SQL installed in your environment. If your team has already defined data quality tests and run some scans against datasets, you’ll have data that the Reporting API can retrieve. If you’d like to see the reporting API in action, get in touch with us to request a demo, or use the easy-to-follow guide available in our docs.

Soda Architecture and Flow of Data into Dataset Health Overview Dashboard via Reporting API

Why an API and not ready-made dashboards?

Our data engineers and data scientists unanimously agreed that providing an API for our users would address the need to get quick visibility into user adoption, understand the data that matters most to the business, and ensure that data quality standards are improving. Speaking with our community, we also realized that we don’t need to replace the reporting and visualization tools that they already use. We aim to make it easy to integrate with Soda and this API means teams can continue using the tools they love – one less change to support greater adoption.

The Future is Built on Trusted Data

This is just the beginning. Customers should always expect more value from the tools and platform that they’re investing in, and we’re committed to delivering it, quickly and consistently. We’re continuing development on Soda’s Reporting API to connect you to the data that helps you understand your data and trust your data.

With that in mind, we have taken another cue from our community of users to offer Incident Management within Soda Cloud. Keep an eye out for the imminent release of a Slack-integrated feature that helps teams collaborate to quickly resolve data quality issues, before they have a downstream impact.

 

Link to the article: https://www.soda.io/resources/soda-introduces-new-reporting-api

The Rising Need for Data Governance in Healthcare

Healthcare is changing, and it all comes down to data. Leaders in healthcare seek to improve patient outcomes, meet changing business models (including value-based care), and ensure compliance while creating better experiences. Data & analytics represents a major opportunity to tackle these challenges. Indeed, many healthcare organizations today are embracing digital transformation and using data to enhance operations. In other words, they use data to heal more people and save more lives.

How can data help change how care is delivered? Value-based care is a new concept, growing in popularity and transforming the business model. It introduces a new incentivization structure for physicians, which rewards them for the value of their care instead of the quantity of care. The goal is to support better patient outcomes. Hospitals and pharmacies, too, are increasingly considering this model. Leaders are asking how they might use data to drive smarter decision making to support this new model and improve medical treatments that lead to better outcomes.

Yet this is not without risks. The protected health information (PHI) and personally identifiable information (PII) that healthcare providers and clinical trial operators manage are subject to privacy laws such as HIPAA, CCPA, and GDPR, which mandate how such data can be used. This data is also a lucrative target for cyber criminals. Healthcare leaders face a quandary: how to use data to support innovation in a way that’s secure and compliant?

Data governance in healthcare has emerged as a solution to these challenges. It defines how data can be collected and used within an organization, and empowers data teams to:

  • Maintain compliance, even as laws change
  • Uncover intelligence from data
  • Protect data at the source
  • Put data into action to optimize the patient experience and adapt to changing business models

What is Data Governance in Healthcare?

Data governance in healthcare refers to how data is collected and used by hospitals, pharmaceutical companies, and other healthcare organizations and service providers. It combines people, process, technology, and data within a system founded on transparency and compliance. In this way, it builds human trust in the data while ensuring the data is used properly.

An active governance framework supports data-driven decision-making. This, in turn, empowers data leaders to better identify and develop new revenue streams, customize patient offerings, and use data to optimize operations.

Whether it’s an out-patient clinic, drug discovery and clinical research lab, or any other organization that provides treatment, tests, rehabilitation, or therapy – data security is critical. Healthcare organizations need to manage and protect sensitive information in a consistent, secure, and organized way.

As Michelle Hoiseth, Chief Data Officer of Parexel, a global provider of biopharmaceutical services, said in a recent interview: “We needed to understand how we could leverage data that was forming in electronic medical record systems, claim systems, and pharmacy claims systems to really see the impact of new treatments.”

For Michelle, step one was “appreciating that your data is an asset to enable your business.” To make good on this potential, healthcare organizations need to understand their data and how they can use it. This means establishing and enforcing policies and processes, standards, roles, and metrics. These systems should collectively maintain data quality, integrity, and security, so the organization can use data effectively and efficiently.

Why Is Data Governance in Healthcare Important?

Healthcare data is valuable and sensitive, so it must be protected. This is why healthcare organizations are subject to strict compliance mandates. These mandates ensure that PHI and PII data are protected and managed properly, so that patients are protected in the event of data breaches.

Yet this same data is critical to improving patient outcomes. It can guide adaptation to changing business models and aid innovation, creating better patient experiences. But again, how you work with this data is subject to compliance scrutiny. The people working with it need guidance if they’re to use it appropriately.

Here is a closer look at some of the leading reasons your team should implement data governance to enable you to use and protect this data:

Ensures High-Quality Data Analysis

Healthcare organizations often manage their diverse data in many different databases, and several of those databases may hold the same information. Grouping that data intelligently and making sure the right data is used properly is a challenge.

Intellectual property, like medical research data, often contains PII and PHI. For example, in large databases for pharmaceutical companies, medical trial data may include both the pharmaceutical research and the study population’s personal information. Anonymized versions of that data may also be generated and shared, creating multiple data sources with the same information.

Hospitals, too, often collect PII and PHI in multiple systems. Duplicative data is common, as a patient may see more than one specialist or have visits in more than one facility. Storing the same data in multiple places can lead to:

  • Human error: mistakes when transcribing data reduce its quality and integrity
  • Multiple data structures: different departments use distinct technologies and data structures
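A first step toward taming duplicates is simply finding them. The sketch below groups patient records from several systems by a normalized name and date of birth. The field names and the matching rule are assumptions for illustration; real patient matching usually needs stronger identifiers and human review.

```python
from collections import defaultdict


def normalize(record):
    """Build a comparison key that ignores case and extra spaces in names."""
    name = " ".join(record["name"].lower().split())
    return (name, record["date_of_birth"])


def find_duplicates(records):
    """Return keys that appear in more than one system, with those systems."""
    systems_by_key = defaultdict(list)
    for record in records:
        systems_by_key[normalize(record)].append(record["system"])
    return {
        key: systems
        for key, systems in systems_by_key.items()
        if len(systems) > 1
    }


# Made-up records from three hypothetical systems.
patients = [
    {"system": "cardiology", "name": "Maria  Rossi", "date_of_birth": "1980-04-02"},
    {"system": "radiology", "name": "maria rossi", "date_of_birth": "1980-04-02"},
    {"system": "lab", "name": "Luca Bianchi", "date_of_birth": "1975-01-15"},
]
duplicates = find_duplicates(patients)
```

Here the two spellings of the same patient collapse to one key, which shows how inconsistent transcription across departments hides duplicates from a naive exact comparison.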

Data governance is the solution to these challenges. How can you improve the patient journey, when you don’t have accurate data from every touchpoint of that journey? How can you analyze business models without great operational data from across the organization?

Improving the patient experience requires combining this data to put it into action. Data governance not only provides a transparent framework for correct usage. It ensures quality data forms the foundation of all insights. A mountain of duplicate data can open the door to unintentional non-compliance. It can even diminish the overall quality of the data over time.

Meet Compliance Requirements

State, federal, and regional governments all understand that cybercriminals want PII and, increasingly, PHI. To protect this information, legislative bodies mandate strict rules for handling this sensitive data. Today, lawmakers impose larger and larger fines on organizations that handle this data without properly protecting it.

More and more companies are handling such data. No matter where a healthcare organization is located or the services it provides, it will likely hold data that is subject to a number of regulations.

Some important compliance regulations include:

  • Health Insurance Portability and Accountability Act (HIPAA): US federal law protecting patient data privacy
  • General Data Protection Regulation (GDPR): European Union law protecting data subject privacy
  • California Privacy Rights Act (CPRA): US state law protecting consumer personal information privacy
  • Payment Card Industry Data Security Standard (PCI DSS): Payment industry compliance requirement protecting cardholder data

To meet compliance requirements, healthcare organizations need to know where all sensitive information is located and be able to prove it’s governed effectively.

Protect From Cybercriminals

Cybercriminals have nearly always targeted PHI and are increasingly focusing on healthcare. Whether they want to steal identities, sell data, or hold information hostage, these actors recognize that such data has a financial value.

The 2021 Data Breach Investigations Report found that in healthcare:

  • 61% of data breaches were caused by external actors
  • 91% of data breaches were financially motivated
  • 66% of data breaches involved personal information
  • 55% of data breaches involved medical information

An overabundance of data can challenge an entity’s ability to protect it. Indeed, an organization can’t protect information if it doesn’t know what it has or where it lives. Clear data governance policies and processes start with implementing a data catalog and labeling private data accordingly. This knowledge empowers data leaders to take appropriate action to both protect and use it compliantly.

5 Steps for Creating Effective Data Governance in Healthcare

As healthcare organizations grow, they need scalable data governance practices to both keep private data secure and remain financially competitive. From engaging in research to providing emergency care, healthcare organizations must ensure that they can efficiently and effectively use data.

1. Determine Business Goals and Objectives

Healthcare organizations have many data use cases. At the outset, the organization must decide how data governance fits into the business goals and define objectives accordingly. For example, some goals might include:

  • Determine competitive strategies
  • Increase patient engagement
  • Decrease adverse medication effects
  • Increase patient telehealth services usage
  • Reduce audit times
  • Mature security and privacy posture

Each of these goals will require different types of information. To use that information compliantly, data teams must work within a transparent governance framework.

2. Identify, Categorize, and Prioritize Your PHI

PHI is arguably the highest risk data that a healthcare company manages. In order to stay compliant and provide the best patient care possible, identifying and categorizing PHI should be a top data governance priority.

It’s also important to make sure that information is properly categorized across all areas of the organization, including:

  • Clinical data
  • Lab data
  • Payment processing data

Where data lives and how it’s classified will determine how it’s governed. Compliance audits require that sensitive data be marked accordingly, with evidence that demonstrates usage in line with regulatory law.
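As a starting point for categorization, column names alone can suggest which fields deserve a closer look. The sketch below tags columns using keyword hints. The labels, the keyword lists and the column names are assumptions for illustration, and a name-based pass like this only produces candidates for a steward to confirm.

```python
# Keyword hints per label. Order matters: the first matching label wins,
# so "medication_name" is tagged PHI before the PII hint "name" is checked.
SENSITIVE_HINTS = {
    "PHI": ("diagnosis", "medication", "lab_result", "medical"),
    "PII": ("name", "birth", "email", "phone", "address"),
    "PAYMENT": ("card", "iban"),
}


def classify_column(column_name):
    """Return a sensitivity label for a column based on its name."""
    lowered = column_name.lower()
    for label, hints in SENSITIVE_HINTS.items():
        if any(hint in lowered for hint in hints):
            return label
    return "UNCLASSIFIED"


# Made-up column names from clinical, lab and payment systems.
columns = ["patient_name", "primary_diagnosis", "card_number", "visit_id"]
labels = {column: classify_column(column) for column in columns}
```

The resulting labels can be stored alongside the column metadata so that audits can show which fields were marked sensitive and why.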

3. Assess and Assign Privileges and Permissions

Privileges and permissions define who can access what data, and what they may do with it. As a best practice, data access should be governed according to the principle of least privilege. This means limiting access to information as much as possible without getting in the way of someone’s ability to do their job.

The healthcare industry has a growing number of interoperability standards, which dictate how information is stored and shared between devices. Before you assign privileges it’s important to:

  • Define types of data that different areas need to access
  • Define who within a functional area needs to access the data
  • Outline how they can access the data, including details about devices, geographic locations, and time of day

For example, a phlebotomist needs to know the patient’s name and date of birth. However, they may not need access to the patient’s entire medical history. Too much access increases the risk that data can be changed or stolen.
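The phlebotomist example can be expressed as a small allow-list per role. This is a minimal sketch of the principle of least privilege; the role names and field names are assumptions, and a production system would also cover devices, locations and time of day as described above.

```python
# Fields each role may see. Anything not listed is hidden.
ROLE_FIELDS = {
    "phlebotomist": {"name", "date_of_birth"},
    "physician": {"name", "date_of_birth", "medical_history", "prescriptions"},
}


def view_for(role, patient_record):
    """Return only the fields of a patient record that the role may access."""
    allowed = ROLE_FIELDS.get(role, set())  # unknown roles see nothing
    return {key: value for key, value in patient_record.items() if key in allowed}


# Made-up patient record.
patient = {
    "name": "Maria Rossi",
    "date_of_birth": "1980-04-02",
    "medical_history": ["hypertension"],
    "prescriptions": ["lisinopril"],
}
phlebotomist_view = view_for("phlebotomist", patient)
```

Defaulting to an empty set for unknown roles means that access has to be granted explicitly, which is the safer failure mode.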

4. Remove Low Quality, Unused, or “Stale” Data

In healthcare especially, data integrity is incredibly important. Low quality, unused, or “stale” data can negatively impact research by skewing findings. From a physician’s perspective, bad data can lead to care issues. For example, outdated patient prescription information can impact a doctor’s diagnosis and treatment plan. Keeping data fresh helps to achieve both care and operational goals.
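One simple way to find stale records is to compare a last-updated date with a cutoff. In the sketch below, the 365-day threshold and the `last_updated` field are assumptions; the right retention window depends on the data and on applicable regulations. The function flags records for review rather than deleting them.

```python
from datetime import date, timedelta


def split_stale(records, today, max_age_days=365):
    """Split records into (fresh, stale) based on their last update date."""
    cutoff = today - timedelta(days=max_age_days)
    fresh = [r for r in records if r["last_updated"] >= cutoff]
    stale = [r for r in records if r["last_updated"] < cutoff]
    return fresh, stale


# Made-up prescription records.
prescriptions = [
    {"id": 1, "last_updated": date(2021, 10, 1)},
    {"id": 2, "last_updated": date(2019, 5, 20)},
]
fresh, stale = split_stale(prescriptions, today=date(2021, 11, 23))
```

Passing `today` as a parameter keeps the check reproducible, which helps when the same rule has to be demonstrated during an audit.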

5. Assign Key Roles and Train Employees

Finally, it’s important to have the right people with the right training in charge of data governance. To do this, you should create teams based on role, including practitioners, IT team members, and finance.

Accountability is important. Every functional area that manages sensitive information needs to ensure that the data managers, data owners, and data analysts understand their responsibilities. Data owners are in charge of their data, and they must know who has access and who should have access.

In addition, adding a Chief Data Officer (CDO) can help maintain best data governance practices. The CDO acts as a point-of-contact within the organization for data managers maintaining the daily activities.

Monitor, Measure, and Continuously Improve

At this point, you should refer back to the goals you set in Step 1. If your goal was to increase patients’ telehealth services usage, for example, you’ll need benchmarks of current usage to measure change over time. Dashboards are a useful means of tracking such change.

Once you have baseline metrics, you can monitor change over time and measure the impact of business efforts on achieving the goals you’ve set. This takes time, attention, and patience! Don’t feel frustrated if you don’t see results immediately.
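Measuring against a baseline can be as simple as a percent change. The sketch below is one minimal way to express it; the metric and the numbers are made up for illustration.

```python
def percent_change(baseline, current):
    """Return the change from baseline to current as a percentage."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return round(100 * (current - baseline) / baseline, 1)


# Hypothetical monthly telehealth visits: baseline month vs. latest month.
change = percent_change(baseline=400, current=460)
```

A positive value means usage grew relative to the baseline, and tracking the same figure each month shows whether the trend holds.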

Finally, data governance is a cycle. As you measure your progress, you may spot areas where you could get better. It’s important you make those changes as you go. This ensures you continuously improve your governance process.

Implement Data Governance in Healthcare with Alation

Whether your healthcare organization is looking to optimize patient care, improve research processes, or meet compliance requirements, data governance is mission-critical. Alation’s data catalog creates a standardized view of assets and ensures consistent data quality. Alation’s Data Governance App then helps you create the policies and procedures needed to make sure that the right data is used and that it is used properly.

For Michelle Hoiseth, Chief Data Officer of Parexel, this approach now means that “People see who is accountable for that data, the viability or quality of that data, classification or other limitations of use. They are then able to create a direct connection with people whose job it is to help them get their data needs met, no matter who you are or where you are in the business.”

By consolidating data in a single location and making sure it is used properly, everyone in healthcare, including researchers, clinical trials, and care providers, can make better-informed decisions. Better decisions impact the outcomes for patients, help navigate changing business environments and value-based care and, overall, improve the experiences for everyone in their organization.

 

Link to the article: https://www.alation.com/blog/data-governance-in-healthcare/

Verodati is a company founded in 2020, operating in Latin America and Europe from our offices in Brasília and Milan.

Brazil

SBS Q2 Bloco E, 206 – 70070-120 – Brasília

Italy

Piazzale Giulio Cesare, 9 – 20145 – Milan

Copyright - 2021 - Verodati - All rights reserved