The journey to derive value from data is long, requiring data infrastructure, analysts, scientists, and data consumption management processes, among other things. Even when data operations teams move down this path, growing pains occur as more people demand more data as soon as data operations teams make progress. As a result, problems may arise quickly or develop gradually over time. Some strategies can help with this issue. A data team must recognize that these time-wasting issues are real and have a plan to tackle them.
1. Data Access without Automation
When you create a data catalog or establish a procedure for users to locate and request data, administering access becomes difficult. In conventional data architectures, accessing sensitive data is frequently a complex procedure involving much manual work. For example, creating and updating user accounts for several services can be time-consuming.
Put another way, no plan for data governance survives contact with users. So even if you build your data infrastructure in a legacy data governance model, you will be busy granting access to it. For instance, one global firm I spoke with had developed a data pipeline to move customer information from one on-premise system to their cloud data warehouse. They provided self-service access, but the demand was so significant that they devoted the following three months to granting access to that system.
- No-code approaches allow you to quickly access or block access to a data set within a cloud data warehouse, associate that policy with specific users, and apply various masking techniques within minutes.
- You can also see your users, their roles, and the sensitive data they're accessing.
- You can then identify areas where you can apply access policy and make it easy to create an audit trail for governance
- More mature organizations struggling with thousands of data users may already have a data catalog solution. Integrating a control and protection solution with the data catalog allows you to create the policies, manage them in the catalog, and automatically enforce them in connected databases.
2. Manual Data Migration
Once you have established your initial cloud data warehouse and data-consumption schema, you will want to import additional data sets. However, manual data-migration approaches may slow you down and restrict your ability to gain insight from multiple sources. You can gain efficiency by refining your migration approach and tools instead:
- Implement an ETL SaaS platform to eliminate manual discovery and migration tasks. It simplifies connection to multiple data sources, collects data from various sites, converts source data into tabular formats to make it easier to perform analytics, and moves it to the cloud warehouse.
- Use a schema manipulation tool like dbt, which transforms data directly in the cloud data warehouse.
- Follow a three-zone pattern for migration—raw, staging, and production.
- Maintain existing access and masking policies even as you add or move data or change the schema in the cloud data platform. For example, every time an email address moves around and gets copied by an automated piece of software, you must apply masking policies. In addition, you'll have to create an auditable trail every time you move data for governance.
3. Complicated Governance Auditing
With more people accessing data, you should set up a data governance framework to guarantee all that data's protection, compliance, and privacy. When data teams strive to identify the location of data, they frequently look at query or access logs and build charts and graphs to determine who has accessed it. When a big data footprint has many users interacting with it, you should not waste time applying for role-based access or creating reports manually.
Solution: To scale auditing, you should simplify it.Doing so will allow you to:
- Visualize and track access to sensitive data across your organization. Have an alerting system to let you know who, where, and how your data is accessed.
- Keep access and masking policies in lockstep with changing schema.
- Understand if data access is normal or out of normal thresholds.
- Create and automate thresholds that block access or allow access with alerts based on rules you can apply quickly.
- Automate classification and reporting to show granular relationships, such as the same user role accessing different data columns.
Should this even be your job?
The most significant time sink is data engineers, and DBAs handle data control and protection. Is this even logical? Since you're taking data from one place to the next and knowhow to write SQL code required by most tools to grant and restrict access, it has fallen to data teams.
But is that the best use of your time and talents? Wouldn't it make more sense for the teams whose jobs focus on data governance and security to be able to manage data governance and security? With the proper no-code control and protection solution, you could transfer these tasks to other teams – invite them to implement the policies, pick which data to mask, download audit trails, and set up alerts.Then, once you get all that off your plate, you can move on to what you were trained to do: extract value from data.