Products like Snowflake’s Data Cloud make it easy to share data. In a data supply chain, a single source of data can be shared between, for example, a bank and its client without creating additional copies of the data.
Data Sharing is Access Sharing
Data sharing is an access control mechanism by which “datasets” are shared with “identities”. These identities could be human identities (analysts, product teams) or non-human identities (cloud services, service accounts and APIs that programmatically access datasets). When you share data, you share access to that data.
Data sharing introduces its own set of access control, security, compliance, and governance challenges. The following are some of the aspects of access sharing that need to be considered:
- What are the roles of the data producer versus the data consumer in terms of access control?
- What is being shared? And with whom? Who should have access to what data?
- What is the granularity of access sharing? Databases or tables or columns?
- Who is responsible for access governance, including access reviews and continued business justification for data sharing?
- Is the right data being shared with the right data owners for the right reasons?
- Can data access monitoring detect unauthorized access?
- Are data access reviews and governance processes established?
- How is data prevented from leaking between clients?
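As a rough illustration of the monitoring and leak-prevention points above, the check can be thought of as comparing observed access against approved access. The sketch below is only a minimal model: the identity names, dataset names, and log record shape are all hypothetical and do not come from Snowflake or any particular product.

```python
# Hypothetical approved grants: (identity, dataset) pairs that passed an access review.
APPROVED_ACCESS = {
    ("CLIENT_A", "CLIENT_A_TRADES"),
    ("CLIENT_B", "CLIENT_B_TRADES"),
    ("svc-reporting", "CLIENT_A_TRADES"),
}

# Hypothetical access log; a real log would come from the platform's audit trail.
ACCESS_LOG = [
    {"identity": "CLIENT_A", "dataset": "CLIENT_A_TRADES"},
    {"identity": "CLIENT_B", "dataset": "CLIENT_A_TRADES"},  # cross-client access
    {"identity": "svc-reporting", "dataset": "CLIENT_A_TRADES"},
]


def unauthorized_events(events, approved):
    """Return every event whose (identity, dataset) pair was never approved."""
    return [
        event
        for event in events
        if (event["identity"], event["dataset"]) not in approved
    ]


if __name__ == "__main__":
    for event in unauthorized_events(ACCESS_LOG, APPROVED_ACCESS):
        print("unapproved access:", event["identity"], "->", event["dataset"])
```

Here the second log entry is flagged because one client reads another client's dataset, which is the "data cannot leak between clients" case from the list.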
Illustrative Use Case – NeoBank:
NeoBank is a financial services company that provides a trading and settlement platform for its clients. As part of its core trading business, NeoBank manages all of its client data, e.g., trade and settlement data. Clients of NeoBank ask for different types of analytics and reporting on their data. Currently, NeoBank fulfills these ad hoc requests through manual data processing and analytics. NeoBank wants to get out of the business of ad hoc data processing for specific clients because it is manual, time-consuming and unplanned.
For example, let’s assume that NeoBank is using Snowflake and has all of its clients’ data in Snowflake instances. By sharing client data via Snowflake, NeoBank effectively hands data processing back to its clients in a self-service model. This greatly reduces cost and improves client productivity and client success. But it also introduces new data sharing, audit, compliance and governance risks.
Illustrative “Data Sharing” Use Case and Challenges
Consider a “Same REGION Data Sharing” scenario with multiple share objects, each associated with different Snowflake accounts. In this scenario, a customer Snowflake instance (provider) has three “share objects” (ALPHA, ZEUS and DXP) that are shared with four different client Snowflake instances.
As the velocity and scale of data sharing increase, it becomes challenging to keep track of every share object, what each one exposes, and which accounts it is shared with. Managing share objects at scale becomes even more difficult when they share database roles.
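To make the tracking problem concrete, here is a minimal sketch of a share inventory. The share names ALPHA, ZEUS and DXP come from the scenario; the four consumer account names and the specific share-to-account mapping are assumptions made up for illustration.

```python
from collections import defaultdict

# Share object -> consumer accounts. The mapping itself is hypothetical.
SHARES = {
    "ALPHA": {"CLIENT_A", "CLIENT_B"},
    "ZEUS": {"CLIENT_C"},
    "DXP": {"CLIENT_B", "CLIENT_D"},
}


def shares_by_consumer(shares):
    """Invert share -> consumers into consumer -> shares."""
    by_consumer = defaultdict(set)
    for share, consumers in shares.items():
        for account in consumers:
            by_consumer[account].add(share)
    return dict(by_consumer)


def multi_consumer_shares(shares):
    """List shares that reach more than one consumer account."""
    return sorted(name for name, consumers in shares.items() if len(consumers) > 1)


if __name__ == "__main__":
    for account, names in sorted(shares_by_consumer(SHARES).items()):
        print(account, "can see", sorted(names))
    print("shared with several accounts:", multi_consumer_shares(SHARES))
```

Even with three shares and four accounts, answering "who sees what" requires inverting the inventory, and the number of share-to-account pairs to review grows with every new share.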
Lack of visibility into “access sharing” becomes a major risk for data exfiltration, unauthorized access and breakdown in data governance. The Stack Identity platform can drill down into the Snowflake instance metadata and show which database role is part of the DXP share object.
The Stack Identity platform can further extract granular information about every resource shared by the DXP share object and trace the data columns that are part of this data sharing.
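The kind of drill-down described here, from share object to database role to table to column, can be pictured as a walk over metadata. The sketch below is not how the Stack Identity platform is implemented; it is a simplified model in which the role name, table names, and column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableGrant:
    """A table exposed through a database role, with its visible columns."""
    table: str
    columns: tuple


# Hypothetical metadata: which database roles are granted to each share.
SHARE_ROLES = {"DXP": ["DXP_READER"]}

# Hypothetical metadata: which tables and columns each database role exposes.
ROLE_GRANTS = {
    "DXP_READER": [
        TableGrant("TRADES", ("TRADE_ID", "CLIENT_ID", "AMOUNT")),
        TableGrant("SETTLEMENTS", ("SETTLEMENT_ID", "TRADE_ID", "STATUS")),
    ],
}


def columns_exposed_by(share):
    """Trace share -> role -> table -> column and return (role, table, column) rows."""
    exposed = []
    for role in SHARE_ROLES.get(share, []):
        for grant in ROLE_GRANTS.get(role, []):
            for column in grant.columns:
                exposed.append((role, grant.table, column))
    return exposed


if __name__ == "__main__":
    for role, table, column in columns_exposed_by("DXP"):
        print(f"DXP -> {role} -> {table}.{column}")
```

Flattening the hierarchy into (role, table, column) rows gives a reviewer one list per share to check against what the consumer is supposed to see.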
Stack Identity Value Proposition for Snowflake:
Stack Identity delivers a complete identity-first cloud data security platform for Snowflake and other cloud data providers. With Stack Identity, you can continuously identify, quantify and eliminate shadow access across Snowflake and the AWS ecosystem. In under 60 minutes, the Stack Identity data security platform reveals unauthorized, invisible and unwanted cloud data access patterns that are impossible to detect without multiple tools and months of manual effort.
With continuous monitoring, policy-driven automated rightsizing, and always-on governance, Stack Identity empowers secure and safe data sharing across B2B data supply chains.
Conclusion
Data sharing has unleashed a new set of challenges in securing your data supply chains. It offers enormous benefits by reducing data proliferation and duplication, but it also creates significant new access control, security, compliance, and governance challenges that need to be carefully managed. Stack Identity offers a comprehensive solution for Snowflake that enables secure and safe data sharing across B2B data supply chains.