MineOS Data Discovery and Classification: the thinking behind our approach

Kobi Nissan

Jan 9, 2023

With SaaS technology's rapid rise and expansion, data has become one of the most challenging aspects to handle in the business world. Today’s data privacy challenges put consumers' personal information at risk around the globe. Data privacy professionals have an almost impossible task: oversee all of their organization’s data to stay compliant with necessary regulations and leverage it as a business advantage, all while working in a historically underutilized department.

The ideas above are why we started building this platform, and they continue to hold true to this day. Our goal is to empower data privacy professionals with a platform they can get up and running fast and use themselves, with less reliance on outside resources. By using proprietary machine learning built in-house, MineOS's platform aims to simplify and accelerate data management routines. Our recently announced Continuous Data Classification and Smart Data Sampling functionality is an example of that. This blog is a deep dive into our approach and the technology behind it.

Data is in our DNA

Our company’s structure aligns engineering and design to provide users with the best product experience. We constantly hone our cutting-edge machine learning to aid data source discovery and classification while still presenting our users - privacy professionals - with insights to create a plan of action and supervision without overwhelming them with data or a complex UI. Diving into the root cause and exploring the most advanced scenarios let Mine design and engineer a data compliance stack that fully addresses the challenges of the data privacy world.

One of those challenges is the sheer number of SaaS tools on the market and the large number of people inside any company using them. Enterprise companies often use hundreds of SaaS systems, making inventory building, i.e., data discovery and classification, the necessary basis of any privacy program. Keeping up with past and present data sources is one of the biggest struggles facing compliance professionals.

Traditionally, a privacy team would need considerable time to identify even a percentage of these systems, check what data is handled through them, and create corresponding mappings between data and controls. The time and effort required to map all this out, and often to do so only partially, consumed compliance professionals before automation made the process smoother. Even with automation, however, data mapping reports are likely to be outdated by the time the lengthy process concludes, especially if you are mapping unstructured data.

This takes us to the third challenge privacy professionals have faced: maintaining a data inventory and keeping it continuously up to date. With terabytes and terabytes of data, keeping an up-to-date map has so far proven the greatest challenge of all.

Mine’s new capabilities address these challenges.

Continuous discovery of data sources

Mine uses advanced machine learning to scan multiple data sources and identify any that may carry and handle sensitive data. To give our users complete flexibility in how they go about coverage, the MineOS platform allows you to connect through SSO, email, and/or direct integrations.

Mine lets you define the appropriate method for scanning data sources. Our engine then creates an inventory using machine learning to identify the types of data they very likely hold. For example, if your company uses Stripe, it will almost certainly contain financial information like credit card numbers.

The more sources you connect to Mine, the more comprehensive a picture of data usage the platform can give you. By mixing and matching how you scan various data sources within your company, you can reap the benefits of each method while still getting an unprecedented 90-95% coverage of your data mapping.

Continuous data classification

Typically, data privacy platforms perform a full scan of your systems, which takes weeks, if not months, to complete. Because this doesn’t capture a data map in real time and consumes considerable resources, it is a largely inefficient way of scanning data.

Unlike competitors, Mine has found a way around this inefficiency by providing three approaches you can use to find and classify data depending on your organization’s current needs and objectives:

1) Full data scan

2) Smart data sampling

3) Context-based analysis

MinePrivacyOps Data Classification: An option for each strategy

Full data scan

If you had infinite time and resources, full data scans would be the best option. Considering that an average company uses hundreds of different systems and SaaS platforms, you can only imagine how much time and effort it would take to scan all of them at once.

A full scan is thorough, going through all your data sources to build a nearly complete picture of your organization's data privacy landscape. However, as noted above, it takes months to do, is not guaranteed to capture shadow data sources, and will be an all-consuming effort for your development and DevOps teams to cover and include all existing data sources and terabytes of stored data.

A full scan is best reserved for individual, high-use SaaS systems.

Context-based analysis

Mine’s context-based analysis uses AI that accurately predicts what type of data you likely store and process through any data source. This approach balances a full data scan and smart data sampling, often providing upwards of 80% coverage by itself, all while taking minutes to complete.

Our machine learning and datasets identify common patterns of SaaS services and the data types within them. This analysis is based on a system’s capabilities and features, legal and technical documents, and custom adjustments to customer needs. Payment software will contain financial information, HR software will contain sensitive employee data, etc.
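As a rough illustration of the idea described above (and not Mine's actual model), context-based analysis can be pictured as inferring likely data types from what is known about a system, without reading any of its records. The sketch below assumes a hypothetical hand-written catalog; the category names, the data type labels, and the `ExampleHR` system are all invented for the example.

```python
from typing import Dict, List, Set

# Hypothetical knowledge base: which data types each kind of system tends to hold.
LIKELY_DATA_TYPES: Dict[str, Set[str]] = {
    "payments": {"credit card number", "billing address"},
    "hr": {"employee name", "salary", "national ID"},
}

# Hypothetical catalog assigning known systems to categories.
SYSTEM_CATEGORIES: Dict[str, List[str]] = {
    "Stripe": ["payments"],
    "ExampleHR": ["hr"],  # made-up system name
}


def predict_data_types(system: str) -> Set[str]:
    """Return the data types a system likely holds, based only on its category."""
    predicted: Set[str] = set()
    for category in SYSTEM_CATEGORIES.get(system, []):
        predicted |= LIKELY_DATA_TYPES.get(category, set())
    return predicted


print(sorted(predict_data_types("Stripe")))
```

Because nothing is scanned, a lookup like this completes almost instantly, which is the trade-off the section describes: speed and broad coverage in exchange for predictions rather than confirmed findings.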

Smart data sampling

Smart data sampling is a revolutionary approach we’ve taken to get the most out of our privacy scanning engines and machine learning at minimal expense to companies. Instead of incurring the inefficiencies of a full scan, smart data sampling applies the Pareto principle, sampling a minimally sufficient amount of data to identify the types of sensitive information present.

This approach is source-agnostic and draws on different analytics data. It allows your privacy team to quickly identify the most sensitive systems and implement the right controls to address data privacy challenges.

With our machine learning, we build context around regulated data to provide companies with a full-fledged picture of all systems containing data. We have made it possible to identify different data pieces even with similar values and adequately classify them when evaluating the context around these values, which radically improves data identification and segmentation of large datasets.
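To make the two ideas above more concrete, here is a minimal sketch of sampling plus context-aware classification. It is an assumption-laden illustration, not Mine's engine: it uses plain uniform random sampling rather than any Pareto-based selection, and the `DETECTORS` table, its regular expressions, and its column-name hints are hypothetical. The point it shows is that two values with the same shape (16 digits) can be classified differently once the surrounding context, here the column name, is taken into account.

```python
import random
import re

# Hypothetical detectors: a value pattern plus column-name hints that supply context.
# An empty hint tuple means the pattern alone is enough.
DETECTORS = {
    "email address": (re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"), ()),
    "credit card number": (re.compile(r"^\d{16}$"), ("card", "pan")),
}


def classify_sample(rows, sample_size=100, seed=0):
    """Classify columns from a random sample of rows instead of scanning every row."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    found = {}
    for row in sample:
        for column, value in row.items():
            for label, (pattern, hints) in DETECTORS.items():
                if not pattern.match(str(value)):
                    continue
                # Context check: when hints exist, the column name must contain one.
                if hints and not any(hint in column.lower() for hint in hints):
                    continue
                found.setdefault(column, set()).add(label)
    return found


rows = [
    {"card_number": "4111111111111111", "order_id": "1234567890123456", "contact": "a@b.com"}
] * 1000
print(classify_sample(rows, sample_size=50))
```

In this sketch only 50 of the 1,000 rows are inspected, and `order_id` is not flagged even though its values look like card numbers, because its column name carries no card-related hint.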

Because you can choose which of the three methods you use to scan any data source, your organization can tailor its privacy program to make the most of its resources and time without diluting its data mapping and coverage. You can reach near 100% coverage, the same as with a full scan, while spending a fraction of the time and cost full scans typically take.

Exploring your inventory

Mine’s Data Classification is continuous, meaning your privacy team will always have an updated data inventory, including the different systems and SaaS services your company uses. To provide the breakdown privacy professionals need, our classification also includes the number of employees with access to a system, applied frameworks (PII, PCI, GDPR, and others), data types handled by the system, and its risk factor.

People-focused, actionable insights

Once you have viewed the entire inventory produced by data source discovery and continuous data classification, the MineOS platform lets you assess systems by their business impact on your company, broken down by the following criteria:

· Usage vs. employees

· Employees vs. data sensitivity

· Employees vs. cyber posture

· Cyber posture vs. data sensitivity

To give you actionable insights, Mine displays a color-coded matrix with your systems distributed according to risk and usage so that privacy teams can identify the most sensitive systems and the overall next steps depending on their current needs and priorities.
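One way to picture such a matrix is as a simple bucketing of systems by two scores. The sketch below is purely illustrative: the 0-to-1 scores, the 0.5 threshold, the color names, and the example systems are assumptions made for the example, not how MineOS computes or displays its matrix.

```python
from collections import defaultdict


def cell_color(usage: float, risk: float, threshold: float = 0.5) -> str:
    """Map hypothetical 0-1 usage and risk scores to a matrix color."""
    high_usage = usage >= threshold
    high_risk = risk >= threshold
    if high_usage and high_risk:
        return "red"      # widely used and risky: look here first
    if high_usage or high_risk:
        return "amber"    # one of the two dimensions is elevated
    return "green"        # low usage and low risk


def build_matrix(systems):
    """Group system names by color; `systems` maps name -> (usage, risk)."""
    matrix = defaultdict(list)
    for name, (usage, risk) in systems.items():
        matrix[cell_color(usage, risk)].append(name)
    return dict(matrix)


# Made-up systems and scores for illustration only.
print(build_matrix({"CRM": (0.9, 0.8), "Wiki": (0.7, 0.2), "Legacy FTP": (0.1, 0.1)}))
```

The same bucketing could be applied to any of the pairings listed above, such as employees vs. data sensitivity, by swapping in the relevant scores.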

In addition to business impact assessments, Mine also provides risk evaluations. You can easily identify any potentially vulnerable systems and cut them off or secure access to them.

Making Mine Yours

Most data privacy solutions require complex integrations across an entire organization’s infrastructure, resulting in extensive spending and resource consumption. Recognizing that predicament, we have strived to make privacy professionals’ lives easier by quickly putting the best data privacy tools in their hands.

With MineOS's user-friendly no-code solution, DPOs and legal teams can kick off their data privacy programs and handle data privacy-related tasks without needing frequent assistance from their organization’s IT and engineering teams.

If you can tell how passionate we are about bringing data privacy into the new age and want to see how we’re enabling companies to advance data privacy rights, book a demo and let us show you directly how comprehensive and user-friendly the platform is.
