
The Cost of Dark Data

Published on January 2, 2024 by Andrew Sweeney

The amount of data we are producing is rising at a dramatic rate. Statista predicts that by 2025, global data creation will grow to more than 180 zettabytes. Considering it reached a ‘new high’ in 2020 at 64.2 zettabytes, in just five years that number will have nearly tripled. Data is being generated so rapidly that last year, new metric prefixes were announced for the first time in over 30 years: ronna, for numbers with 27 zeros, and quetta, for those with 30 zeros. And by 2030, the annual amount of data produced is expected to reach a yottabyte!

Much of the data stored by companies is known as dark data, defined by Gartner as the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships, and direct monetizing).

Storing Dark Data Beyond its Use-By Date

We all know how easy it is to store data longer than we need to, and unless we are vigilant, we are likely storing duplicate data on our own devices. So, if this is the case for us, how much data will enterprises store in 5-10 years and what will it cost them?

Dark data could refer to outdated employee or customer information collected for marketing or other purposes, and can include personal information such as names and addresses, financial details, and more. Of course, there are legal requirements around data storage. For example, healthcare data is regulated under the Health Insurance Portability and Accountability Act (HIPAA) and financial data is regulated by Sarbanes-Oxley, and both specify the length of time certain files must be stored. But data, much like foodstuff, has a use-by date, and if it isn’t brought into the light, storing dark data too long can be risky for enterprises.

The Cost of Dark Data

In December last year, Seagate released a report showing how cloud storage as a service is expanding in the UK to address the rising cost of data management. It found that British businesses were spending on average £213,000 per year on data storage and management and were prioritizing this spend, with employee welfare and training suffering as a result. Netflix was previously estimated to be spending $9.6 million per month on AWS, but last year the company was reported to be making attempts to control its cloud costs.

It’s not only vital workforce services that are put at risk if dark data is left unchecked. Regulatory compliance (e.g., ESG) and data security are at risk too. In 2020, Veritas Technologies estimated that 6.4m tons of CO2 would be unnecessarily pumped into the atmosphere over the year as a result of the power needed to store dark data. The company suggested that on average more than half (52%) of all data stored by organizations worldwide was ‘dark’. Given the predictions for data creation growth, that’s going to have a big impact on ESG figures going forward.

As data becomes an even more valuable commodity, companies become a bigger target for cyber criminals. Phishing and other methods of social engineering are increasingly employed to access and exploit data. Any breach can be damaging to brand reputation, and the more data stored, the greater the risk. A report released last year, The Dark Side of Data, revealed that 7 out of 10 enterprise leaders polled said data storage presents more risk than value, but a worryingly low percentage of respondents were aware of those risks.

Regulatory bodies also state when data must be disposed of. Under GDPR, for example, data on European residents cannot be held for longer than it is needed. Where it’s stored and how it’s protected is also governed. Penalties for non-compliance can be high: in 2023 the Irish Data Protection Commission imposed a 1.2 billion euro fine on Meta following the transfer of European users’ data to the US without adequate safeguards.

Fines can be costly, but so can the damage done to a company’s reputation following a data breach, or when it is found to be non-compliant during an investigation or audit. Another issue may arise from the way data is generated and whether it was collected without the knowledge of those to whom it applies, for example, when it is gained through security cameras or Industrial Internet of Things (IIoT) devices.

Tackling dark data

You know it’s out there. You know the risks. So, how do you shed light on dark data and make it actionable?

  • Audit your environment, leveraging your point solutions and systems of record to identify files with low or no usage that may not have been accessed in years, as well as items that have no connections with other endpoints.
  • Identify owners and communicate with them to understand if the data is still relevant or required.
  • Clean up data that is no longer required.
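
As a rough illustration of the audit step, the sketch below walks a directory tree and lists files that have not been accessed or modified within a given number of days. It is a minimal example rather than a full audit tool: the function name `find_stale_files` and the idle threshold are assumptions made for illustration, and file access times are only approximate on many systems (some filesystems update them lazily or not at all), so the results are best treated as candidates to review with the data owners.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_idle_days, now=None):
    """Return (path, idle_days) pairs for files unused for longer than max_idle_days.

    "Unused" here means neither accessed nor modified since the cutoff.
    This is an illustrative heuristic, not a definition taken from the article.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # 86400 seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            last_used = max(st.st_atime, st.st_mtime)
            if last_used < cutoff:
                idle_days = int((now - last_used) // 86400)
                stale.append((str(path), idle_days))
    # Longest-idle files first, so the oldest candidates are reviewed first.
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

Calling `find_stale_files("/path/to/share", 365)` would return the files untouched for more than a year, which can then be matched to owners before anything is deleted.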

Next, define a plan to manage dark data moving forward:

  • Work with teams to understand data that must be stored and for how long, for what purpose, and how it is generated. Ensure regulatory requirements for data storage are observed.
  • Define a plan for tagging data that incorporates the above information, including how it should be disposed of. Tagging will in turn make it easier for teams to search for the data they store and use it for its original purpose.
  • Educate users on the risks of dark data and how tagging should be managed for data storage (as defined above) as well as where it must be stored, explaining regulatory requirements and the risks associated with non-compliance.
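
One way to picture the tagging plan above is as a small record attached to each data set that captures who owns it, why it exists, how it was generated, how long it must be kept, and how it should be disposed of. The sketch below is only an assumption about what such a tag might contain; the class name `DataTag`, its fields, and the retention period in the example are hypothetical and would need to reflect your own regulatory requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DataTag:
    """Hypothetical retention tag for a data set (illustrative fields only)."""
    owner: str            # person or team responsible for the data
    purpose: str          # why the data was collected
    source: str           # how the data is generated
    created: date         # when the data was created or collected
    retention_days: int   # how long it must be stored
    disposal_method: str  # how it should be disposed of

    @property
    def expires_on(self) -> date:
        # The date on which the retention period ends.
        return self.created + timedelta(days=self.retention_days)

    def is_expired(self, today: date) -> bool:
        # True once the retention period has run out.
        return today >= self.expires_on


# Example tag; every value below is made up for illustration.
tag = DataTag(
    owner="marketing",
    purpose="campaign mailing list",
    source="web signup form",
    created=date(2020, 1, 1),
    retention_days=730,
    disposal_method="secure delete",
)
print(tag.expires_on)                    # 2021-12-31
print(tag.is_expired(date(2024, 1, 2)))  # True
```

Keeping the expiry date derived from the creation date and retention period, rather than stored separately, means a change in retention policy only has to be made in one place.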

Of course, this can be a huge undertaking. The audit alone will take time. And, if the data is old, the data owners may have left the company; it will take time to find or assign new owners.

Nearly half of the respondents surveyed for the ‘Dark Side of Data’ report mentioned previously said they didn’t have the right technology to protect their data, and 83% said they’d choose new technology tools over adding more team members. But it’s clear that it needs to be the right technology: a tool that won’t create further complexity or add to your data silos. That describes a digital platform conductor (DPC).

A DPC connects to all your point solutions and systems of record and gets them working together, removing blind spots so you can make faster, better-informed business decisions about dark data. And, leveraging cross-tool automation, you can use a DPC to automate many of the workflows required to clean dark data and continuously manage it.

Using a DPC you can:

  • Easily view data details, including last viewed dates, usage, and owner information, or corresponding business unit details.
  • Trigger automated communications to owners or associated teams to understand if the data is still required and take steps to clear outdated information.
  • Incorporate regulatory and legal requirements for storing data to ensure these are implemented in policies going forward and define a way to tag different data types to maintain compliance.
  • Automate user communications to educate on compliance, storage and how to tag data so that it can be better managed moving forward.
  • Use automated processes in conjunction with tagging to clean up data when it has reached a pre-defined expiration date, with additional triggers to automate approval for this to happen.
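
The last point, expiry-driven clean-up gated by approval, can be sketched generically. The example below does not represent the ReadyWorks API or any specific DPC; the function `plan_cleanup` and the record layout (a dictionary with `id` and `expires_on` keys) are hypothetical, chosen only to show the logic of separating expired data that has been approved for deletion from expired data still awaiting sign-off.

```python
from datetime import date


def plan_cleanup(records, today, approved_ids):
    """Split expired records into those cleared for deletion and those awaiting approval.

    records: iterable of dicts with "id" and "expires_on" (a date) keys.
    approved_ids: set of record ids whose owners have approved deletion.
    """
    to_delete, awaiting_approval = [], []
    for rec in records:
        if rec["expires_on"] > today:
            continue  # still within its retention period; leave untouched
        if rec["id"] in approved_ids:
            to_delete.append(rec["id"])
        else:
            awaiting_approval.append(rec["id"])
    return to_delete, awaiting_approval


# Illustrative data only.
records = [
    {"id": "hr-archive-2015", "expires_on": date(2023, 1, 1)},
    {"id": "campaign-list-2020", "expires_on": date(2023, 6, 1)},
    {"id": "finance-ledger-2022", "expires_on": date(2030, 1, 1)},
]
to_delete, awaiting_approval = plan_cleanup(
    records, today=date(2024, 1, 2), approved_ids={"hr-archive-2015"}
)
```

In this sketch nothing is deleted directly: the function only produces two lists, so the actual removal and the approval requests can be handled by whatever tooling and sign-off process the organization already uses.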

ReadyWorks is a DPC. Book a demo with ReadyWorks to understand how it can help you shed light on dark data and ensure it is deleted before it can create risk for your enterprise.