What Should Be Keeping You Up At Night: Where is Big Data Stored?

The digital universe is expected to double in size every two years, with machine-generated data growing 50 times faster than traditional business data. Big data has a lifecycle that includes:

  • Raw data
  • Collection
  • Filtering and classification
  • Data analysis
  • Storing
  • Sharing & publishing
  • Security
  • Retrieval, reuse, & discovery

However, treating security as an isolated stage in the lifecycle can be misleading, since storing, sharing, publishing, retrieval, reuse, and discovery all have security implications.

An estimated 96% of organizations use cloud computing in one way or another. Cloud computing is a distributed architecture model that can centralize several remote resources on a scalable platform. Because cloud computing offers data storage at scale, it is critical to think about security as it relates to storage. With storage, the primary security risks stem from both the location where the data is stored and the volume of the data.

Even if the data is stored in the cloud, it can be hard to confirm whether a given cloud vendor is actually holding all of it. When selecting cloud vendors, companies must ask not just about costs but also about where their data is stored, because knowing where data is stored is fundamental to several other security and privacy-related issues. The reason could be as simple as mitigating the risk of regional weather events. For example, if a hurricane hits the part of Florida where your data is stored, do you know whether it has been backed up to a safe location? And how is the data in the data center protected, not just from weather events but from intruders and cybercrime? Compliance regulations like the General Data Protection Regulation (GDPR) make the company responsible for the security of its data, even if that data is outsourced to the cloud. Before a company can really answer questions such as who has access to the data and to whom the company has sent it, it must understand where the data is stored. This becomes even more relevant in the event of a breach, which is likely a question of when, not if.

Data verification is also essential to ensure the data is accurate. Verifying data stored in the cloud is not as easy as downloading the entire data set to check that it has been stored with integrity, because cost and local bandwidth make that impractical. Over the years, query authentication methods have been developed to address correctness, completeness, and freshness. In a typical scheme, a set of data values is authenticated by a binary hash tree (often called a Merkle tree): each leaf holds the hash of a data value, each parent holds the hash of its children, and the hash at the root is published through a trusted channel. To verify a value, the customer iteratively recomputes the hashes up the tree and checks that the resulting root hash matches the authentically published value. Automated processes have helped with data verification efforts, but the algorithms continue to evolve to support faster and larger-scale verification across different versions of the data.
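To make the tree-based check above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not any particular vendor's scheme: the function names (build_levels, make_proof, verify), the choice of SHA-256, the "leaf:" and "node:" prefixes, and the rule of pairing an unpaired node with itself are all assumptions made for this example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(values):
    """Return every level of the hash tree, from leaf hashes up to the root."""
    if not values:
        raise ValueError("need at least one value")
    # Assumption: distinct prefixes keep leaf hashes and inner hashes apart.
    level = [sha256(b"leaf:" + v) for v in values]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: an unpaired last node is paired with itself.
            level = level + [level[-1]]
        level = [
            sha256(b"node:" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
        levels.append(level)
    return levels


def make_proof(levels, index):
    """Collect the sibling hashes needed to recompute the root for one leaf."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # unpaired node: its sibling is itself
        # Record the sibling hash and whether our node is the left child.
        proof.append((level[sibling], index % 2 == 0))
        index //= 2
    return proof


def verify(value, proof, trusted_root):
    """Recompute hashes up the tree and compare with the published root."""
    h = sha256(b"leaf:" + value)
    for sibling, node_is_left in proof:
        pair = h + sibling if node_is_left else sibling + h
        h = sha256(b"node:" + pair)
    return h == trusted_root


if __name__ == "__main__":
    records = [b"record-0", b"record-1", b"record-2"]
    levels = build_levels(records)
    root = levels[-1][0]  # the value that would be published authentically
    proof = make_proof(levels, 2)
    print(verify(b"record-2", proof, root))
    print(verify(b"record-2-tampered", proof, root))
```

The point of the design is that the customer only needs the record, a handful of sibling hashes (roughly log2 of the number of records), and the trusted root, rather than a download of the whole data set.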

Security enforcement has been increasing with new global regulations. Companies in this space have already faced enforcement: in July 2019, Marriott was fined EUR 100 million for failure to implement appropriate information security protocols, resulting in a breach of 339 million customer records. In the same month, British Airways was fined EUR 183 million for failure to implement appropriate information security protocols, resulting in a breach of 500,000 customer records. While some of these fines may be a drop in the bucket for larger companies, smaller companies may be gambling on not investing in the needed systems because they do not think their organizations are high profile enough to attract enforcement. However, as data fines continue to increase, now is the time to re-evaluate cyber defense positioning. Regardless of company size, all organizations can at least start a regular dialogue about understanding where their data is stored.

#Cyber #Security #BigData #DataStorage #Cloud