APIs in a COVID-19 World

Application programming interfaces (APIs) are growing exponentially in the COVID-19 world. In a rare collaboration, Apple and Google are working together to deliver their exposure notification API to developers building apps for public health. The API is expected to be released in mid-May and tested over the following several weeks. The initiative uses Bluetooth to track exposures to confirmed COVID-19 cases, and smartphones can indicate whether their owners may have been in contact with infected people based on information stored on the phones themselves. Specific information about patients’ identities or locations is not shared with Google or Apple.

Radar is another company currently working with retailers to connect their apps to location services via APIs. With COVID-19, many restaurants and retailers need curbside pickup, and insight into customers’ locations helps them hand off orders on time and keep up with changing customer expectations.

APIs are crucial today to offering innovative products and services because they enable businesses to access new databases and technologies. For example, Radar has three key APIs. The first identifies the distance between the store and the shopper to estimate the time of arrival. The second connects to search, letting the consumer open the store’s website and see all relevant nearby locations so trips can be combined. The third handles reverse geocoding, converting latitude and longitude coordinates into a street address. It can be helpful to think of these mobile app solutions as a series of screens that help customers find the most relevant pages and help stores display content that improves the customer experience.
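
To make the pattern concrete, here is a minimal sketch of how an app might call a distance/ETA endpoint. The URL, parameters, and response fields are hypothetical placeholders to illustrate the general request/response flow, not Radar’s actual API.

```python
# Hypothetical distance/ETA call; endpoint, parameters, and response fields are placeholders.
import requests

def estimate_arrival(api_key, store_coords, shopper_coords):
    """Ask a (hypothetical) distance endpoint for travel time between two points."""
    response = requests.get(
        "https://api.example.com/v1/route/distance",   # placeholder URL
        params={
            "origin": f"{shopper_coords[0]},{shopper_coords[1]}",
            "destination": f"{store_coords[0]},{store_coords[1]}",
            "mode": "car",
        },
        headers={"Authorization": api_key},
        timeout=10,
    )
    response.raise_for_status()
    return response.json().get("eta_minutes")  # hypothetical response field

if __name__ == "__main__":
    eta = estimate_arrival("demo-key", (39.7392, -104.9903), (39.7508, -104.9966))
    print(f"Estimated arrival in {eta} minutes")
```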

While people are used to computer screens, the magic lies in the interface hidden from the user, which has become much more open and standardized over the years. An API provides a standard interface through which software programs can communicate, share messages, and manage shared memory. Unlike a web form, where processing a user registration might involve multiple transactions, an API call will often include all the information needed to complete a transaction. For example, if you want to know how many orders a particular customer has placed, you could go through your business’s records, slowly scanning the “customer name” field and printing each matching record. But if the records live in a central database, you can write a program that queries that database and returns every instance of the customer’s name without a manual scan, which takes less time and is more accurate.
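
Here is a small sketch of that “query instead of scan” idea using Python’s built-in sqlite3 module; the table and column names are made up for illustration.

```python
# Query a central database instead of scanning records by hand.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_name TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_name, total) VALUES (?, ?)",
    [("Acme Corp", 120.00), ("Jane Doe", 42.50), ("Acme Corp", 310.25)],
)

# One query answers the question directly.
count, total = conn.execute(
    "SELECT COUNT(*), COALESCE(SUM(total), 0) FROM orders WHERE customer_name = ?",
    ("Acme Corp",),
).fetchone()
print(f"Acme Corp placed {count} orders totaling ${total:.2f}")
```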

While some think of APIs as error-free, this is not true. APIs are not like a USB port that gives you access to everything in another program. Some have compared using an API to getting help from a help desk in a foreign country: the API provides data that programmers have made available to outside users, and you have to know the language to ask the right questions before you can do anything with that data. When programmers make data available, they expose endpoints, the parts of the language they used to build the program, so that other programmers can retrieve the data through URLs or through programs that build those URLs.
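
As a concrete example of an exposed endpoint, GitHub’s public REST API answers a repository URL with JSON; the short sketch below shows the ask-the-right-question pattern.

```python
# Consuming a public endpoint: request a URL, get JSON back, read documented fields.
import requests

resp = requests.get("https://api.github.com/repos/apache/spark", timeout=10)
resp.raise_for_status()          # surface HTTP errors instead of guessing
repo = resp.json()               # the endpoint answers in JSON

print(repo["full_name"])          # e.g. "apache/spark"
print(repo["open_issues_count"])  # one of the documented response fields
```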

Monitoring API performance is important in this COVID-19 world to make sure that APIs stay functional and accessible and do not suffer from downtime or excessive loading times. Also, when iOS releases a new update, there are changes an API has to get ahead of to keep the customer experience consistent.
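
A minimal sketch of what such monitoring can look like: probe a health-check URL and record status and latency. The URL is a placeholder, and real monitoring tools add alerting, retries, and dashboards on top of this.

```python
# Availability-and-latency probe; the two numbers worth watching are status and response time.
import time
import requests

def probe(url):
    start = time.monotonic()
    try:
        resp = requests.get(url, timeout=5)
        elapsed_ms = (time.monotonic() - start) * 1000
        return resp.status_code, round(elapsed_ms, 1)
    except requests.RequestException as exc:
        return None, str(exc)   # treat any network failure as downtime

status, latency = probe("https://api.example.com/health")   # placeholder URL
print(f"status={status} latency_ms={latency}")
```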

APIs are useful for pulling specific information from another program, and developers can help build programs to display that data in an application. Consultants can help integrate existing APIs and create custom connections. However, having a basic understanding of how APIs work will be key to great integrations in the post-pandemic world.

#COVID-19 #APIs #API #DigitalTransformation

Cybersecurity 101: Ten KPIs to Monitor

It’s no surprise that attackers are using more sophisticated techniques to target everything from personal devices to businesses of all sizes. Deloitte estimated that a low-end cyberattack costing $34 a month could return $25,000, while larger attacks costing a few thousand dollars could return as much as $1 million per month. IBM estimates the average cost of a data breach to a business at $3.86 million. To mitigate the harm caused by data breaches, you need to know what to monitor and why. Here are ten cybersecurity monitoring suggestions to protect networks, devices, programs, and information from attack, damage, and unauthorized access.

Security Incidents and Impact

The number of reported incidents should be measured to stay aware of cyberattacks. Not every incident leads to a costly data breach. The first step is to calculate the percentage of incidents that are major versus minor. Then the average cost of an incident can be determined. Once the numbers show how security incidents and data breaches are impacting the company, the big picture of annual loss expectancy can be discussed.

Annual Loss Expectancy = (Number of Incidents per Year) x (Potential Loss per Incident)

The annual loss expectancy can change, though: as data breaches and the costs of cleaning them up rise, the calculation needs to be adjusted. Third-party tools can help detect and monitor all applications to reveal trends in incidents.
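
As a rough illustration of the formula above, here is a tiny calculation with made-up numbers; real figures would come from your own incident history and cost estimates.

```python
# Annual Loss Expectancy = incidents per year x potential loss per incident (illustrative values).
incidents_per_year = 12                  # assumed: minor and major incidents combined
potential_loss_per_incident = 25_000     # assumed average cost per incident, in dollars

annual_loss_expectancy = incidents_per_year * potential_loss_per_incident
print(f"Annual loss expectancy: ${annual_loss_expectancy:,}")   # $300,000
```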

Number of Security Incidents

The number of both small and major security incidents is important to measure to stay informed about exploitation and to set key performance indicator priorities. Knowing the number of security incidents makes it possible to focus on the incidents with the most significant financial impact on the company. Some hackers target areas where they can cause catastrophic loss: experts estimate the WannaCry attack created roughly $4 billion in damages, and hospitals were shut down during recovery. While smaller incidents may not be catastrophic, the alert team should detect and disarm these threats before damage is done. Minor incidents typically include things like suspicious emails and server activity from hackers who may try to take down your website.

Time to Resolve an Incident

Time to resolve an incident is essential to measure in order to learn how the cyber team is performing and to gauge business impact. Time is money, and cybersecurity is no exception. A log should be kept documenting the time from when the breach was first noticed until the final report. Third-party vendor tools can support the logging and interpretation of this time. Downtime can hurt a business, from lost sales to lost customer confidence. Server logs and hosting providers can help identify data and traffic issues, which provide insight into how much damage the hack may have caused. Both the mean-time-to-identify and the mean-time-to-respond should be measured, since poor performance in these areas contributes to breach costs. For US companies in 2017, the mean-time-to-identify was 208 days, and the mean-time-to-contain was 52 days.
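
A small sketch of how these mean-time metrics could be computed from an incident log; the incident records and date fields below are invented for illustration.

```python
# Compute mean-time-to-identify and mean-time-to-resolve from a simple incident log.
from datetime import datetime
from statistics import mean

incidents = [
    {"occurred": "2020-01-03", "identified": "2020-02-20", "resolved": "2020-03-05"},
    {"occurred": "2020-02-10", "identified": "2020-03-01", "resolved": "2020-03-20"},
]

def days_between(start, end):
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).days

mtti = mean(days_between(i["occurred"], i["identified"]) for i in incidents)
mttr = mean(days_between(i["identified"], i["resolved"]) for i in incidents)
print(f"Mean time to identify: {mtti:.0f} days, mean time to resolve: {mttr:.0f} days")
```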

Number of Systems with Known Vulnerabilities

Knowing the number of assets with known vulnerabilities helps determine the risk the business could incur. While managing updates and patches can sometimes be complicated, it is vital to avoid loopholes that hackers could use. A vulnerability scan covering all assets should be performed, and its results indicate what can be done to improve the security of the business.

Invalid Log-In Attempts

It is important to check system logs from time to time to see if anyone has tried to access your systems. It is good to have a system that monitors every login attempt and tracks whether it was successful or not, and to monitor failed and locked-out logon attempts across the entire domain. Software like ADAudit Plus, Netwrix Account Lockout Examiner, and Security Onion can help with these goals. Software like UserLock offers granular access restrictions by workstation, device, IP address, and IP range. It can also limit concurrent sessions, enforce user logon times, monitor user access in real time, create alerts and enable rapid response to inappropriate login behavior, remotely disconnect sessions left open, and report and audit all access events.
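
As a small illustration of the kind of monitoring these tools automate, here is a sketch that counts failed logon attempts per account from a plain-text log; the log format is invented, and real deployments would parse Windows event logs or syslog instead.

```python
# Count failed logon attempts per user from a (made-up) text log.
from collections import Counter

sample_log = """\
2020-05-01 08:14:02 LOGIN FAILURE user=jsmith src=10.0.0.12
2020-05-01 08:14:09 LOGIN FAILURE user=jsmith src=10.0.0.12
2020-05-01 08:15:44 LOGIN SUCCESS user=jsmith src=10.0.0.12
2020-05-01 09:02:11 LOGIN FAILURE user=admin src=203.0.113.9
"""

failures = Counter(
    line.split("user=")[1].split()[0]
    for line in sample_log.splitlines()
    if "LOGIN FAILURE" in line
)

for user, count in failures.most_common():
    print(f"{user}: {count} failed attempts")
```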

Number of Users with “Super User” Access

Employees should have only the level of access to company resources that is necessary for their work. Identifying the access levels of network users allows them to be adjusted as needed, revoking super-user access from anyone who does not need it to perform their job.

Number of Communication Ports Open During a Period of Time

Generally speaking, it is standard to avoid allowing inbound traffic for NetBIOS. Businesses should also watch outbound SSL, since a session that stays active for an extended time could be an SSL VPN tunnel allowing bi-directional traffic. Any common ports for protocols that enable remote sessions should be monitored for how long they remain open.
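
For illustration, here is a minimal connectivity probe of a few common remote-access ports; it assumes a host you are authorized to test and is no substitute for firewall logs or a proper scanner.

```python
# Quick check of whether common remote-access ports answer on a host.
import socket

host = "127.0.0.1"                     # assumed: a host you are authorized to test
ports_to_watch = {22: "SSH", 139: "NetBIOS", 445: "SMB", 3389: "RDP"}

for port, name in ports_to_watch.items():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        is_open = sock.connect_ex((host, port)) == 0
        print(f"{name} ({port}): {'open' if is_open else 'closed or filtered'}")
```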

Frequency of Access to Critical Enterprise Systems by Third Parties

Managers may grant third parties access for particular activities. It is critical to monitor whether that access is revoked at the end of the provided service. If this is not measured, there is a chance the third party returns to extract data or carry out other attacks. And if the third party’s network is hacked, your network is exposed to the same threat.

Percentage of Business Partners with Effective Cybersecurity Policies

Companies that provide services to your business cannot be overlooked. Giving outsourced companies access to your environments can pose a risk if they do not have effective cybersecurity policies in place. Your security practice is only as strong as the third parties connected to your system.

Meeting Regulatory Requirements

It is essential to measure this because there are national regulatory requirements related to cybersecurity incidents. Being unaware of current regulations and requirements does not relieve the firm of liability and can result in fines as well as reputation costs. States like New York, for example, require financial service companies to hire a CISO responsible for risk mitigation. Data breaches also carry time-bound notification requirements for businesses.

Key performance indicators (KPIs) can help a company keep objectives at the forefront of decision making. This overview provided ten suggested KPIs that can help mitigate risk by measuring your performance against your cybersecurity goals.

#Cybersecurity #Monitoring  #KPIs

About the Author

Shannon Block is an entrepreneur, mother and proud member of the global community. Her educational background includes a B.S. in Physics and B.S. in Applied Mathematics from George Washington University, M.S. in Physics from Tufts University and she is currently completing her Doctorate in Computer Science. She has been the CEO of both for-profit and non-profit organizations. Currently as Executive Director of Skillful Colorado, Shannon and her team are working to bring a future of skills to the future of work. With more than a decade of leadership experience, Shannon is a pragmatic and collaborative leader, adept at bringing people together to solve complex problems. She approaches issues holistically, helps her team think strategically about solutions and fosters a strong network of partners with a shared interest in finding scalable solutions. Prior to Skillful, Shannon served as CEO of the Denver Zoo, Rocky Mountain Cancer Centers, and World Forward Foundation. She is deeply engaged in the Colorado community and has served on multiple boards including the International Women’s Forum, the Regional Executive Committee of the Young Presidents’ Organization, Children’s Hospital Quality and Safety Board, Women’s Forum of Colorado, and the Colorado-based Presbyterian/St. Luke’s Community Advisory Council. Follow her on Twitter @ShannonBlock or connect with her on LinkedIn.

Visit www.ShannonBlock.org for more on technology tools and trends.

Security Assessment versus Security Audit?

If you are a member of the board and the topic of a cybersecurity audit comes up, it is important to define what it is and what it is not. Audits are often used to evaluate the effectiveness of policies. While security audits and assessments are sometimes referred to interchangeably, they are not the same thing.

An assessment is an evaluation that seeks information to better understand a specific situation (people, process, and technology) and make informed decisions as it relates to that specific situation. An audit, on the other hand, typically involves verifying the system against a holistic standard that results in a pass or fail outcome.

An audit often contains different assessments, with a combination of conceptual and technical reviews. A security audit might include conducting physical, access control, and vulnerability assessments. But, a security audit will also likely include evaluating design controls and processes, standard operating procedures, disaster recovery plans, as well as several other components.

Audits can be costly and, depending on the scope, may only provide broad insights into an organization’s cyber health. For example, cyber health could be defined in terms of the presence of controls. But what if what is really needed is to evaluate the effectiveness of those controls in mitigating risk? If the effectiveness of a control is what matters, then asking more questions around specific assessments might be warranted. For example, the auditor may determine that the organization checks the box because a firewall is in place on company devices; but if that firewall is not properly configured, it might not work at all.

A formal audit is typically performed by an external third-party vendor that has no conflict of interest. It is not uncommon for larger companies to have internal audit teams running assessments throughout the year to protect the company, as well as to better prepare for the external audit. A security audit is often a more systematic evaluation of the organization’s information system against an established set of criteria. Frameworks like the ISO 27000 series provide important structure and detail that have influenced both assessments and audits.

When trying to determine a company’s cybersecurity posture, there are a variety of different assurance actions that can be taken. Cybersecurity audits and assessments are helpful tools in assuring that policies have been applied and that there are enforceable controls in place to ensure the correct application of policy across the organization.

#Cybersecurity #VendorManagement #CyberAudits #CyberAssessments

Standards and Procedures: What Are They and Why Are They Important?

Standards are documents that spell out specifications and designs that support the reliability of a company’s materials, products, methods, and services. Standards can help ensure functionality and compatibility and facilitate interoperability. Procedures are the guidelines that help people do the job; they become a plan of action, like a roadmap for the team. Procedures help implement policy, for example an onboarding policy with concrete steps and recommended activities. They can also keep managers from micromanaging: with fixed guidelines in place, the manager does not have to worry as much about the working process and can focus on higher-level activities.

Standard operating procedures can be helpful as they are instructions that help guide employee work processes. They are based on input from the employees who do the job, and when they are followed, it is possible to produce a service that is consistent and predictable. Several benefits include readiness for future growth, simplifying performance management, controlling the quality and consistency of the service, protecting the organization from knowledge loss, and saving on training costs.

A policy sets the high-level “why” of what needs to be done, formally establishing requirements to guide decisions. It is typically a statement of expectation, enforced by standards and further implemented by procedures. Policies and procedures take into account external points of reference such as industry benchmarks and other requirements set by the industry.

Policy statements include the expected behavior, actions, and outcomes that apply to the departments and people affected by the policy. A policy typically states the reason it exists, how it will be enforced, and who is responsible for answering questions about it.

As you can see, there is a difference between standards and procedures. Each has its place and fills a specific need.

#Standards #Procedures

Scalable and Intelligent Security Analytics: Splunk, Devo, IBM and McAfee

Organizations of any size can be victims of a cyberattack. Small and medium-sized organizations can be tempting targets because they may put fewer obstacles in an attacker’s way. Large employers, on the other hand, face challenges in thinking strategically about security governance structures and the dimensions of monitoring. Security analytics tools can help address common problems, but I have found that solutions vary depending on what you are trying to do. Many companies are subject to industry regulations such as the Health Insurance Portability and Accountability Act, the Payment Card Industry Data Security Standard, and the Sarbanes-Oxley Act, which create compliance requirements. Security analytics tools can help address compliance requirements and also mitigate the risk of data breaches and security attacks.

Below are some of the pros and cons of tools like Splunk, Devo, IBM, and McAfee, as well as their primary functions like anomaly detection capability, event correlation capability, and real-time analytics capability. Also, given the explosion of cloud computing, I considered each tool relative to the cloud computing environment. The security issues or targeted applications that the tool seeks to solve were also explored, as well as the critical design considerations for the scalability of each tool.

Splunk offers a security intelligence platform that supports many security data sources, including network security tools, endpoint solutions, malware and payload analysis, network data, asset management systems, and threat intelligence. Splunk has a security operations suite that handles real-time security monitoring, advanced threat detection, fraud analysis, and incident management. Its analytics-driven SIEM solution is focused on visibility, context, and efficiency, built on a modern and flexible big data platform and using machine learning to perform behavioral analytics. Splunk’s enterprise security solution is customizable with drill-down capability. Also, the Splunkbase app store has over 600 apps that can be leveraged with Splunk’s security products.

In terms of advantages, Splunk provides holistic solutions that can grow with users over time. Splunk also offers a broad array of partner integration services in addition to many applications. In terms of disadvantages, Gartner has expressed concerns about the licensing model and the high costs associated with implementation. Businesses that want an on-premises appliance also have to engage with a third-party provider. Another drawback is that Splunk’s advanced threat detection solutions were not ranked as high as other top products in the marketplace.

Devo offers a next-gen SIEM solution with a central hub for data and processes within the security operations center. Its SIEM is cloud-native, with flexible deployment models that help companies streamline security operations as they shift to the cloud. Devo’s capability is delivered through a scalable, extensible data analytics platform that can handle petabyte-scale data growth and real-time analytics. The solution offers holistic insight into a growing attack surface, helping organizations manage an overwhelming volume of security alerts while providing the context needed to prioritize investigations. The main features of the next-gen SIEM platform include:

  • Behavioral analytics – using behavioral analytics as the foundation for detection moves away from rules-based detection to better prioritize high-impact threats, with context to support intervening actions
  • Community collaboration – solutions focus on relationships with peers and providers and the sharing of proprietary intelligence
  • Analytics insight – SIEM can learn from analyst behavior to help automate investigations and enhance decision making with continual learning
  • Orchestration & automation – SIEM enables rapid threat response by integrating automated, manual, and repetitive processes to improve incident response workflows

Devo’s solutions are applied to detecting and hunting high-impact threats in real time, triaging and investigating high-confidence alerts, and increasing signal with rich behavioral analytics, while enhancing speed through actionable insight. The Devo architecture parallelizes the data pipeline, allowing growth without decreased performance.

The advantages of the Devo Data Operations Platform include performance, scalability, accessibility, security, and cost-efficiency in a full-stack, multi-tenant platform. Their solutions offer the ability to find unusual behavior in real-time. A drawback of their solution is the lack of a comprehensive free trial and limited reviews as it relates to their newer products.

IBM’s silent security strategy includes the QRadar platform, which can be deployed as a virtual appliance, as SaaS on infrastructure as a service, or as a traditional appliance. IBM also provides a hybrid option where the SaaS solution is hosted in the IBM Cloud and includes remote monitoring from its managed security service operations centers. The platform offers a variety of user and entity behavior analytics functionality based on machine learning.

The silent security model allows organizations to silently understand which people have access to data, detect insider threats with behavioral analytics, enforce the principle of least privilege, and protect data with multi-factor authentication. The solution focuses on a seamless user experience with single sign-on and seamless authentication, and it leverages design thinking techniques to create solutions targeted to the user. It also helps companies get ahead of compliance and regulation issues, delegate and simplify access recertification for lines of business, map roles to business activities, manage user data for GDPR, and secure transactions for PSD2. The Silent Security product line helps companies secure the business, enable digital transformation, prove compliance, and protect business assets. An extension of the IBM QRadar Security Intelligence Platform, QRadar Behavior Analytics runs machine learning algorithms to help detect threats. It includes a dashboard that flags risky users by name, based on QRadar incidents in which their activities differ from those of their peers or involve invalid sequences of operations.

IBM Security Guardium, a complementary offering, provides end-to-end data security and compliance capabilities, including around-the-clock data activity monitoring, data protection design, and the configuration and customization of data policy settings. IBM also has a Security Secret Server used for protecting and auditing privileged account access and authentication secrets across the business. IBM’s Cloud Identity and Security Access Manager can assess high-risk activities while providing robust authentication features, and IBM Managed Identity Services help with handling user access and diagnosing root causes in IAM programs. IBM’s security solutions work across the security lifecycle for both on-site and cloud applications.

In terms of advantages, IBM’s QRadar is a fit for medium and large businesses looking for core SIEM functionality or for a unified platform to manage several security solutions. In terms of disadvantages, according to Gartner, some IBM clients have turned to third-party solutions instead of IBM’s own. QRadar’s UBA functionality can also lag behind other vendors. Another drawback is that the IBM Resilient incident response tool does not have native integration with the QRadar platform. In addition, automation is only available through IBM’s Incident Response Platform, and some threat-hunting capabilities are only available at premium pricing.

McAfee offers integrated tools for a variety of security needs. The McAfee Enterprise Security Manager provides a security framework that includes monitoring and threat defense features. Their solutions are built to streamline operations and synchronize device and cloud data loss prevention, and they can be used with any cloud service. The McAfee MVISION Cloud service protects data while stopping threats in the cloud across SaaS, PaaS, and IaaS from a single, cloud-native enforcement point.

Main features include helping organizations meet security and compliance requirements when moving information technology environments to the cloud, while extending data loss prevention, threat protection, and application security across public, hybrid, and private cloud environments or software-defined data centers. Another key feature is a review of security responsibilities related to protecting user access, data, and network traffic. The McAfee MVISION Cloud solution helps with enforcing data loss prevention policies in the cloud, preventing unauthorized sharing of sensitive data, blocking sync downloads, detecting compromised situations, encrypting cloud data, and auditing for misconfiguration. The Cloud Security Maturity Dashboard includes a Cloud Security Report, Cloud Security Maturity Scores and Quadrant, and Cloud Security Recommendations.

In terms of advantages, McAfee provides solid central management, the GUI is user-friendly, it supports both macOS and Linux operating systems, it has a large user community and deployment base, and administration is fairly straightforward. McAfee’s solution has also been recognized for successful machine learning algorithms in preventing attacks. In terms of disadvantages, McAfee can sometimes require additional software, updates come from third-party applications, and the solution consumes CPU and memory. Some customers have commented that scans can hang on the screen, affecting other operations, and there is some noted concern from customers about costs relative to requirements.

Overall, security analytics tools are essential for gathering, filtering, and integrating diverse security event data to give a holistic view of the security of a company’s infrastructure. The security analytics market is changing fast with vendor mergers, the addition of new capabilities, and the deployment of solutions in the cloud. While security analytics tools have a wide variety of capabilities, hopefully this post provided some initial insight into a few of the popular products. There is no single taxonomy for security analytics, but most requirements include basic security analytics, significant enterprise use cases, a focus on advanced persistent threats and forensics, and a variety of supporting security tools and services.

Streaming Data Solutions: Flink versus Spark

While real-time stream processing has been around for a while, businesses are now trying to process larger volumes of streaming data more quickly. Streaming data is everywhere, from Twitter feeds and sensors to stock ticker prices and weather readings. It arrives continuously, which poses its own processing challenges.

Flink was initially written in Java and Scala and exposes many Application Programming Interfaces (APIs), including the DataStream API. Flink was developed by a German University and became an incubator project for Apache in 2014.
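
For a feel of the DataStream API, here is a minimal PyFlink sketch, assuming the apache-flink package is installed; exact APIs vary somewhat across Flink versions, so treat this as a sketch rather than a reference.

```python
# Minimal PyFlink DataStream sketch: map over a small in-memory collection and print it.
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
prices = env.from_collection([("ACME", 101.2), ("ACME", 99.8), ("INIT", 45.0)])

# Flag any tick below 100 as a simple per-event transformation.
flagged = prices.map(lambda t: (t[0], t[1], "LOW" if t[1] < 100 else "OK"))
flagged.print()

env.execute("price_flagging_sketch")
```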

Similar, but different, Spark Streaming is one of the most used libraries in Apache Spark. Spark developers create streaming applications using the DataFrame or Dataset APIs, which are available in programming languages like Java, Python, and R. The product is essentially an extension of the core Spark API.
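
For comparison, here is a minimal Structured Streaming sketch in PySpark, assuming pyspark is installed; it uses the built-in rate source so no external system is needed.

```python
# Minimal Structured Streaming sketch: count rows per 10-second window from the "rate" source.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window

spark = SparkSession.builder.appName("streaming_sketch").getOrCreate()

stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
counts = stream.groupBy(window("timestamp", "10 seconds")).count()

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination(30)   # run the demo for ~30 seconds
query.stop()
spark.stop()
```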

Similarities

Both Flink and Spark are big data systems that are fault-tolerant and built to scale. Both are in-memory platforms that can write data to permanent storage, but the goal is to keep working data in memory for current usage. Both products let programmers use MapReduce-style functions and apply machine learning algorithms to streaming data; that is, both Flink and Spark handle machine learning well when processing large training and testing datasets across a distributed architecture. Both technologies can also work with Kafka (the streaming platform originally developed at LinkedIn) as well as Storm topologies.

Differences

Flink was built as a streaming product from the start, whereas Spark added streaming onto an existing service line. Spark was initially built around static data, while Flink can handle batch operations by treating them as bounded streams. With Spark, stream data was originally divided into micro-batches processed in a continuous loop, meaning that for each batch the file needs to be opened, processed, and then closed. In 2018, with Spark 2.3, Spark began to move away from this “micro-batch” approach. In contrast, Flink has for some time broken streaming data into finite sets at checkpoints, which can be an advantage in terms of speed when running algorithms.

Flink’s Performance

Flink can be tuned for optimal performance; both code logic and configuration matter. For example, the choice between event time and processing time has real performance implications.

Flink distinguishes between “processing time,” the clock time observed at each machine in the cluster, “ingestion time,” assigned when an event enters the cluster at its entry point, and “event time,” generated at the source when the event occurs. Several authors recommend using event time because it is constant, which means operations can generate deterministic results regardless of throughput. Processing time, on the other hand, is whatever time the processing machine observes, so operations based on it are not deterministic. In practice, treating event times as real time assumes that the clocks at the event sources are synchronized, which is rarely the case. That challenges the assumption that event time is monotonically increasing: configuring an allowed lateness solves the dropped-events problem, but a large lateness value can still have a significant effect on performance, and without setting any lateness, events can be dropped because of out-of-order timestamps.

Regardless of the approach chosen, the key to efficient processing is making sure the logic can handle events from the same event-time window being split across smaller processing-time windows. Researchers have also shown that some efficiency can be gained by not breaking up complex events, but the tradeoff is that operators have to walk through the dimensions of each event, and the event objects are larger.

Spark’s Performance

In terms of Spark, commonly identified bottlenecks include network and disk I/O. CPU can also be a bottleneck, but it is less common; resolving CPU bottlenecks has been estimated to improve job completion time by only 1-2%. Managing Spark performance is challenging because tasks can create bottlenecks on a variety of resources at different times, and concurrent tasks on a machine may compete for resources. Memory pressure is another common issue, since Spark’s architecture is memory-centric. The causes of these performance setbacks often involve high concurrency, inefficient queries, and incorrect configurations. They can be mitigated with an understanding of both Spark and the data, recognizing that Spark’s default configuration may not be optimal for a given workload.
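
As a hedged example of the configuration point above, here is a sketch that adjusts a few well-known Spark settings; the values are illustrative, not recommendations, since the right settings depend on the cluster and the data.

```python
# Adjusting a few common Spark defaults at session creation (illustrative values only).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("tuning_sketch")
         .config("spark.sql.shuffle.partitions", "64")   # the default of 200 is often too high for small data
         .config("spark.executor.memory", "4g")          # sized to the workload, not a universal value
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .getOrCreate())

print(spark.conf.get("spark.sql.shuffle.partitions"))
spark.stop()
```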

Final Thoughts

The importance of solutions like Flink and Spark lies in allowing businesses to make decisions based on what is happening right now. No one framework solves every problem, so it becomes a question of best fit. Understanding the system and its resources helps in addressing performance bottlenecks. There are many stream processing applications, and it is essential to pick a framework that best meets the business’s needs, as not all products are the same. Flink and Spark are two of the popular open-source stream processing frameworks. Depending on the application, parameters need to be set correctly to meet performance goals, and it is essential to understand the tradeoffs involved to get the best performance relative to business needs.

#Spark #Flink #Performance #StreamingData #BigData

What Should Be Keeping You Up At Night: Where is Big Data Stored?

The digital universe is expected to double in size every two years with machine-generated data experiencing a 50x faster growth rate than traditional business data. Big data has a lifecycle which includes:

  • Raw data
  • Collection
  • Filtering and classification
  • Data analysis
  • Storing
  • Sharing & publishing
  • Security
  • Retrieval, reuse, & discovery

However, viewing security as an isolated stage in the lifecycle can be misleading since the storing, sharing, publishing, retrieval, reuse, and discovery are all involved with security.

An estimated 96% of organizations use cloud computing in one way or another. Cloud computing is a distributed architecture model that can centralize several remote resources on a scalable platform. Since cloud computing offers data storage in mass, it is critical to think about security as it relates to storage. With storage, the primary security risks come from both the location where the data is stored and the volume of the data.

Even if the data is stored in the cloud, it can be challenging to know whether those cloud vendors are storing all of it. When selecting cloud vendors, companies must ask not just about costs but about where their data is stored, because understanding where data lives is fundamental to several other security and privacy issues. The reasons can be as simple as mitigating geographic weather risk: if a hurricane hits Florida where your data is stored, do you know whether it has been backed up to a safe location? How is the data center protected not just from weather events but from intruders and cybercrime? Compliance regulations like the General Data Protection Regulation (GDPR) make the company responsible for the security of its data, even when that data is outsourced to the cloud. Before a company can answer questions like who has access to data and whom the company has sent data to, it must understand where the data is stored. This becomes even more relevant in the event of a breach, which is a question of when, not if.

Data verification is also essential to ensure the data is accurate. Verifying data actually stored in the cloud is not as easy as downloading the entire dataset to check its integrity, because of cost and local bandwidth limits. Over the years, query authentication methods have been developed to address correctness, completeness, and freshness. Basically, a set of data values can be authenticated with a binary hash tree: the data values are verified against the hash value of the tree’s root, and the customer checks authenticity by iteratively computing the hashes up the tree and confirming that the computed root hash matches the authentically published value. Automated processes have helped with data verification efforts, but the algorithms continue to evolve to support faster and larger-scale verification across different data versions.
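
To make the hash-tree idea concrete, here is a toy sketch in Python: the customer keeps the published root hash and later recomputes it over data returned from the cloud. Production schemes verify individual records with authentication paths instead of rebuilding the whole tree; this version rebuilds it only for clarity.

```python
# Toy Merkle-style root computation and verification using the standard library.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])           # duplicate the last node on odd-sized levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

records = [b"order-1001", b"order-1002", b"order-1003", b"order-1004"]
published_root = merkle_root(records)          # kept by the customer at upload time

# Later: check that data returned from the cloud still hashes to the published root.
returned_from_cloud = [b"order-1001", b"order-1002", b"order-1003", b"order-1004"]
print("verified" if merkle_root(returned_from_cloud) == published_root else "tampered")
```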

Security enforcement has been increasing with new global regulations. In July 2019, Marriott was fined roughly GBP 99 million for failure to implement appropriate information security protocols, resulting in a breach of 339 million customer records. In the same month, British Airways was fined GBP 183 million for similar failures that resulted in a breach of 500,000 customer records. While such fines may be a drop in the bucket for larger companies, smaller companies may be gambling by not investing in the needed systems because they do not think their organizations are high profile enough to attract enforcement. As data fines continue to increase, now is the time to re-evaluate cyber defense positioning. Regardless of company size, all organizations can at least start a regular dialogue about understanding where their data is stored.

#Cyber #Security #BigData #DataStorage #Cloud