Conservative Concurrency Control vs. Optimistic Concurrency Control


Companies are collecting, storing and analyzing more data than ever before in human history.  It is estimated that 90% of all the data in the world has been created in the past few years.  The data within a database needs to be managed effectively for it to provide optimal value to the company and its customers.  Database Management Systems help with that data management process.

A transaction is just a group of tasks that represents the minimum processing unit, and transaction throughput is just the number of transactions that can be performed in a given period.  When multiple transactions are executed together, problems can arise because of concurrency.

Concurrency control is the process of managing simultaneous execution of transactions in a shared database.  The purpose of concurrency control is to enforce isolation, preserve database consistency and to resolve read-write and write-write conflicts.  Concurrency control is essential to ensure atomicity, isolation, and serializability of concurrent transactions.

If only one transaction is allowed to execute at a time (that is, in serial order), then performance can be poor.  At the highest level, concurrency control is a method for scheduling or controlling the operations of transactions in such a way that they can be executed safely.  For transactions to be executed safely, they must not cause the database to reach an inconsistent state.

For conservative (pessimistic) concurrency control, the pros include that all transactions can be executed correctly, the data remains consistent, and the database is relatively stable and reliable.  The cons include that transactions can be slow, run time can be longer, and throughput can be decreased.

For optimistic concurrency control, the strengths include transactions that are executed efficiently, relatively safe data content and potentially higher throughput.  The cons include the risk of interference between transactions, hidden conflicts that only surface at commit time, and wasted work when transactions must be aborted and restarted.
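To make the trade-off concrete, here is a minimal Python sketch that applies both styles to a single shared record.  The Account class, its version counter and the retry limit are illustrative assumptions, not the internals of any particular database system.

```python
# A minimal sketch contrasting conservative (lock-first) and optimistic
# (validate-at-commit) concurrency control on one shared record.
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.version = 0              # incremented on every committed write
        self.lock = threading.Lock()

def conservative_deposit(account, amount):
    """Conservative/pessimistic: take the lock up front so no other
    transaction can touch the record until we are done."""
    with account.lock:
        account.balance += amount
        account.version += 1

def optimistic_deposit(account, amount, max_retries=5):
    """Optimistic: read without locking, then validate at commit time.
    If another transaction committed in the meantime, abort and retry."""
    for _ in range(max_retries):
        seen_version = account.version            # read phase
        new_balance = account.balance + amount    # compute phase
        with account.lock:                        # brief critical section for validation
            if account.version == seen_version:   # validation phase
                account.balance = new_balance     # write phase
                account.version += 1
                return True
    return False  # gave up after repeated conflicts
```

The conservative version holds the lock for the whole operation, while the optimistic version only checks for conflicts at commit time and retries if another transaction got there first.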

If concurrency is adequately controlled, then it is possible to maximize transaction throughput while avoiding the chance of corrupting the database.

In a world where over 2.5 billion gigabytes of data are generated every day, effectively managing that data is essential to a company's performance.  These data management concepts can help information technology professionals actively implement Database Management Systems.

#Concurrency #DMS #ComputerScience

 

 

The Impact of Distributed Data Management Systems in Healthcare


A distributed database management system (DDBMS) is used when there are large datasets.  It is a centralized application that manages a distributed database as if it were all stored on the same computer.  The logically interrelated databases are used to make the distribution of data transparent to users.  A centralized database system, on the other hand, keeps all data in a single database at a single location.  There are pros and cons to a DDBMS versus a centralized database system as they relate to system architecture, system functions and suitable applications.

Many companies have multiple locations that may benefit from a DDBMS.  For example, consider a company like HCA, with hospitals located across the country.  Each state may have a different database that holds medical records, appointment history, and so on.  The management at each hospital can query its own data, but the corporate office can also perform queries across the country.  Also, as new hospitals are added to the system, those hospitals can be added to the network without disrupting the operations of other sites.
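As a rough illustration of that corporate-level, cross-site query, the sketch below uses Python's built-in sqlite3 module with in-memory databases standing in for per-state hospital databases.  The site names, the appointments schema and the corporate_appointment_counts() function are all invented for illustration; a real DDBMS would handle the fan-out and merging transparently.

```python
import sqlite3

# Each hospital site keeps its own database; here in-memory databases stand in for them.
SITES = {"TX": sqlite3.connect(":memory:"),
         "FL": sqlite3.connect(":memory:"),
         "CO": sqlite3.connect(":memory:")}

# Give every site the same (hypothetical) schema and a little sample data.
for name, conn in SITES.items():
    conn.execute("CREATE TABLE appointments (patient_id INTEGER, visit_date TEXT)")
    conn.executemany("INSERT INTO appointments VALUES (?, ?)",
                     [(i, "2024-05-01") for i in range(3)])

def corporate_appointment_counts():
    """Corporate-office view: fan the same query out to every site and merge the results."""
    sql = "SELECT COUNT(*) FROM appointments"
    return {name: conn.execute(sql).fetchone()[0] for name, conn in SITES.items()}

print(corporate_appointment_counts())   # e.g. {'TX': 3, 'FL': 3, 'CO': 3}
```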

Another benefit of a DDBMS is that users can access data stored at other sites.  For example, consider a hospital system that is trying to understand the number of falls occurring across the system.  In this instance, the Chief Medical Officer may need to query that data often, so it can be placed or replicated at a site near her, potentially enhancing the speed of database access.

Also, if there is a DDBMS failure at one of the hospital sites, for example, that failure does not bring down the entire system.  A DDBMS, unlike a centralized DBMS, can continue to function despite these local failures.

However, a DDBMS is more complex than a centralized DBMS: it hides the distributed nature of the data from the user and allows data replication, which, if unmanaged, can create challenges in reliability and performance.

Another serious risk in the hospital example with a DDBMS is security.  In the world of HIPAA, where data breaches can be very expensive and brand-damaging, it is harder to control security when the approach is not centralized.

Also, with the field of evidence-based medicine advancing (algorithms that determine best-case scenarios for treatment plans given the inputs), a DDBMS may prove more challenging to execute.

Do the pros outweigh the cons? What has been your experience?

#BigData #DataAnalytics #DDBMS

 

Artificial Intelligence & Discrimination

In this rapidly changing digital world, artificial intelligence helps machines take on more complex responsibilities on a regular basis.  In Singapore, what started as a program to prevent terrorism is now applied to immigration policy, the property market and even school curricula.  At a basic level, algorithms collect data about users, and that data can then determine access points to many pieces of our civil society.

One of the challenges that comes with this opportunity is understanding cognitive bias, to make sure that we are not programming machines to mirror discrimination.  There is growing evidence that artificial intelligence applications threaten to discriminate against legally protected groups.  One example came in 2015, when it was discovered that Google's photo application, which applies automatic labels to pictures, was classifying images of black people as gorillas.  These systems are fed specific images, and how that programming happens matters if our world is going to be based on those selections.

In theory, society should be supported by the advancements of artificial intelligence, not live in a world where a history of discrimination is mirrored.  For example, what does this mean for robots that are armed to replace police work, or for private individuals who want security robots?  Computer scientists at Carnegie Mellon University found that women are less likely than men to be shown ads on Google for high-paying jobs.  Combine examples like this with how few women go into the field of computer science, and it is evident that there could be a problem if these issues are not addressed.

Part of the root cause is that programming can be done with good intentions, but if diversity is not represented up front, there can be unintentional biases downstream.  Also, flawed algorithms are not immediately discoverable, and companies have little to no self-interest in making this area more transparent.

Potential solutions include having more diversity up front in how we program these machines and potentially more public policy to drive transparency and accountability.  Ethical principles need to be taught in computer programming classes, as these values are fundamental to minimizing discrimination.  Specifically, with unbiased machine learning being the subject of a great deal of research, there is an opportunity for teachers to offer a consensus view of discrimination.  Also under consideration is whether combining anti-discrimination law, data protection law and algorithmic fairness could support future design as it relates to artificial intelligence.  Seen from a glass-half-full standpoint, artificial intelligence offers an unprecedented opportunity to build inclusive practices into company processes.  There is a way to capitalize on the benefits and mitigate the risks.

 

Data Privacy

The 2018 Cambridge Analytica case forced a worldwide discussion on whether or not data privacy is a human right.  In that instance, 50 million Facebook profiles were harvested in Cambridge Analytica's major data breach.

Companies that are trusted by consumers are abusing that trust every day by sharing information with third parties.  Many of today's phone applications come with real-time tracking, and that information is being capitalized on for profit.  The European Union successfully passed legislation to transform EU data privacy law to include a range of individual rights designed to protect consumers whose personal information is collected, processed and stored by companies.  This past month, with the European General Data Protection Regulation, organizations now risk losing 20 million Euros or 4 percent of annual revenue, whichever happens to be greater.  The tides are changing, or, maybe more appropriately, catching up to the rapidly changing digital environment.

Another case study that demonstrates the issues in the data privacy space is how biomedical health data is handled.  Due to electronic medical records, more health data is being saved than ever before.  Merging different tables to create new insights can sometimes expose sensitive information that is then sold to third parties.  The data sold to third parties is often thought to be non-identifiable, but if multiple datasets are combined, a de facto primary key can sometimes be constructed and used to trace data back to a particular individual.  Even though the United States granted individuals a right to access their lab and genomic data in 2014, there is not a clear legal framework, nor clear ethical and accountability guidelines, for the use of that data.
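To illustrate that re-identification risk, here is a toy Python sketch (all records invented) showing how two "de-identified" datasets can be joined on quasi-identifiers such as ZIP code, birth date and sex, so that the combination effectively acts as a primary key.

```python
# Toy example: re-identifying an "anonymous" record by joining on quasi-identifiers.

# Dataset sold to a third party: lab results with direct identifiers removed.
lab_results = [
    {"zip": "80202", "birth_date": "1985-03-14", "sex": "F", "result": "positive"},
]

# Publicly available dataset (e.g. a voter roll) that still carries names.
voter_roll = [
    {"name": "Jane Doe", "zip": "80202", "birth_date": "1985-03-14", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

def key(record):
    """Build the join key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)

# If the quasi-identifier combination is unique, the join acts like a primary key.
index = {key(person): person["name"] for person in voter_roll}
for row in lab_results:
    name = index.get(key(row))
    if name:
        print(f"{name} -> {row['result']}")   # the 'anonymous' record is re-identified
```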

Regarding the cause of this problem, some of it may have to do with the rapidly advancing digital world and the specialization of knowledge.  With the rapid pace of technological development, from wearables to stem cell research, public policy has just not caught up yet.  Also, each area of computer science involves specialized knowledge, which makes it more difficult for key stakeholders to know what questions to ask and what controls to put in place to safeguard citizens.

Citizens, providers, and administrators need to be better aware of privacy issues and have proper guidance on how to manage them across different parts of the delivery system.  Solutions could include future legislation, similar in theme to the recent advancements in Europe, that recognizes data privacy as a human right and carries consequences for those that violate the policy.  Also, more education for the public will likely help build the momentum to balance the scales in the future.  Finally, clearer guidelines regarding roles and responsibilities as they relate to third-party data would also be a useful start.


Data Governance

The lack of effective information security governance in the digital age is a growing problem.  Generally, corporate governance can be defined by the strategy, policies and processes for controlling a business.

Boards of Directors set the strategy, the budget and the risk tolerance of an organization, in addition to ensuring the company's prosperity.  From 2013 to 2014, information security breaches increased by 64%.  In 2016 alone, 1.1 billion identities were stolen.  Experts predict that by 2020, cybercrime damages will cost the world $6 trillion.  With increased pressure from the public, those serving in governing roles are now under more heightened scrutiny than ever before.  Despite numerous peer-reviewed research findings that have demonstrated how essential high-level support is to information security, many governing bodies still do not have the necessary knowledge to govern effectively.  As the world becomes increasingly technologically dependent, closing the gap between data theory and practice in the boardroom will be critical.

Boards of Directors still do not understand the relationship between strategic alignment, leadership and information security governance effectiveness in United States-based corporations.  While leadership is a well-examined topic in the organizational literature, its application in information technology governance has not been studied extensively.  Information may be one of the most critical resources in a business, but unless the fundamental components influencing effective security governance are better understood, companies may be missing the advantages and multiplying the risks of their information assets.  Having a robust information security governance framework is essential for companies with any type of data management system in place.  With the shift towards protecting information as a valuable asset, there is increased interest in how to implement and oversee effective information governance.

There are many case studies on how data breaches have reached the Board of Directors, with shareholder claims against directors on the rise.  For example, the Target data breach in 2013 affected 70 million customers.  Shareholders alleged that Target's Board of Directors breached their fiduciary duties by not adequately overseeing the information security program and by not providing customers with prompt and accurate information on the breach.  After a 21-month investigation, the plaintiffs dismissed the case.  However, the entire process raised the public's eyebrows regarding the Board of Directors' responsibility for an organization's data privacy.  The data breach was estimated to cost Target $148 million.  The Target data breach is seen as the beginning of increased scrutiny of cybersecurity practice.  As consequences get more serious, the boardroom will be forced to better close the gap between data theory and practice.

In the future, more policy and well-defined roles and responsibilities may contribute to changes in actual practice on how board members are selected to maximize governance effectiveness.  Having the right people governing in the boardroom can influence critical budgetary discussions and support not only timely risk identification but also alignment with the value that an effective security governance system can deliver.  Boardrooms would benefit from realizing goals that should include safer and more reliable data, respect for evolving data privacy trends, better access to data, an enhanced ability to share and collaborate on data, and reduced costs through more effective risk management.

Understanding Human Experience

How can we better understand the world around us?  The hard sciences help us tremendously.  But, the field of qualitative research also can provide useful insights into the human experience, even from a tech standpoint.

Here I will briefly describe three approaches to be familiar with:  phenomenology, grounded theory, and ethnography.

Phenomenology

Phenomenology is a method of investigating or inquiring into the meaning of our experiences and reflecting on, making sense of, or theorizing about lived experience.  In David Eagleman's book SUM, he describes the terms epoche and reduction.  Epoche is about trying to enter a state of openness to the experience one is trying to understand in its pre-reflective sense.  The reduction is about, once there is openness, trying to close in on the meaning of the phenomenon as it appears in experience or consciousness.  Therefore, the purpose of the research is to arrive at phenomenal insights that contribute to thoughtfulness by using the methods of epoche and reduction.  Generally speaking, meaning questions are asked, and the underlying framework is rooted in continental philosophy.  The data include direct descriptions of experience as lived through a particular moment in time.

Grounded Theory

Glaser and Strauss's 1967 publication, The Discovery of Grounded Theory, started the discussion, and the approach has been revised over time through subsequent research publications.  The theory involves the identification and integration of categories of meaning from data.  As a method, it provides guidelines for identifying categories; as a theory, it is the end product, which provides an explanatory framework for understanding the phenomenon.  There are three main recognized versions of grounded theory: the classical (Glaserian) version, Strauss and Corbin's structured approach and Charmaz's constructivist approach.

Ethnography

Ethnography was made popular by anthropologist Bronislaw Malinowski in the early 20th century and has since become a staple, especially in sociological research.  It is a method of studying experiences through perceptions and opinions, often incorporating culture or “way of life” when understanding the participant's point of view.  Interviews and observations are common data collection techniques.  The goal of the research is to develop a deep understanding of how and why people think, behave and interact as they do in a community or organization, and to understand this from the standpoint of the participant, thereby providing insight into the social life, perceptions and values that shape cultural meanings and practices.

Common Indexing Issues

As a new programmer you may have a lot of questions.

A common initial question is: should you index all the columns?

A database index is important for efficiently retrieving data.  To speed up data retrieval optimally, the correct indexes need to be defined for the tables.  For example, an index can speed up a query that filters on a column in the WHERE clause.  Missing indexes in large databases can make queries take much longer.
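As a small illustration using Python's built-in sqlite3 module, the sketch below shows how adding an index can change the plan for a query that filters in the WHERE clause.  The table, column and index names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, last_name TEXT, state TEXT)")
conn.executemany("INSERT INTO patients (last_name, state) VALUES (?, ?)",
                 [("Smith", "CO"), ("Jones", "TX"), ("Lee", "CO")])

query = "SELECT * FROM patients WHERE state = 'CO'"

# Without an index on state, SQLite has to scan the whole table.
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())

# After adding an index, the same filter can be satisfied by an index search.
conn.execute("CREATE INDEX idx_patients_state ON patients(state)")
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())
```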

While indexes are widely used to support query optimization, the reason there is not typically an index on every column in a table is that indexes take up RAM and drive space (in most cases the space cost will be minimal).  Also, because each index must be updated for every piece of data that is inserted or updated, indexes can slow down inserts and updates.  It is inefficient to index columns that will never be used in queries, especially if the system is at the point of surrendering drive space and risking performance.  The risk of additional locking is also something to consider seriously.

In conclusion, the most important thing to pay attention to is which indexes are actually used by frequently run queries.  However, with the pace of technology advancements, it would not be a surprise if some of these constraints were eliminated in the future.  In the meantime, a query for missing indexes can be performed and evaluated to identify optimization opportunities.


Database Management Systems

In today’s digital world, companies are collecting, storing and analyzing data more than ever before in human history.  The data within a database needs to be managed effectively in order for it to provide optimal value to the company.  Database Management Systems help with the data management process.

Fundamentally, a transaction is just a group of tasks that represents the minimum processing unit.  When multiple transactions are executed together, problems can arise because of concurrency.  Transaction throughput is just the number of transactions that can be performed in a given time period.

Concurrency control is then the process of managing simultaneous execution of transactions in a shared database.  The purpose of concurrency control is to enforce isolation, preserve database consistency and to resolve read-write and write-write conflicts.  Concurrency control is important to ensure atomicity, isolation, and serializability of concurrent transactions.

If only one transaction is allowed to execute at a time (that is, in serial order), then performance can be poor.  At the highest level, concurrency control is a method for scheduling or controlling the operations of transactions in such a way that they can be executed safely.  For transactions to be executed safely, they must not cause the database to reach an inconsistent state.

For conservative (pessimistic) concurrency control, the pros include that all transactions can be executed correctly, the data remains consistent and the database is relatively stable and reliable.  The cons include that transactions can be slow, run time can be longer and throughput can be decreased.

For optimistic concurrency control, the strengths include transactions that are executed efficiently, relatively safe data content and potentially higher throughput.  The cons include the risk of interference between transactions, hidden conflicts that only surface at commit time, and wasted work when transactions must be aborted and restarted.

If concurrency is controlled properly, then it is possible to maximize transaction throughput while avoiding the chance of corrupting the database.

A company’s performance can depend on how effectively its data is managed.  Understanding these components helps companies build effective Database Management Systems.

Burning Man, Nodes and Robots

Burning Man is a concert of craziness.

Participants fill the Playa as approximately 70,000 people from all over the world gather for the 30th annual Burning Man arts and music festival in the Black Rock Desert of Nevada.

But, there is a machine behind the madness.  When you purchase your ticket online at https://burningman.org/, it is possible because of distributed mutual exclusion.

An example of distributed mutual exclusion occurs when a ticket order is placed by a music-goer on the website.  The order is sent to a group of distributed nodes for processing.  The group is just a set of nodes that act together for the same purpose.  One node grabs the ticket order and dedicates itself to handling it.  Each node gains exclusive access to find out whether the order has already been taken by another node, decides whether to take it, and marks it as taken if applicable.  All the nodes agree on which node holds the lock for a given music-goer's order.  A node can request and release the lock.
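Here is a deliberately simplified Python sketch of that idea: several nodes race to claim the same ticket order, and a single lock stands in for the group's distributed agreement protocol.  The TicketOrder class, node names and order ID are invented for illustration; real systems use distributed algorithms or coordination services rather than a shared in-process lock.

```python
import threading

class TicketOrder:
    def __init__(self, order_id):
        self.order_id = order_id
        self.owner = None              # which node has claimed the order
        self._lock = threading.Lock()  # stands in for distributed mutual exclusion

    def try_claim(self, node_name):
        """A node requests exclusive access, checks whether the order is still
        free, marks it taken if so, then releases the lock."""
        with self._lock:
            if self.owner is None:
                self.owner = node_name
                return True
            return False

order = TicketOrder("burning-man-ga-0001")
nodes = [threading.Thread(target=order.try_claim, args=(f"node-{i}",)) for i in range(5)]
for t in nodes:
    t.start()
for t in nodes:
    t.join()
print(order.owner)   # exactly one node ends up owning the order
```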

This is just to say that computer science nerds make your online ticket to Burning Man possible.

The theme this year is robots, so check it out and get your nerd on.

What’s The Soft Stuff?

What’s the soft stuff?

And no, I am not talking about the softest animal fur in the world which happens to be a chinchilla.  I have touched them and they are soft.  Everybody else has touched them too and I’m pretty sure they are tired of being pet.  But, let’s not digress.

I’m talking about qualitative research.  I studied physics.  The hard stuff.  The math stuff.

But, when I think about the most amazing thing I learned in physics – it was quantum mechanics.

And, quantum mechanics is cool because it is the first time the researcher was actually in the equation. In the double slit experiment, it all mattered if you looked.  Is the cat dead or alive?

So, coming from a hard science background – jumping into research and development in the computer sciences – I thought I would be doing hard research all the time.

But, this field of qualitative research is interesting.

I thought for all my tech geeks out there, I would spend a second on some of the basic concepts of phenomenology, grounded theory design, and ethnographic design.

Phenomenology helps the researcher discover previously unnoticed issues, which can foster new insights into the meaning of phenomena.  The beauty of this design is that it allows the researcher to better understand the possibilities that are already embedded in the experience of phenomena.  This design values the experience as a whole, as is evident in the way the research is approached.

Grounded theory design, which was founded in 1967, can include research where individuals share culturally oriented understandings of the world, where those understandings are shaped by similar values and beliefs, and these determine how individuals behave according to how they interpret their existence.  Much of the focus in grounded theory is on symbolic meanings that can be uncovered by observing people’s interactions, actions and consequences.

Grounded theory appears to be a design often used in qualitative research.  While the data can come from a variety of sources, it often uses interviews and observations to shed light on questions.

Ethnographic design incorporates deep personal experience, not just observation, where people’s behavior is studied in everyday contexts.  This design uses things like quotations, descriptions and charts and diagrams to help tell a story.  This design method can result in new constructs or paradigms.

In the field of information security governance, for example, grounded theory could be applied with interviews conducted at the executive level in organizations based in Colorado.  These interviews could be coded and analyzed to develop a new theory about the board of directors’ perception of risk and how that informs the creation and implementation of information security strategy.  By design, the research would investigate how boards of directors perceive information security and how that perception influences the development of information security strategy in Colorado-based companies.

This approach would meet the criteria for the grounded theory design as it is not testing a hypothesis but would be attempting to discover the research situation as it is.  The design would be to discover the theory implicit in the data.

The success of the project would be measured by its products, such as publication.  The specific procedures and canons used in the research study would provide additional measures of evaluation.  Ultimately, readers of the dissertation will come to an understanding of the value that is contributed.  The adequacy of the research process and the empirical findings should further support the work.  Also, it will be important to identify the limitations of using this design model, so that other researchers know both what is possible and what limits apply to the findings.

Even though this stuff felt a little soft at first, thinking back to my quantum mechanics days, the researcher is in the equation, so to me it’s just a shift in thinking along the same lines as research design techniques evolve.

What has been your experience with qualitative research?  Thumbs up?  Thumbs down?