Implementing Database Internet Connectivity

The Internet is fundamentally designed to move information from one place to another. For an organization wishing to implement database connectivity, there are several different considerations.

One option for building a program that sends information to and receives information from databases over the Internet is ADO.NET, a set of libraries included in the Microsoft .NET Framework that can connect to a data source and retrieve or modify data stored in a relational database.
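
Below is a minimal sketch of the connect, query, and update pattern that ADO.NET provides. ADO.NET itself is used from .NET languages such as C#, so this Python example uses the pyodbc library as an analogue of the same pattern; the connection string, table, and column names are hypothetical placeholders.

```python
# Hedged sketch: the connect/query/update pattern ADO.NET offers, expressed
# with Python and pyodbc. Server, database, table, and column names are
# placeholders, not taken from the original text.
import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=example-server;DATABASE=ExampleDb;"
    "UID=app_user;PWD=app_password"
)

conn = pyodbc.connect(conn_str)
cursor = conn.cursor()

# Read rows from a relational table.
cursor.execute("SELECT CustomerId, Name FROM Customers WHERE Region = ?", "West")
for customer_id, name in cursor.fetchall():
    print(customer_id, name)

# Modify data and commit the change.
cursor.execute("UPDATE Customers SET Name = ? WHERE CustomerId = ?", "New Name", 1)
conn.commit()
conn.close()
```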

Advantages of ADO.NET

  • Fast performance
  • Ease of use
  • Compatibility with many databases
  • Optimized data providers, including one for SQL Server
  • Strong XML support
  • Ability to work with data offline in disconnected DataSets
  • Support for multiple simultaneous views of the same data
  • Rich object model and broad feature set
  • Features that bridge the gap between traditional data access and XML development
  • Supported by all .NET languages

Disadvantages of ADO.NET

  • Managed-code-only access
  • Customization is complex
  • Database access is mediated by data providers rather than direct calls
  • Slow internet connections result in load-time delays
  • Steep learning curve

Another relatively common option for implementing database Internet connectivity is Fusion Middleware.  Fusion Middleware (FMW) consists of software products from Oracle that are intended to simplify integration.  Like a good plumbing system, Fusion Middleware provides the fixtures and pipes that allow information to pass through.

Advantages of FMW

  • Real-time information flow within and among systems
  • Streamlined processes that improve efficiency
  • Compatibility with many software systems
  • Ability to maintain the integrity of information across many systems

Disadvantages of FMW

  • High development costs
  • Limited pool of skilled workers with FMW experience
  • Potential to jeopardize real-time performance
  • Limited performance benchmarking
  • Some platforms are not covered by the middleware

Choosing the right data architecture is critical to leveraging an organization’s data assets.  The right approach depends on the situation and on the kind of application that is needed, so defining the need is the first step.  Compared to the current state, how are the company’s needs being met or not being met?  It is essential to talk to diverse stakeholders to ensure the needs of the people who will use the system are represented.

The next step is narrowing down the options.  This can be done in a variety of ways, from reviewing advantages and disadvantages like those above to scoring each product with a Quality Function Deployment (QFD) matrix customized to your particular needs.
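
As a rough illustration of the scoring idea behind a QFD-style comparison, here is a small Python sketch; the criteria, weights, and scores are made up for the example and should be replaced with your organization’s own.

```python
# Hypothetical weighted-scoring sketch in the spirit of a QFD matrix.
# Criteria, weights, and scores below are illustrative only.
criteria_weights = {
    "performance": 0.30,
    "ease_of_use": 0.20,
    "cost": 0.25,
    "vendor_support": 0.25,
}

# Scores from 1 (poor) to 5 (excellent) gathered from stakeholder reviews.
candidate_scores = {
    "Option A": {"performance": 4, "ease_of_use": 3, "cost": 2, "vendor_support": 5},
    "Option B": {"performance": 3, "ease_of_use": 5, "cost": 4, "vendor_support": 3},
}

# Higher weighted totals indicate a better fit against the stated needs.
for option, scores in candidate_scores.items():
    weighted_total = sum(criteria_weights[c] * s for c, s in scores.items())
    print(f"{option}: {weighted_total:.2f}")
```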

Once the list is narrowed, invite the vendors to demo their products.  Vendor communication is also something to pay attention to, as most road bumps can be resolved with thoughtful communication.

A sound process supports successful data integration, which in turn helps organizations maintain a competitive edge in the digital world.

#DataArchitecture #InternetConnectivity #ComputerScience

 

 

Implications of Brewer’s Cap Theorem

This isn’t about my favorite baseball cap representing my team, the Colorado Rockies. This is about the CAP Theorem, an important result in computer science proposed by Eric Brewer in 2000.

New advances in computer science and mobile communications have given users better access to information and services regardless of their physical location. Mobile users can now query and update databases from virtually anywhere. Understanding the CAP Theorem is vital to understanding the key tradeoffs that must be made when designing and implementing these systems.

The challenges of distributed systems, especially as they relate to scaling up and down, are described by Brewer’s CAP Theorem. CAP stands for consistency, availability, and partition tolerance. Consistency means that every read sees the most recent write, so all nodes return the same data. Availability means the system always returns a response, rather than an error or timeout, regardless of load or node failures, even if that response is not the most recent data. Partition tolerance means the system keeps operating even when the network is unreliable and messages between nodes are lost or delayed.

As society has moved from mainframes to distributed servers, a new problem has appeared: some servers are up, but the network connecting them is not. When the two sides of a partition could each do the wrong thing, the CAP Theorem helps the practitioner know which guarantees can be preserved and which cannot. When a system is designed, the intent and use of the database also need to be thought through to ensure the right tradeoffs are made.

Figure 1.1 shows a diagram of the CAP Theorem, where only two of the three properties can be fully achieved at a given moment. For example, if a delay is acceptable, consistency might be sacrificed. Or, for a sales processing system, giving up consistency in favor of availability and partition tolerance may not be the right trade-off. Some systems guarantee strong consistency and provide best-effort availability, other systems guarantee availability and provide best-effort consistency, while still others sacrifice both consistency and availability.
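
The heart of the tradeoff shows up when a partition actually happens: a node can either keep answering with possibly stale data or refuse to answer until it can confirm the latest value. The sketch below is a deliberately simplified illustration of that choice; the class and attribute names are hypothetical and not taken from any real database.

```python
# Illustrative sketch of the availability-vs-consistency choice a replica
# faces during a network partition. Names are hypothetical.
class Replica:
    def __init__(self, prefer_availability):
        self.prefer_availability = prefer_availability
        self.local_copy = {"balance": 100}   # may be stale during a partition
        self.partitioned = False             # True when peers are unreachable

    def read(self, key):
        if not self.partitioned:
            return self.local_copy[key]      # normal case: no tradeoff needed
        if self.prefer_availability:
            # AP choice: always answer, even if the value may be stale.
            return self.local_copy[key]
        # CP choice: refuse to answer rather than risk returning stale data.
        raise RuntimeError("unavailable: cannot confirm latest value during partition")


ap_node = Replica(prefer_availability=True)
cp_node = Replica(prefer_availability=False)
ap_node.partitioned = cp_node.partitioned = True
print(ap_node.read("balance"))   # returns 100, possibly stale
# cp_node.read("balance")        # would raise an error instead of answering
```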

Examples of systems that favor consistency and availability include standard relational databases like SQL Server, MySQL, Oracle, and PostgreSQL. Examples that favor availability and partition tolerance include Cassandra, CouchDB, and DynamoDB. Examples that favor partition tolerance and consistency include MongoDB, HBase, Memcache, and Redis.

The goal of understanding the CAP Theorem is to know how to balance consistency and availability requirements. With that understanding, the professional can create innovative strategies for handling partitions and for recovery.

#CapTheorem #ComputerScience #TradeOffs

 

 

Disaster Recovery and System Wide Failure (2PC vs 3PC)

The Two-Phase Commit (2PC) protocol and the Three-Phase Commit (3PC) protocol are the two most popular algorithms for deciding whether to commit or abort distributed transactions in a Distributed Database Management System (DDBMS).

Two-phase commit (2PC) enables databases to be returned to a former state if an error condition occurs, and it helps databases remain synchronized.  A coordinator is required; its role is to reach consensus among a set of participant processes in two phases.  In the first phase, the coordinator contacts all the processes, proposes the transaction, and solicits their votes.  After collecting the responses, the coordinator decides to commit if all processes agreed, or to abort if there is any disagreement.  In the second phase, the coordinator contacts all the processes again and communicates the commit or abort decision.
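
The sketch below is a minimal, in-memory illustration of that two-phase sequence; the Participant class and its method names are hypothetical stand-ins for what a real DDBMS transaction manager does.

```python
# Hypothetical in-memory sketch of the two-phase commit sequence.
class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "init"

    def prepare(self, txn_id):
        # Phase 1: vote yes only if this node is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self, txn_id):
        self.state = "committed"

    def abort(self, txn_id):
        self.state = "aborted"


def two_phase_commit(participants, txn_id):
    # Phase 1 (voting): the coordinator proposes the transaction and collects votes.
    votes = [p.prepare(txn_id) for p in participants]
    # Phase 2 (decision): commit only if every participant voted yes; otherwise abort.
    decision = all(votes)
    for p in participants:
        if decision:
            p.commit(txn_id)
        else:
            p.abort(txn_id)
    return decision


nodes = [Participant("db1"), Participant("db2"), Participant("db3", can_commit=False)]
print(two_phase_commit(nodes, txn_id=42))   # False: one node voted no, so all abort
```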

In the three-phase commit (3PC) protocol, all the nodes in a distributed system likewise agree to commit or abort a transaction, but unlike two-phase commit, three-phase commit is non-blocking.  It adds a prepare-to-commit (pre-commit) phase: if the coordinator receives a yes vote from every process during the voting phase, it first announces the pending commit, and only then asks all the processes to commit.
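
Continuing the sketch above, the following hypothetical function shows where the extra pre-commit round fits; it reuses the illustrative Participant class from the 2PC example.

```python
# Hypothetical sketch of three-phase commit, reusing the Participant class above.
def three_phase_commit(participants, txn_id):
    # Phase 1 (voting): collect yes/no votes, exactly as in 2PC.
    if not all(p.prepare(txn_id) for p in participants):
        for p in participants:
            p.abort(txn_id)
        return False

    # Phase 2 (pre-commit): announce that the decision will be commit.
    # If this message times out, participants can safely abort on their own,
    # which is what makes the protocol non-blocking.
    for p in participants:
        p.state = "pre-committed"

    # Phase 3 (commit): finalize once the pre-commit round has completed.
    for p in participants:
        p.commit(txn_id)
    return True
```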

When it comes to terminating a distributed transaction, two-phase commit is a blocking protocol, so the system can get stuck because it cannot resolve the transaction.  Once a cohort sends an agreement message to the coordinator, it holds the resources associated with the transaction until it receives the coordinator’s commit or abort message.  A failure of the coordinator can therefore leave the cohorts blocked and unable to recover.

The three-phase commit protocol, on the other hand, eliminates this blocking problem.  If a message times out, for example, the remaining processes can unanimously agree that the operation was aborted.  The pre-commit phase also aids recovery when a process fails, or when both the coordinator and a process node fail, during the commit phase.  In the event of a system-wide power-off failure, the two-phase commit protocol might not recover data to its initial state if it was in a blocking state, whereas three-phase commit can prevent blocking as long as crashes can be detected accurately.  One limitation, however, is that the protocol does not work correctly with network partitions or asynchronous communication.  It is also important in this situation to perform a system-wide backup as part of your disaster recovery plan.

In conclusion, 3PC is the better protocol for both terminating distributed transactions and recovering from a system-wide power-off failure.

#3PC #ComputerScience #DisasterRecovery #SystemFailure

 

Conservative Concurrency Control vs. Optimistic Concurrency Control


Companies are collecting, storing, and analyzing more data than ever before in human history.  It is estimated that 90% of all the data in the world has been created in the past few years.  The data within a database needs to be managed effectively for it to provide optimal value to the company and its customers. Database Management Systems help with that data management process.

A transaction is a group of tasks that represents the minimum unit of processing.  Transaction throughput is the number of transactions that can be performed in a given period.  When multiple transactions run together, problems can arise because of concurrency.

Concurrency control is the process of managing simultaneous execution of transactions in a shared database.  The purpose of concurrency control is to enforce isolation, preserve database consistency and to resolve read-write and write-write conflicts.  Concurrency control is essential to ensure atomicity, isolation, and serializability of concurrent transactions.

If only one transaction is allowed to execute at a time (that is, in serial order), performance can be poor.  At the highest level, concurrency control is a method for scheduling or controlling the operations of transactions so that they can be executed safely.  For transactions to be executed safely, they must not cause the database to reach an inconsistent state.

For conservative (pessimistic) concurrency control, which locks data before it is accessed, the pros include that all transactions are executed correctly, the data stays consistent, and the database is relatively stable and reliable.  The cons include that transactions can be slow, run times can be longer, and throughput can be decreased.
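
A minimal sketch of the conservative approach is shown below: a transaction takes a lock on a record before touching it, so conflicting transactions simply wait. The record names and lock granularity are illustrative assumptions.

```python
# Illustrative sketch of conservative (pessimistic) concurrency control.
import threading

record_locks = {"account:42": threading.Lock()}
balances = {"account:42": 100}

def conservative_withdraw(key, amount):
    with record_locks[key]:              # acquire the lock up front; others block
        if balances[key] >= amount:
            balances[key] -= amount      # safe: no other transaction can interleave
            return True
        return False

conservative_withdraw("account:42", 30)
print(balances["account:42"])            # 70
```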

For optimistic concurrency control, which lets transactions proceed without locks and validates for conflicts at commit time, the strengths include efficient execution, reasonably safe data content, and potentially higher throughput.  The cons include the risk of interference between transactions, the possibility of hidden errors, and transactions that may be rolled back and restarted when conflicts are detected.
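
For contrast, here is a minimal sketch of the optimistic approach: read a version number, do the work without holding any lock, and validate the version at commit time, retrying if another transaction changed the record first. The version scheme and names are illustrative assumptions.

```python
# Illustrative sketch of optimistic concurrency control with version checking.
versions = {"account:42": 1}
balances = {"account:42": 100}

def optimistic_withdraw(key, amount, max_retries=3):
    for _ in range(max_retries):
        read_version = versions[key]          # remember the version we read
        new_balance = balances[key] - amount  # compute without holding any lock
        if new_balance < 0:
            return False                      # business rule: insufficient funds
        # Validation: commit only if nobody changed the record since our read.
        if versions[key] == read_version:
            balances[key] = new_balance
            versions[key] += 1
            return True
        # Another transaction got there first: retry from the top.
    return False

optimistic_withdraw("account:42", 30)
print(balances["account:42"], versions["account:42"])   # 70 2
```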

If concurrency is adequately controlled, it is possible to maximize transaction throughput while avoiding the chance of corrupting the database.

In a world where over 2.5 billion gigabytes of data are generated every day, managing that data effectively is essential to a company’s performance.  These data management concepts can help the information technology professional implement Database Management Systems effectively.

#Concurrency #DMS #ComputerScience

 

 

The Impact of Distributed Data Management Systems in Healthcare


A distributed database management system (DDBMS) is used when there are large datasets.  It is a centralized application that manages a distributed database as if it were all stored on the same computer.  The logically interrelated databases make the distribution of data transparent to users.  A centralized database system, on the other hand, keeps all data in a single database at a single location.  There are pros and cons to a DDBMS versus a centralized database system as they relate to system architecture, system functions, and suitable applications.

Many companies have multiple locations that may benefit from a DDBMS.  For example, consider a company like HCA, with hospitals located across the country.  Each state may have a different database that holds medical records, appointment history, and so on.  The management at each hospital can query its own data, but the corporate office can also run queries across the country.  In addition, as new hospitals join the system, they can be added to the network without disrupting the operations of other sites.
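
The sketch below is a simplified, hypothetical picture of the corporate office fanning a query out to per-site databases and combining the results; it uses in-memory SQLite databases as stand-ins for the real site systems, and the table and site names are placeholders.

```python
# Hypothetical fan-out query across per-site databases (SQLite stand-ins).
import sqlite3

site_connections = {
    "colorado": sqlite3.connect(":memory:"),
    "texas": sqlite3.connect(":memory:"),
}

# Each site keeps its own local data.
for conn in site_connections.values():
    conn.execute("CREATE TABLE appointments (id INTEGER, patient TEXT)")
    conn.execute("INSERT INTO appointments VALUES (1, 'sample patient')")

# The corporate office queries every site and combines the results.
total = 0
for site, conn in site_connections.items():
    (count,) = conn.execute("SELECT COUNT(*) FROM appointments").fetchone()
    print(site, count)
    total += count
print("system-wide appointments:", total)
```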

Another benefit of a DDBMS is that users can access data stored at other sites.  For example, consider a hospital system trying to understand the number of falls occurring across its facilities.  In this instance, the Chief Medical Officer may need to query that data often, so the data can be placed or replicated at a site near her, potentially improving the speed of database access.

Also, if there is a DDBMS failure at one of the hospital sites, that failure does not bring down the entire system.  A DDBMS, unlike a centralized DBMS, can continue to function despite these local failures.

However, unlike a centralized DBMS, a DDBMS brings more complexity: it hides the distributed nature of the data from the user and allows data replication, which, if unmanaged, can create challenges for reliability and performance.

Another serious risk in the hospital example is security.  In the world of HIPAA, where data breaches can be very expensive and damaging to the brand, it is harder to control security when the approach is not centralized.

Also, as the field of evidence-based medicine advances (algorithms that determine the best treatment plans given the inputs), a DDBMS may prove more challenging to execute.

Do the pros outweigh the cons? What has been your experience?

#BigData #DataAnalytics #DDBMS

 

Artificial Intelligence & Discrimination

In this rapidly changing digital world, artificial intelligence helps machines take on more complex responsibilities on a regular basis.  In Singapore, what started as a program to prevent terrorism is now applied to immigration policy, the property market, and even school curricula.  At a basic level, algorithms collect data about users, and that data can then determine access to many parts of our civil society.

One of the challenges that comes with this opportunity is understanding cognitive bias to make sure that we are not programming machines to mirror discrimination.  There is growing evidence that artificial intelligence applications threaten to discriminate against legally protected groups.  One example came in 2015, when it was discovered that Google’s photo application, which applies automatic labels to pictures, was classifying images of black people as gorillas.  The rules these systems learn are fed by the specific images they are given, and how the programming happens matters if decisions in our world are going to be based on those selections.  In theory, society should be supported by the advancements of artificial intelligence, not live in a world where a history of discrimination is mirrored back.

For example, what does this mean for robots that are armed to replace police work, or for private individuals who want security robots?  Computer scientists at Carnegie Mellon University found that women are less likely than men to be shown Google ads for high-paying jobs.  Combine examples like this with how few women go into the field of computer science, and it is evident that there could be a problem if these issues are not addressed.

Part of the root cause is that programming can be done with good intentions, but if diversity is not represented up front, unintentional biases can appear downstream. Also, flawed algorithms are not immediately discoverable, and companies have little to no self-interest in making this area more transparent.

Potential solutions include having more diversity up front in how we program these machines and, potentially, more public policy to drive transparency and accountability.  Ethical perspectives need to be taught in computer programming classes, as these values are fundamental to minimizing discrimination.  Specifically, with unbiased machine learning being the subject of a great deal of research, there is an opportunity for teachers to offer a consensus view of discrimination.  Also under consideration is how combining anti-discrimination law, data protection law, and algorithmic fairness could support future design as it relates to artificial intelligence.  Seen from a glass-half-full standpoint, artificial intelligence offers an unprecedented opportunity to build inclusive practices into how companies operate, and there is a way to capitalize on the benefits while mitigating the risks.

 

Data Privacy

The 2018 Cambridge Analytica case forced a worldwide discussion on whether or not data privacy is a human right.  In that instance, data from 50 million Facebook profiles was harvested in Cambridge Analytica’s major data breach.

Companies that are trusted by consumers abuse that trust every day by sharing information with third parties.  Many of today’s phone applications come with real-time tracking, and that information is being capitalized on for profit. The European Union successfully passed legislation to transform EU data privacy law to include a range of individual rights designed to protect consumers whose personal information is collected, processed, and stored by companies.  With the European General Data Protection Regulation taking effect this past month, organizations now risk fines of 20 million euros or 4 percent of annual revenue, whichever happens to be greater.  The tides are changing, or perhaps more accurately, catching up to the rapidly changing digital environment.

Another case study that demonstrates the issues in the data privacy space is how biomedical health data is handled.  Because of electronic medical records, more health data is being saved than ever before.  The way different tables are merged to create new insights can sometimes expose sensitive information that is then sold to third parties.  The data sold to third parties is often thought to be non-identifiable, but when multiple datasets are combined, a de facto primary key can sometimes be constructed that traces records back to a particular individual.  And even though the United States granted individuals a right to access their lab and genomic data in 2014, there is still no clear legal framework or ethical and accountability guidelines for the use of that data.
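
The re-identification risk comes from joining datasets on quasi-identifiers such as ZIP code, birth date, and sex. The tiny sketch below uses entirely fictional records to illustrate how such a linkage works; no real data or dataset names are involved.

```python
# Fictional illustration of a linkage attack on "de-identified" data.
health_records = [
    {"zip": "80202", "birth_date": "1980-05-01", "sex": "F", "diagnosis": "asthma"},
]
public_records = [
    {"zip": "80202", "birth_date": "1980-05-01", "sex": "F", "name": "Jane Example"},
]

for h in health_records:
    for p in public_records:
        # The combination of quasi-identifiers acts like a primary key.
        if (h["zip"], h["birth_date"], h["sex"]) == (p["zip"], p["birth_date"], p["sex"]):
            print(p["name"], "->", h["diagnosis"])
```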

Regarding the cause of this problem, some of it has to do with the rapidly advancing digital world and the specialization of knowledge.  With the rapid pace of technological development, from wearables to stem cell research, public policy simply has not caught up yet.  Also, each area of computer science involves specialized knowledge, which makes it more difficult for key stakeholders to know what questions to ask and what controls to put in place to safeguard citizens.

Citizens, providers, and administrators need to be better aware of privacy issues and have proper guidance on how to manage them across different parts of the delivery system.  Solutions could include future legislation, along the lines of the recent advancements in Europe, that recognizes data privacy as a human right and has consequences for those who violate the policy.  More education for the public would also likely help the momentum to balance the scales in the future.  Finally, clearer guidelines about roles and responsibilities as they relate to third-party data would be a useful start.

 

 

 

Data Governance

The lack of effective information security governance in the digital age is a growing problem.  Generally, corporate governance can be defined as the strategy, policies, and processes for controlling a business.

Boards of Directors set the strategy, the budget, and the risk tolerance of an organization, in addition to ensuring the company’s prosperity. From 2013 to 2014, information security breaches increased by 64%.  In 2016 alone, 1.1 billion identities were stolen.  Experts predict that by 2020, cybercrime damages will cost the world $6 trillion.  With increased pressure from the public, those serving in governing roles are under more heightened scrutiny than ever before.  Despite numerous peer-reviewed research findings demonstrating how essential high-level support is to information security, many governing bodies still do not have the necessary knowledge to govern effectively.  As the world becomes increasingly dependent on technology, closing the gap between data theory and practice in the boardroom will be critical.

Many Boards of Directors still do not understand the relationship between strategic alignment, leadership, and information security governance effectiveness in United States-based corporations.  While leadership is a well-examined topic in the organizational literature, its application to information technology governance has not been studied extensively.  Information may be one of the most critical resources in a business, but unless the fundamental components influencing effective security governance are better understood, companies may be missing the advantages and multiplying the risks of their information assets.  A robust information security governance framework is essential for companies with any type of data management system in place. With the shift toward protecting information as a valuable asset, there is increased interest in how to implement and oversee effective information governance.

There are many case studies of data breaches reaching the Board of Directors, with shareholder claims against Directors on the rise.  For example, the Target data breach in 2013 affected 70 million customers.  Shareholders alleged that Target’s Board of Directors breached their fiduciary duties by not adequately overseeing the information security program and by not providing customers with prompt and accurate information about the breach.  After a 21-month investigation, the plaintiffs dismissed the case.  However, the entire process raised public scrutiny of the Board of Directors’ responsibility for an organization’s data privacy.  The data breach was estimated to cost Target $148 million, and it is seen as the beginning of increased scrutiny of cybersecurity practice.  As consequences become more serious, the boardroom will be forced to close the gap between data theory and practice.

In the future, more policy and well-defined roles and responsibilities may change how Board members are selected in order to maximize governance effectiveness.  Having the right people governing in the boardroom can influence critical budgetary discussions and support not only timely risk identification but also alignment with the value that an effective security governance system can deliver.  Boardrooms would benefit from realizing goals that include safer and more reliable data, respect for evolving data privacy trends, better access to data, an enhanced ability to share and collaborate on data, and reduced costs through more effective risk management.