Friday 8 September 2017

Top Two Concerns of Big Data Hadoop Implementation

In general, data can be classified into three categories. Any data that can be stored in databases is called Structured Data. For example, the transaction records of an online purchase can be stored in a database, so they count as Structured Data. Data that can only partially be stored in databases is called Semi-Structured Data. For example, the data in XML records can be partially stored in databases, so it counts as Semi-Structured Data.

Any other form of data that does not fit into these two categories is called Unstructured Data. To name a few examples, data from social media sites and web logs cannot be stored, analysed and processed in databases, and is therefore categorised as Unstructured Data. Another term used for Unstructured Data is Big Data.
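To make the three categories a little more concrete, here is a small Python sketch. The order record, the XML snippet and the social media comment are all made-up examples, and `extract_fields` is a hypothetical helper written only for this illustration.

```python
import xml.etree.ElementTree as ET

# Structured: fixed fields that map directly onto a database row (made-up record).
structured_row = {"order_id": 1001, "customer": "A. Kumar", "amount": 499.0}

# Semi-structured: tags give some structure, but the note is free text (made-up record).
semi_structured = (
    "<order id='1001'>"
    "<customer>A. Kumar</customer>"
    "<note>Gift wrap, deliver after 6 pm</note>"
    "</order>"
)

# Unstructured: no fields at all, just free text (made-up comment).
unstructured = "Loved the phone I bought last week, but delivery took forever!"


def extract_fields(xml_text):
    """Pull out the parts of an XML record that fit database columns."""
    root = ET.fromstring(xml_text)
    return {
        "order_id": int(root.get("id")),
        "customer": root.findtext("customer"),
        "note": root.findtext("note"),  # still free text after extraction
    }


print(extract_fields(semi_structured))
```

The order id and customer name fit neatly into columns, while the note stays free text, which is why such a record is only partially storable in a conventional database.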

According to NASSCOM, Structured Data accounts for 10% of the total data that exists on the Internet today. Semi-Structured Data accounts for another 10%, and the remaining 80% is Unstructured Data. In general, organizations analyse Structured and Semi-Structured Data using traditional data analytics tools. There were no sophisticated tools available to analyse Unstructured Data until Google developed the MapReduce framework. Later, Apache developed a framework called "Hadoop", which analyses all of this data and reveals information that can help businesses make better decisions.
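For readers curious about what the MapReduce idea looks like, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and does not use any Hadoop API. The log format (a user name followed by a page path) and all function names are assumptions made for this illustration.

```python
from collections import defaultdict


def map_phase(log_lines):
    """Map: emit a (page, 1) pair for every well-formed log line."""
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 2:  # skip lines that do not match the assumed format
            yield parts[1], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def count_page_hits(log_lines):
    """Run the three steps in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(log_lines)))


# Hypothetical web log: "<user> <page>"
logs = ["alice /home", "bob /home", "alice /cart", "malformed"]
print(count_page_hits(logs))
```

In a real cluster the map and reduce steps run in parallel on many machines, each working on a slice of the data. The sketch only shows the shape of the computation.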

Hadoop has already proved its importance in several areas. For example, according to NASSCOM, many organizations have started using Big Data analytics. The National Oceanic and Atmospheric Administration (NOAA), the National Aeronautics and Space Administration (NASA) and several pharmaceutical and energy companies have started using Big Data analytics extensively to predict customer behaviour.

According to recent research from the Nemertes group, organizations see value in Big Data analytics and are planning to make better use of it. The New York Times uses Big Data tools for text analysis, and the Walt Disney Company uses them to correlate and understand customer behaviour across all of its stores and theme parks. Indian IT companies such as TCS, Wipro and Infosys, along with other key players, have also started to tap the immense potential that Big Data continues to offer.

This clearly shows that Big Data is an emerging area and that many companies have started to explore new opportunities. The use of Big Data is proving worthwhile, but at the same time privacy and data protection concerns have also risen.

The concern about Big Data analytics is very much valid from the viewpoint of privacy. Let me give a very simple example. I am sure that most of us use social media such as Facebook, Twitter and many other social forums, and that most of us watch videos on YouTube. Imagine these websites using Big Data analytics tools to track your activity on the Internet and to analyse your data, your search behaviour and the content you have watched on social media. Through Big Data, your activity on a social media forum can be clearly identified. This is a blatant violation of your privacy. Further, imagine the organization sharing the results of that analysis with a few marketing agencies. This in turn creates more privacy issues.

