!!!You can have data without information, but you cannot have information without data.!!!
Big Data ~ Big Problems
In this article, I am going to discuss the problem of data, specifically big data: its types, the four V's that best describe the characteristics of big data analytics, some general statistics on how big data touches life and business, and the tools used to store and process it in a distributed way.
Big data is at the foundation of all of the mega trends happening today, from social to mobile to the cloud to gaming. We chose to write about it because we all deal with huge amounts of data, and besides, it sounds really cool.
What is Data?
Data is a collection of facts, such as numbers, words, measurements, observations, or just descriptions of things. It becomes useful when it comes in a volume and format that make it accessible, informative, and actionable for us.
What is Big Data?
Big Data is also data, but of a huge size. It is a term used to describe a collection of data that is huge in volume and yet grows exponentially with time. In short, such data is so large and complex that none of the traditional data management tools can store or process it efficiently.
Types of Big Data
1) Structured Data
Structured data has a defined format and size, which makes it precise and highly efficient to work with. It is the most systematic data model, because any piece of data can be stored, organized, retrieved, and manipulated in a predictable way. This type of data resides in relational databases, which makes storage straightforward.
Example: Data warehouses, Enterprise systems, Databases
2) Unstructured Data
Unstructured data is the type of data that cannot be neatly ordered and usually has no row-column structure. Big data software tools like Hadoop take on the job of organizing and managing such scattered data, which is extremely complex, very large, and changes rapidly.
Example: Text documents, Audio/video streams, log files
3) Semi-Structured Data
Semi-structured data is self-describing: the data format is implied and can be deduced from the data itself. In this kind of structure, not all records necessarily look alike; the schema can differ within a single database and can change unpredictably over time.
Example: HTML, XML, RDF
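To see what "self-describing with a varying schema" looks like in practice, here is a minimal Python sketch using JSON (a close cousin of the XML example above). The records and field names are invented for illustration.

```python
import json

# Three records from the same hypothetical feed. The JSON format is
# self-describing, but the schema differs from record to record.
raw_records = [
    '{"id": 1, "name": "Asha", "email": "asha@example.com"}',
    '{"id": 2, "name": "Ravi", "phones": ["555-0101", "555-0102"]}',
    '{"id": 3, "name": "Mei", "address": {"city": "Pune", "zip": "411001"}}',
]

for raw in raw_records:
    record = json.loads(raw)  # the structure is deduced from the data itself
    extra_fields = sorted(record.keys() - {"id", "name"})
    # Code that consumes semi-structured data must tolerate fields that
    # exist in some records but not in others.
    print(record["id"], record["name"], "extra fields:", extra_fields)
```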
Characteristics of Big Data
The truth is, big data is not just big, but complex. The best way to understand big data analytics is through the concept of the four V's:
Velocity of Data
- The speed at which data is generated.
- Processing and analysing data as it streams in (a small sketch follows this list).
Example: Improved connectivity
Variety of Data
- Data arriving in many different forms.
- Heterogeneous and noisy data.
Example: Structured Data, Unstructured Data, Semi-structured Data
Veracity of Data
- Data arriving from unreliable sources.
- Inaccuracy and uncertainty in the data.
Example: Costing, Source availability issues
Value of Data
- Scientifically relevant data.
- Long-term (longitudinal) studies.
Example: Simulation, Hypothetical events
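To make the velocity point concrete, here is a minimal Python sketch of stream processing. The sensor_stream generator, the readings, and the ten-item window are all invented for illustration; a real deployment would use a streaming engine such as Spark Streaming or Flink.

```python
import random
from collections import deque

def sensor_stream(n):
    """Simulate a high-velocity source: readings arrive one at a time."""
    for _ in range(n):
        yield random.uniform(20.0, 30.0)  # e.g. a temperature sensor

# A sliding-window average computed as data arrives,
# instead of after everything has been stored.
window = deque(maxlen=10)
for reading in sensor_stream(100):
    window.append(reading)
    avg = sum(window) / len(window)
    print(f"latest={reading:.2f}  avg of last {len(window)}={avg:.2f}")
```

The point of the window is that the program never holds more than ten readings in memory, no matter how fast or how long the stream runs.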
Big Data General Statistics
By 2020, each person on earth will generate an average of about 1.7 MB of data per second.
Daily smartphone and computer usage means that the volume of data is expanding rapidly. The average user shares dozens of media links daily, and all of that has to be stored somewhere.
Worldwide, people are already generating 2.5 quintillion bytes of data each day.
There are plenty of ways for businesses to use this data to generate more profits. In recent years, marketers have begun developing data analytics tools that help them understand the market better. Simply put, data companies can always hear a melody in all the noise you’re producing online.
Advanced data analytics show that machine-generated data will grow to encompass more than 40% of internet data in 2020.
Machine-generated data is data produced by a computer without human input. Since apps and programs are becoming more complex, there will be an increasing need for advanced big data processing, even on a smartphone level.
Stored data will grow to 44 ZB by 2020.
Originally, data analytics companies projected a much lower number of just 40 ZB. However, the last couple of years have seen the growth of IoT technologies, and IoT is already growing in volume, with more and more home appliances connected to the cloud. In fact, IoT statistics for 2020 state that IoT-generated data will be widely used to cut costs in almost all industries.
Nearly 90% of all data has been created in the last two years.
Analytics show that data grows at a nearly exponential rate. The number comes as no surprise, especially if you take into account the growth of machine-generated data.
Data growth statistics show that more than two-thirds of data today is generated by individuals, not companies.
The average day of an internet user is unimaginable without social media. Millions of people are creating a huge mass of online data simply by sharing or commenting on news, posts, and articles. Needless to say, big data companies and advertising agencies have recognized the business potential of user-generated data.
Website data analysis shows that more than 570 new websites are created every minute.
There’s content being generated every second of our lives — even now as you are reading this. Businesses and individuals are moving their operations online, and as a result, machine-generated data is rising almost exponentially.
Big Data Business
A recent big data analysis report from Dresner Advisory Services concludes that 53% of companies are adopting big data analytics.
It’s becoming impossible to ignore big data and its business impacts. Big data is proving to be quite profitable. Millions of users are generating new data points each day, and plenty of data analysis tools have been developed to make sense of all the information. Data analytics is invaluable when analyzing market trends and policies.
Unstructured data is a problem for 95% of businesses.
The vast majority of companies today don't have the in-house expertise to deal with big data, so big data solutions are usually outsourced to other companies. Experts say that the big data specialist will soon be one of the most sought-after professions.
Over 150 trillion gigabytes (150 zettabytes) will need analysis by 2025.
Companies are getting desperate for experts who possess data analysis skills. Big data is a lucrative business, regardless of what industry you’re in. Marketers can use a data analysis tool to get a better understanding of what customers want. Financial advisors use it to predict market fluctuations. Simply put, applications of big data are countless.
Big Data Jobs Statistics
59% of all data science and analytics job demand is in finance and insurance, professional services, and IT.
The growth of data analytics has brought a huge demand for data science jobs. Currently, the highest demand for data science experts is in finance and insurance (19%), followed by professional services (18%), and IT services (17%). The question remains whether or not other industries will pick up the pace and start recognizing the benefits of big data.
According to Forbes’ big data statistics, the number of data science jobs is projected to grow by 364,000 by the end of 2020.
The overall demand for data science experts is projected to grow by approximately 39%, reaching 2,720,000 jobs, and job openings for data science specialists are expected to increase by 15%. If you were thinking of applying right away, note that 81% of companies require specialists with at least three to five years of experience. Not all data science jobs are the same, however, and different experts deal with different types of data analytics.
How much data is on the internet?
If we combined all the storage space of the biggest online storage and service companies, such as Google, Amazon, and Microsoft, and looked at big data in terms of raw numbers, we would get a figure of 1.2 million terabytes. However, this figure is growing every second, and there are projections that the volume of data will reach 44 ZB by the end of 2020.
How much data does Google process per day?
According to a 2008 Google big data analysis, Google's search engine was handling around 20 petabytes per day. However, a decade is a lot of time in terms of computing power development. If we apply a Moore's-law-style doubling, roughly once every three years here, we get three doublings in a decade, for an approximate figure of 20 × 2³ = 160 petabytes per day.
How fast is the internet growing?
The internet is growing at a rate of 11 new users per second, or a million users each day. Currently, around 57% of people on earth use the internet.
Tools of BigData
Hadoop
Apache Hadoop is the most prominent and widely used tool in the big data industry, thanks to its enormous capability for large-scale data processing. It is a 100% open-source framework that runs on commodity hardware in an existing data center, and it can also run on cloud infrastructure. Hadoop consists of four parts:
- Hadoop Distributed File System: Commonly known as HDFS, a distributed file system that provides very high aggregate bandwidth across the cluster.
- MapReduce: A programming model for processing big data (a minimal sketch follows this list).
- YARN: A platform for managing and scheduling Hadoop's resources across the Hadoop infrastructure.
- Libraries (Hadoop Common): Shared utilities that help the other modules work with Hadoop.
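As a concrete picture of the MapReduce model named above, here is a minimal word-count sketch in plain Python that simulates the map, shuffle, and reduce phases locally. The input lines are invented, and on a real cluster the same mapper/reducer logic would run through Hadoop Streaming or the Java API rather than a single script.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    """Reduce phase: sum all the counts emitted for one word."""
    return (word, sum(counts))

# Simulate the shuffle-and-sort step Hadoop performs between the phases:
# sorting groups all pairs with the same key next to each other.
lines = ["big data is big", "data without information is just data"]
pairs = sorted(kv for line in lines for kv in mapper(line))
for word, group in groupby(pairs, key=itemgetter(0)):
    print(reducer(word, (count for _, count in group)))
```

Because the mapper only ever sees one line and the reducer only one word's counts, Hadoop can run many copies of each on different machines at once; that independence is the whole trick behind large-scale processing.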
Distributed Storage
Two of the core problems with big data are the volume of the data and the velocity at which it arrives. Storing such a large volume of data is handled through distributed storage, where every cluster has a master-slave topology.
The slave nodes save this large volume of data on their hard drives, while the master node manages how the data is stored on the slaves. The master splits the data into blocks and stores those blocks on the slave nodes in parallel, and this parallelism is what addresses the velocity problem.
This master-slave distributed data storage is the model used in Hadoop; a toy sketch of the idea follows.
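Below is a toy Python sketch of that split-and-distribute step, assuming a made-up block size and slave names. It is only an illustration of the idea, not how HDFS is implemented; real HDFS uses 128 MB blocks by default and additionally replicates each block (three copies by default) for fault tolerance.

```python
BLOCK_SIZE = 8  # bytes per block in this toy; real HDFS defaults to 128 MB

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Master-side step: split incoming data into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, slaves):
    """Assign each block to a slave node round-robin. The master keeps only
    this mapping (metadata); the block contents live on the slaves."""
    return {i: slaves[i % len(slaves)] for i in range(len(blocks))}

data = b"big data needs distributed storage to handle its volume"
blocks = split_into_blocks(data)
placement = place_blocks(blocks, slaves=["slave-1", "slave-2", "slave-3"])
for idx, node in placement.items():
    print(f"block {idx} ({blocks[idx]!r}) -> {node}")
```

Since different blocks land on different slaves, they can be written, and later read, in parallel, which is how distributed storage turns one slow disk into many fast ones.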
!!!Thank You for Reading!!!