Data Science, Data Analytics and Big Data
JUST ANOTHER DAY
Your alarm goes off at 5:30 a.m. on Tuesday. You brush your teeth and turn on your geyser. Then, while waiting for the iron to heat up, you check your emails, but the power goes out. You make do with a scrunched-up shirt. Because your spouse cannot make coffee and toast without power, the morning menu gets a last-minute change: corn flakes and cold milk. You decide to skip the gym and go straight to the shower.
You get into your car and begin the trip to work at precisely 8:15 a.m., after a brief breakfast and a hasty conversation. On the way, you run into a seemingly endless traffic jam with no way out. A conversation with a fellow commuter reveals that there is a procession taking place, and one of the lanes has been closed.
When another commercial for a new luxury home airs on the radio, promising a 15-minute drive to work, you wonder what happened to the days when this road was deserted. Then you hear a hot new Bollywood song and start humming along.
Finally, after an hour and a half of agonising traffic, you arrive at work just in time for the daily meeting, but you’re frustrated and fatigued from the lengthy commute.
THE WAY THINGS ARE
This is a regular day in India for many office workers. They get up, dress themselves, and head to work. They make a few decisions along the way, but they mostly go with the flow. They are usually reactive and, regrettably, are only concerned with getting through the day.
It doesn’t have to be that way, though.
It’s Tuesday morning, and instead of 5:30 a.m., the alarm goes off at 5:10 a.m. You learned about the planned power outages and adjusted your schedule accordingly. You switch on the iron and then the geyser as soon as you wake up. While you are brushing your teeth, your spouse has already started making French Toast in the toaster. You can smell the hot cup of coffee waiting for you as you finish ironing your shirt.
The power abruptly goes out. You smile as you walk out the door for your morning run.
You have a wonderful, hot breakfast and coffee with some entertaining discussion after your workout and shower. You then get ready and leave around 8:30 a.m.
You take a little longer route but arrive at work in under 40 minutes, giving you plenty of time before your daily meeting.
You went with the flow in the first scenario. You did things because they were second nature to you. You were content with the status quo. Before planning your day, you failed to account for variables such as the power outage and the traffic jam that caused you to be late. You applied a standard technique to a one-of-a-kind scenario and expected standard outcomes.
In the second scenario, you analyzed the various factors that might have an impact on your routine and adjusted your timetable accordingly. Because you were aware of the power outage, you got up a few minutes earlier than normal to turn on the geyser and iron.
Your spouse also started the toaster and coffee machine a few minutes ahead of time. Then, after taking into account the traffic conditions for the day, you chose to take a different route.
You had facts from which you drew conclusions. You adjusted your actions accordingly, and the outcome was considerably better. You made use of the power of analytics, however unintentionally.
Hello there, and welcome to the realm of data science.
WHAT IS DATA SCIENCE?
Data science refers to the application of tools and techniques from mathematics, statistics, computer science, and domain expertise to the gathering, processing, manipulation, and interpretation of data.
To put it another way, data science is the process of using data to solve problems. It covers everything from data collection to gaining insights from the information you’ve gathered.
APPLYING DATA SCIENCE
Let’s take a look at the narrative you just read.
Hypothetically, you avoided a repeat of scenario 1 by investigating why your mornings were so rushed and then using the resulting insights to streamline your days and make them better and brighter.
To begin, you must ask yourself, “What do I require in order to have a fantastic day?”
The following variables are likely to appear on the list:
- Hot Water
This confluence of variables dictates the type of data you’ll need to gather, process, prune, and evaluate in order to obtain insight into how to improve your daily routine. Data science will assist you in determining the combined influence of each variable (data point).
DATA OR ‘BIG DATA’?
We analysed seven criteria in our simple morning routine example. The knowledge gained as a result could make your day a lot better.
But what if you were looking for something more? What if you had a model that was complicated enough to account for every single significant parameter (rather than only seven)?
You wouldn’t just be dealing with data anymore; you’d be dealing with Big data.
According to Wikipedia, big data is defined as follows:
“Big data refers to data collections that are so massive or complicated that typical data processing programmes are insufficient to handle them. Analysis, capture, data curation, search, sharing, storage, transport, visualisation, querying, and information privacy are all challenges. The term usually alludes to the use of predictive analytics or other advanced approaches to extract value from data, rather than a specific data set size.”
To put it another way, big data is all about working with large datasets and extracting insights from them. Traditional approaches do not work with these datasets since they are so large. You’ll need to collect, analyse, store, and process data using properly designed procedures.
In general, the larger the dataset, the better the results — as long as the dataset is of acceptable quality.
In an ecommerce business, for example, the website collects a slew of data, including referring sites, time spent on site, bounce rate, landing page, and visitor flow. They keep track of this information on a person-by-person basis, which means that over the course of a few years, they’ll be able to compile a big dataset that standard approaches won’t be able to handle. That’s when they realise they’re working with ‘Big Data.’
Similarly, in our morning routine example, you could have a very large dataset with a lot more parameters to process and evaluate. You may have gathered information from tens of thousands or maybe millions of people in your city. You may have gathered this information over a period of time and documented a number of additional aspects, such as weather, time of day, traffic updates, tweets, household income, and so on, that you could utilise in your study.
Another way to put the size of datasets into perspective is to consider that a standard-sized dataset could be as thick as a daily newspaper.
You’d need 50 warehouses full of telephone directories to print out a ‘big data’ dataset.
Traditional tools and procedures will not suffice when dealing with such large amounts of data: specialized software created particularly for this purpose is required.
After you’ve gathered all of this information about your morning, you’ll need to investigate and study it in order to draw your conclusions; this is known as data analysis. In our example, you might find that watching ‘Kyunki Saas Bhi Kabhi Bahu Thi’ on Monday night causes you to wake up later on Tuesday mornings. Alternatively, you might find that doing your laundry on Saturday rather than Sunday will allow you to have an additional ironed shirt on Tuesday.
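As a rough illustration of this kind of data analysis, the sketch below compares the average wake-up delay on Tuesdays after a Monday TV night with the delay on Tuesdays without one. The log entries, the `mornings` list, and the `average_wake_delay` helper are all made up for this example; they are assumptions, not real measurements.

```python
from statistics import mean

# Hypothetical log of Tuesday mornings:
# (watched_tv_on_monday_night, minutes_after_5am_that_you_woke_up)
mornings = [
    (True, 45), (True, 50), (True, 40),
    (False, 15), (False, 10), (False, 20),
]


def average_wake_delay(log, watched_tv):
    """Mean wake-up delay in minutes for mornings matching watched_tv."""
    delays = [delay for watched, delay in log if watched == watched_tv]
    return mean(delays)


late_nights = average_wake_delay(mornings, True)
early_nights = average_wake_delay(mornings, False)
print(f"After a TV night: {late_nights:.0f} min late")
print(f"Without a TV night: {early_nights:.0f} min late")
```

A gap like this only suggests a pattern worth looking into. A handful of mornings cannot prove that the TV show is the cause.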
But what if you wanted to search numerous data sets for more comprehensive, complicated patterns? Then you’d be engaging in data analytics.
The application of a series of procedures (algorithms) or transformations to derive insights from processed datasets is known as data analytics.
You would examine the complicated interplay of specific details in our morning routine example. For example, if you compare daily temperature with car usage, you may find that temperature has a considerable impact on car usage. With a little further investigation, you’ll learn that this simple model is only valid during the summer months. During the rainy season, people use their cars the most. So when the next day’s rainfall is forecast to be above average, you can infer that traffic will be heavier.
That is data analytics in action: you decide to leave earlier than usual because traffic will be heavier.
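The temperature and car usage comparison can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the `records` values are invented for illustration, and the Pearson correlation coefficient is just one common way to measure how closely two variables move together, not necessarily the method an analyst would settle on.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical daily records:
# (season, temperature in Celsius, share of commuters who drove)
records = [
    ("summer", 30, 0.40), ("summer", 34, 0.50),
    ("summer", 38, 0.60), ("summer", 42, 0.70),
    ("monsoon", 26, 0.85), ("monsoon", 28, 0.80),
    ("monsoon", 30, 0.90), ("monsoon", 32, 0.85),
]


def split_by_season(rows):
    """Group temperatures and car usage into one pair of lists per season."""
    grouped = {}
    for season, temperature, usage in rows:
        temps, usages = grouped.setdefault(season, ([], []))
        temps.append(temperature)
        usages.append(usage)
    return grouped


by_season = split_by_season(records)
correlations = {s: pearson(t, u) for s, (t, u) in by_season.items()}
average_usage = {s: sum(u) / len(u) for s, (t, u) in by_season.items()}

for season in by_season:
    print(f"{season}: correlation {correlations[season]:.2f}, "
          f"average car usage {average_usage[season]:.2f}")
```

In this invented data, temperature tracks car usage closely in summer but only weakly in the monsoon, when usage is high on every day. That is the kind of seasonal caveat the simple temperature model misses.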
Industry buzzwords like analytics, big data, and data science are frequently and wrongly used interchangeably. Data analysis is one of the basic operations that adds value to the data you collect, whereas data science is the domain in which you would operate. And you’re dealing with big data when you’re dealing with large amounts of data that can’t be processed using typical tools and methods.
How do you feel about our definition? Is it the same as yours? Do the examples of a “daily routine” work? Do you have a personal example you’d like to share? Please share your thoughts in the comments area.