Introduction to Data Science: A Starter Kit for Aspiring Data Scientists
Demystifying Kaggle for Data Science Beginners
Before you read the article, watch this video to learn why it is so important for data science aspirants and professionals to use Kaggle and GitHub.
This article is mainly for those who still struggle to work through a data set or to take part in a ‘hackathon’, and who find Kaggle too complicated to join, which is not the case at all.
Before getting into Kaggle, you first have to be specific about why you want to do data science.
Is it just because of the higher salary packages in the industry? If that is your only reason for moving ahead, you are on the wrong track.
There are usually two reasons for choosing this field: either you are genuinely interested, or it just looks interesting from the outside.
Be clear about your WHY and you are good to go in this domain.
How you land in this domain depends on how you see data science from your own perspective.
Data science combines business knowledge, mathematics, statistics and programming skills to find insights that address the business problem at hand.
Data science is not new in the market, but the name is certainly fascinating, and it comes with interesting tools to work with. Before the "Data Scientist" there was the "Statistician", a title that "Data Scientist" has now largely replaced. The common core of both roles is applying strong statistics and probability concepts to find insights that address the problem at hand.
Many of you are focused on landing a good job without being well prepared. Focus on learning instead; in my experience, the rest follows automatically.
A question I often get from many individuals is, "Is Kaggle all about competition?"
Here is my answer, in a few bullet points:
- Not at all. Kaggle is not only about hackathons and winning competitions.
- If you are just starting your journey, you are probably learning Python and applying it to data sets. This is where your Kaggle journey can start.
- So, how do you use Kaggle? Whatever you are learning, apply those concepts to a different data set and explain them in layman's terms, so that it helps you as well as the Kaggle community. Publish at least one kernel a week on Kaggle.
- This gives you visibility, as Kaggle is one of the biggest communities of data science enthusiasts, machine learning engineers and deep learning engineers worldwide.
- You can find multiple kernels on Kaggle for the same concept, which lets you learn more efficiently (kernels are simply Kaggle’s version of Jupyter Notebooks).
- Another way to use Kaggle is to make use of its discussion forum. The best part of Kaggle for beginners is that you can post any doubt, however silly, as a discussion topic, and others will help you resolve it. At the same time, take part in other discussions and get involved with the community, where you can in turn become a discussion expert.
- If you follow the above steps consistently for 1 to 2 hours a day, you can become one of the "kernel masters" or a "discussion expert".
Learning is all that matters, and it is what you should focus on most.
Secondly, ‘data science’ is more about ‘research’, ‘your interest’ and ‘curiosity’ than about having ‘good programming knowledge’. ‘Programming knowledge’ is the second phase for all ‘data scientists’.
The very first phase for all ‘data science enthusiasts’, in my view, comes down to these points:
- Be a ‘good data storyteller’ before being a ‘data scientist’.
- Understand the ‘problem statement’ deeply before going through the ‘data set’.
- Study any ‘existing solution’ for your problem statement if one is available, or look at multiple research papers on the same problem. This helps you work toward a different solution, and by forming some hypotheses of your own you can come up with the ‘best solution’.
- One of the important differences between a ‘Kaggle grandmaster’ and a ‘Kaggle beginner’ is that many beginners focus on machine learning accuracy instead of on a ‘statistical approach’ to the required problem statement.
Everyone out there is more focused on ‘accuracy’ than on understanding the required problem through ‘statistical learning’.
Let’s assume there are two students: one scored 95% in his or her graduation or any other program, and the other failed or scored 40% or less.
What you might conclude from this is that the one who scored less or failed is not going to get any job and can’t achieve anything... which is completely wrong!
Search on Google and you will find many individuals who work at ‘big giants’ like Google, Facebook and Microsoft without any ‘degree’, or who graduated from 'Indira Gandhi National Open University (IGNOU)' and were placed with such MNCs.
What we can conclude from the above is that ‘accuracy’ alone doesn’t matter; what matters is working with a ‘statistical approach’ to solve any problem, whether in your life or with the required ‘data set’. And that is what every ‘Kaggle grandmaster’ believes in.
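To make the point about accuracy concrete, here is a minimal sketch in plain Python. The numbers are made up for illustration: a hypothetical data set with 95 negative cases and 5 positive ones, and a hypothetical "model" that ignores the data and always predicts the majority class. The function names `accuracy` and `recall` are written here just for this example.

```python
from collections import Counter

# Hypothetical labels (an assumption for illustration):
# 95 negative cases (0) and 5 positive cases (1).
y_true = [0] * 95 + [1] * 5

# A "model" that never looks at the features and always
# predicts whichever class is most common.
majority_class = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_class] * len(y_true)


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def recall(actual, predicted, positive=1):
    """Fraction of actual positive cases that the model found."""
    found = sum(a == positive and p == positive
                for a, p in zip(actual, predicted))
    total_positive = sum(a == positive for a in actual)
    return found / total_positive if total_positive else 0.0


print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.95
print(f"recall   = {recall(y_true, y_pred):.2f}")    # recall   = 0.00
```

The baseline scores 95% accuracy and still misses every positive case. A quick look at the class distribution before modelling is the kind of ‘statistical approach’ that catches this.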
Written by Vivek Chaudhary, a Data Scientist at GSAID and a mentor at Board Infinity. So what are you waiting for? Start your journey, and drop him any questions at vivekchaudhary0612@gmail.com
Happy Learning!!