How and why I became a data scientist
- Photo by Ross Findon on Unsplash
I am one of the organizers of a data science co-learning meetup in my hometown, and we often meet people who, like me one year ago, want to switch careers to become data scientists. They often ask about our experiences - so here is my story:
How did I become a data scientist? (TL;DR)
After leaving my first non-academia job, approximately one year ago, I decided it was time for me to switch career paths into what I thought was business intelligence.
Who am I and why on earth did I do what I did?
I studied biology - as some people do - because for as long as I can remember I wanted to be a scientist.
However, this quickly changed during my time at the university and I was left with two options:
- be miserable and stay in academia
- leave and try something new/else
As you might have guessed I chose option #2.
Defining something new
While still in academia I decided to switch and try this cool thing called industry.
Biologists can usually start directly in either the pharma or the food industry (for example in quality control or similar fields).
**Well…**
That was basically the same stuff I had done before, which did not feel right. Additionally, I wanted to make a clean cut and not try to hold on to what I knew - so no more biology.
I subsequently joined a startup that was growing rapidly and, well, let's just say it did not work out.
Neither were the tasks fulfilling, nor did I have the feeling that the environment of a fast-growing company - changing on a daily or even hourly basis - was ideal for my personal development.
As a result we parted ways. It did not work out, which is fine - really! I had my doubts and my fair share of people telling me to suck it up and just go with the flow - N O.
When life gives you melons - make lemonade.
Exactly, so I encourage all of you, whoever is reading this, to leave when you can. If the company culture or working conditions or pay or holiday regulations do not suit your needs - switch.
Full stop.
Life is full of opportunities and I sincerely hope that you discover what is important to you work-wise.
Back to my story - I learned a lot about myself when confronted with an unsuitable work environment.
For example, what I need to function effectively at work:
- I want to work with like-minded people
- teamwork is of utmost importance to me
- it is okay to say if you feel uncomfortable with a given task
- as a result you will receive help from peers
- or you change tasks
- or a mentor will guide you
- feedback is carefully placed and if negative always accompanied by an idea to improve
- one-company - no elbows-first style
- the chance to introduce and enable changes
So far so good - but what about data science?
Well, I met someone in this startup and he was responsible for something called business intelligence.
We talked about a script I was writing for automated ordering, and he told me what he was doing: setting up and organizing a data lake/warehouse, building pipelines, cleaning and analyzing the data, writing reports and creating dashboards.
I liked that, a lot.
Where to start, however…
I start every project by making lists. I evaluated what I already knew and what I still had to learn in order to make this a career.
What I already knew:
- how to work with data both big and small
- how to program in Python
- how to comfortably talk to audiences large and small
- how to evaluate requirements and plan projects end-to-end
Next I looked for possible resources and remembered that a friend had told me about a platform called Coursera.
Searching for business intelligence on Coursera revealed a lot of options (>50), which overwhelmed me. There are also other MOOC platforms - edX, Udemy and Udacity, just to name a few - and all of them had business intelligence courses as well. So many courses, and no idea where to start. I decided to pick a few and just begin.
Additionally, I widened my search to cover both business intelligence and data science (the first time I had heard or used the term), since they seemed to be connected somehow. Data science sounds so much cooler from an academic point of view, and it combines things I already knew how to handle: science includes a lot of data handling, munging and visualization. That was bliss - I had found something I might be really good at.
Long story short, a lot of these courses were not made for me. That does not necessarily mean they were unusable or just plain bad; I just did not feel I was accomplishing much while watching videos of someone whose voice I could barely stand for a minute, much less 20 hours of course time.
One of the most revealing moments was when I discovered courses that improved my understanding of Python and data engineering, and watching them did not feel like a lecture but rather like a conversation with an old friend. My preferred courses are:
UCSanDiegoX: DSE200x - Python for data science (free to attend - you don’t need a certificate) (https://www.edx.org/course/python-for-data-science-0)
Jose Portilla: Python for Data Science and Machine Learning Bootcamp (the price should be around $10-15; if it is not, wait a little for one of the many sales on Udemy) (https://www.udemy.com/python-for-data-science-and-machine-learning-bootcamp/)
Fast.ai - excellent machine learning and deep learning courses (free, forever.) (http://fast.ai)
These three are listed in the order I would take them. The first two can probably be swapped, but I would recommend taking the fast.ai courses only after you have had some exposure to the basic topics, since these courses are rather dense and might be overwhelming.
After finishing these courses I also attended three DataCamp courses and finished them with certificates. I have to say these did not help me become better; they are just certificates that I collected.
However, they were invaluable in helping me discover that only my own projects would give me the kind of challenge that forces you to learn new things and tackle issues by yourself (with Stack Overflow and Google, of course).
These projects help both by building a portfolio for future employers and by giving you a chance to think about how best to present results in a visually pleasing manner.
My daily routine looks a lot like this now: create visualizations that make data accessible to stakeholders who are not data natives, and tell stories with the data.
When I had decided on my first project (an analysis of web-scraped apartment data for finding my next flat using a data-driven approach), it all came together: I first had to mine the web for data, then spent about 80% of the total project time on data cleaning, and finally had some time for visualization, insight generation and modelling.
This end-to-end project planning/execution is what got me my first job as a data scientist.
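To give a feeling for what the cleaning part of such a project can look like, here is a minimal sketch in plain Python. It is not the code from my apartment project: the records, the field names (`title`, `price`, `area`) and the helper functions `parse_number` and `clean_listings` are all made up for illustration, and real scraped data is usually far messier.

```python
import re
from statistics import median

# Hypothetical raw records, shaped like what a scraper might return.
RAW_LISTINGS = [
    {"title": "Bright 2-room flat", "price": "950 EUR", "area": "58 m2"},
    {"title": "Studio near park", "price": "1,200 EUR", "area": "40 m2"},
    {"title": "Loft", "price": "on request", "area": "75 m2"},
    {"title": "Studio near park", "price": "1,200 EUR", "area": "40 m2"},  # duplicate
]


def parse_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Assumes commas are thousands separators, as in "1,200".
    return float(match.group().replace(",", ""))


def clean_listings(raw):
    """Drop duplicates and rows without a usable price or area, add price per m2."""
    seen = set()
    cleaned = []
    for row in raw:
        key = (row["title"], row["price"], row["area"])
        if key in seen:
            continue  # exact duplicate of an earlier listing
        seen.add(key)
        price = parse_number(row["price"])
        area = parse_number(row["area"])
        if price is None or area is None or area == 0:
            continue  # nothing to compute for this row
        cleaned.append({
            "title": row["title"],
            "price": price,
            "area": area,
            "price_per_m2": round(price / area, 2),
        })
    return cleaned


listings = clean_listings(RAW_LISTINGS)
for item in sorted(listings, key=lambda r: r["price_per_m2"]):
    print(item["title"], item["price_per_m2"])
print("median price per m2:", median(r["price_per_m2"] for r in listings))
```

Even in this toy version most of the lines deal with duplicates, missing values and number formats rather than with the analysis itself, which matches the roughly 80% of project time that went into cleaning.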
You can do it too.
How did I manage the hassle with applications, writing CVs, cover letters etc.?
I will tell you in the next post. ;)