The Journey Toward Data Science: A Study Roadmap
*Pivoting from web development to data analytics*
Career Pivot
It's time to start studying again, and I've decided to pursue Data Science rather than .NET/Web Programming. I've only just begun researching what I need to learn, so for now I will focus on Python programming.
I'm already pretty good with SQL and Relational Databases (SQL Server, Oracle), but there's much more to explore. Beyond Math and Statistics, I want to understand how to work with unstructured data.
Subject List
My initial study list (subject to change as I learn what I need) includes the following topics, presented in no particular order:
- Python Language: foundational programming for data tasks
- R Language: statistical computing and graphics
- MongoDB / NoSQL: working with non-relational data structures
- Big Data: understanding Hadoop and Hive
- Cloud Tools: leveraging Amazon S3
Additionally, I will need to brush up on my Math and Statistics skills, as it has been a few years since university.
Reading Estimate
- Volume: 4,000 pages (5 to 6 books, each 500 to 800 pages)
- Timeline: 10 months at 100 pages per week; estimated completion: October 2018
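The timeline follows from simple arithmetic on the figures in the list. As a quick sanity check, here is a minimal sketch (the variable names are my own, and a month is assumed to be 52/12 weeks):

```python
# Figures taken from the reading estimate.
total_pages = 4000
pages_per_week = 100

# Weeks needed at a steady pace.
weeks = total_pages / pages_per_week

# Assumption: an average month is 52/12 (about 4.33) weeks.
months = weeks / (52 / 12)

print(f"{weeks:.0f} weeks, roughly {months:.1f} months")
```

That works out to 40 weeks, a little over nine months, so ten months leaves a small buffer for slow weeks.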
Resources
- Learning Python, 5th Edition, by Mark Lutz
- Programming Python, by Mark Lutz
- The Art of R Programming, by Norman Matloff
Certifications
Several certifications are available for Python, R, and MongoDB. I plan to use them as a measuring stick and pace-setter.
| Certification | Focus | Cost |
|---|---|---|
| 70-773 | Big Data w/ Microsoft R | $165 USD |
| 98-381 | Intro to Python | $127 USD |
| MongoDB DBA | Associate Level | $150 USD |
Updates
I am currently studying the 5th Edition of Learning Python by Mark Lutz. It is a larger volume with 40 chapters. I am taking my time and have read about 10 chapters over 17 days. I will try to pick up the pace as I delve deeper into this book and hope to finish it by mid-January 2018.
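To see what the current pace implies, here is a rough projection. It is only a sketch that assumes the pace stays constant; the variable names are illustrative.

```python
# Progress so far on Learning Python.
chapters_total = 40
chapters_read = 10
days_elapsed = 17

# Average days spent per chapter at the current pace.
days_per_chapter = days_elapsed / chapters_read

# Days still needed for the remaining chapters if the pace holds.
chapters_left = chapters_total - chapters_read
days_remaining = chapters_left * days_elapsed / chapters_read

print(f"{days_per_chapter:.1f} days per chapter")
print(f"{days_remaining:.0f} more days for {chapters_left} chapters")
```

At 1.7 days per chapter, the remaining 30 chapters would take about 51 more days, which is why I want to speed up.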
The content in this book is substantial; I've already filled a 70-page notebook and had to re-ink three fountain pens. At this rate, I will need three more notebooks. Writing out the code by hand helps me grasp it better and makes differences stand out, such as those between lists and dictionaries.
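As an illustration of the list-versus-dictionary difference, here is a small sketch of the kind of snippet I copy out by hand. The sample values are made up for the example.

```python
# A list keeps items in order and is indexed by integer position.
languages = ["Python", "R", "SQL"]
first = languages[0]           # look up by position
languages.append("Hive")       # new items go on the end

# A dictionary maps keys to values and is looked up by key.
focus = {"Python": "general programming", "R": "statistical computing"}
r_focus = focus["R"]           # look up by key, not position
focus["SQL"] = "relational queries"   # assigning to a new key adds an entry

print(first, len(languages))
print(r_focus, "SQL" in focus)
```

The list answers "what is at position 0?", while the dictionary answers "what goes with this key?".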
I purchased The Art of R Programming by Matloff and am currently studying it. I have installed RStudio on my i7 Windows 10 and i3 Linux systems.
This process is taking longer than anticipated. I have completed textbooks for both Python and R. Now, I am focusing my efforts on studying statistics itself rather than just tools or platforms. I aim to improve and hone my analytical mindset first, then augment it with the necessary tools.

