Redefining the Data Science Roadmap
Shifting from platform-specific coding to a theory-based mastery of data
The Data Science Paradigm Shift
One of the biggest eye-openers in my journey has been realizing that data science fundamentally revolves around statistics. Statistics is a well-established college major, and data science isn't entirely new: it combines traditional statistical methods with modern computing, using software such as R and SAS for analysis.
To truly grasp data science, it's essential to understand the underlying principles, not just the programming languages. So I've revised my study plan for 2019, shifting from a language- and platform-specific focus to a theory-based approach that emphasizes core concepts and techniques.
Eventual Skillset
- Statistics: The core analytical foundation
- DW/BI: Data warehouse and business intelligence architecture
- Math: Supporting quantitative theory
- SQL: Database querying and management
- Tableau: Data visualization
- R: Statistical programming
- Python: General-purpose data scripting
Domain Study Materials
- Statistics and Data Analysis (WMU): A solid introduction to key concepts.
- Basics of Statistics [BoS]: A great follow-on that reinforces and clarifies concepts with alternative definitions.
- Simple Data Analysis for Biologists: Useful for examples of hypothesis building.
- Kimball, The Data Warehouse Toolkit (3rd Edition): Essential reading for understanding data warehousing.
- Guide to Data Modeling (UW 1999): A foundational text to support the study of Kimball.
- Advanced Calculus Textbook: Higher-level mathematical theory.
- Probability and Mathematical Statistics: Core probability frameworks.
Certification Roadmap
I am already proficient in SQL and have completed the MCSA: SQL Server certification. This year, my study plan aligns with the following certification goals:
| Certification | Focus | Cost (USD) |
|---|---|---|
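| Tableau Desktop Specialist | Visualization | $150 |
| Exam 70-773: Analyzing Big Data with Microsoft R | Big data analytics with R | $165 |
| Exam 98-381: Introduction to Programming Using Python | Introductory Python programming | $127 |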
Goals
- Read All the Books: Roughly 2,000 pages to cover.
- Practical Hands-On Experience: Apply the concepts learned.
- Get the Certifications: Achieve formal recognition of my skills.
Project Updates
Finished the WMU Statistics and Data Analysis textbook. It was a manageable read that clarified basic concepts while introducing new material for future study. Next, I'm moving on to Basics of Statistics.
Finished Basics of Statistics [BoS]. It served as an excellent counterpoint to the WMU text. Concepts like the Central Limit Theorem (CLT), confidence intervals, null hypotheses, and p-values are becoming clearer. I also skimmed Simple Data Analysis for Biologists to better understand how hypotheses are formulated.
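The statistical ideas in that entry can be made concrete with a short simulation. The sketch below uses only Python's standard library (`random` and `statistics`, Python 3.8+); the helper names, the exponential population, the seed, and the sample sizes are illustrative assumptions, not material from the books. It draws many samples from a skewed population to show the CLT at work (sample means cluster near the true mean with spread close to σ/√n), then computes a normal-approximation 95% confidence interval and a two-sided z-test p-value for a null hypothesis about the mean. For small samples, a t-based interval and test would be the more careful choice.

```python
import math
import random
import statistics
from statistics import NormalDist


def sample_means(population_draw, n, trials, rng):
    """Return `trials` sample means, each computed from `n` independent draws."""
    return [statistics.fmean(population_draw(rng) for _ in range(n)) for _ in range(trials)]


def confidence_interval(data, level=0.95):
    """Normal-approximation CI for the mean: x̄ ± z * s / sqrt(n)."""
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # estimated standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level=0.95
    return mean - z * se, mean + z * se


def z_test_p_value(data, mu0):
    """Two-sided p-value for H0: population mean == mu0 (large-sample z-test)."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)
    z = (statistics.fmean(data) - mu0) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def draw(r):
    """Skewed (exponential) population with mean 1 and standard deviation 1."""
    return r.expovariate(1.0)


rng = random.Random(42)  # fixed seed so the run is repeatable

# CLT: means of n=50 draws should center near 1 with spread near 1/sqrt(50), about 0.141.
means = sample_means(draw, n=50, trials=2000, rng=rng)
print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))

# One sample of 200 observations: 95% CI for the mean and a test of H0: mean == 1.
sample = [draw(rng) for _ in range(200)]
print(confidence_interval(sample))
print(z_test_p_value(sample, mu0=1.0))
```

Even though each individual draw is heavily skewed, the distribution of the sample means is close to normal, which is what justifies the normal-based interval and p-value in the last two lines.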
Began Kimball's The Data Warehouse Toolkit (3rd Edition); I wish I had explored it sooner. My prior statistics study is helping me appreciate how to design data warehouses that support analytical models. The Guide to Data Modeling provided a great foundation for this, particularly for entity-relationship diagrams (ERDs) and essential vocabulary.
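Kimball's dimensional modeling centers on the star schema: a fact table of numeric measurements at a declared grain, surrounded by dimension tables that supply descriptive context through surrogate keys. The sketch below builds a tiny star schema with Python's built-in `sqlite3` module; the table names, columns, and rows are hypothetical examples, not taken from the book.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table keyed to two dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key      INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20190115
            calendar_date TEXT    NOT NULL,
            month_number  INTEGER NOT NULL,
            month_name    TEXT    NOT NULL
        );
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,   -- surrogate key
            product_name TEXT NOT NULL,
            category     TEXT NOT NULL
        );
        -- Grain: one row per product per day.
        CREATE TABLE fact_sales (
            date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
            product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
            quantity     INTEGER NOT NULL,
            sales_amount REAL    NOT NULL,
            PRIMARY KEY (date_key, product_key)
        );
    """)


def load_sample_rows(conn):
    """Insert a few hypothetical rows into the dimensions and the fact table."""
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
        (20190115, "2019-01-15", 1, "January"),
        (20190212, "2019-02-12", 2, "February"),
    ])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
        (1, "Widget", "Hardware"),
        (2, "Gadget", "Hardware"),
        (3, "Manual", "Books"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        (20190115, 1, 3, 30.0),
        (20190115, 3, 1, 15.0),
        (20190212, 2, 2, 50.0),
    ])


def sales_by_month_and_category(conn):
    """Typical analytical query: join facts to dimensions, then aggregate."""
    return conn.execute("""
        SELECT d.month_name, p.category, SUM(f.sales_amount) AS total_sales
        FROM fact_sales AS f
        JOIN dim_date    AS d ON f.date_key = d.date_key
        JOIN dim_product AS p ON f.product_key = p.product_key
        GROUP BY d.month_number, d.month_name, p.category
        ORDER BY d.month_number, p.category
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load_sample_rows(conn)
for row in sales_by_month_and_category(conn):
    print(row)
```

Keeping the fact table narrow (keys plus measures) and pushing descriptive attributes into the dimensions is what lets analytical questions reduce to simple joins followed by aggregation.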