Hi there! I’m Thomas Speidel, a data scientist and statistician. This blog has been long overdue. For years, I’ve been writing about data science and statistical literacy on social media (I’ve been posting on LinkedIn since 2006!). At last, I look forward to collecting, organizing, and sharing my thoughts in this blog.
My expertise is in predictive and explanatory modelling, time-to-event analysis, reproducibility, and visualization. For a decade, I honed my data science skills doing cancer research. I currently work as a data scientist for a large energy company.
Half-jokingly, I consider myself a data snob: you see, I like data. But not just any data: identifying the data relevant to a problem is a journey. It requires craftsmanship, collaboration, intuition, and a good dose of statistical reasoning. It also requires humility, because it’s easy to believe that the truth must be in the data; sometimes it isn’t, and the danger is that we will find it anyway (Stephen Senn).
Statisticians have spent the past 200 years figuring out what traps lie in wait when we try to understand the world through data.
(Tim Harford, Financial Times)
I hope that you find the content of this blog instructive and occasionally amusing.
I believe that statistical rigour must transcend professional barriers: decisions should be guided by data and informed by domain context, and uncertainty should be embraced, not hidden. In the era of data-as-an-asset, organizations often lack, without realizing it, the scientific rigour and statistical literacy needed to use data properly in decision making. And this can be costly.
What has guided my career in both business and government is my fundamental view that nothing is provably certain. One corollary of this view is probabilistic decision making. Probabilistic thinking isn’t just an intellectual construct for me, but a habit and discipline deeply rooted in my psyche. Success came by evaluating all the information available to try to judge the odds of various outcomes and the possible gains and losses associated with each. My life on Wall Street was based on probabilistic decisions I made on a daily basis. (Robert E. Rubin)
With so much of the field facilitated by computational tools, I am humbled to be a long-time R user. R is an open-source statistical programming language whose roots go back to the S language, developed at Bell Labs in the mid-1970s, and it currently enjoys wide popularity. Thanks to R’s rich community, the timing for doing data science couldn’t be better. In keeping with that spirit, this blog is built and managed entirely with R and Hugo.
Good statistical analysis seeks to calm down the rage to conclude, to align the reality of the evidence with the inferences made from that evidence. (Edward Tufte)
When I’m not writing about data science, I continue to help with cancer research as a way to give back to the community in the fight against this devastating disease. I spend most of my non-analytical time with my wife and our three children, our St. Bernard, Ella, and our mobster cat, Luigi. I enjoy cooking (hey, I’m Italian!), swimming, and listening to jazz.
Statistics is not a toolbox but a way of thinking (Frank Harrell)
- Thomas
Wed, Feb 4, 2015, SMi Group’s 17th annual E&P Information & Data Management Conference
Wed, Jun 11, 2014, Useful Business Analytics Summit, Boston, MA, USA