Selected Publications

Comply or Explain: Advancing Corporate Diversification in Canada
RPubs

Data storytelling is emerging as an important trait for company success. But does it always pay to tell stories from data? Can data stories mislead your audience?
Statistics View (John Wiley & Sons) and LinkedIn Pulse

Introduction: the evidence for a role of dietary carbohydrate intake in endometrial cancer risk is conflicting. We therefore evaluated the association between glycemic load (GL) and endometrial cancer in a population-based case–control study, using a comprehensive quantitative food frequency questionnaire to estimate GL. Methods: diet in the year before the reference date was assessed with the self-administered Canadian Diet History Questionnaire in 511 cases and 980 controls in Alberta, Canada, between 2002 and 2006. Multivariable logistic regression was used to examine the association between GL and endometrial cancer risk, with non-linearity evaluated using cubic splines. Results: endometrial cancer risk did not vary with GL (highest versus lowest quartile: adjusted odds ratio = 0.87, 95% confidence interval = 0.52–1.46), even after excluding participants previously diagnosed with diabetes (63 cases and 55 controls; adjusted odds ratio = 0.77, 95% confidence interval = 0.44–1.36). We observed no evidence of effect modification by body mass index (BMI) (p for interaction = 0.22). Conclusions: intake of foods eliciting a glycemic response was not associated with endometrial cancer risk in this population of Canadian women.
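
Glycemic load combines the quality and the quantity of carbohydrate eaten. The minimal sketch below assumes the standard definition (a food's glycemic index, on the glucose = 100 scale, times its grams of available carbohydrate, divided by 100) and sums over foods to obtain a daily dietary GL. It is an illustration only: the abstract does not describe how glycemic index values were assigned to questionnaire items, and the `FoodIntake` record, the function names and the food values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FoodIntake:
    """Hypothetical questionnaire line item (illustrative, not the study's coding)."""
    food: str
    glycemic_index: float          # GI on the glucose = 100 scale
    carbohydrate_g_per_day: float  # available carbohydrate eaten per day, in grams


def glycemic_load(item: FoodIntake) -> float:
    """GL of one food: GI x available carbohydrate (g) / 100."""
    return item.glycemic_index * item.carbohydrate_g_per_day / 100


def daily_glycemic_load(diet: list[FoodIntake]) -> float:
    """Daily dietary GL: the sum of the per-food glycemic loads."""
    return sum(glycemic_load(item) for item in diet)


if __name__ == "__main__":
    # Hypothetical GI and carbohydrate values, chosen only for easy arithmetic.
    diet = [
        FoodIntake("white bread", 75, 50),  # GL 37.5
        FoodIntake("apple", 40, 20),        # GL 8.0
    ]
    print(daily_glycemic_load(diet))  # 45.5
```

Summing per-food loads is equivalent to multiplying a carbohydrate-weighted mean GI by total carbohydrate and dividing by 100; an unweighted mean GI would not give the same result.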

Interview with Francisco Sanchez from Houston Energy Data Science.
Houston Energy Data Science

Invited Speaker: SMi Group’s 17th annual E&P Information & Data Management Conference. London, UK. Included but not presented by the author.
SMi Group’s 17th annual E&P Information & Data Management Conference. London, UK.

Statisticians are going through a sea change. In this article I discuss the problems we are facing and suggest some solutions to increase our visibility and contributions.
SSC Liaison, Amstat News

Presentation at the Useful Business Analytics Summit, held in Boston, June 10–11, 2014. Session: using predictive analytics for effective business planning and added value.
Useful Business Analytics Summit, Boston, MA, USA

Presenting and Using the Results of a Predictive Tool
Wikipedia, The Free Encyclopedia

Cancer prevention guidelines recommend a healthy body mass index, physical activity, and nutrient intake from food rather than supplements. Sedentary individuals may restrict energy intake to prevent weight gain and in so doing may compromise nutritional intake. We conducted a cross-sectional analysis to determine whether adequacy of micronutrients is linked to physical activity levels (PALs) in healthy-weight adults. Tomorrow Project participants in Alberta, Canada (n = 5333), completed past-year diet and physical activity questionnaires. The percent meeting Dietary Reference Intakes (DRIs) was reported across low and high PAL groups, and the relation between PAL and percent achieved DRI was determined using multiple linear regression analyses. Overall, <50% of healthy-weight participants met DRIs for folate, calcium, and vitamin D. Percent achieved DRI increased linearly with increasing PAL in both sexes (P < 0.01). A hypothetical increase in PAL from 1.4 to 1.9 was associated with a percent achieved DRI that was 8%–13% higher for folate and vitamin C (men) and 5%–15% higher for calcium and iron (women). Healthy-weight adults at higher PALs appear more likely to meet DRIs for potential cancer-preventing nutrients. The benefits of higher PALs may extend beyond those usually attributed to physical activity to include a more favorable impact on nutrient adequacy.

Purpose: alcohol consumption is hypothesized to increase the risk of endometrial cancer by increasing circulating estrogen levels. This study sought to investigate the association between lifetime alcohol consumption and endometrial cancer risk. Methods: we recruited 514 incident endometrial cancer cases and 962 frequency age-matched controls in this population-based case–control study in Alberta, Canada, from 2002 to 2006. Participants completed in-person interviews querying lifetime alcohol consumption and other relevant health and lifestyle factors. Participants reported the usual number of drinks of beer, wine, and liquor consumed; this information was compiled for each drinking pattern reported over the lifetime to estimate average lifetime exposure to alcohol. Results: lifetime average alcohol consumption was relatively low (median intake: 3.9 g/day for cases, 4.9 g/day for controls). Compared with lifetime abstainers, women consuming >2.68 to ≤8.04 g/day and >8.04 g/day of alcohol on average over the lifetime had 38% and 35% lower risks of endometrial cancer, respectively (p trend = 0.023). In addition, average lifetime consumption of all types of alcohol was associated with decreased risks. There was no evidence of effect modification by body mass index, physical activity, menopausal status, and hormone replacement therapy use combined, and effects did not differ by type of endometrial cancer (type I or II). Conclusion: this study provides epidemiologic evidence for an inverse association between relatively modest lifetime average alcohol consumption (approximately ¼ to ½ drink/day) and endometrial cancer risk. The direction of this relation is consistent with previous studies that examined similar levels of alcohol intake.

Chronic inflammation may be important in endometrial cancer etiology. Several established endometrial cancer risk factors, particularly obesity, are hypothesized to operate through this pathway by increasing proinflammatory cytokines such as tumor necrosis factor alpha (TNF-α) and interleukin-6 (IL-6), and the acute-phase protein C-reactive protein (CRP). This study sought to investigate the association between inflammatory markers and the risk of endometrial cancer (types I and II). We recruited 519 incident endometrial cancer cases and 964 frequency age-matched controls in this population-based case–control study in Alberta (Canada) from 2002 to 2006. Participants completed in-person interviews, were assessed for anthropometric measures, and provided 8-h fasting blood samples either preoperatively or postoperatively. Blood was analyzed for concentrations of TNF-α, IL-6, and CRP by immunoassay. Endometrial cancer cases had consistently higher mean levels of TNF-α, IL-6, and CRP than controls in these predominantly postmenopausal women. After adjusting for age, all markers were associated with statistically significant increased risks of endometrial cancer; however, after multivariable adjustment, only the risk from CRP remained elevated (odds ratio = 1.22, 95% confidence interval: 1.02–1.47). Similarly, upon stratification by cancer type, only CRP was positively associated with the risk of type I endometrial cancer (odds ratio = 1.25, 95% confidence interval: 1.03–1.52). All markers were associated with an elevated risk of the rarer and more aggressive type II cancers; however, these findings were not statistically significant, likely because of the small number of cases in this group. In conclusion, we found epidemiologic evidence for an association between CRP and the risk of endometrial cancer, which was slightly stronger for type I cancer. No associations emerged for TNF-α and IL-6.
In European Journal of Cancer Prevention

Hormonal and reproductive factors modulate bioavailable estrogen to influence endometrial cancer risk. Estrogen affects the microsatellite status of tumors, but the relation between these estrogen-related factors and the microsatellite instability (MSI) status of endometrial tumors is not known. We evaluated associations between hormonal and reproductive factors and risks of microsatellite stable (MSS) and MSI endometrial cancer among postmenopausal women (MSS cases = 258, MSI cases = 103, and controls = 742) in a population-based case–control study in Alberta, Canada (2002–2006). Polytomous logistic regression was used to estimate ORs and 95% confidence intervals (95% CI). We observed a significant trend in risk reduction for MSI (Ptrend = 0.005) but not MSS (Ptrend = 0.23) cancer with oral contraceptive use; with 5 years of use or more, the risk reduction was stronger for MSI (OR = 0.42; 95% CI, 0.23–0.77) than for MSS cancer (OR = 0.80; 95% CI, 0.54–1.17; Pheterogeneity = 0.05). For more recent use (<30 years), the risk reduction was stronger for MSI (OR = 0.36; 95% CI, 0.19–0.69) than for MSS cancer (OR = 0.77; 95% CI, 0.51–1.15; Pheterogeneity = 0.032). No differential risk associations were observed for menopausal hormone use, parity, or age at menarche, menopause, or first pregnancy. We found limited evidence for statistical heterogeneity of associations of endometrial cancer risk with hormonal and reproductive factors by MSI status, except with oral contraceptive use. This finding suggests a potential role for the DNA mismatch repair (MMR) system in the reduction of endometrial cancer risk associated with oral contraceptive use, although the exact mechanism is unclear. This study shows for the first time that oral contraceptive use is associated with a reduced risk of MSI but not of MSS endometrial cancer.

Markers of insulin resistance such as the adiponectin:leptin ratio (A:L) and the homeostasis model assessment of insulin resistance (HOMA-IR) are associated with obesity and hyperinsulinemia, both established risk factors for endometrial cancer, and may therefore be informative regarding endometrial cancer risk. This study investigated the association between endometrial cancer risk and markers of insulin resistance, namely adiponectin, leptin, the A:L ratio, insulin, fasting glucose, and HOMA-IR. We analyzed data from 541 incident endometrial cancer cases and 961 frequency age-matched controls in a population-based case–control study in Alberta, Canada, from 2002 to 2006. Participants completed interviewer-administered questionnaires, were assessed for anthropometric measures, and provided 8-h fasting blood samples either pre- or postoperatively. Blood was analyzed for concentrations of leptin, adiponectin, and insulin by immunoassay, and fasting plasma glucose levels were determined by fluorimetric assay. Compared with the lowest quartile, the highest quartiles of insulin and HOMA-IR were associated with 64% (95% confidence interval (CI): 1.12–2.40) and 72% (95% CI: 1.17–2.53) increased risks of endometrial cancer, respectively, and the highest quartile of adiponectin was associated with a 45% (95% CI: 0.37–0.80) decreased risk after multivariable adjustment. Null associations were observed between fasting glucose, leptin, and A:L and endometrial cancer risk. This population-based study provides evidence for a role of insulin resistance in endometrial cancer etiology and may provide one possible pathway whereby obesity increases the risk of this common cancer. Interventions aimed at decreasing both obesity and insulin resistance may decrease endometrial cancer risk.
In Endocrine-Related Cancer
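
For readers unfamiliar with the two derived markers, the sketch below computes them from fasting measurements. It assumes the widely used HOMA1-IR approximation (fasting insulin in µU/mL times fasting glucose in mmol/L, divided by 22.5) and a plain adiponectin-to-leptin quotient; the abstract does not state which HOMA variant or which units the study used, so treat the constant, the units and the function names as illustrative assumptions.

```python
def homa_ir(fasting_insulin_uu_per_ml: float, fasting_glucose_mmol_per_l: float) -> float:
    """HOMA1-IR approximation (assumed variant): insulin (µU/mL) x glucose (mmol/L) / 22.5."""
    # 22.5 = 4.5 mmol/L x 5 µU/mL, the model's reference values,
    # so a reference (insulin-sensitive) subject scores 1.0.
    return fasting_insulin_uu_per_ml * fasting_glucose_mmol_per_l / 22.5


def adiponectin_leptin_ratio(adiponectin: float, leptin: float) -> float:
    """A:L ratio; its numeric value depends on the units chosen for each hormone."""
    if leptin <= 0:
        raise ValueError("leptin concentration must be positive")
    return adiponectin / leptin


if __name__ == "__main__":
    print(homa_ir(5.0, 4.5))    # 1.0, the reference subject
    print(homa_ir(10.0, 4.5))   # 2.0, twice the reference insulin at the same glucose
    print(adiponectin_leptin_ratio(12.0, 24.0))  # 0.5
```

Because HOMA-IR rises with both fasting insulin and fasting glucose, it can rank participants differently from either measure alone, which is one reason it is reported alongside them.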

A population-based case–control study of physical activity and endometrial cancer risk was conducted in Alberta between 2002 and 2006. Incident, histologically confirmed cases of endometrial cancer (n = 542) were frequency age-matched to controls (n = 1,032). The Lifetime Total Physical Activity Questionnaire was used to measure occupational, household, and recreational activity levels. Multivariable logistic regression analyses were conducted. Total lifetime physical activity was associated with a modest, non-significant reduction in endometrial cancer risk (odds ratio [OR] for >129 vs. <82 MET-h/week/year = 0.86, 95% confidence interval [95% CI]: 0.63, 1.18). By type of activity, comparing the highest with the lowest quartiles, risk was significantly decreased for greater recreational activity (OR = 0.64, 95% CI: 0.47, 0.87), but not for household activity (OR = 1.09, 95% CI: 0.75, 1.58) or occupational activity (OR = 0.90, 95% CI: 0.67, 1.20). For activity performed during different biologically defined life periods, there was some indication of reduced risk with activity done between menarche and full-term pregnancy and after menarche. When examining activity by intensity (i.e., light <3, moderate 3–6, and vigorous >6 METs), light activity slightly decreased endometrial cancer risk (OR = 0.68, 95% CI: 0.48, 0.97), but no association was found with moderate- or vigorous-intensity activity. Endometrial cancer risk was increased with sedentary occupational activity by 28% (95% CI: 0.89, 1.83) for >11.3 h/week/year versus ≤2.4 h/week/year, or by 11% for every 5 h/week/year spent in sedentary behavior. This study provides evidence for an inverse association between lifetime physical activity and endometrial cancer risk and a possible increased risk associated with sedentary behavior.
In Cancer Causes & Control
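
The intensity cut-points quoted above (light <3, moderate 3–6, vigorous >6 METs) and the MET-hour volume measure can be illustrated with a short sketch. It assumes a hypothetical weekly activity log in which each entry records a MET value and hours per week; `Activity`, `classify_intensity` and `met_hours_per_week` are illustrative names, and this is not the Lifetime Total Physical Activity Questionnaire scoring protocol, which also averages activity over the years of life.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """Hypothetical activity entry: intensity in METs and weekly duration in hours."""
    name: str
    mets: float
    hours_per_week: float


def classify_intensity(mets: float) -> str:
    """Map a MET value to the bands quoted in the abstract (3 and 6 count as moderate)."""
    if mets < 3:
        return "light"
    if mets <= 6:
        return "moderate"
    return "vigorous"


def met_hours_per_week(activities: list[Activity]) -> dict[str, float]:
    """Sum MET-hours per week (METs x hours/week) within each intensity band."""
    totals = {"light": 0.0, "moderate": 0.0, "vigorous": 0.0}
    for a in activities:
        totals[classify_intensity(a.mets)] += a.mets * a.hours_per_week
    return totals


if __name__ == "__main__":
    # Hypothetical MET values, chosen only for easy arithmetic.
    log = [
        Activity("walking", 3.5, 4.0),     # 14.0 MET-h/week, moderate
        Activity("housework", 2.5, 10.0),  # 25.0 MET-h/week, light
        Activity("running", 8.0, 1.5),     # 12.0 MET-h/week, vigorous
    ]
    totals = met_hours_per_week(log)
    print(totals)                # {'light': 25.0, 'moderate': 14.0, 'vigorous': 12.0}
    print(sum(totals.values()))  # 51.0
```

Weighting hours by METs is what lets a few hours of vigorous activity and many hours of light housework be compared on one volume scale, as in the quartile cut-points above.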

To reduce costs and avoid inconvenient overtime work, our institution changed policy in September 2000 so that autologous stem cell apheresis products were stored overnight before cryopreservation rather than immediately processed. This retrospective review was conducted to evaluate the possible impact of this policy change on hematopoietic engraftment following autologous stem cell transplantation (ASCT). In total, 229 consecutive lymphoma patients who underwent a single, unpurged ASCT in Calgary between January 1995 and November 2003 were evaluated. Of these patients, 131 patients’ autografts underwent immediate processing and cryopreservation before September 2000, and 98 patients’ autografts underwent next-day cryopreservation after overnight storage following this date. Results of univariate and multivariate analyses demonstrated no adverse effect of overnight storage before cryopreservation on the number of days to initial engraftment of platelets or neutrophils, on the proportion of patients with low blood counts 6 months post-ASCT, or on lymphoma relapse rates or overall survival post-ASCT. These data suggest that overnight storage of the autograft before cryopreservation does not adversely affect graft viability or influence long-term disease status, and support the continued use of overnight storage of stem cells before cryopreservation as a convenient, cost-reducing measure.
In Bone Marrow Transplantation

Recent Posts

I recently came across a fairly simple table of results. Being affected by an incurable condition called “graphical and tabular intolerance disorder”, I felt compelled to do something about it. The table comes from the US Open Dataset initiative, specifically from the Federal Aviation Interactive Reporting System (FAIRS). You can read more about it here.

Some time ago I started writing a post on data preparation, which I never completed and eventually forgot. A recent LinkedIn post by Kevin Gray sparked a rich conversation around the question “Can data cleaning be automated?” It reminded me of the draft and enticed me to complete it.

Data Cleaning and Data Preparation

When practitioners talk about data cleaning, they usually refer to a collection of tasks needed to make the data amenable to analysis.

Just a few notes for those wondering how I built this site. I’m using R with Yihui’s fantastic package blogdown. Blogdown uses Hugo, a static website generator: it compiles a set of files so that they can be served as a website. The nice thing about Hugo is that, being open source, it has a large community that contributes themes and plugins. While Hugo has many themes to choose from, I’ve had difficulty finding one that works well with blogdown.

As I build this blog, I’ve been looking for expressive datasets to illustrate ideas and examples. For data management, I found the U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics flight delays and cancellations dataset appealing for two reasons: first, it contains a good mixture of variable types (date and time, categorical and numerical); second, on a personal level, I’ve always been interested in aviation. This dataset is available from Kaggle, or directly from DOT.

Everyone seems to like top-10 lists, and many organizations are interested in Big Data, so it seems timely to write my own top-10 list on Big Data. A disclaimer is warranted: those who know me know how much I detest the term “Big Data”. Yet, for better or worse, Big Data is here to stay, so it’s important that we try to clarify what it is and what it isn’t.