Hi there, I’m Rebecca, welcome to my website! Below you will find my newest posts as well as quick links to some of my more popular posts.
I’m also available for data science trainings and consulting! Click here to learn more.
New posts
AI Tutorial: Using Text Embeddings to Label Synthetic Doctor’s Notes Generated with ChatGPT
I’ve been playing around with OpenAI and ChatGPT in my research, and I thought I’d put together a short tutorial that demonstrates using the ChatGPT API to generate synthetic doctor’s notes, and then using OpenAI’s text embedding models to label the notes according to whether they involve a chronic or acute condition. And yes, I’m fully aware that what I write here will probably be out of date in about 3 hours.
An introduction to Python for R Users
I have a confession to make: I am now a Python user. Don’t judge me, join me! In this post, I introduce Python for data analysis from the perspective of an R (tidyverse) user. This post is a must-read if you are an R user hoping to dip your toes in the Python pool.
Popular posts
R
Purrr is the tidyverse’s answer to apply functions for iteration. It’s one of those packages that you might have heard of, but that seemed too complicated to sit down and learn. Starting with map functions, and taking you on a journey that will harness the power of the list, this post will have you purrring in no time.
The tidyverse’s take on machine learning is finally here. Tidymodels forms the basis of tidy machine learning, and this post provides a whirlwind tour to get you started.
dplyr 1.0.0 introduces a few new features, the biggest of which is across(), which supersedes the scoped versions of dplyr functions.
Statistics
Removing confounding can be done via a variety of methods, including IP-weighting. This post provides a summary of the intuition behind IP-weighting.
An introduction to the field of causal inference and the issues surrounding confounding.
Instrumental variables is one of the most mystical concepts in causal inference. For some reason, most of the existing explanations are overly complicated and focus on specific nuanced aspects of generating IV estimates without really providing the intuition for why it makes sense. In this post, you will not find too many technical details, but rather a narrative introducing instruments and why they are useful.