Business school study guide
In this study guide we go into more detail and granularity – with references, readings, and practice for particular topics within each lecture. I suggest reading about the course philosophy, which motivates why the course is structured this way. If you're unsure about the prerequisites and background knowledge, refer to "Why did you study all of that?".
Some lectures are conceptual, and you can go into more depth through the readings. Others are highly technical, and you will want to work through the math and practice/code on the suggested case-studies.
Introduction and course overview
First, make sure you read the introduction for the motivation and a brief summary of what we are going to study, then review the schedule and syllabus to understand how the course is structured into individual lectures.
If you're a student, please don't skip the readings in the first two lectures. These are perhaps the most important lessons from 10 years of practice and will serve you well.
- Understand the particularities of decision-making in businesses
- Why I prefer “decision science” over “data science”
- Necessary admin stuff and project requirements. Think about your goals, aspirations, interests and hobbies, so that you can pick a fun use-case
- Go through the slides again after the readings and lectures
On the card you received in class, answer the following questions:
- Rate each of the possible purposes (raisons d'être) of a business from 1 (strongly disagree) to 5 (strongly agree).
- Are you considering starting a business in the next 5 years?
Now, imagine you're in charge of public policy, or part of an entrepreneurial think-tank or an NGO for youths. Formulate a research question that a further study would try to answer. It doesn't have to relate to the particular questions you answered before.
If you’re serious about data science and statistical fundamentals, I highly recommend the following two books by A. Gelman for self-study: Regression and Other Stories and Active Statistics.
This story can be found in “Active Statistics” and “Regression and Other Stories” by A. Gelman.
Business context, decisions, and strategy
I build upon the previous lecture to develop the story of decisions in a business environment. We need additional terminology, concepts, and models in order to structure the problem space. What is key here is to get immersed and put yourself in the shoes of decision-makers. Read the second chapter.
- The evolution of a firm’s performance over time. Why, where, how? 1
- Status quo, desired, and feared trajectories.
- Tradeoffs and limited resources, SWOT Analysis
- A deep-dive into Business Analyst’s Workflow
- What is strategy? What makes good or bad strategy? 2
- What is a Value Chain? Data Science Strategy Safari 3
1 Read Kim Warren’s article on “The Dynamics of Strategy” here
2 Read the article published in McKinsey by Richard Rumelt “The perils of bad strategy”
3 Read about the framework of aligning business objectives to data science use-cases. Data Science Strategy Safari at bayesianquest
As a second activity, think of as many domains and use-cases as possible in which AI and data science play an important role. Put yourself in the shoes of the firm: how would you reverse-engineer the product? Then, see what resonates with you here
Take a subscription-based publication like London Review of Books and analyze it using the tools and frameworks you learned.
Think about acquisition, churn, printing, transportation, and market share. How can it grow? What would make it profitable? What are the key decisions to be made?
There is a good reason why management consultants practice Fermi estimation (sanity, order-of-magnitude checks) and why it comes up in so many interviews.
Take a look at the case study on Food Stamp Fraud by Carl Bergstrom and Jevin West in Calling Bullshit. Practice on other classical Fermi estimation examples.
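A Fermi check rarely needs more than a few lines of arithmetic. Here is a minimal sketch in the spirit of that case study; the dollar figures are rough, illustrative assumptions (order of magnitude only), not quotes from the book or official statistics.

```python
# Back-of-the-envelope Fermi check: is a headline fraud figure "a lot"?
# Both numbers below are rough, illustrative assumptions.

headline_fraud_usd = 70e6   # claimed annual food-stamp fraud (assumed figure)
total_program_usd = 70e9    # rough annual program budget (assumed order of magnitude)

fraud_share = headline_fraud_usd / total_program_usd
print(f"Fraud as a share of the program: {fraud_share:.2%}")  # ~0.10%

# The point: a number that sounds huge in isolation can be a rounding error
# once you anchor it to the right denominator.
```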
Decision-making under uncertainty at scale
In the previous lecture we learned how data science fits into the larger context of a business. The ideas of value chain and business analyst’s workflow are immediately applicable in practice. Now, read the third chapter.
- Clarify what I mean by AI and Cybernetics and the historical confusion 4
- When do we need Analytics vs Statistics vs Machine Learning?
- A note on interdisciplinarity and thinking in buckets
- Agents, environments, and sources of uncertainty
4 K. Pretz, “Stop Calling Everything AI, Machine-Learning Pioneer Says”; M. Jordan, “Artificial Intelligence - The Revolution Hasn’t Happened Yet”
Choose a direct-to-consumer e-commerce brand like Zara or H&M, or a marketplace like Zalando (the big players in the oligopoly). What does a potential business strategy look like? What are some use-cases for Data Science / AI which can improve their outcomes?
In the case of ride-sharing platforms like Uber and Bolt, we’re interested in much the same questions as before; however, we also need to get a grasp on the idea of market-making. If you were to reverse-engineer the pricing algorithms, how would you go about doing it?
It is very hard to find good, realistic datasets which map well onto representative use-cases from this course. This is why I curated a list of public datasets in a variety of domains. These should help if you don’t know where to start your project, that is, if you don’t have particular problems, hypotheses, or research questions in mind.
Newsvendor Problem and Demand Planning
Demand planning, especially demand forecasting and inventory optimization, is an obvious and widespread application of the models and methods from this course. The newsvendor problem is a good way to get started in this area of business.
In economics courses you might have solved this problem analytically, under strong assumptions, but in practice the hard part is quantifying and accounting for uncertainty. Therefore, we rely on simulation, which motivates the need for statistical inference and optimization algorithms. 5
5 There is an excellent presentation of the newsvendor problem by Adam Fleischhacker in Chapter 3 and Chapter 5 of “Persuasive Python”. Chapter 20 of his Business Analytics book contains another classic economic problem interpreted in modern way. If you prefer, there is a video lecture on the newsvendor.
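To make the simulation idea concrete, here is a minimal sketch, assuming a single product with made-up price, cost, and salvage values and a normally distributed demand (in reality you would estimate the demand distribution from data). It brute-forces the order quantity over simulated demand scenarios and sanity-checks the answer against the critical fractile.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed economics of a single product (illustrative values)
price, cost, salvage = 10.0, 4.0, 1.0

# Assumed demand model: truncated normal demand scenarios
demand = rng.normal(loc=200, scale=50, size=10_000).clip(min=0)

def expected_profit(q, demand):
    """Average profit over simulated demand scenarios for order quantity q."""
    sold = np.minimum(q, demand)
    leftover = np.maximum(q - demand, 0)
    return np.mean(price * sold + salvage * leftover - cost * q)

candidates = np.arange(100, 301)
profits = [expected_profit(q, demand) for q in candidates]
best_q = candidates[int(np.argmax(profits))]
print(best_q, max(profits))

# Sanity check: the critical fractile (price - cost) / (price - salvage)
# says we should order roughly the ~67th percentile of demand.
print(np.quantile(demand, (price - cost) / (price - salvage)))
```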
In the paper “Newsvendors Tackle the Newsvendor Problem”, Koschat, Berk et al. show how an analysis and optimization of printing decisions led Time Inc. to revise its policies and generate an additional \(\$3.5m\) in profit.
Besides Persuasive Python, you can find an alternative case study on the “Yaz” restaurant data, which makes playing around with the newsvendor problem easy thanks to its Python package.
For an academic paper with a meta-analysis of data-driven approaches to the newsvendor problem, check out S. Buttler and A. Philippi. If you have a demand planning use-case at work, the best resources are those by N. Vandeput and I. Svetunkov’s CMAF seminars.
Learning, Intuition, and Bias. What is ML?
At this point you will have a better grasp of the business environment and the processes involved in decision-making, and will have had exposure to multiple contexts via case-studies. The newsvendor problem exposed the fact that we need additional skills and understanding of statistical modeling, machine learning, and analytics. Read chapter five.
I start the introduction to machine learning in an unorthodox way, from a cognitive science perspective. The reason is that we gain additional insight into decision-making. But don’t worry: top courses like Shai Ben-David’s “Understanding Machine Learning” 6 and Yaser Abu-Mostafa’s “Learning from Data” 7 start with similar motivations.
6 Shai Ben-David - Understanding Machine Learning, 2014. Read chapters 1 and 2 for the intuition. The following ones are difficult and mathematical, but highly rewarding if you want to understand the theoretical foundations of ML.
7 Learning from Data is still a great foundational course a decade later. The teaching in the recorded lectures is exceptional.
- Implicit learning, intuition, and bias. Bait shyness and pigeon superstition
- The double edged sword of our intelligence. Bounded rationality
- Calling bullshit in the age of Big Data. Small data problems in Big Data
- Intelligence, Rationality, Wisdom, and Foolishness
- The learning problem and empirical risk minimization
- The surprising consequences of Bias-Variance tradeoff
We can pick a (seemingly) simpler use-case like housing price prediction, credit default risk assessment, churn prediction, demand forecasting, or recommender systems – and show what training and prediction look like in code.
I will show that there is much more nuance behind the scenes than what Kaggle datasets suggest, and that the problem framing often makes ML look deceptively simple.
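As a teaser of what "training and prediction in code" means, here is a minimal sketch for the housing-price case, using scikit-learn's bundled California housing data as a stand-in dataset and a generic gradient-boosting model. It is an illustration of the bare mechanics, not the full pipeline we will build in the course.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Load a small, public housing dataset (stand-in for a real use-case)
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)        # "training"
preds = model.predict(X_test)      # "prediction"
print(f"MAE on held-out data: {mean_absolute_error(y_test, preds):.3f}")
```

The two-line fit/predict core is exactly why ML looks deceptively simple; everything interesting hides in the framing, the data, and the evaluation.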
12 Steps of Machine Learning
By now, I have introduced the bare minimum of motivation, formalism, and practices of Machine Learning for you to get by. However, we need a structured approach to tackle real-world problems with ML models. The methodology I like most is articulated by Cassie Kozyrkov, former Chief Decision Scientist at Google. 8
8 Cassie Kozyrkov - Making Friends with Machine Learning, full course. It is 6 hours which will serve you throughout your entire career.
- Present the 12 steps with representative examples of classification and regression
- Is the ML project feasible? If yes, is it a good idea?
- Split your damn data. Cross-validation and hyperparameters. Pitfalls (see the sketch after this list)
- Similarities with CRISP-DM, Tukey’s EDA, and tidy modeling
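Here is a minimal sketch of the "split first, tune inside the training set" discipline, again on the California housing stand-in data. The hyperparameter grid is arbitrary; the point is that cross-validation happens only on the training folds and the test set is touched exactly once at the end.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, GridSearchCV

X, y = fetch_california_housing(return_X_y=True)
# Hold out a test set before doing anything else
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"max_depth": [4, 8, None], "n_estimators": [100, 300]},
    cv=5,                                   # 5-fold CV on the training set only
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))         # the one and only look at the test set
```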
I like this Kaggle challenge and dataset, because of how realistic and open-ended it is. You can keep it simple, with out-of-the-box models, or build some highly customized ML pipelines.
It also has data which requires a combination of different approaches to feature engineering. It is messy enough that we have to justify how we deal with those missingness patterns and weird data points. Those modeling decisions will have an impact on model performance.
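One way to make those modeling decisions explicit and reproducible is to encode the imputation and encoding choices in a single pipeline. Below is a hedged sketch with hypothetical column names (your dataset will differ); the imputation strategies are just one defensible choice among several.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature lists for a messy tabular dataset
numeric = ["area", "num_rooms", "year_built"]
categorical = ["neighborhood", "building_type"]

preprocess = ColumnTransformer([
    # Median imputation for numeric gaps; a modeling decision worth justifying
    ("num", SimpleImputer(strategy="median"), numeric),
    # Most-frequent imputation plus one-hot encoding for categoricals
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

model = Pipeline([("prep", preprocess), ("rf", RandomForestRegressor(random_state=0))])
# model.fit(train_df[numeric + categorical], train_df["target"])  # with your own data
```

Because the choices live in one object, you can swap an imputation strategy and measure its effect on model performance instead of arguing about it.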
Price optimization under limited capacity
In contrast with the newsvendor problem (where prices were fixed), it’s time we focus on pricing decisions. The limited capacity of airplane seats, concert tickets, or hotel rooms simplifies the problem formulation on the one hand, but introduces many complexities in practice.
This seemingly simple problem motivates why the young field of revenue management was created. When forecasting demand, we need to think about how to model it well 9 and take into account the most relevant factors influencing it. This is where we have to be careful about, or at least aware of, what economic theory has to say about demand elasticity, competition, and price discrimination.
9 This is where what we learn in Module 3: Applied Bayesian Statistics, especially the hierarchical models help a lot.
On the optimization side, we will keep things simple, as in the newsvendor problem. We will not go into the complexities of dynamic programming.
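In that simple spirit, here is a minimal sketch of a single-price optimization under a capacity cap. The demand curve (a Poisson rate that decays with price) and all numbers are made-up assumptions for illustration; in the lecture the demand model would be estimated from data.

```python
import numpy as np

rng = np.random.default_rng(0)

capacity = 180                      # seats/rooms available (assumed)
prices = np.linspace(50, 300, 26)   # candidate prices to evaluate

def expected_revenue(price, n_sims=5_000):
    """Simulate price-sensitive Poisson demand and cap sales at capacity.
    The demand curve lambda(price) below is a made-up illustration."""
    lam = 400 * np.exp(-price / 120)        # assumed price elasticity of demand
    demand = rng.poisson(lam, size=n_sims)
    sales = np.minimum(demand, capacity)    # cannot sell more than capacity
    return price * sales.mean()

revenues = [expected_revenue(p) for p in prices]
best = prices[int(np.argmax(revenues))]
print(f"Best single price on this grid: {best:.0f}")
```

The same grid-search-over-simulations pattern we used for the newsvendor carries over; dynamic pricing over time is where dynamic programming would enter, and we deliberately stop before that.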
We can use the Kaggle simulator for dynamic pricing in the airline industry, or get creative with the Mercari Challenge and other open datasets. No firm will make this kind of data public, so we need to be inventive in how we simulate, estimate, or collect the data from what is available.
Adam Fleischhacker has an excellent tutorial (Chapter 24: Compelling Decisions and Actions Under Uncertainty) which is a good starting point.
12 Steps of Statistics. A/B Testing Scheme
I assume you have been working on Module 2 (Fundamentals of Statistics and Probability) in parallel, so that we can get to a point where we discuss the practical and methodological aspects of A/B testing and randomized experiments.
In this lecture I bridge the gap between the theory/fundamentals and practice. We will see what an end-to-end process of experiment design looks like. In the end, you will learn how to make justified choices in how you set up and run an experiment.
Marketing is one area of online business which relies heavily on randomized experiments and A/B tests. Firms try to make the most of their advertising spend (customer acquisition) by testing changes in user experience, promotion, and merchandising, to convince a larger proportion of visitors to buy the product or subscribe (conversion rate).
Read this end-to-end example, which takes a lot of care in checking for potential pitfalls. If you need other perspectives on what can go wrong, refer to: the HBR article, 8 pitfalls and solutions, A/A tests, and user interference.
- Before jumping into hypothesis testing, we should carefully ask whether we need an experiment at all. 10 Maybe we want to explore or predict?
- The default action is one of the most important ideas in statistics.
- What is a statistical hypothesis anyway? How does it relate to the original question?
- Make sure you define the minimal effect size of interest/relevance (see the power-analysis sketch after this list)
- Type 3 errors: be careful about what question you are really asking of your test
- Watch the 12 steps to statistics by Cassie Kozyrkov for a methodology of how to perform experiments in business settings. For a more traditional exposition, check out Poldrack’s Chapter 9.
- One of the most difficult aspects in practice is metric design, since we deal with tradeoffs so often. A useful way to think about the properties of good metrics is described in this paper, STEDII. Also see this example discussing sensitivity, stability, and directedness.
10 Read Cassie Kozyrkov’s article to recognize when “not to waste your time with statistics”. You can also watch this video. You can also see her full, short lecture for Google employees.
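Once you have committed to a minimal effect size of interest, the sample size per arm follows from a power calculation. A minimal sketch for a conversion-rate experiment, where the baseline rate and the minimal lift are assumptions for illustration:

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline = 0.040   # current conversion rate (assumed)
mei = 0.005        # minimal effect of interest: +0.5 percentage points (assumed)

# Standardized effect size for a difference in proportions
effect = proportion_effectsize(baseline + mei, baseline)

# Users needed per arm for 80% power at a 5% significance level
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"~{int(round(n_per_arm)):,} users per arm")
```

Running this kind of calculation before the experiment is what makes the choices in your setup justified rather than post-hoc.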
Churn and Lifetime Value. Open datasets
Customer retention and repurchase are what make or break businesses, especially in the past few years, with so many subscription and SaaS business models. Therefore, this topic deserves a full lecture – covering both the contractual and the PAYG/non-contractual settings.
I show that predicting churn and remaining customer lifetime value is not just a simple classification/regression problem. First, we need some tools from survival analysis, and second, what we really want to know is who can be convinced to stay by an intervention, such as promotions or loyalty programs. 11
11 This is a good opportunity to introduce the idea of Uplift and Calibration of classification scores returned by a ML model.
There is a body of work spanning two decades by Bruce Hardie and Peter Fader, who developed statistical models that estimate the remaining LTV (Lifetime Value) of customers, which depends on their survival/activity and repurchase patterns.
We will implement a model for contractual and non-contractual settings on the limited data available on the web. Luckily, that is sufficient for you to apply them to your own use-cases.
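For the non-contractual setting, one convenient starting point is the `lifetimes` Python package, which implements the Fader/Hardie BG/NBD model. Below is a hedged sketch, assuming a transactions file with hypothetical `customer_id` and `order_date` columns; it is one possible workflow, not the only one we will cover.

```python
import pandas as pd
from lifetimes import BetaGeoFitter
from lifetimes.utils import summary_data_from_transaction_data

# Hypothetical transaction log: one row per purchase
transactions = pd.read_csv("transactions.csv")
summary = summary_data_from_transaction_data(
    transactions, customer_id_col="customer_id", datetime_col="order_date"
)

# Fit the BG/NBD model on (frequency, recency, T) per customer
bgf = BetaGeoFitter(penalizer_coef=0.001)
bgf.fit(summary["frequency"], summary["recency"], summary["T"])

# Expected purchases over the next 90 days and probability of still being "alive"
summary["pred_90d"] = bgf.conditional_expected_number_of_purchases_up_to_time(
    90, summary["frequency"], summary["recency"], summary["T"]
)
summary["p_alive"] = bgf.conditional_probability_alive(
    summary["frequency"], summary["recency"], summary["T"]
)
```

Combined with a model of average order value, these quantities roll up into a remaining-LTV estimate per customer.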
Last, but not least, I present an overview of curated open datasets from various sources, organized by industry – which should serve as an inspiration for future practice and your project. There is a shortlist of the ones I found most intriguing and representative, then a long tail of alternatives and fun stuff.
In previous courses you probably did a lot of exploratory data analysis. It is a skill which demands lots of practice and experience to do well (finding inspiration and interesting patterns in data). Having a methodology for EDA helps; however, in practice we also have to work with databases.
I use a relational dataset from a real e-commerce business, made public on Kaggle, to showcase how to interact with databases and highlight the importance of knowing SQL.
We use a recent innovation in databases, DuckDB, which gives us an analytical, in-process database with almost zero setup or dependencies. Once the e-commerce data is loaded and modeled inside the DB, we can start cleaning it, putting it all together, and extracting insights from it. This particular dataset is very rich in information and lends itself well to open-ended investigations.
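To give a feel for the workflow, here is a minimal sketch of loading one table from a CSV into DuckDB and querying it with plain SQL from Python. The file and column names are placeholders for whichever tables of the e-commerce dataset you load.

```python
import duckdb

# In-process, single-file database; no server, no setup
con = duckdb.connect("ecommerce.duckdb")

# Load a raw CSV into a proper table (placeholder file name)
con.sql("""
    CREATE OR REPLACE TABLE orders AS
    SELECT * FROM read_csv_auto('orders.csv');
""")

# Standard SQL over the table, pulled straight into a pandas DataFrame
top_customers = con.sql("""
    SELECT customer_id, COUNT(*) AS n_orders, SUM(order_value) AS revenue
    FROM orders
    GROUP BY customer_id
    ORDER BY revenue DESC
    LIMIT 10
""").df()

print(top_customers)
```

From here, joins across the relational tables and aggregations for EDA are just more SQL, which is exactly the skill this lecture wants you to practice.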