Google DS Interview

https://www.interviewquery.com/p/probability-interview-questions

Overview

Probability interview questions are often asked in both data science and data analytics roles at FAANG companies and other big tech firms.

Testing your probability knowledge provides companies with a good idea of your analytical reasoning skills and intelligence, and they often take the form of a case study. You will be given a scenario and then asked to compute the probability for that given scenario. Besides case studies, conceptual questions are common too.

To recap, the most common types of data science probability questions are:

Conceptual Probability Interview Questions - These are definition-based questions that cover basic concepts like distributions, covariance, and correlation.

Events Probability Interview Questions - These are simple case studies where you calculate the probability of an event.

Combinatorics Probability Interview Questions - These are logic-based questions that assess your knowledge of combinatorics.

Probability Distributions Interview Questions - These questions are case-based and ask you to calculate probability based on distributions.

Conceptual questions test your knowledge of probability theory. These are short, quiz-like questions that ask about types of distributions, definitions of concepts like the Central Limit Theorem, or use cases for concepts like Bayes’ Theorem.

To answer this type of probability question successfully, your answer must be accessible to a layperson.

  1. How would you explain a probability distribution to a layperson? A probability distribution describes a random variable by mapping each of its possible outcomes to the probability of that outcome occurring.

For example, a distribution of test grades might look similar to a normal distribution, also known as a bell curve, with most students receiving Bs and Cs and a smaller percentage failing or receiving a perfect score. The center of the distribution is highest, while outcomes toward either end of the scale become less and less likely.
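
To make this concrete, here is a minimal Python sketch (the mean of 75 and standard deviation of 10 are assumed purely for illustration) that samples scores and buckets them into letter grades:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: 1,000 test scores drawn from a normal distribution
# with assumed mean 75 and standard deviation 10, clipped to the 0-100 scale.
scores = rng.normal(loc=75, scale=10, size=1_000).clip(0, 100)

# Bucket scores into letter grades: most students land in the middle
# (B/C), with far fewer at either extreme.
bins = [0, 60, 70, 80, 90, 100]
labels = ["F", "D", "C", "B", "A"]
counts = np.histogram(scores, bins=bins)[0]
for grade, count in zip(labels, counts):
    print(f"{grade}: {'#' * (count // 10)} ({count})")
```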

  2. What is the difference between the Bernoulli and binomial distribution? A Bernoulli distribution models a single trial of an experiment with only two possible outcomes, like one coin flip (heads/tails) or which team will win the Super Bowl in a given year (49ers/Chargers).

A binomial distribution models the number of successes across n independent trials, each with the same two possible outcomes, like counting heads over 100 coin tosses or counting "yes" answers when asking 50 people whether they have visited Hong Kong.
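
A quick way to see the relationship using scipy.stats: a Bernoulli distribution is just a binomial with n = 1 (the probabilities below assume a fair coin):

```python
from scipy.stats import bernoulli, binom

p = 0.5  # assumed probability of "success" (e.g., heads on a fair coin)

# Bernoulli: one trial. P(heads) on a single flip.
print(bernoulli.pmf(1, p))          # 0.5

# Binomial: n trials. P(exactly 55 heads in 100 flips).
print(binom.pmf(55, n=100, p=p))    # ~0.0485

# A Binomial(n=1, p) is exactly a Bernoulli(p).
print(binom.pmf(1, n=1, p=p))       # 0.5
```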

  3. Explain how a probability distribution could not be normal and give an example scenario. A probability distribution is not normal if most of its observations do not cluster around the mean, forming the bell curve. An example of a non-normal probability distribution is a uniform distribution, in which all values are equally likely to occur within a given range.

A random number generator set to produce only the numbers 1-5 would create such a non-normal distribution, as each value would be approximately equally represented in your distribution after several hundred iterations.

Sample answer: Uniform distribution: in a uniform distribution, every value within a certain range has the same probability of occurring, so there is no central peak or clustering around a mean. Consider rolling a fair six-sided die many times: each outcome (1 through 6) is equally likely, so the distribution is flat, not bell-shaped.

Skewed Distribution: A distribution could also be non-normal if it’s heavily skewed to the left or right. For example, income distribution in many countries is right-skewed, where most people earn near the lower end, with fewer high-income outliers.
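
A short simulation can show both shapes; the die is a textbook uniform example, and the lognormal parameters are assumed values chosen only to produce a right-skewed sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform: a fair six-sided die; every face is equally likely (flat shape).
rolls = rng.integers(1, 7, size=60_000)
print(np.bincount(rolls)[1:])   # each face appears ~10,000 times

# Right-skewed: lognormal "incomes" (parameters assumed for illustration);
# a few large values pull the mean well above the median.
incomes = rng.lognormal(mean=10, sigma=1, size=60_000)
print(f"median={np.median(incomes):,.0f}  mean={incomes.mean():,.0f}")
```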

  4. What is Bayes’ Theorem? In probability theory and statistics, Bayes’ Theorem gives the probability of an event based on prior knowledge of conditions related to that event. Essentially, the theorem lets us update our beliefs about a random event as we learn more about it.

For example, if the risk of customer churn increases the longer a user has been inactive, Bayes’ Theorem lets us assess churn risk more accurately because we can condition the probability of churn on how long the user has been inactive.

In its general form, Bayes’ Theorem states: P(A | B) = P(B | A) × P(A) / P(B).

Customer churn example:

Prior probability P(Churn): the general probability of any customer churning, say, 15%.

Likelihood P(Inactive for X days | Churn): if we know that customers who churn have a higher chance of being inactive for a certain period, we can use that information. For example, customers who churn are 70% likely to be inactive for 30+ days.

Evidence P(Inactive for X days): the overall probability of any customer being inactive for 30+ days, regardless of whether they churn.

Posterior probability P(Churn | Inactive for X days): Bayes’ Theorem calculates this updated probability, giving us a more accurate likelihood that a customer will churn, given that they have been inactive for 30 days.

Plugging in the numbers above (taking the evidence term P(Inactive for 30+ days) to be about 23%): P(Churn | Inactive 30+ days) = (0.70 × 0.15) / 0.233 ≈ 0.45. Based on this calculation, there’s a 45% probability that a user will churn if they haven’t logged in for 30 days. This is significantly higher than the overall churn rate of 15%, suggesting that a 30-day inactivity period is a strong indicator of potential churn.
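
A minimal sketch of that arithmetic in Python; note that the evidence term (≈23%) is backed out so the result matches the 45% posterior quoted above, since the original worked figures are not shown:

```python
# Hedged sketch of the churn example with the values assumed above.
p_churn = 0.15                 # prior: overall churn rate
p_inactive_given_churn = 0.70  # likelihood
p_inactive = 0.2333            # evidence: P(inactive 30+ days), assumed

posterior = p_inactive_given_churn * p_churn / p_inactive
print(f"P(churn | inactive 30+ days) = {posterior:.2f}")  # ~0.45
```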

  5. What is the difference between covariance and correlation? Provide an example. Covariance can take on any numeric value, while correlation can only take on values between -1 (perfect inverse relationship) and 1 (perfect direct relationship).

Note: A correlation of zero means there is no linear relationship between the two variables.

Therefore, two variables can have a covariance that looks large but only a middling correlation value.

Correlation is considered a standardized measure of the relationship between two variables because it adjusts for the units and scale of the data. By dividing the covariance by the product of the standard deviations of both variables, correlation converts the relationship into a dimensionless value between -1 and 1, making it much easier to compare relationships across different datasets or variables with different units and ranges.
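
For example, here is a small numpy sketch showing that rescaling a variable (say, meters to centimeters) inflates the covariance a hundredfold while leaving the correlation untouched:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
y = x + rng.normal(scale=0.5, size=1_000)

# Rescaling x changes the covariance by the same factor,
# but the correlation is dimensionless and stays put.
for scale in (1, 100):
    cov = np.cov(x * scale, y)[0, 1]
    corr = np.corrcoef(x * scale, y)[0, 1]
    print(f"scale={scale:>3}: cov={cov:8.2f}  corr={corr:.3f}")
```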

  6. What is the difference between the Law of Large Numbers and the Central Limit Theorem? The Law of Large Numbers says that the sample mean converges to the population mean: the error of that estimate decreases as the sample size grows.

In other words, the average of your sample is predictive of the average of the entire population and becomes more accurate with a larger sample. The Central Limit Theorem, by contrast, states that as the sample size n becomes larger, the distribution of the sample mean can be approximated by the normal distribution (it looks more and more like a bell curve).

Law of Large Numbers (LLN)

The Law of Large Numbers states that as the sample size increases, the sample mean becomes closer to the population mean. Essentially, with a larger sample, we expect the sample mean to "settle" around the true population mean, reducing the error or variability in the sample mean. This law provides assurance that our sample statistics are reliable estimators of population parameters with enough data.

Key Points of LLN:

Convergence of the Mean: With a large enough sample size, the sample mean approximates the population mean.
Reduced Error with Larger Samples: The error (difference between sample mean and population mean) decreases as sample size grows.
No Specific Distribution Assumed: LLN does not require any particular distribution shape (it applies regardless of whether the population distribution is normal, skewed, etc.)

Central Limit Theorem (CLT)

The Central Limit Theorem states that as the sample size n becomes larger, the distribution of the sample mean (or sum) approaches a normal distribution, regardless of the original distribution of the population. This is crucial for inferential statistics, as it enables us to apply techniques that assume normality (like confidence intervals and hypothesis tests) even when the original data isn't normally distributed.
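
A quick simulation illustrates both ideas on a deliberately non-normal population (an exponential distribution with mean 1):

```python
import numpy as np

rng = np.random.default_rng(0)

# LLN: the sample mean of an exponential population (true mean = 1)
# settles toward 1 as n grows.
for n in (10, 1_000, 100_000):
    print(f"n={n:>6}: sample mean = {rng.exponential(size=n).mean():.4f}")

# CLT: means of 10,000 samples (n=50 each) are roughly normal,
# centered at 1 with standard deviation ~ 1/sqrt(50) ~ 0.141.
means = rng.exponential(size=(10_000, 50)).mean(axis=1)
print(f"mean of sample means = {means.mean():.3f}, sd = {means.std():.3f}")
```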

  7. How do Probability Mass Functions and Probability Density Functions differ? A probability mass function (PMF, sometimes called a probability function or frequency function) gives the probability that a discrete random variable is exactly equal to some value. It is sometimes also known as the discrete probability density function. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for both scalar and multivariate random variables with discrete domains.

A probability mass function differs from a probability density function (PDF) in that the latter is associated with continuous rather than discrete random variables. A PDF must be integrated over an interval to yield a probability.

The value of the random variable having the largest probability mass is called the mode.
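
A small scipy.stats example makes the distinction concrete: a PMF value is itself a probability, while a PDF value is a density that can exceed 1 and must be integrated to yield a probability:

```python
from scipy.stats import binom, norm

# PMF (discrete): P(X = 3) for X ~ Binomial(10, 0.5) is a genuine probability.
print(binom.pmf(3, n=10, p=0.5))        # ~0.117

# PDF (continuous): the density at a point is NOT a probability and can
# exceed 1; probabilities come from integrating (here via the CDF).
print(norm.pdf(0, loc=0, scale=0.1))    # ~3.99, a density, not a probability
print(norm.cdf(1) - norm.cdf(-1))       # P(-1 < Z < 1) ~ 0.683
```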

  8. How would you explain confidence intervals to someone with no data background? A confidence interval is a range of values within which you expect your estimate to fall if you were to rerun the test. It is typically constructed as your point estimate plus or minus a margin of error.

For example, suppose a presidential popularity poll reported, at a 93% confidence level, an approval rating between 50% and 55%. If you re-ran the poll 100 times, you would expect about 93 of the resulting intervals to contain the true approval rating.

The other seven would miss it, which is to say the true value would fall either below their lower bound or above their upper bound. More polling would let you get closer to the true population value and narrow the interval.
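
A simulation of that repeated-polling interpretation (the true approval rate of 52.5% and sample size of 500 are assumed for illustration, using standard 95% intervals):

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate, n, z = 0.525, 500, 1.96  # assumed truth, poll size, 95% z-value

# Run 1,000 polls; count how often each poll's 95% CI covers the truth.
covered = 0
for _ in range(1_000):
    sample = rng.binomial(1, true_rate, size=n)
    p_hat = sample.mean()
    half_width = z * np.sqrt(p_hat * (1 - p_hat) / n)
    covered += (p_hat - half_width <= true_rate <= p_hat + half_width)

print(f"coverage: {covered / 1_000:.1%}")  # close to 95%
```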

9. What is an unbiased estimator? Give an example for a layperson. An unbiased estimator is a statistic whose expected value equals the population parameter it estimates; it neither systematically overshoots nor undershoots the truth. An example would be using the mean of a random sample of 10,000 voters in a political poll to estimate support within the total voting population.

No single estimate will be perfectly accurate, since that would require surveying the entire population, which is impractical when there are often millions of eligible respondents. Unbiasedness only guarantees that the estimator is correct on average across repeated samples.
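
A small simulation of the idea, with an assumed synthetic "population": each individual poll misses the population mean, but the average over many samples lands almost exactly on it:

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed population of 1,000,000 values (stand-ins for voters).
population = rng.normal(loc=50, scale=12, size=1_000_000)

# Draw 2,000 random samples of 1,000 each; the sample mean is unbiased:
# individual samples scatter around the truth, their average does not.
sample_means = [rng.choice(population, size=1_000).mean() for _ in range(2_000)]
print(f"population mean:         {population.mean():.3f}")
print(f"average of sample means: {np.mean(sample_means):.3f}")
```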

10. Say you flip a coin 10 times. It comes up tails 8 times and heads twice. Is this a fair coin? To determine whether a coin that lands on tails 8 times out of 10 flips is fair, we use the binomial distribution, which models the probability of outcomes in experiments with two possible results (heads or tails). For a fair coin, the probability of getting exactly 2 heads (and 8 tails) out of 10 flips is C(10,2) / 2^10 ≈ 4.4%, which is quite low, suggesting that the coin might not be fair.

Checking a different bias, such as a 40% chance of heads, gives a higher probability for this outcome (about 12%), indicating the coin could be biased towards tails. A formal hypothesis test, such as a binomial test, can analyze the coin’s fairness more rigorously.
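
For instance, scipy's exact binomial test gives the two-sided p-value directly:

```python
from scipy.stats import binomtest

# Two-sided test of H0: P(heads) = 0.5, given 2 heads in 10 flips.
result = binomtest(k=2, n=10, p=0.5, alternative="two-sided")
print(f"p-value = {result.pvalue:.3f}")  # ~0.109
```

At the conventional 0.05 level, a p-value of about 0.109 is not enough to reject fairness, though the result leans towards a tails bias.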

11. What is the Martingale strategy, and how might it be used in online advertising? The Martingale strategy is a gambling system in which one doubles the bet after each loss, so that the first win recovers all previous losses and yields a profit equal to the original stake.

This can be adapted and applied in scenarios involving sequential decision-making or hypothesis testing. One potential application is in the field of online advertising, where advertisers may use a Martingale-like approach to adjust bidding strategies for ad placements based on past performance.
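
A minimal simulation (bankroll, base bet, and round count are assumed values) shows the strategy's core weakness: a long losing streak eventually demands a bet the bankroll cannot cover:

```python
import random

def martingale(bankroll=1_000, base_bet=10, p_win=0.5, rounds=100):
    """Double the bet after each loss, reset after a win.
    Returns the final bankroll; play stops early if the next bet
    can't be covered."""
    bet = base_bet
    for _ in range(rounds):
        if bet > bankroll:       # losing streak outgrew the bankroll: stop
            break
        if random.random() < p_win:
            bankroll += bet
            bet = base_bet       # win: prior losses recovered, reset stake
        else:
            bankroll -= bet
            bet *= 2             # loss: double to chase the losses
    return bankroll

random.seed(0)
results = [martingale() for _ in range(10_000)]
print(f"median final bankroll: {sorted(results)[5_000]}")
print(f"runs ending below 100: {sum(r < 100 for r in results)}")
```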

Events Probability Interview Questions

The main object of study in probability is the event. An event is simply an outcome of some experiment, such as flipping a coin. Questions on events typically focus on games of chance and ask you to determine the probability of an event occurring.

These probability interview questions deal with independent and dependent events:

  1. What is the difference between independent and dependent events in probability? Provide an example for each. Independent events do not affect each other’s outcomes, while the outcome of a dependent event is affected by the events that came before it.

For example, if you were asked to toss a coin 100 times, a coin flip would be an independent event because the probability of each successive flip would remain 50-50. Getting heads on the first flip does not influence your chances of either a heads or tails on the second. Drawing a card from a deck (without replacement) would be a dependent event, because with each draw the deck gets smaller, affecting the outcome of each successive draw.
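
Both cases can be checked with exact fractions; the sketch below computes the probability of drawing two aces in a row without replacement, where the second draw depends on the first:

```python
from fractions import Fraction

# Independent: P(heads on flip 2) is 1/2 regardless of flip 1.
p_heads_after_heads = Fraction(1, 2)

# Dependent: drawing aces without replacement; the second draw's
# odds change because one ace and one card are already gone.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)
p_two_aces = p_first_ace * p_second_ace_given_first
print(p_two_aces)  # 1/221
```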

  2. A co-worker tells you he has two children, and that at least one is a boy. What is the probability that the co-worker has two boys? (Assume sex is assigned by the hospital at birth.) First, start by listing the possibilities. There are four equally likely outcomes for this family:

The first and second are boys (BB).
The first is a boy and the second is a girl (BG).
The first is a girl and the second is a boy (GB).
The first and second are girls (GG).

We can rule out GG, since we know at least one of the children is a boy. Of the three remaining equally likely outcomes, only one (BB) has two boys. Therefore, the probability is 1/3.
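
The enumeration can be verified in a few lines of Python:

```python
from itertools import product

# Enumerate the equally likely two-child families and condition on
# "at least one boy".
families = list(product("BG", repeat=2))          # BB, BG, GB, GG
at_least_one_boy = [f for f in families if "B" in f]
both_boys = [f for f in at_least_one_boy if f == ("B", "B")]
print(len(both_boys), "/", len(at_least_one_boy))  # 1 / 3
```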