Product Fail: Precision vs. Recall

This is a real-world example of the precision vs. recall tradeoff that tech teams make. In this case, I argue that Google Workspace either made the wrong decision or has a bigger bad-actor problem than we all realize.


I am not sure what is going on with Google Calendar and Gmail. I suspect the Google Workspace team has over-indexed on enterprise businesses. The problem they tried to solve was keeping people out of fake (spam) meetings. The problem the rest of the world is experiencing is probably compounded by a product team that lacks a good set of counter-metrics.

These failures have created a horrible user experience for consumers and solopreneurs. They are probably costing small businesses and individuals millions (maybe billions) of dollars collectively.

My Experience

On Sunday, I had a client schedule and then ghost me not once, not twice, but three times. The problem: anyone “out of organization” or “not in your address book” who sends you a calendar invite is marked as spam by default! My client didn’t realize the calendar invite she had signed up for was marked as spam and thus wasn’t added to her calendar. She was frustrated with me, thinking I had not properly responded to her booking.

When the email is marked as spam, the calendar invite doesn’t even show up as a tentative meeting on your calendar; it is stuck in your email, awaiting your acceptance and a manual “not spam” click.

In fact, most booking and marketing emails I get from small companies lately are marked as spam by default. I mark emails as not spam 10 to 30 times a day. I keep trying, and failing, to teach Google’s algorithms to play nice with my use cases. My poor client didn’t know about the issue because her life doesn’t depend on it. (At least not yet.)

Counter Metrics

Dear Google Workspace Team,

Please look at your metrics. Here are some suggested metrics to consider:

  • # of meetings never attended

  • # of emails marked not-spam

  • # of emails marked not-spam after the calendar invite’s scheduled time

    • a rough proxy for people slowly figuring out what went wrong

  • # of meetings rescheduled

  • # of Meet calls where a no-show occurs and the invite was marked as spam

  • # of repeat bookings for the same person in a week

  • # of complaints about meetings not showing up in calendar

  • # of small businesses canceling services

  • # of posts complaining about missed meetings

  • # of events from Calendly (and other services) that were never confirmed
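To make a few of these concrete, here is a minimal Python sketch of how a team might compute some counter-metrics from an invite log. The `InviteRecord` fields and the `counter_metrics` helper are hypothetical; I have no idea what Google actually logs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class InviteRecord:
    # Hypothetical log schema for one received calendar invite.
    flagged_spam: bool
    meeting_start: datetime
    marked_not_spam_at: Optional[datetime]  # None if the user never rescued it
    attended: bool


def counter_metrics(records: List[InviteRecord]) -> dict:
    """Compute a few counter-metrics for spam-flagged invites."""
    flagged = [r for r in records if r.flagged_spam]
    rescued = [r for r in flagged if r.marked_not_spam_at is not None]
    # Rescued only after the meeting had already started: the user found out too late.
    late = [r for r in rescued if r.marked_not_spam_at > r.meeting_start]
    return {
        "flagged": len(flagged),
        "rescue_rate": len(rescued) / len(flagged) if flagged else 0.0,
        "rescued_after_meeting_start": len(late),
        "no_shows_on_flagged": sum(1 for r in flagged if not r.attended),
    }
```

A high rescue rate on flagged invites is a direct signal that the filter is throwing away good invites, which is exactly what a precision-only dashboard would hide.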

How did this happen?

Google taught us to trust that if we got an invite in our email, it would automatically show up on our calendar. Now, almost every invite we get in our personal email is marked as potential spam, and we are missing meetings left and right.

How many people lost job opportunities or clients because the event never showed up on their calendar (but they knew they had seen it in their email)?

Google: How long must I keep tagging ~40% of my email as safe before you learn?


The Technical Tradeoff

In data science or engineering terms, this is a precision and recall tradeoff.

Let's review the concepts first:

  • Precision is the percentage of items in the result set that are relevant.

  • Recall is the percentage of relevant items that are returned in the result set.

Let’s baseline some other terms before diving in:

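  • True Positive (TP): a relevant item that is returned

  • True Negative (TN): an irrelevant item that is not returned

  • False Positive (FP): an irrelevant item that is returned

  • False Negative (FN): a relevant item that is not returned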

In equation terms,

  • Precision = TP/(TP+FP)

  • Recall = TP/(TP+FN)
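Here is a small Python sketch of those two formulas. I treat “safe invite that should land on your calendar” as the positive class, which matches how I frame the problem later on; the counts are made up for illustration, not measured.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # safe invites correctly delivered to the calendar
    fp: int  # spam invites wrongly delivered to the calendar
    fn: int  # safe invites wrongly sent to spam
    tn: int  # spam invites correctly sent to spam

    def precision(self) -> float:
        # Precision = TP / (TP + FP); returns 0.0 if nothing was delivered (a convention).
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        # Recall = TP / (TP + FN); returns 0.0 if there were no safe invites (a convention).
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0


# Hypothetical day: 100 safe invites and 20 spam invites.
day = Confusion(tp=60, fp=1, fn=40, tn=19)
print(f"precision={day.precision():.3f} recall={day.recall():.3f}")
```

With these made-up counts, precision is about 0.98 but recall is only 0.60: almost everything that reaches the calendar is legit, yet 4 in 10 good invites never show up.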

This illustration from Wikipedia has helped me for years.

Age-Old Battle: Precision vs. Recall

When we talk about algorithms in the real world, it is practically impossible to have 100% precision and 100% recall; you have to make a tradeoff. Most of us focus on one over the other. When I was at Google Search Ads, I would see teams index on precision one year and recall the next, yo-yo-ing back and forth as they got complaints about leaning too far one way or the other.

In fact, precision and recall are frequently at odds with one another. When you focus on improving recall, precision will suffer, and you will have questionable results. On the other hand, if you improve precision, recall suffers, and your results will omit perfectly good matches.
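A toy threshold sweep shows the tension. This Python sketch assumes the filter assigns each invite a hypothetical safety score and sends anything below a cutoff to spam; the scores and labels are invented.

```python
# Hypothetical (safety_score, is_safe) pairs; a higher score means "more likely safe".
INVITES = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, True), (0.60, True), (0.40, True),
    (0.85, False), (0.65, False), (0.50, False), (0.30, False), (0.20, False),
]


def precision_recall_at(threshold, invites):
    """Invites scoring >= threshold go to the calendar; the rest go to spam.
    The positive class is 'safe invite'."""
    tp = sum(1 for score, safe in invites if score >= threshold and safe)
    fp = sum(1 for score, safe in invites if score >= threshold and not safe)
    fn = sum(1 for score, safe in invites if score < threshold and safe)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


for t in (0.3, 0.5, 0.7, 0.9):
    p, r = precision_recall_at(t, INVITES)
    print(f"threshold={t:.1f}  precision={p:.3f}  recall={r:.3f}")
```

On this toy data, precision climbs from 0.600 to 1.000 as the threshold rises, while recall falls from 1.000 to 0.333. A strict filter delivers “nothing but the truth” and drops most of the good invites along the way.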

My favorite quote on this comes from Daniel Tunkelang:

Recall is about the whole truth, while precision is about nothing but truth.

Applying in the Real World

To reduce the impact of spammy meetings, the Gmail and Google Calendar teams seem to have indexed on precision over recall. Or they have focused on recall, but their ability to detect false negatives is woefully inadequate.

Gmail tags only a very small percentage of the meeting invites I receive as safe (non-spam). They have done this to ensure I see “nothing but the truth.” Or their ability to know what is true is so weak that they think they have the whole truth when, in fact, they do not.

For enterprise companies that have struggled with security breaches, this makes perfect sense. But Gmail and Google Calendar are used by billions of people in situations where the rules of enterprise don’t apply. Collectively, Google is probably costing small businesses more than it is saving enterprises at this point.

Remember, Google Workspace is working at scale. I don’t truly know how many false positives or true negatives the system sees every day. They could be focused on slowly improving precision, but spam bots and bad actors make it nearly impossible to keep up.

Quick Solution

One simple heuristic Google could apply would be to partner with major scheduling companies like Calendly and trust their invites by default for individual accounts. Please, Google Workspace, consider more of a focus on recall, or at least build a heuristic that saves most of the world from missing crucial appointments, meetings, and gatherings. Maybe have a team focus on labeling all relevant items. Your battle against spam is getting out of hand in the world of calendar invites.

Or maybe consider enabling immediate filtering based on my not-spam reports. I submit at least 3 not-spam reports a day, some days as many as 30. If you can’t fix the algorithms quickly, try a hack: automatically create filters for me from those reports.
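Here is a rough Python sketch of what that hack could look like, combining a seeded allowlist (the scheduling-partner idea) with filters learned from not-spam reports. Everything here is hypothetical: `scheduler.example` stands in for a partnered scheduling service’s domain, the three-report threshold is arbitrary, and none of this is a real Gmail API.

```python
from collections import Counter

# Hypothetical stand-in for a partnered scheduling service's sending domain.
SEED_DOMAINS = {"scheduler.example"}
# Hypothetical: how many "not spam" reports before a sender is auto-allowed.
AUTO_ALLOW_AFTER = 3


class InviteFilter:
    """Sketch of an auto-filter layered on top of an existing spam classifier."""

    def __init__(self) -> None:
        self.allowed_domains = set(SEED_DOMAINS)
        self.allowed_senders = set()
        self.reports = Counter()

    @staticmethod
    def _normalize(sender: str) -> str:
        return sender.strip().lower()

    def mark_not_spam(self, sender: str) -> None:
        """Record a user's 'not spam' report; auto-allow the sender after enough reports."""
        addr = self._normalize(sender)
        self.reports[addr] += 1
        if self.reports[addr] >= AUTO_ALLOW_AFTER:
            self.allowed_senders.add(addr)  # acts like an auto-created filter

    def deliver_to_calendar(self, sender: str, classifier_says_spam: bool) -> bool:
        """Allowlisted senders bypass the spam verdict; everyone else defers to it."""
        addr = self._normalize(sender)
        domain = addr.rsplit("@", 1)[-1]
        if addr in self.allowed_senders or domain in self.allowed_domains:
            return True
        return not classifier_says_spam
```

Learned filters are keyed on the full sender address rather than the domain, so three reports about one person at a big consumer email provider don’t end up trusting everyone on that provider.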

Or, perhaps, give me a pop-up or email once a day warning me that X invites in my email have been marked as spam and I need to go and review them to make sure I don’t miss a crucial meeting.
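A daily digest like that is simple to sketch. This Python version assumes a hypothetical `Message` record with a `has_invite` flag; it is not how Gmail actually stores mail.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Message:
    sender: str
    received: datetime
    in_spam: bool
    has_invite: bool  # hypothetical flag: the message carries a calendar invite


def daily_digest(messages: List[Message], now: datetime) -> Optional[str]:
    """Summarize spam-flagged invites from the last 24 hours, or return None if there are none."""
    cutoff = now - timedelta(days=1)
    flagged = [m for m in messages if m.in_spam and m.has_invite and m.received >= cutoff]
    if not flagged:
        return None
    lines = [f"{len(flagged)} invite(s) were marked as spam in the last 24 hours. "
             "Review them so you don't miss a meeting:"]
    lines += [f"  - {m.sender} ({m.received:%Y-%m-%d %H:%M})" for m in flagged]
    return "\n".join(lines)
```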


The Impact

Note: I lost roughly 10 job leads because recruiters reached out, but the emails went to spam. I use a forwarded university email for my applications. It has been nearly 12 months, and the only thing that fixed it was a brute-force filter on that forwarded email.

In Their Defense

The real culprit here is bad actors trying to abuse email and calendar services. But with the race to improve AI, how are we still so far behind on meeting invites? Maybe we are even regressing on the precision-recall balance for simple calendar invites. Really, how many non-enterprise users are targeted by malicious calendar invites?

If it really is that bad, I would love a pop-up statistic every time I mark something as not spam, thanking me for helping fight X bad emails per day.

Resources

Precision & Recall Explained using Fruit

Precision & Recall in Search

Nothing but the Truth
