Metrics: Measure What Your Team Can Impact

Today I did a mock interview where I was reminded of the importance of developing metrics that matter to both the goal and the team.

My Bias: Too many candidates jump to engagement or DAUs as their default metric, which has pushed me to over-index on indicators of quality. (Think DAU where “active” is defined as more than just opening the app.) I also worked on optimizing Google Search, so a bare click on a link was rarely a good metric in my eyes, though sometimes it could be. On a mature product, clicking on a suggestion was something we looked at, but we would never make it our decision metric. We needed to know whether the user stayed on the link for more than a split second.

Today I was working on a thought exercise, how to improve the Apple App Store, and I found myself so focused on quality that I overcomplicated the measure. My engineering team would have complained that I missed the point of our efforts.

Where I went wrong

I set my goal as improving search, or rather improving the user’s ability to find what they are looking for, but I measured for the perfect outcome: finding the right app, one the user kept using going forward.

An Exercise for You

With that goal, “Improve Search,” which metric do you think is most appropriate (or wrong), and why? (I normally emphasize the importance of the three parts of a metric: count, action, and time period. For this exercise, focus on count and action.)

(I have used the accordion format so you can see my thoughts AFTER you think through the answer for yourself).

  • # of suggestions clicked: Some might argue this is a good initial metric.

    It is not my favorite, because it seems too easy to game. Particularly if you are adding a new feature, the novelty effect might drive high clicks even though people don’t like what they find when they click through.

    We should watch it, but I wouldn’t make it my top or decision metric for a feature release.

    A case could be made for it, but only alongside a set of counter-metrics to make sure the suggestions are good.

  • # of suggested apps downloaded: This is the metric I would go for if I owned the goal of improving Search in the Apple App Store. If we provided suggestions compelling enough for the user to download them, that is a great indicator that we are suggesting something they like.

    I would also want to look at the hit rate: what percentage of presented suggestions are downloaded? And I might want to know the # of searches per session. If I searched 5 times before I downloaded something, there is a real problem.

  • Downloads still used a week later: This is a great metric if I am focused on the problem of too many downloaded apps going unused. If I were focused on optimizing the accuracy and/or quality of recommendations, this would be a decision metric.

    In the case of improving search, this could be a secondary metric as I look to understand quality over the longer term, but it wouldn’t be something my team could goal on. Too many factors affect why people stop using apps, and in this case my goal is to improve the search experience.

  • % of provided suggestions clicked: I would rarely make a percentage my key decision metric. There is the old numerator/denominator problem: if the denominator shrinks, you appear to be improving your impact when in reality something else is moving your metric.

    Yes, I definitely want to look at this metric, along with its core elements: # of suggestions clicked over # of suggestions provided during a session. But this is not my primary decision metric.

  • % of suggestions downloaded: See the note above on percentages.

    With that in mind, I definitely want to look at this metric, along with its core elements:

    # of suggestions downloaded over # of suggestions provided

    as well as

    # of suggestions downloaded over # of suggestions clicked

    during a session. But neither way of calculating this percentage would be my primary decision metric.

    I would look at the two percentages to see how much they differ. If the gap is huge, I am probably not very accurate in my suggestions. If too many suggestions get clicked but no download follows, the user is having to weed through suggestions too much. (A rough sketch of how these ratios might be computed follows this list.)
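To make those counts and ratios concrete, here is a minimal Python sketch over made-up per-session numbers. The record fields (shown, clicked, downloaded, searches) are illustrative assumptions on my part, not an actual App Store logging schema.

```python
# Hypothetical per-session search-suggestion data (made-up numbers).
sessions = [
    {"shown": 10, "clicked": 4, "downloaded": 1, "searches": 1},
    {"shown": 12, "clicked": 6, "downloaded": 0, "searches": 5},
    {"shown": 8,  "clicked": 2, "downloaded": 2, "searches": 2},
]

shown = sum(s["shown"] for s in sessions)
clicked = sum(s["clicked"] for s in sessions)
downloaded = sum(s["downloaded"] for s in sessions)

# Count metric: suggested-app downloads (my preferred decision metric above).
print("suggested-app downloads:", downloaded)

# Hit rate: share of presented suggestions that led to a download.
print("downloads / suggestions shown:", downloaded / shown)

# Downloads per click: if clicks are high but this is low, users are
# weeding through suggestions without finding what they want.
print("downloads / suggestions clicked:", downloaded / clicked)

# Searches per session: repeated searching before a download signals friction.
print("avg searches per session:", sum(s["searches"] for s in sessions) / len(sessions))

# Numerator/denominator caveat: if fewer suggestions are shown next quarter,
# downloads / shown can rise even though downloads themselves are flat.
```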

How did you do? Do you agree with my logic? Do you have a strong counter to my suggested metrics?

In most execution interviews, being able to explain the why is half the battle. Who you are interviewing with can also shape how your answer lands. Does your interviewer focus on working at scale, so simpler is better? Or does your interviewer spend their days focused on the tail user, accounting for null results and bad experiences?

Some tips to keep in mind:

  • Don’t overcomplicate your metric

  • Find something close to the action you are trying to drive

  • Think about how the engineering team would feel about the metric

  • Are there external factors that would muddy the waters of your metric?

    • For example: If I am improving the search experience, am I responsible for apps that fail to meet their promise?

  • Think about how your metric can be gamed

  • Balance early actions taken with quality measures.

  • Think about when initial action vs. quality action is the best metric.

    • Think downloads vs. downloads still used a week later (see the sketch after this list).

  • What is the problem you are solving based on the user pain point you identified?

  • Are you trying to improve the top or the bottom of the user funnel?

    • Measure accordingly.

  • What is the stage of the product?

  • What are current user trends and expectations?
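As a rough illustration of that initial-action vs. quality-action contrast, here is a minimal sketch over hypothetical download records; the seven-day window and the record fields are assumptions made for the example.

```python
from datetime import date, timedelta

# Hypothetical records: (app_id, download date, last date the app was used).
downloads = [
    ("app_a", date(2024, 3, 1), date(2024, 3, 12)),
    ("app_b", date(2024, 3, 1), date(2024, 3, 2)),
    ("app_c", date(2024, 3, 2), date(2024, 3, 10)),
]

total_downloads = len(downloads)

# Quality action: the app was still being used at least a week after download.
retained = sum(
    1
    for _, downloaded_on, last_used in downloads
    if last_used >= downloaded_on + timedelta(days=7)
)

print("downloads:", total_downloads)             # initial action
print("downloads used a week later:", retained)  # quality action
print("7-day usage rate:", retained / total_downloads)
```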
