by Matt Macri-Waller

In my role, I often get asked questions like "surely with all of the data you have you can just predict what employees can or will do". When I was last asked about this, the person asking used the word "should" instead of "can" or "will", and it got me thinking about the role and legitimacy of recommendation engines within the employee benefits market.

Now, we all live with recommendations and recommendation engines our entire lives: from the friends suggested by Facebook, to "people like me" on LinkedIn, to music on Spotify, to what "people like you" bought on Amazon. The root of most of these recommendation engines is a technique called 'collaborative filtering'. Collaborative filtering is built on the underlying assumption that if person A has the same opinion as person B on one issue, A is more likely to share B's opinion on a different issue than a randomly chosen person is. The core idea is that this approach produces more accurate recommendations than traditional methods.

Typically, a collaborative filtering system works in the following way: a user (in this instance, an employee) expresses their preferences by reviewing and selecting items (in this example, benefits). These selections are treated as an approximate representation of the employee's interest in those benefits. The system then matches one employee's ratings against others' to find the people with the most 'similar' tastes, and recommends benefits that those similar employees have reviewed or selected.

There are many mathematical techniques that can be used to create recommendations (cosine similarity, for example) but, crucially, an algorithm is created and the system uses mathematics to work out the popularity of something across a dataset, population or audience. Many algorithms have progressed significantly from these origins, but a number (especially the ones I have seen in employee benefits) are still rooted in popularity alone and do not take account of any historic bias. You might think that's fine: popularity and "people like me" is exactly what you want your employees to see. But when you start to analyse the basis of that recommendation, it throws up some questions (there is a small code sketch of the mechanism after the list):
- Is this based on your employees only?
- Is the data gathered across everyone your provider manages?
- Is it gathered from everyone, everywhere?
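To make the mechanism concrete, here is a minimal sketch of user-based collaborative filtering using cosine similarity. The employees, benefits and take-up values are invented for illustration; a real engine would work on far larger, sparser matrices.

```python
import math

# Rows: employees; columns: benefits. 1 = selected, 0 = not selected.
# All names and values below are invented for illustration.
benefits = ["dental", "gym", "life_cover", "cycle_scheme"]
take_up = {
    "alice": [1, 1, 0, 0],
    "bob":   [1, 1, 0, 1],
    "carol": [0, 0, 1, 0],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two selection vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(target, k=1):
    """Recommend benefits that the k most similar employees selected
    and the target employee has not."""
    target_vec = take_up[target]
    neighbours = sorted(
        (name for name in take_up if name != target),
        key=lambda name: cosine_similarity(target_vec, take_up[name]),
        reverse=True,
    )[:k]
    scores = {}
    for name in neighbours:
        for i, selected in enumerate(take_up[name]):
            if selected and not target_vec[i]:
                scores[benefits[i]] = scores.get(benefits[i], 0) + 1
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("alice"))  # bob is most similar -> ['cycle_scheme']
```

Notice that the recommendation is driven entirely by what other people selected: the engine has no notion of why they selected it, which is exactly where the concerns below come in.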
All three will have different outcomes, and all will be susceptible to different biases or indeed manipulation (product placement, UX, promotion, existing ownership). Let me give you a few examples [some from our market and some from the wider HR market] of the types of things that concern me:

Provider manipulation

There has been much work on behavioural psychology and buying behaviour, particularly by banks and insurance companies, which have large teams of people working to understand both why we buy and how they can make us spend more with them. From screen placement, to user experience (UX), to nudging, to special offers or incentives: each will be trying to manipulate, persuade, mould or cajole us into their desired outcome. On top of this, the majority of employees instinctively trust their employer to have filtered the offerings put in front of them (note: there is some variation internationally here). All of these factors combined make employees an interesting candidate for manipulation. If you are working with a broker, or anyone who makes revenue on the products offered to employees, do you care about the manipulation techniques used? Or which products are pushed harder than others? You may say no, but I have seen products with as high as 40% introductory commission – would that change your mind as to whether their 'recommendation' was the best option for employees?

Reinforcing existing bias

Now, the data used to produce the recommendation is arguably the most important part of any collaborative filtering algorithm, machine learning model or AI. You could crudely aim to use take-up data from employee submissions and enrolments. As we all know, this data varies wildly by type of organisation, by affordability and – most importantly – by company funding. As most algorithms either don't have enough data or aren't clever enough to remove these distortion factors, would that lead you to question the accuracy of a recommendation? The average difference in take-up between a funded and a non-funded benefit is 92%, so would you be happy with a recommendation that could be over 90% inaccurate?
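To see how far that distortion can go, here is a toy illustration. The take-up rates are invented, chosen only to mirror the roughly 92-point funded/unfunded gap quoted above; the point is that a popularity ranking built on raw take-up mostly reflects how many companies in the source data happen to fund each benefit, not what employees actually prefer.

```python
# benefit: (take-up when employer-funded, take-up when employee-paid)
# Invented rates, chosen to mirror the ~92-point gap quoted in the text.
TAKE_UP = {
    "life_cover": (0.97, 0.05),
    "dental":     (0.90, 0.08),
}

def observed_popularity(benefit: str, share_funded: float) -> float:
    """Blended take-up rate, given the share of companies in the
    source data that fund the benefit."""
    funded, unfunded = TAKE_UP[benefit]
    return share_funded * funded + (1 - share_funded) * unfunded

# The same benefit looks dramatically more or less 'popular' depending
# only on how many companies in the dataset fund it, so a ranking built
# on raw take-up mostly measures funding, not employee preference.
for share in (0.1, 0.5, 0.9):
    rates = {b: round(observed_popularity(b, share), 2) for b in TAKE_UP}
    print(f"{share:.0%} of companies funding: {rates}")
```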
Volume of data for recommendations

As I mentioned above, data is the most critical part of any recommendation engine. So when does collaborative filtering become effective?

On the item side, you will likely have a few "best sellers" purchased by lots of users, a "middle section" of moderately popular items and a "long tail" of rarely selected items. You do need all three: partly because that is the usual behaviour (and you want your dataset to realistically reflect actual behaviour) and partly because the algorithms use them (see the sketch after this list):
- Best sellers are not really good candidates for recommendation (people already know them), but they work as connectors, helping to link up the item graph. Without them, the engine will typically partition the item space into isolated silos. Besides, recommending them often improves users' perceived trust in the system.
- Middle items are probably the best source of recommendation candidates: they are the ones a collaborative filtering engine can extract with the highest chance of success.
- Long tail items are more difficult to recommend, but if the engine can manage it, they are a good source of serendipity.
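Here is a minimal sketch of that three-way segmentation, with invented selection counts and crude percentage cut-offs; a real engine would tune the thresholds to its own popularity distribution.

```python
from collections import Counter

# One entry per employee-benefit selection (invented data).
selections = (
    ["dental"] * 50 + ["gym"] * 45                   # best sellers
    + ["life_cover"] * 12 + ["eye_care"] * 9         # middle section
    + ["pet_insurance"] * 2 + ["legal_advice"] * 1   # long tail
)

counts = Counter(selections)
total_users = 100  # assumed population size

def segment(count: int) -> str:
    """Crude share-of-population cut-offs; real systems tune these."""
    share = count / total_users
    if share >= 0.30:
        return "best seller"  # connectors; users already know them
    if share >= 0.05:
        return "middle"       # best recommendation candidates
    return "long tail"        # hard to recommend; source of serendipity

for benefit, count in counts.most_common():
    print(f"{benefit:15s} {count:3d}  {segment(count)}")
```

The segmentation itself is simple; the hard part, as this article has argued, is whether the take-up counts feeding it reflect genuine employee preference or the funding, placement and promotion decisions baked into the source data.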