Building a simplified Recommendation Engine in R

Have you ever been to a beverage store and not known what to order? I have. Remembering how Amazon's recommendation system works, I wondered whether I could build a similar solution that recommends drinks based on a drink preference questionnaire. To create a rapid prototype, I used dummy user preference data and mapped it to our curated product attribute data to generate personalized drink recommendations. For this exercise, I created 8 different drink types and 19 drinks.

R is my preferred scripting language for rapid machine learning (ML) prototyping, so I used existing R libraries to find similarities among the available drinks based on their product attributes. To better understand the interconnectivity between the drink attributes and the menu drinks, I transformed the data and constructed a network graph to visualize it.
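To make the transformation step concrete, here is a minimal sketch of how a drink-by-attribute table could be reshaped into an edge list for a network graph. The drink names, attribute names, and values are hypothetical placeholders, not the data from this exercise, and the use of the igraph package for plotting is an assumption (the plot is skipped if igraph is not installed).

```r
# Hypothetical drink-by-attribute matrix (1 = drink has the attribute).
# Names and values are made up for illustration only.
attrs <- matrix(
  c(1, 0, 1, 0,
    0, 1, 0, 1,
    1, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("Drink A", "Drink B", "Drink C"),
    c("hot", "cold", "coffee", "floral")
  )
)

# Reshape the wide matrix into a long edge list:
# one row per (drink, attribute) pair where the attribute is present.
idx <- which(attrs == 1, arr.ind = TRUE)
edges <- data.frame(
  drink     = rownames(attrs)[idx[, "row"]],
  attribute = colnames(attrs)[idx[, "col"]],
  stringsAsFactors = FALSE
)
print(edges)

# Draw the drink-attribute network only if igraph is available (assumed package).
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges, directed = FALSE)
  plot(g, vertex.label.cex = 0.8)
}
```

Each edge links a drink to one of its attributes, so drinks that share many attributes end up close together in the drawn graph.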

The network visualization shows the complexity of the data. Another way to visualize the relationships is to look for hidden patterns among the drink attributes for our imaginary coffee shop. Using Principal Component Analysis (PCA), a statistical technique, we found that over 65% of the variation in the data can be explained by 2 themes (principal components).
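As a rough illustration of the PCA step, the sketch below runs base R's prcomp on a randomly generated 19 x 8 binary attribute matrix. The data is simulated, so the variance shares it prints will not match the 65% figure above; the attribute names are placeholders, and leaving the columns unscaled is my assumption, not necessarily the setting used in this exercise.

```r
set.seed(42)

# Simulated stand-in for the real data: 19 drinks x 8 binary attributes.
attrs <- matrix(
  rbinom(19 * 8, size = 1, prob = 0.4),
  nrow = 19,
  dimnames = list(paste("Drink", 1:19), paste0("attr", 1:8))
)

# Center the columns; they are left unscaled here because all attributes
# share the same 0/1 scale (an assumption for this sketch).
pca <- prcomp(attrs, center = TRUE, scale. = FALSE)

# Share of total variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Cumulative share explained by the first two components (the two "themes").
print(cumsum(var_explained)[2])

# Loadings show how strongly each attribute contributes to PC1 and PC2.
print(round(pca$rotation[, 1:2], 2))
```

Calling biplot(pca) afterwards draws the drinks and attribute loadings on the first two components, which is one way to produce a plot of the themes.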

As the plot above shows, for the first theme (PC1), coffee drinks are less likely to be special drinks on the menu, whereas matcha is closely associated with being a special drink. For the second theme (PC2), the floral attribute appears to be the distinguishing element: the higher the PC2 value, the more likely a drink has a floral flavor, and the lower the PC2 value, the more likely the drink has a fruity or nutty flavor. As a next step, we can use a readily available R library function to generate a dissimilarity matrix and simulate a user rating their drink preferences on a set of selected products that are relatively representative of the themes we uncovered.
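A dissimilarity matrix of this kind can be sketched with base R's dist function. The post does not name the library function that was actually used, so treat the choice of dist with method = "binary" (a Jaccard-style distance on 0/1 attributes) as an assumption; cluster::daisy is another common option. The drinks and attributes below are hypothetical.

```r
# Hypothetical drink-by-attribute matrix (1 = drink has the attribute).
attrs <- matrix(
  c(1, 0, 1, 0,
    1, 0, 1, 1,
    0, 1, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("Drink A", "Drink B", "Drink C"),
    c("hot", "cold", "coffee", "floral")
  )
)

# Binary distance: among attributes present in at least one of the two drinks,
# the proportion that are present in only one of them.
d <- dist(attrs, method = "binary")

# Convert to a full square matrix so it can be indexed by drink name.
dissim <- as.matrix(d)
print(round(dissim, 2))

# A simple similarity score is one minus the dissimilarity.
sim <- 1 - dissim
print(round(sim, 2))
```

Here Drink A and Drink B differ only in the floral attribute, so their dissimilarity is low, while Drink A and Drink C share no attributes and are maximally dissimilar.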

In the recommendation model, I placed more importance on two attributes: whether a drink is hot or cold, and whether or not it is a coffee drink. The final rating is weighted based on the user's rated drinks. The final recommendations are Double Rich Matcha (100% similarity), Blue Butterfly (90% similarity), and Hot Chocolate, Ice Double Rich Matcha, and Roast Almond Milk (80% similarity each).
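One possible way to combine attribute weights with user ratings is sketched below. This is not the exact model behind the numbers above: the weights, the weighted_sim helper, the drinks, and the ratings are all hypothetical, chosen only to show the idea of up-weighting the hot/cold and coffee attributes and then averaging similarities by the user's ratings.

```r
# Hypothetical drink-by-attribute matrix (1 = drink has the attribute).
attrs <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    0, 1, 0,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("Drink A", "Drink B", "Drink C", "Drink D"),
    c("hot", "coffee", "floral")
  )
)

# Assumed weights: hot/cold and coffee/non-coffee count double.
w <- c(hot = 2, coffee = 2, floral = 1)

# Hypothetical helper: weighted share of attributes on which two drinks agree.
weighted_sim <- function(a, b, w) sum(w * (a == b)) / sum(w)

# Simulated user ratings (1 to 5) for a small set of sample drinks.
ratings <- c("Drink A" = 5, "Drink C" = 2)

# Score each unrated drink as the rating-weighted average of its
# similarity to the rated drinks.
candidates <- setdiff(rownames(attrs), names(ratings))
scores <- sapply(candidates, function(cand) {
  sims <- sapply(names(ratings), function(r) {
    weighted_sim(attrs[cand, ], attrs[r, ], w)
  })
  sum(ratings * sims) / sum(ratings)
})

# Highest score first: these are the recommendations.
ranked <- sort(scores, decreasing = TRUE)
print(round(ranked, 3))
```

In this toy example Drink B ranks first because it shares the heavily weighted hot attribute with the drink the user rated highest.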

The next step would be to build out an end-to-end ML process for a production system. Until next time...