
Experience Extended | Decision Option Generator – A Meta Learning Approach in NAVIK

Reinforcement learning (RL) is a branch of machine learning in which an agent learns by taking actions and observing the outcomes of its interactions with an environment, with the goal of maximizing a cumulative reward. Because it can learn on the fly and self-correct, RL is finding applications in a variety of fields where these properties give it an edge over traditional machine learning algorithms.
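To make that loop concrete, here is a minimal epsilon-greedy bandit in Python. It is only a toy illustration of the idea above, not anything from NAVIK, and the reward probabilities and epsilon value are made-up choices.

import random

# Toy environment: three actions with unknown reward probabilities.
true_reward_prob = [0.3, 0.5, 0.8]
estimates = [0.0, 0.0, 0.0]   # the agent's running value estimates
counts = [0, 0, 0]
epsilon, total_reward = 0.1, 0

for step in range(5000):
    # Explore occasionally, otherwise exploit the best estimate so far.
    if random.random() < epsilon:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: estimates[a])
    reward = 1 if random.random() < true_reward_prob[action] else 0

    # Incremental update of the action-value estimate (the "auto-correct" step).
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]
    total_reward += reward

print("learned estimates:", [round(e, 2) for e in estimates])
print("cumulative reward:", total_reward)

Over many interactions the agent's estimates converge towards the true reward probabilities and its cumulative reward grows, which is exactly the maximize-cumulative-reward behaviour described above.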

NAVIK AI’s Decision Option Generator

A host of individual models, covering churn propensity, collaborative filtering, purchase frequency, and more, are consolidated under the Decision Option Generator (DOG), the NAVIK AI suite's proprietary algorithm [1]. This algorithm builds the underlying recommendation engine for NAVIK MarketingAI and NAVIK SalesAI.
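As a rough illustration of what "consolidated" means here, the sketch below combines a few constituent model scores into a single recommendation score with a weighted average. The weights and the averaging rule are assumptions made for the example; DOG's actual consolidation logic is proprietary and not described in this post.

# Illustrative only: consolidating constituent model outputs with a
# weighted average. The weights are assumptions, not DOG's logic.
constituent_scores = {
    "churn_propensity": 0.20,          # scores assumed rescaled so that higher = better
    "collaborative_filtering": 0.75,
    "purchase_frequency": 0.60,
}
weights = {
    "churn_propensity": 0.2,
    "collaborative_filtering": 0.5,
    "purchase_frequency": 0.3,
}

consolidated_recommendation_score = sum(
    weights[name] * score for name, score in constituent_scores.items()
)
print(f"Consolidated Recommendation Score: {consolidated_recommendation_score:.3f}")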

To incorporate custom business rules into the model outputs, DOG leverages meta learning, which keeps anomalies in the final recommendations to a minimum.

Meta learning, in its true sense, refers to training a model on a set of related tasks, each with a limited number of data points; when presented with a similar new task, the model draws on what it learned from the previous tasks rather than being trained from scratch [2]. Cognitively, the human mind works in a similar way: it learns from different experiences and connects those learnings to a new concept.
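The sketch below illustrates this with a Reptile-style meta-learning loop on toy linear-regression tasks: the meta-parameters are trained across many related tasks that each have only a handful of data points, so a new, similar task can be adapted to in a few gradient steps rather than from scratch. The task distribution, step sizes, and model are made-up choices for the example and are not NAVIK's implementation.

import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Each task is y = w*x + b with task-specific w and b (related tasks)."""
    w, b = rng.normal(2.0, 0.5), rng.normal(1.0, 0.5)
    x = rng.uniform(-1, 1, size=10)            # limited data per task
    return x, w * x + b

def inner_fit(theta, x, y, lr=0.1, steps=20):
    """A few gradient steps of least squares, starting from the meta-initialisation."""
    w, b = theta
    for _ in range(steps):
        err = (w * x + b) - y
        w -= lr * np.mean(err * x)
        b -= lr * np.mean(err)
    return np.array([w, b])

theta = np.zeros(2)                            # meta-parameters (shared initialisation)
for _ in range(500):                           # meta-training across related tasks
    x, y = sample_task()
    adapted = inner_fit(theta, x, y)
    theta += 0.1 * (adapted - theta)           # Reptile-style meta-update

# A new, similar task: a few adaptation steps suffice instead of training from scratch.
x_new, y_new = sample_task()
print("meta-initialisation:", theta.round(2))
print("adapted parameters:", inner_fit(theta, x_new, y_new, steps=5).round(2))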

What Does Meta Learning Address?

For NAVIK, meta learning addresses a slew of factors: turbulent market dynamics, custom business needs that change over time, and the post-facto modification of recommendations by users who are uncomfortable with them. Thanks to meta learning, NAVIK products continually learn from these factors and have the flexibility to modify the recommendations in the next model refresh cycle.

Appreciating and Depreciating Reinforcement Multipliers

DOG implements meta learning through a reinforcement multiplier: a term derived from custom business rules that is multiplied with the final recommendation score.

Based on the ratio of wins to pitches across various opportunities, two specific mathematical terms, known as the appreciating and depreciating reinforcement multipliers (ARM and DRM), are multiplied with the outcome of the corresponding constituent DOG model (e.g., the Contact Probability Score from Contact Lead Scoring) or with the Consolidated Recommendation Score. The values of ARM and DRM can be refreshed every week based on the win rates of the contact-product combinations.
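The post does not give the exact formulas, so the following Python sketch only illustrates the mechanism under stated assumptions: one multiplier per contact-product combination that appreciates (> 1) when the weekly win rate beats a baseline and depreciates (< 1) when it falls short, applied to a constituent model outcome such as the Contact Probability Score. The baseline, the strength parameter, and the function names are hypothetical.

import pandas as pd

def reinforcement_multiplier(wins: int, pitches: int,
                             baseline: float = 0.5,
                             strength: float = 0.3) -> float:
    """Return > 1 (ARM) when the win rate beats the baseline, < 1 (DRM) otherwise."""
    if pitches == 0:
        return 1.0                             # no evidence yet: leave the score unchanged
    win_rate = wins / pitches
    return 1.0 + strength * (win_rate - baseline)

# Weekly refresh: one multiplier per contact-product combination.
history = pd.DataFrame({
    "contact_product": ["C1-P1", "C1-P2", "C2-P1"],
    "wins": [8, 1, 0],
    "pitches": [10, 9, 0],
})
history["multiplier"] = [
    reinforcement_multiplier(w, p) for w, p in zip(history["wins"], history["pitches"])
]
print(history)

# Apply the multiplier to a constituent model outcome, e.g. a contact probability score.
contact_probability_score = 0.62
adjusted = contact_probability_score * history.loc[0, "multiplier"]
print(f"adjusted score for C1-P1: {adjusted:.3f}")

In this toy version a single function yields both the appreciating and the depreciating multiplier; a real deployment would encode the custom business rules and thresholds the article refers to.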

References

[1] Meta Learning in NAVIK Products – Recursive Reinforcement

[2] Ravichandiran, S. (2018). Hands-On Meta Learning with Python. Packt Publishing.

Authored by Nibedita Dutta, Data Scientist at Absolutdata