This post was written by Valery Ayala, Data Engineer at Oscar Insurance.
Oscar is using technology and design to make healthcare simple, intuitive, and human. As an insurance company, we support our members by presenting them with the information they need to make the best possible decisions about their care. As we think about how to provide the best possible experience for our members, one of our primary goals is to help them select the “right” healthcare provider for themselves.
But what does it mean for a provider to be the right fit for a member? A review of patient choice studies echoes what we all intuitively know – it’s all about the “complex interplay between patient and provider characteristics.” To us, this means letting our members search for providers across a wide range of features, including specialization, distance from home or work, and patient panel demographics.
As we set out to build Oscar’s provider search, we knew that the industry-standard approach, a simplistic search based largely on a provider’s static attributes (e.g. years of experience, education, hospital affiliations, distance), had to be replaced with a more dynamic, personalized experience.
We were now faced with two challenges:
1. How could we efficiently adjust our sorting algorithm at query time to account for what we know about the member’s expressed circumstances and preferences?
2. How could we systematically measure the impact that each of our sorting signals has on the final search results a member sees? Tuning search results by eyeballing them is inevitably a recipe for disaster.
Based on our existing infrastructure, we chose Solr, an open-source search platform built on Apache Lucene. Solr supports custom relevance-score calculations through function queries and exposes the intermediate steps of each score calculation via structured explain output (see the debugQuery and debug.explain.structured parameters). On top of this, we built a custom analysis framework in Python, using matplotlib, pandas, numpy, and scikit-learn to visualize and measure the impact of each sorting signal.
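As a rough illustration of how such explanations can be mined, here is a minimal Python sketch that walks a structured explain tree (Solr’s nested value/description/details format) and computes each signal’s share of the final score. The sample tree and signal names below are invented for illustration; they are not our production schema:

```python
# Sketch: extract per-signal score contributions from a Solr
# debug.explain.structured tree. Each node has "value", "description",
# and a list of child nodes under "details".

def signal_contributions(explain):
    """Map each top-level scoring clause to its contribution to the final score."""
    details = explain.get("details", [])
    if not details:
        return {explain["description"]: explain["value"]}
    # For a "sum of:" node, each child is treated as an independent signal.
    return {d["description"]: d["value"] for d in details}

# Invented sample explain tree with hypothetical signal names.
sample = {
    "value": 7.5,
    "description": "sum of:",
    "details": [
        {"value": 4.0, "description": "distance_boost", "details": []},
        {"value": 2.5, "description": "specialty_match", "details": []},
        {"value": 1.0, "description": "panel_similarity", "details": []},
    ],
}

contributions = signal_contributions(sample)
total = sum(contributions.values())
# Proportional contribution of each signal to the final relevancy score.
proportions = {name: value / total for name, value in contributions.items()}
```

Collected across many test queries, dictionaries like these are what feed the aggregate visualizations and summary statistics described below.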
This allowed us to turn thousands of text-based explanations (like the one you see below) into visualizations and summary statistics.
This graph is just one of the visualizations we use to profile the impact of changes to our sorting algorithm. Specifically, it shows the proportional contributions of each signal to the final relevancy score for a series of test runs. These integration tests are deliberately designed to cover the high-dimensional space of search inputs (e.g. location density, specialty, condition, patient demographics, etc.). We use this to get an overview of the significance of each algorithm input across query types for our population. For example, when adding new sorting features, this visualization quickly reveals if the new signal unintentionally dominates the results for a swath of members.
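The aggregation behind that kind of graph can be sketched in a few lines of pandas. This assumes per-run signal scores have already been extracted from the explain output; the run names, signal names, and numbers here are made up for illustration:

```python
import pandas as pd

# Hypothetical raw signal scores for three test runs, one row per run.
scores = pd.DataFrame(
    {
        "distance_boost": [4.0, 1.0, 0.5],
        "specialty_match": [2.5, 3.0, 0.5],
        "panel_similarity": [1.0, 1.0, 1.0],
    },
    index=["dense_urban", "rare_specialty", "suburban"],
)

# Normalize each run so its signal contributions sum to 1.
proportions = scores.div(scores.sum(axis=1), axis=0)

# Mean proportional contribution per signal across runs: a quick check
# for any signal that starts dominating results after an algorithm change.
mean_share = proportions.mean()
```

A stacked bar chart of `proportions` (e.g. via `proportions.plot.bar(stacked=True)`) gives a per-run view like the one described above.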
To test new algorithms, we used our Thrift services architecture running on Aurora/Mesos to run test suites in parallel, simulating real member search experiences and comparing them side by side. This allowed rapid tuning of new dynamic sorting algorithms, such as helping members find doctors who treat demographically similar patients, or surfacing specialists with expertise in rare or idiosyncratic conditions and procedures (for NLP enthusiasts, think TF-IDF). Another such algorithm dynamically calculates provider density for a specific location and condition given a set of member-defined filters; those counts, in turn, determine how heavily distance is weighted in the sort. We implemented this with Solr’s faceting capabilities, adjusting the weight of distance at query time based on how many providers match the search within a given radius of the search location.
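To make the density-based distance weighting concrete, here is a simplified sketch. The thresholds and weights are invented for illustration; in production the provider count would come from a Solr facet query over the member’s filters and search radius:

```python
# Sketch of density-based distance weighting: when many matching providers
# are nearby, proximity should dominate the sort; when the match set is
# sparse, distance is relaxed so expertise signals can outweigh it.
# All thresholds and weights below are hypothetical.

def distance_weight(providers_in_radius: int) -> float:
    """Return the weight applied to the distance signal at query time."""
    if providers_in_radius >= 50:   # dense market: rank nearby providers first
        return 1.0
    if providers_in_radius >= 10:   # moderate density: distance still matters
        return 0.6
    return 0.2                      # sparse: cast a wider geographic net
```

At query time, the facet count for the member’s filters stands in for `providers_in_radius`, and the returned weight scales the distance term in the function query.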
Helping our members navigate the healthcare system is no easy task, and with all the data and technology focus at Oscar, we’re really just getting started!
This post originally appeared on Oscar’s Blog.