Majesco Blog
Sep 11, 2015 | By: John Johansen | In: Data Strategy

Is that the data talking or an imposter?

In April, a large life insurer announced plans to use Fitbit data and other health data to award points to insureds, offering substantial life insurance discounts to those who participated in "wellness-like" behaviors. The assumption is that people who own a Fitbit and who walk should have better mortality. That sounds logical. But we're in insurance. In insurance, logic is less valuable than data-proven facts.

Biases can creep into the models we use to launch new products. Everyone comes to modeling with his or her own set of biases. In some conference room, there is probably a little bit of statistical analysis on a whiteboard that says, "If we can attract people who are 10% more active, in general, we will drive down our costs by 30%, allowing us to discount our product by 15%." That is a product model. But that model was likely not based upon tested data; it was likely a biased supposition pretending to be a model. Someone thought they used data, when all they did was build a model to validate their assumptions.
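The danger in that whiteboard arithmetic is how untested inputs compound. As a toy illustration (every figure below is invented, not drawn from any real book of business), a short sketch shows how sensitive the "affordable" discount is to the assumed cost reduction:

```python
# Hypothetical whiteboard model: assumed activity lift -> cost reduction -> discount.
# All numbers here are illustrative assumptions, not tested data.

def affordable_discount(cost_reduction, savings_shared=0.5):
    """Discount we can offer while keeping a share of the assumed savings."""
    return cost_reduction * savings_shared

# The whiteboard assumption: a 30% cost reduction supports a 15% discount.
assumed = affordable_discount(0.30)   # 15%

# If the real cost reduction turns out to be only 10%, the same product
# design supports just a 5% discount -- the 15% offer is underpriced.
actual = affordable_discount(0.10)

print(f"assumed affordable discount: {assumed:.0%}")
print(f"actual affordable discount:  {actual:.0%}")
print(f"underpricing if we still offer 15%: {0.15 - actual:.0%}")
```

The point is not the particular numbers, but that every link in the chain (activity lift, cost reduction, savings share) is an assumption until the data confirms it.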

Whoa. That kind of thinking should make us all pause, because it is actually a common occurrence - not everything that appears to be valid data is necessarily portraying reality. Any data can be contorted to fit someone's storyline. The key is to know the difference between legitimate data cleansing and preparation on the one hand, and excessive manipulation on the other. We continually have to ask if we are building models to fit a preconceived notion or if we are letting the data drive the business to where it leads us.

The reason we worry about biases is because they are directly related to results. As a kid, my Superman costume didn't make me Superman. It just let me collect some candy from the neighbors. The lesson is that if insurers wish to enter into an alternate reality by using biased data, they shouldn't expect results that match their expectations. Rose-colored glasses tend to make the world look rosy.

Here's the exciting part, however. If we are careful with our assumptions, if we wisely use the new tools of predictive analytics and if we can restrain ourselves from jumping past our hypotheses and into the water too soon, objective data and analytics will transport us to new levels of reality! We will become hyper-knowledgeable instead of pseudo-hyper-knowledgeable.

Data, when it is used properly, is the key to new realms, the passport to new markets and a secure source of future predictive understanding. First, however, we have to make it trustworthy.

Advocating good data stewardship and use.

In general, it should be easy to see when we're placing new products ahead of market testing and analysis. When it comes to insurance, real math knows best. We've spent many decades perfecting actuarial science. We don't want to toss out fact-based decisions now that we have even more complete, accurate data and better tools to analyze it.

When we don't use or properly understand data, weak assumptions begin to form. As more accurate data accumulates and we are forced to compare that data with our preconceived notions, we may be faced with the reality that our assumptions took us down the wrong path. A great example of this was the introduction of Long Term Care insurance. Many companies rushed new products to market, later realizing that their pricing assumptions were flawed due to larger-than-expected claims. Some had to exit the business because the costs were much higher than anticipated. The companies remaining in LTC implemented major premium increases.

Auto insurers run into the same dangers (and more) with untested assumptions. For example, who receives discounts and who should receive discounts? Recently, a popular auto insurer that was giving discounts to drivers with installed telematics announced that it would begin increasing premiums on drivers who seemed to have risky driving habits. When the company started using telematics, it made the assumption that those who chose to use telematics would be those who normally drive more safely, and that just having the telematics would cause them to drive more safely. The resulting data, however, showed that some discounts were unwarranted: just because someone was willing to be monitored didn't mean that they were a safe driver. Now the company is using the smart approach and is basing pricing on actual data. It has also implemented a new pricing model by testing it in one state - another step in the right direction.

When we either predict outcomes before analyzing the data or we use data improperly, we taint the model we're trying to build. It's easy to do. Biases and assumptions can be subtle, creeping silently into otherwise viable formulas.

Let's say, for example, that I'm an auto insurer. Based upon an analysis of the universe of auto claims, I decide to give 20% of my U.S. drivers (the ones with the lowest claims) a discount. That assumes that my mix of drivers is the same as the mix throughout the universe of drivers. After a year of experience, I find that my claims costs are higher than I anticipated. When I go back and apply my claims experience to my portfolio, I find that, based on a number of factors, only the top 5% were actually a safe bet for a discount. Now I've given a discount to 15% more people than ought to have had it. Had I tested the product, I might have found that my top 20% of U.S. drivers were safe drivers, but they were also driving higher-priced vehicles - those with a generally higher cost per claim. The global experience didn't match my regional reality.
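That mismatch can be made concrete with a toy back-test. Everything below is hypothetical - invented frequencies and severities, not real claims data - but it shows how a discount priced on "global" claim costs falls short when the portfolio's safest drivers carry a higher cost per claim:

```python
# Toy back-test of the discount decision above. All figures are hypothetical.

# "Global" pricing assumption: safe drivers file claims at average severity.
GLOBAL_COST_PER_CLAIM = 3000     # average claim cost across the universe of drivers
SAFE_CLAIM_FREQUENCY = 0.05      # claims per driver-year in the discounted tier

# What the discount was priced on.
priced_cost = SAFE_CLAIM_FREQUENCY * GLOBAL_COST_PER_CLAIM

# Portfolio reality: our safest drivers own pricier vehicles,
# so the same claim frequency comes with a higher severity.
PORTFOLIO_COST_PER_CLAIM = 4500
actual_cost = SAFE_CLAIM_FREQUENCY * PORTFOLIO_COST_PER_CLAIM

shortfall = actual_cost - priced_cost
print(f"priced-for cost per discounted driver: ${priced_cost:,.0f}")
print(f"actual cost per discounted driver:     ${actual_cost:,.0f}")
print(f"shortfall per discounted driver:       ${shortfall:,.0f}")
```

Multiplied across every driver who got the discount, that per-driver shortfall is exactly the "higher cost than anticipated" the insurer only discovers after a year of real experience - which is why testing against the portfolio's own data comes first.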

Predictions based on actual historical experience, such as claims, will always give us a better picture than our "logical" forays into pricing and product development. In some ways, letting data drive your organization's decisions is much like the coming surge of autonomous vehicles. There will be a lot of testing, a little letting go (of the driver's wheel) and then a wave of creativity surrounding how it can be used effectively. The result of letting the real data talk will be the profitability and longevity of superior models and a tidal wave of new uses. Decisions based on reality will be worth the wait.

This blog is co-authored by Jane Turnbull

Jane Turnbull is the Vice President of the Data Science Center of Excellence within Majesco Data Services. She has over twenty years of management, technical, customer-facing and leadership positions in consulting, data science, business intelligence, and product development.

Prior to joining Majesco, Jane was a Senior Manager in the Advanced Analytics practice at Accenture. During her tenure at Accenture, Jane led and managed the design, development, implementation and management of next-generation informatics and advanced analytics solutions. In this role, she specifically focused on sales and marketing analytics, which generated over $500 million in incremental and retained revenue for one large client. Prior to Accenture, Jane held a variety of analytical roles of increasing responsibility at Experian, Equifax, and Acxiom.

Jane has a B.S. in Mathematics from The University of the South and an M.S. in Applied Probability and Statistics from Northern Illinois University.
