Considerations in Machine Learning-Led Programmatic Underwriting

Privacy and Data Security

Underwriting is critical to insurance profits: The underwriter identifies, qualifies, and quantifies the risk that an insurance policy covers and sets premiums across a pool of policies to cover that risk. It is the original hedge fund, in many ways. But identifying the likelihood of the risk has been difficult historically. Life insurance underwriting, generally speaking, relies on a combination of mortality tables issued every few years, some basic lifestyle information and some information about the potential insured’s medical history, all grouped together to assign the insured to a particular premium band, such as “male preferred nonsmoker” or “female select.” There has been limited gradation beyond that over the years.

Big data and machine learning are changing that, and the insurance industry is running to keep up. One approach has been to apply a combination of big data and machine learning to identify new correlations in the underwriting process and create ever narrower and more nuanced premium bands, thus increasing underwriting speed and profit, reducing underwriting cost and risk, and increasing the speed of the sales process as customers rely on technology interaction to purchase highly personal insurance policies. Yet this approach also presents risk to an insurance company if not rolled out properly. Here are eight things to consider:

1. Are automated underwriting decisions advantageous, or even permissible, under current or anticipated future data privacy laws? Navigating the constantly changing matrix of global privacy laws and regulations is challenging, to say the least. For example, the European Union’s comprehensive data protection law, the GDPR, permits individuals to object to automated decision making that produces legal effects on that person. Insurers offering policies to European residents, or planning to offer them in the near future, may therefore find automated underwriting decisions to be less cost efficient or appealing. The same may be true for companies operating in other countries, as GDPR-style laws are increasingly being adopted around the world. And it could be true in the United States, too. As a result of the November 2020 election, California will establish a privacy regulator to enforce an amended version of California’s privacy laws enacted in 2018. The regulator may promulgate regulations affecting automated decision making such as programmatic underwriting; no one knows yet. But the first batch of regulations is due by midsummer 2022.

2. Is it better to buy a model, license one or build it in-house? Data and the processing of it are big business. Many companies have sprung up to offer corporate America machine learning tools for accomplishing various tasks, and insurance underwriting is no exception. Insurers that decide to license or buy a model from a third party should consider what happens to the data fed into the model for training purposes and for underwriting purposes, and how that data is used by the model. Indeed, if a model is licensed or purchased from a third-party developer, that developer may rely on access to the data to improve the product, which could raise concerns about compliance with privacy laws such as the Gramm-Leach-Bliley Act, the Health Insurance Portability and Accountability Act, or the California Consumer Privacy Act and its successor, the California Privacy Rights Act. According to how some read California's privacy laws, providing data in exchange for a nonmonetary benefit (such as “to improve the product”) could qualify as a sale, subject to those laws' restrictions on personal-data sales. Even if that reading does not prove accurate, software is often copyrighted and typically licensed. Those license agreements will need careful attention to concepts such as derivative works, to avoid raising privacy issues if the machine learning output and feedback loop are considered to be a derivative work owned by the software developer. On the other hand, developing the machine learning capabilities in-house may be prohibitively expensive and present additional risks.

3. What security protocols protect the model data, if stored offsite? Here, too, a company should consider the use of its underwriting data if contracting with a third party to analyze the data. This is an issue if the data is transferred from the company’s on-premises systems and out of its private cloud to any other form of infrastructure. Compromise of those systems could constitute a security incident, presenting myriad risks to the enterprise, and those risks only begin with any breach notification obligations; for example, the New York Department of Financial Services requires notification of a security incident. And what happens when a company’s data is mishandled, accessed by unauthorized persons at a vendor or accidentally made available to a competitor? Equally important, are processing requirements or underwriting standards (the company’s intellectual property) put at risk?

4. What data is provided to software that will serve as the model? Programmatic underwriting relies on machine learning from a data set, creating a model, but the output is only as good as the input. That means that the artificial intelligence model may have limited ability to make qualitative decisions about the importance of certain pieces of information or to avoid basing pricing on other pieces of information. For example, sexual orientation or religious beliefs should not form the basis for an underwriting decision. And other types of data, such as credit reporting, may be prohibited for use in writing policies in certain states. Consequently, when providing the machine learning algorithm with the training set and additional training data, the underwriting department should consider what information it can and should provide. There are various risks here, including the risk that now-discarded discriminatory pricing practices (based on race or HIV-positive status, for example) will inappropriately contaminate the new machine-learning-derived pricing. Unfortunately, companies that do not have absolute control over and understanding of the data quality, the data input, and the algorithm may not learn of such risk until it’s too late.
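
To illustrate point 4, here is a minimal sketch of one way an underwriting team might screen applicant records before they reach a training set. The field names, the state codes "XX" and "YY", and both exclusion tables are hypothetical placeholders chosen for illustration; they do not describe any jurisdiction's actual rules or any vendor's product.

```python
from typing import Any

# Hypothetical: fields the insurer has chosen never to use in pricing.
ALWAYS_EXCLUDED = {"sexual_orientation", "religion", "race"}

# Hypothetical: fields barred only in certain states (placeholder codes).
STATE_EXCLUDED = {"XX": {"credit_score"}}


def scrub_record(record: dict[str, Any], state: str) -> dict[str, Any]:
    """Return a copy of the record without fields barred for this state."""
    blocked = ALWAYS_EXCLUDED | STATE_EXCLUDED.get(state, set())
    return {key: value for key, value in record.items() if key not in blocked}


# Illustrative applicant record; all values are made up.
applicant = {"age": 41, "religion": "n/a", "credit_score": 710, "bmi": 24.3}

print(scrub_record(applicant, "XX"))  # religion and credit_score removed
print(scrub_record(applicant, "YY"))  # only religion removed
```

Dropping a field does not remove other fields that act as proxies for it, so a filter like this is a starting point for review, not a substitute for it.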

5. How can the model be educated on advances in technology and medical treatments that may turn current signals in the data into mere noise? Data elements that may provide useful information today can, as a result of technological advances, become less important. Take HIV. It was a death sentence for years. But thanks to advances in treatment, HIV can now be managed seemingly as a chronic illness. If changes happen quickly, a model might not know, leaving the insurance company at a competitive disadvantage when writing policies. COVID-19 is just the latest example of a situation in which mortality and morbidity rates may need immediate evaluation. Here, too, the model will take time to catch up, and in the meantime it may price insurance risk incorrectly.

6. How does the model handle blanks or unavailable data? The underwriting process needs to recognize the difference between bad information and information that is not provided. One life insurer faced a recent lawsuit over its underwriting process, which allegedly treated individuals who did not provide their tobacco-use status on their application as smokers. The court dismissed the suit as barred by the statute of limitations, but the existence of this type of lawsuit demonstrates the need to ensure the model handles unavailable data neutrally or through an acceptable proxy. (An unacceptable proxy opens the company to additional risk. What if it determined that people who lived in certain types of housing were more likely to be smokers?)
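
To illustrate point 6, the sketch below keeps "not provided" and "bad information" as separate categories instead of folding either one into "smoker." The function name, the category labels and the accepted answer strings are assumptions made for this example, not a description of any insurer's actual process.

```python
from typing import Optional

# Hypothetical accepted answers on an application form.
SMOKER_ANSWERS = {"yes", "y", "smoker"}
NONSMOKER_ANSWERS = {"no", "n", "nonsmoker"}


def tobacco_category(answer: Optional[str]) -> str:
    """Classify a raw tobacco-use answer without guessing at blanks."""
    if answer is None or not answer.strip():
        return "not_provided"  # blank field: the applicant gave no answer
    normalized = answer.strip().lower()
    if normalized in SMOKER_ANSWERS:
        return "smoker"
    if normalized in NONSMOKER_ANSWERS:
        return "nonsmoker"
    return "invalid"  # an answer was given but cannot be interpreted


for raw in [None, "  ", "Yes", "no", "maybe"]:
    print(repr(raw), "->", tobacco_category(raw))
```

Records that come back as "not_provided" or "invalid" can then be routed to follow-up or human review, so the pricing step never has to assume an answer the applicant did not give.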

7. Is the model a black box or interpretable? Many artificial intelligence programs are black boxes: The data goes in and a result comes out, but the users do not know what data elements the program considered important, how those elements were weighed or how they impacted the result. Happily, a solution exists: the interpretable model. An interpretable model provides both the same output data that a black box model provides and information about the interim data the model relied on and how much it relied on each piece of data. Thus, a person can review the data and course correct: to adjust for changes in facts (e.g., the introduction of HIV drugs); to avoid erroneous reading of the insured’s information (e.g., treatment of certain names as masculine or feminine, notwithstanding the person’s identified gender); or to avoid discriminating based on data that the insurer has chosen not to use in pricing decisions (e.g., sexual orientation). Interpretable models with human review also provide an important backstop against class action litigation over programmatic underwriting: The human review and adjustment or confirmation of the premium may prevent a finding of common issues predominating over individual ones in litigation over machine-learning-derived pricing. On the other hand, using an interpretable model may necessitate developing the software in-house; third-party developers may be reluctant to allow customers to look “under the hood” to see what has been used to develop the modeling software.
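
To illustrate point 7, the toy example below shows what "how much it relied on each piece of data" can look like in the simplest case, an additive score in which every input's contribution is visible to a reviewer. The weights, the baseline and the feature names are invented for this sketch and are not actuarial values; real interpretable models may take other forms.

```python
# Hypothetical weights for a toy additive risk score (not actuarial values).
WEIGHTS = {"age": 0.5, "bmi": 0.25, "smoker": 10.0}
BASELINE = 2.0


def explain_score(features: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the total score and each feature's contribution to it."""
    contributions = {
        name: weight * features.get(name, 0.0) for name, weight in WEIGHTS.items()
    }
    return BASELINE + sum(contributions.values()), contributions


# Illustrative applicant; values are made up.
score, parts = explain_score({"age": 40, "bmi": 24, "smoker": 1})

# List contributions from largest to smallest so a reviewer sees what drove the score.
for name, value in sorted(parts.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: {value:+.2f}")
print(f"total: {score:.2f}")
```

Because each contribution is listed separately, a human reviewer can spot an input that should not be driving the price and adjust or confirm the premium accordingly.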

8. What is the quality of the global matching data? One promise of big data and programmatic underwriting is the ability to match the insurer’s historical data about its insureds to the world at large. In theory, the insurer can evaluate the likelihood that an event in the world will strike one of its insureds. Here, again, that assessment is only as good as the global matching data. The COVID-19 pandemic and related excess mortality numbers illustrate this issue well. Those numbers are not necessarily precise, and it could be the case that more people have died than expected but from causes other than COVID-19. For example, a death could be the result of a heart attack or stroke when the victim couldn’t get a hospital bed due to overcrowding or because the deceased put off preventive care. Or the deceased could have had both COVID-19 and a significant comorbidity, but death was attributed only to COVID-19 or to the comorbidity. The machine learning algorithm wouldn’t know that, and it might assign incorrect correlative risk as a result.

None of these eight points is a reason not to use machine learning to enhance the underwriting process. They are, however, issues that insurers should think about carefully as technology is implemented across their underwriting platforms.




© 2021 Manatt, Phelps & Phillips, LLP.

All rights reserved