Data Warehouse, Data Lake or Data Swamp? Actuarial Modeling in the Big Data Era with Tom Peplow
Listen now as April Shen, FSA, CFA, interviews Tom Peplow (Principal, Director of Product Development, LTS) on the interaction between actuarial modeling and big data. In this episode, Tom discusses the basic concepts of data warehousing and the tools and resources actuaries can use to learn more.
In this podcast, Tom introduces data-related tools for actuaries, from data storage (Databricks, Snowflake, Google Bigtable, Amazon Redshift and Azure Synapse) to streaming (Kafka, Amazon Kinesis and Azure Stream Analytics). He touches on Apache Parquet, the columnar file format that has become ubiquitous for raw data storage, noting that each vendor offers a native blob store optimized for it. There is also considerable movement and innovation in querying the data lake directly.
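To make the Parquet discussion concrete (this example is not from the episode), below is a minimal Python sketch of writing and reading a Parquet file with pandas; the DataFrame contents and file name are hypothetical, and it assumes the pyarrow package is installed.

```python
import pandas as pd

# Hypothetical policy-level data an actuary might stage in a data lake.
policies = pd.DataFrame({
    "policy_id": [1001, 1002, 1003],
    "issue_age": [35, 47, 52],
    "face_amount": [250_000.0, 100_000.0, 500_000.0],
})

# Parquet stores data column-by-column with compression, which is why it
# suits raw storage in cloud blob stores (requires pyarrow or fastparquet).
policies.to_parquet("policies.parquet", index=False)

# Columnar layout means a reader can pull back only the columns it needs.
ages = pd.read_parquet("policies.parquet", columns=["policy_id", "issue_age"])
print(ages)
```

Query engines in the warehouses named above can read Parquet files like these in place, which is the "querying the data lake directly" pattern Tom describes.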
Data preparation is a prominent task for actuaries, whether feeding classical actuarial models, preparing data for analytics, or running machine-learning experiments. Power BI has data manipulation capabilities that are excellent for data wrangling, and Azure Data Factory Data Flows, like Alteryx, is a great tool for manipulating data at scale; a code sketch of the same kind of wrangling follows.
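Power BI, Data Factory and Alteryx perform this wrangling through visual interfaces; as a rough code analogue (not from the episode), here is a hedged pandas sketch of typical preparation steps, with every column name and value hypothetical:

```python
import pandas as pd

# Hypothetical raw extract with duplicates and inconsistent formatting.
raw = pd.DataFrame({
    "policy_id": ["P-1001", "P-1002", "P-1002", "P-1003"],
    "gender": ["F", "m", "m", None],
    "premium": ["1,250.00", "980.00", "980.00", "2,100.50"],
})

clean = (
    raw
    .drop_duplicates(subset="policy_id")  # drop the repeated record
    .assign(
        # Standardize categorical codes; mark missing values as "U".
        gender=lambda d: d["gender"].str.upper().fillna("U"),
        # Strip thousands separators and convert to a numeric type.
        premium=lambda d: d["premium"].str.replace(",", "").astype(float),
    )
)
print(clean)
```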