Big Data Asset Pricing



Lasse Heje Pedersen (LHP) and Theis Ingerslev Jensen (TIJ)

Course coordinator
Lasse Heje Pedersen

The course is designed as a first-year Ph.D. course. The prerequisites are knowledge of asset pricing theory and econometrics at an M.Sc. level and the ability to work independently with data using a programming language such as Matlab, R, or Python. Students must participate in the whole course and do all problem sets.

The aim of the class is to introduce Ph.D. students in finance and related fields to empirical asset pricing research methods using big data.

Course content
The course provides students with empirical asset pricing tools for using big data to analyze modern topics in financial economics. The course starts with a quick overview of asset pricing, empirical asset pricing, and how to work with big financial data. The course then covers the factor zoo, multiple testing adjustments, replication, machine learning in asset pricing, and asset pricing with frictions. In addition to the theoretical discussion, students will gain access to a large data set of global equity returns and use these data to solve several mandatory exercises, which constitute an essential part of the course. Each student must prepare their own solution to each exercise and be able to explain and present it. Students are allowed to discuss the exercises and solution methods, but they are not allowed to copy each other's work. Students must disclose in their solutions if code has been copied from public sources (using public code is perfectly fine, but it should be disclosed), and should disclose any other material used.


Lecture plan
16 February to 2 March 2023
Watch the videos, download the data, and explore the data (before and in parallel with Lectures 1-2) so that you are ready for Exercise 2
Lecture 3: Working with big asset pricing data (video, TIJ 3h)
WRDS, CRSP, Compustat, JKPfactors, global data 

23 February 2023
Lecture 1: A primer on asset pricing (hybrid, LHP 3h)
Stochastic discount factors, tradable and non-tradable factors, factor models

2 March 2023
Lecture 2: A primer on empirical asset pricing (hybrid, LHP 3h) 
How to make and use factors, time series and cross-sectional regressions, predictability in the time series and the cross section, further methods
Discussion of Exercise 1: Beta-dollar neutral portfolios

9 March 2023
Lecture 4: The factor zoo and replication (hybrid, LHP, TIJ 3h)
Replication crisis, frequentist and Bayesian multiple testing adjustments
Discussion of Exercise 2: Construct value factors

16 March 2023
Lecture 5: Machine learning in asset pricing (hybrid, LHP, TIJ 6h)
Validation, hyper-parameters, penalized regressions, trees, neural networks, feature importance, asset pricing applications 
Discussion of Exercise 3: Factor replication analysis
Work on Exercise 4 

17 March 2023
Lecture 6: Asset pricing with frictions (hybrid, LHP 3h)
Transaction costs, market liquidity risk, funding liquidity risk, frictions meet machine learning

23 March 2023
Lecture 7: Discussion of Exercise 4: High-dimensional return prediction (hybrid, TIJ 1h)

Learning objectives
The course objectives are to:
• Work with big financial data, including making factors
• Apply factor models to estimate risk and expected return
• Estimate stock return predictability via regressions and portfolio sorts 
• Evaluate the potential replication crisis and the factor zoo
• Implement multiple testing adjustments using frequentist and Bayesian methods
• Apply machine learning to asset pricing data
• Analyze financial market frictions
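
As a rough illustration of the objectives on making factors and estimating predictability via portfolio sorts, the following minimal Python sketch forms an equal-weighted long-short return from a single signal and computes a simple t-statistic for the resulting factor. It is not course material: the function names, the choice of five bins, the equal weighting, and the toy numbers are all assumptions made for illustration, and it uses only the Python standard library.

```python
from math import sqrt
from statistics import mean, stdev


def long_short_return(signals, next_returns, n_bins=5):
    """Top-minus-bottom equal-weighted return for one period.

    signals[i] is stock i's characteristic known at the start of the period,
    next_returns[i] is its return over the period. None marks missing data.
    (Hypothetical helper, not part of any course code.)
    """
    # Keep stocks with both a signal and a return, sorted by signal.
    pairs = sorted(
        (s, r)
        for s, r in zip(signals, next_returns)
        if s is not None and r is not None
    )
    k = len(pairs) // n_bins  # number of stocks in each extreme portfolio
    if k == 0:
        return None  # too few stocks to form the portfolios
    low = mean(r for _, r in pairs[:k])    # lowest-signal portfolio
    high = mean(r for _, r in pairs[-k:])  # highest-signal portfolio
    return high - low


def t_statistic(series):
    """t-statistic of the mean, ignoring missing periods (assumes i.i.d. data)."""
    xs = [x for x in series if x is not None]
    return mean(xs) / (stdev(xs) / sqrt(len(xs)))


# Toy panel: three periods, ten stocks (made-up numbers).
signal_panel = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [3, 1, 2, 6, 5, 4, 9, 8, 7, 10],
    [2, 4, 1, 3, 6, 5, 8, 10, 7, 9],
]
return_panel = [
    [0.00, 0.01, -0.01, 0.02, 0.00, 0.01, 0.03, 0.01, 0.02, 0.04],
    [-0.02, 0.00, 0.01, 0.01, 0.02, 0.00, 0.01, 0.03, 0.02, 0.01],
    [0.01, 0.00, -0.01, 0.02, 0.01, 0.00, 0.02, 0.03, 0.01, 0.05],
]
factor = [long_short_return(s, r) for s, r in zip(signal_panel, return_panel)]
print("factor returns:", factor)
print("t-statistic:", t_statistic(factor))
```

In practice one would typically use value weights or breakpoints from a subset of stocks, and standard errors that are robust to autocorrelation; the sketch leaves these choices out to stay short.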

There is no final exam, but students must satisfactorily complete all the mandatory exercises. The class is graded as pass/fail.

The course is offered through the Nordic Finance Network (NFN), and the Department of Finance at CBS will cover the course fee for Ph.D. students from other NFN-associated universities.





Course Literature
Course participants are expected to have read the assigned reading before each class. The list below gives the preliminary readings, but the final readings will be listed on the Canvas website.

Lecture 1: Notes are self-contained, but familiarize yourself with
• Ch. 6 and 12 in Cochrane, J. H. (2009). Asset pricing. Princeton University Press.
• Kozak, Nagel, and Santosh (2018). Interpreting factor models. The Journal of Finance 73(3), 1183-1223.

Lecture 2: Notes are self-contained, but contain a list of classic references that you should be aware of. See in particular:
• Cochrane (2011). Presidential address: Discount rates. The Journal of Finance 66(4), 1047-1108.

Lecture 3: Self-contained

Lecture 4: 
• Harvey, Liu, and Zhu (2016). ... and the cross-section of expected returns. The Review of Financial Studies 29(1), 5-68.
• Jensen, Kelly, and Pedersen (2021). Is there a replication crisis in finance?
• See references in notes as background reading

Lecture 5: Notes are self-contained, but read the first paper here and familiarize yourself with the next two references:
• Gu, Kelly, and Xiu (2020). Empirical asset pricing via machine learning. The Review of Financial Studies.
• Kozak, Nagel, and Santosh (2020). Shrinking the cross-section. Journal of Financial Economics 135(2), 271-292.
• Friedman, Hastie, and Tibshirani (2001). The elements of statistical learning. Springer.

Lecture 6:
• Ch. 12.2-3 in Campbell, J. Y. (2017). Financial decisions and markets: a course in asset pricing. Princeton University Press.
• Frazzini and Pedersen (2014). Betting against beta. Journal of Financial Economics 111(1), 1-25.
• Also familiarize yourself with other references in notes.

Lecture 7: No reading

Books for background reading:
• Bali, T. G., Engle, R. F., & Murray, S. (2016). Empirical asset pricing: The cross section of stock returns. John Wiley & Sons.
• Campbell, J. Y. (2017). Financial decisions and markets: a course in asset pricing. Princeton University Press.
• Cochrane, J. H. (2009). Asset pricing. Princeton University Press.
• Duffie, D. (2010). Dynamic asset pricing theory. Princeton University Press.
• Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC press.
• Ferson, W. (2019). Empirical Asset Pricing: Models and Methods.
• Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning. New York: Springer series in statistics.
• Pedersen, L. H. (2015). Efficiently inefficient. Princeton University Press.



Copenhagen Business School


Class starts:
9 am every day

Contact information
For the content: Lasse Heje Pedersen or Theis Ingerslev Jensen

For the administration of the course: Bente S. Ramovic

Registration deadline

Please note that your registration is binding after the registration deadline