Schedule with core readings
Part 1. Text as data approach
Week 1. Introduction (Oct 7th)
Gentzkow, M., Kelly, B., & Taddy, M. (2019). Text as data. Journal of Economic Literature, 57(3), 535-574.
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine learning and the social sciences. Princeton University Press, Chapter 2.
Week 2. Economics (Oct 14th)
Gentzkow, M., Kelly, B., & Taddy, M. (2019). Text as data. Journal of Economic Literature, 57(3), 535-574.
Week 3. China Studies (Oct 21st)
King, G., Pan, J., & Roberts, M. E. (2013). How censorship in China allows government criticism but silences collective expression. American Political Science Review, 107(2), 326-343.
Part 2. Dictionary methods
Week 4. Keyword Search (Oct 28th)
Baker, S. R., Bloom, N., & Davis, S. J. (2016). Measuring economic policy uncertainty. The Quarterly Journal of Economics, 131(4), 1593-1636.
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine learning and the social sciences. Princeton University Press, Chapter 16.
Week 5. Sentiment Analysis (Nov 4th)
Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35-65.
Week 6. Document Similarity (Nov 11th)
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine learning and the social sciences. Princeton University Press, Chapter 7.
Kelly, B., Papanikolaou, D., Seru, A., & Taddy, M. (2021). Measuring technological innovation over the long run. American Economic Review: Insights, 3(3), 303-320.
Week 7. Mid-term presentation and feedback (or guest lecture) (Nov 13th, Wed)
Part 3. Machine learning
Week 8. Text Regression: A supervised learning approach (Dec 2nd)
Gentzkow, M., Kelly, B., & Taddy, M. (2019). Text as data. Journal of Economic Literature, 57(3), 535-574.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning: With applications in R. Springer, Chapter 6. (Japanese translation: 『Rによる統計的学習入門』, Asakura Shoten)
Week 9. Topic Models: An unsupervised learning approach (Dec 9th)
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine learning and the social sciences. Princeton University Press, Chapter 13.
Week 10. Latent Semantic Scaling: A semi-supervised learning approach (Dec 16th)
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine learning and the social sciences. Princeton University Press, Chapter 20.
Watanabe, K. (2021). Latent semantic scaling: A semisupervised text analysis technique for new domains and languages. Communication Methods and Measures, 15(2), 81-102.
Part 4. Deep Learning
Week 11. Word Embeddings (Dec 23rd)
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine learning and the social sciences. Princeton University Press, Chapter 8.
Rodriguez, P. L., & Spirling, A. (2022). Word embeddings: What works, what doesn’t, and how to tell the difference for applied research. The Journal of Politics, 84(1), 101-115.
Week 12. Large Language Models (Jan 6th)
Hansen, S., Lambert, P. J., Bloom, N., Davis, S. J., Sadun, R., & Taska, B. (2023). Remote work across jobs, companies, and space (No. w31007). National Bureau of Economic Research.
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2023). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9), 1-35.
Yang, J., Jin, H., Tang, R., Han, X., Feng, Q., Jiang, H., ... & Hu, X. (2023). Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond. arXiv preprint arXiv:2304.13712.
Week 13. Final presentation (Jan 20th)