AI/LLM applications in biotech regulation and finance
I am looking for a student with strong computational skills and an interest in drug development and/or finance to assist with scraping, processing, and analyzing publicly available documents from the FDA and SEC. The role involves automating the collection, download, and processing of PDF files, and applying machine learning techniques to enable large-scale text analysis with large language models.
Background:
I am interested in how biotech and pharma companies decide whether and how to develop new medicines in particular clinical areas. I study questions with practical relevance, where the answers will help industry professionals make smarter choices about how to balance cost, time, risk, and reward in drug R&D.
My interest in these sorts of questions stems from my work as an advisor to biopharma companies and investors. Recent projects, whitepapers, and opinion pieces have focused on drug regulation, drug pricing, corporate and investor decision-making, and clinical trial strategy. (See here for details on these projects and others, and see here for more details on my professional background and expertise.)
The purpose of this project is to build the data infrastructure needed to ask and answer these sorts of questions more efficiently.
Responsibilities:
- Write Python scripts to scrape relevant PDFs from public websites (e.g., drug approval documents from FDA, 10-K reports from SEC)
- Download, store, and organize large numbers of PDF files
- Extract text from PDFs
- Clean and preprocess text for advanced natural language processing (NLP)
- Set up a vector database for semantic search and text retrieval
- Implement LLM-based analyses (e.g., summarization, topic modeling, sentiment analysis) on the extracted text
- Deploy a simple interface (e.g., using FastAPI or Streamlit) for interacting with the LLM
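To give a sense of the middle steps in the list above (cleaning extracted text, splitting it into chunks, and retrieving relevant passages), here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the project's actual design: the names `clean_text`, `chunk_words`, `embed`, and `TinyVectorIndex` are hypothetical, the sample documents are made up, and the term-count "embedding" is a stand-in for a real embedding model and vector database, which the student would select as part of the work.

```python
import math
import re
from collections import Counter


def clean_text(raw: str) -> str:
    """Repair common PDF-extraction artifacts (heuristic, not exhaustive)."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)  # rejoin words hyphenated across line breaks
    text = re.sub(r"\s+", " ", text)  # collapse newlines and repeated spaces
    return text.strip()


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows so each chunk keeps some context."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # the last window already reaches the end
            break
    return chunks


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase term counts.

    Assumption: a real pipeline would call an embedding model here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorIndex:
    """In-memory toy index; a placeholder for a real vector database."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, str, Counter]] = []  # (doc_id, chunk, vector)

    def add(self, doc_id: str, raw_text: str) -> None:
        """Clean, chunk, and index one document."""
        for chunk in chunk_words(clean_text(raw_text)):
            self._rows.append((doc_id, chunk, embed(chunk)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, str]]:
        """Return up to k (score, doc_id, chunk) tuples, best match first."""
        q = embed(query)
        scored = [(cosine(q, vec), doc_id, chunk) for doc_id, chunk, vec in self._rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    index = TinyVectorIndex()
    # Made-up snippets standing in for text extracted from FDA and SEC PDFs.
    index.add("fda-review", "The pivotal trial met its primary end-\npoint; the safety\n profile was acceptable.")
    index.add("sec-10k", "Risk factors include clinical trial delays and uncertain reimbursement.")
    for score, doc_id, chunk in index.search("primary endpoint", k=1):
        print(doc_id, round(score, 3), chunk)
```

In practice, the retrieved chunks would be passed to an LLM as context for summarization or question answering, and the index would sit behind the simple interface mentioned in the last responsibility.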
Work would be independent and remote, with regular videoconference check-ins (approx. once per week) and additional ad hoc interactions by video and/or email as needed.
This work is well-suited to students with computational skills who are interested in future careers in biopharma (R&D, regulatory, project management, etc.), biotech consulting, or biotech finance/investing. Students will learn about key questions and issues in drug development, pharmaceutical regulation, drug commercialization, and biopharma investment. I provide students with 1:1 mentorship on projects as well as individualized career advice.