Resume
Projects
Cloud2Cloud Harvard / NASA Capstone Project
Cloud-Top Height Field Estimation from Aerial Imagery
- Cloud2Cloud focuses on accurately measuring cloud-top heights to enhance the calibration and validation of satellite radiometric instruments. This project is a collaboration between Harvard University and NASA.
- NASA developed the Fly’s Eye GLM Simulator (FEGS), a multi-spectral radiometer system with 30 radiometers and an HD camera, to validate the Geostationary Lightning Mapper (GLM) onboard the GOES-16 satellite. Mounted on the NASA ER-2 aerial laboratory, which flies at 70,000 feet, FEGS collected cloud imagery during a 2017 flight campaign alongside the Cloud Physics LiDAR (CPL), which provides precise but single-point cloud-top height measurements.
- Cloud2Cloud developed a predictive computer vision model that integrates high-definition FEGS imagery, LiDAR data, and aircraft metadata to estimate cloud-top heights and generate a three-dimensional height field. Our approach involved deep learning and optical flow techniques to extend single-point height measurements into full spatial height maps.
- Feature Extraction: The ConvNeXt-Large CNN model was used to extract cloud features from high-definition images after fisheye correction and augmentation.
- CNN-RNN Model: A hybrid neural network processed sequences of cloud images alongside flight metadata to predict cloud-top heights at the center of each image.
- Optical Flow Geometry: The RAFT model, coupled with a parallax-based height estimation method, was used to create full cloud height fields. The Lucas-Kanade and TV-L1 optical flow methods were evaluated for tracking cloud motion.
- Height Field Generation: Optical flow-derived height estimates were calibrated using single-point LiDAR measurements to ensure accuracy.
- Field Stitching: Consecutive height fields were merged to create a large-scale, coherent 3D representation of the cloud structure along the flight path.
- Proposal for the project is located here.
- Final report for the project is located here.
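The parallax idea behind the optical flow step can be illustrated with a short sketch. This is not the project code: it assumes an idealized nadir-pointing pinhole camera, clouds that do not move between frames, and purely horizontal aircraft motion. The function names and parameters (`focal_px`, `baseline_m`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np

def parallax_height_field(flow_px, focal_px, baseline_m, aircraft_alt_m, min_disp_px=1e-3):
    """Turn per-pixel along-track displacement (pixels) into cloud-top heights (meters).

    Assumed pinhole geometry: a static feature at distance Z below the aircraft
    shifts by d = focal_px * baseline_m / Z pixels between two frames, so
    Z = focal_px * baseline_m / d and height = aircraft altitude - Z.
    """
    disp = np.maximum(np.abs(np.asarray(flow_px, dtype=float)), min_disp_px)
    range_below_aircraft_m = focal_px * baseline_m / disp
    return aircraft_alt_m - range_below_aircraft_m

def calibrate_to_lidar(height_field, lidar_height_m):
    """Shift the whole field so its center pixel matches a single-point LiDAR height.

    A constant offset is the simplest possible calibration and is an assumption
    here; a real pipeline might fit a scale factor or a spatially varying term.
    """
    height_field = np.asarray(height_field, dtype=float)
    row, col = height_field.shape[0] // 2, height_field.shape[1] // 2
    return height_field + (lidar_height_m - height_field[row, col])

# Toy example: a 3x3 displacement field, aircraft at roughly 70,000 ft (21,336 m).
flow = np.full((3, 3), 20.0)
flow[1, 1] = 25.0  # larger displacement means the cloud top is closer to the aircraft
heights = parallax_height_field(flow, focal_px=1000.0, baseline_m=200.0, aircraft_alt_m=21336.0)
calibrated = calibrate_to_lidar(heights, lidar_height_m=13000.0)
```

In this toy setup the displacement field would come from an optical flow model such as RAFT, and the baseline from aircraft speed multiplied by the time between frames.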
NLP
- Research paper from 2022 on automatic fake news detection: Automatic Fake News Detection: Are current models “fact-checking” or “gut-checking”? presented at FEVER at ACL 2022
- Video provided for the online system for ACL 2022
- GitHub repo for the paper
- I gave an hour-long talk to the NeuLab at Carnegie Mellon in July of 2022
- Short video and repository explaining how RLHF works.
Vision for safety inspections
- August 30th article in Bloomberg "9 Smart Ways To Make Cities Better" mentioned my work on this project in part 6. Links to PDF and image of specific page.
- How AI Could Have Warned Us about the Florida Condo Collapse Before It Happened article for Towards Data Science.
- The video for the TDS article (featured on the page, but here it is directly)
Vision
- Search and Rescue with YOLOv5, documented in a Weights and Biases report.
- Co-authored a research paper on physical adversarial attacks on face recognition systems for biometric security for S&P 2023: ImU: Physical Impersonating Attack for Face Recognition System with Natural Style Changes
- My recorded presentation on the Gist: Efficient Data Encoding for Deep Neural Network Training paper from Microsoft. Link to slides here.
Visualization
- JavaScript D3 visualization project on fake news that mostly focuses on COVID-19 propaganda (requires Chrome or Firefox on desktop). It was selected as the best project for the class in my master's program.
- I recorded the 2-minute video for the project in an old-timey mid-Atlantic accent for uh, fun.
Clarifai Blogs
- I've written about 60 blog posts for Clarifai. They can all be found here. Below are a few samples. I also maintained the Clarifai documentation for two years, so much of the newer content on their docs site was written by me using Meta's "Docusaurus" platform.
- Blog post on AI bias
- Creating AI workflows post
- Clarifai Quick Start post
Clarifai Videos
- I've recorded a good number of videos for Clarifai, and they can be viewed here. Below are a few samples.
- Enhancing LLMs with Retrieval Augmented Generation (RAG)
- AI-assisted data labeling
- Auto Annotation
- A demo I created for a webinar offered by Acquia / Widen (digital asset management providers) on generating ChatGPT prompts using image classification.
- Another video for Acquia / Widen on relevant Clarifai features, where they had me re-record the intro after I'd gotten a haircut. I'm sure nobody noticed.
Promotional Clarifai Videos
- I've created slick promotional loops used at tradeshows using Adobe After Effects.
Virmuze
- Virmuze is a startup of mine that I worked on for a while. The National Security Agency (NSA) uses it to host the National Cryptologic Museum's online exhibits. It's an unusual point of pride for me as I also helped them create much of the online exhibit content during the COVID-19 pandemic.
- Link to Virmuze on nsa.gov (it's the colorful footprint icon next to the Twitter logo)
- Link to the museum itself on Virmuze
Database design
- I developed a systems project in C++ for a research class in big data systems.
- It's a fully functional, modern LSM-tree (Log-Structured Merge-tree), a write-optimized NoSQL key-value store. It supports tiered, leveled, lazy-leveled, and partial compaction by percentage level policies. It also offers MONKEY (Monkey: Optimal Navigable Key-Value Store) Bloom filter optimization, plus internally multi-threaded range queries and compaction using a thread pool. It is externally multi-threaded as well, so multiple clients can access the database concurrently with per-level blocking.
- Final report is located here
- A literature review on LSM tree key value stores is located here.
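To give a sense of how an LSM-tree with a tiered compaction policy behaves, here is a toy in-memory sketch in Python. It is not the C++ project described above: it leaves out disk storage, Bloom filters, threading, and the other compaction policies. The class name `TinyLSM` and its parameters are hypothetical.

```python
from bisect import bisect_left

TOMBSTONE = object()  # marks a deleted key
_MISSING = object()   # internal "not found" marker

class TinyLSM:
    """Toy LSM-tree: a memtable plus levels of sorted runs, compacted by tiering."""

    def __init__(self, memtable_limit=4, runs_per_level=2):
        self.memtable = {}
        self.levels = []  # levels[i] is a list of runs, oldest first
        self.memtable_limit = memtable_limit
        self.runs_per_level = runs_per_level

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)

    def get(self, key, default=None):
        found = self.memtable.get(key, _MISSING)
        if found is _MISSING:
            found = self._search_runs(key)
        return default if found is _MISSING or found is TOMBSTONE else found

    def _search_runs(self, key):
        # Shallower levels and later runs hold newer data, so search them first.
        for level in self.levels:
            for keys, values in reversed(level):
                i = bisect_left(keys, key)
                if i < len(keys) and keys[i] == key:
                    return values[i]
        return _MISSING

    def _flush(self):
        keys = sorted(self.memtable)
        run = (keys, [self.memtable[k] for k in keys])
        self.memtable = {}
        self._add_run(0, run)

    def _add_run(self, depth, run):
        while len(self.levels) <= depth:
            self.levels.append([])
        self.levels[depth].append(run)
        if len(self.levels[depth]) >= self.runs_per_level:
            # Tiering: merge every run in this level into one run for the next level.
            merged = {}
            for keys, values in self.levels[depth]:  # oldest first, newer values win
                merged.update(zip(keys, values))
            self.levels[depth] = []
            keys = sorted(merged)
            # Tombstones are kept for simplicity; a real store drops them at the last level.
            self._add_run(depth + 1, (keys, [merged[k] for k in keys]))

db = TinyLSM(memtable_limit=2, runs_per_level=2)
db.put("a", 1)
db.put("b", 2)   # memtable is full, so it is flushed to level 0
db.put("a", 10)
db.put("c", 3)   # second flush triggers a merge into level 1
db.delete("b")
```

A leveled policy would instead keep one run per level and merge each incoming run into it, trading more write work for fewer runs to search on reads.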
Teaching
- Teaching fellow for Fall 2023, CSCI E-89C Deep Reinforcement Learning.
- I teach a weekly section on foundational and advanced concepts in reinforcement learning and deep learning. I also grade assignments and answer questions via the class forum and email.
- Reinforcement learning topics include Markov Decision Processes (MDPs); dynamic programming with the Bellman equation; Monte Carlo methods in reinforcement learning contexts; temporal-difference (TD) prediction and control, including SARSA and Q-learning; n-step TD; and approximation methods such as stochastic-gradient and semi-gradient TD updates and least-squares TD.
- Deep learning topics include the techniques and principles behind training neural networks with backpropagation; strategies for tuning neural networks, with a focus on regularization; convolutional neural networks (CNNs); and recurrent neural networks (RNNs).
- Deep reinforcement learning topics include value-based deep RL using Q-networks; policy-based approaches with REINFORCE; and asynchronous methods for deep RL, with a spotlight on advantage actor-critic (A2C).
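As a small illustration of the temporal-difference control methods covered in section, the sketch below contrasts the tabular SARSA and Q-learning updates. It is a minimal example rather than course material; the function names, the NumPy Q-table layout (`Q[state, action]`), and the hyperparameter defaults are assumptions made for illustration.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, done=False):
    """On-policy TD control: bootstrap from the action actually taken next."""
    target = r if done else r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """Off-policy TD control: bootstrap from the greedy action in the next state."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# Same transition, two different targets.
Q_sarsa = np.zeros((2, 2))
Q_sarsa[1] = [1.0, 3.0]
Q_qlearn = Q_sarsa.copy()

sarsa_update(Q_sarsa, s=0, a=0, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=0.5)
q_learning_update(Q_qlearn, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=0.5)
```

The only difference is the bootstrap term: SARSA uses the value of the next action chosen by the current policy, while Q-learning uses the maximum over next actions regardless of what the policy does.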
Retrieval Augmented Generation (RAG)
- I built a custom RAG system with an LLM that scrapes entire websites and answers questions about them using LlamaIndex, Weaviate, LangChain, and GPT-3.5. It's hosted on Google Cloud Services and Google Cloud Storage, and uses Docker and Kubernetes in production. The project also hosts a fine-tuned BERT model on Google Vertex AI for classification of the generated text, and the entire thing runs FastAPI on the backend and React on the frontend.
- Video RAG Detective: Retrieval Augmented Generation with website data
- Medium post
- GitHub repo
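The core retrieval step in a RAG pipeline can be sketched in a few lines. This is a generic illustration, not the project's code: it does not use LlamaIndex, Weaviate, or LangChain, the embedding vectors are made-up toy values, and the function names are hypothetical. It also assumes that no embedding vector is all zeros.

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query (cosine similarity)."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(chunk_vecs, dtype=float)
    sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims)[:k].tolist()

def build_prompt(question, chunks):
    """Assemble the retrieved text into a prompt for the language model."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Toy data: in a real system the vectors come from an embedding model.
chunks = ["Pricing page text", "Careers page text", "Product overview text"]
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vec = [1.0, 0.1]

best = top_k_chunks(query_vec, chunk_vecs, k=2)
prompt = build_prompt("How much does it cost?", [chunks[i] for i in best])
```

In a production setup a vector database handles the similarity search, and the assembled prompt is sent to the LLM for the final answer.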
Harvard Extension Masters
- I completed a master's degree in Data Science from Harvard University in December 2024 and graduated on the Dean's List with a 4.0 GPA. The classes I took were:
- Data Modeling (R)
- Foundations of Data Science and Engineering (Python, SQL, Tableau)
- Deep Learning for NLP (Research, Python, PyTorch)
- Computer Vision (Python, Keras/TensorFlow)
- Deep Reinforcement Learning (Python; I later TA'd this class)
- Elements of Data Science and Statistical Learning with R (R)
- Time Series Analysis with Python (Python)
- Visualization (D3 JavaScript, HTML, CSS, Tableau)
- Big Data Systems (Research, C++)
- Productionizing AI (MLOps): AC215
- Pre-capstone proposal (Cloud2Cloud)
- Capstone project (Cloud2Cloud)
Contact
Feel free to reach out to me: