Projects
Cloud2Cloud Harvard / NASA Capstone Project - Ongoing
- Cloud2Cloud focuses on accurately measuring cloud-top heights to enhance the calibration and validation of satellite radiometric instruments. It is a joint project with Harvard Extension School and NASA.
- NASA developed the Fly’s Eye GLM Simulator (FEGS), a multi-spectral radiometer system with 30 radiometers and an HD camera, to validate the Geostationary Lightning Mapper (GLM) on the GOES-16 satellite. FEGS is mounted on the NASA ER-2 aerial laboratory, a plane that flies at 70,000 feet. During a 2017 flight campaign, the ER-2 collected data using FEGS and the Cloud Physics LiDAR (CPL) to measure cloud heights.
- While LiDAR provides precise cloud-top heights, it offers only single-point values. Cloud2Cloud aims to develop a predictive computer vision model that combines high-definition images from FEGS with LiDAR data to estimate cloud-top heights accurately and create a three-dimensional height field.
- The project proposal is located here.
NLP
- Research paper on automatic fake news detection, presented at FEVER at ACL 2022: Automatic Fake News Detection: Are current models “fact-checking” or “gut-checking”?
- Video provided for the online system for ACL 2022
- GitHub repo for the paper
- I gave an hour-long talk to the NeuLab at Carnegie Mellon in July of 2022
Vision for safety inspections
- August 30th article in Bloomberg "9 Smart Ways To Make Cities Better" mentioned my work on this project in part 6. Links to PDF and image of specific page.
- How AI Could Have Warned Us about the Florida Condo Collapse Before It Happened article for Towards Data Science.
- The video for the TDS article (featured on the page, but here it is directly)
Vision
- Search and Rescue using YOLOv5, documented in a Weights and Biases report.
- Co-authored a research paper on physical adversarial attacks on face recognition systems for biometric security for S&P 2023: ImU: Physical Impersonating Attack for Face Recognition System with Natural Style Changes
- My recorded presentation on the Gist: Efficient Data Encoding for Deep Neural Network Training paper from Microsoft. Link to slides here.
Visualization
- JavaScript D3 visualization project on fake news, focusing mostly on COVID-19 propaganda (requires Chrome or Firefox on desktop). It was selected as the best project for the class in my master's program.
- I recorded the 2-minute video for the project in an old-timey mid-Atlantic accent for uh, fun.
Clarifai Blogs
- I've written about 60 blog posts for Clarifai. They can all be found here. Below are a few samples. I also maintained the Clarifai documentation for two years, so much of the newer content on their docs site was written by me using Meta's "Docusaurus" platform.
- Blog post on AI bias
- Creating AI workflows post
- Clarifai Quick Start post
Clarifai Videos
- I've recorded a good number of videos for Clarifai, and they can be viewed here. Below are a few samples.
- Enhancing LLMs with Retrieval Augmented Generation (RAG)
- AI-assisted data labeling
- Auto Annotation
- A demo I created for a webinar offered by Acquia / Widen (Digital Asset Management providers) on generating ChatGPT prompts using image classification.
- Another video for Acquia / Widen on relevant Clarifai features, where they had me re-record the intro after I'd gotten a haircut. I'm sure nobody noticed.
Promotional Clarifai Videos
- Using Adobe After Effects, I've created slick promotional loops shown at trade shows.
Virmuze
- Virmuze is a startup of mine that I worked on for a while. The National Security Agency (NSA) uses it to host the National Cryptologic Museum's online exhibits. It's an unusual point of pride for me as I also helped them create much of the online exhibit content during the COVID-19 pandemic.
- Link to Virmuze on nsa.gov (it's the colorful footprint icon next to the Twitter logo)
- Link to the museum itself on Virmuze
Database design
- I developed a systems project for a research class in big data systems in C++.
- It's a fully functional, modern LSM-tree (log-structured merge-tree), a write-optimized NoSQL key-value store. It supports tiered, leveled, lazy-leveled, and partial compaction by percentage level policies. It also offers MONKEY (Monkey: Optimal Navigable Key-Value Store) Bloom filter optimization and internally multi-threaded range queries and compaction using a thread pool. It is also externally multi-threaded, so multiple clients can access the database concurrently with per-level blocking.
- Final report is located here
- A literature review on LSM tree key value stores is located here.
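To give a feel for the write path described above, here is a minimal Python sketch of an LSM-tree. It is not the C++ project itself: the class name `TinyLSM`, the parameters `memtable_limit` and `max_runs`, and the merge-everything compaction rule are all assumptions made for illustration, and it leaves out Bloom filters, levels, threading, and disk I/O.

```python
import bisect


class TinyLSM:
    """Toy LSM-tree: an in-memory write buffer plus immutable sorted runs.

    Hypothetical illustration only; names and thresholds are assumptions.
    """

    def __init__(self, memtable_limit=4, max_runs=3):
        self.memtable = {}   # mutable buffer that absorbs writes
        self.runs = []       # immutable sorted runs, newest first
        self.memtable_limit = memtable_limit
        self.max_runs = max_runs

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def get(self, key):
        # Check the buffer first, then runs from newest to oldest,
        # so the most recent write for a key always wins.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def _flush(self):
        # Turn the buffer into a sorted run and start a fresh buffer.
        self.runs.insert(0, sorted(self.memtable.items()))
        self.memtable = {}
        if len(self.runs) > self.max_runs:
            self._compact()

    def _compact(self):
        # Merge every run into one, applying older runs first so that
        # newer values overwrite older ones.
        merged = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [sorted(merged.items())]


db = TinyLSM(memtable_limit=2, max_runs=2)
for k, v in [("a", 1), ("b", 2), ("a", 10), ("c", 3), ("d", 4), ("e", 5)]:
    db.put(k, v)
print(db.get("a"), db.get("b"), db.get("z"))
```

Writes only ever touch the in-memory buffer, which is what makes the structure write-optimized; the cost is paid later by compaction and by reads that may have to check several runs.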
Teaching
- Teaching fellow for Fall 2023, CSCI E-89C Deep Reinforcement Learning.
- I teach a weekly section on foundational and advanced concepts in reinforcement learning and deep learning. I also grade assignments and answer questions via the class forum and email.
- Reinforcement learning topics include Markov decision processes (MDPs), dynamic programming with the Bellman equation, Monte Carlo methods, temporal-difference prediction and control (including SARSA and Q-learning), n-step TD, and approximation methods such as stochastic gradient, semi-gradient TD updates, and least-squares TD.
- Deep learning topics include training neural networks with backpropagation, strategies for tuning neural networks (with a focus on regularization), convolutional neural networks (CNNs), and recurrent neural networks (RNNs).
- Deep reinforcement learning topics include value-based deep RL using Q-networks, policy-based approaches in deep RL with REINFORCE, and asynchronous methods for deep RL, with a spotlight on advantage actor-critic (A2C).
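As a small illustration of two of the temporal-difference control methods listed above, the sketch below contrasts the tabular SARSA and Q-learning updates. It is not course material: the function names, the nested-dict Q-table layout, and the example numbers are assumptions chosen to keep the arithmetic easy to follow.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy TD control: bootstrap from the best action in s_next."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD control: bootstrap from the action actually taken next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


def make_q():
    # Hypothetical two-state, two-action Q-table: Q[state][action].
    return {0: {"L": 0.0, "R": 0.0}, 1: {"L": 1.0, "R": 2.0}}


# Same transition (s=0, a="R", r=1.0, s'=1), different bootstrap targets.
print(q_learning_update(make_q(), 0, "R", 1.0, 1, alpha=0.5, gamma=0.5))
print(sarsa_update(make_q(), 0, "R", 1.0, 1, "L", alpha=0.5, gamma=0.5))
```

The only difference between the two is the bootstrap term: Q-learning uses the maximum value over next actions, while SARSA uses the value of the next action the agent actually takes.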
Retrieval Augmented Generation (RAG)
- I built a custom RAG system with an LLM that scrapes entire websites and answers questions about them using LlamaIndex, Weaviate, LangChain, and GPT-3.5. It's hosted on Google Cloud Services and Google Cloud Storage, and uses Docker and Kubernetes for production use. The project also hosts a fine-tuned BERT model on Google Vertex for classification of the generated text, and the entire thing runs FastAPI on the backend and React on the frontend.
- Video RAG Detective: Retrieval Augmented Generation with website data
- Medium post
- GitHub repo
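For readers new to RAG, the retrieve-then-prompt idea can be sketched in plain Python. This is not the project's code and it does not use LlamaIndex, Weaviate, LangChain, or GPT-3.5: the bag-of-words "embedding", the function names, and the sample text chunks are all stand-in assumptions so the example stays self-contained.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in 'embedding': a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, chunks, k=2):
    """Assemble the text that would be sent to a language model."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up chunks standing in for scraped website text.
site_chunks = [
    "The museum opens at nine in the morning.",
    "Parking is free on weekends.",
    "Tickets cost ten dollars.",
]
print(build_prompt("When does the museum open?", site_chunks, k=1))
```

In a production system the bag-of-words vectors would be replaced by learned embeddings stored in a vector database, and the prompt would be sent to an LLM rather than printed.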
Harvard Extension Masters
- I'm a degree candidate for an ALM in Data Science and have finished the 11 courses for my master's, with only the final capstone project remaining, to be completed in December 2024. I have maintained a 4.0 GPA in the following 11 classes:
- Data Modeling (R)
- Foundations of Data Science and Engineering (Python, SQL, Tableau)
- Deep Learning for NLP (Research, Python, PyTorch)
- Computer Vision (Python, Keras/TensorFlow)
- Deep Reinforcement Learning (Python)
- Elements of Data Science and Statistical Learning with R (R)
- Time Series Analysis with Python (Python)
- Visualization (D3 JavaScript, HTML, CSS, Tableau)
- Big Data Systems (Research, C++)
- Productionizing AI (MLOps): AC215
- Pre-capstone proposal (Cloud2Cloud)
Contact
Feel free to reach out to me: