Nahin Khan
Education
2023-2025
ETH Zurich
- Master of Science in Computational Biology and Bioinformatics
- Relevant Courses:
- Big Data, Probabilistic Artificial Intelligence, Applications of Deep Learning on Graphs, Computational Biology
2016-2021
Carnegie Mellon University
- Bachelor of Science in Computer Science
- Bachelor of Science in Biological Sciences
- GPA: 3.94
- Dean’s List High Honors: Fall 2016 - Spring 2021
- Relevant Courses:
- Computer Networks, Computer Security, Natural Language Processing, Complexity Theory, Machine Learning (Coursera), Graph Theory
- Molecular Biology, Immunology, Cell Biology
Work Experience
Jan 2022 - Aug 2023
Research Assistant, Qatar Computing Research Institute
- Contributed to bioinformatics research projects
- Conducted literature reviews to survey the current state of research
- Implemented methods and built pipelines to analyze data and derive biological insights
- Automated workflows using Makefiles and common command-line tools (awk, sed, xargs, etc.)
- Trained CNN and BERT models to predict protein binding strength using TensorFlow and Hugging Face
- Co-authored and submitted papers for publication with multiple teams
Oct 2021 - Jan 2022
Full Stack Developer, sKora
- Set up development, staging, and production environments for product development on Linode
- Set up CI/CD pipelines for automated tests and deployments to Kubernetes clusters using GitHub Actions
- Implemented backend API using FastAPI and frontend using ReactJS
- Performed code reviews and gave feedback to team members
Dec 2020 - Jan 2021
Backend Engineer Intern, sKora
- Scraped millions of records from a football transfer market website
- Cleaned the scraped data and loaded it into internal MySQL databases using MySQL-Python integrations
Aug - Dec 2019
Parallel Data Structures and Algorithms Teaching Assistant, CMU Pittsburgh
- Led weekly 50-minute recitation sessions to teach key concepts to students in SML
- Led review sessions to help 250+ students prepare for midterms and final exams
Jan - May 2020
Ballroom Dancing Instructor, Carnegie Mellon University Qatar
- Co-instructed a student-led course (StuCo) introducing ballroom dancing
- Taught cha cha, waltz, foxtrot, tango, salsa and more
Skills
Coding
- Python, C, Bash, JavaScript, SML, HTML, CSS, MATLAB, R, x86 Assembly
Technologies / Environment
- Kubernetes, Docker, GitHub Actions, Django, FastAPI, MySQL, Nginx, ReactJS, Linux
Machine Learning
- PyTorch, PyTorch Geometric, TensorFlow, Keras, CNNs, GNNs, VAEs, Transformers, BERT models
Open Source Contributions
May 2023
CMplot
- R package that allows users to generate various genomic plots
- Added an option to display multiple x-axes on one of the package's plot types
Apr 2023
langchain
- Fixed typos in the documentation
Apr 2023
alpaca.cpp
- Fixed a bug that could cause a segmentation fault on some architectures
May 2022
TensorFlow
- Corrected a tutorial’s explanation of what the attention values in a transformer represent
Mar 2022
Mephisto
- Corrected quoting issues in the documentation section on launching via Docker
Mar 2022
atomium
- A Python package useful for parsing PDB (3D molecular) files and manipulating 3D structural data
- Added a feature that exposes the source organism of each protein chain in a PDB file
- For example, antibody-related PDB files often contain human or mouse chains alongside viral chains; this PR lets users distinguish them programmatically
Feb 2022
cs228-notes
- Corrected a minor error in the course notes for Probabilistic Graphical Models (Stanford CS228)
Aug 2021
nginx-proxy
- Fixed a typo in the documentation
Aug 2021
ukemi
- Fixed a typo in the docs
Aug 2021
bedtools2
- Fixed a typo in the docs
Jul 2021
Docker Docs
- Fixed a layout error in the documentation
- Fixed grammatical errors elsewhere in the docs
Jul 2021
celery
- Fixed grammatical errors in the docs
Jun 2021
Python Packaging User Guide (pypa)
- Fixed a typo in the docs
May 2021
Missing Semester
- A course teaching tools that are useful in everyday computing for people in CS-related fields
- Clarified the section of the notes explaining input/output streams in shell programs
May 2021
TLDR
- A tool for looking up cheat sheets for console commands
- Added a page documenting “bfg”, a tool for removing large files or passwords from Git history
Awards and Honors
Oct 2019
Andrew Carnegie Scholar and Qatar Campus Scholar, Awardee
- Selected for academic excellence and leadership in the community
May 2019
Fifth Year Scholar, Awardee
- CMU program that sponsors a student to study for an extra year after graduating
2017-2020
Qatar Foundation Scholarship, Recipient
- Competitive merit-based full scholarship given to students in Education City
Research Experience
Apr - Aug 2023
Genome-wide association study for ECG traits, QCRI
- Identified novel genetic regions that contribute to abnormal ECG patterns
Nov 2022 - Apr 2023
Improving Polygenic Risk Scores, QCRI
- Proposed methods to improve polygenic risk scores when combining local population data with published data
Jun - Nov 2022
Multiomics Project, Center for Precision Health and Medicine, QCRI
- Integrated genomic and metabolomic data (over 22,000 features) from 3,000+ individuals
- Used the multi-omic data to propose molecular mechanisms underlying coronary heart disease risk
- Proposed genetic and metabolomic risk factors to screen for coronary heart disease risk
Feb - Sep 2022
Antibody Project, Qatar Center for AI, Research Assistant
- Extracted antibody-antigen binding strength data for motif sequences from crystal structures
- Trained machine learning models to predict antibody-antigen binding strength
Jan - Jun 2020
Honors Thesis: A Bioinformatics Tool for Exploring RNA-Protein Interactions
- Developed a tool that collects and visualizes RNA-protein interactions (https://rnpfind.com)
- Developed features for studying correlation and overall-binding profiles of RNA binding proteins
- Built with Django and Docker; deployed and managed on Linode hosts
Apr - Oct 2019
International Genetically Engineered Machine (iGEM) Competition, Boston, Team Bioinformatician
- Built a system capable of detecting carriers of recessive genetic diseases within half an hour
- Collaborated with various international teams on molecular modelling of Cpf1, gRNA, and template DNA
Jun - Aug 2017
Woolford Lab, Mellon College of Science, Pittsburgh, USA, Research Intern
- Investigated the role of Drs1 in ribosome assembly
- Performed spotting assays on mutagenized yeast strains and isolated preribosomes for protein analysis
- Constructed models of Drs1 function in assembly
- Presented results at the Meeting of the Minds Conference (2018)
Aug 2016 - May 2017
Phage Genomics Research Course, Carnegie Mellon University
- Sequenced DNA extracted from isolated phages using an Ion Torrent sequencer
- Performed computational assembly and annotated the sequenced DNA to generate a gene map
Projects
Dec 2023
Link Prediction on Knowledge Graphs
- Trained and evaluated an RGCN model on the FB15k-237 dataset
- Learned task-independent entity embeddings using contrastive learning
- Achieved a mean reciprocal rank (MRR) of 0.538
Dec 2023
Reinforcement Learning on a Pendulum
- Implemented an off-policy RL algorithm on the Pendulum environment
- Implemented soft actor-critic (SAC), learning Q-value and policy networks
- Used the SAC variant with automatic temperature tuning
Nov 2023
Bayesian Optimization
- Optimized an unknown objective function that is expensive to evaluate
- Respected an unknown constraint function that marks regions where evaluation is unsafe
- Implemented a variant of the GP-UCB algorithm
Nov 2023
Approximate Bayesian Inference in Deep Neural Networks
- Implemented Stochastic Weight Averaging Gaussian (SWAG)
- Trained a deep CNN and estimated an approximate posterior over its parameters using SWAG
- Implemented histogram binning and temperature scaling to improve calibration
Oct 2023
Weather Prediction with Uncertainty Estimates
- Used Gaussian process regressors (GPRs) to predict the weather based on location
- Improved efficiency by k-means clustering the data and fitting a separate GPR to each cluster
- Evaluated the model under an asymmetric cost function reflecting realistic city requirements
Oct 2023
Custom Message Passing Layers in GNN
- Investigated the effects of breaking permutation invariance in a GNN message-passing layer
- Found that some datasets benefit from breaking invariance due to hidden structure in the node ordering
Mar 2021
Mini BitTorrent
- Wrote a peer-to-peer program in C for sharing files among multiple hosts
- Designed and implemented a TCP-like protocol over UDP for transferring chunks of data
- Implemented congestion control and the sliding window algorithm
Feb 2021
HTTP 1.1 Compliant Server
- Wrote an RFC 2616-compliant web server in C, supporting GET, HEAD, and POST
- Integrated CGI support
May 2021
Question and Answer Model
- Wrote a program that generates questions from an article and answers questions about it
- Used word2vec word embeddings to measure similarity between sentences in the article and the questions
- Used coreference resolution to increase similarity matches
Nov 2020
Mailpile: Open-source Contribution
- Submitted a bugfix for an open-source mail client
- Resolved several issues by fixing their root cause: the client allowed the same email address to be registered more than once
- Made changes to both the front end and the back end
Oct 2020
Dynamic Memory Allocator
- Wrote a dynamic memory allocator (malloc) in C with high utilization (69%) and throughput (14k KOPS)
- Used a segregated free list and engineered a dedicated region for small allocation requests
Nov 2020
Automated Theorem Prover
- Developed a theorem prover for intuitionistic logic in SML and Prolog
Nov 2016
Python Chess A.I.
- Developed a chess AI in Python for a course and shared it online
- Referenced by several GitHub repositories and has attracted community interest