Sicong Zhao - Portfolio

SICONG ZHAO

Sicong is a data science nerd with 5 years of product design and management experience. He is currently a Master's candidate in Data Science at Duke University, set to graduate in May 2020, and is eagerly looking for a data science position.

His career started at Baidu, and he has had great success with start-ups. Sicong is passionate about using data science and design thinking to make insightful decisions and create great products, especially products that have a positive influence on users' daily lives.

Since his dream of playing in the NBA did not quite work out, Sicong now enjoys playing basketball recreationally in his free time.


PROJECT 1

Predicting Emotional States Using Wearable Devices

OVERVIEW

This project investigates whether negative emotional states can be predicted from wearable activity-sensor data. The project is ongoing; its major achievements so far are:

1. Two models that predict negative emotion:

A model using Fitbit + neural & psychological features
Recall: 0.63, Precision: 0.48, F1: 0.54

A model using only Fitbit data
Recall: 0.55, Precision: 0.35, F1: 0.43

In contrast, the precision of a random guess is 0.21.
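As a consistency check, each F1 score is the harmonic mean of the reported precision P and recall R, and both values match the figures above:

```latex
F_1 = \frac{2PR}{P + R}

\text{Fitbit + neural \& psychological: } F_1 = \frac{2(0.48)(0.63)}{0.48 + 0.63} = \frac{0.6048}{1.11} \approx 0.54

\text{Fitbit only: } F_1 = \frac{2(0.35)(0.55)}{0.35 + 0.55} = \frac{0.385}{0.90} \approx 0.43
```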

2. Insights into negative emotion: among all 161 predictors, Neuroticism has the highest feature importance score under both 'Loss Function Change' and 'Prediction Value Change', and it has a negative influence on emotion.

To achieve this goal, we trained our models on data collected by the Motivated Cognition and Aging Brain Lab in Duke's Department of Psychology & Neuroscience, which includes:

· measures of personality and behavior
· demographic data
· physical health metrics
· activity tracking data (Fitbit)
· functional brain connectivity

MY CONTRIBUTION

I am in charge of feature engineering and modeling.

I created features from the wristband data (minute-level steps and heart rate) over several windows [5m, 10m, 30m, 1h, 3h] before each experience sample (the moment participants record their emotional state). The features include basic statistics of heart rate and steps, resting time, activity level, and heart-rate variation. Among all of them, 'variation of heart rate in the last 30 mins' performs best, and 13 of the top 30 most important features (measured by 'Loss Function Change') are engineered features.
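A minimal sketch of this kind of look-back window feature, assuming the band data sits in a pandas DataFrame with a minute-level DatetimeIndex and hypothetical `heart_rate` and `steps` columns. It approximates "variation of heart rate" with the standard deviation and "resting time" with zero-step minutes; both are illustrative choices, not necessarily the project's exact definitions.

```python
import pandas as pd

# Look-back windows listed in the text: 5m, 10m, 30m, 1h, 3h
WINDOWS = {"5m": "5min", "10m": "10min", "30m": "30min", "1h": "1h", "3h": "3h"}


def window_features(band: pd.DataFrame, sample_time: pd.Timestamp) -> dict:
    """Summarize minute-level band data in each window before one experience sample.

    Assumes (hypothetically) that `band` has a DatetimeIndex with one row per
    minute and columns 'heart_rate' and 'steps'.
    """
    feats = {}
    for name, span in WINDOWS.items():
        start = sample_time - pd.Timedelta(span)
        # Half-open window [start, sample_time): only data recorded before the sample
        win = band[(band.index >= start) & (band.index < sample_time)]
        feats[f"hr_mean_{name}"] = win["heart_rate"].mean()
        # Heart-rate variation, approximated here by the standard deviation
        feats[f"hr_std_{name}"] = win["heart_rate"].std()
        feats[f"steps_sum_{name}"] = win["steps"].sum()
        # Resting-time proxy: minutes in the window with zero steps
        feats[f"rest_min_{name}"] = int((win["steps"] == 0).sum())
    return feats
```

Calling it once per experience sample, e.g. `pd.DataFrame([window_features(band, t) for t in sample_times])`, yields one feature row per emotion report.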

On the modeling side, from a business point of view we need to serve both new and existing users, so I built two types of models: the first is trained on a stratified split, and the second on a split by group (participant), so that test participants are never seen during training. Due to positivity bias (people tend to report that they are happy), the data is imbalanced: only 10.8% of records report negative emotion. To address this, I tested several sampling techniques (downsampling the majority class, oversampling the minority class, SMOTE) with different types of models. The best-performing model uses CatBoost trained on data with the minority class oversampled.
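The group split and the oversampling step might look like the sketch below, written with pandas and NumPy only. The column names, the 20% test fraction, and the helper names are hypothetical; a stratified split (the other model type) would instead keep the negative-emotion rate equal in train and test. Oversampling is applied to the training split only, so duplicated minority rows never leak into evaluation; the balanced set would then be passed to a classifier such as CatBoost's `CatBoostClassifier`.

```python
import numpy as np
import pandas as pd


def group_split(df: pd.DataFrame, group_col: str, test_frac: float = 0.2, seed: int = 0):
    """Split so that no group (participant) appears in both train and test.

    Mimics the 'new user' scenario: every test participant is unseen in training.
    """
    rng = np.random.default_rng(seed)
    groups = df[group_col].unique()
    n_test = max(1, int(round(len(groups) * test_frac)))
    test_groups = rng.choice(groups, size=n_test, replace=False)
    is_test = df[group_col].isin(test_groups)
    return df[~is_test], df[is_test]


def oversample_minority(train: pd.DataFrame, label_col: str, seed: int = 0) -> pd.DataFrame:
    """Randomly duplicate minority-class rows until both classes have equal size."""
    counts = train[label_col].value_counts()
    minority, majority = counts.idxmin(), counts.idxmax()
    extra = train[train[label_col] == minority].sample(
        n=counts[majority] - counts[minority], replace=True, random_state=seed
    )
    # Concatenate and shuffle so duplicated rows are not clustered at the end
    return pd.concat([train, extra]).sample(frac=1, random_state=seed)
```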

TECH I USED

· CatBoost / XGBoost / Random Forest / Logistic Regression / Lasso
· Randomized Grid Search
· Cross-Validation with Stratified Split and Group Split
· Handling Imbalanced Data: downsampling, upsampling, SMOTE
· Model Analysis: SHAP Values, Feature Importance

The code is written in Python and is available on GitHub.

PROJECT 2

Cardiology Decompensation in Duke University Hospital

OVERVIEW

The end goal of this project is to help doctors detect the risk of cardiac decompensation as early as possible. We are collaborating with Duke University Hospital (DUH) to develop a dashboard that shows the cardiac decompensation risk for each patient.

MY CONTRIBUTION

· Extracting data points from the historical database
· Helping define phenotypes (rules that generate outcomes) from a data science perspective
· Implementing algorithms that determine whether an encounter meets any phenotype definition
· Calculating the mortality rate for encounters with different phenotype combinations

TECH I USED

· SQL
· Pandas / NumPy
· Tableau

NEXT STEPS

1. Use CatBoost and RNNs to build predictive models
2. Build meaningful features to improve model performance
3. Help integrate the dashboard into Duke University Hospital's system

TEAMMATES

Mikella Green

Joaquin Menendez
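The phenotype matching and mortality-rate steps from Project 2 could be sketched in pandas as follows. The phenotype rules, column names, and threshold are placeholders invented for illustration; the actual definitions developed with DUH are not shown on this page.

```python
import pandas as pd

# Hypothetical placeholder rules: each maps the encounter table to a boolean
# Series (True = the encounter meets that phenotype).
LAB_THRESHOLD = 500.0  # illustrative value only, not a clinical cutoff

PHENOTYPES = {
    "pheno_A": lambda enc: enc["iv_med_given"] & (enc["lab_value"] > LAB_THRESHOLD),
    "pheno_B": lambda enc: enc["readmit_30d"],
}


def flag_phenotypes(enc: pd.DataFrame) -> pd.DataFrame:
    """One boolean column per phenotype, one row per encounter."""
    return pd.DataFrame({name: rule(enc) for name, rule in PHENOTYPES.items()}, index=enc.index)


def mortality_by_combination(enc: pd.DataFrame, flags: pd.DataFrame, died_col: str = "died") -> pd.DataFrame:
    """Encounter count and mortality rate for each combination of phenotypes met."""
    # Label each encounter by the phenotypes it meets, e.g. "pheno_A+pheno_B" or "none"
    combo = flags.apply(lambda row: "+".join(n for n, hit in row.items() if hit) or "none", axis=1)
    return enc[died_col].groupby(combo).agg(encounters="count", mortality_rate="mean")
```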

PROJECT 3 [Self-Directed Project]

Value Investing In Python - A Data Science Tutorial

OVERVIEW

This is a tutorial inspired by Preston Pysh and Warren Buffett.

I learned from Preston's video tutorial on how Warren Buffett conducts fundamental analysis of stocks, and realized it would be great to do fundamental analysis automatically and at scale. I then researched data sources and implemented a Python program that automatically conducts fundamental analysis for all 6,000+ stocks in the US market.

I posted my tutorial on Medium and was then invited to publish the articles in two Medium publications: The Capital and Analytics Vidhya.

THE TUTORIALS

1. Syllabus
2. Collecting financial data for fundamental analysis
3. How to Generate these Popular Stock Terms using Python
4. How to calculate the intrinsic value

TECH I USED

· Linear Regression
· RNN - LSTM
· Selenium
· Pandas, including the `unstack()` and `pivot()` functions
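As an illustration of the `pivot()` and `unstack()` reshaping mentioned above, here is a minimal example on made-up long-format statement data (the tickers, metrics, and values are hypothetical):

```python
import pandas as pd

# Hypothetical long-format statements: one row per (ticker, year, metric)
long_df = pd.DataFrame({
    "ticker": ["AAA", "AAA", "AAA", "AAA", "BBB", "BBB"],
    "year": [2018, 2018, 2019, 2019, 2019, 2019],
    "metric": ["revenue", "net_income", "revenue", "net_income", "revenue", "net_income"],
    "value": [100.0, 10.0, 120.0, 15.0, 50.0, 5.0],
})

# pivot(): one row per (ticker, year), one column per metric
wide = long_df.pivot(index=["ticker", "year"], columns="metric", values="value")

# With metrics as columns, a ratio is a single vectorized expression
wide["net_margin"] = wide["net_income"] / wide["revenue"]

# unstack(): move 'year' from the row index into the columns (one row per ticker);
# missing combinations such as BBB in 2018 become NaN
revenue_by_year = wide["revenue"].unstack("year")
```

`pivot()` makes cross-metric ratios easy to compute, while `unstack()` lays one metric out across years for side-by-side comparison of companies.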

The code is written in Python; the intrinsic value calculation is available on GitHub and Google Colab.
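The page does not spell out the intrinsic value formula. The sketch below assumes a bond-style model: dividends are treated as coupons and the book value projected `years` ahead as the face value, both discounted at a rate such as a 10-year Treasury yield. The function names and parameters are hypothetical; treat this as an illustrative assumption, not the tutorial's exact method.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate, e.g. of historical book value per share."""
    return (end / start) ** (1 / years) - 1


def intrinsic_value(book_value: float, growth: float, dividend: float,
                    discount_rate: float, years: int = 10) -> float:
    """Bond-style intrinsic value per share (an assumed model, see lead-in).

    IV = sum_{t=1..n} D / (1 + r)^t  +  BV0 * (1 + g)^n / (1 + r)^n
    """
    future_book = book_value * (1 + growth) ** years
    # Present value of the dividend stream, treated like bond coupons
    pv_dividends = sum(dividend / (1 + discount_rate) ** t for t in range(1, years + 1))
    # Present value of the projected book value, treated like the bond's face value
    pv_book = future_book / (1 + discount_rate) ** years
    return pv_dividends + pv_book
```

In practice `growth` might come from `cagr()` over historical book values; if growth equals the discount rate and there are no dividends, the estimate collapses to today's book value.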

NEXT STEPS
1. Automate the intrinsic value calculation
2. Serve the results on a webpage. Here is the plan for the webpage.


PROJECT 4

Fatigue Prediction for Canadian National Women's Rugby Team - DataFest 2019

OVERVIEW

DataFest is a data analysis competition in which teams of up to five students attack a large, complex, surprise dataset over a weekend. My team and I participated in DataFest 2019 at Duke University, and we had the honor of winning 'Best Use of Outside Data' among 82 competing teams!

The contest provided GPS data, game results, and experience sampling data (survey results), and the end goal was to gain insights into the players' fatigue levels. We tried different approaches and found that XGBoost performed best, with an AUC of 0.83.

We impressed the judges by using unique outside information, namely game-related information and weather data, which earned us the 'Best Use of Outside Data' award.

MY CONTRIBUTION

At the beginning of the contest, looking at the crowded building, I felt we would not stand out if we only worked on models. Since most competitors had a data science or statistics background, I did not think modeling alone would give us a good chance of winning: in a large competition where every team has the same data, model performance eventually ends up very similar, so the team with more information stands out.

So I shared my idea with my team, and we agreed to prioritize exploring outside data before modeling. I explored many possibilities and finally found detailed game information on Wikipedia, including substitutions, fouls, and goals with timestamps, as well as the round of each game.

I then scraped the game data and calculated the following features based on my understanding of sports:

Continuous Features:
· The number of lead changes
· The integral of the score difference over time while the Canadian team was leading
· The integral of the score difference over time while the opponent was leading

Categorical Features:
· Was it a blowout game?
· Was it a comeback game?

The report of this competition can be found here: Report
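The score-timeline features listed above can be computed from a list of scoring events. This is a sketch under assumptions: the `ScoreEvent` structure is hypothetical, the score difference is taken as constant between events (so each integral is a sum of difference times duration), and the blowout and comeback thresholds are placeholders, since the page does not define them.

```python
from dataclasses import dataclass


@dataclass
class ScoreEvent:
    minute: float  # game clock when the score changed
    canada: int    # Canada's score after the event
    opponent: int  # opponent's score after the event


def game_features(events, end_minute, blowout_margin=20, comeback_deficit=7):
    """Score-timeline features for one game (thresholds are hypothetical)."""
    events = sorted(events, key=lambda e: e.minute)
    # Score difference (Canada minus opponent) is constant between events
    times = [0.0] + [e.minute for e in events] + [float(end_minute)]
    diffs = [0] + [e.canada - e.opponent for e in events]

    canada_area = opponent_area = 0.0
    for start, stop, d in zip(times, times[1:], diffs):
        if d > 0:
            canada_area += d * (stop - start)     # Canada leading
        elif d < 0:
            opponent_area += -d * (stop - start)  # opponent leading

    # Lead change: the leading side flips (a tie in between does not reset it)
    lead_changes, leader = 0, 0
    for d in diffs:
        side = (d > 0) - (d < 0)
        if side != 0:
            if leader != 0 and side != leader:
                lead_changes += 1
            leader = side

    final = diffs[-1]
    # Comeback: the eventual winner trailed by at least `comeback_deficit` at some point
    if final > 0:
        comeback = -min(diffs) >= comeback_deficit
    elif final < 0:
        comeback = max(diffs) >= comeback_deficit
    else:
        comeback = False

    return {
        "lead_changes": lead_changes,
        "canada_lead_area": canada_area,
        "opponent_lead_area": opponent_area,
        "blowout": abs(final) >= blowout_margin,
        "comeback": comeback,
    }
```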

PROJECT 5 [Self-Directed Project]

DeepDive - Sleep Assistant App

OVERVIEW

This is a sleep aid app that helps users fall asleep naturally. It provides guided meditation and nature sounds to help users fall asleep, as well as a sleep scheduler and a daily task planner that help users organize their days and stay aware of when it is time to sleep.

DeepDive was launched on Oct. 31, 2019 on my personal social media page, and 681 of my friends tried the app. As of Feb. 7, 2020, it still had 42 monthly active users without any marketing (a 6.2% conversion rate). DeepDive can be tried through WeChat.

DATA DRIVEN DESIGN

The key metric I use to evaluate the app is the number of times a user uses it and actually falls asleep.

I collect de-identified interaction data to help improve the product. I treat a user as having fallen asleep if they stay inactive for 6 hours after the meditation guidance or nature sound ends.

I also look at the distribution of exit times for each sound, to understand users' attitudes toward each audio track, and use this information to replace content and rearrange the layout.

USER SURVEY

So far, I have interviewed over 30 users about their feedback on the app and their sleep experience. The key lesson is that bad habits keep some of them from falling asleep, while good habits and routines help them sleep well. Based on these results, I am designing and building the next version, which focuses on forming habits that help sleep.
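The sleep-detection rule and the exit-time analysis can be sketched in plain Python. The 6-hour inactivity window comes from the text; the log formats (a list of activity timestamps per user, and `(sound_name, seconds_before_exit)` pairs) are hypothetical, and the median is just one way to summarize each sound's exit-time distribution.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median

INACTIVITY_WINDOW = timedelta(hours=6)  # threshold stated in the text


def fell_asleep(audio_end: datetime, later_activity: list) -> bool:
    """True if no user activity is logged within 6 hours after the audio ends."""
    cutoff = audio_end + INACTIVITY_WINDOW
    return not any(audio_end < t <= cutoff for t in later_activity)


def exit_time_summary(sessions):
    """Per-sound session count and median listening time (seconds) before exit.

    `sessions` is a hypothetical log of (sound_name, seconds_before_exit) pairs.
    """
    by_sound = defaultdict(list)
    for sound, seconds in sessions:
        by_sound[sound].append(seconds)
    return {s: {"sessions": len(v), "median_s": median(v)} for s, v in by_sound.items()}
```

Summing `fell_asleep(...)` over all sessions gives the key metric described above: the number of uses that ended in sleep.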

RECOMMENDATIONS

Lizhi Wu
Director of Product
AInnovation Inc.

Sicong and I have worked together at Aibee, and I recommend Sicong as an excellent teammate with well-rounded skills. He exhibited many outstanding qualities, such as a strong sense of responsibility and a hard-working, team-player attitude, and he is also intelligent and creative. I felt very lucky to have Sicong on our team and wish him all the best in his future career!

Will Ratliff
Innovation Program Manager
Duke Health

I have had the distinct privilege of managing Sicong at the Duke Institute for Health Innovation over the summer of 2019 and over the past several months. Sicong stands out as an exceptional colleague in that he has the skill sets to collaboratively manage and directly lead the work to solve a problem. Throughout my experience working with him, Sicong demonstrated that he could effectively assess a problem with thoughtful questioning and a natural curiosity to understand. He would consider and prioritize the best solution, and then rapidly identify and execute on related tasks, effectively communicating with me throughout. Combining this skill set with the strong technical abilities he possesses (data science, data engineering, UI design, product strategy, the list goes on...), Sicong adds tremendous value to any team he is a part of. It has been inspiring to collaborate with someone who shows this level of motivation and ability to own the work.

Jiawei Gu
CEO and Co-founder
Ling Technology

It's a pleasure to work with someone as talented as Sicong. We worked very closely from April 2014 to December 2015 at Baidu Research. As his supervisor, I worked with him on two AI projects, BaiduEye and FaceYou. In his work, Sicong proved he was an excellent product manager with great product design capability and business acumen. In addition, his understanding of AI technology helped create novel user-interaction solutions in BaiduEye. I truly believe Sicong would be a great asset to any team, and I am looking forward to working with him in the future.