Experiences

Mediacom Communications

Jan 2023 - Present

Paramount

May 2022 - December 2022

-When working with product analytics, understanding why something is happening is arguably more important than understanding what is happening in the first place

-We need to be able to determine if what we are seeing is just a random blip, or the beginning of a shift in the way our users interact with our content

-If multiple anomalies are identified, it's critical to address the more severe anomaly first

- Once an anomaly is identified, to explain the reason behind it we developed a contribution analysis model; its core components are listed below (a minimal sketch follows the list)

  • Feature Selection - We select only the top x features most highly correlated with the metric we are analyzing
  • Cramér's V - Cramér's V is a number between 0 and 1 that indicates how strongly two categorical variables are associated
  • Weighted Correlation
  • Contribution Score - ( Individual weighted correlation score of the metric * 100 ) / total weighted correlation
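To make these components concrete, here is a minimal Python sketch of the feature-selection, Cramér's V, weighting, and contribution-score steps. It assumes a pandas DataFrame df of categorical dimension columns plus a binned/flagged version of the metric, and an illustrative per-feature weights dict; the names and the top_x value are assumptions, not the production choices.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V: association strength between two categorical variables (0..1)."""
    confusion = pd.crosstab(x, y)
    chi2 = chi2_contingency(confusion)[0]
    n = confusion.to_numpy().sum()
    r, k = confusion.shape
    return np.sqrt((chi2 / n) / max(min(r - 1, k - 1), 1))

def contribution_scores(df: pd.DataFrame, metric: str, weights: dict, top_x: int = 5) -> pd.Series:
    # 1. Feature selection: keep only the top_x features most associated with the metric.
    assoc = {c: cramers_v(df[c], df[metric]) for c in df.columns if c != metric}
    top = sorted(assoc, key=assoc.get, reverse=True)[:top_x]
    # 2. Weighted correlation: scale each association by a per-feature weight
    #    (the weighting scheme shown here is an assumption).
    weighted = {c: assoc[c] * weights.get(c, 1.0) for c in top}
    total = sum(weighted.values())
    # 3. Contribution score = (individual weighted correlation * 100) / total weighted correlation.
    return pd.Series({c: weighted[c] * 100 / total for c in weighted})
```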

- The next project I worked on was Anomaly Severity Analysis; its core components are listed below (see the sketch after this list)

  • Vector Autoregression - A VAR model captures linear relations between multiple variables: each variable is regressed on its own lagged values and on the lagged values of the other variables
  • Impulse Response Function - These trace out the effect on the dependent variables in the VAR of a shock to each of the variables in the VAR
  • Scoring the Severity - Classify the severity, based on the VAR forecast and its standard deviation, into: caused by another metric, low, medium, high, or major
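A minimal sketch of how these pieces could fit together, using statsmodels; the lag order, impulse-response horizon, sigma thresholds, and the "caused by other metric" heuristic below are illustrative assumptions, not the tuned production logic.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

def score_severity(df: pd.DataFrame, metric: str, observed: float) -> str:
    # Fit a VAR: each metric is regressed on its own lags and the lags of the others.
    results = VAR(df).fit(maxlags=7, ic="aic")
    # Impulse responses: effect on each metric of a shock to every metric in the system.
    irf = results.irf(10)
    i = df.columns.get_loc(metric)
    # One-step-ahead forecast and its standard error for the metric of interest.
    last = df.values[-results.k_ar:]
    mean = results.forecast(last, steps=1)[0]
    std = np.sqrt(results.forecast_cov(1)[0][i, i])
    z = abs(observed - mean[i]) / std
    # Illustrative heuristic: if the metric responds more to shocks in the other
    # metrics than to its own shocks, attribute the anomaly to another metric.
    own = np.abs(irf.irfs[1:, i, i]).sum()
    others = np.abs(irf.irfs[1:, i, :]).sum() - own
    if others > own:
        return "caused by other metric"
    # Otherwise grade severity by how many forecast standard deviations away we are.
    if z < 2:
        return "low"
    if z < 3:
        return "medium"
    if z < 4:
        return "high"
    return "major"
```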

AISEC LAB, Stevens Institute of Technology

January 2022 - May 2022

- One of the major issues we currently face in machine learning is data privacy: users' data is sent to a server, subsequent actions are taken based on the data held on that server, and a data leak can happen there

- To overcome this, we worked on a method called Distributed Distillation

- In this particular approach (a P2P network) there is no teacher or central server. In other words, there are 'n' devices and the devices learn from each other; each student holds non-overlapping user data that cannot be shared

- General Distributed Distillation Algorithm (a sketch of one round follows these steps):

  1. The dataset is divided into private training data, reference data, and test data.
  2. Each model trains on its private data.
  3. Each model predicts on the reference data; these predictions are sent to the teacher, which performs its operations on them and sends the updated weights back to all the models.
  4. This continues until all the models converge.
  5. Once convergence is attained, all the models predict on the test data.
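A minimal Python sketch of one communication round of this general loop. The device interface (train_one_epoch, predict, distill, evaluate) is hypothetical, and the aggregation shown (averaging predictions into soft targets) is one common reading of the "teacher operations" step rather than the exact mechanism used here.

```python
def distillation_round(devices, reference_data):
    # Step 2: each device trains on its own private data.
    for d in devices:
        d.train_one_epoch(d.private_data)
    # Step 3: each device predicts on the shared reference data ...
    predictions = [d.predict(reference_data) for d in devices]
    # ... and an aggregated ("teacher") prediction is broadcast back to every device.
    teacher_targets = sum(predictions) / len(predictions)
    for d in devices:
        d.distill(reference_data, teacher_targets)

# Steps 4-5: repeat rounds until the models converge, then evaluate on test data.
def run(devices, reference_data, test_data, rounds=50):
    for _ in range(rounds):
        distillation_round(devices, reference_data)
    return [d.evaluate(test_data) for d in devices]
```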

- We modified this algorithm as follows:

•The student loss here is the training loss 

• The distillation loss can be either the squared error between the logits of the models (Euclidean / 2-norm distance) or the KL divergence between their predictive distributions, computed on the reference data

• In this work, we use the cross-entropy error, treating the teacher's predictive distribution as soft targets.

• At the beginning of training, the distillation term in the loss is not very useful and may even be counterproductive.

• So, to maintain model diversity longer and to avoid a complicated loss-function schedule, we only enable the distillation term in the loss function once training has gotten off the ground.

• loss = (alpha * student_loss) + (1 - alpha) * distillation_loss
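A minimal PyTorch-style sketch of this combined loss with the delayed distillation term; alpha, warmup_epochs, and the argument names are illustrative assumptions rather than the exact values used in the project.

```python
import torch.nn.functional as F

def combined_loss(student_logits, labels, ref_logits, teacher_probs,
                  alpha=0.5, epoch=0, warmup_epochs=5):
    # Student loss: the ordinary training loss on the device's private data.
    student_loss = F.cross_entropy(student_logits, labels)
    # Distillation loss: cross-entropy against the teacher's predictive
    # distribution (soft targets) on the reference data.
    distillation_loss = -(teacher_probs * F.log_softmax(ref_logits, dim=1)).sum(dim=1).mean()
    # Keep the distillation term disabled until training has gotten off the
    # ground, to preserve model diversity early on.
    if epoch < warmup_epochs:
        return student_loss
    return alpha * student_loss + (1 - alpha) * distillation_loss
```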

- We tried this method on a classification problem and achieved a maximum accuracy of 96% with 10 different devices, each holding private data and running a LeNet model

The presentation of this project can be found here

Stevens Institute of Artificial Intelligence - Machine Learning Research Assistant 

January 2022 - May 2022

- This project was carried out in collaboration with the Eastech Company.

-Sources of I&I, if left unchecked, can cause backups into homes and businesses, posing health risks and costing millions of dollars to clean up. To proactively prevent these scenarios from happening, our goal is to use data to identify possible sources of I&I to make identification easier 

- To do this, we utilized data given to us by Eastech through sensors found in sewer systems. These sensors are strategically placed to measure the flow rate of the sewer pipe contents

- The gains achieved going from a simple baseline model to convolutional and recurrent models are only a few percent (if any), and the autoregressive model performed clearly worse. So these more complex approaches may not be worthwhile on this problem

- Traditional machine learning methods (naïve Bayes, decision trees, etc.) are also not capable of making robust predictions on the given data set

-A time series analysis involving ARIMA was very successful in predicting water levels 

-The ARIMA model was also able to detect when water levels would surge with a high degree of accuracy 

- It appears that only previous water levels are necessary for predicting future levels; gain, temperature, etc. do not affect the model's predictive power (a minimal ARIMA sketch follows)
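A minimal statsmodels sketch of the ARIMA forecasting step; the series name, (p, d, q) order, horizon, and surge threshold are illustrative assumptions, not the settings used with the Eastech data.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def forecast_levels(water_level: pd.Series, steps: int = 24, order=(2, 1, 2)):
    # Past water levels are the only input; no exogenous variables are used.
    fitted = ARIMA(water_level, order=order).fit()
    forecast = fitted.forecast(steps=steps)  # predicted levels for the next `steps` intervals
    # Flag intervals where the predicted level surges past a chosen threshold.
    surge_threshold = water_level.quantile(0.95)
    surges = forecast[forecast > surge_threshold]
    return forecast, surges
```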

Footprint, Munich - Data Science Intern

September 2021 - January 2022

My responsibilities as a Data Science Intern at Footprint included:

1. Developing an intelligent CO2 matching algorithm

2. Restructuring and integrating new environmental research data into the backend

3. Deriving dynamic models to calculate the carbon footprint in real time and streamlining the workflow

4. Creating a database of manipulable inverted indexes using data mining knowledge

 

Kero Labs - Machine Learning Intern

Aug 2020 - Dec 2020

During my time here as an intern, I
  1. Developed the grammar-identifying and feature-matching attributes of the Applicant Tracking System.
  2. Managed a group of three and worked on API creation for the Applicant Tracking System model.
  3. Worked on ML concepts related to Principal Component Analysis and Independent Component Analysis.

Healthism - Digital Marketing Intern

May 2018 - June 2018

During my summer internship in 2018, I worked on a project related to digital marketing along with a highly effective team at Healthism.

Blackfrog Technologies - Electrical Engineering Intern

September 2017 - December 2017

During my four-month stint at this government-funded start-up, I co-developed solutions for the following projects:

1. Automatic sliding and swinging gates

2. Portable Vaccine Carrier

3. Home automation system for iOS devices

AIESEC Suez (Kanaka Cafe) - Frontend Developer Intern

June 2017 - August 2017

During my internship in Egypt, I worked on web development, specifically front-end development using JavaScript. I helped build a scalable e-commerce website for advertising the products of a restaurant named "Kanaka Cafe".
