Vishesh Gupta

{ Projects }

Project Modules:

[Data Visualization]

[Data Manipulation]

[Data Mining]

[Data Management]

[Exploratory Data Analysis]

Prediction of Bankrupt and Non-Bankrupt Companies

Machine Learning-Based Bankruptcy Prediction for Polish Companies

This project predicts bankruptcies among Polish companies using several classification models. Under the guidance of Prof. Ramin Mohammadi, the research explores logistic regression, Gaussian Naive Bayes, decision trees, and support vector machines. Predicting corporate insolvency is crucial for stakeholders and investors. The project also traces the evolution of bankruptcy prediction, covering both statistical techniques and contemporary approaches.
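As a rough illustration of one of the classifiers named above, here is a minimal Gaussian Naive Bayes sketch written with only the Python standard library. It is not the project's code: the function names, the two financial ratios used as features, and the six training rows are all made up for the example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a class prior plus per-feature mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    return max(model, key=log_posterior)


# Hypothetical features: [net profit / total assets, total liabilities / total assets]
X_train = [
    [0.12, 0.30], [0.10, 0.35], [0.15, 0.25],     # label 0: non-bankrupt
    [-0.05, 0.90], [-0.10, 0.85], [-0.02, 0.95],  # label 1: bankrupt
]
y_train = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X_train, y_train)
```

Calling `predict_gaussian_nb(model, [0.11, 0.32])` classifies a new company from its two ratios. In practice a library implementation such as scikit-learn's `GaussianNB` would be the more likely choice.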

ToolStack: Python (NumPy, Pandas, Seaborn, Matplotlib, Scikit-learn), Flask

Statistical Learning

Logistic Regression

Bias Variance Trade-off Analysis

Support Vector Machine (SVM)

Decision Trees

pdf

Twitter Tweets Truth Recognition

A Novel and Effective Approach to Truth Discovery Analysis with WordNet for Data Mining

A rich set of principled approaches to the truth discovery problem has been proposed in the data mining and network sensing communities. However, three important challenges have not been well addressed by existing truth discovery solutions in social media sensing applications: misinformation spread, data sparsity, and scalability. We address these three challenges in solving the truth discovery problem for big data social media sensing applications.
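To make the truth discovery problem concrete, here is a minimal sketch of a generic iterative scheme: estimate each claim's truth by a reliability-weighted vote, then re-estimate each source's reliability from how often it agrees with the current truths. This is a textbook-style baseline and not the approach developed in the project; it does not use WordNet and does not address sparsity or scalability. The sources, events, and claims are invented.

```python
from collections import defaultdict


def discover_truths(claims, iterations=10):
    """claims: list of (source, item, value). Returns (truths, reliability)."""
    sources = {source for source, _, _ in claims}
    reliability = {source: 1.0 for source in sources}  # start by trusting everyone
    truths = {}
    for _ in range(iterations):
        # Step 1: reliability-weighted vote for each item.
        scores = defaultdict(lambda: defaultdict(float))
        for source, item, value in claims:
            scores[item][value] += reliability[source]
        truths = {item: max(vals, key=vals.get) for item, vals in scores.items()}

        # Step 2: reliability = share of a source's claims matching current truths.
        agree, total = defaultdict(int), defaultdict(int)
        for source, item, value in claims:
            total[source] += 1
            agree[source] += truths[item] == value
        reliability = {source: agree[source] / total[source] for source in sources}
    return truths, reliability


# Hypothetical tweets: sources a and b are consistent, c and d spread misinformation.
claims = [
    ("a", "e1", "true"), ("b", "e1", "true"), ("c", "e1", "false"),
    ("a", "e2", "false"), ("b", "e2", "false"), ("d", "e2", "true"),
    ("a", "e3", "true"), ("b", "e3", "true"), ("c", "e3", "false"),
    ("a", "e5", "false"), ("b", "e5", "false"), ("d", "e5", "true"),
    ("a", "e4", "true"), ("c", "e4", "false"), ("d", "e4", "false"),
]
truths, reliability = discover_truths(claims)
```

In this toy data a plain majority vote would label `e4` as false, because two of its three sources say so. After the reliability scores of `c` and `d` drop, the weighted vote sides with source `a` instead.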

ToolStack: Python (NumPy, Pandas, Seaborn, Matplotlib, Scikit-learn), Flask

Scalable and Robust Truth Detection

Computation Analysis

Data Visualization(DataViz)

Data Manipulation

Exploratory Data Analysis

pdf

Predictive Healthcare Prediction System

Multi-Condition Health Diagnosis Analysis using Machine Learning

This web application helps with the diagnosis of diseases such as breast cancer, chronic kidney disease, coronary heart disease, liver disease, and diabetes mellitus by analyzing different parameters. On the backend, the application integrates several machine learning models to provide accurate and reliable results.

ToolStack: Python (NumPy, Pandas, Seaborn, Matplotlib, Scikit-learn), Streamlit, XGBoost, StatsModels

Breast Cancer

Heart Disease

Kidney Disease

Liver Disease

Database & Relationship

Link

Predicting Crime Patterns in Worldwide Communities

Uncovering Crime Patterns in Communities: A Data-Driven Approach

This project uses regression analysis and dimensionality reduction techniques to predict crime rates in communities. By training the model on historical crime data, it can generate future crime rate predictions. The regression analysis incorporates socioeconomic indicators, demographic data, and geographical features to capture underlying patterns. Dimensionality reduction techniques extract informative features, reducing complexity and improving predictive performance. The trained model allows users to input community data and receive crime rate predictions, enabling proactive measures for crime prevention and resource allocation.
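The combination of dimensionality reduction and regression can be sketched as follows: project the features onto their leading principal component (found here by power iteration), then fit a least-squares line on that single score. This is a simplified standard-library sketch, not the project's pipeline. The specific techniques (PCA, a one-variable linear fit), the function names, and the two-feature toy data are assumptions made for illustration.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (feature means, unit vector of the leading principal component)."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    # Sample covariance matrix of the centered features.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeated multiplication by the covariance matrix pulls
    # the vector toward the dominant eigenvector (unless the start vector is
    # exactly orthogonal to it, which this sketch does not guard against).
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return means, vec


def project(row, means, component):
    """Score of one row along the principal component."""
    return sum((v - m) * c for v, m, c in zip(row, means, component))


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up community features (two correlated indicators) and crime rates.
features = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
crime_rate = [10.0, 20.0, 30.0, 40.0]

means, component = first_principal_component(features)
scores = [project(row, means, component) for row in features]
slope, intercept = fit_line(scores, crime_rate)


def predict(row):
    """Predicted crime rate for a new community's feature row."""
    return intercept + slope * project(row, means, component)
```

Because the two toy features are perfectly correlated, a single component captures all of their variance. Real community data would need feature scaling and more than one component.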

ToolStack: Python (NumPy, Pandas, Seaborn, Matplotlib, Scikit-learn), Heroku

Data Mining

Exploratory Data Analysis

Regression

Dimensionality Reduction

Feature Engineering

pdf

Coronary Heart Disorder System

An Analysis of an Improved Prediction System for Coronary Heart Disorder

We proposed a Heart Disorder Prediction System that automatically identifies whether a person suffers from a heart disorder. The CHD analysis covers several predictions, such as the risk of heart attack and the likelihood of heart failure. We use K-Nearest Neighbor, Random Forest, and Decision Tree models to make our predictions.
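For readers unfamiliar with K-Nearest Neighbor, the idea fits in a few lines: find the k training samples closest to the query and take a majority vote over their labels. The sketch below uses only the standard library. The function name and the two pre-scaled features are hypothetical and are not taken from the project.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Majority label among the k training rows nearest to `query`."""
    # Euclidean distance from the query to every training row, closest first.
    neighbors = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features scaled to [0, 1], e.g. [cholesterol, resting blood pressure].
train_X = [
    [0.20, 0.30], [0.25, 0.35], [0.30, 0.20],  # label 0: no heart disorder
    [0.80, 0.90], [0.85, 0.80], [0.90, 0.95],  # label 1: heart disorder
]
train_y = [0, 0, 0, 1, 1, 1]
```

`knn_predict(train_X, train_y, [0.82, 0.88])` returns the label of the nearest cluster. Distances are only meaningful when features share a scale, so unscaled clinical measurements would need normalizing first.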

ToolStack: Python (NumPy, Pandas, Seaborn, Matplotlib, Scikit-learn), Flask

Scalable and Robust Truth Detection

Cross Validations

Data Visualization(DataViz)

Data Cleansing

Feature Selection

pdf

Investment and Tourism Database System

A Case Analysis on A&S Consulting Group Database Management System

We designed a database for a consulting group using Entity Relationship Diagrams and the Relational Model. To ensure a robust and efficient database structure, we used advanced concepts such as aggregation, composition, and specialization. We created over a dozen tables for investment and tourism data using MySQL and MongoDB. We were able to conduct in-depth analytics and uncover trends in life expectancy, import and export, and fertility rates for the consulting group by connecting the database to the front end with Python.
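A small sketch of the kind of relational structure and trend query described above, using Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server. The table names, column names, country names, and values are all placeholders invented for the example. They are not the project's schema or data.

```python
import sqlite3

# In-memory database; the actual project used MySQL and MongoDB.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (
        country_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE indicator (
        country_id      INTEGER NOT NULL REFERENCES country(country_id),
        year            INTEGER NOT NULL,
        life_expectancy REAL,
        fertility_rate  REAL,
        PRIMARY KEY (country_id, year)
    );
    """
)

# Placeholder rows, not real statistics.
conn.executemany(
    "INSERT INTO country VALUES (?, ?)",
    [(1, "Atlantis"), (2, "Elbonia")],
)
conn.executemany(
    "INSERT INTO indicator VALUES (?, ?, ?, ?)",
    [
        (1, 2019, 70.0, 2.0),
        (1, 2020, 72.0, 1.8),
        (2, 2019, 65.0, 3.0),
        (2, 2020, 66.0, 2.9),
    ],
)

# Average life expectancy per country, joined through the foreign key.
rows = conn.execute(
    """
    SELECT c.name, AVG(i.life_expectancy) AS avg_life_expectancy
    FROM country AS c
    JOIN indicator AS i ON i.country_id = c.country_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
```

The composite primary key on `(country_id, year)` allows one indicator row per country per year, which keeps year-over-year queries simple.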

ToolStack: SQL WorkBench, Python (Pandas), Colab Notebook, MongoDB Compass

Database (DB) Design

Database (DBMS)

MySQL

MongoDB

Data Manipulation

Data Visualization(DataViz)

pdf

Air Quality Index and COVID Surveillance Analysis

Assessing the Impact of Air Quality on COVID-19 Hospitalization in the United States

This project aims to study the relationship between air quality and the risk of COVID-19 hospitalization in the United States. The project uses two data sources: Air Quality Index (AQI) data from 1980 to 2021 and COVID-19 case surveillance data from 2020 to 2021. The study will use statistical methods and data visualization techniques to answer questions about the effect of lockdown on air quality, the correlation between AQI and COVID-19 hospitalization, controlling for confounding factors, and presentation of results.
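One of the simplest statistics for the AQI and hospitalization question is the Pearson correlation coefficient. The sketch below computes it from scratch. The five data points are invented for illustration and are not drawn from the AQI or COVID-19 surveillance datasets. A raw correlation also does not control for confounding factors, which the study treats as a separate step.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


# Made-up values: mean AQI per region and hospitalizations per 1,000 cases.
aqi = [40, 55, 70, 85, 100]
hospitalization_rate = [2.0, 2.6, 3.1, 3.9, 4.4]

r = pearson_r(aqi, hospitalization_rate)
```

A value of `r` near 1 indicates a strong positive linear association in the sample, and a value near 0 indicates little linear association.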

ToolStack: Python (NumPy, Pandas, Seaborn, Matplotlib, Scikit-learn), Power BI

Jupyter Notebook

Data Cleaning

Data Manipulation

Data Visualization(DataViz)

Exploratory Data Analysis

pdf

Employee Performance Prediction

An Analysis of Predicting Employee Performance in Organizations

We implemented an application for managing imbalanced employee data and identifying new patterns of employee performance. We applied predictive analytics to the dataset to improve accuracy and reduce the false positive rate.
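Accuracy and false positive rate measure different things, which matters most when the classes are imbalanced. The sketch below computes both from a confusion matrix. The function names and the ten labels are made up for the example and do not come from the project's data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def false_positive_rate(y_true, y_pred, positive=1):
    """Share of actual negatives that were wrongly flagged as positive."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred, positive)
    return fp / (fp + tn) if (fp + tn) else 0.0


def accuracy(y_true, y_pred):
    """Share of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Imbalanced toy labels: 8 negatives, 2 positives (e.g. 1 = high performer).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

fpr = false_positive_rate(y_true, y_pred)  # 2 false positives out of 8 negatives
acc = accuracy(y_true, y_pred)             # 7 correct out of 10
```

Here the model is 70% accurate, yet it flags a quarter of the negatives incorrectly and misses half of the positives. Tracking the false positive rate alongside accuracy exposes this.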

ToolStack: Python (NumPy, Pandas, Seaborn, Matplotlib, Scikit-learn), Flask

Predictive Analytics

Decision Tree

Mathematical Optimization

Performance Management

pdf

loan-calculator - PyPI

A Python application for calculating simple interest, annual payment, the rate of interest (ROI), and loan amount.
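The four quantities listed above follow from standard formulas. The sketch below shows one way to compute them. The function names and signatures are hypothetical and are not the published package's API. The annual payment assumes a level payment at the end of each year on an amortizing loan.

```python
def simple_interest(principal, annual_rate, years):
    """Interest accrued without compounding; annual_rate is e.g. 0.05 for 5%."""
    return principal * annual_rate * years


def interest_rate(principal, interest, years):
    """Annual rate implied by a simple-interest amount (inverse of the above)."""
    return interest / (principal * years)


def annual_payment(principal, annual_rate, years):
    """Level yearly payment that fully repays the loan in `years` payments."""
    if annual_rate == 0:
        return principal / years
    growth = (1 + annual_rate) ** years
    return principal * annual_rate * growth / (growth - 1)


def loan_amount(payment, annual_rate, years):
    """Principal that a given level yearly payment can repay."""
    if annual_rate == 0:
        return payment * years
    growth = (1 + annual_rate) ** years
    return payment * (growth - 1) / (annual_rate * growth)
```

For example, `annual_payment(1000, 0.10, 2)` comes to roughly 576.19, and passing that payment back to `loan_amount` with the same rate and term recovers the original 1000.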

ToolStack: Python (NumPy, Pandas, Seaborn, Matplotlib, Scikit-learn), Flask

Scalable and Robust Truth Detection

Computation Analysis

Data Visualization(DataViz)

Data Manipulation

Exploratory Data Analysis

pdf