Karan Ajay Pisay

"Turning data into actionable insights."

About Me

A dedicated and results-driven data professional with a proven track record of transforming data into strategic insights. With expertise in SQL, Python, and machine learning, I have reduced costs, optimized processes, and enhanced decision-making in dynamic environments. My experience ranges from automating data pipelines to developing predictive models, always aiming to deliver impactful solutions.
Welcome to my portfolio, where data meets actionable intelligence!

Projects

Real-time Sentiment Analysis on Cryptocurrency Market Trends

Constructed a real-time system to ingest and preprocess data from Twitter and Google News, achieving 91% accuracy with NLTK’s VADER and a fine-tuned BERT Transformer. Deployed dashboards for visualization and used AWS S3 for storage.

Tech Stack: AWS (SageMaker, Kinesis), Apache Kafka, NLTK, BERT Transformer

Data Pipeline for E-commerce Analytics

Designed a real-time data pipeline that reduced processing latency by 85%, increased sales conversion rates by 15%, improved inventory management by 20%, and provided real-time dashboards that significantly enhanced operational efficiency and strategic planning.

Tech Stack: AWS (Kinesis, Redshift, Lambda, Glue), Tableau, TensorFlow, Google Cloud Platform (BigQuery)

Automotive Customer Insights and Sales Forecasting

Engineered a data analysis solution with time series analysis and feature engineering that provided actionable insights into customer behavior, vehicle preferences, and market trends to increase customer retention, boost sales, and improve decision-making.

Tech Stack: Snowflake, Feature Engineering, Scikit-Learn, Prophet (time series forecasting)

News Shack Live News Summarizer

Conducted data analysis on a CNN dataset (100,000 rows), handling missing values and data anomalies. Achieved 91% accuracy and 94% ROUGE precision by fine-tuning a T5 transformer model for news summarization. Created a user-friendly interface with Streamlit that offers real-time news summaries.

Tech Stack: Python, Scikit-learn, Machine Learning Pipelines, Time Series Analysis, NLP, Transformers
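
For readers unfamiliar with the ROUGE precision figure quoted for News Shack, here is a minimal sketch of how ROUGE-N precision is commonly computed: the share of n-grams in the generated summary that also appear in the reference summary, with repeated n-grams counted at most as often as they occur in the reference. The function name, the whitespace tokenization, and the example sentences are illustrative assumptions; the project's actual evaluation code and ROUGE variant are not stated here.

```python
from collections import Counter


def rouge_n_precision(candidate: str, reference: str, n: int = 1) -> float:
    """Illustrative ROUGE-N precision (hypothetical helper, not the project's code)."""

    def ngrams(text: str) -> Counter:
        # Assumption: lowercase and split on whitespace; real pipelines
        # usually apply a proper tokenizer and optional stemming.
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate)
    ref = ngrams(reference)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    return overlap / total


# Made-up sentences, used only to exercise the function.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
unigram_precision = rouge_n_precision(candidate, reference, n=1)
bigram_precision = rouge_n_precision(candidate, reference, n=2)
print(round(unigram_precision, 3), round(bigram_precision, 3))
```

Precision alone rewards short, safe summaries, which is why it is usually read alongside ROUGE recall or F1.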

Sentiment Analysis on Russia-Ukraine Invasion

Extracted data from Twitter and Google News to create a data model and applied data preprocessing techniques. Performed sentiment analysis using NLTK’s VADER library and a fine-tuned BERT Transformer, reaching 91% accuracy.

Tech Stack: Python, AWS (EC2, S3), Predictive Modeling, Big Data Analytics, Streamlit

Fraud Detection in Financial Data

Implemented Random Forest and Logistic Regression ML models to detect fraudulent transactions. Utilized cross-validation and hyperparameter tuning to optimize model performance and generalization.

Tech Stack: PySpark, Feature Engineering, Machine Learning, Regression Analysis, Hypothesis Testing, DAX

CyberBullying Detector using NLP

Developed a web application to detect cyberbullying in tweets using Natural Language Processing (NLP) and machine learning techniques. The system classifies tweets based on age, gender, ethnicity, and religion using a Support Vector Machine (SVM) model that reached 82.8% accuracy after tuning. The front-end interface, built with Streamlit, provides real-time predictions and visualizations.

Tech Stack: NLP, SVM, Cyberbullying Detection, Streamlit, Twitter Data, Tokenization, TF-IDF Vectorization
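
As a rough illustration of the tokenization and TF-IDF vectorization steps listed in the tech stack, the sketch below computes textbook TF-IDF weights (term frequency times log of inverse document frequency) in plain Python. The `tfidf` helper, the whitespace tokenizer, and the sample tweets are assumptions made for illustration; library implementations such as scikit-learn apply smoothing and normalization, so their numbers differ, and this is not the project's actual feature pipeline.

```python
import math
from collections import Counter


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Textbook TF-IDF per document (hypothetical helper for illustration)."""
    # Assumption: lowercase whitespace tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up tweets, used only to exercise the function.
tweets = ["you are great", "you are awful", "great game today"]
weights = tfidf(tweets)
# "awful" occurs in one tweet only, so it outweighs the more common "you".
print(weights[1])
```

Sparse vectors like these are the kind of input a linear SVM can be trained on; rarer, more distinctive terms receive larger weights than words shared across many tweets.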

Spotify Charts Analysis

Conducted an in-depth analysis of Spotify charts using PySpark and Databricks to understand trends in music popularity across different regions and periods. The project used over 3.7 GB of Spotify track metadata to identify key artists, genres, and patterns in music streaming. Visualizations highlighted insights such as the most streamed artists and the correlation between song attributes and popularity.

Tech Stack: PySpark, Databricks, Big Data, Spotify Charts, Music Streaming, Data Cleaning, Data Wrangling

EDA on CDC Sexually Transmitted Diseases Dataset

Performed Exploratory Data Analysis (EDA) on the CDC dataset for sexually transmitted diseases. The project involved data cleaning, visualization, and analysis to identify trends and patterns in disease prevalence and demographics. Insights included the geographical distribution of cases, age group statistics, and time-series trends of disease outbreaks.

Tech Stack: EDA, CDC, Sexually Transmitted Diseases, Data Cleaning, Data Visualization, Pandas, Matplotlib

Customer Churn Prediction for Subscription Services

Developed a machine learning model to predict customer churn for a subscription-based service using advanced data analysis and predictive modeling techniques. Analyzed customer behavior data to identify patterns indicative of churn, enabling targeted retention strategies and reducing churn rates by 25%.

Tech Stack: AWS (S3, Redshift, SageMaker), PySpark, Scikit-learn, XGBoost, SQL, Tableau, Docker

Work Experience

Tri-County Electric Cooperative Inc.

Enterprise Applications Data Analyst - Tri-County Electric Cooperative Inc. Texas, USA (September 2023 - Present)

  • Automated data extraction with SQL and Alteryx and used Tableau for forecasting, reducing electricity procurement costs by 60%.
  • Developed high-accuracy ML models for forecasting power consumption using real-time data, cutting resource wastage by 15%.
  • Enhanced financial efficiency by 30% through ad-hoc analysis using Python and process optimization, resulting in enterprise cost savings.
  • Improved resource reliability by geospatially plotting high-consumption areas and implementing fault-tolerant measures.

LeveragAI

Marketing Data Analyst - LeveragAI California, USA (June 2023 - September 2023)

  • Leveraged predictive models to enhance targeting precision and analyze customer behavior, increasing campaign engagement by 25%.
  • Implemented a reporting system using AWS QuickSight and Excel pivot tables to deliver KPI dashboards and financial reports, facilitating strategic decision-making.
  • Led CRM integration and adoption of marketing tools, automating processes and achieving 12% revenue growth in one quarter.

Triangular Automation Pvt Ltd

Data Engineer - Triangular Automation Pvt Ltd, India (July 2018 - June 2021)

  • Engineered robust ETL pipelines with PySpark and SQL to process credit data, enhancing data integration and data quality by 40%.
  • Created and maintained machine learning models for customer segmentation, improving credit risk assessments.
  • Built scalable data architectures on AWS (S3, Redshift, Glue), optimizing data storage and retrieval for real-time credit risk evaluations.
  • Conducted credit data analysis, developed ARR, MRR, and inbound lead conversion dashboards, and performed feature engineering to optimize the sales funnel and inform strategic customer-targeting decisions for leadership.

Education

University of Maryland Baltimore County

Master of Professional Studies in Data Science - University of Maryland Baltimore County, Baltimore, MD (August 2021 - May 2023)

Coursework:

  • Machine Learning
  • NLP/LLM
  • Data Science
  • Big Data Analytics
  • Data Visualization

Pune University

Bachelor of Engineering in Computer Science - Pune University, India (August 2016 - June 2019)

Coursework:

  • Algorithms
  • Data Structures
  • Database Management Systems
  • Operating Systems

Pune University

Associate of Science in Computer Science - Pune University, India (August 2013 - June 2016)

Coursework:

  • Programming in C
  • Object-Oriented Programming
  • Computer Networks
  • Software Engineering

Technical Skills

Programming Languages

  • Python (NumPy, Pandas, Scikit-learn, Keras, TensorFlow, PyTorch), SQL, R, SAS, JavaScript, HTML/CSS

Databases

  • Microsoft SQL Server, PostgreSQL, Oracle, Snowflake, Neptune, DynamoDB, MongoDB, PL/SQL

Machine Learning

  • Regression, Classification, Statistical Data Modelling, Clustering, Hypothesis Testing, A/B Testing

BI Tools

  • Alteryx, MS Excel, Databricks, SSMS, DBeaver, Power BI, Tableau, Prep, Streamlit, Looker, Qlik, Hugging Face, SSIS

Big Data and Cloud

  • Apache Spark, Microsoft Azure, Google Cloud Platform (GCP), Cassandra, Hadoop

Certifications

  • Alteryx Machine Learning Fundamentals, AWS Solutions Architect, Certified Associate in Project Management (CAPM), Disciplined Agile Scrum Master (DASM)

Contact

Please reach out if you have ideas for collaborations, are hiring Data Engineers or Data Scientists, or just want to say hi!

Email: karanajaypisay@gmail.com

Mobile: +1 (443) 593-8409

City: Austin, Texas