Karan Ajay Pisay

Data Engineer • ML Enablement • Analytics Systems

Turning data into actionable insights — building reliable pipelines, productionizing ML, and enabling business decisions with high-quality, governed data.

Spark / PySpark • Airflow • Snowflake • AWS • dbt • Great Expectations

Available for collaborations • Data Engineering • Data Science

About

Background • Focus areas • What I build

I am a results-driven data professional with a track record of turning data into strategic insights. Using SQL, Python, and machine learning, I have reduced costs, optimized processes, and improved decision-making in dynamic environments. My experience ranges from automating data pipelines to developing predictive models, always with the aim of delivering practical, measurable impact.

Welcome to my portfolio — where data meets actionable intelligence.

Projects

Real-time Sentiment Analysis on Cryptocurrency Market Trends

Real-time ingestion + NLP scoring with dashboards for insights.

AWS • NLP • BERT • Streaming

Data Pipeline for E-commerce Analytics

Low-latency pipeline + dashboards to improve decisions and ops.

Kinesis • Redshift • Glue • Tableau

Automotive Customer Insights and Sales Forecasting

Customer analytics + time-series forecasting for strategic planning.

Snowflake • Scikit-Learn • Prophet • EDA

News Shack Live News Summarizer

Fine-tuned transformer summarization with a simple UI.

Transformers • T5 • Streamlit • NLP

Sentiment Analysis on Russia-Ukraine Invasion

Social/news sentiment modeling using VADER + BERT.

VADER • BERT • AWS • Streamlit

Fraud Detection in Financial Data

Fraud classification using ML + tuning for generalization.

ML • Random Forest • LogReg • PySpark

CyberBullying Detector using NLP

Tweet classifier with SVM + real-time predictions UI.

SVM • TF-IDF • Streamlit • NLP

Spotify Charts Analysis

Big data analysis in Databricks to uncover streaming trends.

Databricks • PySpark • Big Data • Analytics

EDA on CDC Sexually Transmitted Diseases Dataset

Exploratory analysis to identify trends and distributions.

Pandas • Matplotlib • EDA • Time Series

Customer Churn Prediction for Subscription Services

Behavior-based churn model to enable proactive retention.

SageMaker • XGBoost • SQL • Tableau

Articles

Selected papers & writing

Big Data in Tesla Vehicles and the Internet of Vehicles (IoV)

Deepfakes and What Comes with It

Ethical Issues in Combating Cyber-Crime

Ethics' Importance in the Technological Advancement

Experience

Roles & impact

Data Engineer — Webull Technologies, Inc. (Florida, USA)

December 2024 – Present

Designed and optimized ETL/ELT pipelines using Apache Spark (PySpark) and Apache Airflow to integrate large-scale marketing, trading, and behavioral datasets, improving processing speed and reliability by 70%.
Built scalable data pipelines on Snowflake and AWS for automated ingestion, transformation, and storage, ensuring consistency, performance, and compliance with data-governance standards.
Partnered with Data Science teams to productionize ML models in AWS SageMaker (churn prediction, credit-response modeling) and integrated outputs into analytical dashboards for actionable insights.
Implemented data-quality frameworks using SQL, PySpark, and Great Expectations, improving accuracy and monitoring across data pipelines and analytical layers.
Collaborated with Product, Marketing, and Risk teams to standardize data lineage, CI/CD deployment, and dbt-based transformation workflows, enhancing transparency and maintainability.

Enterprise Applications Data Analyst — Tri-County Electric Cooperative Inc. (Texas, USA)

September 2023 – December 2024

Automated data extraction with SQL and Alteryx to feed Tableau forecasts, reducing electricity procurement costs by 60%.
Developed high-accuracy ML models for forecasting power consumption using real-time data, cutting resource wastage by 15%.
Improved financial efficiency by 30% through ad-hoc Python analysis and process optimization, resulting in enterprise cost savings.
Improved resource reliability by geospatially plotting high-consumption areas and implementing fault-tolerant measures.

Marketing Data Analyst — LeveragAI (California, USA)

June 2023 – September 2023

Leveraged predictive models to enhance targeting precision and analyze customer behavior, increasing campaign engagement by 25%.
Built reporting with Amazon QuickSight and Excel pivot tables to deliver KPI dashboards and financial reports for decision-making.
Led CRM integration and adoption of marketing tools, automating processes and achieving 12% revenue growth in one quarter.

Data Engineer — Triangular Automation Pvt Ltd (India)

July 2018 – June 2021

Engineered robust ETL pipelines with PySpark and SQL to process credit data, enhancing integration and data quality by 40%.
Created and maintained machine learning models for customer segmentation, improving credit risk assessments.
Built scalable architectures on AWS (S3, Redshift, Glue), optimizing storage and retrieval for near-real-time evaluations.
Developed ARR/MRR and inbound lead conversion dashboards; performed feature engineering to optimize the sales funnel and inform targeting decisions.

Education

Degrees & coursework

University of Maryland Baltimore County

Master of Professional Studies in Data Science — UMBC (Baltimore, MD)

August 2021 – May 2023

Coursework

Machine Learning
NLP/LLM
Data Science
Big Data Analytics
Data Visualization

Bachelor of Engineering in Computer Science — Pune University (India)

August 2016 – June 2019

Coursework

Algorithms
Data Structures
DBMS
Operating Systems

Associate of Science in Computer Science — Pune University (India)

August 2013 – June 2016

Coursework

Programming in C
OOP
Computer Networks
Software Engineering

Technical Skills

Tools, systems, and strengths

Programming

Python (NumPy, Pandas, Scikit-learn, Keras, TensorFlow, PyTorch)
SQL, R, SAS, JavaScript, HTML/CSS

Databases

SQL Server, PostgreSQL, Oracle, Snowflake
Neptune, DynamoDB, MongoDB, PL/SQL

Machine Learning

Regression, Classification, Clustering
Hypothesis Testing, A/B Testing, Statistical Modeling

BI / Platforms

Alteryx, Excel, Databricks, SSMS, DBeaver
Power BI, Tableau, Streamlit, Looker, Qlik, Hugging Face, SSIS

Big Data & Cloud

Apache Spark, Hadoop, Cassandra
Azure, GCP, AWS (as used in projects)

Certifications

Alteryx ML Fundamentals, AWS Solutions Architect
CAPM, DASM

Contact

Let’s build something useful

Please reach out if you have ideas for collaborations or are hiring Data Engineers and Data Scientists… or to just say hi!

Email: karan.pisay7@gmail.com

Mobile: +1 (443) 593-8409

City: Greater Tampa Bay, Florida
