Karan Ajay Pisay

Data Engineer • ML Enablement • Analytics Systems

Turning data into actionable insights — building reliable pipelines, productionizing ML, and enabling business decisions with high-quality, governed data.

Spark / PySpark • Airflow • Snowflake • AWS • dbt • Great Expectations
Available for collaborations • Data Engineering • Data Science

About

Background • Focus areas • What I build

A dedicated, results-driven data professional with a proven track record of transforming data into strategic insights. With expertise in SQL, Python, and machine learning, I have reduced costs, optimized processes, and improved decision-making in dynamic environments. My experience ranges from automating data pipelines to developing predictive models, always with the aim of delivering practical, measurable impact.

Welcome to my portfolio — where data meets actionable intelligence.

Projects

Real-time Sentiment Analysis on Cryptocurrency Market Trends

Real-time ingestion + NLP scoring with dashboards for insights.

AWS • NLP • BERT • Streaming

Data Pipeline for E-commerce Analytics

Low-latency pipeline + dashboards to improve decisions and ops.

Kinesis • Redshift • Glue • Tableau

Automotive Customer Insights and Sales Forecasting

Customer analytics + time-series forecasting for strategic planning.

Snowflake • Scikit-learn • Prophet • EDA

News Shack Live News Summarizer

Fine-tuned transformer summarization with a simple UI.

Transformers • T5 • Streamlit • NLP

Sentiment Analysis on Russia-Ukraine Invasion

Social/news sentiment modeling using VADER + BERT.

VADER • BERT • AWS • Streamlit

Fraud Detection in Financial Data

Fraud classification using ML + tuning for generalization.

ML • Random Forest • Logistic Regression • PySpark

CyberBullying Detector using NLP

Tweet classifier built with an SVM, plus a UI for real-time predictions.

SVM • TF-IDF • Streamlit • NLP

Spotify Charts Analysis

Big data analysis in Databricks to uncover streaming trends.

Databricks • PySpark • Big Data • Analytics

EDA on CDC Sexually Transmitted Diseases Dataset

Exploratory analysis to identify trends and distributions.

Pandas • Matplotlib • EDA • Time Series

Customer Churn Prediction for Subscription Services

Behavior-based churn model to enable proactive retention.

SageMaker • XGBoost • SQL • Tableau

Experience

Roles & impact

Webull Technologies, Inc.

Data Engineer — Webull Technologies, Inc. (Florida, USA)

December 2024 – Present
  • Designed and optimized ETL/ELT pipelines using Apache Spark (PySpark) and Apache Airflow to integrate large-scale marketing, trading, and behavioral datasets, improving processing speed and reliability by 70%.
  • Built scalable data pipelines on Snowflake and AWS for automated ingestion, transformation, and storage, ensuring consistency, performance, and compliance with data-governance standards.
  • Partnered with Data Science teams to productionize ML models in AWS SageMaker (churn prediction, credit-response modeling) and integrated outputs into analytical dashboards for actionable insights.
  • Implemented data-quality frameworks using SQL, PySpark, and Great Expectations, improving data accuracy and monitoring coverage across pipelines and analytical layers.
  • Collaborated with Product, Marketing, and Risk teams to standardize data lineage, CI/CD deployment, and dbt-based transformation workflows, enhancing transparency and maintainability.
Tri-County Electric Cooperative Inc.

Enterprise Applications Data Analyst — Tri-County Electric Cooperative Inc. (Texas, USA)

September 2023 – December 2024
  • Automated data extraction with SQL and Alteryx and used Tableau for forecasting, reducing electricity procurement costs by 60%.
  • Developed high-accuracy ML models that forecast power consumption from real-time data, cutting resource wastage by 15%.
  • Improved financial efficiency by 30% through ad-hoc Python analysis and process optimization, resulting in enterprise cost savings.
  • Improved resource reliability by geospatially plotting high-consumption areas and implementing fault-tolerant measures.
LeveragAI

Marketing Data Analyst — LeveragAI (California, USA)

June 2023 – September 2023
  • Leveraged predictive models to enhance targeting precision and analyze customer behavior, increasing campaign engagement by 25%.
  • Implemented reporting with AWS QuickSight and Excel pivot tables, delivering KPI dashboards and financial reports for decision-making.
  • Led CRM integration and adoption of marketing tools, automating processes and achieving 12% revenue growth in one quarter.
Triangular Automation Pvt Ltd

Data Engineer — Triangular Automation Pvt Ltd (India)

July 2018 – June 2021
  • Engineered robust ETL pipelines with PySpark and SQL to process credit data, enhancing integration and data quality by 40%.
  • Created and maintained machine learning models for customer segmentation, improving credit risk assessments.
  • Built scalable architectures on AWS (S3, Redshift, Glue), optimizing storage and retrieval for near-real-time evaluations.
  • Developed ARR/MRR and inbound lead conversion dashboards; performed feature engineering to optimize the sales funnel and inform targeting decisions.

Education

Degrees & coursework

University of Maryland, Baltimore County

Master of Professional Studies in Data Science — UMBC (Baltimore, MD)

August 2021 – May 2023
Coursework
  • Machine Learning
  • NLP/LLM
  • Data Science
  • Big Data Analytics
  • Data Visualization
Pune University

Bachelor of Engineering in Computer Science — Pune University (India)

August 2016 – June 2019
Coursework
  • Algorithms
  • Data Structures
  • DBMS
  • Operating Systems
Pune University

Associate of Science in Computer Science — Pune University (India)

August 2013 – June 2016
Coursework
  • Programming in C
  • OOP
  • Computer Networks
  • Software Engineering

Technical Skills

Tools, systems, and strengths

Programming

  • Python (NumPy, Pandas, Scikit-learn, Keras, TensorFlow, PyTorch)
  • SQL, R, SAS, JavaScript, HTML/CSS

Databases

  • SQL Server, PostgreSQL, Oracle, Snowflake
  • Neptune, DynamoDB, MongoDB, PL/SQL

Machine Learning

  • Regression, Classification, Clustering
  • Hypothesis Testing, A/B Testing, Statistical Modeling

BI / Platforms

  • Alteryx, Excel, Databricks, SSMS, DBeaver
  • Power BI, Tableau, Streamlit, Looker, Qlik, Hugging Face, SSIS

Big Data & Cloud

  • Apache Spark, Hadoop, Cassandra
  • Azure, GCP, AWS (as used in projects)

Certifications

  • Alteryx ML Fundamentals, AWS Solutions Architect
  • CAPM, DASM

Contact

Let’s build something useful

Please reach out if you have an idea for a collaboration, are hiring data engineers or data scientists, or just want to say hi!

Email: karan.pisay7@gmail.com

Mobile: +1 (443) 593-8409

City: Greater Tampa Bay, Florida