I am a dedicated, results-driven data professional with a proven track record of transforming data into strategic insights. Drawing on expertise in SQL, Python, and machine learning, I have reduced costs, optimized processes, and enhanced decision-making in dynamic environments. My experience ranges from automating data pipelines to developing predictive models, always with the aim of delivering impactful solutions.
Welcome to my portfolio, where data meets actionable intelligence!
Constructed a real-time system to ingest and preprocess data from Twitter and Google News, achieving 91% accuracy with NLTK’s VADER and a fine-tuned BERT Transformer. Deployed dashboards for visualization and used AWS S3 for storage.
Tech Stack: AWS (SageMaker, Kinesis), Apache Kafka, NLTK, BERT Transformer
Designed a real-time data pipeline that reduced processing latency by 85%, increased sales conversion rates by 15%, and improved inventory management by 20%. Its real-time dashboards significantly enhanced operational efficiency and strategic planning.
Tech Stack: AWS (Kinesis, Redshift, Lambda, Glue), Tableau, TensorFlow, Google Cloud Platform (BigQuery)
Engineered a data analysis solution using time series analysis and feature engineering that provided actionable insights into customer behavior, vehicle preferences, and market trends to increase customer retention, boost sales, and improve decision-making.
Tech Stack: Snowflake, Feature Engineering, Scikit-learn, Prophet (time series forecasting)
Conducted data analysis on the CNN dataset (100,000 rows), handling missing values and data anomalies. Achieved 91% accuracy and a ROUGE precision of 94% by fine-tuning a T5 transformer model for news summarization. Created a user-friendly interface with Streamlit that offers real-time news summaries.
Tech Stack: Python, Scikit-learn, Machine Learning Pipelines, Time Series Analysis, NLP, Transformers
Extracted data from Twitter and Google News to create a data model and applied data preprocessing techniques. Performed sentiment analysis using NLTK’s VADER library and a fine-tuned BERT Transformer, achieving 91% accuracy.
Tech Stack: Python, AWS (EC2, S3), Predictive Modeling, Big Data Analytics, Streamlit
Implemented Random Forest and Logistic Regression ML models to detect fraudulent transactions. Utilized cross-validation and hyperparameter tuning to optimize model performance and generalization.
Tech Stack: PySpark, Feature Engineering, Machine Learning, Regression Analysis, Hypothesis Testing, DAX
Developed a web application to detect cyberbullying in tweets using Natural Language Processing (NLP) and machine learning techniques. The system classifies tweets by category (age, gender, ethnicity, and religion) using a tuned Support Vector Machine (SVM) model that reaches 82.8% accuracy. The front-end interface, built with Streamlit, provides real-time predictions and visualizations.
Tech Stack: NLP, SVM, Cyberbullying Detection, Streamlit, Twitter Data, Tokenization, TF-IDF Vectorization
Conducted an in-depth analysis of Spotify charts using PySpark and Databricks to understand trends in music popularity across regions and time periods. The project used over 3.7 GB of Spotify track metadata to identify key artists, genres, and patterns in music streaming. Visualizations highlighted insights such as the most streamed artists and the correlation between song attributes and popularity.
Tech Stack: PySpark, Databricks, Big Data, Spotify Charts, Music Streaming, Data Cleaning, Data Wrangling
Performed Exploratory Data Analysis (EDA) on the CDC dataset for sexually transmitted diseases. The project involved data cleaning, visualization, and analysis to identify trends and patterns in disease prevalence and demographics. Insights included the geographical distribution of cases, age group statistics, and time-series trends of disease outbreaks.
Tech Stack: EDA, CDC, Sexually Transmitted Diseases, Data Cleaning, Data Visualization, Pandas, Matplotlib
Developed a machine learning model to predict customer churn for a subscription-based service using advanced data analysis and predictive modeling techniques. Analyzed customer behavior data to identify patterns indicative of churn, enabling targeted retention strategies and reducing churn rates by 25%.
Tech Stack: AWS (S3, Redshift, SageMaker), PySpark, Scikit-learn, XGBoost, SQL, Tableau, Docker
Article 1: BIG DATA IN TESLA VEHICLES AND THE INTERNET OF VEHICLES (IOV)
Article 2: DEEPFAKES AND WHAT COMES WITH IT
Article 3: ETHICAL ISSUES IN COMBATING CYBER-CRIME
Article 4: ETHICS' IMPORTANCE IN THE TECHNOLOGICAL ADVANCEMENT
Enterprise Applications Data Analyst - Tri-County Electric Cooperative Inc., Texas, USA (September 2023 - Present)
Marketing Data Analyst - LeveragAI, California, USA (June 2023 - September 2023)
Data Engineer - Triangular Automation Pvt Ltd, India (July 2018 - June 2021)
Master of Professional Studies in Data Science - University of Maryland Baltimore County, Baltimore, MD (August 2021 - May 2023)
Coursework:
Bachelor of Engineering in Computer Science - Pune University, India (August 2016 - June 2019)
Coursework:
Associate of Science in Computer Science - Pune University, India (August 2013 - June 2016)
Coursework:
Please reach out if you have ideas for collaborations, are hiring Data Engineers or Data Scientists, or just want to say hi!
Email: karanajaypisay@gmail.com
Mobile: +1 (443) 593-8409
City: Austin, Texas