Product & Data Analyst · UT Arlington · Open to Work

Building analytics systems that actually get used

MS Data Science @ UT Arlington (GPA 3.8) with 1+ year of industry experience. My track record goes beyond coursework: four merged contributions to Microsoft's Agent Governance Toolkit, an AgentTrust early contributor invitation, A2A/CTEF proposal work, and four deployed analytics systems with live demos. I build systems where data is validated before it reaches reporting, agents are authorized before they act on it, and insights are framed clearly enough to drive a decision.

live · profile · status
Status ● Open to roles
Work Auth F-1 OPT · No sponsorship
Microsoft AGT 4 PRs merged
A2A/CTEF Proposal work
AgentTrust Early Contributor
GPA 3.8 / 4.0
Industry Exp. 1+ year
Reporting Accuracy ↑ +30%
Data Prep Time ↓ −40%
Transactions Processed 541K+
Churn Model ROC-AUC 0.84
What I focus on

Most analytics problems aren't about building dashboards. They're about whether the data behind the dashboard can be trusted, whether the metric actually answers the right question, and whether anyone will act on the output.

Beyond personal projects, I contribute to the protocols and toolkits that govern how AI agents interact with data, including four merged contributions to Microsoft's Agent Governance Toolkit and A2A/CTEF proposal work. The pattern is consistent: find the gap between what a system checks and what it should check, then build the layer that closes it.

Junior Data Analyst
IVS Software Solutions
Jun 2023 – Sep 2024
Pune, India
  • Analyzed 80K–150K records/cycle using SQL and Python to surface KPI trends, detect anomalies, and power stakeholder reporting, directly reducing decision latency for 3 internal business teams.
  • Conducted structured EDA that uncovered systematic data quality gaps, improving downstream reporting accuracy by 30% and establishing recurring data validation checkpoints.
  • Re-engineered SQL pipelines with optimized join logic, window functions, and incremental loads, cutting data preparation time by 40% and accelerating dashboard refresh from daily to near-real-time.
  • Built reusable, analysis-ready datasets supporting weekly and monthly reporting workflows, eliminating 4–6 hours of manual data wrangling per reporting cycle.
Microsoft
microsoft/agent-governance-toolkit
4 PRs Merged
Agent Governance Toolkit Contributions
  • PR #1818 merged: Added Data Quality-Aware Agent Governance to the official AGT adopter registry.
  • PR #2038 merged: Verified the Agent SRE tutorial end-to-end and fixed documentation/API reference issues.
  • PR #2224 merged: Added a generic data-quality-aware governance example showing AGT policy checks combined with dataset trust signals.
  • PR #2278 merged: Added a dbt-backed DataQualityEvidence adapter showing dbt run_results.json → DataQualityEvidence → AGT policy decision flow.
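The run_results.json → DataQualityEvidence → policy decision flow named in the last bullet can be illustrated roughly as follows. This is a minimal sketch and not the merged adapter: the `DataQualityEvidence` fields, the `decide` function, and the 0.95 threshold are hypothetical stand-ins, and the payload is assumed to come from a `dbt test` run, where each entry in `results` carries a `status` such as `pass` or `fail`.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DataQualityEvidence:
    """Hypothetical evidence record; not the toolkit's actual class."""
    total: int
    passed: int
    failed: int

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total if self.total else 0.0


def evidence_from_run_results(raw: str) -> DataQualityEvidence:
    """Summarize the per-test statuses found in a run_results.json payload."""
    results = json.loads(raw).get("results", [])
    statuses = [r.get("status") for r in results]
    passed = sum(s == "pass" for s in statuses)
    failed = sum(s in ("fail", "error") for s in statuses)
    return DataQualityEvidence(total=len(statuses), passed=passed, failed=failed)


def decide(evidence: DataQualityEvidence, min_pass_rate: float = 0.95) -> str:
    """Toy policy (an assumption): allow only when the dataset looks healthy."""
    if evidence.total == 0:
        return "deny"  # no evidence is treated as untrusted
    healthy = evidence.failed == 0 and evidence.pass_rate >= min_pass_rate
    return "allow" if healthy else "deny"


# Made-up payload with the same top-level shape as run_results.json.
sample = json.dumps({
    "results": [
        {"unique_id": "test.shop.not_null_orders_id", "status": "pass"},
        {"unique_id": "test.shop.unique_orders_id", "status": "pass"},
        {"unique_id": "test.shop.accepted_values_status", "status": "fail"},
    ]
})
evidence = evidence_from_run_results(sample)
decision = decide(evidence)
print(evidence, decision)
```

Keeping the evidence record separate from the decision function means the same summary could be handed to a different policy without re-reading the dbt artifact.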
A2A Project
a2aproject/A2A
Proposal Work
Data-Quality & Governance Proposals for A2A
  • Issue #1786: Proposed a data-quality scheme for the CTEF source_version registry, with conformance-style test vectors covering within-bound, boundary, and out-of-bound values.
  • Issue #1835: Proposed vendor-neutral documentation for governance and security expectations in agent-to-agent systems — covering trust representation, policy boundaries, and audit evidence.
  • Both proposals are awaiting maintainer response; follow-up planned as the protocol evolves.
AgentTrust
agentTrust · contributor program
Early Contributor
AgentTrust — Early Contributor Invitation
  • Invited as an early contributor to the AgentTrust project, recognized for work in AI governance, data quality-aware agent systems, and policy-readable evidence patterns.
  • Areas of alignment include agent trust metadata, policy-aware AI systems, governance evidence, RAG corpus binding concepts, and operational AI governance workflows.
  • Contribution direction covers how agent systems expose policy, security, quality, and reliability signals — and how trust metadata connects to drift and trustworthiness checks.
PROJECT · 01 NEW

DataTrust OS

  • Designed a governance-ready analytics engineering foundation using PostgreSQL, dbt, and Python, combining schema validation, data quality testing, and reconciliation checks into a single base layer for downstream reporting and governance tooling.
  • Built governance marts that expose dataset health scores, validation results, and anomaly flags as structured outputs — turning pipeline-level dbt test results into signals other systems can read.
  • Implemented anomaly detection and reconciliation logic comparing transformed records against source records, surfacing discrepancies before they reach a dashboard or stakeholder.
  • Established the data quality evidence layer that the AI Data Governance Platform (Layer 1 of the Governance Stack) is built on top of, connecting dbt test outcomes to governance-ready trust signals.
Outcome The foundational analytics engineering layer beneath the Governance Stack — the schema validation, data quality testing, and governance marts built here are what the AI Data Governance Platform reports on.
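The reconciliation check described above, comparing transformed records against source records, might look something like this in plain Python. It is a sketch under assumptions: the `id` key, the `amount_cents` field, and the sample rows are invented for illustration, and the project itself runs on PostgreSQL and dbt rather than in-memory lists.

```python
def reconcile(source, transformed, key="id", fields=("amount_cents",)):
    """Compare two record sets by key and report where they disagree."""
    src = {row[key]: row for row in source}
    dst = {row[key]: row for row in transformed}
    return {
        # Present in the source but dropped by the transformation.
        "missing": sorted(src.keys() - dst.keys()),
        # Present after transformation but with no source record.
        "unexpected": sorted(dst.keys() - src.keys()),
        # Present on both sides, but a compared field differs.
        "mismatched": sorted(
            k for k in src.keys() & dst.keys()
            if any(src[k][f] != dst[k][f] for f in fields)
        ),
    }


# Made-up rows; amounts are integer cents to avoid float comparison issues.
source_rows = [
    {"id": 1, "amount_cents": 1000},
    {"id": 2, "amount_cents": 2500},
    {"id": 3, "amount_cents": 400},
]
transformed_rows = [
    {"id": 1, "amount_cents": 1000},
    {"id": 2, "amount_cents": 2600},
    {"id": 4, "amount_cents": 900},
]
report = reconcile(source_rows, transformed_rows)
print(report)
```

An empty report on all three lists is the signal that the transformed data can move on to reporting.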
PROJECT · 02

Metric Decomposition Engine

  • Built an automated root-cause analysis system that answers the most common analyst question — "why did this metric change?" — by decomposing any event-based metric across multiple dimensions and ranking each segment's contribution to the total movement.
  • Engineered a dimensional drilldown pipeline on GitHub Archive data (1.3M+ events) using PostgreSQL and pandas, processing baseline vs. comparison periods across actor, repository, and organization dimensions in seconds.
  • Generated plain-English incident reports via a Jinja2 template engine: executive summary, dimensional breakdowns, root cause hypothesis, and recommended next steps — stakeholder-ready with no manual formatting.
  • Built an interactive Streamlit dashboard letting analysts select any event type, date range, and dimension set on demand, turning a multi-hour manual investigation into a self-serve workflow.
Outcome What used to take hours of manual slicing is now a self-serve analysis that runs in seconds and outputs a report ready to share with a PM or stakeholder.
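The core of the decomposition, ranking each segment by its contribution to the total movement, can be sketched for a simple event-count metric. The function name, the `repo` dimension, and the sample events are assumptions made for illustration; the project works on PostgreSQL and pandas, while this version uses only the standard library.

```python
from collections import Counter


def decompose(baseline, comparison, dimension):
    """Rank segments of one dimension by their share of the total change."""
    base = Counter(event[dimension] for event in baseline)
    comp = Counter(event[dimension] for event in comparison)
    total_delta = len(comparison) - len(baseline)
    rows = []
    for segment in base.keys() | comp.keys():
        delta = comp[segment] - base[segment]  # Counter returns 0 if absent
        share = delta / total_delta if total_delta else 0.0
        rows.append((segment, delta, share))
    # Largest absolute movement first; segment name breaks ties.
    rows.sort(key=lambda r: (-abs(r[1]), r[0]))
    return rows


# Made-up events: one dict per event, tagged with the repository it hit.
baseline_events = [{"repo": "a"}] * 3 + [{"repo": "b"}] * 2
comparison_events = [{"repo": "a"}] * 7 + [{"repo": "b"}] + [{"repo": "c"}]

ranking = decompose(baseline_events, comparison_events, "repo")
for segment, delta, share in ranking:
    print(f"{segment}: {delta:+d} events ({share:+.0%} of total change)")
```

Because segment deltas sum to the total change, the shares always add up to 100%, and a segment that moved against the trend shows up with a negative share.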
PROJECT · 03

Decision Intelligence Experimentation Platform

  • Architected an end-to-end A/B testing platform simulating 50K+ users and 280K+ events, enabling statistically rigorous evaluation of product feature variants across user segments.
  • Designed hypothesis testing pipelines (z-test, t-test, Mann-Whitney) surfacing a conversion-vs-revenue trade-off that directly informed a go/no-go product launch decision.
  • Built an interactive Streamlit dashboard for non-technical stakeholders, reducing analyst involvement in experiment readouts by ~60%.
  • Built a dbt-powered metrics layer transforming raw event data into experiment-ready datasets with conversion rate, lift, and AOV calculations.
Outcome Product and analytics teams can run structured experiments and connect statistical results to actual rollout decisions — not just look at charts and guess.
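Of the tests listed above, the z-test for a difference in conversion rates is the easiest to show end to end. The sketch below is a standard pooled two-proportion z-test written with the standard library only; the function name and the sample counts are made up for illustration and are not results from the platform.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up experiment: control converts 500 of 10,000, variant 560 of 10,000.
z, p_value = two_proportion_z_test(500, 10_000, 560, 10_000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

A conversion test like this says nothing about revenue per user, which is why a trade-off between the two needs a separate comparison (for example a t-test or Mann-Whitney test on order values) before a launch decision.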
PROJECT · 04

Retail Analytics & Inventory Intelligence

  • Applied Pareto analysis to identify the top 20% of SKUs driving 75% of revenue, enabling data-backed inventory reallocation and merchandising decisions.
  • Designed a retail analytics pipeline transforming 541K+ raw transactions into structured revenue and inventory models using a layered SQL + dbt architecture.
  • Engineered SQL aggregations and indexing strategies that reduced dashboard load times by 40%.
  • Automated recurring reporting dashboards saving 3–4 hours of manual effort per week for business stakeholders.
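The Pareto cut in the first bullet comes down to ranking SKUs by revenue and walking the cumulative share until a target is reached. A minimal sketch, assuming revenue has already been aggregated per SKU; the function name and the figures below are invented and are not the project's data.

```python
def pareto_skus(revenue_by_sku, revenue_share=0.75):
    """Return the fewest top-revenue SKUs that reach the target share."""
    total = sum(revenue_by_sku.values())
    ranked = sorted(revenue_by_sku.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for sku, revenue in ranked:
        if running >= revenue_share * total:
            break  # target already covered by the SKUs selected so far
        selected.append(sku)
        running += revenue
    return selected, running / total


# Made-up revenue per SKU.
revenue = {"A": 500, "B": 250, "C": 100, "D": 80, "E": 70}
top_skus, covered = pareto_skus(revenue)
print(top_skus, f"{covered:.0%} of revenue",
      f"{len(top_skus) / len(revenue):.0%} of SKUs")
```

Reporting the SKU share next to the revenue share makes the concentration explicit, which is the number that drives a reallocation decision.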
PROJECT · 05

Customer Churn Prediction & Explainability

  • Built an end-to-end churn prediction pipeline using XGBoost on 7,043 customers and 21 features, achieving ROC-AUC 0.84.
  • Applied SHAP explainability to identify the key drivers of churn — then translated those drivers into specific retention strategies a team could act on, not just a model score.
  • Engineered predictive features and applied resampling techniques to address class imbalance, improving recall by 18%.
Outcome A model that doesn't just predict churn accurately, but explains why it's happening in terms a retention team can actually use.
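For readers less familiar with the headline metric: ROC-AUC is the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it directly from that definition on made-up labels and scores. It is illustrative only, it is not the project's evaluation code, and the pairwise loop is quadratic, so a library implementation is the better choice on real data.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    # A correctly ordered pair counts 1, a tied pair counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = churned, 0 = retained.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.4]
auc = roc_auc(labels, scores)
print(f"ROC-AUC = {auc:.3f}")
```

Read this way, a score of 0.84 means the model ranks a churner above a non-churner in roughly 84 of 100 such pairs.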
Analytics & Querying
  • SQL
  • Python
  • BigQuery
  • Power BI
  • KPI Development
  • Pandas · NumPy
  • Window Functions · CTEs
  • Query Optimization
  • EDA
  • Data Storytelling
Product & Business Analytics
  • A/B Testing
  • Experiment Design
  • Funnel Analysis
  • Cohort Analysis
  • Retention Analysis
  • Hypothesis Testing
  • Lift Analysis
  • Statistical Inference
  • Tableau
  • Streamlit · Plotly
Data Eng. & Governance
  • dbt
  • PostgreSQL
  • Data Governance
  • Agent Governance · AGT
  • ETL Pipelines
  • Data Quality Monitoring
  • Star / Snowflake Schema
  • Data Validation
  • MySQL · BigQuery
  • Data Lineage
ML & Tools
  • Scikit-learn
  • XGBoost
  • SHAP
  • Feature Engineering
  • Model Evaluation
  • GCP
  • Docker · Git
  • Jira · Salesforce
  • Jupyter
Master of Science — Data Science
The University of Texas at Arlington
Jan 2025 – Dec 2026
GPA 3.8 / 4.0
Relevant Coursework Data Mining · Neural Network & Deep Learning · Convex Optimization · Cloud Computing & Big Data
Bachelor of Engineering — AI & Machine Learning
Savitribai Phule Pune University, India
Aug 2020 – May 2024
GPA 8.2 / 10
Relevant Coursework Artificial Intelligence · Machine Learning · Natural Language Processing · Database Management Systems