Somesh Zanwar — Data Analyst

What I focus on

Most analytics problems aren't about building dashboards. They're about whether the data behind the dashboard can be trusted, whether the metric actually answers the right question, and whether anyone will act on the output.

Beyond personal projects, I contribute to the protocols and toolkits that govern how AI agents interact with data, including four merged contributions to Microsoft's Agent Governance Toolkit and A2A/CTEF proposal work. The pattern is consistent: find the gap between what a system checks and what it should check, then build the layer that closes it.

01

Work Experience

Junior Data Analyst

IVS Software Solutions

Jun 2023 – Sep 2024

Pune, India

Analyzed 80K–150K records/cycle using SQL and Python to surface KPI trends, detect anomalies, and power stakeholder reporting, directly reducing decision latency for 3 internal business teams.
Conducted structured EDA that uncovered systematic data quality gaps, improving downstream reporting accuracy by 30% and establishing recurring data validation checkpoints.
Re-engineered SQL pipelines with optimized join logic, window functions, and incremental loads, cutting data preparation time by 40% and accelerating dashboard refresh from daily to near-real-time.
Built reusable, analysis-ready datasets supporting weekly and monthly reporting workflows, eliminating 4–6 hours of manual data wrangling per stakeholder per reporting cycle.

02

Featured Projects

// the gap: most systems check whether data is accessible. almost none check whether it should be trusted right now — or whether the agent requesting it is authorized to act on it.

The Governance Stack

Two connected projects, one unified question: can this agent use this data right now? The first governs the data; the second governs the agent.

Featured

Layer 1 · Data Trustworthiness

AI Data Governance Platform

Monitors 150K+ daily events across ingestion pipelines, enforces schema validation, detects anomalies in real time, and cuts incident response time by 25%. A Power BI dashboard with AI-generated explanations translates data failures into clear, actionable context — so teams know what broke, why, and who owns it.

PostgreSQL · dbt · Python · Power BI · LLM

Live Demo → GitHub →
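A minimal sketch of the two checks this card names: schema validation on incoming records and a simple anomaly flag on daily event volume. The schema, field names, and the z-score threshold of 3 are hypothetical choices for illustration; the platform's actual validation and detection logic is not shown here.

```python
from statistics import mean, stdev

# Hypothetical expected schema for an ingestion event: column -> required type.
EXPECTED_SCHEMA = {"event_id": str, "event_type": str, "ts": str, "value": float}


def schema_errors(record: dict) -> list[str]:
    """Return schema violations for one record; an empty list means it is valid."""
    errors = []
    for col, typ in EXPECTED_SCHEMA.items():
        if col not in record:
            errors.append(f"missing column: {col}")
        elif not isinstance(record[col], typ):
            errors.append(f"{col}: expected {typ.__name__}, got {type(record[col]).__name__}")
    # Columns that arrived but are not part of the contract.
    for col in sorted(record.keys() - EXPECTED_SCHEMA.keys()):
        errors.append(f"unexpected column: {col}")
    return errors


def is_volume_anomaly(history: list[int], today: int, z_threshold: float = 3.0) -> bool:
    """Flag today's count if it lies more than z_threshold std devs from the history mean."""
    if len(history) < 2:
        return False  # too little history to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # flat history: any change is unusual
    return abs(today - mu) / sigma > z_threshold


print(schema_errors({"event_id": "e1", "event_type": "click", "ts": "2024-01-01", "value": "3"}))
# ['value: expected float, got str']
print(is_volume_anomaly([150_000, 152_000, 149_000, 151_000], 90_000))
# True
```

A z-score on trailing volume is only one simple detector; seasonality-aware baselines would be the natural next step for real pipelines.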

Layer 2 · Agent Authorization

Data Quality-Aware Agent Governance

An agent might have permission to query a dataset. The policy engine says allowed. But the dataset failed three validation tests that morning. This prototype combines Microsoft's Agent Governance Toolkit with live data quality signals — blocking agent actions when data is untrustworthy, not only when the agent lacks authorization.

Microsoft AGT · Python · YAML Policy · pytest

Live Demo → GitHub →

Combined governance flow

Agent Request › Identity Check › AGT Policy › Data Quality Registry › Allow / Block › Unified Audit Log
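The combined flow can be sketched as a single gate that runs each check in order and logs every decision. This is a hypothetical illustration in plain Python: `AgentRequest`, `Gate`, and the in-memory `KNOWN_AGENTS`, `POLICY`, and `DQ_REGISTRY` tables are stand-ins invented for this example, not the Agent Governance Toolkit's API. It reproduces the Layer 2 scenario, where the policy allows the query but the dataset failed three validation tests.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical stand-ins for each stage of the flow; none of these are AGT APIs.
KNOWN_AGENTS = {"report-bot"}              # identity check
POLICY = {("report-bot", "query"): True}   # (agent, action) -> allowed?
DQ_REGISTRY = {                            # dataset -> latest quality result
    "orders": {"failed_tests": 3},
    "customers": {"failed_tests": 0},
}


@dataclass
class AgentRequest:
    agent_id: str
    action: str
    dataset: str


@dataclass
class Gate:
    audit_log: list = field(default_factory=list)

    def decide(self, req: AgentRequest) -> tuple[bool, str]:
        """Run identity, policy, and data quality checks in order; log every outcome."""
        if req.agent_id not in KNOWN_AGENTS:
            allowed, reason = False, "unknown agent identity"
        elif not POLICY.get((req.agent_id, req.action), False):
            allowed, reason = False, "policy denies action"
        else:
            dq = DQ_REGISTRY.get(req.dataset)
            if dq is None:
                allowed, reason = False, "no quality record for dataset"  # fail closed
            elif dq["failed_tests"] > 0:
                allowed, reason = False, f"{dq['failed_tests']} failed quality tests"
            else:
                allowed, reason = True, "authorized and data trusted"
        # Allows and blocks both go to one audit log.
        self.audit_log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent": req.agent_id,
            "action": req.action,
            "dataset": req.dataset,
            "allowed": allowed,
            "reason": reason,
        })
        return allowed, reason


gate = Gate()
print(gate.decide(AgentRequest("report-bot", "query", "orders")))
# (False, '3 failed quality tests')
```

Datasets with no quality record are blocked here (fail closed); whether to fail open or closed is a choice a real deployment should make explicitly.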

03

Open Source Contributions

Microsoft

microsoft/agent-governance-toolkit

4 PRs Merged

Agent Governance Toolkit Contributions

PR #1818 merged: Added Data Quality-Aware Agent Governance to the official AGT adopter registry.
PR #2038 merged: Verified the Agent SRE tutorial end-to-end and fixed documentation/API reference issues.
PR #2224 merged: Added a generic data-quality-aware governance example showing AGT policy checks combined with dataset trust signals.
PR #2278 merged: Added a dbt-backed DataQualityEvidence adapter showing dbt run_results.json → DataQualityEvidence → AGT policy decision flow.
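The first step of the run_results.json → evidence flow can be sketched as summarising dbt test outcomes into a trust record. The output dict shape below is hypothetical and is not the `DataQualityEvidence` type from the merged adapter; the sketch relies only on each result's `unique_id` and `status` fields. Mapping a test to the model it covers would additionally need dbt's manifest.json, which is omitted here.

```python
import json
from pathlib import Path


def summarise_dbt_tests(run_results: dict) -> dict:
    """Turn dbt test outcomes into a simple trust record.

    The returned dict shape is hypothetical, for illustration only; it is not
    the DataQualityEvidence type from the AGT example.
    """
    # Test nodes have unique_ids like "test.<project>.<test_name>...".
    tests = [r for r in run_results["results"] if r["unique_id"].startswith("test.")]
    # Treat "fail" and "error" as untrustworthy; "warn" is tolerated in this sketch.
    failing = [r["unique_id"] for r in tests if r["status"] in ("fail", "error")]
    return {
        "tests_total": len(tests),
        "tests_failed": len(failing),
        "failing_tests": failing,
        "trusted": bool(tests) and not failing,
    }


def load_run_results(path: str = "target/run_results.json") -> dict:
    """Read the artifact dbt writes to its target/ directory after a run."""
    return json.loads(Path(path).read_text())


sample = {"results": [
    {"unique_id": "test.shop.not_null_orders_id.a1", "status": "pass"},
    {"unique_id": "test.shop.unique_orders_id.b2", "status": "fail"},
    {"unique_id": "model.shop.orders", "status": "success"},
]}
print(summarise_dbt_tests(sample)["trusted"])  # False
```

A policy engine would then read the `trusted` flag (or the failing test list) alongside its authorization rules.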

A2A Project

a2aproject/A2A

Proposal Work

Data-Quality & Governance Proposals for A2A

Issue #1786: Proposed a data-quality scheme for the CTEF source_version registry, with conformance-style test vectors covering within-bound, boundary, and out-of-bound values.
Issue #1835: Proposed vendor-neutral documentation for governance and security expectations in agent-to-agent systems — covering trust representation, policy boundaries, and audit evidence.
Both proposals are awaiting a maintainer response; follow-up is planned as the protocol evolves.
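To illustrate what "conformance-style test vectors" means in general, here is a hypothetical sketch: a fixed table of inputs and expected verdicts that any implementation of a bound check must reproduce. The `Bound` type and the [0.0, 1.0] range are invented for this example and are not the scheme proposed in Issue #1786.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """Hypothetical inclusive range a quality value must fall in."""
    lower: float
    upper: float

    def accepts(self, value: float) -> bool:
        return self.lower <= value <= self.upper


# Illustrative vectors for a hypothetical [0.0, 1.0] score: (name, input, expected verdict).
TEST_VECTORS = [
    ("within-bound", 0.5, True),
    ("lower boundary", 0.0, True),
    ("upper boundary", 1.0, True),
    ("below lower bound", -0.01, False),
    ("above upper bound", 1.01, False),
]


def run_vectors(bound: Bound, vectors: list[tuple[str, float, bool]]) -> list[str]:
    """Return names of vectors where the implementation disagrees with the expected verdict."""
    return [name for name, value, expected in vectors if bound.accepts(value) != expected]


print(run_vectors(Bound(0.0, 1.0), TEST_VECTORS))  # []
```

The boundary vectors are the useful ones: an implementation that treats the range as exclusive passes the within- and out-of-bound cases but fails exactly the two boundary vectors.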

AgentTrust

agentTrust · contributor program

Early Contributor

AgentTrust — Early Contributor Invitation

Invited as an early contributor to the AgentTrust project, recognized for work in AI governance, data quality-aware agent systems, and policy-readable evidence patterns.
Areas of alignment include agent trust metadata, policy-aware AI systems, governance evidence, RAG corpus binding concepts, and operational AI governance workflows.
Contribution direction covers how agent systems expose policy, security, quality, and reliability signals — and how trust metadata connects to drift and trustworthiness checks.

04

Projects

PROJECT · 01 NEW

DataTrust OS

Designed a governance-ready analytics engineering foundation using PostgreSQL, dbt, and Python, combining schema validation, data quality testing, and reconciliation checks into a single base layer for downstream reporting and governance tooling.
Built governance marts that expose dataset health scores, validation results, and anomaly flags as structured outputs — turning pipeline-level dbt test results into signals other systems can read.
Implemented anomaly detection and reconciliation logic comparing transformed records against source records, surfacing discrepancies before they reach a dashboard or stakeholder.
Established the data quality evidence layer that the AI Data Governance Platform (Layer 1 of the Governance Stack) is built on top of, connecting dbt test outcomes to governance-ready trust signals.

Outcome: The foundational analytics engineering layer beneath the Governance Stack. The schema validation, data quality testing, and governance marts built here are what the AI Data Governance Platform reports on.
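A minimal sketch of source-to-target reconciliation and a health score, assuming records are keyed by a primary key with one numeric value each. The function names and the pass-rate scoring rule are hypothetical; DataTrust OS implements this in PostgreSQL and dbt, which is not reproduced here.

```python
def reconcile(source: dict[str, float], transformed: dict[str, float],
              tol: float = 1e-9) -> dict[str, list[str]]:
    """Compare transformed records against source records, keyed by primary key."""
    return {
        # In source but lost during transformation
        "missing": sorted(source.keys() - transformed.keys()),
        # In the transformed output with no matching source record
        "unexpected": sorted(transformed.keys() - source.keys()),
        # Present in both, but the value drifted beyond the tolerance
        "mismatched": sorted(k for k in source.keys() & transformed.keys()
                             if abs(source[k] - transformed[k]) > tol),
    }


def health_score(tests_passed: int, tests_total: int) -> float:
    """Hypothetical dataset health score: share of passing checks (0.0 when nothing ran)."""
    return tests_passed / tests_total if tests_total else 0.0


src = {"o1": 10.0, "o2": 20.0, "o3": 5.0}
tgt = {"o1": 10.0, "o2": 21.0, "o4": 7.0}
print(reconcile(src, tgt))
# {'missing': ['o3'], 'unexpected': ['o4'], 'mismatched': ['o2']}
```

In pandas the same three buckets fall out of an outer `merge` with `indicator=True`, followed by a value comparison on the rows marked `both`.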

PROJECT · 02

Metric Decomposition Engine

Built an automated root-cause analysis system that answers the most common analyst question — "why did this metric change?" — by decomposing any event-based metric across multiple dimensions and ranking each segment's contribution to the total movement.
Engineered a dimensional drilldown pipeline on GitHub Archive data (1.3M+ events) using PostgreSQL and pandas, processing baseline vs. comparison periods across actor, repository, and organization dimensions in seconds.
Generated plain-English incident reports via a Jinja2 template engine: executive summary, dimensional breakdowns, root cause hypothesis, and recommended next steps — stakeholder-ready with no manual formatting.
Built an interactive Streamlit dashboard letting analysts select any event type, date range, and dimension set on demand, turning a multi-hour manual investigation into a self-serve workflow.

Outcome: What used to take hours of manual slicing is now a self-serve analysis that runs in seconds and outputs a report ready to share with a PM or stakeholder.
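The core of the decomposition step can be sketched for an additive count metric: count events per segment in each period, take the difference, and rank segments by their contribution to the total change. This is a stdlib simplification with a hypothetical `decompose` function and invented sample data; the project itself runs on PostgreSQL and pandas.

```python
from collections import Counter


def decompose(baseline: list[dict], comparison: list[dict], dimension: str, top_n: int = 5):
    """Rank segments of `dimension` by their contribution to the change in event count."""
    base = Counter(e[dimension] for e in baseline)
    comp = Counter(e[dimension] for e in comparison)
    total_delta = sum(comp.values()) - sum(base.values())
    rows = []
    for seg in base.keys() | comp.keys():
        delta = comp[seg] - base[seg]  # Counter returns 0 for unseen segments
        share = delta / total_delta if total_delta else 0.0
        rows.append((seg, base[seg], comp[seg], delta, share))
    # Largest absolute movers first, regardless of direction
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return total_delta, rows[:top_n]


baseline = [{"repo": "a"}] * 10 + [{"repo": "b"}] * 5
comparison = [{"repo": "a"}] * 4 + [{"repo": "b"}] * 6 + [{"repo": "c"}] * 2
total, rows = decompose(baseline, comparison, "repo")
print(total)  # -3
for seg, before, after, delta, share in rows:
    print(f"{seg}: {before} -> {after} ({delta:+d}, {share:.0%} of total change)")
```

Shares can exceed 100% or turn negative when segments move in opposite directions; here repo `a` accounts for 200% of the drop because growth in `b` and `c` partly offsets it. Ratio metrics such as conversion rate need a separate rate-versus-mix split rather than this additive breakdown.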

PROJECT · 03

Decision Intelligence Experimentation Platform

Architected an end-to-end A/B testing platform simulating 50K+ users and 280K+ events, enabling statistically rigorous evaluation of product feature variants across user segments.
Designed hypothesis testing pipelines (z-test, t-test, Mann-Whitney) surfacing a conversion-vs-revenue trade-off that directly informed a go/no-go product launch decision.
Built an interactive Streamlit dashboard for non-technical stakeholders, reducing analyst involvement in experiment readouts by ~60%.
Built a dbt-powered metrics layer transforming raw event data into experiment-ready datasets with conversion rate, lift, and AOV calculations.

Outcome: Product and analytics teams can run structured experiments and tie statistical results to actual rollout decisions, instead of looking at charts and guessing.
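Of the tests listed above, the two-proportion z-test for conversion rate is the simplest to show. This sketch uses the pooled standard error and a two-sided p-value from the standard library; the function names and sample counts are hypothetical, and the revenue side of the trade-off (t-test, Mann-Whitney) is not covered here.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in conversion rates (pooled standard error)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # conversion rate under H0: p_a == p_b
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def relative_lift(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Relative change in conversion rate of variant B over control A."""
    return (conv_b / n_b) / (conv_a / n_a) - 1


# Hypothetical experiment: 4.0% vs 4.4% conversion on 25,000 users per arm.
z, p = two_proportion_ztest(1000, 25_000, 1100, 25_000)
print(f"z = {z:.2f}, p = {p:.3f}, lift = {relative_lift(1000, 25_000, 1100, 25_000):.1%}")
```

A significant conversion lift on its own does not settle a launch; the same readout should show the revenue metric, since the two can move in opposite directions.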

PROJECT · 04

Retail Analytics & Inventory Intelligence

Applied Pareto analysis to identify the top 20% of SKUs driving 75% of revenue, enabling data-backed inventory reallocation and merchandising decisions.
Designed a retail analytics pipeline transforming 541K+ raw transactions into structured revenue and inventory models using a layered SQL + dbt architecture.
Engineered SQL aggregations and indexing strategies reducing dashboard load times by 40%.
Automated recurring reporting dashboards, saving business stakeholders 3–4 hours of manual effort per week.
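The Pareto step can be sketched once transactions have been aggregated to revenue per SKU: rank SKUs by revenue, take the top 20%, and measure their share of the total. The function name and the ten-SKU sample below are invented for illustration.

```python
from math import ceil


def pareto_head(revenue_by_sku: dict[str, float], sku_share: float = 0.20) -> tuple[list[str], float]:
    """Return the top `sku_share` of SKUs by revenue and the fraction of total revenue they drive."""
    ranked = sorted(revenue_by_sku.items(), key=lambda kv: kv[1], reverse=True)
    k = max(1, ceil(len(ranked) * sku_share))  # at least one SKU in the head
    head = ranked[:k]
    total = sum(revenue_by_sku.values())
    return [sku for sku, _ in head], sum(rev for _, rev in head) / total


revenue = {"A": 500, "B": 250, "C": 80, "D": 50, "E": 40,
           "F": 30, "G": 20, "H": 15, "I": 10, "J": 5}
print(pareto_head(revenue))  # (['A', 'B'], 0.75)
```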

PROJECT · 05

Customer Churn Prediction & Explainability

Built an end-to-end churn prediction pipeline using XGBoost on 7,043 customers and 21 features, achieving a ROC-AUC of 0.84.
Applied SHAP explainability to identify the key drivers of churn — then translated those drivers into specific retention strategies a team could act on, not just a model score.
Engineered predictive features and applied resampling techniques to address class imbalance, improving recall by 18%.

Outcome: A model that doesn't just predict churn accurately but explains why it's happening, in terms a retention team can actually use.
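The bullets above do not say which resampling technique was used. Random oversampling of the minority class is one common option, sketched here with the standard library; the function name, the `churn` label key, and the toy rows are hypothetical.

```python
import random


def random_oversample(rows: list[dict], label_key: str = "churn", seed: int = 0) -> list[dict]:
    """Duplicate minority-class rows at random until both classes are the same size."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Sample with replacement from the minority class to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


rows = [{"churn": 1}] * 2 + [{"churn": 0}] * 8
balanced = random_oversample(rows)
print(sum(r["churn"] for r in balanced), len(balanced))  # 8 16
```

Resampling should be applied only to the training split, after the train/test split, so that duplicated rows never leak into the evaluation data behind metrics like recall or ROC-AUC.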

05

Skills

Analytics & Querying

SQL
Python
BigQuery
Power BI
KPI Development
Pandas · NumPy
Window Functions · CTEs
Query Optimization
EDA
Data Storytelling

Product & Business Analytics

A/B Testing
Experiment Design
Funnel Analysis
Cohort Analysis
Retention Analysis
Hypothesis Testing
Lift Analysis
Statistical Inference
Tableau
Streamlit · Plotly

Data Eng. & Governance

dbt
PostgreSQL
Data Governance
Agent Governance · AGT
ETL Pipelines
Data Quality Monitoring
Star / Snowflake Schema
Data Validation
MySQL · BigQuery
Data Lineage

ML & Tools

Scikit-learn
XGBoost
SHAP
Feature Engineering
Model Evaluation
GCP
Docker · Git
Jira · Salesforce
Jupyter

06

Education

Master of Science — Data Science

The University of Texas at Arlington

Jan 2025 – Dec 2026

GPA 3.8 / 4.0

Relevant Coursework: Data Mining · Neural Network & Deep Learning · Convex Optimization · Cloud Computing & Big Data

Bachelor of Engineering — AI & Machine Learning

Savitribai Phule Pune University, India

Aug 2020 – May 2024

GPA 8.2 / 10

Relevant Coursework: Artificial Intelligence · Machine Learning · Natural Language Processing · Database Management Systems

07

Contact

Email

someshzanwar345@gmail.com

Phone

(682) 403-0698

Arlington, TX

GitHub

github.com/SomeshZanwar

LinkedIn

linkedin.com/in/someshzanwar

Building analytics systems that actually get used