Chief Data Scientist · EkaCare
Building Sovereign AI
for Healthcare in India
PhD in Music Information Retrieval (Universitat Pompeu Fabra). B.Tech, IIT Kanpur. 20+ publications in top venues. Currently building purpose-driven LLMs at EkaCare — medical ASR, vision models, multilingual NLP, and evaluation frameworks.
Leading AI/ML at EkaCare to build sovereign, purpose-built intelligence for Indian healthcare. The core thesis: general-purpose LLMs are not enough — Indian healthcare needs models trained on its own language, its own clinical context, its own constraints.
Built the Parrotlet family, a suite of small, domain-specific models. Parrotlet-A (medical ASR, 5B parameters) achieves state-of-the-art results on Indian-accented medical speech in Hindi, Tamil, Bengali, and 15+ other languages. Parrotlet-V Lite (4B vision LLM) reads prescriptions, lab reports, and clinical documents with handwriting OCR and structured extraction. Parrotlet-E powers multilingual medical embeddings across 22 Indian languages, anchored by IndicMTEB, a benchmark we built and open-sourced for medical NLP evaluation in India.
Built KARMA (OpenMedEvalKit), an open-source evaluation framework for medical AI — because you can't improve what you can't measure, and most medical AI benchmarks don't reflect real Indian clinical conditions. Released four domain-specific evaluation datasets alongside it.
On the product side: EkaScribe (an ambient AI medical scribe that generates structured SOAP notes in real time from doctor-patient conversations), DocAssist (a clinical AI assistant with drug interaction alerts and voice documentation), and Document Understanding (automated parsing of lab reports, prescriptions, and insurance claims with SNOMED-CT/LOINC coding). These run at production scale across thousands of doctors on the EkaCare platform.
Built the data science function at Synaptic, a company selling AI-powered intelligence on private companies to institutional investors. The challenge: most of the interesting signal about private companies is unstructured, noisy, and scattered — employee reviews, job postings, news, regulatory filings, web footprints.
Built NLP pipelines to classify and extract signal from Glassdoor reviews at scale, covering sentiment, topics, and forward-looking indicators of company trajectory. Built a graph ML system to map competitive relationships across thousands of private companies, identifying clusters, tracking market structure shifts, and flagging emerging competitors before they become obvious.
Also worked on time-series signals as alternative economic indicators: NYC 311 noise complaints as a proxy for urban economic recovery post-COVID, and US electricity demand as a leading signal of industrial activity. The underlying idea throughout: any trace of human behavior at scale, if measured carefully, tells you something real about what's happening in the economy.