projects | Mayank Sharma

Benchmarks and datasets

CONVOLEARN: Fine-Tuning Dialogic AI Tutors

Dataset of 2,134 teacher-student dialogues labeled across six learning-sciences dimensions to train and evaluate dialogic tutoring behavior in LLMs.

Psychometric Analysis of MRBench V2

Applied CFA, IRT, G-theory, and measurement invariance testing to validate MRBench V2. Found that six of the eight dimensions form a coherent scale (CFI=0.998, Grel=0.974) but detected measurement non-equivalence across model sizes.

Decoding Actionability in Teacher Observation Feedback

Fine-tuned RoBERTa on 662 annotated feedback examples to classify actionability, then scaled to 12,000+ instances to identify linguistic patterns distinguishing actionable from vague feedback.

AI system evaluation

CantoTalk: Probing Teacher Expertise From Fine-Tuned Representations

Fine-tuned five LLMs on 7,518 Cantonese teacher utterances to classify talk moves (micro-F1=0.81). Probing the embeddings showed that teacher expertise is linearly separable, and clustering revealed three distinct discourse styles.

ClaimCLAIRE: Trust-Aware Multi-Component Fact-Checking

Built a fact-checking agent integrating component-aware decomposition, trust-modulated retrieval, and adaptive gap-filling. Achieved 84.27% accuracy on AVeriTeC by balancing evidence comprehensiveness with source reliability. Accepted for oral presentation at TrustNLP @ ACL 2026.

A Bigger Catch: Fine-Grained Curriculum Alignment on MathFish

Built a three-stage pipeline (hard negatives, cross-encoder reranking, ReAct agent) to predict which of 385 Common Core standards a math problem aligns to, achieving 31.3% exact match (6.5× baseline). Accepted at 21st BEA @ ACL 2026.

Applied psychometrics

SELOS: Social and Emotional Learning and Orientation Scale

Developed and validated an 8-item SEL scale in Hindi with 4,352 students. EFA and CFA revealed a two-factor structure with strong reliability and partial measurement invariance across gender.

CSEL: Measuring Teacher Beliefs About Classroom SEL

Developed a scale measuring teachers' beliefs about classroom SEL with 2,097 teachers. Factor analysis revealed three dimensions (management, culture, relationships) that predict teacher well-being and correlate with emotional intelligence.

DiPeCoS: Digital Pedagogy Competence Scale

Created an 8-item scenario-based assessment of teachers' digital pedagogy competence, validated with 1,315 teachers using IRT. Items show good discrimination and appropriate difficulty, forming a unidimensional construct grounded in UDL.

Real-world outcome evaluation

Game-Based Learning: Building Knowledge and SEL Competencies

Designed a course centered on 'Bury Me, My Love' and tested it with 201 adolescents in India and the UAE. Found significant increases in both migration knowledge (p<0.001) and empathy/compassion, with effects that differed by gender.

Equilibrium in Empathic Response Predicts IWAH

Used polynomial regression with response surface analysis on 634 Indian adolescents to show that equilibrium in empathic concern and personal distress predicts identification with all humanity, while directional disequilibrium (EC>PD) provides no additional benefit.

Global Citizenship Identity Mediates Knowledge, Skills, and Engagement

Ran a mediation analysis with 249 participants showing that global citizenship identification accounts for 70.7% of critical inquiry's effect on engagement, 39.9% of awareness's effect, and 33.6% of empathy's effect. Critical inquiry shows no direct effect on engagement apart from the path through identity.