Benchmarks and datasets
CONVOLEARN: Fine-Tuning Dialogic AI Tutors
Dataset of 2,134 teacher-student dialogues labeled across six learning-sciences dimensions to train and evaluate dialogic tutoring behavior in LLMs.
Psychometric Analysis of MRBench V2
Applied CFA, IRT, G-theory, and measurement invariance testing to validate MRBench V2. Found that six of eight dimensions form a coherent scale (CFI=0.998, Grel=0.974) but detected measurement non-equivalence across model sizes.
Decoding Actionability in Teacher Observation Feedback
Fine-tuned RoBERTa on 662 annotated feedback examples to classify actionability, then scaled to 12,000+ instances to identify linguistic patterns distinguishing actionable from vague feedback.
AI system evaluation
CantoTalk: Probing Teacher Expertise From Fine-Tuned Representations
Fine-tuned five LLMs on 7,518 Cantonese teacher utterances to classify talk moves (micro-F1=0.81). Probed the embeddings to show that teacher expertise is linearly separable, and clustered them to reveal three distinct discourse styles.
ClaimCLAIRE: Trust-Aware Multi-Component Fact-Checking
Built a fact-checking agent integrating component-aware decomposition, trust-modulated retrieval, and adaptive gap-filling. Achieved 84.27% accuracy on AVeriTeC by balancing evidence comprehensiveness with source reliability. Accepted for oral presentation at TrustNLP @ ACL 2026.
A Bigger Catch: Fine-Grained Curriculum Alignment on MathFish
Built a three-stage pipeline (hard negatives, cross-encoder reranking, ReAct agent) to predict which of 385 Common Core standards a math problem aligns to, achieving 31.3% exact match (6.5× baseline). Accepted at 21st BEA @ ACL 2026.
Applied psychometrics
SELOS: Social and Emotional Learning and Orientation Scale
Developed and validated an 8-item SEL scale in Hindi with 4,352 students. EFA and CFA revealed a two-factor structure with strong reliability and partial measurement invariance across gender.
CSEL: Measuring Teacher Beliefs About Classroom SEL
Developed a scale measuring teachers' beliefs about classroom SEL with 2,097 teachers. Factor analysis revealed three dimensions (management, culture, relationships) that predict teacher well-being and correlate with emotional intelligence.
DiPeCoS: Digital Pedagogy Competence Scale
Created an 8-item scenario-based assessment of teachers' digital pedagogy competence, validated with 1,315 teachers using IRT. Items show good discrimination and appropriate difficulty, forming a unidimensional construct grounded in UDL.
Real-world outcome evaluation
Game-Based Learning: Building Knowledge and SEL Competencies
Designed a course centered on 'Bury me, my Love' and tested it with 201 adolescents in India and the UAE. Found significant increases in both migration knowledge (p<0.001) and empathy/compassion, along with gender effects.
Equilibrium in Empathic Response Predicts IWAH
Used polynomial regression with response surface analysis on 634 Indian adolescents to show that equilibrium between empathic concern (EC) and personal distress (PD) predicts identification with all humanity (IWAH), while directional disequilibrium (EC>PD) provides no additional benefit.
Global Citizenship Identity Mediates Knowledge, Skills, and Engagement
Ran a mediation analysis with 249 participants showing that global citizenship identification accounts for 70.7% of critical inquiry's effect on engagement, 39.9% of awareness's effect, and 33.6% of empathy's effect. Critical inquiry has no direct effect without identity.
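
For readers unfamiliar with "accounts for X% of the effect": in a simple single-mediator linear model, the proportion mediated is commonly computed as the indirect effect divided by the total effect. The sketch below illustrates that ratio only. It assumes the product-of-coefficients approach, which the entry above does not confirm, and the function name and coefficient values are hypothetical, not taken from the study.

```python
def proportion_mediated(a: float, b: float, c_prime: float) -> float:
    """Share of the total effect carried by the mediator.

    a       : effect of the predictor on the mediator
    b       : effect of the mediator on the outcome (controlling for the predictor)
    c_prime : direct effect of the predictor on the outcome
    """
    indirect = a * b              # effect routed through the mediator
    total = indirect + c_prime    # total effect in a linear model
    if total == 0:
        raise ValueError("Total effect is zero; the proportion is undefined.")
    return indirect / total


# Hypothetical coefficients, chosen only to illustrate the arithmetic.
share = proportion_mediated(a=0.5, b=0.4, c_prime=0.3)
print(f"Proportion mediated: {share:.1%}")
```

The ratio is easiest to interpret when the indirect and direct effects have the same sign; when they differ, it can fall outside the 0 to 100% range.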