The integration of generative artificial intelligence into educational settings, particularly in writing instruction, presents a significant pedagogical challenge. Beyond simply providing feedback, educators must equip students with the critical skills to interpret, evaluate, and effectively use AI-generated suggestions. A recent study investigated the impact of two distinct metacognitive interventions on students’ writing quality and self-assessment accuracy within an AI-assisted writing framework.

The research, conducted with 120 undergraduate English majors, employed a 2×2 mixed factorial design to assess the effectiveness of a Feedback Literacy Script (FRAC) and an Assessment-Performance Calibration Activity (APCA). Participants were assigned to one of four conditions: regular AI use, FRAC only, APCA only, and a combined FRAC+APCA intervention. Over the course of four writing tasks, the study collected data on writing quality gains, self-assessment accuracy, overconfidence, effective feedback uptake, and the depth of revisions undertaken by students.

Key Findings Emerge from Interventions

The study’s findings reveal differentiated effects stemming from the two metacognitive interventions. The Feedback Literacy Script (FRAC) demonstrated a more pronounced direct impact on improving writing quality, fostering effective feedback uptake, and encouraging deeper revisions. This suggests that FRAC primarily enhances how students process and act upon AI-generated feedback. Conversely, the Assessment-Performance Calibration Activity (APCA) exhibited the most significant influence on self-assessment accuracy and the reduction of overconfidence. This indicates that calibration training more directly strengthens students’ internal judgment and their ability to accurately gauge their own performance.
Interestingly, the combined intervention, integrating both FRAC and APCA, yielded the highest overall gains in writing quality. Furthermore, this combined approach demonstrated the strongest retention of these improvements even after AI support was withdrawn. However, it is noteworthy that the combined intervention did not surpass APCA alone in its effect on self-assessment accuracy.

Divergent Pathways to Writing Improvement

The study’s discussion highlights a crucial distinction: improved writing and more accurate self-evaluation, while related, are distinct outcomes in the context of AI-assisted writing. The educational value derived from AI-assisted writing, the researchers argue, is less about the sheer abundance of feedback and more about the learner’s capacity to critically evaluate that feedback, calibrate their self-judgment, and ultimately transform external support into independent revision abilities.

Background and Context: The Rise of AI in Education

The proliferation of generative AI tools like ChatGPT and Claude has rapidly reshaped educational landscapes. In writing instruction, these tools offer unprecedented capabilities, generating multi-level suggestions on vocabulary, grammar, structure, argumentation, and style within seconds. This accessibility has the potential to augment traditional feedback mechanisms, which are often limited by time and resources. However, it simultaneously introduces new challenges. Students are now faced with a deluge of information, necessitating a higher level of cognitive engagement to discern the validity and utility of AI-generated advice.

Early research on AI-assisted writing has often focused on its capacity to improve linguistic accuracy, fluency, and writing efficiency, particularly in second language and English as a foreign language contexts. Studies have indicated that AI can indeed enhance these surface-level aspects of writing.
However, the impact on higher-order writing abilities, such as argument quality, knowledge transfer, and long-term retention, has shown more mixed results. A significant limitation in much of this prior research has been the tendency to view AI as a simple input condition (AI vs. non-AI) rather than examining the processes through which students engage with AI-generated feedback during revision. This oversight has left a gap in understanding the internal mechanisms that link feedback utilization, self-assessment calibration, revision behavior, and overall writing improvement.

The unique characteristics of AI feedback (its variable credibility, its sensitivity to prompts, and the sheer volume and speed of its delivery) demand that students develop a sophisticated approach to its use. Unlike traditional feedback, AI feedback is not inherently authoritative and requires active critical evaluation. This has led to the central argument of the present study: that metacognitive support is as vital as the AI feedback itself in optimizing learning outcomes.

The Interventions: Addressing Specific Metacognitive Skills

To address these pedagogical challenges, the study focused on two specific metacognitive interventions designed to target different facets of the feedback-revision cycle.

The Feedback Literacy Script (FRAC) was developed to enhance students’ ability to process external feedback. It aims to equip learners with the skills to critically evaluate, select, and apply AI-generated suggestions more effectively. This intervention aligns with research underscoring that productive engagement with AI feedback hinges on feedback literacy, that is, the capacity to interpret and use feedback meaningfully. FRAC guides students through a structured process of filtering feedback, reasoning about its validity and relevance, acting upon it through specific revisions, and checking the efficacy of those revisions.
The Assessment-Performance Calibration Activity (APCA), on the other hand, was designed to bolster students’ internal monitoring and self-evaluation skills. This intervention focuses on helping students develop more accurate self-assessments, mitigate overestimation or underestimation of their abilities, and improve the alignment between their self-perceptions and their actual performance. APCA uses a cycle of self-assessment, comparison with external evaluations, and subsequent adjustment of evaluative standards. This approach draws on research indicating that structured self-assessment can significantly enhance learning, and that AI-supported feedback can influence self-assessment accuracy in varied ways depending on a learner’s initial calibration.

Methodology: A Rigorous Experimental Design

The study’s methodology was designed to isolate and measure the effects of these interventions. A 2×2 mixed factorial design was employed, allowing for the examination of the independent and interactive effects of FRAC and APCA.

Participants: The study involved 120 undergraduate English majors, aged 19-22, who met specific eligibility criteria including advanced English proficiency, prior academic writing experience, and basic familiarity with AI tools. Stratified randomization was used to assign participants to the four groups, ensuring baseline comparability across key variables such as writing ability, English proficiency, and AI use experience.

Intervention Procedures:
FRAC Group: Participants received training on the FRAC script before the first writing task (T1) and before the second task (T2). They completed decision sheets detailing their feedback processing at T1 and T2.
APCA Group: Participants engaged in the APCA calibration cycle from T0 (baseline) through T3. They completed structured calibration logs for each task round, reflecting on their self-assessments and discrepancies.
Combined FRAC+APCA Group: Participants received both FRAC training and engaged in the APCA calibration cycle.
Regular AI Group: This control group received standard AI feedback without any additional metacognitive training.

Writing Tasks and Data Collection: The study spanned six weeks and included four distinct writing task points (T0 to T3). All writing tasks involved academic argumentative essays, with topics selected for comparable difficulty and relevance.
T0 (Baseline): Students wrote an essay without AI support to establish baseline performance.
T1 (AI Feedback and Revision): Students wrote an essay, received AI feedback, and revised their work.
T2 (Transfer Task): Students wrote an essay on a different topic, received AI feedback, and revised, assessing the transferability of intervention effects.
T3 (Retention Task): Students wrote an essay without AI support, assessing the retention of intervention effects.
Data collection included initial and final drafts, AI feedback logs, feedback decision sheets (for FRAC groups), and calibration logs (for APCA groups).

Measures:
Writing Quality: Assessed by two experienced English teachers using a 100-point analytic rubric, with high inter-rater reliability (ICC = 0.92). Writing quality gain was calculated as the difference between final and initial draft scores.
Self-Assessment Accuracy (SAA): Measured as the absolute difference between students’ predicted scores and teacher-rated scores. Signed error was also analyzed to distinguish overestimation from underestimation. Overconfidence was defined as a predicted score exceeding the actual score by more than five points.
Effective Adoption Rate (EAR): Assessed the proportion of AI feedback items that addressed a genuine problem, were implemented appropriately, and led to demonstrable improvement.
Revision Depth: Categorized revisions into four levels (L1: surface, L2: lexical/sentence, L3: paragraph structure, L4: content/argument).
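The score-based measures in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study’s own analysis code: the function names and the `FeedbackItem` schema are hypothetical, the sign convention for signed error (positive means overestimation) is inferred from the overconfidence definition, and the EAR denominator is assumed to be all coded feedback items.

```python
from dataclasses import dataclass
from typing import Sequence

# Threshold taken from the text: overconfidence means the predicted score
# exceeds the teacher-rated score by more than five points.
OVERCONFIDENCE_MARGIN = 5.0


@dataclass
class FeedbackItem:
    """One AI feedback item coded on the three EAR criteria (hypothetical schema)."""
    addresses_real_problem: bool
    implemented_appropriately: bool
    led_to_improvement: bool


def quality_gain(initial_score: float, final_score: float) -> float:
    """Writing quality gain: final draft score minus initial draft score."""
    return final_score - initial_score


def signed_error(predicted: float, rated: float) -> float:
    """Positive values mean overestimation (assumed sign convention)."""
    return predicted - rated


def absolute_error(predicted: float, rated: float) -> float:
    """Self-assessment accuracy as absolute error; lower is more accurate."""
    return abs(signed_error(predicted, rated))


def is_overconfident(predicted: float, rated: float,
                     margin: float = OVERCONFIDENCE_MARGIN) -> bool:
    """True only when the prediction exceeds the rating by MORE than the margin."""
    return signed_error(predicted, rated) > margin


def effective_adoption_rate(items: Sequence[FeedbackItem]) -> float:
    """Share of items meeting all three criteria (denominator is an assumption)."""
    if not items:
        return 0.0
    effective = sum(
        1 for item in items
        if item.addresses_real_problem
        and item.implemented_appropriately
        and item.led_to_improvement
    )
    return effective / len(items)


# Hypothetical values for a single student and task, for illustration only.
demo_items = [
    FeedbackItem(True, True, True),
    FeedbackItem(True, False, False),
    FeedbackItem(True, True, True),
    FeedbackItem(False, True, False),
]
print("gain:", quality_gain(68.0, 75.0))
print("absolute error:", absolute_error(80.0, 72.0))
print("overconfident:", is_overconfident(80.0, 72.0))
print("EAR:", effective_adoption_rate(demo_items))
```

Keeping signed and absolute error as separate functions mirrors the distinction drawn in the text: absolute error captures accuracy, while the sign shows whether a student over- or underestimated.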
Deep revision was defined as the proportion of L3 and L4 revisions.

Data Analysis: A multilevel mixed-effects framework was used to analyze writing quality gain, accounting for repeated measures and controlling for baseline differences. Analyses for self-assessment accuracy used baseline-adjusted models. Between-group comparisons and descriptive profile analyses were conducted for EAR and revision depth. Statistical significance was assessed with appropriate tests, and effect sizes (Cohen’s d) were reported for pairwise comparisons.

Results: A Closer Look at the Data

The results paint a nuanced picture of the impact of metacognitive interventions in AI-assisted writing.

Writing Quality Gains: Descriptively, the combined FRAC+APCA group showed the largest writing quality gains, followed by the FRAC-only group, the APCA-only group, and the regular AI group. This pattern held consistently across the three post-baseline tasks, both AI-supported (T1 and T2) and unsupported (T3), suggesting a robust benefit of the combined intervention for writing improvement.

Self-Assessment Accuracy: The APCA-only group demonstrated the most significant improvement in self-assessment accuracy, characterized by lower absolute error and a notably reduced overconfidence rate. While the combined group also showed gains, it did not surpass the APCA-only group on this metric. This suggests that direct calibration training is particularly effective in sharpening students’ self-judgment capabilities.

Effective Feedback Uptake: Students in the FRAC and combined groups exhibited significantly higher effective adoption rates of AI feedback compared to the regular AI and APCA-only groups. This indicates that FRAC helps students discern and act upon useful AI suggestions.

Revision Depth: A clear shift towards deeper revisions (L3 and L4) was observed in the FRAC and combined groups.
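As a rough illustration of how such a shift might be quantified, the sketch below computes each student’s deep-revision share (L3 and L4 revisions over all coded revisions) and a Cohen’s d for a pairwise group comparison. The pooled-standard-deviation form of d is an assumption, since the text does not say which variant was used, and all names and numbers here are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence

# Levels counted as "deep" follow the definition in the text (L3 and L4).
DEEP_LEVELS = {"L3", "L4"}


def deep_revision_share(levels: Sequence[str]) -> float:
    """Proportion of a student's coded revisions that are L3 or L4."""
    if not levels:
        return 0.0
    return sum(1 for level in levels if level in DEEP_LEVELS) / len(levels)


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d using the pooled sample standard deviation (assumed variant).

    Each group needs at least two values, and the pooled variance must be
    non-zero for the result to be defined.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical coded revisions for three students per group (not study data).
frac_students = [
    ["L1", "L3", "L4", "L2"],
    ["L3", "L4", "L4", "L1"],
    ["L2", "L3", "L1", "L4"],
]
regular_ai_students = [
    ["L1", "L1", "L2", "L3"],
    ["L1", "L2", "L2", "L1"],
    ["L1", "L2", "L4", "L1"],
]

frac_shares = [deep_revision_share(s) for s in frac_students]
regular_shares = [deep_revision_share(s) for s in regular_ai_students]
print("FRAC deep-revision shares:", frac_shares)
print("Regular AI deep-revision shares:", regular_shares)
print("Cohen's d (FRAC vs. regular AI):", cohens_d(frac_shares, regular_shares))
```

Computing the share per student first, and only then comparing groups, keeps the unit of analysis at the learner rather than at the individual revision.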
These students were more likely to engage in substantive changes related to paragraph structure, content, and argumentation, whereas the regular AI group predominantly focused on surface-level corrections.

Transfer and Retention: The benefits of FRAC and the combined intervention extended beyond the immediate AI-assisted revision tasks. The FRAC and combined groups maintained their performance advantage in a new writing task (T2) and demonstrated stronger retention of these gains when AI support was removed (T3). This finding underscores the potential for these interventions to foster more internalized and transferable revision strategies.

Discussion: Deconstructing the Mechanisms of Improvement

The study’s findings offer critical insights into how AI feedback can be leveraged for genuine learning. The superior performance of the FRAC group in writing quality, feedback uptake, and deep revision supports the notion that feedback literacy is paramount. Students equipped with FRAC were better able to navigate the AI feedback landscape, critically evaluating suggestions and integrating them meaningfully into their revisions. This moves beyond the simple consumption of AI output to active, informed engagement.

The distinct impact of APCA on self-assessment accuracy highlights the importance of metacognitive monitoring. By providing structured opportunities to compare self-judgments with external evaluations, APCA helped students develop a more realistic understanding of their writing strengths and weaknesses, thereby reducing overconfidence and improving the calibration of their internal standards.

The finding that the combined intervention achieved the highest writing quality gains, but not superior self-assessment accuracy to APCA alone, suggests a complementary rather than purely additive relationship between the two interventions.
FRAC seems to directly drive the revision process and writing improvement, while APCA refines the underlying judgmental accuracy that informs those revisions. The enhanced transfer and retention observed in the FRAC and combined groups suggest that these interventions foster more autonomous and adaptable writing skills, moving students away from passive reliance on AI.

Pedagogical Implications and Future Directions

From a pedagogical standpoint, the study strongly advocates for integrating metacognitive support into AI-assisted writing instruction. Simply providing AI feedback is insufficient. Educators must proactively guide students in developing the critical faculties to interpret, evaluate, and act upon this feedback. FRAC and APCA represent effective strategies for cultivating these essential skills. The findings suggest that AI writing tools should not be viewed as standalone solutions but as components within a broader instructional framework designed to foster self-regulated learning and critical thinking.

Future research could further explore the interplay of these interventions with other factors such as learner agency, trust in AI, and the development of broader critical AI literacy. Investigating these interventions in more diverse student populations and contexts would also be valuable.

Conclusion: Towards a More Effective AI-Assisted Writing Pedagogy

In conclusion, this research underscores that the efficacy of AI-assisted writing instruction hinges on empowering students with robust metacognitive skills. The study demonstrates that structured interventions like FRAC and APCA can significantly enhance students’ ability to critically engage with AI feedback, improve their writing quality, and foster more accurate self-assessment.
By shifting the focus from the mere provision of feedback to the cultivation of students’ internal capacities for evaluation and revision, educators can unlock the true potential of AI to support meaningful and lasting learning in writing. The findings provide a clear roadmap for educators seeking to optimize the use of AI tools, moving beyond passive consumption to active, critical, and ultimately more effective writing development.