How I Use AI to Grade Essays — A Teacher's Honest Step-by-Step Guide

Last January, a veteran teacher in my department watched me mark a set of essays and asked what I was doing differently. She'd noticed I was finishing stacks faster than I used to — not rushing, not skimping on feedback, just consistently done in less time.

I told her I'd been using AI to help with the feedback drafting layer. She looked at me the way experienced teachers look at things that sound like shortcuts. "Does it actually work?" she asked. "Or does it just produce the same generic comment forty times?"

It was exactly the right question. Because that's precisely what bad AI essay grading looks like — the same comment, slightly reworded, stamped on every paper. Technically done. Actually useless. The kind of feedback that tells a student nothing specific about their work and nothing about what to do differently.

What I've built over the past several months isn't that. It's a workflow where AI handles the language mechanics of feedback — the structural drafting, the consistent professional register, the time-consuming translation of my professional judgment into written words — while I supply the things AI can't: the observation of this specific student's work, the awareness of their history and growth, and the human recognition that tells a fifteen-year-old their teacher actually read what they wrote.

This is a step-by-step guide to that workflow. Not a list of tools. Not a theoretical framework. A practical, tested process that you can apply to your next stack of essays.

Before Anything Else: What AI Should and Shouldn't Do in Essay Grading

I want to establish this clearly before the first step, because the workflow only works if you hold this distinction throughout.

What AI should do in essay grading:

Draft the written feedback language from your professional judgment
Maintain consistent register and structure across a large stack
Flag potential issues for your review (ambiguous questions, marking inconsistencies)
Reduce the time cost of translating your assessment into written words

What AI should never do in essay grading:

Assign the grade or score — that judgment belongs to you
Decide what's strong or weak about a student's work — that requires reading the essay
Replace your reading of the student's actual writing
Generate feedback for students you haven't assessed yourself

The workflow I'm about to describe assumes you've read the essay. You've formed a professional view of its quality. You have a grade, a key strength, and one or two targets in your head. AI then helps you write those things up faster, and more consistently than you would manage unaided on your twentieth essay of the evening.

If you're looking for a tool that reads and grades essays autonomously — this isn't that guide, and I'd encourage scepticism about any tool that claims to do it reliably. The reading, the judgment, the professional assessment — those are yours. The drafting is where AI earns its place.

The Data Privacy Step — Do This Before Anything Else

Before I describe any specific tool or prompt, the data privacy practice has to come first. Student essays are personal data: in the US they are protected under FERPA, in the UK under UK GDPR and the Data Protection Act 2018, and in many other jurisdictions under comparable data protection laws.

The non-negotiable practice: Never enter a student's real name, identifying details, or any information that could identify them into an AI tool not covered by your district or school's data processing agreement. This applies regardless of how the tool is marketed, regardless of whether it claims FERPA or GDPR compliance, and regardless of time pressure.

My practice in every step below: I anonymise before pasting. Student names become Student A, Student B, or a generic placeholder. Class identifiers, dates, and any contextual details that could identify a student are removed before any content enters an AI tool. All identifying details are added only in my own gradebook or document system after generating the feedback.

This adds approximately thirty seconds per essay to the workflow. It is non-negotiable.
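You can keep the placeholder-and-key practice above in a few lines of Python. This is a minimal sketch. It assumes a class list that lives only on your own machine. The helper names `assign_placeholders` and `reidentify` are hypothetical, and the key they build must never be pasted into an AI tool.

```python
import re
from string import ascii_uppercase


def assign_placeholders(roster: list[str]) -> dict[str, str]:
    """Map each real name to a neutral placeholder (Student A, Student B, ...).

    Hypothetical helper. The returned key stays in your own files and never
    goes into an AI tool. This sketch handles at most 26 students.
    """
    if len(roster) > len(ascii_uppercase):
        raise ValueError("this sketch only handles up to 26 students")
    return {name: f"Student {letter}" for name, letter in zip(roster, ascii_uppercase)}


def reidentify(feedback: str, key: dict[str, str]) -> str:
    """Swap placeholders back to real names locally, after feedback is generated."""
    for name, placeholder in key.items():
        # Whole-word match, so "Student A" never matches inside "Student Advisor".
        feedback = re.sub(rf"\b{re.escape(placeholder)}\b", lambda _m: name, feedback)
    return feedback


# Illustrative names only.
key = assign_placeholders(["Maya", "Tom"])
print(reidentify("Student B, your thesis is clear and well placed.", key))
```

The key is the only link between a placeholder and a student. Store it wherever you already keep your gradebook.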

The Step-by-Step Workflow

Step 1 — Read the Essay and Form Your Own Judgment First (Non-Negotiable)

Before opening any AI tool, read the essay. All of it. With your rubric or mark scheme in front of you.

Make two notes on a sticky note or in your gradebook:

The score or grade you're assigning and why (one sentence — "strong thesis, evidence underdeveloped, mechanics solid — B+")
One genuine strength and one or two specific targets for this student

This step takes the same time it always did. AI doesn't reduce your reading time. It reduces your writing time. You need to have read and judged before you prompt.

Why this order matters: if you prompt before reading, you're asking AI to assess the essay rather than asking it to help you articulate your assessment. The output will reflect the tool's pattern-matching on the text rather than your professional judgment. That's how you end up with the same generic comment on every paper.

Time: same as always — your full reading and marking time per essay

Step 2 — Anonymise the Essay Excerpt (30 Seconds)

If you're pasting any part of the student's writing into the AI tool — which produces the best feedback because it grounds the AI's language in the actual text — remove the student's name and any identifying information first.

Replace name with "the student" or "Student A." Remove any personal details mentioned in the essay that could identify the writer. Remove class codes, assignment headers, or date stamps.
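Part of this scrub can be automated as a first pass before your manual check. This is a minimal sketch. It assumes you know the student's name and that dates and class codes follow simple patterns. The function `scrub_excerpt` and the `10X/2`-style class-code pattern are hypothetical, and you will need to adapt them to your school's formats. Pattern matching misses things, so read the result before pasting it.

```python
import re

# Hypothetical patterns: adapt them to the formats your school actually uses.
DATE_PATTERN = re.compile(r"\b\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}\b")  # e.g. 14/01/2025
CLASS_CODE_PATTERN = re.compile(r"\b\d{1,2}[A-Z]{1,2}/\d\b")        # e.g. 10X/2


def scrub_excerpt(text: str, names: list[str], placeholder: str = "Student A") -> str:
    """First-pass scrub: swap known names for a placeholder, strip dates and class codes.

    It cannot catch details it has no pattern for (a sibling's name, a home
    town, a distinctive event), so read the result before pasting it anywhere.
    """
    for name in names:
        # Whole-word, case-insensitive, so "Maya" is replaced but "Mayan" is not.
        text = re.sub(rf"\b{re.escape(name)}\b", placeholder, text, flags=re.IGNORECASE)
    text = DATE_PATTERN.sub("[date removed]", text)
    text = CLASS_CODE_PATTERN.sub("[class removed]", text)
    return text
```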

If you're not comfortable pasting student writing into the tool at all (a legitimate choice, especially in schools with strict data policies), you can describe the essay in your own words instead. The feedback quality is slightly lower, but still significantly better than the tired, twentieth-essay version you'd write unaided.

Time: 30 seconds

Step 3 — Build Your Prompt With Your Judgment Already in It

This is the step most teachers skip when they try AI essay grading and get generic results. The prompt must contain your professional judgment, not ask AI to form one.

Here is the prompt template I use:

"I am a teacher providing written feedback on a student essay. I have already read and assessed this essay. My assessment: [your grade/score]. Key strength I observed: [specific strength from your reading]. Target(s) for improvement: [your specific targets]. Here is an anonymised excerpt from the essay for context: [paste anonymised excerpt — 2–3 sentences is sufficient, not the whole essay]. Please draft written feedback of approximately 80–100 words that: communicates the key strength specifically and warmly, addresses the targets with actionable language, uses [subject/level] appropriate terminology, and sounds like a teacher who knows this student's work — not a generic assessment. Do not invent strengths or weaknesses beyond what I have specified."

The last instruction, "do not invent strengths or weaknesses beyond what I have specified," is the most important element. Without it, AI tools add plausible-sounding observations that may not reflect the actual essay. With it, the output stays anchored to your professional judgment, though you still need to check it in Step 4.
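If you keep your Step 1 notes in a spreadsheet or text file, you can fill in the template programmatically instead of retyping it. The sketch below is a minimal illustration and assumes your notes are already anonymised. `EssayJudgment` and `build_feedback_prompt` are hypothetical names. The code only assembles the prompt text; it does not call any AI service. It always ends with the "do not invent" instruction.

```python
from dataclasses import dataclass


@dataclass
class EssayJudgment:
    """Your Step 1 notes. Every field comes from your own reading, not the AI."""
    grade: str
    strength: str
    targets: list[str]
    excerpt: str        # 2-3 anonymised sentences, not the whole essay
    subject_level: str  # e.g. "Year 10 English"


def build_feedback_prompt(j: EssayJudgment) -> str:
    """Fill the Step 3 template with judgment you have already formed."""
    if not j.targets:
        raise ValueError("record at least one target before prompting")
    return (
        "I am a teacher providing written feedback on a student essay. "
        "I have already read and assessed this essay. "
        f"My assessment: {j.grade}. "
        f"Key strength I observed: {j.strength}. "
        f"Target(s) for improvement: {'; '.join(j.targets)}. "
        f"Here is an anonymised excerpt from the essay for context: {j.excerpt} "
        "Please draft written feedback of approximately 80-100 words that: "
        "communicates the key strength specifically and warmly, "
        "addresses the targets with actionable language, "
        f"uses {j.subject_level} appropriate terminology, "
        "and sounds like a teacher who knows this student's work, "
        "not a generic assessment. "
        "Do not invent strengths or weaknesses beyond what I have specified."
    )
```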

Time: 3–5 minutes to prompt and generate — compared to 10–20 minutes writing from scratch

Step 4 — Review the Output Against Three Criteria

Never paste AI output directly into a student's feedback without reading it. Review every draft against these three questions:

Is it accurate? Does every claim in the feedback reflect the essay you actually read — not a plausible-sounding generalisation?

Is it specific? Would a student reading this know exactly which part of their essay you're referring to? Generic feedback ("your analysis could be deeper") tells a student nothing they can act on. Specific feedback ("your analysis in paragraph two names the technique but doesn't explain its effect on the reader") gives them a next action.

Does it sound like you? If the feedback reads like it was produced by a machine — overly formal, slightly unnatural, missing your professional voice — edit it. The student is receiving this as your communication to them.
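A script cannot answer any of the three questions above; they need you and the essay. It can, though, flag a few mechanical problems before you start reading. This is a minimal sketch. It assumes the 80–100 word target from the Step 3 template. `precheck_draft` and the phrase list are hypothetical, and passing the check says nothing about accuracy.

```python
import re

# Hypothetical starter list: the first phrase is this guide's own example of
# generic feedback; add the stock phrases you find yourself deleting.
GENERIC_PHRASES = ("your analysis could be deeper", "keep up the good work")


def precheck_draft(draft: str, min_words: int = 80, max_words: int = 100,
                   generic_phrases: tuple[str, ...] = GENERIC_PHRASES) -> list[str]:
    """Flag mechanical problems in an AI draft before you review it.

    An empty list is NOT a pass: whether the draft is accurate, specific
    and in your voice still depends on your own reading of the essay.
    """
    warnings = []
    n_words = len(draft.split())
    if not min_words <= n_words <= max_words:
        warnings.append(f"{n_words} words, outside the {min_words}-{max_words} target")
    if re.search(r"\[[^\]]*\]", draft):
        warnings.append("an unfilled [placeholder] is still in the draft")
    lowered = draft.lower()
    for phrase in generic_phrases:
        if phrase in lowered:
            warnings.append(f"generic phrase: '{phrase}'")
    return warnings
```

For the science prompt's 60–80 word target, call it with `min_words=60, max_words=80`.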

This review takes two to four minutes. It is not optional.

Time: 2–4 minutes per essay

Step 5 — Add the Human Element AI Can't Produce

This is the step Harriet described in my UK grading review — the thirty-second addition that makes feedback feel like it came from a teacher who knows the student, not a system that processed the essay.

One sentence. Specific to this student. Something only you know.

It might be: noticing they've improved on a target you set last time, referencing something they said in class that connects to their writing, acknowledging a genuine risk they took in the essay, or simply naming something in their voice or approach that is distinctively theirs.

AI cannot write this sentence. It doesn't know your students. This sentence is why the feedback matters to the student beyond its functional information content. It's the sentence that tells them: you were seen.

Write it yourself. It takes thirty seconds when you've been reading and thinking about their work. It takes fifteen minutes when you're starting from a blank page at essay number twenty. This workflow creates the conditions where you have the clarity to write it.

Time: 30 seconds

Step 6 — Scale the Workflow to Your Stack

Once the individual essay workflow is established, here's how to scale it across a full stack efficiently.

Batch by similar performance level. Mark all your strong essays consecutively, then mid-range, then developing. Within a band, the prompt needs only small adjustments from one essay to the next, so you build a rhythm for each level.
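Once your Step 1 grades are recorded against placeholders, sorting the stack into bands takes seconds. This is a minimal sketch. It assumes letter grades and a hypothetical three-band split (strong, mid-range, developing). The `BANDS` mapping is an assumption; replace it with your own rubric's boundaries.

```python
from collections import defaultdict

# Hypothetical grade-to-band mapping: set this to match your own rubric.
BANDS = {
    "A+": "strong", "A": "strong", "A-": "strong",
    "B+": "mid-range", "B": "mid-range", "B-": "mid-range",
}
BAND_ORDER = ("strong", "mid-range", "developing")


def batch_by_band(grades: dict[str, str]) -> dict[str, list[str]]:
    """Group anonymised IDs by performance band, strongest band first.

    Any grade not listed in BANDS is treated as 'developing'.
    """
    batches = defaultdict(list)
    for student, grade in grades.items():
        batches[BANDS.get(grade, "developing")].append(student)
    return {band: batches[band] for band in BAND_ORDER if band in batches}
```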

Build a prompt library. After three or four weeks using this workflow, you'll have prompts that reliably produce strong output for your subject and level. Save them. The prompt refinement you do in week one pays off in every stack after it.

Use the checklist before every stack, not after. Anonymise, prompt, review — in that order, every time. The teachers who get into trouble with AI grading are the ones who skip steps when they're tired. Those are exactly the moments when skipping matters most.

For summative high-stakes assessments: Add one additional step. For any assessment that goes into a gradebook and will be seen by parents or used in progress reports, have a colleague spot-check five AI-assisted feedback comments against the rubric. The inter-rater reliability check I documented in my grading tools review applies here: one colleague, five essays and twenty minutes are usually enough to catch systematic problems you've missed.
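Choosing the five comments at random, rather than picking ones you already feel good about, keeps the check honest. This is a minimal sketch using Python's standard `random` module. The function name is hypothetical. The optional seed only makes the draw repeatable, so you can record which comments your colleague saw.

```python
import random
from typing import Optional


def pick_spot_checks(comment_ids: list[str], k: int = 5,
                     seed: Optional[int] = None) -> list[str]:
    """Draw up to k distinct comments at random for a colleague to check.

    Passing a seed makes the draw repeatable, so you can record which
    comments were sampled.
    """
    rng = random.Random(seed)
    return rng.sample(comment_ids, min(k, len(comment_ids)))
```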

Time: your existing reading time for the stack, plus approximately 5–8 minutes of AI-assisted feedback writing per essay (compared with 12–20 minutes unaided)

The Prompts by Subject — Tested Versions

These are the specific prompts I've tested and refined for the most common essay types. Copy, adapt, and make them yours.

English / Language Arts Essays

"I'm an English teacher providing feedback on a [grade level] essay on [topic/text]. My assessment: [grade]. Key strength: [your observation]. Target(s): [your targets]. Anonymised excerpt: [paste]. Draft feedback of 80–100 words that references the specific writing choices in the excerpt, uses the language of [your assessment framework — e.g. evidence, analysis, commentary / AO references for GCSE], and ends with one actionable next step. Do not invent observations not present in what I've described."

History Essays

"I'm a history teacher providing feedback on a [grade level] essay on [topic]. My assessment: [grade]. Strength observed: [your observation — e.g. strong use of evidence, clear argument structure]. Target: [your target — e.g. counter-argument underdeveloped, source evaluation missing]. Anonymised excerpt: [paste]. Draft feedback of 80–100 words that acknowledges the historical thinking skill demonstrated, identifies the gap in historical argument specifically, and gives one concrete action for improvement. Historian's vocabulary where appropriate."

Science Extended Writing

"I'm a science teacher providing feedback on a [grade level] extended writing response on [topic]. My assessment: [mark out of total]. The student [strength: e.g. correctly identified the variable / explained the process accurately]. Target: [e.g. the explanation of why doesn't connect cause and effect / units missing from conclusion]. Anonymised excerpt if relevant: [paste]. Draft feedback of 60–80 words that acknowledges the scientific accuracy demonstrated, identifies the specific gap in scientific reasoning, and gives one testable improvement action. Scientific vocabulary appropriate for [level]."

Creative Writing

"I'm an English teacher providing feedback on a [grade level] piece of creative writing. My assessment: [grade]. Voice/strength observed: [specific observation about this piece — e.g. strong sensory detail in opening, distinctive narrative voice]. Target: [e.g. dialogue feels unnatural, ending is rushed]. Anonymised excerpt: [paste]. Draft feedback of 80–100 words that honours the creative risk in the piece, names the specific strength with reference to its effect on the reader, and frames the target as a craft suggestion rather than a correction. Warm, writer-to-writer tone."

What This Workflow Doesn't Solve

Honest accounting requires naming what AI-assisted essay grading doesn't fix.

It doesn't reduce your reading time. The time saving is in feedback writing, not essay reading. If reading is your bottleneck, this workflow helps but doesn't transform the problem.

It doesn't work for every essay type. Highly technical assessments — maths proofs, code review, lab report analysis — require domain-specific judgment that the general feedback prompts above don't capture well. For these, the workflow needs subject-specific adaptation or may not be worth using.

It doesn't replace moderation. If you're a classroom teacher whose grades are moderated by a department head or external examiner, AI-assisted feedback doesn't change the moderation requirement. Your grades and your feedback standards still need to meet the moderation benchmark.

It doesn't eliminate the risk of AI inaccuracy. The review step in Step 4 exists because AI occasionally produces feedback that is plausible but inaccurate — a claim about the essay that doesn't match what's actually there. This happens less when your judgment is in the prompt. It still happens. The review step catches it. Skip the review step and you risk sending inaccurate feedback to students.

The Time Maths — What to Actually Expect

Based on my own tracking and Harriet's from the UK review:

Before AI-assisted workflow:

Reading + assessing per essay: 8–15 minutes (subject and level dependent — unchanged)
Writing feedback per essay: 12–20 minutes, with quality degrading toward the end of a large stack
Total per essay: 20–35 minutes

After AI-assisted workflow:

Reading + assessing per essay: 8–15 minutes (unchanged)
Anonymising + prompting + generating + reviewing + adding human element: 5–8 minutes
Total per essay: 13–23 minutes

On a 30-essay stack, that works out at approximately 3–4 hours saved (roughly 6–8 minutes per essay). On a 90-essay stack (Harriet's half term), it's approximately 9–12 hours.

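The saving is real. It's also smaller than the tools' marketing suggests. Reading time doesn't change, and once feedback writing speeds up, reading becomes the larger share of each essay's time. On the figures above, total time per essay falls by roughly a third: from 20 to 13 minutes at the low end, and from 35 to 23 at the high end. Anyone promising that AI will cut essay marking by 80% is counting only the feedback-writing time and ignoring the reading time. Don't be misled by that framing.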
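The arithmetic behind these figures is easy to rerun with your own tracked times. This is a minimal sketch. `hours_saved` and `percent_reduction` are hypothetical helpers, and the inputs below are this section's own figures, not new data.

```python
def hours_saved(minutes_saved_per_essay: float, essays: int) -> float:
    """Total hours saved across a stack."""
    return minutes_saved_per_essay * essays / 60


def percent_reduction(before_minutes: float, after_minutes: float) -> float:
    """Percentage cut in total per-essay time (reading plus feedback)."""
    return 100 * (before_minutes - after_minutes) / before_minutes


# 6-8 minutes saved per essay, on 30- and 90-essay stacks.
print(hours_saved(6, 30), hours_saved(8, 30))  # 3.0 4.0
print(hours_saved(6, 90), hours_saved(8, 90))  # 9.0 12.0
# Per-essay totals before and after (20 -> 13 and 35 -> 23 minutes).
print(round(percent_reduction(20, 13)), round(percent_reduction(35, 23)))  # 35 34
```

To check whether the workflow is paying off for you, swap in your own before-and-after minutes.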

Who This Workflow Is For

Teachers with high essay marking loads — English, humanities, social studies, extended science writing — who are currently spending significant time on the feedback-writing layer rather than the reading layer will see the most return.

Teachers whose feedback quality degrades across a large stack — who know their thirty-fifth comment is worse than their fifth — will find the consistency benefit as significant as the time saving. AI doesn't get tired. Your judgment informs it throughout.

Teachers new to essay assessment who are still developing their feedback language will find the prompts useful as a professional development tool — the feedback they generate teaches them what good written assessment sounds like, building their own practice alongside saving time.

Teachers in schools with strict data policies who cannot paste student work into external tools: the workflow still applies, using Step 2's description method rather than excerpt pasting. The feedback quality is somewhat lower, but the structure, consistency, and time saving remain significant.

Final Verdict

Using AI to grade essays works — specifically, it works for the feedback-writing layer of marking, when your own professional reading and judgment come first and the AI assists with drafting rather than assessing. The workflow is the thing. The tool without the workflow produces the generic comment forty times. The workflow with the tool produces specific, consistent, professionally expressed feedback at a pace that makes a full marking load sustainable.

My colleague asked if it actually works or if it just produces the same generic comment forty times. The honest answer: both outcomes are possible. The workflow in this guide produces the first. Skipping Steps 1, 3, or 5 produces the second.

Read the essay. Form the judgment. Anonymise. Prompt with your judgment already in it. Review. Add the human element. In that order. Every time.

The time you save goes back to your students in the ways that matter most — the energy you bring to the room, the attention you have for the student who needs something specific that day, the thirty seconds you spend telling a fifteen-year-old that you actually saw what they did in their writing.

That's what the marking time was always supposed to protect. This workflow makes it sustainable again.
