Guide to the Cambridge C2 Proficiency Writing Exam – Part 1: Essay

  • Posted on 19/04/2023
  • Categories: Blog

Are you preparing for the Cambridge C2 Proficiency (CPE) writing exam? If so, you may be feeling a little nervous and concerned about what lies ahead. Let us help put that fear and anxiety to bed and get started on how your academic writing can leave a positive impression on the examiner.

By the end of this blog post, you’ll know exactly what you need to do, how to prepare and how you can use your knowledge of other parts of the exam to help you.

Although you’ll find the advanced writing skills you’ve mastered at C1 will stand you in good stead for C2 writing, there are clear differences in the CPE exam format. As in Cambridge C1, there are two parts to the writing exam, and understanding what you need to do before you’ve even put pen to paper is incredibly important. So, let’s go!

What’s in Part 1?

First, let’s look at the format of Part 1:

  • Task: essay.
  • Word count: 240–280 words.
  • Register: formal.
  • Overview: a summary of two texts and an evaluation of the ideas.
  • Suggested structure: introduction, paragraph 1, paragraph 2, conclusion.
  • Time: 1 hour 30 minutes for Parts 1 and 2.

Before we look at an example task, let’s look at how your paper will be assessed. The examiner will mark your paper using four separate assessment scales:

  • Content – this demonstrates your ability to complete the task, including only relevant information.
  • Communicative achievement – this shows how appropriately you’ve completed the task: following its conventions, using the correct register and maintaining the reader’s attention throughout.
  • Organisation – the overall structure of your essay, the paragraphs and the sentences.
  • Language – your ability to use a wide range of C2 grammar and vocabulary in a fluent and accurate way.

How can I write a fantastic essay?

Let’s look at an example task:

[Image: example task – C2 Proficiency Writing Test, Part 1 Essay]

The key things you’re being asked to do here are summarise, evaluate and include your own ideas, using your own words as far as possible. So, in short, you have to paraphrase. As a Cambridge exams expert, you’ll know that this is a skill you already use throughout the exam.

In Reading and Use of English Part 4, the techniques you use for the key word transformations (active to passive, comparative structures, negative inversions, common word patterns, etc.) show that you already know how to say the same thing in other words.

Your ability to do word formation in Reading and Use of English Part 3 is useful here, as you look for verbs that you can change into nouns, and vice versa. This enables you to reword sentences without losing the original meaning.

You are already adept at identifying the correct options in Reading and Use of English Part 5 and Listening Parts 1 and 3, even when the words given differ from the information in the text or audio.

So, be aware of the skills you have already practised, and use them to your advantage!

How should I plan and structure my essay?

Before you even consider writing, read both texts thoroughly. Highlight the key points in each text and make notes on how you can express them in your own words. Look for contrasting opinions and think about how you can connect the ideas. These contrasting ideas will usually form the basis of paragraphs 2 and 3.

Although there are multiple ways you can organise your essay, here is a tried and tested structure:

Paragraph 1: Introduction

Paragraph 2: Idea 1 with support

Paragraph 3: Idea 2 with support

Paragraph 4: Conclusion

Introduction

Use your introduction as a way to present the general theme. Don’t give anything away in terms of your own opinion, but instead give an overview of what you will discuss. Imagine this as a global comment, talking about how society as a whole may feel about the topic.

Idea 1

Start with a strong sentence. Make your intentions clear, then back up your idea with a supporting sentence and elaborate on it. Use linkers to show the different stances on this idea, paraphrasing the key points you highlighted in the texts.

Idea 2

Follow the same structure as Idea 1, but focus on a different element from the two texts. Introduce it clearly, then provide further support for the idea. Keep emotional distance from the topic – save your opinion for the conclusion!

Conclusion

Here is your opportunity to introduce your personal opinion. There shouldn’t be anything new here other than how you personally feel about the topics discussed. Use your conclusion to refer back to the main points and round up how your opinion differs or is similar.

This is just one example of how you can structure your essay. However, we recommend trying different formats. The more you practise, the more feedback you’ll get from your teacher. Once you’ve settled on the structure that suits you, your planning will be a lot quicker and easier.

What can I do to prepare?

According to the Cambridge English website, ‘A C2 Proficiency qualification shows the world that you have mastered English to an exceptional level. It proves you can communicate with the fluency and sophistication of a highly competent English speaker.’

This means that being a proficient writer in your own language is not enough. So, what can you do to really convince the examiner that you truly are smarter than the average Joe?

Prepare! Prepare! Prepare!

✔ Read academic texts regularly.

✔ Pay attention to model essay answers and highlight things that stand out.

✔ Always try to upgrade your vocabulary. Challenge yourself to think of synonyms.

✔ Write frequently and study the feedback your teacher gives you.

✔ Study C2 grammar and include it in your writing.

What do I need to avoid?

  • Don’t overuse the same linkers. Practise using different ones – and not only in essays. You can write something much shorter and ask your teacher to check for correct usage.

  • Don’t constantly repeat the same sentence length and punctuation. Long sentences may seem the most sophisticated, but you should consider adding shorter ones from time to time. This adds variety and a dramatic effect. Try it!
  • Don’t be discouraged by your mistakes – learn from them! If you struggle with a grammar point, master it. If you spell something incorrectly, write it again and again.
  • Don’t limit your English studying time. Do as much as possible in English – watch TV, read, listen to podcasts, or meet with English speaking friends. English time should not only be reserved for the classroom.

What websites can help me?

The Official Cambridge English page, where you can find a link to sample papers.

BBC Learning English has a range of activities geared towards advanced level learners.

Flo-joe has very useful writing practice exercises that allow you to see other students’ writing.

Writing apps and tools like Grammarly can improve your writing style with their feedback and suggestions.

Don’t forget about our fantastic C2 blogs too!

Passing Cambridge C2 Proficiency: Part 3 Reading and Use of English

Passing C2 Proficiency: A Guide to Reading Part 5

Passing C2 Proficiency: A Guide to Reading Part 6

Guide to the Cambridge C2 Proficiency Listening Test

Guide to the Cambridge C2 Proficiency Speaking Test

Looking for further support?

If you’re interested in preparing for the C2 Proficiency exam but don’t know where to start, get in touch with us here at Oxford House today! We offer specific courses that are designed especially to help you get ready for the exam. Let our fully qualified teachers use their exam experience to guide you through your learning journey. Sign up now and receive your free mock test!

Glossary for Language Learners

Find the following words in the article and then write down any new ones you didn’t know.

lie ahead (pv): be in the future.

stand you in good stead (id): be of great use to you.

adept at (adj): have a good ability to do something.

thoroughly (adv): completely.

tried and tested (adj): used many times before and proved to be successful.

back up (pv): give support to.

round up (pv): summarise.

settle on (pv): choose after careful consideration.

average Joe (n): normal person.

discouraged (adj): having lost your enthusiasm or confidence.

pv = phrasal verb

id = idiom

adj = adjective

adv = adverb

n = noun


C2 – Your Writing Guide For The Proficiency Essay


Writing an essay in Part 1 of the C2 Proficiency exam can sometimes be a bit confusing. In Paper 2 (Writing) you will be assessed on your ability to produce coherent and cohesive texts.

QUICK INFO ABOUT PAPER 2 PART 1:

  • Time: 1 hour 30 mins
  • Number of exercises: 2
  • Part 1 (Essay) is mandatory
  • Part 2: you choose from 4 options
  • Assessment: Content, Communicative Achievement, Organisation and Language
  • Marks: 20 per part (40 in total)

In this post we will focus only on Part 1: writing the essay for Proficiency.

PART 1 – ESSAY

You will be asked to summarise the key points in two short texts and give your opinions on what is stated in both.

IT’S A COMPULSORY TASK – YOU MUST DO IT!

CONTENT THAT MUST BE INCLUDED:

You must make sure that you identify and summarise all the key points/opinions in the two texts (two for each text). Don’t forget you also need to give your own opinions on what is stated in the two texts. As the opinions given in the texts are closely related to each other, you will not need to use a lot of words to summarise them – try to do this briefly, while making sure you have not left out a key point. When you give your own opinions, you can agree or disagree with what is stated in the texts.

COMMUNICATIVE ACHIEVEMENT

Your essay should be suitably neutral or fairly formal in register but it does not have to be extremely formal. In it, you need to demonstrate that you have fully understood the main points, by summarising them in your own words, not copying large parts from the texts. The opinions that you give must be closely related to those main points so that your essay is both informative and makes clear sense as a whole.

ORGANISATION

Make sure that your essay flows well and logically and is divided appropriately into paragraphs. Make sure that there is a clear connection between your opinions and the content of the two texts, and that these features are linked using appropriate linking words and phrases, both between sentences and between paragraphs.

LANGUAGE

The language that you use needs to be accurate without being simple or basic. You need to demonstrate that you have a high level of English by using a range of grammatical structures and appropriate vocabulary correctly. Don’t use only simple words and structures throughout your answer. Try to think of ones that show a more advanced level, without making sentences too complicated for the reader to understand. It is advisable to check very carefully for accuracy when you have completed your answer, and make sure that everything you have written makes clear sense.

THINGS TO KEEP IN MIND

  • The content of your essay does not have to follow any particular order.
  • You can summarise the main points of the texts and then give your own opinions, or give your opinion on each point as you summarise it.
  • You can summarise the points in a different order from how they appear in the texts.
  • You must include your own opinions, but you can put them anywhere in the essay as long as they connect closely with the points made in the texts.

Download our C2 Essay guide and practise writing an essay for the Proficiency exam with the full sample and correction we have included.

If you need any extra practice for the writing part, don’t forget to check out our online course to pass the writing paper of the C2 Proficiency.

Writing Proficiency Exam Scoring

In general, keep in mind the following three things:

  • A well-organized essay has clarity both at the paragraph and essay level. Ideas flow logically through the essay and connections between ideas are made for the reader.
  • A well-developed essay has appropriate examples which support, amplify and clarify points made. Ideas are explored rather than repeated.
  • A well-expressed essay has not only sentence control and sentence variety but adequate control of grammar, punctuation, spelling and vocabulary.

Scoring Criteria

The exams are read holistically: the score is based on the total impression the essay conveys. Each paper is scored on four areas: comprehension, organization, development and expression.

Two faculty readers score your test on a scale from six (highest) to one (lowest). These scores are then combined. A total score of 8 or more reflects the two readers' agreement that the essay is passing. A score of 6 or less reflects the readers' decision that an essay does not pass. If the test has a pass-fail split (a 4 and a 3, for a total of 7), the exam is reviewed carefully by a third reader, and his or her decision determines the final passing or failing score.
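Because the pass/fail decision above is a small fixed rule (combine two 1–6 scores: 8 or more passes, 6 or less fails, and a 7 goes to a third reader), here is a minimal sketch of that logic in Python. The function name and parameters are illustrative assumptions, not an official scoring tool.

```python
def wpe_result(reader1: int, reader2: int, third_reader_passes: bool = False) -> str:
    """Pass/fail decision for two reader scores (each 1-6), per the rules above."""
    total = reader1 + reader2
    if total >= 8:
        return "pass"  # both readers agree the essay passes
    if total <= 6:
        return "fail"  # both readers agree the essay does not pass
    # A total of 7 is a pass-fail split (e.g. a 4 and a 3):
    # a third reader reviews the exam and their decision is final.
    return "pass" if third_reader_passes else "fail"

# Example: a 4 and a 3 trigger a third read, whose verdict decides the result.
print(wpe_result(4, 3, third_reader_passes=True))  # -> pass
```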

6—Exemplary Paper

  • Comprehension: Demonstrates a thorough understanding of the article in developing an insightful response.
  • Organization: Answers all parts of the question thoroughly; demonstrates strong essay and paragraph organization.
  • Development: Strongly develops the topic through specific and appropriate detail; logical, intelligent, and thoughtful; may be creative or imaginative.
  • Expression: Exhibits proficient sentence structure and usage but may have a few minor slips (e.g. an occasional misused or misspelled word, or comma fault); may show stylistic flair.

5—Proficient Paper

  • Comprehension: Demonstrates a sound understanding of the article in developing a well-reasoned response.
  • Organization: Displays effective paragraph and essay organization and answers all parts of the question.
  • Development: Skillfully and logically employs specific and appropriate details but may lack the level of insight or intelligence found in an exemplary paper.
  • Expression: Structures sentences effectively but may lack stylistic flair; keeps diction appropriate but may waver in tone; maintains sound grammar though may err occasionally.

4—Acceptable Paper

  • Comprehension: Demonstrates (sometimes by implication) a generally accurate understanding of the article in developing a sensible response.
  • Organization: Shows adequate paragraphing and essay organization but may give disproportionate attention to some parts of the question.
  • Development: Shows adequate logical development of the topic but may not be as fully developed as a superior essay or may respond in a way which is somewhat simplistic or repetitive.
  • Expression: Shows adequate command of sentence structure, using appropriate diction but may contain some minor problems in grammar, punctuation, or usage (problems which might annoy a reader but will not lead to confusion or misunderstanding).

3—Failing Paper

  • Comprehension: Demonstrates some understanding of the article but may misconstrue parts of it or make limited use of it in developing a weak response.
  • Organization: Does not address major aspects of the topic; presents a predominantly narrative response; is deficient in organization at the essay or paragraph level; lacks focus or wanders from the controlling idea.
  • Development: Consistently generalizes without adequate support; presents conclusions which do not logically follow from the premises or the evidence or consistently repeats rather than explores ideas.
  • Expression: Shows deficient sentence structure; uses a primer (grade school) style, or contains errors in mechanics (including spelling) which are serious or frequent enough to affect understanding.

2—Seriously Flawed Paper

  • Demonstrates poor understanding of the main points of the article, does not use the article appropriately in developing a response, or may not use the article at all.
  • Shows serious flaws in more than one important area of writing (organization, development, or expression).
  • Sentence-level error is so severe and pervasive that other strengths of the paper become obscured. Clarity may exist only at the sentence level.

1—Ineffectual Paper

  • Demonstrates little or no ability to understand the article or to use it in developing a response.
  • Shows virtually no ability to handle the topic.
  • Reveals inability to handle the basic elements of prose.

Related Content


Academic Preparation and Transitions

The Academic Preparation and Transitions Department plays an integral role in helping incoming first-year students prepare for a successful college experience through the Early Assessment Program and the Supportive Pathways for First-Year Students program. More information is available on the Academic Preparation and Transitions webpage.

Learning Support Programs

The Learning Support Programs department offers a comprehensive menu of programs and resources designed to help you navigate course expectations and achieve your learning goals: free tutoring for subjects across the curriculum, peer-led supplemental workshops and study sessions that support STEM-specific courses, and an online study strategies library. More information is available on the Learning Support Programs webpage.

Graduation Writing Requirement

All undergraduate students who are seeking a Cal Poly degree must fulfill the GWR before a diploma can be awarded. Students must have upper-division standing (90 units completed) before they can attempt to fulfill the requirement and should do so before the senior year. The two pathways to GWR completion are 1) through an approved upper-division course and 2) via the GWR Portfolio. More information is available on the GWR webpage.


Teacher Phill

Cambridge C2 Proficiency (CPE): How to Write a Report


  • Mandatory task: no
  • Word count: 280-320
  • Main characteristics: descriptive, comparative, analytical, impersonal, persuasive
  • Register: normally formal but depends on the task
  • Structure: introduction, main paragraphs, conclusion (sub-heading for each paragraph)

Introduction

A report is written for a specified audience. This may be a superior, for example, a boss at work, or members of a peer group, colleagues or fellow class members. The question identifies the subject of the report and specifies the areas to be covered. The content of a report is mainly factual and draws on the prompt material, but there will be scope for candidates to make use of their own ideas and experience. (Source: Cambridge Assessment English, C2 Proficiency Handbook for Teachers)

Reports in Cambridge C2 Proficiency are, unlike essays, not mandatory in the writing test. Instead of a report, candidates might opt to write an article, a review or a letter.

Reports are very schematic

While some of the writing tasks in C2 Proficiency are quite free and open in terms of their paragraph structure and layout, reports follow a pretty rigid form, which makes them fairly easy to write. With their sub-headings for each section and similar requirements in every task, candidates get a good grasp of report writing quite quickly.

This article shows you exactly how you can navigate those waters and how to score high marks with ease, so let’s get into it.

What a typical report task looks like

A report task in C2 Proficiency is usually very specific about the topic, the detailed points you need to cover in your text and the target reader you are writing for.


The three things I’ve mentioned above should always be the first things to find out when you analyse a writing task:

  • the topic of the task
  • the detailed points you have to include
  • the target reader

The topic of this specific task is a jobs fair for young people. A closer look reveals that we need to describe the event in general terms as well as two or three promotions in more detail. Thirdly, we evaluate whether and to what extent a fair like this can open young people's minds to career opportunities.

Last but not least, we are writing the report for our college website, meaning that teachers, students and parents are going to read it. Therefore, the style of language doesn't need to be super formal, but I would also not write in an informal style. Neutral seems to be the right choice, so contractions (I'm, don't, etc.) as well as some phrasal verbs are fine, but colloquial expressions that we would only use in spoken English are taboo.

How to organise your report

The paragraph structure of a report in C2 Proficiency is fairly straightforward. I would simply start with a title and an introduction that states what the report is about (in this case, we could describe the fair in the introduction), then continue with the sections that address the main points and finish with a conclusion.

  • Title & introduction
  • Main sections
  • Conclusion

For most reports, this structure works very well. Depending on the task, we mostly use two or three main sections and you are free to choose whatever organisational form you think makes the most sense.

Plan your report before you start writing

I can't stress this point enough: I would always note down a short plan to make sure my ideas are already saved somewhere before I start writing. This reassures you whenever you don't know how to continue and it can save you a lot of time.

The best way to go about making your plan is to decide on the paragraph structure you want to use for a particular task and then to add a few short ideas of what you definitely want to include in each section. For our example task, I came up with this:

  • Title & introduction: jobs fair event summary; 10-12 July; 53 organisations; show career opportunities; specialists with valuable information
  • First special organisation: English teaching jobs abroad; information about the jobs; people were happy
  • Second special organisation: NGO from Peru; literacy and English classes; accommodation and food included; people had done programme before
  • Conclusion: everyone happy; event broadens young people's horizons; visitors surprised by variety of options; definitely recommend it

I’ve decided on four paragraphs as there are basically four things we need to do: introduce the event, talk about two different organisations at the fair and comment on how it opens young people’s minds.

This whole process took less than five minutes, but I know that once I start writing, I have a roadmap prepared that can help me whenever I need it. I won’t have to waste time rearranging my ideas or changing the paragraph structure because it’s all done already.


The different parts of a report

Once you have a plan, you can get started with the actual writing process. Thanks to all the information you’ve already noted down, it should all be smooth sailing, but there are, of course, several things to take note of when writing a report and we are going to look at them paragraph by paragraph.

A report in Cambridge C2 Proficiency typically has a title, which can be descriptive and doesn’t need to be anything special. For example, for our example task from earlier, we could choose a basic title like “Jobs Fair”. That’s it.

The introduction or first paragraph of your text, however, is a little bit more important. Here, you want to show what the report is going to talk about. This can be quite explicit, but you can also give a more subtle description of the subject matter.

Jobs Fair

Event summary

From 10-12 July, a jobs fair with 53 organisations from all over Europe was held at the college to display career opportunities for the students. The different booths were manned expertly by specialists so as to give the best information possible and to show what the future might hold for graduates of the school.

First of all, there is the title plus the first subheading (Event summary). Subheadings are an essential part of the typical layout of a report, so make sure that each section gets one.

Secondly, I state what the fair was about and why the different organisations came to the event, i.e. to show graduates a variety of future job opportunities, so the reader knows what to expect from the report.

On top of that, I immediately tried to use rather impersonal language, another common feature of reports, which includes passive verb forms and generalisations (It is said that; Many people said; in general; etc.) as well as the avoidance of personal pronouns like I or we.

With a good introduction under our belt, we can now get into the nitty gritty of the report. The task requires us to point out a few of the organisations present at the fair and explain in a little bit more detail what they were offering.

Obviously, you need to get a little bit creative and come up with some ideas, but as the report is for the college website, I thought it would be nice to talk about some options that are connected to the English language.

Promotions to highlight

While the event as a whole went remarkably well, two stands were mentioned by many to be of particular relevance. One provided insights about job opportunities abroad as an English speaker, which includes teaching the language as well as tutoring children and teenagers. There was a wide variety of employment options presented together with the expected salary, working conditions and other things to consider before taking the leap, which quite a lot of visitors commented on in a very positive way.

The second organisation that was indicated to me fairly often was an NGO from Peru which runs literacy campaigns and English language courses in rural areas around the country. They were looking to attract young people to their volunteer programme that involves teaching reading and writing to primary school children from disadvantaged backgrounds. Accommodation and food are both included so participants only need to cover the cost for their flights to Peru and back. The people working at the stand had all done it themselves so the visitors at the fair were given first-hand accounts of what working for the NGO is like.

As the two paragraphs belong to the same section of the report, only one subheading is necessary, but apart from that, I really focussed on the basics of good report writing.

I described in detail what the two organisations have to offer using some appropriate vocabulary (remarkably well; particular relevance; a wide variety of employment options; salary; working conditions; NGO; literacy campaigns; disadvantaged backgrounds; first-hand accounts) as well as more impersonal and general language (mentioned by many; visitors commented on it; was indicated to me).

The conclusion is the part where you finish your report and make recommendations or suggestions based on the information provided in the previous sections. Here, you can give your personal opinion to round off the text. Again, don't forget the subheading and use appropriate language.

Benefits of such job fairs

After listening to other people's thoughts on this kind of event I'm of the opinion that job fairs like this can truly broaden the horizons of young people who might not have formed a clear idea of what they want to do after school yet. Several of my friends mentioned that they simply had not been aware of the multitude of options available to them and that they would absolutely recommend it to everyone who needs some inspiration for their future.

I refer back to the previous sections (After listening to …), then give my opinion (… I’m of the opinion …) and make a recommendation (… they would absolutely recommend it …). It is that simple.

It is not that difficult to write a report in C2 Proficiency if you know how to plan the text and what language you should include. Obviously, at this level you should be able to play with the language and adapt each report to the topic of the task, but I hope that this article has helped you get a better idea of what goes into writing this kind of text.

If you want to practise with me, I offer writing feedback as well as private classes and I would love to hear from you soon.

Lots of love,

Teacher Phill 🙂


Exploring Students’ Generative AI-Assisted Writing Processes: Perceptions and Experiences from Native and Nonnative English Speakers

  • Original research
  • Open access
  • Published: 30 May 2024


  • Chaoran Wang, ORCID: orcid.org/0000-0002-4140-2757


Generative artificial intelligence (AI) can create sophisticated textual and multimodal content readily available to students. Writing-intensive courses and disciplines that use writing as a major form of assessment are significantly impacted by advancements in generative AI, as the technology has the potential to revolutionize how students write and how they perceive writing as a fundamental literacy skill. However, educators are still at the beginning stage of understanding students' integration of generative AI in their actual writing process. This study addresses the urgent need to uncover how students engage with ChatGPT throughout different components of their writing processes and their perceptions of the opportunities and challenges of generative AI. Adopting a phenomenological research design, the study explored the writing practices of six students, including both native and nonnative English speakers, in a first-year writing class at a higher education institution in the US. Thematic analysis of students' written products, self-reflections, and interviews suggests that students utilized ChatGPT for brainstorming and organizing ideas as well as assisting with both global (e.g., argument, structure, coherence) and local issues of writing (e.g., syntax, diction, grammar), while they also had various ethical and practical concerns about the use of ChatGPT. The study brought to the fore two dilemmas encountered by students in their generative AI-assisted writing: (1) the challenging balance between incorporating AI to enhance writing and maintaining their authentic voice, and (2) the dilemma of weighing the potential loss of learning experiences against the emergence of new learning opportunities accompanying AI integration. These dilemmas highlight the need to rethink learning in an increasingly AI-mediated educational context, emphasizing the importance of fostering students' critical AI literacy to promote their authorial voice and learning in AI-human collaboration.


1 Introduction

The rapid development of large language models such as ChatGPT and AI-powered writing tools has led to a blend of apprehension, anxiety, curiosity, and optimism among educators (Warner, 2022). While some are optimistic about the opportunities that generative AI brings to classrooms, various concerns arise, especially in terms of academic dishonesty and the biases inherent in these AI tools (Glaser, 2023). Writing classes and disciplines that use writing as a major form of assessment, in particular, are significantly impacted. Generative AI has the potential to transform how students approach writing tasks and demonstrate learning through writing, thus impacting how they view writing as an essential literacy skill. Educators are concerned that, when used improperly, increasingly AI-mediated literacy practices may "AI-nize" students' writing and thinking.

Despite the heated discussion among educators, there remains a notable gap in empirical research on the application of generative AI in writing classrooms (Yan, 2023) and minimal research that systematically examines students' integration of AI in their writing processes (Barrot, 2023a). Writing, an activity often undertaken outside the classroom walls, eludes comprehensive observation by educators, leaving a gap in instructors' understanding of students' AI-assisted writing practices. Furthermore, the widespread institutional skepticism and critical discourse surrounding the use of generative AI in academic writing may deter students from openly sharing their genuine opinions of and experiences with AI-assisted writing. These situations can cause a disconnect between students' real-life practices and instructors' understandings. Thus, there is a critical need for in-depth investigation into students' decision-making processes involved in their generative AI-assisted writing.

To fill this research gap, the current study explores the nuanced ways students utilize ChatGPT, a generative AI tool, to support their academic writing in a college-level composition class in the US. Specifically, the study adopts a phenomenological design to examine how college students use ChatGPT throughout the various components of their writing processes such as brainstorming, revising, and editing. Using sensemaking theory as the theoretical lens, the study also analyzes students' perceived benefits, challenges, and considerations regarding AI-assisted academic writing. As writing is also a linguistic activity, this study includes both native and non-native speaking writers, since they may have distinct needs and perspectives on the support and challenges AI provides for writing.

2 Literature Review

2.1 AI-Assisted Writing

Researchers have long been studying the utilization of AI technologies to support writing and language learning (Schulze, 2008). Three major technological innovations have revolutionized writing: (1) word processors, which represented the first major shift from manual to digital writing, replacing traditional typewriters and manual editing processes; (2) the Internet, which introduced web-based platforms, largely promoting the communication and interactivity of writing; and (3) natural language processing (NLP) and artificial intelligence, bringing about tools capable of real-time feedback and content and thinking assistance (Kruse et al., 2023). These technologies have changed writing from a traditionally manual and individual activity into a highly digital one, radically transforming writing processes, writers' behaviors, and the teaching of writing. This evolution reflects a broader shift towards a technologically sophisticated approach to writing instruction.

AI technologies have been used in writing instruction in various ways, ranging from assisting in the writing process to evaluating written works. One prominent application is automatic written evaluation (AWE), which comprises two main elements: a scoring engine producing automatic scores and a feedback engine delivering automated written corrective feedback (AWCF) (Koltovskaia, 2020). Adopting NLP to analyze language features, diagnose errors, and evaluate essays, AWE was first implemented in high-stakes testing and later adopted in writing classrooms (Link et al., 2022). Scholars have reported contrasting findings regarding the impact of AWE on student writing (Koltovskaia, 2020). Barrot (2023b) finds that tools offering AWCF, such as Grammarly, improve students' overall writing accuracy and metalinguistic awareness, as AWCF allows students to engage in self-directed learning about writing via personalized feedback. Such systems can thus contribute to classroom instruction by reducing the burden on teachers and aiding students in writing, revision, and self-learning (Almusharraf & Alotaibi, 2023). However, scholars have also raised concerns regarding AWE's accuracy and its potential misrepresentation of the social nature of writing (Shi & Aryadoust, 2023). Another AI application that has been used to assist student writing is the intelligent tutoring system (ITS). Research shows that ITSs can enhance students' vocabulary and grammar development, offer immediate sentence- and paragraph-level suggestions, and provide insights into students' writing behaviors (Jeon, 2021; Pandarova et al., 2019). Scholars have also investigated chatbots as writing partners for scaffolding students' argumentative writing (Guo et al., 2022; Lin & Chang, 2020) and the incorporation of Google's neural machine translation system in second language (L2) writing (Cancino & Panes, 2021; Tsai, 2019).

Research suggests that adopting AI in literacy and language education has advantages such as supporting personalized learning experiences, providing differentiated and immediate feedback (Huang et al., 2022; Bahari, 2021), and reducing students' cognitive barriers (Gayed et al., 2022). Researchers also note challenges such as the varied level of technological readiness among teachers and students as well as concerns regarding accuracy, biases, accountability, transparency, and ethics (e.g., Kohnke et al., 2023; Memarian & Doleck, 2023; Ranalli, 2021).

2.2 Integrating Generative AI into Writing

With sophisticated and multilingual language generation capabilities, the latest advancements in generative AI and large language models, such as ChatGPT, unlock new possibilities and challenges. Scholars have discussed how generative AI can be used in writing classrooms. Tseng and Warschauer (2023) point out that ChatGPT and AI writing tools may rob language learners of essential learning experiences; however, if these tools are banned, students will also lose essential opportunities to learn how to use AI to support their learning and their future work. They suggest that educators should not try to "beat" but rather "join" and "partner with" AI (p. 1). Barrot (2023a) and Su et al. (2023) both review ChatGPT's benefits and challenges for writing, pointing out that ChatGPT can offer a wide range of context-specific writing assistance such as idea generation, outlining, content improvement, organization, editing, proofreading, and post-writing reflection. Similar to Tseng and Warschauer (2023), Barrot (2023a) is also concerned about students' learning loss due to their use of generative AI in writing and their over-reliance on AI. Moreover, Su et al. (2023) specifically raise concerns about the issues of authorship and plagiarism, as well as ChatGPT's shortcomings in logical reasoning and information accuracy.

Among the existing empirical research, studies have explored the quality of generative AI's feedback on student essays in comparison to human feedback. Steiss et al. (2024) analyzed 400 feedback instances, half generated by human raters and half by ChatGPT, on the same essays. The findings showed that human raters provided higher-quality feedback in terms of clarity, accuracy, supportive tone, and emphasis on critical aspects for improvement. In contrast, AI feedback shone in delivering criteria-based evaluations. The study generated important implications for balancing the strengths and limitations of ChatGPT and human feedback for assessing student essays. Other research has also examined the role of generative AI tools in L1 multimodal writing instruction (Tan et al., 2024), L1 student writers' perceptions of ChatGPT as a writing partner and AI ethics in college composition classes (Vetter et al., 2024), and the collaborative experience of writing instructors and students in integrating generative AI into writing (Bedington et al., 2024).

Specifically with regard to classroom-based research in L2 writing, Yan (2023) examined the use of ChatGPT through the design of a one-week L2 writing practicum at a Chinese university. Analyzing eight students' classroom behaviors, learning logs, and interviews, the study showed that the use of generative AI helped L2 learners write with fewer grammatical errors and more lexical diversity. The study also found that the students' biggest concerns were the threat to academic honesty and educational equity. This study is a pioneer in exploring students' strategies and engagement with ChatGPT in writing; however, it was only conducted through a one-week practicum which did not involve authentic writing assignment tasks. Furthermore, students' use of ChatGPT was limited to editing AI-generated texts instead of incorporating AI in a wider range of writing activities such as pre-writing and revising human-generated texts. In another study, by Han et al. (2023), the authors designed a platform that integrated ChatGPT to support L2 writers in improving writing quality in South Korea. Analyzing 213 students' interaction data with the platform, survey results, as well as a focus group interview with six students and one instructor, the study found that the students generally held positive experiences with ChatGPT in supporting their academic writing. Although the study undertook a more extensive investigation involving a larger pool of participants with authentic writing assignments, it only explored generative AI's role as a revision tool without examining its use across various stages of writing. Furthermore, participants in this study were tasked with engaging with a ChatGPT-embedded platform using predefined prompts designed by the researchers. Consequently, how students interact with ChatGPT in natural settings remains largely unknown to researchers and educators.

2.3 Writing Process

Since the early 1980s, scholars have proposed various writing process models (e.g., Abdel Latif, 2021; Flower & Hayes, 1981; Hayes, 2012; Kellogg, 1996), yet they are still working towards a complete understanding of composing processes. Despite the distinct aspects that different models highlight, they all reject the view of writing as a linear, sequential act of mere text generation and instead emphasize the non-linear and recursive nature of the writing process. Abdel Latif (2021) noted that various components of the writing process such as ideational planning, searching for content, and revising interact with each other, and that both novice and experienced writers employ all of the components but with varying degrees and strategies. For instance, skilled writers refine and revise their ideas during writing, whereas novice writers mostly engage in sentence-level changes such as fixing grammatical and lexical issues (e.g., Khuder & Harwood, 2015). For L2 students, writing can be very complex and cognitively daunting (Mohsen, 2021) for reasons including but not limited to linguistic barriers (Johnson, 2017). Furthermore, writing is more than a cognitive process; it is also a social, cultural, and situated activity. For instance, the concept of plagiarism may carry different meanings and consequences across cultural contexts. Thus, writing should be investigated in consideration of its dynamic interplay with institutional, cultural, and technological factors (Atkinson, 2003).

Considering the intricate nature of writing as a cognitive and social activity, it is thus important to investigate how generative AI may impact the different components of students' writing processes. However, there is still a substantial gap in knowledge and research about students' real-world integration of AI into their writing workflows, their decision-making processes, and the rationale behind their decisions as they interact with generative AI and utilize the technology in their writing in formal educational settings. While previous studies shed light on the impacts of generative AI on English writing, empirical classroom-based research remains limited. To further understand how students, both L1 and L2 writers, engage with generative AI in real-life classroom contexts, with authentic writing tasks, and throughout their various processes of writing, the current study took a naturalistic, exploratory approach that focused on how college students utilized ChatGPT in a first-year writing class in the US. Understanding and unpacking students' AI-assisted writing processes could help educators better adjust their pedagogy in the face of growing AI influences. The following research questions guided the present study:

1. How do students utilize ChatGPT in their writing processes?
2. How do student writers perceive the benefits of integrating ChatGPT into their writing?
3. What concerns and limitations do students experience when using ChatGPT to assist with their writing?
4. What considerations do students identify as important when engaging in generative AI-assisted writing?

3 Theoretical Framework

This study adopts sensemaking theory as its theoretical lens. Sensemaking has been conceptualized as the process through which individuals make meaning from ambiguous and puzzling situations in their experience (Golob, 2018). Some scholars view sensemaking as a cognitive process of managing and processing information; this perspective focuses on the cognitive strategies employed in connecting and utilizing information to explain the world (Klein et al., 2006). Alternatively, a socio-cultural orientation towards sensemaking regards it as the construction of collective identity through an individual's ongoing interactions with the educational context (Weick, 2005). Poquet (2024) integrates these two theoretical orientations, proposing that sensemaking encompasses both the individual and the collective, drawing attention to how learners explain the cognitive aspects of their learning as well as how social and cultural factors shape their learning experiences.

According to Poquet (2024), there are three components of the sensemaking process: (1) An individual's understanding of the activity, available tools, and the situation is the antecedent of sensemaking. (2) Noticing and perceiving constitute the process of sensemaking per se. Noticing involves the identification of salient features of the tool(s) for the activity, while perceiving goes beyond noticing by making sense of what is observed, taking into account contextual factors such as learner characteristics and the type of activity undertaken. Perceiving leads to the formulation of meaning and potential implications of what is noticed, playing a critical role in decision-making and action. (3) Outcomes of sensemaking may range from perceived affordances of tools for the activity to causal explanations for the observed phenomena. As defined by Poquet (2024), sensemaking involves learners crafting explanations for unclear situations by dynamically connecting information within the context of a specific activity. Essentially, sensemaking is both an intentional and intuitive process shaped by how learners understand their environment and their role within it.

Because sensemaking theories aim to examine people's meaning-making, acting, and experience in "unknown," "less deliberate," and "more intuitive" situations (Poquet, 2024, p. 5), the framework aligns well with the purpose of this study, which is to form an emergent understanding of a little-known situation given the relatively new phenomenon of generative AI-assisted writing practices among college students. Adopting a sensemaking lens helps to understand how students make sense of generative AI, how they perceive its affordances, what strategies they develop to use it to assist with their writing, what puzzling experiences they may have, and how they make decisions in those puzzling situations. The dual focus on the cognitive and the social is critical when examining how students engage with and perceive the AI technology and how they negotiate these perceptions and experiences within the learning communities of higher education. Sensemaking theory can also capture the range of individual experiences and shared interpretations among students, elucidating how they deal with uncertainty and make judgments about generative AI usage.

4 Research Design

This qualitative study adopted a phenomenological research design, which focuses on understanding and interpreting a particular aspect of shared human experience (Moran, 2002; Smith, 1996). Phenomenology seeks to form a close and clear account of people's perceptions and lived experiences as opposed to delivering a positivist conclusion of human encounters, as "pure experience is never accessible" (Smith et al., 2009, p. 33). In the present study, as there is limited understanding of students' engagement with ChatGPT in their writing process, a phenomenological lens could help capture participants' own sensemaking of their AI-assisted writing experiences.

4.1 Context and Participants

The study took place in spring 2023 at a higher education institution in the US. I chose to focus on first-year writing as the study setting, as it is a required course in most colleges and universities, thus a typical writing and learning context for most college students. First-year writing serves as the foundation for cultivating academic writing skills, with the aim of developing students’ essential literacy and writing proficiency needed for their undergraduate learning experiences. The 14-week course focused on English academic reading, writing, and critical thinking and consisted of three major units.

This study focused on the last unit, which was about argumentative writing, a common type of academic essay writing (American Psychological Association, 2020). The final essay asked students to form an argumentative response to a research question of their own choice. The unit, lasting for three weeks, was structured as follows (see Fig. 1): During the first week, the instructor spent two classes, each 75 minutes long, introducing ChatGPT (GPT 3.5) as a large language model and inviting students to explore ChatGPT as a tool for writing. The instructor carefully chose and assigned five readings that allowed the students to grasp the ongoing academic and public debates and concerns regarding the use of ChatGPT in writing and educational settings. During the class sessions, students participated in various activities exploring the functionalities of ChatGPT, discussed ethics and academic integrity, and critiqued AI-generated writing. As part of the discussions on ethics, the instructor explicitly addressed academic integrity issues drawing upon both the writing program's guidelines and the institution's academic integrity policies to ensure that the students were aware of and committed to ethical use of generative AI in the writing class. During the second week, students learned various strategies for integrating sources in academic writing and practiced ways of using sources to build arguments. During the last week, students spent time peer reviewing each other's work and met with the instructor individually to improve their drafts.

Fig. 1: Unit design with key topics and learning tasks over the three weeks

The final essay allowed but did not mandate students to use ChatGPT. For those who used ChatGPT and AI writing tools, disclosure and transparency about how AI was used were required as part of the assignment submission. The instructor left the use of AI in the final essay as an open option, ensuring that students could pursue the option that worked best for their individual needs. Thus, the unit provided various opportunities and flexibility for planning, researching, drafting, reviewing, and editing with ChatGPT throughout students' writing process.

There were 11 students, all freshmen, enrolled in the class. All but one reported using ChatGPT in their writing. Six students were recruited based on their willingness to participate and the diversity of their first languages, to ensure balanced coverage. Table 1 shows the demographic information of the students (with pseudonyms).

4.2 Data Collection

Aligned with an interpretive phenomenological design that focuses on exploring participants' lived experiences and how they construct meaning of their own experiences (Smith & Shinebourne, 2012), I collected three major types of data in order to uncover the students' writing processes involving ChatGPT and their perceptions. First, I collected students' written products and artifacts such as in-class writing, screenshots of students' conversations with ChatGPT, informal short writing assignments, and the formal writing assignments for the final argumentative essay. Second, I collected students' written self-reflections about their use of ChatGPT in writing. Finally, the participants were interviewed for around 30–40 min, and all interviews were audio-recorded. These semi-structured interviews were developed around students' former experiences with ChatGPT, their views of the tool, and the ways they utilized ChatGPT for their writing assignments in this class.

Students' conversational screenshots with ChatGPT and their in-class and outside-class writing drafts could demonstrate their interactions with AI as well as the changes they made upon contemplating the responses from the chatbot. The interviews and students' self-reflections could further shed light on their perceptions and decision-making. Multiple sources of data helped to understand students' behaviors, perceptions, and engagement with AI during different stages of writing. Triangulation of the data also helped me to understand students' rationale for and practices of integrating, discounting, and reflecting on the chatbot's output in their writing.

It is important to note that a phenomenological qualitative research design like this aims to provide in-depth understanding and insights into participants' experiences. The context of the study, a first-year writing class, and the specific type of assignment investigated are both common scenarios in college classrooms, thereby enhancing the study's relevance despite its limited sample size and scale. Furthermore, the incorporation of data collected from multiple and diverse sources for triangulation adds insight into participants' experiences, which helps strengthen the credibility of the study.

4.3 Data Analysis

Thematic analysis (Creswell, 2017) was used to analyze the written drafts and transcriptions of interview data as it is commonly used in qualitative studies to identify patterns across various types of data (Lapadat, 2012). While transcribing all the interview data verbatim into written scripts, I took notes with the research questions in mind. Then I organized and read through the various types of written data to get familiar with and form a holistic impression of participants' perceptions and experiences of AI-assisted writing. The coding, conducted in NVivo, a qualitative data analysis software package, followed an inductive and iterative process. During the first cycle of coding, I reviewed the data line-by-line and applied in vivo coding to generate initial, descriptive codes using participants' voices (Saldaña, 2016). For the second cycle, I identified patterns across the in vivo codes and synthesized them into 20 pattern codes (Saldaña, 2016). During the third cycle, I clustered and grouped the pattern codes into four emerging themes. To finalize and refine the themes, I double-checked themes, codes, and the supporting data guided by the research questions. Table 2 shows the themes and pattern codes. To ensure the trustworthiness of the qualitative analysis, I also conducted a peer debriefing (Lincoln & Guba, 1985) on the codebook with an experienced qualitative researcher. Furthermore, member checking was also conducted with each participant via email to minimize possible misinterpretations of their perceptions and experiences.

5 Findings

5.1 How Do Students Utilize ChatGPT in Their Writing Processes?

The students reported using ChatGPT throughout different components of writing their argumentative essays including (1) brainstorming, (2) outlining, (3) revising, and (4) editing.

In terms of brainstorming, the students acknowledged the value of ChatGPT in helping them get initial ideas and inspirations prior to the research phase for their essays. For instance, Lydia was interested in writing about the cause of the low fertility rate in South Korea but she “had trouble thinking of any focus areas” (Lydia, Reflection). In order to narrow down the topic and find a good focus, she used ChatGPT for exploring possible directions she could pursue. As she noted:

It immediately gave me suggestions to approach the cause from demographic changes, economic factors, traditional gender roles, governmental policies, and cultural attitudes with detailed explanations beside each suggestion. So, I went on to pick economic reasons, which I think were the most accessible to write about. (Lydia, Reflection)

ChatGPT’s feedback facilitated a smoother decision-making process for Lydia regarding the specific topic to further investigate. Another student Kevin mentioned that running his initial research idea into ChatGPT was helpful because ChatGPT gave him “some relevant ideas that hadn’t crossed his mind when thinking about the topic” (Kevin, Written Assignment).

Considering ChatGPT's suggestions did not mean that the students just took them for granted and incorporated them unquestioningly. For instance, Nora was interested in writing about the impact of AI on human lives. Upon putting her initial research question into ChatGPT, she found the feedback helpful and decided to do more research on the aspects highlighted by ChatGPT (see Fig. 2).

Fig. 2: Screenshot of Nora's conversation with ChatGPT

Students also reported using ChatGPT for outlining. Emma used ChatGPT extensively to help organize her outline and shared her procedure as follows:

I wrote my own outline first consisting of my own ideas and then put it into ChatGPT. I asked ChatGPT to make the outline flow better. I was surprised with the results it gave me. It made the ideas more concise and connected better off of each other...I tried it a few times, and every time it gave me a different version of the outline that I could potentially use. I ultimately compared the information from my sources and chose an outline I thought best suited my essay and my essay question. (Emma, Reflection)

Emma's approach revolved around using ChatGPT to unearth linkages among the various initial yet disorganized ideas she already had. By experimenting with diverse ways to build coherence and connection among her thoughts with the aid of AI, she bypassed the mental task of structuring her ideas from scratch.

Using ChatGPT for refining the flow of ideas was also a strategy adopted by other students, but not always during the outlining stage. For instance, after completing her first draft, Lydia “copied and pasted her entire essay into the chatbox and asked for suggestions on how to improve the structure and argument” (Lydia, Reflection). Lydia underlined that her revision process with ChatGPT was iterative, as she put her revised version back into the chatbot and went through another round of feedback and subsequent revision. Additional applications reported by students also encompassed employing ChatGPT to reduce redundancy and enhance conciseness of content (Emma) as well as to refine topic sentences for accurately summarizing the main ideas of body paragraphs (Kevin).

Apart from utilizing ChatGPT to assist with global-level issues such as structure, argument, and coherence, the students also harnessed the AI tool for sentence-level issues. They unanimously agreed that ChatGPT was a valuable tool for language editing. Alex, an L1 student, commented that ChatGPT could edit essays "exceptionally well." Alex not only used the AI tool to help improve the syntax of his writing, such as "run-on sentences," but also consulted it as his dictionary for "providing academic diction" (Alex, Interview). The L2 participants particularly acknowledged ChatGPT as beneficial for enhancing the accuracy of their writing. Lydia shared that upon completing a paragraph of her essay, she would put it into ChatGPT and ask it to "revise the wording and grammar only" so she could refine her language and keep the content original (Lydia, Reflection). Another L2 student, Nora, noted that "when I struggle with expressing my thoughts accurately in English words, ChatGPT can help me express those ideas in a more powerful and accurate way. It removes communication barriers" (Nora, Written Assignment).

5.2 How Do Student Writers Perceive the Benefits of Integrating ChatGPT into Their Writing?

Utilizing ChatGPT in their various writing process components, the students reported that ChatGPT had the following benefits: (1) accelerating their writing process, (2) easing their cognitive load, (3) fostering new learning opportunities, (4) getting immediate feedback, and (5) promoting positive feelings about writing.

Students stated that using ChatGPT could “speed up the process of writing” (Alex, Interview) as exemplified by the following quotes: “ChatGPT really helped me to explore the essay topics that I’m interested in within a very short amount of time and identify what can be written about” (Nora, Interview); “I discovered after using it for my final essay that ChatGPT can greatly improve the efficiency of my writing” (Alex, Reflection). For L2 writers, it significantly saved the time they typically spent on editing, as mentioned by Lydia:

As an international student who is not a native English speaker, writing college essays would take me double the amount of time compared to those who write essays in their first language. Oftentimes, the biggest time I spent was on editing the grammar and trying to make my language readable and understandable. (Lydia, Reflection)

The benefits of saving the time and energy on language concerns, grammar, wording, and the organization of ideas and messy drafts, furthermore, reduced the cognitive burden among the student writers, both L1 and L2. For instance, knowing ChatGPT’s editing power, Alex felt that he was able to “focus more on the subject of the writing rather than the language itself” and “spew out thoughts freely” when drafting the essay (Alex, Interview). Likewise, the L2 students noted that ChatGPT allowed them to delay their concerns about the linguistic forms of ideas and alleviate the demanding cognitive load associated with L2 writing. As claimed by Lydia, “It freed my thoughts so that I could spend more time revising the content, but not worry about how to express my ideas for the essay” (Lydia, Interview).

The students conveyed that incorporating ChatGPT in different components of writing also fostered new learning opportunities for them to improve writing. Nora shared that "ChatGPT not only made my language more fluent and comprehensible, but it also helped me to learn new ways of expression in English" (Nora, Interview). Su remarked that although ChatGPT's feedback was generic, it prompted her to do further research about her topic and learn more writing strategies (Su, Written Assignment).

Students particularly highlighted the "instant and personalized feedback" (Kevin, Reflection) provided by ChatGPT as a strong impetus and benefit. For instance, as a frequent visitor to the school's writing center, Lydia mentioned she typically scheduled two to three appointments with a writing tutor for each major writing assignment she worked on. With ChatGPT, she could obtain feedback anytime: "Now I don't have to physically go to the writing center at 11 pm, waiting for the previous visitor to finish their session" (Lydia, Interview). She used "my walking AI tutor" to describe the role of AI in her writing.

Ultimately, the students mentioned that these cognitive and practical benefits of ChatGPT not only improved their efficiency of writing, but also promoted positive feelings about writing. They used words such as “more relieved” (Emma), “sense of accomplishment” (Lydia), and “less anxious” (Nora) to describe the AI-assisted writing process. Although the students expressed different needs and utilization of ChatGPT, they all conveyed that they would like to continue using it in the future.

5.3 What Concerns and Limitations Do Students Experience When Using ChatGPT to Assist with Their Writing?

Despite the benefits and usefulness of ChatGPT for assisting with their writing, the students also expressed many reservations about the AI tool and noted its limitations. The first concern was about the false information it produced and its potential to mislead people. The students commented that ChatGPT tended to "make up information" (Emma), "make assumptions and guesses" (Su), and generate "inaccurate information" (Nora), "wrong information" (Alex), and "nonsense" (Lydia). Furthermore, the students pointed out that ChatGPT was inadequate in addressing high-level questions requiring critical thinking, as Su explained: "When I was consulting with ChatGPT, I learned that it has a very limited understanding of the topic I was talking about" (Su, Reflection). Other students also pointed out that the responses they got from ChatGPT could be "very generalized" (Kevin) and lacked "depth and complexity" (Nora).

The next shortcoming of ChatGPT, as noted by the students, was the lack of creativity and originality. Su highlighted that relying on ChatGPT's ideas would not yield intriguing essays, as even though ChatGPT's responses may "appear to make sense," they usually came across as "cliched and superficial." Su understood that it was because ChatGPT and large language models "work based on the patterns and data they have been trained on and cannot think outside of this" (Su, Reflection). Therefore, it is "not effective in generating new ideas" for an essay (Alex, Interview).

This algorithmic nature led to another limitation observed by the students: the lack of reliable evidence and support for the content generated by ChatGPT. Su acknowledged that ChatGPT was not a good source for writing, as it was impossible for a reader to trace the original information. Apart from the lack of clarity and transparency about the sources ChatGPT draws upon, Kevin pointed out an additional drawback: ChatGPT's ideas were "not up to date," thus not a good source for academic writing (Kevin, Written Assignment).

5.4 What Considerations Do Students Identify as Important When Engaging in Generative AI-Assisted Writing?

Presented with these limitations of ChatGPT, the students shared some important aspects they think should be considered when incorporating AI into writing, summarized as follows: (1) balanced and moderate use of AI, (2) critical use of AI, (3) ethical considerations, (4) the need for human voice, (5) the importance of authenticity, (6) seizing AI as a learning opportunity, and (7) transparency from and conversation between teachers and students.

The students worried that over-reliance on ChatGPT would undermine their writing ability and therefore advocated balanced, moderate use. They believed that ChatGPT should serve as "guidance," "support," "supplement," and "assistant" (Alex, Reflection) rather than as a "substitute" or "replacement" (Su, Reflection).

Furthermore, the students emphasized the importance of critical use of AI. Emma noted that AI platforms could "decline the need to think critically" as some students might want to "take the easy route and just get the answer" (Emma, Interview). They insisted on keeping a critical eye on the information AI generated, as it was not reliable. To do this, students shared a similar strategy, which was to use ChatGPT as a point of departure rather than a destination for writing, thinking, and research. They underscored the importance of validation and critical thinking in this process.

Another facet to consider is the ethical use of AI. The students believed that one must be very careful when using ChatGPT, as it can easily cross the line into plagiarism. They deemed acts such as using ChatGPT to generate new ideas and write entire essays unethical, as these are forms of taking credit for work based on other people's language and ideas (Kevin, In-Class Writing). Thus students emphasized the importance of "doing research on your own" (Emma), "making sure the ideas are my own" (Lydia), and "not using everything (i.e. sentence by sentence, word by word) provided by ChatGPT" (Su).

The students also regarded the issue of retaining a human voice as a pivotal consideration for AI-assisted writing. They pointed out that writing should be a means of expressing the writer's identity and thoughts, but AI was not able to personalize the text to their individual style and voice. Wary of the threat that extensive adoption of ChatGPT poses to individual expression, Lydia commented, "ChatGPT tended to use similar dictions and patterns of wording and sentence structures. If everyone uses ChatGPT, our style will become more and more alike" (Lydia, Interview). Similarly, Su pointed out that ChatGPT could make the text "sound generic and impersonal," which is a problem "when you are trying to convey your own ideas, feelings, and perspectives" (Su, Written Assignment). To "truly present a unique perspective and make writing individualized," one must "take full control" of their writing to deliver a powerful message (Kevin, Reflection). This process requires the discernment to dismiss advice from ChatGPT so as to avoid an impersonal, blunt style of writing that lacks the writer's distinct character.

Students also pointed out that the involvement of ChatGPT in writing may jeopardize how human voice is conveyed not only through the ideas ChatGPT generates but also through the language it produces, thus "ruining the authenticity of an essay" (Alex, Reflection). Alex questioned his own paradoxical use of ChatGPT: on the one hand, he utilized it for editing and better academic diction; on the other, he was perplexed and apprehensive about the tipping point where the essay would start to sound "more like ChatGPT rather than yourself." As he explained:

ChatGPT suggested some words I never would have used, and I decided not to include them. While they may obviously sound better than my own authentic words, I just did not feel honest using them. For instance, when writing this paper, ChatGPT suggested I use “judiciously” rather than “in moderation.” I never would have used “judiciously,” and it felt unauthentic to use it. (Alex, Reflection)

The students suggested cautious, strategic, and purposeful use of ChatGPT’s editing features to ensure it amplifies rather than conflicts with their own writing style.

However, such boundaries still appeared vague to the students. Hence, they called for guidelines and instruction in the classroom and for open conversation between teachers and students, expressing confusion over the lack of clear guidelines across their classes. As Alex commented, "It's hard to draw lines with different ways of using ChatGPT and which one would be considered cheating or not" (Alex, Interview). The students hoped that all their instructors, not only their writing teachers, could engage in comprehensive discussions about which specific uses of ChatGPT would be regarded as acceptable or problematic according to their disciplinary conventions and learning purposes.

Participants also hoped that school policies and instructors would not shut down AI as a resource and learning opportunity for students. Emma said, "It's tricky as there are a lot of different opinions, but technology is the world we live in. We should go with the grain as opposed to against it" (Emma, Interview). Mindful of the learning opportunities for thinking that AI might foreclose, Lydia commented, "I am afraid of becoming lazy…But I guess it also depends on how you use it. It gives a shortcut for people who do not want to make the effort to learn and think. But it could be useful for those who really want to learn" (Lydia, Interview). Alex noted that to prevent the loss of learning opportunities, for instance, he decided that rather than adopting ChatGPT's diction suggestions immediately, he "would use those words in the next essay," demonstrating his effort to learn and internalize the knowledge. In general, the students were still exploring ways to use ChatGPT in critical, authentic, and ethical ways that would promote rather than harm their learning.

6 Discussion

Adopting sensemaking theory, the study investigated how students made sense of their AI-assisted writing practices, providing insights into students' learning process and the shared practices emerging around the AI technology. Confirming previous research (e.g., Guo et al., 2022; Holmes et al., 2019; Su et al., 2023), this study found that the students overall had positive experiences with generative AI-assisted writing, for it could accelerate their writing process, reduce their cognitive load and anxiety, and provide prompt feedback. The students integrated ChatGPT into various components of their composing process, such as searching for content, ideational planning, language editing, and revising. Although the students acknowledged the cognitive and affective benefits (e.g., Ebadi & Amini, 2022; Fryer & Carpenter, 2006) of using ChatGPT in writing, they were very cautious about adopting its ideas and suggestions at different discourse levels (i.e., essay, paragraph, and sentence levels) due to critical, ethical, and authenticity concerns. This finding extends previous research, which identified academic dishonesty and educational inequity as students' primary concerns (Yan, 2023). Despite recognizing AI's limitations, such as the lack of in-depth insights (Gao et al., 2022), originality, creativity, and reliability—qualities essential for good academic writing—the students deemed it necessary to embrace rather than abandon the tool, with the purpose of fostering one's critical thinking and writing skills. The results suggest that students' sensemaking of AI-assisted writing is shaped by their prior knowledge and understanding of writing as a cognitive and sociocultural activity, their exploration of AI's functionalities and strategies for leveraging them to achieve learning goals, and their interrogation of the appropriateness and limitations of AI in the specific context of academic writing.

The study highlights two emerging dilemmas students experienced in their generative AI-assisted writing processes. The first dilemma, as Alex put it, is the choice between sounding better or sounding like me when integrating AI into the decision-making process of writing, reflecting a larger issue about academic integrity, authenticity, and voice in human-AI collaboration. The participants believed that it is crucial to prevent their writing from being AI-nized, which could lead to either plagiarism or a writing style resembling AI that overshadows their own voice—the very essence of their "identity and presentation of the self in writing" (Prince & Archer, 2014, p. 40). The students' beliefs align with a connectivism paradigm of AI in Education (AIEd) outlined by Ouyang and Jiao (2021), in which AI serves as a tool to augment human intelligence and capability (Yang et al., 2021) and learner agency is placed at the core. Reliance on AI could lead to superficial engagement with writing tasks, discouraging the deeper, reflective thought processes essential for original creative expression. Furthermore, when AI suggests similar vocabulary, structures, and styles to various learners, it risks imposing a uniformity of expression that undermines the educational value of cultivating each individual's unique and creative voice. AI may hinder students from exploring how language variation and linguistic diversity can be rich resources for meaning-making, creativity, identity formation, and problem-solving (Wang et al., 2020). Such critical engagement with diverse language resources is crucial for developing students' literacy skills in a digital age where multicultural awareness is an integral part of education (Sánchez-Martín et al., 2019). As Dixon-Román et al. (2020) noted, educators must be wary of AI's "racializing forces," which standardize learning processes in ways that can marginalize non-dominant forms of knowledge and communication, as well as students whose experiences and identities are either not represented or misrepresented in the system.

While the participants concurred that upholding human voice and agency entails possessing integrity and alignment not only at the ideational level but also in the linguistic expression of those ideas, the L2 writers in this study added another nuanced dimension to the impact of AI on human voice and authenticity in the context of AI-assisted writing. As the L2 students experienced, ChatGPT's language suggestions might not pose a threat to their voice but rather serve as a catalyst for augmenting it, as AI helped them overcome language barriers and better express ideas true to themselves. In other words, generative AI afforded the L2 writers powerful language repertoires that enhanced the accuracy and efficiency of the "linguistic rehearsing" (Abdel Latif, 2021) or "translating" (Kellogg, 1996) component of their writing process, thus allowing L2 students to produce writing more authentic to themselves. The finding highlights how learner characteristics and individual differences play an important role in students' sensemaking of AI-assisted writing, complicating the existing understanding of AI's affordances for learners with diverse linguistic backgrounds and learning needs.

From earlier conceptualizations of authenticity as "ownedness" and "being one's own" by Heidegger (1927/1962) to contemporary perceptions of it as the "self-congruency" of an individual, group, or symbolic identity (Ferrara, 1998, p. 70), the notion of authenticity has been evolving and becoming more pluralistic. As Rings (2017) acknowledged, authenticity extends beyond adherence to personally endorsed commitments; it requires a comprehensive consideration of one's self-awareness and the changing social context. Scholars should further pursue what authenticity and academic integrity mean in an increasingly AI-mediated educational context, ways to promote students' authorial voice and agency, and the complicated authorship issues (Jabotinsky & Sarel, 2022) involved in AI-human collaboratively generated texts. As Eaton (2023) claims, it is time to contemplate "postplagiarism" and academic integrity in a future where "hybrid human-AI writing will become normal".

Apart from the sounding better or sounding like me dilemma experienced by students, another paradox is whether AI caused missed learning opportunities or created new learning opportunities . As noted in the previous literature, AI-writing tools may rob students of essential learning experiences (Barrot, 2023a; Tseng & Warschauer, 2023). Adding to this concern from educators and scholars, the present study shows that the students themselves are also cognizant of the possible learning loss due to AI adoption. Furthermore, the study shows that rather than passively indulging in the convenience of AI tools, a common concern among educators (Chan & Hu, 2023; Graham, 2023), the student writers attempted to seize new learning opportunities that emerged from AI technologies to promote their critical thinking and writing. This finding suggests a nuanced addition to sensemaking theory: the process of making sense of uncertainties in AI-infused literacy practices can itself be uncertain, involving reconciling dilemmas and acknowledging perplexing experiences. While not always yielding clear-cut outcomes or causal attributions for observed phenomena and personal experience, as suggested by Poquet (2024), noticing and perceiving the unpredictable impacts of generative AI on students' own learning processes can, in itself, be empowering. The process fosters a sense of agency and critical engagement, suggesting that the outcomes of sensemaking in the context of AI-assisted writing can be open-ended yet profound.

This important finding leads scholars to reconsider the essence of learning in an era of generative AI. Hwang et al. (2020) and Southworth et al. (2023) argued that AI is likely to transform not only the learning environment but also the learning process, and even what it means to learn. This perspective finds resonance in the experiences of the participants in this study. While AI may shortcut traditional ways of doing writing, it does not inherently imply a reduction in students' cognitive, behavioral, and affective engagement with writing, learning, and thinking. AI does not necessarily make writing easier; on the contrary, a critical, ethical, and authentic approach to AI-assisted writing pushes students to think further and prioritize their own voice, originality, and creativity, leading to high-quality writing. In this sense, when used properly, AI has the potential to introduce a new avenue for humanizing writing and education. As generative AI technologies advance rapidly, an expanding array of AI-powered writing assistants, intelligent tutoring systems, and feedback tools promises to cater to the diverse needs and learning styles of language learners and writers. These tools are not limited to mere textual assistance; the multimodal functionalities of generative AI can also allow writers to explore creative expressions and multimodal writing, enriching students' literacy practices by integrating more visual, auditory, and interactive elements into the composition process (Kang & Yi, 2023; Li et al., 2024). As noted by Cao and Dede (2023), our educational model has long been centered around the product, directing our focus towards the outcomes and grades students achieve while often overlooking the learning process itself. The judgment calls involved in students' interactions with AI, as showcased in the nuances of participants' AI-assisted writing processes in this study, represent emerging learning opportunities that require students to draw upon a range of knowledge, skills, awareness of ethics and the self, criticality, and self-reflection to make informed decisions about AI in learning. The present study shows that such decision-making processes can play a pivotal role in cultivating students' "AI literacy" (Ng et al., 2021) and promoting their responsible use of AI. Therefore, they should also be recognized as valuable teaching opportunities that educators should not overlook.

7 Conclusion

This study explored students' generative AI-assisted writing processes in a first-year writing class at an American college. The study found that students utilized generative AI to assist with both global issues of writing (e.g., argument, structure, coherence) and local ones (e.g., syntax, diction, grammar), while also holding various ethical and practical concerns about the use of AI. Findings showed that large language models offer unique benefits for L2 writers, who can leverage their linguistic capabilities. The study highlights the urgency of explicit teaching of critical AI literacy and the value of (post)process-oriented writing pedagogy (e.g., Graham, 2023) in college writing classrooms, so that students not only understand AI writing tools' functions and limitations but also know how to utilize and evaluate them for specific communication and learning purposes.

However, writing instruction is still at an early stage of addressing this pressing need. Thus, pedagogical innovations, policy adjustments, new forms of writing assessment, and teacher education (Zhai, 2022) are needed to adapt to the potential impact of AI on desired student learning outcomes within specific writing curricula. For instance, integrating critical digital pedagogy into writing instruction and inviting students to reflect on their AI literacy practices allows writing instructors to more effectively guide students in critically engaging with AI technologies in their academic literacy development. Policy adjustments should aim to cultivate an inclusive rather than "policing" environment (Johnson, 2023) that encourages students to use AI responsibly and as a means of fostering self-learning. Furthermore, writing assessment methods should evolve to evaluate not just final learning outcomes, such as the written products, but also the learning journey itself, such as the decision-making involved in AI-assisted writing. This shift encourages students to appreciate learning processes and the productive struggles they encounter along the way, so that they stop seeing AI merely as a shortcut and instead treat it as assistance in their quest for learning and writing development. In this way, students can leverage the linguistic, multimodal, interactive, and adaptable affordances of generative AI tools for personalized learning. This facilitates greater student ownership of their learning, enhancing their learner competence through self-direction, self-assessment, and self-reflection when interacting with AI tools (Barrot, 2023c; Fariani et al., 2023).

Following a phenomenological research design, the present study aimed to provide an in-depth understanding of college students' use of ChatGPT in their academic writing, yet it is limited by its small sample size and short duration. The findings therefore may not apply to other classroom contexts or to a wide range of student populations. Future research could benefit from adopting a large-scale, longitudinal design to examine generative AI's impacts on student writing and students' long-term engagement with generative AI tools, both in formal classroom settings and in informal learning contexts. It is also worth exploring students of diverse age groups and language proficiency levels, as well as writing courses of different languages, purposes, and genres, to examine other factors that may influence students' generative AI-assisted writing. After all, the participants in this study had already developed some proficiency and skill in academic writing, and exercising learner agency (Ouyang & Jiao, 2021) can be more complex and challenging for younger learners. Further research is needed to understand students with varied domain knowledge, expertise, and writing abilities (Yan, 2023) and to uncover individual differences in AI-assisted writing. Additionally, the participants in this study utilized GPT-3.5 for their AI-assisted writing practices. Given the rapid advancement of AI technologies, new AI models and applications are continuously emerging; future research should therefore investigate how various AI models and functionalities might differently influence students, taking into account ongoing developments and innovations in AI.

Data Availability

The data are available from the author upon reasonable request.

References

Abdel Latif, M. M. A. (2021). Remodeling writers' composing processes: Implications for writing assessment. Assessing Writing, 50, 100547. https://doi.org/10.1016/j.asw.2021.100547

Almusharraf, N., & Alotaibi, H. (2023). An error-analysis study from an EFL writing context: Human and automated essay scoring approaches. Technology, Knowledge and Learning, 28 (3), 1015–1031. https://doi.org/10.1007/s10758-022-09592-z

American Psychological Association. (2020).  Publication manual of the American Psychological Association  (7th ed.).

Atkinson, D. (2003). L2 writing in the post-process era: Introduction. Journal of Second Language Writing, 12 (1), 3–15. https://doi.org/10.1016/S1060-3743(02)00123-6

Bahari, A. (2021). Computer-mediated feedback for L2 learners: Challenges versus affordances. Journal of Computer Assisted Learning, 37 (1), 24–38. https://doi.org/10.1111/jcal.12481

Barrot, J. S. (2023a). Using ChatGPT for second language writing: Pitfalls and potentials. Assessing Writing, 57 , 100745. https://doi.org/10.1016/j.asw.2023.100745

Barrot, J. S. (2023b). Using automated written corrective feedback in the writing classrooms: Effects on L2 writing accuracy. Computer Assisted Language Learning, 36 (4), 584–607.

Barrot, J. S. (2023c). ChatGPT as a language learning tool: An emerging technology report. Technology, Knowledge and Learning. https://doi.org/10.1007/s10758-023-09711-4

Bedington, A., Halcomb, E. F., McKee, H. A., Sargent, T., & Smith, A. (2024). Writing with generative AI and human-machine teaming: Insights and recommendations from faculty and students. Computers and Composition, 71 , 102833. https://doi.org/10.1016/j.compcom.2024.102833

Cancino, M., & Panes, J. (2021). The impact of google translate on L2 writing quality measures: Evidence from chilean EFL high school learners. System, 98 , 102464. https://doi.org/10.1016/j.system.2021.102464

Cao, L., & Dede, C. (2023). Navigating a world of generative AI: Suggestions for educators . The next level lab at harvard graduate school of education. President and Fellows of Harvard College: Cambridge, MA.

Chan, C., & Hu, W. (2023). Students’ voices on generative AI: Perceptions, benefits, and challenges in higher education. International Journal of Educational Technology in Higher Education . https://doi.org/10.1186/s41239-023-00411-8

Creswell, J. W. (2017). Qualitative inquiry and research design: Choosing among the five traditions . Sage.

Dixon-Román, E., Nichols, T. P., & Nyame-Mensah, A. (2020). The racializing forces of/in AI educational technologies. Learning, Media and Technology, 45 (3), 236–250. https://doi.org/10.1080/17439884.2020.1667825

Eaton, S. (2023). Six tenets of postplagiarism: Writing in the age of artificial intelligence. University of Calgary. http://hdl.handle.net/1880/115882 .

Ebadi, S., & Amini, A. (2022). Examining the roles of social presence and human-likeness on Iranian EFL learners’ motivation using artificial intelligence technology: A case of CSIEC chatbot. Interactive Learning Environments, 32 (2), 1–19. https://doi.org/10.1080/10494820.2022.2096638

Fariani, R. I., Junus, K., & Santoso, H. B. (2023). A systematic literature review on personalised learning in the higher education context. Technology, Knowledge and Learning, 28 (2), 449–476. https://doi.org/10.1007/s10758-022-09628-4

Ferrara, A. (1998). Reflective authenticity . Routledge.

Flower, L., & Hayes, J. R. (1981). A cognitive process theory of writing. College Composition & Communication, 32 (4), 365–387.

Fryer, L. K., & Carpenter, R. (2006). Bots as language learning tools. Language Learning & Technology, 10 , 8–14.

Gayed, J. M., Carlon, M. K. J., Oriola, A. M., & Cross, J. S. (2022). Exploring an AI-based writing Assistant’s impact on English language learners. Computers and Education: Artificial Intelligence, 3 , 100055. https://doi.org/10.1016/j.caeai.2022.100055

Glaser, N. (2023). Exploring the potential of ChatGPT as an educational technology: An emerging technology report. Technology, Knowledge and Learning, 28 (4), 1945–1952. https://doi.org/10.1007/s10758-023-09684-4

Golob, U. (2018). Sense-making. In R. L. Heath, W. Johansen, J. Falkheimer, K. Hallahan, J. J. C. Raupp, & B. Steyn (Eds.), The international encyclopedia of strategic communication (pp. 1–9). Wiley.

Graham, S. S. (2023). Post-process but not post-writing: large language models and a future for composition pedagogy. Composition Studies, 51 (1), 162–218.

Guo, K., Wang, J., & Chu, S. K. W. (2022). Using chatbots to scaffold EFL students’ argumentative writing. Assessing Writing, 54 , 100666. https://doi.org/10.1016/j.asw.2022.100666

Han, J., Yoo, H., Kim, Y., Myung, J., Kim, M., Lim, H., Kim, J., Lee, T., Hong, H., Ahn, S., & Oh, A. (2023). RECIPE: How to Integrate ChatGPT into EFL writing education. arXiv:2305.11583 . https://doi.org/10.48550/arXiv.2305.11583

Hayes, J. R. (2012). Modeling and remodeling writing. Written Communication, 29 (3), 369–388. https://doi.org/10.1177/0741088312451260

Heidegger, M. (1962). Being and time (J. Macquarrie & E. Robinson, Trans.). New York: Harper & Row (Original work published 1927).

Holmes, W., Bialik, M., & Fadel, C. (2019). Artificial intelligence in education: Promises and implications for teaching and learning . Center for Curriculum Redesign.

Huang, W., Hew, K., & Fryer, L. (2022). Chatbots for language learning—Are they really useful? A systematic review of chatbot-supported language learning. Journal of Computer Assisted Learning, 38 (1), 237–257.

Hwang, G. J., Xie, H., Wah, B. W., & Gašević, D. (2020). Vision, challenges, roles and research issues of Artificial Intelligence in Education. Computers & Education: Artificial Intelligence, 1 , Article 100001. https://doi.org/10.1016/j.caeai.2020.100001

Jabotinsky, H. Y., & Sarel, R. (2022). Co-authoring with an AI? Ethical dilemmas and artificial intelligence. SSRN Scholarly Paper . https://doi.org/10.2139/ssrn.4303959

Jeon, J. (2021). Chatbot-assisted dynamic assessment (CA-DA) for L2 vocabulary learning and diagnosis. Computer Assisted Language Learning, 36 (7), 1–27. https://doi.org/10.1080/09588221.2021.1987272

Johnson, M. D. (2017). Cognitive task complexity and L2 written syntactic complexity, accuracy, lexical complexity, and fluency: A research synthesis and meta-analysis. Journal of Second Language Writing, 37 , 13–38. https://doi.org/10.1016/j.jslw.2017.06.001

Johnson, G. P. (2023). Don’t act like you forgot: Approaching another literacy “crisis” by (re)considering what we know about teaching writing with and through technologies. Composition Studies, 51 (1), 169–175.

Kang, J., & Yi, Y. (2023). Beyond ChatGPT: Multimodal generative AI for L2 writers. Journal of Second Language Writing, 62 , 101070. https://doi.org/10.1016/j.jslw.2023.101070

Kellogg, R. T. (1996). A model of working memory in writing. In C. M. Levy & S. Ransdell (Eds.), The science of writing: Theories, methods, individual differences and applications (pp. 57–71). Laurence Erlbaum Associates.

Khuder, B., & Harwood, N. (2015). Writing in test and non-test situations: Process and product. Journal of Writing Research, 6 (3), 233–278.

Klein, G., Moon, B., & Hoffman, R. R. (2006). Making sense of sensemaking: A macrocognitive model. IEEE Intelligent Systems, 21 (5), 88–92.

Kohnke, L., Moorhouse, B. L., & Zou, D. (2023). Exploring generative artificial intelligence preparedness among university language instructors: A case study. Computers and Education: Artificial Intelligence, 5 , 100156. https://doi.org/10.1016/j.caeai.2023.100156

Koltovskaia, S. (2020). Student engagement with automated written corrective feedback (AWCF) provided by Grammarly: A multiple case study. Assessing Writing, 44 , 100450.

Kruse, O., Rapp, C., Anson, C., Benetos, K., Cotos, E., Devitt, A., & Shibani, A. (Eds.). (2023). Digital writing technologies in higher education . Springer.

Lapadat, J. C. (2012). Thematic analysis. In A. J. Mills, G. Durepos, & E. Weibe (Eds.), The encyclopedia of case study research (pp. 926–927). Sage.

Li, B., Wang, C., Bonk, C., & Kou, X. (2024). Exploring inventions in self-directed language learning with generative AI: Implementations and perspectives of YouTube content creators. TechTrends . https://doi.org/10.1007/s11528-024-00960-3

Lin, M. P. C., & Chang, D. (2020). Enhancing post-secondary writers’ writing skills with a chatbot: A mixed-method classroom study. Journal of Educational Technology & Society, 23 (1), 78–92.

Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry . Sage.

Link, S., Mehrzad, M., & Rahimi, M. (2022). Impact of automated writing evaluation on teacher feedback, student revision, and writing improvement. Computer Assisted Language Learning, 35 (4), 605–634. https://doi.org/10.1080/09588221.2020.1743323

Memarian, B., & Doleck, T. (2023). Fairness, accountability, transparency, and ethics (FATE) in artificial intelligence (AI), and higher education: A systematic review. Computers and Education: Artificial Intelligence. https://doi.org/10.1016/j.caeai.2023.100152

Mohsen, M. A. (2021). L1 versus L2 writing processes: What insight can we obtain from a keystroke logging program? Language Teaching Research, 4 , 48–62. https://doi.org/10.1177/13621688211041292

Moran, D. (2002). Introduction to phenomenology . Routledge.

Ng, D., Leung, J., Chu, S., & Qiao, M. (2021). Conceptualizing AI literacy: An exploratory review. Computers and Education: Artificial Intelligence, 2 , 100041. https://doi.org/10.1016/j.caeai.2021.100041

Ouyang, F., & Jiao, P. (2021). Artificial intelligence in education: The three paradigms. Computers and Education: Artificial Intelligence, 2 , 100020. https://doi.org/10.1016/j.caeai.2021.100020

Pandarova, I., Schmidt, T., Hartig, J., Boubekki, A., Jones, R. D., & Brefeld, U. (2019). Predicting the difficulty of exercise items for dynamic difficulty adaptation in adaptive language tutoring. International Journal of Artificial Intelligence in Education, 29 (3), 342–367. https://doi.org/10.1007/s40593-019-00180-4

Poquet, O. (2024). A shared lens around sensemaking in learning analytics: What activity theory, definition of a situation and affordances can offer. British Journal of Educational Technology . https://doi.org/10.1111/bjet.13435

Prince, R., & Archer, A. (2014). Exploring voice in multimodal quantitative texts. Literacy & Numeracy Studies, 22 (1), 39–57. https://doi.org/10.5130/lns.v22i1.4178

Ranalli, J. (2021). L2 student engagement with automated feedback on writing: Potential for learning and issues of trust. Journal of Second Language Writing, 52 , 100816. https://doi.org/10.1016/j.jslw.2021.100816

Rings, M. (2017). Authenticity, self-fulfillment, and self-acknowledgment. The Journal of Value Inquiry, 51 (3), 475–489.

Saldaña, J. (2016). The coding manual for qualitative researchers (3rd ed.). Sage.

Sánchez-Martín, C., Hirsu, L., Gonzales, L., & Alvarez, S. P. (2019). Pedagogies of digital composing through a translingual approach. Computers and Composition, 52, 142–157. https://doi.org/10.1016/j.compcom.2019.02.007

Schulze, M. (2008). AI in CALL: Artificially inflated or almost imminent? CALICO Journal, 25 (3), 510–527. https://doi.org/10.1558/cj.v25i3.510-527

Shi, H., & Aryadoust, V. (2023). A systematic review of automated writing evaluation systems. Education and Information Technologies, 28 (1), 771–795. https://doi.org/10.1007/s10639-022-11200-7

Smith, J. A. (1996). Beyond the divide between cognition and discourse: Using interpretative phenomenological analysis in health psychology. Psychology and Health, 11 (2), 261–271. https://doi.org/10.1080/08870449608400256

Smith, J. A., Flower, P., & Larkin, M. (2009). Interpretative phenomenological analysis: Theory, method and research . Sage.

Smith, J. A., & Shinebourne, P. (2012). Interpretative phenomenological analysis. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher. (Eds.), Research designs: Quantitative, qualitative, neuropsychological, and biological (pp. 73–82). American Psychological Association. https://doi.org/10.1037/13620-005 .

Southworth, J., Migliaccio, K., Glover, J., Reed, D., McCarty, C., Brendemuhl, J., & Thomas, A. (2023). Developing a model for AI Across the curriculum: Transforming the higher education landscape via innovation in AI literacy. Computers and Education: Artificial Intelligence, 4 , 100127. https://doi.org/10.1016/j.caeai.2023.100127

Steiss, J., Tate, T. P., Graham, S., Cruz, J., Hebert, M., Wang, J., Moon, Y., Tseng, W., Warschauer, M., & Olson, C. (2024). Comparing the quality of human and ChatGPT feedback on students’ writing. Learning and Instruction . https://doi.org/10.1016/j.learninstruc.2024.101894

Su, Y., Lin, Y., & Lai, C. (2023). Collaborating with ChatGPT in argumentative writing classrooms. Assessing Writing, 57 , 100752. https://doi.org/10.1016/j.asw.2023.100752

Tan, X., Xu, W., & Wang, C. (2024). Purposeful remixing with generative AI: Constructing designer voice in multimodal composing. arXiv preprint arXiv:2403.19095.

Tsai, S. C. (2019). Using google translate in EFL drafts: A preliminary investigation. Computer Assisted Language Learning, 32 (5–6), 510–526. https://doi.org/10.1080/09588221.2018.1527361

Tseng, W., & Warschauer, M. (2023). AI-writing tools in education: If you can’t beat them, join them. Journal of China Computer-Assisted Language Learning, 3 (2), 258–262. https://doi.org/10.1515/jccall-2023-0008

Vetter, M. A., Lucia, B., Jiang, J., & Othman, M. (2024). Towards a framework for local interrogation of AI ethics: A case study on text generators, academic integrity, and composing with ChatGPT. Computers and Composition, 71 , 102831. https://doi.org/10.1016/j.compcom.2024.102831

Wang, C., Samuelson, B., & Silvester, K. (2020). Zhai nan, mai meng and filial piety: The translingual creativity of Chinese university students in an academic writing course. Journal of Global Literacies, Technologies, and Emerging Pedagogies, 6 (2), 1120–1143.

Warner, B. (2022). AI for Language Learning: ChatGPT and the Future of ELT . TESOL. http://blog.tesol.org/ai-for-language-learning-chatgpt-and-the-future-of-elt/?utm_content=buffer7d9a4&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer .

Weick, K. E., Sutcliffe, K. M., & Obstfeld, D. (2005). Organizing and the process of sensemaking. Organization Science, 16 (4), 409–421.

Yan, D. (2023). Impact of ChatGPT on learners in a L2 writing practicum: An exploratory investigation. Education and Information Technologies, 28 , 1–25. https://doi.org/10.1007/s10639-023-11742-4

Yang, S. J., Ogata, H., Matsui, T., & Chen, N. S. (2021). Human-centered artificial intelligence in education: Seeing the invisible through the visible. Computers & Education: Artificial Intelligence, 2 , Article 100008. https://doi.org/10.1016/j.caeai.2021.100008

Zhai, X. (2022). ChatGPT user experience: Implications for education. SSRN Scholarly Paper . https://doi.org/10.2139/ssrn.4312418 .

Funding

The author acknowledges that the research did not receive any funding.

Author information

Authors and Affiliations

Colby College, Waterville, ME, USA

Chaoran Wang

Corresponding author

Correspondence to Chaoran Wang.

Ethics declarations

Conflict of Interest

The author confirms that there are no conflicts of interest.

Ethical Approval

The study was conducted with permission from, and following the guidelines of, the university's Institutional Review Board.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Wang, C. Exploring Students’ Generative AI-Assisted Writing Processes: Perceptions and Experiences from Native and Nonnative English Speakers. Tech Know Learn (2024). https://doi.org/10.1007/s10758-024-09744-3

Accepted: 14 May 2024

Published: 30 May 2024

DOI: https://doi.org/10.1007/s10758-024-09744-3



Open access

Published: 03 June 2024

Applying large language models for automated essay scoring for non-native Japanese

Wenchao Li & Haitao Liu

Humanities and Social Sciences Communications, volume 11, Article number: 723 (2024)

  • Language and linguistics

Recent advancements in artificial intelligence (AI) have led to an increased use of large language models (LLMs) for language assessment tasks such as automated essay scoring (AES), automated listening tests, and automated oral proficiency assessments. The application of LLMs to AES in the context of non-native Japanese, however, remains limited. This study explores the potential of LLM-based AES by comparing the efficiency of different models: two conventional machine learning-based methods (Jess and JWriter), two LLMs (GPT and BERT), and one Japanese local LLM (the Open-Calm large model). To conduct the evaluation, a dataset of 1400 story-writing scripts authored by learners with 12 different first languages was used. Statistical analysis revealed that GPT-4 outperforms Jess, JWriter, BERT, and the Japanese language-specific trained Open-Calm large model in terms of annotation accuracy and predicting learning levels. Furthermore, by comparing 18 models utilizing various prompts, the study underscores the significance of prompts in achieving accurate and reliable LLM-based evaluations.


Conventional machine learning technology in AES

AES has experienced significant growth with the advancement of machine learning technologies in recent decades. In the earlier stages of AES development, conventional machine learning-based approaches were commonly used. These approaches involved the following procedures: (a) feeding the machine a dataset of essays, which serves as the basis for training the model and establishing patterns and correlations between linguistic features and human ratings; and (b) training the machine learning model on the linguistic features that best represent human ratings and can effectively discriminate learners' writing proficiency. These features include lexical richness (Lu, 2012; Kyle and Crossley, 2015; Kyle et al. 2021), syntactic complexity (Lu, 2010; Liu, 2008), and text cohesion (Crossley and McNamara, 2016), among others. Conventional machine learning approaches to AES require human intervention, such as manual correction and annotation of essays; this involvement is necessary to create a labeled dataset for training the model. Several AES systems have been developed using conventional machine learning technologies, including the Intelligent Essay Assessor (Landauer et al. 2003), the e-rater engine by Educational Testing Service (Attali and Burstein, 2006; Burstein, 2003), MyAccess with the IntelliMetric scoring engine by Vantage Learning (Elliot, 2003), and the Bayesian Essay Test Scoring system (Rudner and Liang, 2002). These systems have played a significant role in automating the essay scoring process and providing quick, consistent feedback to learners. However, as touched upon earlier, conventional machine learning approaches rely on predetermined linguistic features and often require manual intervention, making them less flexible and potentially limiting their generalizability to different contexts.
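To make the pipeline above concrete, here is a minimal sketch of a feature-based scoring model, assuming scikit-learn; the features, data, and scores are hypothetical stand-ins, not those of Jess, JWriter, or any system named above.

```python
# Minimal sketch of a conventional feature-based AES pipeline
# (illustrative only; features and data are hypothetical, not those
# of any specific system discussed here).
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: hand-crafted linguistic features for one essay,
# e.g. [lexical diversity, mean sentence length, clauses per T-unit].
X_train = np.array([
    [0.42, 18.0, 1.2],
    [0.55, 24.5, 1.8],
    [0.61, 30.2, 2.1],
])
y_train = np.array([2.0, 3.5, 4.5])  # human-assigned holistic scores

model = LinearRegression().fit(X_train, y_train)  # learn feature weights

new_essay_features = np.array([[0.50, 22.0, 1.5]])
print(model.predict(new_essay_features))  # predicted score for a new essay
```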

In the context of the Japanese language, conventional machine learning-based AES tools include Jess (Ishioka and Kameda, 2006) and JWriter (Lee and Hasebe, 2017). Jess assesses essays by deducting points from a perfect score, using the Mainichi Daily News newspaper as its database. The evaluation criteria employed by Jess encompass rhetorical elements (e.g., reading comprehension, vocabulary diversity, percentage of complex words, and percentage of passive sentences), organizational structures (e.g., forward and reverse connection structures), and content analysis (e.g., latent semantic indexing). JWriter employs linear regression analysis to assign weights to various measurement indices, such as average sentence length and total number of characters, which are then combined to derive the overall score. A pilot study involving the Jess model was conducted on 1320 essays at different proficiency levels (primary, intermediate, and advanced). The results indicated that the Jess model did not reliably distinguish between these essay levels: out of the 16 measures used, four (median sentence length, median clause length, median number of phrases, and maximum number of phrases) did not show statistically significant differences between the levels, and two (the number of attributive declined words and the kanji/kana ratio) exhibited between-level differences but lacked linear progression. The remaining measures, including maximum sentence length, maximum clause length, number of attributive conjugated words, maximum number of consecutive infinitive forms, maximum number of conjunctive-particle clauses, the K characteristic value, percentage of big words, and percentage of passive sentences, demonstrated statistically significant between-level differences and displayed linear progression.

Both Jess and JWriter exhibit notable limitations, including the manual selection of feature parameters and weights, which can introduce biases into the scoring process. The reliance on human annotators to label non-native language essays also introduces potential noise and variability in the scoring. Furthermore, an important concern is the possibility of system manipulation and cheating by learners who are aware of the regression equation utilized by the models (Hirao et al. 2020 ). These limitations emphasize the need for further advancements in AES systems to address these challenges.

Deep learning technology in AES

Deep learning has emerged as one of the approaches for improving the accuracy and effectiveness of AES. Deep learning-based AES methods utilize artificial neural networks that mimic the human brain's functioning through layered algorithms and computational units. Unlike conventional machine learning, deep learning autonomously learns from the environment and past errors without human intervention. This enables deep learning models to establish nonlinear correlations, resulting in higher accuracy. Recent advancements in deep learning have led to the development of transformers, which are particularly effective in learning text representations. Noteworthy examples include bidirectional encoder representations from transformers (BERT) (Devlin et al. 2019) and the generative pretrained transformer (GPT) (OpenAI).

BERT is a linguistic representation model that utilizes a transformer architecture and is trained on two tasks: masked linguistic modeling and next-sentence prediction (Hirao et al. 2020; Vaswani et al. 2017). In the context of AES, BERT follows specific procedures, as illustrated in Fig. 1: (a) the tokenized prompts and essays are taken as input; (b) special tokens, such as [CLS] and [SEP], are added to mark the beginning and separation of prompts and essays; (c) the transformer encoder processes the prompt and essay sequences, resulting in hidden layer sequences; (d) the hidden layers corresponding to the [CLS] tokens (T[CLS]) represent distributed representations of the prompts and essays; and (e) a multilayer perceptron uses these distributed representations as input to obtain the final score (Hirao et al. 2020).

Figure 1. AES system with BERT (Hirao et al. 2020).
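As a rough illustration of steps (a)–(e), the following sketch uses the HuggingFace transformers library; the checkpoint choice and the untrained regression head are assumptions for demonstration, not the setup of Hirao et al. (2020).

```python
# Sketch of the BERT scoring procedure (a)-(e) above, assuming the
# HuggingFace transformers library. The regression head is untrained here;
# in a real AES system it would be fitted to human ratings.
import torch
from transformers import AutoModel, AutoTokenizer

# A multilingual checkpoint keeps the sketch self-contained; a
# Japanese-specific BERT would be preferable in practice.
name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

prompt, essay = "ピクニック", "ケンとマリはピクニックに行きました。"
# (a)-(b): tokenize the prompt-essay pair; [CLS]/[SEP] are added automatically.
inputs = tokenizer(prompt, essay, return_tensors="pt", truncation=True)

with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (c) hidden layer sequences
cls_vector = hidden[:, 0, :]                      # (d) T[CLS] representation

mlp = torch.nn.Sequential(                        # (e) multilayer perceptron head
    torch.nn.Linear(encoder.config.hidden_size, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)
print(mlp(cls_vector).item())                     # predicted (untrained) score
```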

Training BERT on a substantial amount of sentence data through the masked language model (MLM) allows it to capture contextual information within the hidden layers. Consequently, BERT is expected to be capable of identifying artificial essays as invalid and assigning them lower scores (Mizumoto and Eguchi, 2023). In the context of AES for nonnative Japanese learners, Hirao et al. (2020) combined the long short-term memory (LSTM) model proposed by Hochreiter and Schmidhuber (1997) with BERT to develop a tailored automated essay scoring system. Their findings revealed that the BERT model outperformed both the conventional machine learning approach utilizing character-type features such as kanji and hiragana and the standalone LSTM model. Takeuchi et al. (2021) presented an approach to Japanese AES that eliminates the requirement for pre-scored essays by relying solely on reference texts or a model answer for the essay task. They investigated multiple similarity evaluation methods, including the frequency of morphemes, idf values calculated on Wikipedia, LSI, LDA, word-embedding vectors, and document vectors produced by BERT. The experimental findings revealed that the method utilizing the frequency of morphemes with idf values exhibited the strongest correlation with human-annotated scores across different essay tasks. The utilization of BERT in AES nevertheless encounters several limitations: first, essays often exceed the model's maximum input length; second, only score labels are available for training, which restricts access to additional information.

Mizumoto and Eguchi (2023) were pioneers in employing the GPT model for AES in non-native English writing. Their study evaluated the accuracy and reliability of AES using the GPT-3 text-davinci-003 model on a dataset of 12,100 essays from a corpus of nonnative written English (TOEFL11). The findings indicated that AES utilizing the GPT-3 model exhibited a certain degree of accuracy and reliability, suggesting that GPT-3-based AES systems hold the potential to support human ratings. However, applying the GPT model to AES presents a unique natural language processing (NLP) task that involves considerations such as nonnative language proficiency, the influence of the learner's first language on the output in the target language, and identifying the linguistic features that best indicate writing quality in a specific language. These linguistic features may differ morphologically or syntactically from those of the learners' first language, as observed in (1)–(3).

(1) Isolating (Chinese)

我-送了-他-一本-书

Wǒ-sòngle-tā-yīběn-shū

1SG-give.PST-3SG-one.CL-book

"I gave him a book."

(2) Agglutinative (Japanese)

彼-に-本-を-あげ-まし-た

Kare-ni-hon-o-age-mashi-ta

3SG-DAT-book-ACC-give.HON-PST

"(I) gave him a book."

(3) Inflectional (English)

give, give-s, gave, given, giving

Additionally, the morphological agglutination and subject-object-verb (SOV) order of Japanese, along with its idiomatic expressions, pose further challenges for applying language models to AES tasks, as in (4).

(4)

足-が 棒-に なり-ました

Ashi-ga bō-ni nari-mashita

leg-NOM stick-DAT become.HON-PST

"My leg became like a stick (I am extremely tired)."

The example demonstrates the morpho-syntactic structure of Japanese and the presence of an idiomatic expression. In this sentence, the verb なる (naru), meaning "to become", appears at the end. The verb stem なり (nari) carries morphemes indicating honorification (ます, masu) and tense (た, ta), showcasing agglutination. While the sentence can be literally translated as "my leg became like a stick", it carries the idiomatic interpretation "I am extremely tired".

To address these challenges, CyberAgent Inc. (2023) has developed the Open-Calm series of language models specifically designed for Japanese. Open-Calm consists of pre-trained models available in various sizes (Small, Medium, Large, and 7B). Figure 2 depicts the fundamental structure of the Open-Calm model. A key feature of this architecture is the incorporation of the LoRA adapter and the GPT-NeoX framework, which can enhance its language processing capabilities.

Figure 2. GPT-NeoX model architecture (Okgetheng and Takeuchi, 2024).

In a recent study, Okgetheng and Takeuchi (2024) assessed the efficacy of Open-Calm language models in grading Japanese essays, using a dataset of approximately 300 essays annotated by native Japanese educators. Their findings demonstrate the considerable potential of Open-Calm language models in automated Japanese essay scoring. Among the Open-Calm family, the Open-Calm Large model (OCLL) exhibited the highest performance. It is important to note, however, that the Open-Calm Large model does not currently offer public server access, so users must independently deploy and operate the environment; utilizing OCLL requires a PC equipped with an NVIDIA GeForce RTX 3060 (8 or 12 GB VRAM).

In summary, while the potential of LLMs in automated scoring of nonnative Japanese essays has been demonstrated in two studies—BERT-driven AES (Hirao et al. 2020) and OCLL-based AES (Okgetheng and Takeuchi, 2024)—the number of research efforts in this area remains limited.

Another significant challenge in applying LLMs to AES lies in prompt engineering and ensuring its reliability and effectiveness (Brown et al. 2020; Rae et al. 2021; Zhang et al. 2021). Various prompting strategies have been proposed, such as the zero-shot chain-of-thought (CoT) approach (Kojima et al. 2022); few-shot CoT prompting involves manually crafting diverse and effective examples, but manual efforts can lead to mistakes. To address this, Zhang et al. (2021) introduced an automatic CoT prompting method called Auto-CoT, which demonstrates matching or superior performance compared to the CoT paradigm. Another prompting framework is tree of thoughts, which enables a model to self-evaluate its progress at intermediate stages of problem-solving through deliberate reasoning (Yao et al. 2023).
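For illustration only, a zero-shot CoT-style scoring prompt might look like the sketch below; the wording and the 1–6 scale are hypothetical, and the study's actual prompt criteria are described later in the Prompt section and Table 3.

```python
# Hypothetical zero-shot chain-of-thought scoring prompt (illustrative only;
# not the prompt used in this study). The five categories echo the scoring
# categories proposed later in the paper.
PROMPT_TEMPLATE = """You are a rater of essays written by learners of Japanese.
Evaluate the essay on lexical richness, syntactic complexity, cohesion,
content elaboration, and grammatical accuracy.
Let's think step by step, then give one holistic score from 1 to 6.

Essay:
{essay}
"""

def build_prompt(essay: str) -> str:
    return PROMPT_TEMPLATE.format(essay=essay)

print(build_prompt("ケンとマリはピクニックに行きました。"))
```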

Beyond linguistic studies, there has been a noticeable increase in the number of foreign workers in Japan and of Japanese learners worldwide (Ministry of Health, Labor, and Welfare of Japan, 2022; Japan Foundation, 2021). However, existing assessment methods, such as the Japanese Language Proficiency Test (JLPT), J-CAT, and TTBJ, primarily focus on reading, listening, vocabulary, and grammar skills, neglecting the evaluation of writing proficiency. As the number of workers and language learners continues to grow, there is a rising demand for an efficient AES system that can reduce raters' costs and time and be utilized for employment, examinations, and self-study purposes.

This study aims to explore the potential of LLM-based AES by comparing the effectiveness of five models: two LLMs (GPT and BERT), one Japanese local LLM (OCLL), and two conventional machine learning-based methods (the linguistic feature-based scoring tools Jess and JWriter).

The research questions addressed in this study are as follows:

(1) To what extent do LLM-driven AES and linguistic feature-based AES, when used as automated tools to support human rating, accurately reflect test takers' actual performance?

(2) What influence does the prompt have on the accuracy and performance of LLM-based AES methods?

The subsequent sections of the manuscript cover the methodology, including the assessment measures for nonnative Japanese writing proficiency, criteria for prompts, and the dataset. The evaluation section focuses on the analysis of annotations and rating scores generated by LLM-driven and linguistic feature-based AES methods.

Methodology

The dataset utilized in this study was obtained from the International Corpus of Japanese as a Second Language (I-JAS). The corpus comprises 1000 participants representing 12 different first languages. For the study, the participants were given a story-writing task on a personal computer: they were required to write two stories based on the four-panel illustrations titled "Picnic" and "The key" (see Appendix A). The corpus provides background information for the participants, including their Japanese language proficiency levels assessed through two online tests, J-CAT and SPOT, which evaluate reading, listening, vocabulary, and grammar abilities. The learners' proficiency levels were categorized into six levels aligned with the Common European Framework of Reference for Languages (CEFR) and the Reference Framework for Japanese Language Education (RFJLE): A1, A2, B1, B2, C1, and C2. According to Lee et al. (2015), there is a high level of agreement (r = 0.86) between the J-CAT and SPOT assessments, indicating that the proficiency certifications provided by J-CAT are consistent with those of SPOT. However, the scores of J-CAT and SPOT do not have a one-to-one correspondence. In this study, the J-CAT scores were used as the benchmark to differentiate learners of different proficiency levels. A total of 1400 essays were utilized, representing the beginner (aligned with A1), A2, B1, B2, C1, and C2 levels based on the J-CAT scores. Table 1 provides information about the learners' proficiency levels and their corresponding J-CAT and SPOT scores.

A dataset comprising a total of 1400 essays from the story-writing tasks was collected. Among these, 714 essays were utilized to evaluate the reliability of the LLM-based AES method, while the remaining 686 essays were designated as development data to assess the LLM-based AES's capability to distinguish participants of varying proficiency levels. The GPT-4 API was used in this study; a detailed explanation of the prompt assessment criteria is provided in the Prompt section. All essays were sent to the model for measurement and scoring.
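A minimal sketch of this step, assuming the OpenAI Python SDK (v1+); the prompt wording here is illustrative rather than the study's actual prompt.

```python
# Sketch of sending an essay to the GPT-4 API for scoring, assuming the
# OpenAI Python SDK (>= 1.0). The prompt is illustrative only (see the
# Prompt section for the study's actual criteria).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_essay(essay: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You rate non-native Japanese essays."},
            {"role": "user", "content": f"Score this essay on a 1-6 scale:\n{essay}"},
        ],
        temperature=0,  # deterministic output aids scoring reliability
    )
    return response.choices[0].message.content

print(score_essay("ケンとマリはピクニックに行きました。"))
```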

Measures of writing proficiency for nonnative Japanese

Japanese exhibits a morphologically agglutinative structure where morphemes are attached to the word stem to convey grammatical functions such as tense, aspect, voice, and honorifics, e.g. (5).

(5)

食べ-させ-られ-まし-た-か

tabe-sase-rare-mashi-ta-ka

eat(stem)-CAUS-PASS-HON-PST-Q

"Were (you) made to eat?"

Japanese employs case particles to indicate grammatical functions, including the nominative case particle が (ga), the accusative case particle を (o), the genitive case particle の (no), the dative case particle に (ni), the locative/instrumental case particle で (de), the ablative case particle から (kara), the directional case particle へ (e), and the comitative case particle と (to). The agglutinative nature of the language, combined with the case particle system, provides an efficient means of distinguishing between active and passive voice, through either morphemes or case particles, e.g., 食べる taberu "eat (conclusive form)" (active voice) vs. 食べられる taberareru "be eaten (conclusive form)" (passive voice). In the active voice, パンを食べる (pan o taberu) translates to "to eat bread"; in the passive voice, パンが食べられた (pan ga taberareta) means "(the) bread was eaten". Additionally, it is important to note that different conjugations of the same lemma are counted as one type in order to ensure a comprehensive assessment of the language features; for example, 食べる taberu "eat (conclusive)", 食べている tabeteiru "eat (progressive)", and 食べた tabeta "eat (past)" count as one type.

To incorporate these features, previous research (Suzuki, 1999; Watanabe et al. 1988; Ishioka, 2001; Ishioka and Kameda, 2006; Hirao et al. 2020) has identified complexity, fluency, and accuracy as crucial factors for evaluating writing quality. These criteria are assessed through various aspects, including lexical richness (lexical density, diversity, and sophistication), syntactic complexity, and cohesion (Kyle et al. 2021; Mizumoto and Eguchi, 2023; Ure, 1971; Halliday, 1985; Barkaoui and Hadidi, 2020; Zenker and Kyle, 2021; Kim et al. 2018; Lu, 2017; Ortega, 2015). Therefore, this study proposes five scoring categories: lexical richness, syntactic complexity, cohesion, content elaboration, and grammatical accuracy. A total of 16 measures were employed to capture these categories. The calculation process and specific details of these measures can be found in Table 2.

The T-unit, first introduced by Hunt (1966), is a measure used for evaluating speech and composition. It serves as an indicator of syntactic development and represents the shortest unit into which a piece of discourse can be divided without leaving any sentence fragments. In the context of Japanese language assessment, Sakoda and Hosoi (2020) utilized the T-unit as the basic unit to assess the accuracy and complexity of Japanese learners' speaking and storytelling. The calculation of T-units in Japanese follows these principles:

A single main clause constitutes 1 T-unit, regardless of the presence or absence of dependent clauses, e.g. (6).

(6) ケンとマリはピクニックに行きました ("Ken and Mari went on a picnic") (main clause): 1 T-unit.

If a sentence contains a main clause along with subclauses, each subclause is considered part of the same T-unit, e.g. (7).

(7) 天気が良かったので ("because the weather was good") (subclause)、ケンとマリはピクニックに行きました ("Ken and Mari went on a picnic") (main clause): 1 T-unit.

In the case of coordinate clauses, where multiple clauses are connected, each coordinated clause is counted separately. Thus, a sentence with coordinate clauses may have 2 T-units or more, e.g. (8).

(8) ケンは地図で場所を探して ("Ken looked for the place on a map") (coordinate clause)、マリはサンドイッチを作りました ("Mari made sandwiches") (coordinate clause): 2 T-units.
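The three counting principles above can be expressed as a toy function, assuming clauses have already been identified and labeled (which, in practice, requires syntactic parsing):

```python
# Toy illustration of the T-unit counting rules in (6)-(8); clause
# identification itself is assumed to have been done beforehand.
def count_t_units(clauses: list[tuple[str, str]]) -> int:
    """clauses: (text, kind) pairs, kind in {"main", "sub", "coordinate"}.
    A main clause (with any subclauses) counts once; coordinate clauses
    each count separately."""
    return sum(1 for _, kind in clauses if kind in ("main", "coordinate"))

# (7): subclause + main clause -> 1 T-unit
print(count_t_units([("天気が良かったので", "sub"),
                     ("ケンとマリはピクニックに行きました", "main")]))  # 1
# (8): two coordinate clauses -> 2 T-units
print(count_t_units([("ケンは地図で場所を探して", "coordinate"),
                     ("マリはサンドイッチを作りました", "coordinate")]))  # 2
```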

Lexical diversity refers to the range of words used within a text (Engber, 1995; Kyle et al. 2021) and is considered a useful measure of the breadth of vocabulary in Ln production (Jarvis, 2013a, 2013b).

The type/token ratio (TTR) is widely recognized as a straightforward measure for calculating lexical diversity and has been employed in numerous studies. These studies have demonstrated a strong correlation between TTR and other methods of measuring lexical diversity (e.g., Bentz et al. 2016; Čech and Miroslav, 2018; Çöltekin and Taraka, 2018). TTR is computed by considering both the number of unique words (types) and the total number of words (tokens) in a given text. Given that the length of learners' writing texts can vary, this study employs the moving average type-token ratio (MATTR) to mitigate the influence of text length. MATTR is calculated using a 50-word moving window: initially, a TTR is determined for words 1–50 in an essay, followed by words 2–51, 3–52, and so on until the end of the essay is reached (Díez-Ortega and Kyle, 2023). The final MATTR score was obtained by averaging the TTR scores for all 50-word windows. The following formula was employed to derive MATTR:

\({\rm{MATTR}}(W)=\frac{\sum_{i=1}^{N-W+1}F_{i}}{W(N-W+1)}\)

Here, N is the number of tokens in the text, W is the chosen window size (W < N), and \(F_i\) is the number of types in the i-th window. \({\rm{MATTR}}(W)\) is thus the mean of the word-form TTRs over all windows. Individuals with higher language proficiency are expected to produce texts with greater lexical diversity, as indicated by higher MATTR scores.
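As a minimal sketch of the formula above (assuming the text has already been tokenized into a list of word forms), MATTR can be computed as follows:

```python
def mattr(tokens, window=50):
    """Moving-average type-token ratio: the mean TTR over all overlapping
    windows of `window` tokens, per the formula above."""
    n = len(tokens)
    if n < window:                      # fall back to plain TTR for short texts
        return len(set(tokens)) / n
    ttrs = [len(set(tokens[i:i + window])) / window
            for i in range(n - window + 1)]
    return sum(ttrs) / len(ttrs)
```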

Lexical density was captured by the ratio of the number of lexical words to the total number of words (Lu, 2012). Lexical sophistication refers to the use of advanced vocabulary, often evaluated through word-frequency indices (Crossley et al. 2013; Haberman, 2008; Kyle and Crossley, 2015; Laufer and Nation, 1995; Lu, 2012; Read, 2000). In writing, lexical sophistication can be interpreted as vocabulary breadth, i.e. the appropriate use of vocabulary items across various lexico-grammatical contexts and registers (Garner et al. 2019; Kim et al. 2018; Kyle et al. 2018). For Japanese specifically, words are considered lexically sophisticated if they are not included in the “Japanese Education Vocabulary List Ver 1.0” (Footnote 4). Lexical sophistication was therefore calculated as the number of sophisticated word types relative to the total number of words per essay. Furthermore, it has been suggested that sentences in Japanese writing should ideally be no longer than 40 to 50 characters, as this promotes readability; the median and maximum sentence lengths can therefore serve as useful assessment indices (Ishioka and Kameda, 2006).
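The two ratio measures can be sketched in the same spirit. The part-of-speech labels and the reference vocabulary set below are placeholders, standing in for the analyzer output and the vocabulary list named above:

```python
def lexical_density(tagged_tokens):
    """Ratio of lexical (content) words to all words. `tagged_tokens` is a
    list of (token, pos) pairs; the POS labels here are placeholders for
    whatever tagset the chosen morphological analyzer produces."""
    lexical_pos = {"noun", "verb", "adjective", "adverb"}
    n_lexical = sum(1 for _, pos in tagged_tokens if pos in lexical_pos)
    return n_lexical / len(tagged_tokens)

def lexical_sophistication(tokens, basic_vocab):
    """Number of sophisticated word types (types absent from the reference
    vocabulary list) divided by the total number of words in the essay."""
    sophisticated_types = {t for t in set(tokens) if t not in basic_vocab}
    return len(sophisticated_types) / len(tokens)
```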

Syntactic complexity was assessed based on several measures, including the mean length of clauses, verb phrases per T-unit, clauses per T-unit, dependent clauses per T-unit, complex nominals per clause, adverbial clauses per clause, coordinate phrases per clause, and mean dependency distance (MDD). The MDD reflects the distance between the governor and dependent positions in a sentence; a larger dependency distance indicates a higher cognitive load and greater complexity in syntactic processing (Liu, 2008; Liu et al. 2017). The MDD has been established as an efficient metric for measuring syntactic complexity (Jiang, Quyang, and Liu, 2019; Li and Yan, 2021). To calculate the MDD, words in a sentence are indexed in linear order (W1 … Wi … Wn); in any dependency relationship between words Wa and Wb, Wa is the governor and Wb is the dependent. The MDD of an entire sentence is the mean of the absolute differences between governor and dependent positions:

MDD = \(\frac{1}{n}{\sum }_{i=1}^{n}|{\rm{D}}{{\rm{D}}}_{i}|\)

In this formula, \(n\) represents the number of dependency relationships in the sentence, and \({DD}_i\) is the dependency distance of the \(i\)-th dependency relationship. As an illustration, consider the sentence Mary-ga John-ni keshigomu-o watashita [Mary-NOM John-DAT eraser-ACC give-PAST] “Mary gave John an eraser”; its MDD is 2. Table 3 shows the dependency annotation, which was supplied as a CSV file in the prompt for GPT-4.
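A minimal sketch of the MDD computation, given dependency pairs of (governor position, dependent position). The three dependencies assumed below reproduce the worked example, with all arguments attached to the verb in position 4:

```python
def mdd(dependencies):
    """Mean dependency distance: the mean absolute difference between
    governor and dependent positions (1-indexed word order)."""
    return sum(abs(g - d) for g, d in dependencies) / len(dependencies)

# Mary-ga(1) John-ni(2) keshigomu-o(3) watashita(4):
# all three arguments depend on the verb in position 4.
deps = [(4, 1), (4, 2), (4, 3)]
print(mdd(deps))  # -> 2.0, matching the MDD of 2 given in the text
```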

Cohesion (semantic similarity) and content elaboration aim to capture the ideas presented in test takers’ essays. Cohesion was assessed using three measures: synonym overlap/paragraph (topic), synonym overlap/paragraph (keywords), and word2vec cosine similarity. Content elaboration and development were measured as the number of metadiscourse markers (types) divided by the number of words. To capture content closely, this study proposes a novel distance-based representation that encodes the cosine distance between vectors representing the learner’s essay and the essay task (topic and keywords). The learner’s essay is decoded into a word sequence and aligned with the task’s topic and keywords for log-likelihood measurement; the cosine distance then yields the content elaboration score for the learner’s essay. The cosine similarity between the target and reference vectors is given in (11), where \((L_1,\ldots,L_n)\) and \((N_1,\ldots,N_n)\) are the vectors representing the learner’s essay and the task’s topic and keywords, respectively. The content elaboration distance between L and N was calculated as follows:

\(\cos \left(\theta \right)=\frac{{\rm{L}}\,\cdot\, {\rm{N}}}{\left|{\rm{L}}\right|{\rm{|N|}}}=\frac{\mathop{\sum }\nolimits_{i=1}^{n}{L}_{i}{N}_{i}}{\sqrt{\mathop{\sum }\nolimits_{i=1}^{n}{L}_{i}^{2}}\sqrt{\mathop{\sum }\nolimits_{i=1}^{n}{N}_{i}^{2}}}\)

A high similarity value indicates a small difference between the two representations, which in turn suggests a high level of proficiency in content elaboration.
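For illustration, equation (11) translates directly into code; the vectors are assumed to be dense numeric representations of the learner’s essay and the task’s topic and keywords:

```python
import math

def cosine_similarity(l_vec, n_vec):
    """Cosine similarity between the learner-essay vector L and the
    task (topic/keyword) vector N, per equation (11)."""
    dot = sum(l * n for l, n in zip(l_vec, n_vec))
    norm_l = math.sqrt(sum(l * l for l in l_vec))
    norm_n = math.sqrt(sum(n * n for n in n_vec))
    return dot / (norm_l * norm_n)
```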

To evaluate the effectiveness of the proposed measures in distinguishing proficiency levels among nonnative Japanese speakers’ writing, we conducted a multi-faceted Rasch measurement analysis (Linacre, 1994). This approach applies measurement models to analyze the various factors that can influence test outcomes, including test takers’ proficiency, item difficulty, and rater severity. The underlying principles and functionality of multi-faceted Rasch measurement are illustrated in (12).

\(\log \left(\frac{{P}_{{nijk}}}{{P}_{{nij}(k-1)}}\right)={B}_{n}-{D}_{i}-{C}_{j}-{F}_{k}\)

(12) defines the logarithmic transformation of the probability ratio \(P_{nijk}/P_{nij(k-1)}\) as a function of multiple parameters. Here, n represents the test taker, i denotes a writing proficiency measure, j corresponds to the human rater, and k represents the proficiency score. The parameter \(B_n\) signifies the proficiency level of test taker n (where n ranges from 1 to N). \(D_i\) represents the difficulty parameter of test item i (where i ranges from 1 to L), while \(C_j\) represents the severity of rater j (where j ranges from 1 to J). \(F_k\) represents the step difficulty for a test taker to move from score k−1 to k. \(P_{nijk}\) is the probability of rater j assigning score k to test taker n on test item i, and \(P_{nij(k-1)}\) the probability of the same rater assigning score k−1. Each facet of the test is treated as an independent parameter and estimated within the same frame of reference. To evaluate the consistency of scores obtained through both human and computer analysis, we used the Infit mean-square statistic, an information-weighted chi-square measure divided by its degrees of freedom; it is particularly sensitive to unexpected response patterns on items near a person’s proficiency level (Linacre, 2002). Fit statistics are assessed against predefined thresholds for acceptable fit. For the Infit MNSQ, which has an expected value of 1.00, stricter thresholds of 0.7–1.3 (Bond et al. 2021) and more lenient thresholds of 0.5–1.5 (Eckes, 2009) have both been proposed. In this study, we adopted the criterion of 0.70–1.30 for the Infit MNSQ.
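To illustrate how (12) generates category probabilities, the following sketch computes them for given (not estimated) parameter values; actual parameter estimation is left to dedicated Rasch software, and the numbers in the example call are arbitrary:

```python
import math

def category_probs(b, d, c, thresholds):
    """Category probabilities implied by equation (12), for given parameter
    values: b = test-taker proficiency B_n, d = item difficulty D_i,
    c = rater severity C_j, thresholds = step difficulties [F_1, ..., F_K]
    for moving from score k-1 to k."""
    # log P_k - log P_0 = sum_{h<=k} (b - d - c - F_h), with P_0 as baseline
    log_odds = [0.0]
    for f_k in thresholds:
        log_odds.append(log_odds[-1] + (b - d - c - f_k))
    total = sum(math.exp(x) for x in log_odds)
    return [math.exp(x) / total for x in log_odds]

# e.g. a test taker slightly above item difficulty, neutral rater severity:
print(category_probs(b=1.0, d=0.5, c=0.0, thresholds=[-1.0, 0.0, 1.0]))
```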

Moving forward, we can now assess the effectiveness of the 16 proposed measures, organized under five criteria, in distinguishing levels of writing proficiency among non-native Japanese speakers. For this evaluation, we used the development dataset from the I-JAS corpus, as described in Section Dataset. Table 4 provides a measurement report for the 16 measures under consideration. The measure separation was 4.02, indicating clear differentiation among the measures, with a separation reliability index of 0.891. The person separation reliability index was 0.802, indicating that the assessment distinguishes accurately between individuals. All 16 measures showed Infit mean squares within a reasonable range of 0.76 to 1.28. The synonym overlap/paragraph (topic) measure exhibited a relatively high Outfit mean square of 1.46, although its Infit mean square falls within the acceptable range. The standard errors of the measures ranged from 0.13 to 0.28, indicating the precision of the estimates.

Table 5 further shows the weights assigned to the different linguistic measures for score prediction, with higher weights indicating stronger correlations between a measure and higher scores. The following measures carried the highest weights:

  • Moving-average type-token ratio per essay: 0.0391.
  • Mean dependency distance: 0.0388.
  • Complex nominals per T-unit (number of complex nominals divided by the number of T-units): 0.0379.
  • Mean length of clause (number of words divided by the number of clauses): 0.0374.
  • Coordinate phrases rate (number of coordinate phrases divided by the number of clauses): 0.0325.
  • Grammatical error rate (number of errors per essay): 0.0322.

Criteria (output indicator)

The criteria used to evaluate writing ability in this study were based on the CEFR, which follows a six-point scale from A1 to C2. To assess the quality of the Japanese writing, the scoring criteria in Table 6 were used. These criteria were derived from the IELTS writing standards and served as assessment guidelines and prompts for the written output.

A prompt is a question or detailed instruction provided to the model to obtain a proper response. After several pilot experiments, we decided to provide the measures (Section Measures of writing proficiency for nonnative Japanese) as the input prompt and to use the criteria (Section Criteria (output indicator)) as the output indicator. Regarding the prompt language, given that the LLM was tasked with rating Japanese essays, would prompting in Japanese work better (Footnote 5)? We conducted experiments comparing the performance of GPT-4 with English and Japanese prompts, and additionally ran the Japanese local model OCLL with Japanese prompts, conducting multiple trials on the same sample. Regardless of the prompt language, GPT-4 consistently returned the same grade (B1) for the writing sample, suggesting that it produces reliable, consistent ratings in either language. With the Japanese local model OCLL and Japanese prompts, by contrast, grading was inconsistent: out of 10 attempts, only 6 yielded the same result (B1), while the remaining 4 produced different outcomes, including A1 and B2. These findings indicate that the prompt language was not the determining factor for reliable AES; rather, the size of the training data and the model parameters played the crucial roles in achieving consistent and reliable AES results.

The following is the prompt used, which details all the measures and requires the LLM to score the essays with holistic and trait scores.

Please evaluate Japanese essays written by Japanese learners and assign a score to each essay on a six-point scale, ranging from A1, A2, B1, B2, C1 to C2. Additionally, please provide trait scores and display the calculation process for each trait score. The scoring should be based on the following criteria:

Moving average type-token ratio.

Number of lexical words (token) divided by the total number of words per essay.

Number of sophisticated word types divided by the total number of words per essay.

Mean length of clause.

Verb phrases per T-unit.

Clauses per T-unit.

Dependent clauses per T-unit.

Complex nominals per clause.

Adverbial clauses per clause.

Coordinate phrases per clause.

Mean dependency distance.

Synonym overlap paragraph (topic and keywords).

Word2vec cosine similarity.

Connectives per essay.

Conjunctions per essay.

Number of metadiscourse markers (types) divided by the total number of words.

Number of errors per essay.

Japanese essay text

出かける前に二人が地図を見ている間に、サンドイッチを入れたバスケットに犬が入ってしまいました。それに気づかずに二人は楽しそうに出かけて行きました。やがて突然犬がバスケットから飛び出し、二人は驚きました。バスケットの中を見ると、食べ物はすべて犬に食べられていて、二人は困ってしまいました。(ID_JJJ01_SW1) [“Before setting out, while the two were looking at the map, the dog got into the basket holding the sandwiches. Not noticing this, the two set off happily. Then the dog suddenly jumped out of the basket, startling them. When they looked inside the basket, all the food had been eaten by the dog, and they did not know what to do.”]

The score of the example above was B1. Figure 3 provides an example of the holistic and trait scores produced by GPT-4 (with a prompt indicating all measures) via Bing (Footnote 6).

Figure 3. Example of GPT-4 AES and feedback (with a prompt indicating all measures).

Statistical analysis

The aim of this study is to investigate the potential of LLMs for nonnative Japanese AES. It compares the scoring outcomes of feature-based AES tools that rely on conventional machine learning (i.e. Jess and JWriter) with those of AI-driven AES tools based on deep learning (BERT, GPT, OCLL). To assess the reliability of a computer-assisted annotation tool, the study first established human-human agreement as the benchmark measure; the performance of the LLM-based method was then evaluated against this benchmark.

To assess annotation agreement, the study employed the standard measures of precision, recall, and F-score (Brants 2000; Lu 2010), along with the quadratically weighted kappa (QWK), to evaluate consistency in the annotation process. Let A and B represent the two human annotators. Precision, recall, and F-score are computed as in equations (13) to (15):

\({\rm{Recall}}(A,B)=\frac{{\rm{Number}}\,{\rm{of}}\,{\rm{identical}}\,{\rm{nodes}}\,{\rm{in}}\,A\,{\rm{and}}\,B}{{\rm{Number}}\,{\rm{of}}\,{\rm{nodes}}\,{\rm{in}}\,A}\)

\({\rm{Precision}}(A,\,B)=\frac{{\rm{Number}}\,{\rm{of}}\,{\rm{identical}}\,{\rm{nodes}}\,{\rm{in}}\,A\,{\rm{and}}\,B}{{\rm{Number}}\,{\rm{of}}\,{\rm{nodes}}\,{\rm{in}}\,B}\)

The F-score is the harmonic mean of recall and precision:

\({\rm{F}}-{\rm{score}}=\frac{2* ({\rm{Precision}}* {\rm{Recall}})}{{\rm{Precision}}+{\rm{Recall}}}\)

The highest possible value of an F-score is 1.0, indicating perfect precision and recall; the lowest possible value is 0, which occurs if either precision or recall is zero.
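Treating each annotator’s output as a set of annotation nodes, equations (13)–(15) can be sketched as:

```python
def agreement_scores(a_nodes, b_nodes):
    """Precision, recall and F-score between two annotators' outputs,
    following equations (13)-(15); inputs are sets of annotation nodes."""
    identical = len(a_nodes & b_nodes)
    recall = identical / len(a_nodes)       # (13): denominator = nodes in A
    precision = identical / len(b_nodes)    # (14): denominator = nodes in B
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```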

In accordance with Taghipour and Ng (2016), the calculation of QWK involves two steps:

Step 1: Construct a weight matrix W as follows:

\({W}_{{ij}}=\frac{{(i-j)}^{2}}{{(N-1)}^{2}}\)

Here, i represents the annotation made by the tool, j the annotation made by a human rater, and N the total number of possible annotations. Matrix O is subsequently computed, where \(O_{i,j}\) is the count of samples annotated i by the tool and j by the human annotator. E is the expected count matrix, normalized so that the sum of its elements matches the sum of the elements of O.

Step 2: With matrices O and E, the QWK is obtained as follows:

\(\kappa =1-\frac{\sum_{i,j}W_{i,j}\,O_{i,j}}{\sum_{i,j}W_{i,j}\,E_{i,j}}\)
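A compact sketch of the two-step QWK computation, assuming both label sequences are coded as integers 0 to N−1 (here E is taken, as is standard, to be the outer product of the two marginal distributions, normalized to O’s total):

```python
import numpy as np

def qwk(tool_labels, human_labels, n_categories):
    """Quadratically weighted kappa for two parallel label sequences
    coded as integers 0 .. n_categories-1 (Steps 1 and 2 above)."""
    # Step 1: weight matrix W
    w = np.array([[(i - j) ** 2 / (n_categories - 1) ** 2
                   for j in range(n_categories)]
                  for i in range(n_categories)])
    # observed matrix O: counts of (tool annotation i, human annotation j)
    o = np.zeros((n_categories, n_categories))
    for i, j in zip(tool_labels, human_labels):
        o[i, j] += 1
    # expected matrix E: outer product of marginals, normalized to O's total
    e = np.outer(o.sum(axis=1), o.sum(axis=0))
    e = e / e.sum() * o.sum()
    # Step 2: kappa
    return 1 - (w * o).sum() / (w * e).sum()

# e.g. two six-level (A1..C2 coded 0..5) ratings of five essays:
print(qwk([2, 2, 3, 1, 4], [2, 3, 3, 1, 4], n_categories=6))
```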

The value of the quadratically weighted kappa increases as the level of agreement improves. Further, to assess the accuracy of LLM scoring, the proportional reduction of mean squared error (PRMSE) was employed. The PRMSE approach uses the variability observed in human ratings to estimate rater error, which is then subtracted from the variance of the human labels, giving an overall measure of agreement between the automated scores and the true scores (Haberman et al. 2015; Loukina et al. 2020; Taghipour and Ng, 2016). The computation of PRMSE involves the following steps:

Step 1: Calculate the mean squared errors (MSEs) for the scoring outcomes of the computer-assisted tool (MSE tool) and the human scoring outcomes (MSE human).

Step 2: Determine the PRMSE by comparing the MSE of the computer-assisted tool (MSE tool) with the MSE from human raters (MSE human), using the following formula:

\({\rm{PRMSE}}=1-\frac{{\rm{MSE}}_{{\rm{tool}}}}{{\rm{MSE}}_{{\rm{human}}}}=1-\frac{\sum_{i=1}^{n}{({y}_{i}-{\hat{y}}_{i})}^{2}}{\sum_{i=1}^{n}{({y}_{i}-\bar{y})}^{2}}\)

In the numerator, \(\hat{y}_i\) is the score predicted by the LLM-driven AES system for sample i, and \(y_i-\hat{y}_i\) is the difference between the observed (human) score and that prediction; the numerator is thus the squared error of the tool. In the denominator, \(y_i-\bar{y}\) is the difference between the observed score for sample i and the mean of all observed scores, so the denominator measures the variability of the human ratings. The PRMSE is then obtained by subtracting the ratio of MSE tool to MSE human from 1. PRMSE falls within the range of 0 to 1, with larger values indicating smaller errors in the LLM’s scoring relative to those of human raters; in other words, a higher PRMSE implies that the LLM’s scoring predicts the true scores more accurately (Loukina et al. 2020). The interpretation of kappa values follows Landis and Koch (1977): −1 indicates complete inconsistency, 0 random agreement, 0.00–0.20 slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement, and 0.81–1.00 almost perfect agreement. All statistical analyses were executed using Python scripts.
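A minimal sketch of the reconstructed PRMSE formula above (a simplified form; published PRMSE variants also correct for rater error when multiple ratings per essay are available):

```python
import numpy as np

def prmse(human_scores, tool_scores):
    """PRMSE = 1 - MSE_tool / MSE_human: values nearer 1 indicate smaller
    scoring errors relative to the variability of the human ratings."""
    y = np.asarray(human_scores, dtype=float)       # observed (human) scores
    y_hat = np.asarray(tool_scores, dtype=float)    # AES-predicted scores
    mse_tool = np.mean((y - y_hat) ** 2)
    mse_human = np.mean((y - y.mean()) ** 2)        # variance of human scores
    return 1 - mse_tool / mse_human
```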

Results and discussion

Annotation reliability of the LLM

This section focuses on assessing the reliability of the LLM’s annotation and scoring capabilities. To evaluate the reliability, several tests were conducted simultaneously, aiming to achieve the following objectives:

Assess the LLM’s ability to differentiate between test takers with varying levels of writing proficiency.

Determine the level of agreement between the annotations and scoring performed by the LLM and those done by human raters.

The evaluation encompassed several metrics: precision, recall, F-score, quadratically weighted kappa, proportional reduction of mean squared error, Pearson correlation, and multi-faceted Rasch measurement.

Inter-annotator agreement (human–human annotator agreement)

We started with an agreement test between the two human annotators. Two trained annotators were recruited to annotate the writing-task data for the measures. A total of 714 scripts was used as the test data, and each analysis took 300–360 min. Inter-annotator agreement was evaluated using the standard measures of precision, recall, F-score, and QWK. Table 7 presents the inter-annotator agreement for the various indicators. As shown, agreement was fairly high, with F-scores ranging from 1.0 for sentence and word counts to 0.666 for grammatical errors.

The QWK analysis further confirmed the inter-annotator agreement, with values ranging from 0.950 (p = 0.000) for sentence and word counts to 0.695 (p = 0.001) for synonym overlap (keyword) and grammatical errors.

Agreement of annotation outcomes between human and LLM

To evaluate the consistency between the human annotators and the LLM annotators (BERT, GPT, OCLL) across the indices, the same test was conducted. The inter-annotator agreement (F-score) between LLM and human annotation is provided in Appendices B–D. F-scores ranged from 0.706 (grammatical error count, OCLL-human) to a perfect 1.000 (sentences, clauses, T-units, and words, GPT-human). These findings were further supported by the QWK analysis, with agreement ranging from 0.807 (p = 0.001) for metadiscourse markers (OCLL-human) to 0.962 (p = 0.000) for words (GPT-human). The LLM annotation thus achieved a high level of accuracy in identifying measurement units and counts.

Reliability of LLM-driven AES’s scoring and discriminating proficiency levels

This section examines the reliability of LLM-driven AES scoring by comparing the scoring outcomes produced by human raters and by the LLM (Section Reliability of LLM-driven AES scoring). It also assesses the effectiveness of the LLM-based AES system in differentiating participants of varying proficiency levels (Section Reliability of LLM-driven AES discriminating proficiency levels).

Reliability of LLM-driven AES scoring

Table 8 summarizes the QWK analysis between the scores computed by the human raters and by GPT-4 for the individual essays from I-JAS (Footnote 7). As shown, the QWK for all measures ranged from k = 0.819 for lexical density (number of lexical word tokens divided by the number of words per essay) down to k = 0.644 for word2vec cosine similarity. Table 9 further presents the Pearson correlations between the 16 writing proficiency measures scored by human raters and by GPT-4 for the individual essays; the correlations ranged from 0.672 for syntactic complexity to 0.734 for grammatical accuracy. For the BERT-based AES system, correlations with human raters ranged from 0.661 for syntactic complexity to 0.713 for grammatical accuracy; for the OCLL-based system, from 0.654 for cohesion to 0.721 for grammatical accuracy. These findings indicate an alignment between the assessments made by human raters and both the BERT-based and OCLL-based AES systems across various aspects of writing proficiency.

Reliability of LLM-driven AES discriminating proficiency levels

After validating the reliability of the LLM’s annotation and scoring, the subsequent objective was to evaluate its ability to distinguish between proficiency levels. For this analysis, a dataset of 686 individual essays was used. Table 10 presents a sample of the results, summarizing the means, standard deviations, and the outcomes of one-way ANOVAs based on the measures assessed by the GPT-4 model. A post hoc multiple-comparison test, specifically the Bonferroni test, was conducted to identify differences between pairs of levels.

As the results reveal, seven measures showed a linear upward or downward progression across the three proficiency levels. These are marked in bold in Table 10 and comprise one measure of lexical richness, MATTR (lexical diversity); four measures of syntactic complexity, MDD (mean dependency distance), MLC (mean length of clause), CNT (complex nominals per T-unit), and CPC (coordinate phrases rate); one cohesion measure, word2vec cosine similarity; and GER (grammatical error rate). Regarding the ability of the sixteen measures to distinguish adjacent proficiency levels, the Bonferroni tests indicated statistically significant differences between the primary and intermediate levels for MLC and GER. One measure of lexical richness (LD), four measures of syntactic complexity (VPT, CT, DCT, ACC), two measures of cohesion (SOPT, SOPK), and one measure of content elaboration (IMM) exhibited statistically significant differences between proficiency levels, but these differences did not follow a linear progression between adjacent levels. No significant difference between proficiency levels was observed for lexical sophistication.

To summarize, our study aimed to evaluate the reliability and differentiation capabilities of the LLM-driven AES method. For the first objective, we assessed the LLM’s ability to differentiate between test takers with varying levels of writing proficiency using precision, recall, F-score, and quadratically weighted kappa. For the second objective, we compared the scoring outcomes of human raters and the LLM to determine their level of agreement, employing quadratically weighted kappa and Pearson correlations across the 16 writing proficiency measures for the individual essays. The results confirmed the feasibility of using the LLM for annotation and scoring in AES for nonnative Japanese, addressing Research Question 1.

Comparison of BERT-, GPT-, OCLL-based AES, and linguistic-feature-based computation methods

This section compares the effectiveness of five AES methods for nonnative Japanese writing: LLM-driven approaches using BERT, GPT, and OCLL, and linguistic feature-based approaches using Jess and JWriter. The ratings obtained from each approach were compared with human ratings; all ratings were derived from the dataset introduced in Section Dataset. Agreement between the automated methods and human ratings was assessed using QWK and PRMSE. The performance of each approach is summarized in Table 11.

The QWK coefficients indicate that the LLMs (GPT, BERT, OCLL) agreed with human ratings more closely than the feature-based AES methods (Jess and JWriter) across the writing proficiency criteria of lexical richness, syntactic complexity, content, and grammatical accuracy. Among the LLMs, the GPT-4-driven AES showed the highest agreement with human ratings in all criteria except syntactic complexity, and the PRMSE values suggest that the GPT-based method outperformed both the linguistic feature-based methods and the other LLM-based approaches.

Moreover, an interesting finding emerged during the study: the agreement coefficient between GPT-4 and human scoring was even higher than the agreement between different human raters themselves. This highlights an advantage of GPT-based AES over human rating. Rating involves a chain of processes, including reading the learners’ writing, evaluating the content and language, and assigning scores, and various biases can be introduced along this chain, stemming from rater characteristics, test design, and rating scales. These biases can undermine the consistency and objectivity of human ratings. GPT-based AES benefits from applying consistent and objective evaluation criteria: prompted with detailed scoring rubrics and linguistic features, the model follows a predefined set of guidelines and does not exhibit the subjective biases of human raters. This standardization contributes to the higher agreement observed between GPT-4 and human scoring. Section Prompt strategy delves further into the role of prompts in applying LLMs to AES, exploring how the choice and implementation of prompts affect the performance and reliability of LLM-based AES methods.

It is also important to acknowledge the strengths of the local model: the Japanese local model OCLL excels at processing certain idiomatic expressions. Nevertheless, our analysis indicated that GPT-4 surpasses the local models in AES. This superior performance can be attributed to GPT-4’s larger parameter size, estimated at between 500 billion and 1 trillion, which exceeds those of both BERT and OCLL.

Prompt strategy

In the context of prompt strategy, Mizumoto and Eguchi (2023) applied the GPT-3 model to automatically score English essays from the TOEFL test. They found that the accuracy of the GPT model alone was moderate to fair; however, when linguistic measures such as cohesion, syntactic complexity, and lexical features were incorporated alongside the GPT model, accuracy improved significantly. This highlights the importance of prompt engineering and of providing the model with specific instructions to enhance its performance. In this study, a similar approach was taken to optimize the performance of the LLMs. GPT-4, which outperformed BERT and OCLL, was selected as the candidate model. Model 1 served as the baseline, representing GPT-4 without any additional prompting. Model 2 involved GPT-4 prompted with all 16 measures, including scoring criteria, efficient linguistic features for writing assessment, and detailed measurement units and calculation formulas. The remaining models (Models 3 to 18) used GPT-4 prompted with one measure at a time. The performance of these 18 models was assessed using the output indicators described in Section Criteria (output indicator). By comparing their performances, the study aimed to understand the impact of prompt engineering on the accuracy and effectiveness of GPT-4 in AES tasks.

Based on the PRMSE scores presented in Fig. 4, Model 1 (GPT-4 without additional prompting) achieved a fair level of performance, whereas Model 2 (GPT-4 prompted with all measures) outperformed all other models, with a PRMSE of 0.681. These results indicate that including the specific measures in the prompt significantly enhanced the performance of GPT-4 in AES. Among the measures, syntactic complexity played a particularly significant role in improving the accuracy of GPT-4 in assessing writing quality, followed by lexical diversity. The study suggests that a well-prompted GPT-4 can serve as a valuable tool to support human assessors in evaluating writing quality. Using GPT-4 as an automated scoring tool can minimize the evaluation biases associated with human raters, potentially freeing teachers to focus on designing writing tasks and guiding writing strategies while leveraging GPT-4 for efficient and reliable scoring.

Figure 4. PRMSE scores of the 18 AES models.

This study aimed to investigate two main research questions: the feasibility of utilizing LLMs for AES and the impact of prompt engineering on the application of LLMs in AES.

To address the first objective, the study compared the effectiveness of five different models: GPT, BERT, the Japanese local LLM (OCLL), and two conventional machine learning-based AES tools (Jess and JWriter). The PRMSE values indicated that the GPT-4-based method outperformed other LLMs (BERT, OCLL) and linguistic feature-based computational methods (Jess and JWriter) across various writing proficiency criteria. Furthermore, the agreement coefficient between GPT-4 and human scoring surpassed the agreement among human raters themselves, highlighting the potential of using the GPT-4 tool to enhance AES by reducing biases and subjectivity, saving time, labor, and cost, and providing valuable feedback for self-study. Regarding the second goal, the role of prompt design was investigated by comparing 18 models, including a baseline model, a model prompted with all measures, and 16 models prompted with one measure at a time. GPT-4, which outperformed BERT and OCLL, was selected as the candidate model. The PRMSE scores of the models showed that GPT-4 prompted with all measures achieved the best performance, surpassing the baseline and other models.

In conclusion, this study has demonstrated the potential of LLMs in supporting human rating in assessments. By incorporating automation, we can save time and resources while reducing biases and subjectivity inherent in human rating processes. Automated language assessments offer the advantage of accessibility, providing equal opportunities and economic feasibility for individuals who lack access to traditional assessment centers or necessary resources. LLM-based language assessments provide valuable feedback and support to learners, aiding in the enhancement of their language proficiency and the achievement of their goals. This personalized feedback can cater to individual learner needs, facilitating a more tailored and effective language-learning experience.

Two important areas merit further exploration. First, prompt engineering requires attention to ensure optimal performance of LLM-based AES across different language types. This study revealed that GPT-4, when prompted with all measures, outperformed models prompted with fewer measures; investigating and refining prompt strategies can therefore enhance the effectiveness of LLMs in automated language assessments. Second, it is crucial to explore the application of LLMs in second-language assessment and learning for oral proficiency, as well as their potential in under-resourced languages. Recent advancements in self-supervised machine learning have significantly improved automatic speech recognition (ASR) systems, opening up new possibilities for creating reliable ASR systems, particularly for under-resourced languages with limited data. However, challenges persist in the field of ASR. First, ASR assumes correct word pronunciation for automatic pronunciation evaluation, which proves challenging for learners in the early stages of language acquisition due to diverse accents influenced by their native languages; accurately segmenting short words becomes problematic in such cases. Second, developing precise audio-text transcriptions for languages with non-native accented speech poses a formidable task. Finally, assessing oral proficiency levels involves capturing various linguistic features, including fluency, pronunciation, accuracy, and complexity, which are not easily captured by current NLP technology.

Data availability

The dataset used was obtained from the International Corpus of Japanese as a Second Language (I-JAS), available at https://www2.ninjal.ac.jp/jll/lsaj/ihome2.html.

Notes

J-CAT and TTBJ are two computerized adaptive tests used to assess Japanese language proficiency.

SPOT is a specific component of the TTBJ test.

J-CAT: https://www.j-cat2.org/html/ja/pages/interpret.html

SPOT: https://ttbj.cegloc.tsukuba.ac.jp/p1.html#SPOT .

The study utilized a prompt-based GPT-4 model, developed by OpenAI, which has an impressive architecture with 1.8 trillion parameters across 120 layers. GPT-4 was trained on a vast dataset of 13 trillion tokens, using two stages: initial training on internet text datasets to predict the next token, and subsequent fine-tuning through reinforcement learning from human feedback.

https://www2.ninjal.ac.jp/jll/lsaj/ihome2-en.html .

http://jhlee.sakura.ne.jp/JEV/ by Japanese Learning Dictionary Support Group 2015.

We express our sincere gratitude to the reviewer for bringing this matter to our attention.

On February 7, 2023, Microsoft began rolling out a major overhaul to Bing that included a new chatbot feature based on OpenAI’s GPT-4 (Bing.com).

Appendices E–F present the QWK analysis between the scores computed by the human raters and those of the BERT and OCLL models.

References

Attali Y, Burstein J (2006) Automated essay scoring with e-rater® V.2. J. Technol. Learn. Assess. 4

Barkaoui K, Hadidi A (2020) Assessing Change in English Second Language Writing Performance (1st ed.). Routledge, New York. https://doi.org/10.4324/9781003092346

Bentz C, Tatyana R, Koplenig A, Tanja S (2016) A comparison between morphological complexity measures: Typological data vs. language corpora. In Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), 142–153. Osaka, Japan: The COLING 2016 Organizing Committee

Bond TG, Yan Z, Heene M (2021) Applying the Rasch model: Fundamental measurement in the human sciences (4th ed). Routledge

Brants T (2000) Inter-annotator agreement for a German newspaper corpus. Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00), Athens, Greece, 31 May-2 June, European Language Resources Association

Brown TB, Mann B, Ryder N, et al. (2020) Language models are few-shot learners. Advances in Neural Information Processing Systems, Online, 6–12 December, Curran Associates, Inc., Red Hook, NY

Burstein J (2003) The E-rater scoring engine: Automated essay scoring with natural language processing. In Shermis MD and Burstein JC (ed) Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Čech R, Miroslav K (2018) Morphological richness of text. In Masako F, Václav C (ed) Taming the corpus: From inflection and lexis to interpretation, 63–77. Cham, Switzerland: Springer Nature

Çöltekin Ç, Taraka R (2018) Exploiting Universal Dependencies treebanks for measuring morphosyntactic complexity. In Aleksandrs B, Christian B (ed) Proceedings of the First Workshop on Measuring Language Complexity, 1–7. Torun, Poland

Crossley SA, Cobb T, McNamara DS (2013) Comparing count-based and band-based indices of word frequency: Implications for active vocabulary research and pedagogical applications. System 41:965–981. https://doi.org/10.1016/j.system.2013.08.002


Crossley SA, McNamara DS (2016) Say more and be more coherent: How text elaboration and cohesion can increase writing quality. J. Writ. Res. 7:351–370

CyberAgent Inc (2023) Open-Calm series of Japanese language models. Retrieved from: https://www.cyberagent.co.jp/news/detail/id=28817

Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, Minnesota, 2–7 June, pp. 4171–4186. Association for Computational Linguistics

Diez-Ortega M, Kyle K (2023) Measuring the development of lexical richness of L2 Spanish: a longitudinal learner corpus study. Studies in Second Language Acquisition 1-31

Eckes T (2009) On common ground? How raters perceive scoring criteria in oral proficiency testing. In Brown A, Hill K (ed) Language testing and evaluation 13: Tasks and criteria in performance assessment (pp. 43–73). Peter Lang Publishing

Elliot S (2003) IntelliMetric: from here to validity. In: Shermis MD, Burstein JC (ed) Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ


Engber CA (1995) The relationship of lexical proficiency to the quality of ESL compositions. J. Second Lang. Writ. 4:139–155

Garner J, Crossley SA, Kyle K (2019) N-gram measures and L2 writing proficiency. System 80:176–187. https://doi.org/10.1016/j.system.2018.12.001

Haberman SJ (2008) When can subscores have value? J. Educat. Behav. Stat., 33:204–229

Haberman SJ, Yao L, Sinharay S (2015) Prediction of true test scores from observed item scores and ancillary data. Brit. J. Math. Stat. Psychol. 68:363–385

Halliday MAK (1985) Spoken and Written Language. Deakin University Press, Melbourne, Australia

Hirao R, Arai M, Shimanaka H et al. (2020) Automated essay scoring system for nonnative Japanese learners. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pp. 1250–1257. European Language Resources Association

Hunt KW (1966) Recent Measures in Syntactic Development. Elementary English, 43(7), 732–739. http://www.jstor.org/stable/41386067

Ishioka T (2001) About e-rater, a computer-based automatic scoring system for essays [Konpyūta ni yoru essei no jidō saiten shisutemu e − rater ni tsuite]. University Entrance Examination. Forum [Daigaku nyūshi fōramu] 24:71–76

Hochreiter S, Schmidhuber J (1997) Long short- term memory. Neural Comput. 9(8):1735–1780


Ishioka T, Kameda M (2006) Automated Japanese essay scoring system based on articles written by experts. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia, 17–18 July 2006, pp. 233-240. Association for Computational Linguistics, USA

Japan Foundation (2021) Retrieved from: https://www.jpf.gp.jp/j/project/japanese/survey/result/dl/survey2021/all.pdf

Jarvis S (2013a) Defining and measuring lexical diversity. In Jarvis S, Daller M (ed) Vocabulary knowledge: Human ratings and automated measures (Vol. 47, pp. 13–44). John Benjamins. https://doi.org/10.1075/sibil.47.03ch1

Jarvis S (2013b) Capturing the diversity in lexical diversity. Lang. Learn. 63:87–106. https://doi.org/10.1111/j.1467-9922.2012.00739.x

Jiang J, Quyang J, Liu H (2019) Interlanguage: A perspective of quantitative linguistic typology. Lang. Sci. 74:85–97

Kim M, Crossley SA, Kyle K (2018) Lexical sophistication as a multidimensional phenomenon: Relations to second language lexical proficiency, development, and writing quality. Mod. Lang. J. 102(1):120–141. https://doi.org/10.1111/modl.12447

Kojima T, Gu S, Reid M et al. (2022) Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, New Orleans, LA, 29 November-1 December, Curran Associates, Inc., Red Hook, NY

Kyle K, Crossley SA (2015) Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Q 49:757–786

Kyle K, Crossley SA, Berger CM (2018) The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behav. Res. Methods 50:1030–1046. https://doi.org/10.3758/s13428-017-0924-4


Kyle K, Crossley SA, Jarvis S (2021) Assessing the validity of lexical diversity using direct judgements. Lang. Assess. Q. 18:154–170. https://doi.org/10.1080/15434303.2020.1844205

Landauer TK, Laham D, Foltz PW (2003) Automated essay scoring and annotation of essays with the Intelligent Essay Assessor. In Shermis MD, Burstein JC (ed), Automated Essay Scoring: A Cross-Disciplinary Perspective. Lawrence Erlbaum Associates, Mahwah, NJ

Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174

Laufer B, Nation P (1995) Vocabulary size and use: Lexical richness in L2 written production. Appl. Linguist. 16:307–322. https://doi.org/10.1093/applin/16.3.307

Lee J, Hasebe Y (2017) jWriter Learner Text Evaluator, URL: https://jreadability.net/jwriter/

Lee J, Kobayashi N, Sakai T, Sakota K (2015) A Comparison of SPOT and J-CAT Based on Test Analysis [Tesuto bunseki ni motozuku ‘SPOT’ to ‘J-CAT’ no hikaku]. Research on the Acquisition of Second Language Japanese [Dainigengo to shite no nihongo no shūtoku kenkyū] (18) 53–69

Li W, Yan J (2021) Probability distribution of dependency distance based on a treebank of Japanese EFL learners’ interlanguage. J. Quant. Linguist. 28(2):172–186. https://doi.org/10.1080/09296174.2020.1754611


Linacre JM (2002) Optimizing rating scale category effectiveness. J. Appl. Meas. 3(1):85–106


Linacre JM (1994) Constructing measurement with a Many-Facet Rasch Model. In Wilson M (ed) Objective measurement: Theory into practice, Volume 2 (pp. 129–144). Norwood, NJ: Ablex

Liu H (2008) Dependency distance as a metric of language comprehension difficulty. J. Cognitive Sci. 9:159–191

Liu H, Xu C, Liang J (2017) Dependency distance: A new perspective on syntactic patterns in natural languages. Phys. Life Rev. 21. https://doi.org/10.1016/j.plrev.2017.03.002

Loukina A, Madnani N, Cahill A, et al. (2020) Using PRMSE to evaluate automated scoring systems in the presence of label noise. Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, Seattle, WA, USA → Online, 10 July, pp. 18–29. Association for Computational Linguistics

Lu X (2010) Automatic analysis of syntactic complexity in second language writing. Int. J. Corpus Linguist. 15:474–496

Lu X (2012) The relationship of lexical richness to the quality of ESL learners’ oral narratives. Mod. Lang. J. 96:190–208

Lu X (2017) Automated measurement of syntactic complexity in corpus-based L2 writing research and implications for writing assessment. Lang. Test. 34:493–511

Lu X, Hu R (2022) Sense-aware lexical sophistication indices and their relationship to second language writing quality. Behav. Res. Method. 54:1444–1460. https://doi.org/10.3758/s13428-021-01675-6

Ministry of Health, Labor, and Welfare of Japan (2022) Retrieved from: https://www.mhlw.go.jp/stf/newpage_30367.html

Mizumoto A, Eguchi M (2023) Exploring the potential of using an AI language model for automated essay scoring. Res. Methods Appl. Linguist. 3:100050

Okgetheng B, Takeuchi K (2024) Estimating Japanese Essay Grading Scores with Large Language Models. Proceedings of 30th Annual Conference of the Language Processing Society in Japan, March 2024

Ortega L (2015) Second language learning explained? SLA across 10 contemporary theories. In VanPatten B, Williams J (ed) Theories in Second Language Acquisition: An Introduction

Rae JW, Borgeaud S, Cai T, et al. (2021) Scaling Language Models: Methods, Analysis & Insights from Training Gopher. ArXiv, abs/2112.11446

Read J (2000) Assessing vocabulary. Cambridge University Press. https://doi.org/10.1017/CBO9780511732942

Rudner LM, Liang T (2002) Automated Essay Scoring Using Bayes’ Theorem. J. Technol., Learning and Assessment, 1 (2)

Sakoda K, Hosoi Y (2020) Accuracy and complexity of Japanese Language usage by SLA learners in different learning environments based on the analysis of I-JAS, a learners’ corpus of Japanese as L2. Math. Linguist. 32(7):403–418. https://doi.org/10.24701/mathling.32.7_403

Suzuki N (1999) Summary of survey results regarding comprehensive essay questions. Final report of “Joint Research on Comprehensive Examinations for the Aim of Evaluating Applicability to Each Specialized Field of Universities” for 1996-2000 [shōronbun sōgō mondai ni kansuru chōsa kekka no gaiyō. Heisei 8 - Heisei 12-nendo daigaku no kaku senmon bun’ya e no tekisei no hyōka o mokuteki to suru sōgō shiken no arikata ni kansuru kyōdō kenkyū’ saishū hōkoku-sho]. University Entrance Examination Section Center Research and Development Department [Daigaku nyūshi sentā kenkyū kaihatsubu], 21–32

Taghipour K, Ng HT (2016) A neural approach to automated essay scoring. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, 1–5 November, pp. 1882–1891. Association for Computational Linguistics

Takeuchi K, Ohno M, Motojin K, Taguchi M, Inada Y, Iizuka M, Abo T, Ueda H (2021) Development of essay scoring methods based on reference texts with construction of research-available Japanese essay data. In IPSJ J 62(9):1586–1604

Ure J (1971) Lexical density: A computational technique and some findings. In Coultard M (ed) Talking about Text. English Language Research, University of Birmingham, Birmingham, England

Vaswani A, Shazeer N, Parmar N, et al. (2017) Attention is all you need. In Advances in Neural Information Processing Systems, Long Beach, CA, 4–7 December, pp. 5998–6008, Curran Associates, Inc., Red Hook, NY

Watanabe H, Taira Y, Inoue Y (1988) Analysis of essay evaluation data [Shōronbun hyōka dēta no kaiseki]. Bulletin of the Faculty of Education, University of Tokyo [Tōkyōdaigaku kyōiku gakubu kiyō], Vol. 28, 143–164

Yao S, Yu D, Zhao J, et al. (2023) Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36

Zenker F, Kyle K (2021) Investigating minimum text lengths for lexical diversity indices. Assess. Writ. 47:100505. https://doi.org/10.1016/j.asw.2020.100505

Zhang Y, Warstadt A, Li X, et al. (2021) When do you need billions of words of pretraining data? Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, pp. 1112-1125. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.90


This research was funded by the National Foundation of Social Sciences (22BYY186) to Wenchao Li.

Author information

Authors and affiliations

Department of Japanese Studies, Zhejiang University, Hangzhou, China

Department of Linguistics and Applied Linguistics, Zhejiang University, Hangzhou, China


Contributions

Wenchao Li was responsible for conceptualization, validation, formal analysis, investigation, data curation, visualization, and drafting the manuscript. Haitao Liu was responsible for supervision.

Corresponding author

Correspondence to Wenchao Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

Ethical approval was not required as the study did not involve human participants.

Informed consent

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental material file #1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article

Li, W., Liu, H. Applying large language models for automated essay scoring for non-native Japanese. Humanit Soc Sci Commun 11 , 723 (2024). https://doi.org/10.1057/s41599-024-03209-9


Received : 02 February 2024

Accepted : 16 May 2024

Published : 03 June 2024

DOI : https://doi.org/10.1057/s41599-024-03209-9




America’s Military Is Not Prepared for War — or Peace

A photo of U.S. Navy sailors, in silhouette, aboard an aircraft carrier.

By Roger Wicker

Mr. Wicker, a Republican, is the ranking member of the U.S. Senate Armed Services Committee.

“To be prepared for war,” George Washington said, “is one of the most effectual means of preserving peace.” President Ronald Reagan agreed with his forebear’s words, and peace through strength became a theme of his administration. In the past four decades, the American arsenal helped secure that peace, but political neglect has led to its atrophy as other nations’ war machines have kicked into high gear. Most Americans do not realize the specter of great power conflict has risen again.

It is far past time to rebuild America’s military. We can avoid war by preparing for it.

When America’s senior military leaders testify before my colleagues and me on the U.S. Senate Armed Services Committee behind closed doors, they have said that we face some of the most dangerous global threat environments since World War II. Then, they darken that already unsettling picture by explaining that our armed forces are at risk of being underequipped and outgunned. We struggle to build and maintain ships, our fighter jet fleet is dangerously small, and our military infrastructure is outdated. Meanwhile, America’s adversaries are growing their militaries and getting more aggressive.

In China, the country’s leader, Xi Jinping, has orchestrated a historic military modernization intended to exploit the U.S. military’s weaknesses. He has overtaken the U.S. Navy in fleet size, built one of the world’s largest missile stockpiles and made big advances in space. President Vladimir Putin of Russia has thrown Europe into war and mobilized his society for long-term conflict. Iran and its proxy groups have escalated their shadow war against Israel and increased attacks on U.S. ships and soldiers. And North Korea has disregarded efforts toward arms control negotiations and moved toward wartime readiness.

Worse yet, these governments are materially helping one another, cooperating in new ways to prevent an American-led 21st century. Iran has provided Russia with battlefield drones, and China is sending technical and logistical help to aid Mr. Putin’s war. They are also helping one another prepare for future fights by increasing weapons transfers and to evade sanctions. Their unprecedented coordination makes new global conflict increasingly possible.

That theoretical future could come faster than most Americans think. We may find ourselves in a state of extreme vulnerability in a matter of a few years, according to a growing consensus of experts. Our military readiness could be at its lowest point in decades just as China’s military in particular hits its stride. The U.S. Indo-Pacific commander released what I believe to be the largest list of unfunded items ever for services and combatant commands for next year’s budget, amounting to $11 billion. It requested funding for a raft of infrastructure, missile defense and targeting programs that would prove vital in a Pacific fight. China, on the other hand, has no such problems, as it accumulates the world’s leading hypersonic arsenal with a mix of other lethal cruise and attack missiles.

Our military leaders are being forced to make impossible choices. The Navy is struggling to adequately fund new ships, routine maintenance and munition procurement; it is unable to effectively address all three. We recently signed a deal to sell submarines to Australia, but we’ve failed to sufficiently fund our own submarine industrial base, leaving an aging fleet unprepared to respond to threats. Two of the three most important nuclear modernization programs are underfunded and are at risk of delays. The military faces a backlog of at least $180 billion for basic maintenance, from barracks to training ranges. This projects weakness to our adversaries as we send service members abroad with diminished ability to respond to crises.

Fortunately, we can change course. We can avoid that extreme vulnerability and resurrect American military might.

On Wednesday I am publishing a plan that includes a series of detailed proposals to address this reality head-on. We have been living off the Reagan military buildup for too long; it is time for updates and upgrades. My plan outlines why and how the United States should aim to spend an additional $55 billion on the military in the 2025 fiscal year and grow military spending from a projected 2.9 percent of our national gross domestic product this year to 5 percent over the next five to seven years.

It would be a significant investment that would start a reckoning over our nation’s spending priorities. There will be conversations ahead about all manner of budget questions. We do not need to spend this much indefinitely — but we do need a short-term generational investment to help us prevent another world war.

My blueprint would grow the Navy to 357 ships by 2035 and halt our shrinking Air Force fleet by producing at least 340 additional fighters in five years. This will help patch near-term holes and put each fleet on a sustainable trajectory. The plan would also replenish the Air Force tanker and training fleets, accelerate the modernization of the Army and Marine Corps, and invest in joint capabilities that are all too often forgotten, including logistics and munitions.

The proposal would build on the $3.3 billion in submarine industrial base funding included in the national security supplemental passed in April, so we can bolster our defense and that of our allies. It would also rapidly equip service members all over the world with innovative technologies at scale, from the seabed to the stars.

We should pair increased investment with wiser spending; fiscal responsibility would funnel resources to the most strategic ends. Emerging technology must play an essential role, and we can build and deploy much of it in less than five years. My road map would also improve the military procurement system and increase accountability for bureaucrats and companies that fail to perform on vital national security projects.

This whole endeavor would shake up the status quo but be far less disruptive and expensive than the alternative. Should China decide to wage war with the United States, the global economy could immediately fall into a depression. Americans have grown far too comfortable under the decades-old presumption of overwhelming military superiority. And that false sense of security has led us to ignore necessary maintenance and made us vulnerable.

Our ability to deter our adversaries can be regained; we have done it before. At the 50th anniversary of Pearl Harbor, in the twilight of the Soviet Union, George H.W. Bush reflected on the attack. Though the conflict was long past, it had taught him an enduring lesson: “When it comes to national defense,” he said, “finishing second means finishing last.”

Regaining American strength will be expensive. But fighting a war — and worse, losing one — is far more costly. We need to begin a national conversation today on how we achieve a peaceful, prosperous and American-led 21st century. The first step is a generational investment in the U.S. military.

Roger Wicker is the senior U.S. senator from Mississippi and the ranking member of the Senate Armed Services Committee.

