Death by summative over-assessment — When are students supposed to learn?

“The assessment load is manageable for students, teachers and support services.” — Unnamed university assessment policy (publicly available)

“Gotta love the part of term where every course decides to give an assessment with 1 week to do it, all due at the same time.” — Unnamed undergrad student

“The road to hell is paved with good intentions.” – exact source unknown

I am now a little over a year into the role of having responsibility for my school’s overall teaching program. There is always pressure to get straight down to action in a new role. But unless there’s an impending crisis (there was one, COVID-19, albeit in a different space), the wise move is to sit on your hands for a bit and take the luxury of a nice long observe & orient phase before getting to work. Your decisions and actions are much more likely to head in a sensible and ultimately correct direction this way.

In taking it all in, it’s way too easy to focus on the what and how, and forget the most important question of all: why? I’ve possibly annoyed the hell out of some of my colleagues in the last 12 months with my ‘why’ questions, applied to everything from ‘Why do we have exams?’ and ‘Why do we have lectures?’ to ‘Why do we even care about market share or WAM differential?’ But in my view the ‘why’ is essential to working out what truly matters and what doesn’t. And when it comes to assessment, my digging & thinking on why over the past 12 months has only convinced me that the modern higher education system’s real raison d’être is nothing short of disturbing and pathological. To put it bluntly:

“… there is one path to ultimate happiness — having money — that in turn comes from attending prestigious colleges.” — Michael J Sandel

Sandel’s Tyranny of Merit is an excellent read and should be on the reading list for every academic in the modern higher education system, particularly the chapter titled “The Sorting Machine”, but let me explain how Sandel’s thesis becomes my point in this blog post.

Sandel’s central theme ultimately is that “the meritocratic ideal is not a remedy for inequality; it is a justification of inequality.” The role that the modern university plays in that is twofold: a) it is the gatekeeper that provides the ‘ticket of merit’ required to access a limited supply of high-status/salary jobs, and b) it is a filter or ‘sorting machine’ for taking in a large number of people, deciding their relative merit through a large series of measurements, and appending that determination to their ‘ticket of merit’.

We can argue about the merits of ‘meritocracy’ (pun intended) as a philosophy separately, but in the context of trying to be an educator, what are the inevitable outcomes of such a heavily ‘meritocratic’ system?

From the organisational perspective:

  • Everything tends to focus on measurement, i.e., assessment, over actual education.
  • All assessment tends to become summative rather than formative.
  • There is a tendency towards ‘single-figure’ metrics to enable easy/rapid (i.e., lazy) ranking.
  • There is a tendency to ignore uncertainty in measurement and present absolutes.
  • There is a tendency to measure continuously from start to finish, ignoring that this entrenches academic privilege (a cynic would suggest this is a stealth design feature, and to an extent, be correct).
  • There is a tendency to obsess over ‘cheatability’. Assessment design becomes dominated by measures to ‘cheatproof’ the assessment over all other aspects.
  • There is a tendency to tolerate bad teaching and assessment as long as ‘position’ remains rigorous, i.e., it ensures ‘more smart’ students stay near the top and ‘less smart’ students stay near the bottom, as defined by statistics applied to existing single-figure metrics (i.e. WAM/GPA differentials).
  • There is a tendency to view courses where everyone succeeds or the gap between top and bottom is small, with immediate deep suspicion on rigor and then question whether it is just a lazy lecturer offering ‘an easy ride’ rather than a dedicated lecturer providing effective course-wide teaching.

From the student perspective:

  • WAM/GPA becomes the priority: I addressed this in my last post Markissism, so I’ll keep it short. It drives everything from decisions to take easy courses for higher marks rather than ‘push the envelope’ through to gaming the number using contract cheating services. As Campbell’s law says:

“The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” — Donald T. Campbell

  • Student experience plummets: The whole experience becomes about fighting to cope with the crushing weight of endless assessment, which amusingly can start as early as Week 1 of term, i.e., before a student has even learned anything. And because these tasks contribute directly to your grade & WAM/GPA, which you will carry with you on your ‘ticket to work’ for life, there is no room to put in anything less than a ridiculous effort, and no room to make any mistakes. Given making mistakes is how you learn, there is essentially no room for learning; your whole experience is about demonstrating performance.
  • Assessment ranges from adversarial to punitive: When the whole assessment structure is essentially summative, there’s no room to make mistakes and learn from them or to experiment because those will all be black marks on the fine-grained measures that determine your future (some like WAM/GPA presented as though accurate to 3 decimal places!). Every assessment becomes an exercise in finding faults to set you in the pecking order of life. The elaborate designs to make assessments ‘cheatproof’ and ‘rigorous’ are not invisible to those who do them, and are seen as an assumption that all students are cheats or out to unfairly climb up some invisible ranking list.
  • Equity and fairness become extinct: When every week of every term is endless crushing assessment, it is pretty clear the system is pointless and stacked against you. Students with rich parents who can support them to devote full time & effort to study are obviously at a massive advantage over students who need to work to cover living costs (rent/food). And that’s before we consider any educational disadvantage from public vs private school systems and access to private tutoring services, which is stacked in the favour of students from rich families by virtue of 1st & 2nd year often counting as much towards WAM/GPA as 3rd & 4th year.
  • Misconduct can become attractive: Most students in the right circumstances and with honour codes in place would be happy to not cheat. In their view, they come to university to be educated, to be exposed to new ideas, to have new experiences, to be challenged and expected to grow as people. Assessment is part of that, and when it’s done in the right way and for the right reasons, it is healthy.

    But, when you start to build up the pathologies above, e.g., crushing assessment that’s purely summative from day one, systems that appear adversarial and which demonstrate an implicit lack of trust, charging fees that result in significant debt, treating education as a profitable business, then you build an ideal environment for the temptation to cheat. And there are plenty of unscrupulous operators out there who will happily take advantage of this for financial gain.

Pointing out problems is all well and good, but it doesn’t achieve a lot unless you pitch some solutions, so let’s get straight into it. Here are the top ten actions I think any university should be taking in this space as a matter of urgency.

  1. Accept change is needed & start the process: The system is obviously broken, it is unfair, it produces poor outcomes and it is currently being picked apart by contract cheating organisations and poor assessment design (i.e., tasks that are easily gamed via past assessments posted to the internet, e.g., coursehero, chegg). Rome wasn’t built in a day, and neither is a university’s assessment system. There are hundreds of moving parts that extend from governance documents and approvals by university administrative structures through to properly teaching academics how to design strong assessment that measures fairly, broadly and in a way that lifts the performance of all students during their time at university. The discussion and consultation processes need to start now, and be driven well by strong organisational leadership at the top level. There is also a massive cultural shift required as part of this, and cultural shifts are slow and also require very strong organisational leadership.

  2. WAM/GPA should be consigned to the dustbin of history: If the reasons in my last post weren’t sufficient, let me add a personal anecdote. One of the amusing aspects of my career is that my most cited paper was a side-project — we basically took the fractal analysis software I was using for work on quantum devices and deployed it on artwork by the American abstract expressionist painter Jackson Pollock. I grew to have a love-hate relationship with the project ultimately. The analysis was a fun challenge but the idea of taking paintings with so much in them and reducing them to little more than a single number (fractal dimension) felt hollow to me. We would talk about Pollock trying to increase the chaos with time, using that single number to justify it, but then I would look to the actual works, and would see so much more going on in them that was entirely hidden by that number and the reductionist analysis that generated it.

    As an educator, I see our current obsession with WAM/GPA in much the same way. We take students with different skills, different strengths and weaknesses, different personalities and we try to reduce them down to a single number, one given to a number of decimal places entirely incommensurate with the associated certainty of measurement, and unethically so. And as educational organisations, we do this simply so that lazy employers can excuse themselves from treating people on their full spectrum of ability in their employee selection processes (including the universities when they recruit to Ph.D. programs and offer scholarships for them). We do this under the foolish notion of ‘meritocracy’, when in reality, all we largely do is entrench and justify pre-existing inequality. It is our choice to do that, of course, because we could also choose not to; there was a time before WAM/GPA existed and there could always be a time after WAM/GPA exists too. We just have to be brave enough to admit that this measure is flawed and no longer fit for purpose, and replace it.

    So what do we replace it with?

  3. See students as a spectrum not a number: Most modern universities have a list of ‘Graduate Attributes’ or similar. They are essentially the highest level of learning goals that underpin the learning goals of the various degree programs and subjects that fall underneath them. Like any good set of goals they should be ‘SMART’, and what I mean by this is Specific, Measurable, Attainable, Relevant and Time-bound. If that’s true, then we should be able to rate graduates on their attainment of these attributes at completion of their degree. After all, we claim to do this for the learning goals of the individual courses that make up that degree; if it’s good enough for the parts, then it should be good enough for the whole. An interesting way to do this (hat tip to Kane Murdoch & Garth Pearce for this idea), would be with a spider plot. Any home brewer/winemaker would be well aware of them, but for those who aren’t, I’ve included an example below.

    One could then meaningfully assess to what extent a student has mastered the various attributes, and in particular, which of them they are strongest and weakest in. It would encourage students to find and build their strengths and work on enhancing their weaknesses. It would also increase the likelihood that they end up in jobs following their studies that are a good fit for them and where they are a good fit for the organisation. Having this influence assessment design at program and course level would also improve the training and experience for students through their degree.
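    To make the spider-plot idea a little more concrete, here is a minimal sketch of the data behind one. The attribute names, the 0 to 5 rating scale and the function names are all illustrative assumptions, not any university's actual graduate attributes. It places one vertex per attribute around a circle, which is all a spider plot is, and picks out the strongest and weakest attribute.

```python
import math

# Hypothetical graduate-attribute ratings on an assumed 0-5 scale.
profile = {
    "Critical thinking": 5,
    "Communication": 3,
    "Teamwork": 4,
    "Technical knowledge": 4,
    "Digital literacy": 2,
}

def spider_vertices(ratings):
    """Return one (x, y) vertex per attribute, evenly spaced around a circle.

    The distance of each vertex from the origin is the rating itself,
    so joining the vertices in order gives the spider-plot outline.
    """
    n = len(ratings)
    vertices = []
    for i, score in enumerate(ratings.values()):
        angle = 2 * math.pi * i / n
        vertices.append((score * math.cos(angle), score * math.sin(angle)))
    return vertices

def strongest_and_weakest(ratings):
    """Return the names of the highest- and lowest-rated attributes."""
    return max(ratings, key=ratings.get), min(ratings, key=ratings.get)

if __name__ == "__main__":
    strongest, weakest = strongest_and_weakest(profile)
    print("Strongest:", strongest)
    print("Weakest:", weakest)
    for name, (x, y) in zip(profile, spider_vertices(profile)):
        print(f"{name}: ({x:.2f}, {y:.2f})")
```

    The point of the sketch is that the profile stays a profile: nothing in it collapses the five ratings into a single figure.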

  4. Test them properly at graduation: The most important measurement is the state of the student at the point of graduation. This is notionally where all of the student’s training and studies are complete in a specific degree and you can fairly and sensibly expect them to have the full set of graduate attributes and knowledge for the degree program they enrolled in.

    The ‘graduation exam’ should be a rigorous and separate process from the standard ‘end of course’ exams in any usual year of course including the final year. It should seek to test the full set of graduate attributes and also look broadly at technical knowledge, both from the perspective of the broad norms of a graduate in the field and their chosen specialization in the degree program. It might have a written component and/or a short presentation, but the ultimate would be for it to include a substantive panel oral exam (a.k.a. ‘exit viva’). There is no better way to work out whether someone really gets a subject properly and has certain key graduate attributes than to spend 2 hours in a room with them talking about that subject and making them do tasks associated with it. It is a key reason why the world’s best Ph.D. exams are all oral exams with a panel and opponent (yes, I think Aussie Ph.D. exams are rubbish).

    Such a ‘graduation exam’, although being an ordeal at the time as all such things are, would also be a major achievement that adds to the sense of value in graduating from a degree in a way that just a last set of written exams and the grades in the mail a few weeks later doesn’t.

  5. Make course final exams about progression: Items 2-4 above mean that the final exam for a course can return to its original focus, which is simply to determine whether a student has adequate competence in the course and is ready to move on. This might seem like a small change, but it is rather significant when you think about exams more broadly than the university & upper high school environment.

    If I think about courses that I’ve taken outside the university or high school environment, they have all been significantly different in four ways: 1. A comprehensive set of learning goals was presented right down at lecture level (not just 4 token goals for a whole term course), 2. Educational aspects were squarely focused on and driven by those goals in an obviously clear way, 3. The final assessment was always of the learning goals, i.e., fair, and 4. the pass grade was often very high, e.g., 80% or above, or competence within 3 attempts. Having a class where almost everyone passed was an expectation and a demonstration of a competent educator. Having a class where the mean was 50% or 60% and only a handful got above 80% was a sign that the educator was ineffective, the course was poorly designed, or both.

    Somehow both in universities, and probably by osmosis over the years, the upper end of high school, we ended up with an entanglement between assessment for demonstrating competence and assessment as an ‘intelligence test’, which really is just an educational privilege test for the most part. I’ve always found it strange that we’ve designed and tolerated a system where the final test of competence sees getting half of the demonstrated performance wrong as an acceptable threshold for progression — if you can only do half of what’s in the course, then how the hell are you ready to move on? It is even more bizarre that someone demonstrating 95% or 100% competence with the material is considered ‘rare’ and ‘the sign of an exceptional student’. Why on earth do people as intelligent as academics supposedly are, think that this is a sensible, effective and fair way to teach and assess?

    Every course at university should have clear expectations (learning goals) not just at the course level but at the individual class level, and the final assessment should be demonstrably connected to proving mastery of those goals. They should not be intelligence tests aimed at determining who is better than who — leave that to entrance exams and aptitude tests for whatever they do after their studies — universities are there to train not be a meritocratic sorting machine. We should expect the passing grade to be high in courses, and if the cohort isn’t making that high grade, we should be questioning the course design or effectiveness of the educator. Some academics will claim this just means courses will be made ‘soft’ (i.e., easy), but anyone who has trained or taught in a serious course outside a university will know that’s absolutely not true. And the best university educators know that’s untrue as well. It’s really just the excuse that the mediocre use to justify poor learning outcomes and avoid having to teach to a higher standard.

  6. Throw numerical course marks in the bin too: In many universities the course marks are given on a hundred-point scale to integer accuracy. It’s the same issue as with WAM/GPA given to 3 decimal places: do we really know performance in an individual course to that level of accuracy? Imagine a report with two or even three markers; the standard deviation on that average has to be much greater than 1. Imagine an exam with a finite number of questions: Student A might be lucky and get a question they liked, while Student B wasn’t and got a question they didn’t. But let them sit another year’s exam, and it might be Student A who doesn’t like the question and Student B getting the luck. There is no way that a 1 mark difference is meaningful and therefore fair.

    Courses should only carry final grades, and although numbers ultimately have to be used to determine that grade, there’s a long stretch between using a number and presenting it, which implies that it’s known at least as accurately as the presented significant figures it is specified with. Grades should be representative of how well you can know performance, including being able to tease it out from educational privilege in early years of the program. For example, grades in first year could be Fail/Pass/Superior scaling upwards to a more banded structure in higher years, e.g., Fail/Pass/Credit/Distinction/High Distinction, where the cohort has been on a (more) level playing field for several years than they might be at intake. But just having Fail/Pass/Superior right through the program would be arguably sufficient and possibly more fair.
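    The measurement-uncertainty argument in this item can be illustrated with a little arithmetic. The sketch below is an assumption-laden example, not a real marking procedure: the marks are made up, the function names are hypothetical, and treating three markers as a statistical sample is a simplification. It computes the average mark from several markers together with the standard error on that average, then asks whether the gap between two students is larger than the combined uncertainty.

```python
import math
import statistics

def mean_with_standard_error(marks):
    """Return (mean, standard error of the mean) for one piece of work.

    `marks` holds the scores given by independent markers (at least two).
    """
    mean = statistics.mean(marks)
    # Sample standard deviation of the individual markers' scores.
    spread = statistics.stdev(marks)
    # Standard error: the uncertainty on the averaged mark.
    return mean, spread / math.sqrt(len(marks))

def difference_is_meaningful(marks_a, marks_b):
    """True only if the gap between two averages exceeds their combined uncertainty."""
    mean_a, se_a = mean_with_standard_error(marks_a)
    mean_b, se_b = mean_with_standard_error(marks_b)
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) > combined

# Hypothetical marks (out of 100) from three markers on two reports.
student_a = [72, 78, 69]
student_b = [70, 75, 71]

if __name__ == "__main__":
    for name, marks in (("A", student_a), ("B", student_b)):
        mean, se = mean_with_standard_error(marks)
        print(f"Student {name}: {mean:.1f} +/- {se:.1f}")
    print("Gap meaningful?", difference_is_meaningful(student_a, student_b))
```

    With these made-up numbers the two averages differ by 1 mark while each carries an uncertainty of more than 1 mark, so the ranking between the two students cannot be trusted.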

  7. Put sensible measures/discussion of uncertainty in assessment on transcripts: Eliminating WAM/GPA and hundred-point grading lessens the need for this, but probably doesn’t eliminate it entirely. Any assessment carries uncertainties. Students have good days and bad days, and even good-bad days — one of my best exam performances was a hungover Monday morning after a large weekend. Some students thrive on certain types of assessment, e.g., me on written exams. Others are possibly equivalently good at the actual stuff in the course, but just don’t perform well for certain assessment types, and so their grades ‘under-represent’ them.

    The important thing here is that universities have an ethical responsibility to properly represent how certain they are of the performance they measure and put on a transcript that follows a student around for the rest of their life. Putting courses to a grading accuracy of 1% and WAM/GPA as a ‘single KPI’ with an accuracy down to 0.001% is clearly ridiculous given what we know about how those numbers are derived. And it should be considered anything on the range from unethical to clearly fraudulent given a transcript is an official document.

    This aspect is important because the misrepresentation of grading accuracy drives severe perverse incentives from a student perspective that range from absurd focus on minutiae to misconduct, all of which are destructive to learning.

  8. Make well-designed formative assessment compulsory: The benefits of formative assessment in higher education are well known, but despite that, its actual use in higher education is far from as widespread as many would suspect. Our constant obsession with performance rating means that the students are instead merely subjected to what are actually just summative assessments that are claimed to be formative in intent. A common example is a weekly quiz where one is given unlimited attempts, but the final score of right vs wrong answers contributes to the marks. Ultimately, the whole aim here is performance measurement, so it is really just summative. The students pick up on this, and it just adds to the summative assessment load pressure across the term; they no longer feel able to make the mistakes that enable them to learn because every mark is sacred. The focus shifts from learning to earning, and the benefit is gone.

    A challenge with formative assessment is providing an incentive to actually do it. The easy solution here is to simply award some small component of the course marks to these tasks in a way that is not tied to performance (right vs wrong) but meeting a demonstrated threshold for a sensible attempt (engagement). This provides the space for students to be wrong without penalty, or even admit to just not knowing, which is useful for an educator to know if it’s a decent portion of the class. It also enables questions that are designed to properly advance learning on key or difficult aspects rather than with summative intent, including some with ambiguous answers deliberately aimed at forcing errors or questions with no single correct answer. The subtle question design shift here can substantially enhance effectiveness of the learning experience for students by forcing them to confront head-on the ambiguities that arise in learning new material.

    However, our obsession with performance-based marking in universities means that formative assessment with marks as incentive is almost impossible without breaking the rules. Using my own institution as an example (since I know the policy, it’s publicly available, and probably similar to many places), the assessment design procedure states pretty clearly that “…the overall course result will be calculated from the marks of all summative assessment tasks.”, “Participation in an assessment task in itself is insufficient grounds for awarding marks or grades.”, “Assessment marks will not be used to reward or penalise student behaviours that do not demonstrate student achievement…”. The word ‘formative’ appears four times in the relevant document, summative double that, and twice in the definitions and acronyms table. And while formative assessment is clearly put as an option, any reading of the policy pretty clearly makes a solid implementation of it impossible. If it carries no marks, why would a student do it, let alone spend a useful amount of time on it, especially when they are under an otherwise crushing summative load across multiple courses? I know what my priority would be under such circumstances.

    The obvious solution here is to have mandated formative-to-summative assessment ratios with clear guidelines on how marks can/cannot be allocated in each category (even if it’s just ‘up to’ caps). In the interests of learning, the ratio should probably be as high as 50:50 in Years 1 and 2, tailing back in higher years, e.g., 30:70 in Year 3 and 15:85 in Year 4. And if there are concerns that this gives ‘an easy ride’, it is easily solved by making the final exam a hurdle task, i.e., it must be passed in its own right to pass the course, with the final grade being made up of the overall course components to enable performance levels to be gauged.
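    As a rough sketch of how such a scheme could be computed, the example below combines engagement-based formative credit with summative performance and applies the final exam as a hurdle. Everything in it is an assumption for illustration: the function name, the parameters, the default 50:50 weighting and the pass marks are hypothetical, and a real policy would need to set them deliberately.

```python
def course_result(engaged_tasks, total_tasks, summative_pct, exam_pct,
                  formative_weight=0.5, exam_pass_mark=50.0,
                  course_pass_mark=50.0):
    """Return (overall percentage, passed) for one student in one course.

    engaged_tasks / total_tasks: formative tasks with a sensible attempt,
        credited for engagement rather than right vs wrong answers.
    summative_pct: combined percentage across all summative tasks.
    exam_pct: percentage on the final exam alone, used for the hurdle.
    formative_weight: share of the course mark given to formative work,
        e.g. 0.5 for a 50:50 ratio or 0.15 for 15:85.
    """
    formative_pct = 100.0 * engaged_tasks / total_tasks
    overall = (formative_weight * formative_pct
               + (1.0 - formative_weight) * summative_pct)
    # Hurdle: the exam must be passed in its own right, whatever the overall mark.
    passed = exam_pct >= exam_pass_mark and overall >= course_pass_mark
    return overall, passed

if __name__ == "__main__":
    # Year 1 style 50:50 split; engaged in 9 of 10 formative tasks,
    # but did not clear the exam hurdle.
    print(course_result(9, 10, summative_pct=60.0, exam_pct=45.0))
    # Year 4 style 15:85 split with the hurdle cleared.
    print(course_result(9, 10, summative_pct=60.0, exam_pct=55.0,
                        formative_weight=0.15))
```

    The first call shows why the hurdle answers the ‘easy ride’ concern: generous engagement credit lifts the overall mark, but the student still does not pass without passing the exam itself.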

  9. Stop showing fails and withdrawals on transcripts: Students fail courses for a variety of reasons, many of which are outside their control, including the fact that the course, the lecturer, or both, are poor. Withdrawals are similar, and recently, are often driven by the student’s perception at census date of whether the course will help or hinder increases in their WAM/GPA.

    University is supposed to be an opportunity for finding your place in a particular study area or even just as a person. Sometimes you take a subject with the best of intentions and find that it ‘just isn’t you’. Why should that be held against you for the rest of your life if you withdraw or fail? Likewise, university should encourage you to push yourself. It is the pinnacle of the educational experience, students should want to go after challenging courses. Why should taking on a hard course and failing be seen as a bad thing? There’s still a lot learned along the way, even if you can’t claim at the end to be fully competent in that particular course.

    Transcripts should be about what a student has achieved in competence across their studies, not about tarring them for life with some fails or withdrawals.

  10. Use all of the above to get the culture right: Actions in any organisation are driven by culture, which to some extent is set by the way that environment operates. If you measure performance incessantly and to the point where a person’s whole worth is driven by a single performance metric, then you create severe perverse incentives that drive pathological behaviour where performance becomes the myopic focus of all efforts, even if that is not your core organisational mission (and I’d argue it is not for any university).

    If you want students to focus on learning, you need to create an environment that operates in a way that incentivises learning. Some academics will claim here ‘but that’s what we’re doing, we’re using performance measurement to incentivise learning’. Are you? Really?

    Take a close look at the system, and tell me we aren’t at the point of Goodhart’s law, namely “When a measure becomes a target, it ceases to be a good measure.” What we originally designed as a system for measuring learning has become a target to such an extent that not only is it no longer a good measure, but for the vast majority of students, it is producing sub-optimal outcomes. And if it wasn’t, we wouldn’t have to make the pass mark only 50 to get people through degrees!

    Deciding that endless summative assessment is good for learning in the modern era is no different to deciding that ‘the beatings will continue until morale improves’ as a strategy for managing a workforce. It will only be through a massive change in our approach to assessment in higher education that we will manage to put the ship back on an even keel and create a culture where students are encouraged to learn, enjoy the experience of doing so and feel positive about themselves in the process.

It is probably good to finish with looping back around to ‘why’. Why would you go to the trouble of doing all the things above? There are several reasons:

  • It’s more fair: Would you decide on the quality of a cake based on the taste of a finger-dip before the dough is even mixed? Of course not, you judge it on the final product. In much the same way, is it fair to judge the final learning outcomes in the middle of a course or even in the first few weeks? Of course not. All you do with summative assessment during a course is punish students for not yet mastering things they’re supposed to be working on mastering for some point in the future, which is just destructive to learning.

  • Assessment is more reliable: You still measure students, but you measure them in a more sensible and reliable way at a point where you can fairly expect outcomes. Essentially you stop Goodhart’s law from becoming quite so pathological to your aim of measuring learning.

  • You spend less time on warfare: If all your courses are is endless assessment under enormous pressure, of course cheating is going to be an option. Designing assessments that are uncheatable for a few strategic endpoints is easy and sustainable, the sector has done it for decades. And if you get the culture right, the incentive to cheat is low anyway. However, if you want to assess like mad, all the time, and the culture is such that outcomes are everything, then you should expect rampant cheating and you should expect to be in an endless arms race that sees both sides devoting ever more effort to outsmarting the other. Higher education will run out of resources before the black market does, trust me, the black market always wins (just look at the ‘war on drugs’ if you don’t believe me).

  • Student experience will rise: Yes, we can have our cake and eat it too… we can still measure student performance, but we do that in such a way that it doesn’t get in the way of the learning process, which should be fun and engaging and not oppressive or punitive. If students feel that they’re being given completely fair opportunity to learn, including with assessments where they are encouraged, and even given course marks, to deliberately engage, fail and learn from failure, with the expectation that at the very end they will be fairly and rigorously measured on what they’ve learned, you’ll find the student experience will rise. It’s what they’re paying for, not funky cafes and outdoor foosball tables, and if you give it to them, they will be happy.

  • Staff will be happier too: There’s nothing like having an inspiring mission, which is to help people through that same learning experience you’ve had yourself. Crushing assessments inflict just as much pain on the instigators as on the victims. After all, the endless assessment has to be written and it has to be marked. And as the cheating arms race spirals forward, such that the time required for ‘countermeasures’ escalates, one can only expect staff to have less time, less interest and less engagement in education as a result.

  • The staff-student interactions are better: Getting the rapport you really want with students to teach well is very hard when you’re always having to assess them. This is because the ‘staff persona’ has to be different between teacher and assessor. In the former, you are there to help, in the latter, you are there to impartially judge. If you are always having to judge, then it’s very hard to build the sort of relationship that enables you to teach well. Switching between these personas takes time. Separating learning from assessment gives that space — one can build that teaching relationship with students during the term, and then switch to impartial judge during the study break to determine performance at the end. Even better would be to have someone else play the final judge on each course, in addition to improving staff-student relationships, it would also reduce the built-in biases in the system.

Ultimately, it doesn’t matter whether you’re in a higher education system where you charge high fees, i.e., the business of monetising intellectual respectability, or one that is entirely a public service, i.e., a fully government-funded education system for public good; getting the assessment right is a no-brainer. So I’ll end with one last why question: Why the hell aren’t we doing it?
