[This article was originally published by Wonkhe in early April 2020, when it became clear that the pandemic meant there would be no exams. We went on to develop the approach outlined in this note and ran it successfully for universities to support confirmation.]
University undergraduate recruitment in 2020 faces a set of unusual circumstances which are going to necessitate doing things differently. One of these circumstances is that exam-awarded grades, the cornerstone of admissions, won’t exist. Recruiting on some form of predicted grade is inevitable.
This is, as a one-off, not as bad as it sounds. Fears springing from the poor reputation of teacher predictions are unnecessary and universities have some good options.
The admissions problem
The obvious reason not to worry is that the awarding bodies will be calculating grades as a direct replacement for exam-awarded grades. Full technical details are yet to be worked out, but the guidance describes a special exercise of teacher-led predictions with some statistical adjustments aiming to eliminate any aggregate school-by-subject bias.
The object here is to award a grade-of-record to all candidates, regardless of whether they are applying to university or not, and to generate a grade distribution overall, within subjects, and within schools, consistent with previous years (unless the prior attainment distributions have changed materially). Having an officially approved route to take these grades as if they were exam-awarded, and proceed as normal, helpfully underwrites admissions for universities.
But even if this support wasn’t there universities would not be helpless. They already hold the traditional subject-level teacher predicted grades through the UCAS process. They also have a lot of useful information about predicted grade profile at the applicant level, and further latent information from the various decisions that they and the applicant have made. All of these contain information about how well someone would have done if exams were held.
The task facing universities is somewhat different from that of the awarding bodies. Rather than calculating all examination grades at subject level for everyone, they need to make yes/no admission decisions for individuals, whose potential suitability is expressed over a particular combination of exams and many other things.
Moreover, that selection is being made not from all exam candidates, but from a very specific subset: those who have applied to that particular university, have been offered at a particular level, and have then decided their personal risk/benefit equation means they should hold their conditional firm (CF) option with that particular course at that particular university. This different task, subset, and data setting opens up other pathways to admissions without exam-awarded grades. For some universities this complementary perspective can be useful, both in triangulating admissions decisions and in responding to the unexpected.
In either case we don’t think universities should be too worried about ending up with entirely different students if they recruit on predicted grades rather than exam-awarded grades. But this view does rest on some assumptions about what achieved and predicted grades are measuring and what admissions are really trying to recruit on.
P, Q and A
The analytical approach rests on an assumption that universities are ultimately recruiting on an underlying ability to respond to, and benefit from, having a place with them. For shorthand we’ll call this quantity “Q”. Unfortunately, it can’t be observed directly.
Exam-awarded grades, “A”, are one imperfect measure of Q, subject to bias and random noise. The bias can come from, for example, differing levels of exam support and coaching. The randomness can come from what questions come up, who marks the script, a learner’s health on the day of the exam, and so on.
Suppose a student with a fixed level of Q sat different A level papers, with different questions and different markers, on different days, in different emotional and health states. Our analytical assumption is that they wouldn’t always get exactly the same grade. You see an echo of just one part of this uncertainty in the 20 per cent or so of challenged grades that are changed.
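The thought experiment above can be sketched as a toy simulation. Everything here is an illustrative assumption (the 0–5 grade scale, the ability level, and the noise standard deviation are invented, not estimated from real data); the point is only that a fixed Q does not pin down a single grade.

```python
import random

GRADES = ["E", "D", "C", "B", "A", "A*"]

def sit_exam(q, rng, noise_sd=0.7):
    """One exam sitting: underlying ability q (on a 0-5 scale)
    plus day-of-exam noise, rounded and clamped to a grade band."""
    score = q + rng.gauss(0, noise_sd)
    band = min(max(round(score), 0), len(GRADES) - 1)
    return GRADES[band]

rng = random.Random(42)
# the same fixed-Q student sits 1,000 hypothetical papers
sittings = [sit_exam(3.6, rng) for _ in range(1000)]
spread = {g: sittings.count(g) for g in GRADES if g in sittings}
print(spread)
```

Over many hypothetical sittings the grade spreads across more than one band, which is the sense in which A is a noisy measure of Q.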
So, A isn’t the same as Q. Universities are not seeking people with high exam-awarded grades as an end in itself; rather, grades are a helpful and relatively uncontroversial way to get at an estimate of Q.
Predicted grades, “P”, can be seen as another measure of Q, also with bias and randomness. The UCAS predicted grades don’t act as a guess of what someone will get, more what they are capable of in favourable circumstances. This is why they are, on average, about two-thirds of a grade higher than exam-awarded grades. They are estimating the upper quartile or so of likely attainment, rather than the mean.
Unfortunately, this property (and the natural randomness of both predicted and achieved grades) has given predicted grades a terrible press for accuracy. But a prediction of the upper quartile of attainment, rather than the average, is easy to handle statistically and not a problem in itself.
In their favour as a measure of Q, predicted grades gain from being a more holistic assessment, less subject to the happenstance of an individual day, or which question comes up on the exam paper. But they will lose from being earlier in time than exams, and perhaps more subject to biases, in terms of the teacher’s overall experience and disposition. And, perhaps, their views of what indicates high potential amongst their students.
So, P clearly doesn’t equal Q either. But neither does A. The important part of the assumption is that there is no “golden ruler” to measure Q. And it follows that exam-awarded grades for individuals can’t be obtained with certainty from predicted grades (or prior attainment, or even Q if you had it). There is just too much inherent randomness across all the measures.
But both P and A do clearly hold useful information about Q. UCAS predicted grades would collapse under the weight of routinely determining university entry on their own. There aren’t enough checks and balances in the process. But, critically, the 2020 cycle UCAS predicted grades were estimated by teachers as normal. They should have the same properties as previous cycles and are the last unperturbed measure of the ability of this cohort. Although you could use predicted grades directly, it is less disruptive to employ them to build a statistical model of exam-awarded grades and use those outputs instead.
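As a sketch of what such a statistical model might look like, here is a simple least-squares fit of A on P using wholly synthetic data. The two-thirds-of-a-grade offset and the noise level are assumptions taken from the discussion above, not fitted values from any real dataset, and the points tariff is invented for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)

# Synthetic history: predicted points P per grade on a simple tariff
# (E = 1 ... A* = 6); exam-awarded points A are assumed to sit about
# two-thirds of a grade lower, with random noise on top.
p = [float(rng.randint(1, 6)) for _ in range(500)]
a = [x - 0.67 + rng.gauss(0.0, 0.8) for x in p]

# Fit A ~ P by simple least squares: slope = cov(P, A) / var(P).
p_bar, a_bar = mean(p), mean(a)
slope = sum((x - p_bar) * (y - a_bar) for x, y in zip(p, a)) / sum(
    (x - p_bar) ** 2 for x in p
)
intercept = a_bar - slope * p_bar
a_hat = [intercept + slope * x for x in p]  # modelled exam-awarded points
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The fitted intercept recovers (approximately) the assumed downward offset, and the fitted values `a_hat` play the role of the model outputs the text suggests using in place of raw predicted grades.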
Modelling exam-awarded grades: a field guide
We’ve been working with some universities, supporting them in establishing whether there are usable relationships between predicted and exam-awarded grades amongst their students. They’ve suggested we share what we’ve found works, in case it is of use to colleagues across the sector contemplating the same.
Firstly, work with the data you already have and keep your approach simple. You only need data for your own provider. Get started with people who have held a CF and subset to a group with uniform qualifications (e.g. three or more predicted A levels). We’ve used grade-level models, but within the subset of best three predicted and achieved grades (matched by rank, not subject). These perform well with good properties at grade and applicant level, and give a route to handling mixtures of predicted and achieved grades.
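The best-three, matched-by-rank pairing can be illustrated as follows. The points tariff here is purely for illustration.

```python
def best_three_matched(predicted, achieved):
    """Pair the best three predicted grades with the best three
    achieved grades by rank (1st with 1st, 2nd with 2nd, ...),
    not by subject."""
    p = sorted(predicted, reverse=True)[:3]
    a = sorted(achieved, reverse=True)[:3]
    return list(zip(p, a))

# Illustrative points tariff: A* = 6, A = 5, B = 4, C = 3, D = 2, E = 1.
# Predicted AABC, achieved ABBD:
pairs = best_three_matched([5, 5, 4, 3], [5, 4, 4, 2])
print(pairs)  # → [(5, 5), (5, 4), (4, 4)]
```

Matching by rank rather than subject sidesteps the problem of subjects being dropped or swapped between prediction and sitting, while keeping a consistent three-grade profile per applicant.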
In terms of how to structure the models, here are some things that worked well for us. The starting point is measures of the predicted points for each qualification, together with the wider context of the strength and profile of the other qualifications predicted for the offer holder.
The other key factor group is a good characterisation of the course area that the CF is held with. This captures a number of important things, including the criteria used for offers, the level of the conditional offer, the attractiveness of the university, and the mix of A level subjects that an applicant likely has. We’ve found doing this at a level that reflects academic admissions decision making, such as school or department level, works most effectively.
Build your model on several years of recent data but leave the 2019 cycle out. By not letting this year contribute to the model you can use the accuracy of the estimates for 2019 as a mimic of how well you could expect the model to do for your 2020 admissions.
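A minimal sketch of that holdout split (the field names are illustrative, not from any particular dataset):

```python
def holdout_by_cycle(rows, holdout_cycle=2019):
    """Split historical rows so the most recent complete cycle is
    held out of training; accuracy on the holdout then mimics the
    accuracy you could expect for 2020."""
    train = [r for r in rows if r["cycle"] != holdout_cycle]
    holdout = [r for r in rows if r["cycle"] == holdout_cycle]
    return train, holdout

rows = [{"cycle": c} for c in (2016, 2017, 2018, 2019, 2019)]
train, holdout = holdout_by_cycle(rows)
print(len(train), len(holdout))  # → 3 2
```

The key design choice is that 2019 contributes nothing to the fit, so its prediction errors are an honest estimate of out-of-sample performance.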
A few other variables might look useful in terms of predictive power for candidates, but we would leave them out. They are mostly characteristics of the applicant or the school they have applied from. This is a complex area, not helped by no one knowing whether any unfairness it suggests springs from the predicted or the exam-awarded side of the equation, recalling that each has biases and randomness baked in and that neither is a perfect assessment of an applicant’s potential to succeed.
What we recommend is: don’t start by putting these types of applicant characteristics in the model (which formalises treating applicants differently by their characteristics). Instead, build the model with the other factors as well as you can. Then check how the model does at predicting across these applicant groups. We haven’t found any material bias by these background characteristics so far. If you do, you’ll need to consider a few further stages. We imagine the exam board method will be alert to similar concerns.
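One simple version of that check is the mean prediction error within each applicant group. This is a sketch, with hypothetical field names; in practice you would also want to look at the spread of errors, not just the mean.

```python
from collections import defaultdict

def mean_error_by_group(rows):
    """Average (actual - modelled) exam points within each applicant
    group; means well away from zero flag potential bias for that
    group and warrant further investigation."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in rows:
        t = totals[r["group"]]
        t[0] += r["a"] - r["a_hat"]
        t[1] += 1
    return {g: s / n for g, (s, n) in totals.items()}

rows = [
    {"group": "x", "a": 5, "a_hat": 4.5},
    {"group": "x", "a": 4, "a_hat": 4.5},
    {"group": "y", "a": 3, "a_hat": 3.0},
]
print(mean_error_by_group(rows))  # → {'x': 0.0, 'y': 0.0}
```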
Confirming applicants without exam-awarded grades
Once your model is up and running you can switch it to predict for your CF holders in the 2020 cycle data, giving an estimated exam-awarded grade profile and total score for each CF holder. How you proceed from here depends on how you run your admissions.
A robust way is to rank your CF holders within courses or departments by a score, based on the estimated exam-awarded grades, that replicates your admission criteria. Then, starting at the top of the list, mark potential confirmations as far down the predicted grade profiles as you plan to recruit. Your admissions team would then review those in your normal way, including bringing in other information as appropriate. Ranking approaches have some robustness advantages here for statistical reasons. You can use absolute grade criteria too, but you might need an extra step or two to calibrate them correctly.
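The ranking step might look something like this sketch, where the course place numbers and score field are hypothetical:

```python
def rank_and_flag(holders, places):
    """Rank CF holders within each course by estimated exam-awarded
    score and flag the top n (the planned intake) as potential
    confirmations for admissions review."""
    by_course = {}
    for h in holders:
        by_course.setdefault(h["course"], []).append(h)
    flagged = set()
    for course, group in by_course.items():
        group.sort(key=lambda h: h["score"], reverse=True)
        flagged.update(h["id"] for h in group[: places[course]])
    return flagged

holders = [
    {"id": 1, "course": "maths", "score": 17.2},
    {"id": 2, "course": "maths", "score": 15.9},
    {"id": 3, "course": "maths", "score": 16.4},
]
print(rank_and_flag(holders, {"maths": 2}))  # → {1, 3}
```

Note the output is only a flag for human review, not a final decision, matching the role described in the text.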
Generally, we’ve found using these types of methods for admissions has an efficiency of 90 per cent or so against recruitment outcomes driven by the (real) exam-awarded grades. This will differ by the gradient of achieved grades you have within your CF population, and how important grades are to your admissions. The difference comes, as you would expect, from some applicants with higher predicted but lower exam-awarded grade combinations being placed, instead of students with lower predicted but higher exam-awarded grade combinations.
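That efficiency figure can be read as the overlap between the two recruited groups: the top n on modelled scores versus the top n on (real) exam-awarded scores. A sketch, with invented scores:

```python
def selection_efficiency(model_scores, exam_scores, n_places):
    """Fraction of places going to the same applicants whether you
    take the top n on modelled scores or on (real) exam-awarded
    scores."""
    def top_n(scores):
        ranked = sorted(scores, key=scores.get, reverse=True)
        return set(ranked[:n_places])

    overlap = top_n(model_scores) & top_n(exam_scores)
    return len(overlap) / n_places

model = {"a": 17.0, "b": 16.0, "c": 15.0, "d": 14.0}
exam = {"a": 16.5, "b": 14.0, "c": 15.5, "d": 16.0}
print(selection_efficiency(model, exam, 2))  # → 0.5
```

An efficiency of 1.0 would mean both selection rules admit exactly the same people; values below 1.0 reflect the higher-predicted/lower-achieved swaps described above.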
How much does this matter? It is hard to say. If you accept that both predicted grade and exam-awarded grades are imperfect measures of underlying ability, with a fair amount of random noise on top, then both recruited groups are probably broadly similar in underlying ability.
Some good options for universities
But questions about whether exam-awarded grades are better are not helpful for 2020. There will not be exam-awarded grades for this cohort. Only various forms of predictions are on the table. Since it seems both the exam board approach and what universities are in a position to do themselves will be based on teacher predictions, they may well end up similar in overall nature.
From what we know of the properties of the two approaches, it is possible to map out the broad pattern of where they are likely to be useful for university admissions.
The exam board calculation process will be strong for university admissions where between-school prediction biases are a major problem, where all the students apply to university, and where school cohorts are large and consistent from year to year. It will also have a strong advantage if student performance becomes a lot clearer between the Autumn term (when many UCAS predicted grades are made) and when teaching stopped last month. And importantly, the universal coverage means it can work for those who are not holding a CF or CI with you. If the process turns out to be controversial, it has the considerable advantage that it is primarily for others to explain and justify.
There is, though, the unavoidably novel nature of the process. In particular, teachers have never been asked for these types of predictions at this stage before, so what will happen can’t be tested in advance.
For example, imagine what might happen if teachers were more likely to recommend a student to take a conditional firm if they believed that student could meet the conditions. Since the same (genuine) belief, right or wrong, will now effectively be awarding the grades it is possible that achieved grade profiles will more closely match the conditional offers than would be the case if the (independent and somewhat random) exams were determining the grades.
This type of effect wouldn’t necessarily change the distribution of grades by subjects within a school, just exactly who got exactly what combination of grades at applicant level. So, it wouldn’t necessarily be controlled out by the proposed process as we’ve understood it. It possibly could, but there will no doubt be effects from the novelty of the process that won’t be anticipated in advance.
Estimating achievement using the pattern of predicted grades at applicant level and the university/course/offer condition choices of the applicant could do well in different situations. It would probably be strong if course/offer choice gives a powerful signal of the applicant’s view of their own ability or indicates their ambition to get into a desirable course against a demanding offer. It would also do well if the context of information across a set of predictions is useful (if the A in predicted A*A*A is more secure than the one in predicted ACC). Both of these circumstances did seem true in the cases we looked at. And it will work better for larger universities.
This approach also has the advantage that no further information is needed. Some universities might feel it was in their applicants’ best interests to take away one aspect of the uncertainty of the coming months (whether they have a place or not), giving them longer to think about whether to take it up or choose an alternative. This method allows this, most readily for those who can be shown to be almost certain to have gained entry in a “normal” year. Universities are much more intimately connected to the hopes and anxieties of offer holders than central government, and we think their judgements on how best to support applicants in this cycle should merit some weight accordingly.
So not having exam-awarded grades need not be the problem it first appears, especially if you believe that even exams are a clouded mirror of what you really want to recruit on. The provision of calculated grades across all qualifications is going to underwrite admissions this cycle. But its different objectives, and its novelty, unavoidably introduce a bit more uncertainty than usual.
Universities already hold valuable data on how well their particular offer holders would probably have done under an exam-awarded system. Data which, at least in the cases we’ve looked at, can closely replicate a “normal” year. It wouldn’t work, or be necessary, in all circumstances. But for some it could give early benchmarks, alternative perspectives, and potential triangulation to aid preparations as the results come in.