Which medicines should be used in cancer care?

Nathan Cherny MBBS, FRACP, FRCP

 

Nathan Cherny: I'm going to give an overview of the background and contents of the ESMO Magnitude of Clinical Benefit Scale (ESMO-MCBS), some of the many ways it is being used, and some of the upcoming developments we are working on with the scale. The picture is a new cancer centre, Shaare Zedek, which opened 3 weeks ago and was designed by a Canadian architect.

Well, here is the background of the problem. As everyone is aware, we've had a massive explosion of new therapeutic options in cancer, with things really erupting in the last 20 years with the development of different biological and immunotherapy options.

The major issue with these is the profound cost of these agents. In dark green are the drugs that cost between $200,000 and $500,000 a year; these (green) ones cost more than $100,000 and up to $200,000 a year, and that's 69% of the new agents launched between 2017 and 2021. This poses a major problem: no health care system can afford all of these medications, and economic sustainability demands that we prioritize and make choices. This requires the ability to sort between high-benefit and low-benefit treatments.

And this comes down to the issue of value. The value of any new treatment is determined by the magnitude of its clinical benefit balanced against its cost. What we know is that the cost of procurement and the out-of-pocket cost to consumers vary from country to country. However, how much benefit a drug provides is a constant, and this is why we have been focusing on this issue: the economic variables are ones each country will weigh according to its prevailing resources and insurance structure.

Value decision-making is important at an individual level, for patients to ask: “is it worth the effort, inconvenience, risk of adverse effects, and the out-of-pocket cost of this drug?” It is equally important for public health systems with national insurance programmes, which must ask whether the marginal improvement in the magnitude of benefit justifies the cost of the drug, because there is going to be a need to prioritize and decide which drugs to fund or not, particularly when there is a limited budget but multiple new therapeutic options, as we are confronted with each year now, with between 30 and 40 new approvals coming onto the market annually.

When I started on this project 10 years ago, most guidelines would make statements about the level of evidence, but the level of evidence only told you whether the evidence was based on a randomized phase 3 study or a meta-analysis; it gave you no information as to the magnitude of the clinical benefit, and in fact at that stage there was no standard format for grading the magnitude of benefit. It may or may not have been incorporated into the text of these guidelines, and it was often left to individual interpretation, which was sometimes idiosyncratic and sometimes frankly biased.

In fact, Chris Booth and Ian (Tannock) highlighted this in a paper in JCO in 2008, when they observed that the term “clinical benefit” is sometimes misused to describe either tumour shrinkage or stable disease in the patients enrolled in a trial. So the situation was that there was really no standard measure, and in fact many treatments offering small increments in survival or progression-free survival were being touted as the new standard of care or as breakthroughs. Now, this has consequences. This sort of hype confounds public policy decisions, it is harmful to the credibility of oncologists and of oncology research reporting, and indirectly it can harm patients or cause patient and family stress, particularly when a drug is being super-hyped and then an HTA body decides not to cover it, and patients don't understand why they're not getting this drug that everyone is calling so wonderful, a major breakthrough. The trigger case that sparked me off on this was bevacizumab for the treatment of breast cancer.

Leonard Saltz, around the same time, wrote an editorial titled “the hope, the hype, and the gap between reality and perception”, again highlighting this gap and the lack of a standard format for describing real clinical benefit.

I'll just say a few words about hype, our optimism bias, and overstatement. Optimism bias is a pervasive human tendency to evaluate circumstances more optimistically than is warranted. In the oncological world there are multiple contributing factors: we want to promote the perception of professional efficacy; the seduction of new technologies; personal investment by researchers or stockholders; issues related to reciprocity with pharma; physicians often have selective memory for their best responders; they want to promote optimism amongst patients; and sometimes they are simply impressed by the journal the publication appeared in. The bevacizumab paper in breast cancer was a lead article in the New England Journal of Medicine. Overstatement is when benefits are knowingly exaggerated, and this occurs commonly when physicians focus on the best outcomes at the expense of discussing the full range of potential or likely outcomes, and in the tendency, when discussing a new therapy with a patient, to trivialize or minimize identified risks, which are sometimes significant.

So we identified an evolving number of points of consensus: there was a need to improve the value of cancer treatments; the evaluation of value is predicated on the magnitude of benefit; unlike purchasing price and out-of-pocket consumer price, the magnitude of clinical benefit has cross-country relevance; statements regarding magnitude of benefit are prone to bias and overstatement; and there was a widespread perception of the need for a standard, valid approach to grading the likely magnitude of clinical benefit.

So this is an expression of the motivation. When I took this idea to ESMO, it took them 2 years before they actually agreed to take this project on, because initially they feared pushback from pharma; but when they did, they accepted these rationales: that there was a need for a validated, unbiased approach to evaluating the magnitude of clinical benefit; that there was a need for a disciplined approach to data interpretation, to avoid idiosyncratic readings of data; and to promote the scientific integrity of ESMO as an organization, and of oncologists in general, by reducing bias in data interpretation, reducing hype, and developing a system of evaluation which is robust, with strict adherence to standards of “accountability for reasonableness” (I'll come back to this term). And finally, in terms of public health, we wanted to provide a reliable and fair evaluation of benefit to assist in resource allocation decisions.

This is clearly going to be a public policy tool, and public policy tools need to meet certain standards, because decisions regarding prioritization and resource allocation are a matter of substantial social consequence. There is thus an ethical responsibility to adhere to justice, reasonableness, and a due process that ensures adherence to the two prior principles.

The standard for the development of public policy tools that we adhered to throughout the development process was described by the bioethicist Norman Daniels, from the Harvard School of Public Health, and is called accountability for reasonableness. This is a process to ensure and maximize the likelihood that decisions, or decisional tools, are going to be reasonable. The five requirements were that the tool needed to have clinically relevant and reasonable criteria for priorities, to be coherent, to be widely applicable, to have robust statistical validity, and to have a transparent process of development with scope for peer review, appeal, and revision.

And as a value scale we were looking for integrity: again, reasonableness, freedom from bias, validity, and transparency. Particularly with information derived from clinical trials, we need to recognize the limits of precision of the data that we have and avoid making inaccurate claims of precision. Finally, there is always a balance in a scale between inclusiveness and discernment: you want to robustly credit treatments that have a true big benefit, but you also want to be discerning and able to identify those where the benefit is limited.

We've had a multi-level development project. When we started this project in late 2014, we worked from drafts from initial work that I had been doing and that Professor Sobrero from Italy had also been doing, and we pooled our knowledge together with a group of other clinicians, biostatisticians, and researchers to develop a model. So we had a drafting team of researchers, oncologists, biostatisticians, and public health experts. We had consultations with patient representative organizations and numerous biostatisticians, and we did detailed statistical modelling of the tool. Before the initial publication we went through 13 different drafts, which were field-tested across 73 different studies and 13 disease types, and these iterations were repeated until we reached a version that appeared reasonable to everyone on the development team. We then had those scores reviewed by outside reviewers to see whether they felt the scores were reasonable, and their feedback guided any final adjustments. In fact, this process has been repeated multiple times: based on feedback and identified shortcomings, we went back, redrafted, and made amendments. There have been 2 versions of the scale thus far. The current one in use is version 1.1, and later on I'll discuss some of the things that will appear in version 2, which right now is being field-tested and peer-reviewed for reasonableness.

So in the scale there are 6 underlying premises. The first premise is that cure takes precedence over the deferral of death. Secondly, that direct endpoints, such as overall survival and improved quality of life, take precedence over surrogates, such as PFS or response rate. Thirdly, that disease-free survival or event-free survival in a curative setting is a more valid surrogate than progression-free survival is in a non-curative setting. Fourthly, that the interpretation of evidence for benefit derived from surrogates may be influenced by secondary outcome data, particularly toxicity and quality of life. Fifth, that tail-of-the-curve data can sometimes indicate important gain for a minority of responders. And finally, that data from randomized clinical trials are more credible and more robust than data derived from single-arm studies.

Throughout the scale you'll see reference to what we call the ESMO-MCBS “dual rule”, which is to say that for many of the criteria we look at both the relative benefit and the absolute benefit together: we look at the hazard ratio as well as how much absolute gain there is in progression-free survival and overall survival. But when we look at relative benefit, we look at the lower limit of the 95% confidence interval. So in scoring PFS, we look for a lower limit which is less than 0.65, and for overall survival it varies slightly depending on whether the median survival of the control arm is less than or more than 12 months.
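
To make the dual rule concrete, here is a minimal illustrative sketch in Python. This is not official ESMO code: the function name is mine, and the 1.5-month minimum gain is an assumed illustrative value (the published thresholds vary by form and prognostic group).

```python
def passes_dual_rule_pfs(hr_ci_lower: float, pfs_gain_months: float,
                         min_gain_months: float = 1.5) -> bool:
    """Illustrative ESMO-MCBS dual-rule check for a PFS endpoint.

    Both conditions must hold:
    - relative benefit: the LOWER limit of the 95% confidence interval
      of the hazard ratio is below 0.65 (note: not the point estimate);
    - absolute benefit: the gain in median PFS clears a minimum
      threshold (1.5 months here is an assumed value).
    """
    return hr_ci_lower < 0.65 and pfs_gain_months >= min_gain_months


# Example: point estimate HR 0.60, 95% CI (0.48, 0.75), 2.1-month gain
print(passes_dual_rule_pfs(hr_ci_lower=0.48, pfs_gain_months=2.1))  # True
```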

Initially, when we published this, we got a lot of pushback on why we weren't using the point estimate. We don't use the point estimate because it is often conflated with the true hazard ratio, which is unknown and which is likely to lie somewhere within the 95% confidence interval. Whenever a speaker says “the hazard ratio”, we always make a point of correcting them: that is not the hazard ratio, that is the point estimate of the hazard ratio; you have no idea exactly what the true hazard ratio is. A point-estimate system dramatically underestimates true benefit in 50% of cases, and giving a precise answer on scoring based on point estimates, which is what ASCO did with their scale, implies a degree of precision that does not truly exist. We modelled over 50,000 study scenarios to see what would best balance our criteria of inclusiveness and discernment, and we found that the best balance was when we used the lower limit of the 95% confidence interval together with the absolute benefit range, and you'll see this repeatedly through the scale.

The scale has 5 forms. In the setting of curative therapies, mainly adjuvant therapies, we use Form 1, which has the grades A, B, and C, where A and B are considered substantial benefit. In the setting of non-curative treatments, we have a range of different forms depending on the primary endpoint. If the primary endpoint is overall survival, we use Form 2a, which ends up being a 5-point scale with a highest score of 5. If the primary endpoint is progression-free survival, the scale is capped and the highest score you can get is a 4. If the primary endpoint is neither of these but is response rate or quality of life, or it is a non-inferiority study, then again the highest score you can get is a 4. Finally, we have Form 3 for single-arm studies, and that is also capped. So an A or B in the curative setting, or a score of 4 or 5 for a non-curative therapy, is considered a high score; a score of 1 or 2 is considered a low-benefit therapy.

I'm going to walk you through some of the forms. This is the form that we use in a curative setting, primarily with adjuvant therapies. If, with follow-up, there is more than a 5% gain in survival, that scores a grade A. If there is no mature survival data and the best data we have is disease-free survival, a hazard ratio of less than 0.65 can also score an A. At the lower end, if the improvement in survival at 3 years is less than 3%, if the DFS gain is very small, or if the only evidence of benefit of the adjuvant therapy is an improved pathological complete remission rate, that scores a C, the lowest level on this scale. This is the only form where there is no penalty for toxicity, and that's because the feedback we got was that patients receiving curative therapies were prepared to put up with a lot of toxicity as a trade-off for cure; this is unique to this form.
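
As a rough illustration of the Form 1 logic just described, here is a hedged Python sketch. It covers only the criteria mentioned in this talk; the published form has additional criteria, and the default grade B for intermediate cases is my assumption.

```python
from typing import Optional


def form1_grade(os_gain_pct: Optional[float] = None,
                dfs_hr_ci_lower: Optional[float] = None,
                pcr_only: bool = False) -> str:
    """Simplified Form 1 (curative/adjuvant) grading per the talk.

    A and B count as substantial benefit; uniquely, this form applies
    no toxicity penalty.
    """
    if pcr_only:
        return "C"  # improved pCR rate alone is the lowest grade
    if os_gain_pct is not None and os_gain_pct > 5.0:
        return "A"  # >5% absolute survival gain with follow-up
    if dfs_hr_ci_lower is not None and dfs_hr_ci_lower < 0.65:
        return "A"  # immature OS, but strong DFS hazard ratio
    if os_gain_pct is not None and os_gain_pct < 3.0:
        return "C"  # <3% improvement at 3 years
    return "B"      # assumed default for intermediate benefit
```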

Form 2A is the form we use in non-curative settings with an overall survival endpoint, and we have 3 different versions of the form depending on the median survival of the control arm: a version for median survival of less than 12 months, for 12 to 24 months, and for more than 24 months. You can see that in each one, to get the highest score you want a hazard ratio of less than 0.65, and here, with a median survival of less than 12 months, a gain of more than 3 months. Alternatively, an increase in 2-year survival of more than 10%: either of those criteria will get you the highest score. If your hazard ratio is more than 0.7 or the gain is less than 1.5 months, you get a 1, and there are gradations in between. If you look at the different scales for the different prognostic groups, you see the numbers vary: in the group with a median survival of more than 24 months, to get the highest score you want to see a gain in median survival of more than 9 months or an increase in 5-year survival of more than 10%. This is the prognostic weighting that is introduced into the scales. Based on this element you get a preliminary score between 1 and 4. We then ask whether a secondary endpoint shows quality-of-life improvement, or statistically significantly fewer grade 3-4 toxicities; we're not looking at blood counts or alopecia, but at things like nausea and vomiting, fatigue, and diarrhoea. If there is either improvement in quality of life or less toxicity, there is an extra bonus point, so the maximum score one can get is a 5. In the rare instance in a non-curative setting where there is a very long-term plateau in the tail, indicating that some patients may be cured, you can also give a double score based on Form 1 as well; I've only seen one instance of that, in the setting of metastatic melanoma.
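
A hedged sketch of the Form 2A arithmetic for the shortest-prognosis group (control-arm median OS below 12 months), using only the thresholds quoted above. The middle band for scores 2 and 3 is a placeholder of mine, not the published criteria.

```python
def form2a_short_prognosis(hr_ci_lower: float,
                           os_gain_months: float,
                           two_year_surv_gain_pct: float = 0.0,
                           qol_or_toxicity_bonus: bool = False) -> int:
    """Illustrative Form 2A scoring, control-arm median OS < 12 months."""
    # Top preliminary score: the dual rule (HR 95% CI lower limit < 0.65
    # AND a gain of more than 3 months) OR a >10% gain in 2-year survival.
    if ((hr_ci_lower < 0.65 and os_gain_months > 3.0)
            or two_year_surv_gain_pct > 10.0):
        prelim = 4
    # Lowest score: HR lower limit > 0.70 or a gain below 1.5 months.
    elif hr_ci_lower > 0.70 or os_gain_months < 1.5:
        prelim = 1
    else:
        prelim = 2  # placeholder: the published form has graded bands here
    # Improved QoL or significantly fewer grade 3-4 toxicities on a
    # secondary endpoint adds one bonus point, so the maximum is 5.
    return prelim + (1 if qol_or_toxicity_bonus else 0)
```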

Okay, Form 2B is used in a non-curative setting when PFS is the primary endpoint. The top has been taken off the pyramid, because the highest score you can get is a 4; that reflects the lack of predictive reliability of PFS as an endpoint for improved overall survival. Here again we have 2 versions of the form for different prognostic groups, for a median PFS of less than 6 months and for a median PFS of more than 6 months. The criteria for the preliminary score are based on the hazard ratio and the gain, but if the hazard ratio is more than 0.65, irrespective of the gain, it scores only 1, and that is true for both prognostic groups.

So the preliminary score is between 1 and 3. We then ask: did this study have an early stopping rule based on an interim analysis of survival? If a survival advantage was identified, leading to early termination of the study, there is a 1-point upgrade. Again we look at quality of life and side effects, and based on these we can adjust. When overall survival was a secondary endpoint and shows improvement, that prevails, and scoring can be done using Form 2A. When there is increased toxicity with the new drug, there is a downgrade of 1 point. If a study has shown only improved PFS, mature overall survival data show no survival advantage, and quality of life, where examined, also shows no improvement, this is a situation of failed surrogacy: the PFS is converting into neither longer nor better life, and that gets a downgrade of 1 point. If quality of life is improved, there is a 1-point upgrade, and a long-term tail with plateauing in the PFS curve can also earn a 1-point upgrade. The maximum score, however, is a 4.
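
The up- and downgrades just described can be collected into a small sketch. The flag names and the order of application are my assumptions, and the case where improved overall survival moves scoring over to Form 2A is not modelled here.

```python
def form2b_adjusted(prelim: int,
                    early_os_stop: bool = False,
                    increased_toxicity: bool = False,
                    failed_surrogacy: bool = False,
                    qol_improved: bool = False,
                    pfs_plateau: bool = False) -> int:
    """Illustrative Form 2B adjustments (PFS primary endpoint).

    prelim is the preliminary score (1-3) from hazard ratio and gain.
    """
    score = prelim
    if early_os_stop:
        score += 1  # early stopping for a survival advantage: +1
    if increased_toxicity:
        score -= 1  # more toxicity with the new drug: -1
    if failed_surrogacy:
        score -= 1  # PFS gain with no OS or QoL benefit: -1
    if qol_improved:
        score += 1  # improved quality of life: +1
    if pfs_plateau:
        score += 1  # long-term plateau in the PFS curve: +1
    return max(1, min(score, 4))  # Form 2B is capped at 4
```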

In non-inferiority studies, if non-inferiority is established, that in itself does not earn credit, because it does not demonstrate improved clinical benefit; but if non-inferiority is associated with either reduced toxicity or improved quality of life, then you can actually get a high score of 4. Response rate is a very weak surrogate for improved survival or quality of life, and the highest score one can get based on an improvement in response rate alone is a 2.
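
A minimal sketch of the two rules in this paragraph; the function shape is mine, and rendering “no credit” as the minimum score of 1 is an assumption.

```python
def noninferiority_score(reduced_toxicity_or_better_qol: bool) -> int:
    """Non-inferiority alone earns no credit (assumed minimum score);
    paired with reduced toxicity or improved QoL it can score a 4."""
    return 4 if reduced_toxicity_or_better_qol else 1


# Response rate is a weak surrogate: a score based on improved
# response rate alone is capped at 2.
RESPONSE_RATE_ONLY_CAP = 2
```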

Finally, for single-arm studies, essentially for orphan diseases or where there is high unmet need, the preliminary score is determined by both the overall response rate and the duration of response. To get the highest grade, you want to see an overall response rate of more than 60%, or an overall response rate between 20% and 60% with a very long duration of response. When the overall response rate is lower and the duration of response is less than 6 months, you get fairly low grades; that gives a preliminary grade between 1 and 3. Again, if you can show that quality of life has been improved you get an upgrade; if there is a lot of toxicity you get a downgrade; and an adequately sized phase 4 study confirming the findings can also earn an upgrade. I apologize, this is a very rushed overview of the scale; on our website we have detailed animated tutorials that teach how to use each of these forms.
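
Here is a hedged sketch of the Form 3 logic. The “very long duration of response” cutoff (9 months), the middle band, and the cap of 4 are my assumptions; the talk only states that the form is capped and that the preliminary grade runs from 1 to 3.

```python
def form3_score(orr_pct: float, dor_months: float,
                qol_improved: bool = False,
                substantial_toxicity: bool = False,
                phase4_confirmation: bool = False) -> int:
    """Illustrative Form 3 (single-arm studies) scoring per the talk."""
    # Preliminary grade from overall response rate (ORR) and
    # duration of response (DoR).
    if orr_pct > 60.0 or (20.0 <= orr_pct <= 60.0 and dor_months >= 9.0):
        prelim = 3  # high ORR, or moderate ORR with very long DoR
    elif orr_pct < 20.0 and dor_months < 6.0:
        prelim = 1  # low ORR with short DoR
    else:
        prelim = 2  # assumed intermediate band
    # Adjustments: QoL improvement or a confirmatory, adequately sized
    # phase 4 study upgrades; substantial toxicity downgrades.
    score = (prelim + int(qol_improved) + int(phase4_confirmation)
             - int(substantial_toxicity))
    return max(1, min(score, 4))  # cap value assumed
```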

The Magnitude of Clinical Benefit Scale has been successful beyond our wildest dreams. It has now been applied to more than 300 studies. The 2 primary papers have been very widely cited. The tool is increasingly being used by HTA bodies around the world. It is used in many residency programs in their teaching. The WHO is now using it to assist in the development of the essential medicines list. Some people are using it to benchmark licensing decisions and research design. Patient advocacy groups like it, and with industry we have a respectful relationship.

This is just an overview of the 340 studies we've scored; you can see that we have covered a very wide range of oncology over the last 10 years. This is an interesting breakdown of the scores: about 30% of studies are getting high scores, about 20% are getting very low scores, with a large number in the middle. In the adjuvant setting we have a sensitivity problem: overwhelmingly, based on DFS data, studies are scoring very highly, and this is something we're addressing in the next version of the scale. So, in terms of what's new: as I mentioned, version 2 of the scale is currently in field testing, and it incorporates new approaches to crediting DFS as an intermediate endpoint. Up until now, if overall survival data ultimately showed no overall survival benefit, the score would be 0, but the feedback we got from the patient community was that time without treatment or disease was valued even if there was no long-term survival benefit, so in the new system, if there is no long-term survival gain, the score will be downgraded by 1 point instead. We are also going to be annotating adjuvant therapies for treatment toxicity: the patients did not want us to penalize the treatments, but they did want the information, so we will be annotating them for acute side effects and persistent side effects, like ongoing peripheral neuropathy. We have developed the capacity to score single-arm de-escalation studies in the curative setting. We are updating our criteria for toxicity penalties in non-curative treatments, and we are going to be more stringent in giving credit to the tail of the curve, because up until now you could get credit even if very few patients remained at the time of assessment; now, you will only get credit if at least 20% of the patients on the control arm have reached the cutoff time point.

The other new thing coming along is a dedicated version of the MCBS for haematology, being developed in conjunction with the European Hematology Association. This has now finished the whole process of field testing and testing for reasonableness, it is about to be submitted for publication, and we expect it to be out probably towards the end of the year. The version we have had until now has been validated for solid tumours; this will validate it also for haematological malignancies.

We have online animated tutorials which take you through each of the forms and how to use them, with hands-on experience and examples. These are very clear and very well done, and they are an excellent teaching tool.

We were worried about a couple of things to do with credibility. We worried about issues of bias: issues related to exaggeration of benefit and underestimation of harms. We did not want to be seen giving high scores to studies that were problematic, and so we wanted a structured approach to identifying studies with elements of bias. Bias is harmful for multiple reasons: it has the potential for scientific harm and loss of generalizability; it poses reputational harm for our scale through loss of credibility; it causes direct harm through misleading expectations; and it can cause societal harm through inappropriate resource allocation to drugs that don't actually do as much as they claim.

To address this we have done 2 projects. In one, we developed standards to evaluate the quality of quality-of-life papers, because there was a lot of concern that much of the quality-of-life research may not have been as robust as it should be. And we developed a checklist to identify and annotate bias in studies, and these annotations go on our scorecards where problems are identified. For quality of life, we check that the papers have a clear hypothesis, including the chosen method and measure, the time points to be assessed, and the expected direction of change; they must have high compliance rates and a clearly stated, valid approach to dealing with missing data. Results need to follow a pre-specified methodology and need to be both statistically significant and clinically significant against a pre-stated minimal clinical difference: they must show statistical benefit and also that they have crossed the minimum clinical threshold.

For looking at bias, we have a wonderful paper published by B. Gyawali, and we have identified design issues, implementation issues, and analysis issues that can all introduce bias into studies. I'm not going to detail these now, because I have suggested I may do this as a separate talk for Consilium, but in each of the studies we look at, we look for potential sources of bias, and when they exist, as they unfortunately commonly do, particularly with crossover and inadequate post-progression therapy, this is now annotated on the scorecards.

As I mentioned, the scale is being widely used in a range of different ways, and I'm going to illustrate some of them. The scores for the 340 studies we have scored are all fully online as a free resource. This is what a scorecard looks like: it gives the details of the study, the setting, the control arm, the key data, the score, and the reference, and when there are subsequent publications, such as a full update of long-term survival or quality-of-life papers, the scores are updated as new data evolve.

The scale is being used widely in the setting of HTA. We have been using it in Israel over the last 5 years, and we published on our experience in an editorial last year in ESMO Open; it has really made the whole process of the Israeli HTA much more transparent.

The ESMO-MCBS is incorporated into all the ESMO guidelines, and the flow charts show the ESMO-MCBS score for each recommendation; you can see that in this setting the recommendations are all scoring very highly.

This is the executive summary of the last update of the WHO essential medicines list, and essentially the threshold for a drug to become a candidate for the list is that it needs to score highly on the ESMO-MCBS: an A or B in a curative setting, or a 4 or 5 in the non-curative setting. Without meeting those criteria, it is not even a candidate for discussion.

We were engaged by Kazakhstan to help them review their national cancer drug formulary and national protocols using the ESMO-MCBS and the ESMO guidelines, and this project really did a total turnaround for their national cancer plan: not only was it able to improve outcomes, it also ultimately reduced overall expenditure by removing a lot of wasteful agents.

This is a map of some of the countries around the world that have incorporated the ESMO-MCBS into their HTA processes: Brazil, Australia, Spain, Portugal, Austria, and Singapore; and the Chinese government is now in the process of translating the ESMO-MCBS for use in its public policy decision-making.

So I thought I would conclude with a couple of general comments. Firstly, no value scale is going to be perfect or to generate perfect results in all circumstances, but in order to be widely effective, a scale needs specific characteristics of integrity, including reasonableness, freedom from bias, validity, transparency, and a responsive process for valid criticism: a process which involves rigid adherence to accountability for reasonableness, which recognizes the limits of precision, and which avoids inaccurate claims of precision. This is why the scale is only a 5-point scale, and not a 10-point or 100-point scale. It needs to be inclusive, to robustly credit true big benefits, and to have discernment, to avoid over-crediting small benefits.

The scale has been impactful beyond anything we initially imagined, and it is clear now that it is very useful for HTA processes, for the development of guidelines, and for education, and it is impacting positively on organizational culture. It is really fascinating to see the impact it has had on the whole scientific culture within ESMO; I think it has been excellent for ESMO's organizational integrity. We are working with the London School of Economics, who have looked at the HTA processes of a number of countries and at the different variables that influence a positive decision, and by far the number one factor that influenced decision-making was the ESMO-MCBS score. We are increasingly consulted by industry, and by researchers who want to know what sorts of things earn credit, to help build better study designs, and we are trying to encourage medical journalists to look at this data as well, to improve the quality of medical journalism and turn down a lot of the hype we currently see in publications regarding new treatments, some of which really have only minor or marginal benefits.

The ESMO website is rich in resources about the ESMO-MCBS in terms of teaching material; all of the key papers and forms are freely available online, as are the scorecards. When people have queries, either about using the scale or to discuss a score, we welcome them: our whole development system is based on the feedback we get from end users, be they clinicians, researchers, industry, or patient groups, and this is how we move this project forward.

 

QUESTIONS AND FEEDBACK

 

Leeza Osipenko: So I have tons of questions, but before I completely hijack the floor, does anyone have questions for Nathan?

 

Andrew Dillon: That was just so impressive. I mean, the objectivity, the thought that has gone into it, and the systematic approach to defining incremental therapeutic benefit are just really very impressive. I was just curious, and I'm sorry if this came up in your presentation, whether you have an assessment of the consistency of scoring. If the tools are made available to multiple independent groups, what are you finding in terms of the consistency of their conclusions about individual interventions?

 

Nathan Cherny: There is a learning curve in using the scale, and we did initially find that inexperienced users sometimes generated incorrect scores, which is why we publish our scores online, so there is a reference score. One of the things we do in our own department is that when my trainees present at journal club, they always need to finish the presentation by going through the study and analysing what the MCBS score is; this is a way to teach them how to read and interpret data, and ultimately it gives them a structured way to put into perspective how much benefit a new treatment is bringing. We wanted a tool that anyone can pick up and use, but we do publish official scores, and occasionally someone will pick up on something that we have not, give us feedback, and we will change our scores. Our scores go through three-stage vetting before they are published: they are vetted by me, by a second clinical reviewer, and by a biostatistical team.

 

Andrew Dillon: I absolutely get that if you're using the tool for the first few times you are obviously still learning about it, and eventually a single user will become more consistent in their own approach. So I was just curious about, as it were, mature users: are they broadly reaching the same score?

 

Nathan Cherny: Absolutely, very much so. The other thing is that we only score published, peer-reviewed data, so abstracts don't get scored and unpublished data does not get scored. We sometimes have the unfortunate scenario where a drug is approved by the EMA or the FDA before the papers have been published, and we will not issue our score until the peer-reviewed paper is in the public domain.

 

Leeza Osipenko: What was the reaction of the American colleagues at ASCO, and overall in the US? How is this used there?

 

Nathan Cherny: Because the United States has no HTA process, it is unfortunately less relevant there. ASCO, as you know, developed their own value scale, the Value Framework, and we did a joint project with them; essentially, what came out of that was that there were multiple statistical flaws in their system, and at this stage, although it is still available online, they are not developing it further. The publication comparing the 2 scales came out in JCO about 2 years ago. In terms of robustness and validity, they unfortunately published their framework without doing any statistical modelling and without field testing the reasonableness of the scores it generated. So this comes back to the process we have used of accountability for reasonableness, with very careful field testing and testing for reasonableness before going to publication.

 

Leeza Osipenko: Following up on Andrew's question: for experienced users, how long does it take to grade one study?

 

Nathan Cherny: Probably less than half an hour.

 

Leeza Osipenko: Okay, very efficient. Another question: you said that it was tested on 344 trials, but then you gave examples of Kazakhstan or potential HTA agencies. If somebody does it outside your framework, how does this feed in, and how does this cross-compare?

 

Nathan Cherny: No, they asked us to come in as consultants to review their national formulary for oncology and their national protocols. We found that their formulary contained many low-value or anachronistic drugs, and many of the protocols they were using were not standard of care, and they were very open to revising both the national formulary and the national protocols based on the ESMO-MCBS scores together with the ESMO guidelines.

 

Leeza Osipenko: Are the studies that you did for them as a consultant part of the 344 that you cited for testing the tool?

 

Nathan Cherny: The experience with Kazakhstan, the whole review process we did with them, was published in ESMO Open. But the database we are using covers essentially every major trial that has led to the licensing of a new indication over the last 10 years; that is what is covered in the 340 studies we have looked at.

 

Ian Tannock: Well, obviously what you would really like to see happen is, A, for the registration agencies to start registering new drugs not on the basis of some p-value but on the basis of a good score on the scale, and you'd like to see prominent journals do the same, probably the worst one being the New England, which will publish anything where the p-value is 0.049 rather than 0.051. To what extent do you think, going forward, you could have enough influence, perhaps first with the EMA and with some of the prominent journals, to make those things come about?

 

Nathan Cherny: I mean, the charter of the EMA or the FDA is to license drugs that are effective and safe. Unfortunately they are not chartered to demand that a new drug be better than what is available: anything which is equivalent or even marginally better must, under the current charter of both agencies, be declared a marketable agent, and unless their charters change, they are going to continue to do that irrespective of thresholds, and thus the burden shifts to HTA bodies. Now, America just doesn't have an HTA process at all, and they have this crazy situation where the mandate is that Medicare must provide any FDA-licensed agent without price negotiation, and this has a knock-on effect for the rest of the world. This absolute suspension of market forces, in the most free-market country in the world, is causing a major economic aberration across the whole drug market. Unfortunately, I don't really have much expectation that it is going to affect the FDA or EMA, but it has been very impactful at the level of HTA bodies, and we learned a couple of really interesting things recently from our work with the London School of Economics. For instance, drugs that get an accelerated approval are likely to be approved by HTA bodies more slowly than drugs with a regular approval, and, not surprisingly, they are less likely to be approved at all. We also know that drugs that score highly on the MCBS are more likely to get rapid approval. But even with a high MCBS score, if the price is too high a drug can still be poor value, so cost remains a separate challenge that this tool alone does not give us a way of mitigating.

 

Andrew Dillon: The first question is in relation to engagement with HTA agencies to influence their decision-making: I wondered if you had had any interaction with the European Commission as they move towards introducing the unified clinical assessment. And secondly, a completely different question: do you think this tool has application across other disease areas?

 

Nathan Cherny: So, we are consultants to EUnetHTA, and we consult to the EMA; in fact we have just been working on feedback on a new EMA policy statement on the use of surrogate outcomes. Our tool has been validated so far only in solid tumour oncology. In the coming months it will also be available for malignant haematology, which involves a lot of agents, many of them expensive, and I think it is going to be very useful there as well. We are also starting to work with radiotherapists to develop a version of the scale for new radiotherapeutic interventions. The particular structure of the scale is based around the endpoints used in oncology, and the endpoints used in other diseases are often going to be disease-specific and idiosyncratic, but I think the concept is transportable to other diseases. I wouldn't do it personally, though, because you need to be adequately expert in the area you are evaluating.

 

John Hickman: I was just looking at something. It's older than I thought: it's from Aaron Kesselheim, “The Most Transformative Drugs of the Past 25 Years: a Survey of Physicians”. It's just a comment really; it's interesting to go back and look at that, which is a very subjective view of the value of drugs. There's a table there; let me just stop on what there is for oncology. It's not a surprise: imatinib, obviously. It might be worth you having a look at that paper, for it's rather old now, but to come back to Andrew's point, to look at some of the other drugs as well and how things have changed now that you've got this scoring system.

 

Nathan Cherny: Thank you for pointing that out; I would really like to look at it. Last year, or the year before, a group of researchers published the Desert Island Project, asking oncologists, if they were stuck on a desert island, what would be the 20 drugs they would want available for patients. Interestingly, most of them were old drugs, but the newer drugs among them were Herceptin and imatinib, and these were all drugs that scored very highly.

 

John Hickman: So there was a good concurrence between the 2?

 

Nathan Cherny: Absolutely.

 

Andrew Dillon: It's actually an observation. Nathan, you said that one thing that is happening over time is that people are returning to their original scores as new information becomes available about the drugs they first looked at. That is going to be interesting, because one of the big challenges from industry has always been that you can't really make solid, conclusive decisions about our drugs when they first get introduced, because there just isn't enough information and you're too sceptical, and all will be revealed if only you pay huge sums of money for the next 5 years. My impression is that initial decisions taken in very considered ways, using the sort of tool that you've developed, turn out to be remarkably robust over time when you come back to look at them.

Nathan Cherny: Yes, a small number of scores go either up or down over time; most stay the same. But you know, companies do not like it when their scores go down. In one case a score of 3 was given because the overall survival data were not mature, but once the mature overall survival data came out, showing improved PFS but no gain in overall survival and no improvement in quality of life, it went down to a 2, which was really where it belonged, and no one was pushing back on us for that.
