Steven Salzberg, PhD, is a Professor of Medicine in the McKusick-Nathans Institute of Genetic Medicine at Johns Hopkins University.
GeneWatch: Are there any particular bioinformatics or computational challenges standing in the way of whole genome sequencing becoming widely adopted in clinical care?
Steven Salzberg: I don't think that we're that far away from having the technical ability to use sequencing in the clinic. I think we're somewhat further away from having knowledge of genetic variants that are actionable, that you can really do something about. That's really where the problem is.
Right now a lot of sequencing is focusing on just doing exons, or the exome, which is only about two percent of the genome. Even if you restrict your attention to the exome, you'll still find a very large number of variants, of sequence differences, in anybody that you sequence. Typically you'll get 50,000 to 100,000 differences in the exomes alone. Many of them are just private mutations that belong to that person and people closely related to that person which have no effect on health, nothing clinically relevant. So one challenge, and this is mostly computational, is winnowing the list down to a smaller number. We have computational ways of getting that list down to, say, less than 100 variants that are likely to have some biological effect.
But then you start to come up against the limitations on our knowledge. We don't know that much about how mutations will affect the function of a gene; and even when we do know that, at a molecular level, we don't know how those changes in function would affect the person's health. We're pretty far away from being able to say, "Oh, you have this mutation? Eat more spinach."
We need a lot more knowledge about how you link those small changes in someone's genetics to the way their body responds to the environment, to nutrition, or to a drug. There are a few variants we know about that make you more or less sensitive to certain drugs or certain infections, but we don't know that many of them.
GeneWatch: So the sequencing technology is less key right now than the gene-finding technology?
Steven Salzberg: The sequencing technology, because it's gotten so much better so quickly ... it certainly was a barrier, a few years ago, but now we're at the point where it's feasible to do a good bit of sequencing for any person who needs health care, and it's not that expensive. I think we're not that far away from the day when we'll do sequencing routinely for people as part of their workup, their general physical. I think within ten years we'll see lots of sequencing done routinely.
We're working on it, but we need to get a lot more information. As a scientific community, we need to gather much more precise information, not about what the genes are, but what their functions are and how those functions translate into higher-level phenotypes.
GeneWatch: When you talk about genome sequencing happening in the near future, are you talking about whole genome or exome sequencing, or something else?
Steven Salzberg: Sort of both ... it depends on what the time scale is. Today you can sequence an exome for something like $1,200, but there's a lot of overhead that's present in exome sequencing that's not present in genome sequencing. Doing a whole genome might cost something like $4,000, but it's about fifty times as much DNA: instead of 2% of the genome, you get all of it. As the sequencing gets a little bit cheaper, it will cost about the same to do the whole genome as the whole exome, so you'll just do the whole genome at that point. That's probably only two or three years away.
At that point it becomes more of a computational problem. The actual costs of analysis are going to dominate the cost of sequencing. Storing and analyzing the data is going to cost more than capturing the data. Maybe people won't want to do whole genome sequencing because it's just so much data, because it's overwhelming, but it won't be because we can't do it.
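[Editor's aside: the price figures quoted in this answer can be put on a common footing with a little arithmetic. The sketch below uses only the approximate numbers from the interview ($1,200 for an exome covering about 2% of the genome, $4,000 for a whole genome). The function and variable names are illustrative, and the result is a rough comparison, not a pricing model.]

```python
# Approximate figures quoted in the interview; none of these are exact prices.
EXOME_COST_USD = 1200
EXOME_PERCENT_OF_GENOME = 2
GENOME_COST_USD = 4000
GENOME_PERCENT_OF_GENOME = 100


def cost_per_percent(cost_usd, percent_covered):
    """Dollars spent per one percent of the genome that gets sequenced."""
    return cost_usd / percent_covered


exome_rate = cost_per_percent(EXOME_COST_USD, EXOME_PERCENT_OF_GENOME)
genome_rate = cost_per_percent(GENOME_COST_USD, GENOME_PERCENT_OF_GENOME)

# How much more DNA a whole genome covers than an exome.
dna_ratio = GENOME_PERCENT_OF_GENOME / EXOME_PERCENT_OF_GENOME

print(f"Exome:  ${exome_rate:.0f} per percent of genome")
print(f"Genome: ${genome_rate:.0f} per percent of genome")
print(f"Whole genome covers {dna_ratio:.0f}x as much DNA")
print(f"Exome costs {exome_rate / genome_rate:.0f}x more per percent covered")
```

On these numbers the whole genome is already far cheaper per base sequenced, which is consistent with the point that the exome's fixed overhead, not the sequencing itself, keeps the two prices apart.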
GeneWatch: You hear all this talk about the "thousand-dollar genome"; a couple of people even mention "the hundred-dollar genome"...
Still, whether it's a thousand dollars or a hundred dollars, is this the wrong way to be thinking about it? Are we focusing on the wrong thing when we fixate on the cost of sequencing?
Steven Salzberg: The analysis is going to be the sticking point. It's going to be what's difficult to do. Right now the sequencing is difficult to do, and it's still out of reach for most of us, but it already looks like companies are emerging to try to provide some value added to your sequence. Direct-to-consumer genetic testing companies, for example, like 23andMe: they are not doing whole genome sequencing right now, they're doing SNP chips, so they're just interrogating your genome at a million locations. There's a lot they can tell you from that. Not very much of it is actually going to have an effect on your health, but at least it's interesting, and it's correct.
The more data we produce, the more we're going to see, I hope, entrepreneurs trying to figure out: How do we use that data to tell you something that's medically relevant and useful? But there's a lot of basic research yet to be done. We simply don't know that much about what most variants mean for you.
GeneWatch: The analysis end of it is still a work in progress.
Steven Salzberg: Right, but the good thing is that once you sequence your DNA, that's not going to change. You can use that forever, and as we learn new things, as new mutations are discovered and studied, you would be able to go back periodically and look up whether there's anything new that's relevant to your genome. We don't have any such service today, but I can see a point in the future where we would.
GeneWatch: You mentioned that one of the big costs is the storage of data. How much space does it take on a hard drive to store a whole genome sequence right now?
Steven Salzberg: The genome itself doesn't take up that much space. You can store all 3 billion base pairs in a gigabyte, which is not much these days. If you want to have all the reads, however, it's a much bigger dataset. Even if you compress it, you're looking at more like 100 to 200 gigabytes of data. That starts to be a problem. It's not easy to move around files, today, that are a couple hundred gigabytes. Networks don't have enough bandwidth.
Everybody has enough space on their own home computers to store their own genome; but if you're doing research and you're looking at hundreds of genomes, it's a real problem. You need many, many terabytes. And moving it around is even more difficult than storing it. These days, with the kind of research many of us are doing, we collaborate with a lot of different people, so we need to move the data around.
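[Editor's aside: the storage figures quoted above can be sanity-checked with a short calculation. The sketch below assumes a simple 2-bit-per-base encoding (four letters, A, C, G and T, need two bits each) and a hypothetical 100 megabit-per-second network link. Neither assumption comes from the interview; only the 3 billion base pairs and the 200-gigabyte upper figure for compressed reads do.]

```python
GENOME_BASES = 3_000_000_000       # approximate figure from the interview
READS_BYTES = 200 * 10**9          # upper end of the quoted 100-200 GB range
ASSUMED_LINK_MBPS = 100            # hypothetical link speed, not from the interview


def packed_genome_bytes(n_bases, bits_per_base=2):
    """Bytes needed to store n_bases at a fixed number of bits per base.

    Two bits per base is an assumption: enough for A, C, G and T,
    with no room for ambiguity codes or quality scores.
    """
    total_bits = n_bases * bits_per_base
    return (total_bits + 7) // 8   # round up to whole bytes


def transfer_hours(size_bytes, megabits_per_second):
    """Hours to move size_bytes over a link, ignoring protocol overhead."""
    seconds = (size_bytes * 8) / (megabits_per_second * 1_000_000)
    return seconds / 3600


genome_bytes = packed_genome_bytes(GENOME_BASES)
print(f"Packed genome: {genome_bytes / 10**9:.2f} GB")
print(f"Reads transfer at {ASSUMED_LINK_MBPS} Mbit/s: "
      f"{transfer_hours(READS_BYTES, ASSUMED_LINK_MBPS):.1f} hours")
```

Under these assumptions the assembled genome fits in under a gigabyte, in line with the figure quoted, while a single person's reads would take several hours to move. Multiply that by hundreds of genomes and the bandwidth problem described here becomes clear.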
A lot of research now is being focused on gene expression, which is even more complex. When you're looking at someone's genome for information, you're really asking: Is there anything this person was born with that could affect their health? Does it tell them anything about how to eat, or things to avoid? But there's much more information contained in your tissues themselves. Today we do a lot more than just look at the inherited variants; we also look at variations between tissues. Every one of your cells has the same DNA in it, and yet the cells obviously don't behave the same.
To understand a disease specific to one type of tissue (the liver, for example), just getting your genome might not tell us anything. We may need to actually look at the genes that are being turned on and off in the tissue that is affected. Our understanding of that kind of data is not as far along as it is for the genome itself, but we're working on it very actively. That's not where the direct-to-consumer testing is going to happen; it's very complicated.
GeneWatch: Where do you think genome sequencing can be most useful in medicine?
Steven Salzberg: I think we'll continue to see personalized medicine happening in very specific cases for a while, and that will start to convince people of the value of it.
Cancer is one of those cases. There are many treatments for cancer, but they are effective for some types of tumors and not others. That's the kind of thing I'd expect to see earlier, because cancer is such a devastating disease ... and because people spend so much money treating it, if you're looking at sequencing a genome, it doesn't add that much to the total costs.
I think we'll see that sort of thing first, as opposed to walking into your internist's office when you have a cough ... they're not going to sequence your genome. Even if you have the flu, you're not going to sequence your genome; it's the flu, it's not you. Probably the most value will be the very expensive types of medical treatments where we might be able to afford doing medical genome sequencing without really changing the cost, and maybe end up saving someone's life.