Sequencing of Genes

Thank you all for coming this Sunday morning to hear me talk about the Human Genome Project. While I know it's hard to get up and listen to this, at least I'm convinced that anyone who's here this morning must be really interested in the subject.

It's a pleasure for me to be able to speak with you, in part because I know you spent the last day or so discussing the Human Genome Project and have heard from a number of distinguished scientists here. So it's an opportunity now for me to give you some details about what it actually means, what it actually looks like, what we're actually doing, as well as give you some perspective from someone who's actually doing it, which may differ a little bit from those other scientists who are thinking about it.

Let me tell you from the outset, as Kern mentioned, the organization within the university, the McDermott Center for Human Growth and Development, is a research center which focuses on problems of human disease which are inherited or which have to do with genes, be they inherited diseases, cancer, birth defects, or others.

And within that we are carrying out what is essentially a large project funded by the U.S. government, predominantly the National Institutes of Health, but also the U.S. Department of Energy. That project is called the Genome Science and Technology Center. A group of people that has been working on this since 1990 was translocated here from California two-and-a-half years ago, and has as a goal carrying out the Human Genome Project.

The Human Genome Project is unique in the annals of science, and particularly biology. It's the first large biological project equivalent to, in some ways, the Manhattan Project that developed the atomic bomb or the attempt and successful landing of a man on the moon in the 1960s.

The Human Genome Project has really three major goals. The first is making physical and genetic maps of the human genome. The second is sequencing all of the 3 x 10^9, or three billion, nucleotides that make up the DNA molecules in human cells. And the third, which is not always obvious to the lay public, is sequencing the genomes of selected model organisms for very specific reasons. Those model organisms include brewer's yeast (Saccharomyces cerevisiae), a very obscure small worm called Caenorhabditis elegans, the fruit fly (Drosophila), the mouse, and several other organisms.

Sequencing those organisms is, in fact, as important for the future of human medicine as is sequencing the human genome. And I hope that you can appreciate from what I'll tell you why that is the case.

The Human Genome Project has a goal, which will be achieved with a confidence exceeding 99.9 percent, of having all of the 3 x 10^9, or three billion, nucleotides of the human DNA sequence determined by September 30, 2003. I'd like to emphasize to you the magnitude of that effort. The DNA sequence is, of course, a series of letters-G, A, T, and C, all repeated one after another-in a complex biological code. The information necessary to construct the kinds of organisms scientists are used to looking at, such as simple viruses or bacteria, is about two pages of the Dallas phone book, and would be two pages entirely composed of very small letters, G, A, T, and C, repeated in a specific pattern.

Brewer's yeast is about a third of the volume of the Dallas phone book. And other organisms occupy more. But the human genome would be eighty Dallas phone books completely filled with G, A, T, and C-several stacks of telephone books about this high-and an amount of information that, for scientists, would boggle the mind in being able to accumulate and being able to deal with it.

And the goals, in very specific terms, are to make a genetic map. That's the way we can find the locations of diseases that we find inherited through families. That's now been completed. The physical map, which is a schematic of what the genome looks like, was completed early last year. And the DNA sequencing of human and these other organisms began in earnest in January of this year.

I'll mention at least my perspective on the history of this unique project, which is quite interesting. It may differ from some of the things you've heard so far. Through a number of fairly important scientific discoveries in the mid-1980s, scientists found the ability and learned how to clone or to detect genes that cause diseases, when one didn't know anything about the gene or the biochemistry.

That is, if we could find a disease that segregated through a family, such as a kind of cancer or cystic fibrosis, or any disease that was clearly inherited from parents to children, it became possible, with a technique that was invented called positional cloning, to find that gene and to determine its sequence and figure out what caused that disease.

This was distinctly different than the recombinant DNA types of studies done in the 1970s where one had to know first about the substance, like growth hormone, and then figure out how to engineer bacteria to make it. The ability to find a gene, when we didn't know anything about it other than it segregates through a family, was a major scientific advance.

And because of that, over the ensuing ten years, scientists discovered the genes for a large number of important human diseases-cystic fibrosis, Huntington's disease, several types of breast cancer, lung cancer-hundreds of different inherited conditions. In fact, even today you read every few weeks in the newspaper about the gene for a new disease being discovered and described somewhere in a scientific research laboratory.

This was an amazing advance. But it also created a major ethical dilemma in the way science was done in this country, which is there are 100,000 human genes, any one of which could cause a disease. Each of these large projects to discover a disease-causing gene took hundreds of people, millions of dollars, tens of years.

And the question is how could the U.S. government and the research funding agencies pay for this kind of approach for 100,000 genes. It was impossible. Then the question is, if we can't find every one, which genes are most important? Obviously, the ones that might be the most common and the ones in our country causing the most serious health problems. But, in fact, the most important disease gene to be understood is one that runs in my family, because that's the one I care about the most.

Because of this dilemma of to whom we give the advantage when we're finding disease genes, a fairly radical idea was proposed in the mid-1980s. It was first proposed by Renato Dulbecco, a Nobel Laureate, who was, at that time, my boss and the president of the Salk Institute. He proposed that rather than finding diseases one after another based on the old paradigm, we should initiate a project-a crash project-to find all human genes in a very small amount of time-not to focus on diseases, but, first of all, to find the entire human blueprint.

The initial suggestion that he made was elaborated on by other scientists and became known as the Human Genome Project, and was an extremely controversial project in the late 1980s. But, in fact, a number of well-respected scientists, in particular James Watson and Francis Crick, managed to convince the U.S. Congress that this was not only a great idea, but it could be done, and it should be done.
And they established the Human Genome Project, which was officially initiated September 30, 1990. The goals of the project, as I've shown you, were to map and sequence all human genes. It was proposed it could be done for $3 billion, which was one dollar per base-three billion nucleotides, $3 billion-and it could be accomplished in fifteen to twenty years.

And my personal interest began in this way. One day at a meeting, James Watson came up after a talk, grabbed me aside, pulled me over to the side of the room, and said, "I want you to work on chromosome 11, and if you do I'll give you $2.5 million next year." We then began working on chromosome 11 for a variety of historical reasons.

As you'll see in our laboratory, the current effort is sequencing the entire chromosome 11, which is 150 million base pairs. Our criteria are that it will be 99.99 percent accurate. It will not be 100 percent accurate, and there's a very important reason for that. But we'll have no more than one percent in gaps because we know less than one percent is going to be extremely difficult to figure out. But those gaps will be well defined.

The reason the accuracy will be of this order is that not every person has the same DNA sequence. And if we compare your sequence with that of your neighbor sitting next to you, it will not be identical. It will differ at about one in every one thousand bases.
So if we are sequencing along, we'll never know whether these are inaccuracies or differences between two different individuals. And so at this level of accuracy, which is greater than the polymorphism rate, we would be able to extract all the information that we would need.
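
As a rough illustration of the arithmetic behind that accuracy target, the sketch below compares the expected number of sequencing errors on chromosome 11 with the expected number of real differences between two people. It is not part of the center's software; the function name is made up for this example, and the only inputs are the figures quoted above (150 million base pairs, 99.99 percent accuracy, one difference per thousand bases).

```python
def expected_count(length_bp, one_in):
    """Expected number of events when one occurs, on average, every `one_in` bases."""
    return length_bp / one_in

CHROMOSOME_11_BP = 150_000_000   # chromosome size quoted in the talk
ERROR_ONE_IN = 10_000            # 99.99 percent accurate = 1 error per 10,000 bases
POLYMORPHISM_ONE_IN = 1_000      # two people differ at about 1 base in 1,000

errors = expected_count(CHROMOSOME_11_BP, ERROR_ONE_IN)
differences = expected_count(CHROMOSOME_11_BP, POLYMORPHISM_ONE_IN)

print(f"expected sequencing errors: {errors:,.0f}")
print(f"expected person-to-person differences: {differences:,.0f}")
```

On these figures, real differences between two individuals would outnumber sequencing errors by about ten to one.
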
The final prediction of the Human Genome Project and the laboratory that you'll see later this morning came from another graduate of the Salk Institute, Michael Crichton. Michael Crichton was a post-doctoral fellow of Jonas Salk several years ago. He elected not to continue as a scientist, but became a science fiction writer, a screen writer, and a movie producer. And he wrote the book, Jurassic Park, in which he proposed a laboratory set up by a commercial company, which would sequence the DNA from dinosaurs, and it would extract the DNA from insects embedded in amber.

But in the book he described the laboratory, which was constructed as "two six-foot tall round towers in the center of the room, along the walls rows of waist-high stainless steel boxes. This is our high-tech laundromat," Crichton wrote. "The boxes are Hamachi-Hood automated gene sequencers, and they're being run by a Cray super-computer. In essence, you're standing in the middle of an incredibly powerful genetics factory."

Probably without realizing it, he described the laboratory that we constructed two years ago here at Southwestern and similar laboratories in about nine other centers around the U.S., which are designed to sequence human DNA at very high speed, using a battery of instruments that we call ABI, or Applied Biosystems-ABI 377 automated gene sequencers. You'll see a whole bunch of those. And in our case, run not by a Cray super-computer, which is obsolete, but by a Hewlett Packard Exemplar parallel processing super-computer. In essence this description from a science fiction writer is exactly what has come to pass for purposes of the Human Genome Project.

I know you've heard probably a lot about what the project is. But it essentially is taking all the human chromosomes, of which there are twenty-three, extracting the DNA, attaching it to the DNA of microorganisms, such as yeast or bacteria, as a laboratory trick in order to make lots of that material, and going through an increasingly high resolution series of lab techniques to generate maps of higher and higher resolution, and, ultimately, to come out with a complete sequence. You can see on the walls of our laboratory all of these maps stapled up on the wall as people work on different parts of it day in and day out.

It was anticipated that the first phase of the project, making the maps, would take about seven years or so. But, in fact, starting September 30, 1990, the mapping phase was finished within about five years, ahead of schedule, and, as of last year, it's completely finished.

The sequencing effort was thought then to take an increasingly large amount of time, perhaps the next ten or fifteen years. We are now in a phase which is referred to among the organizations as the pilot project phase. That means we're trying to figure out how to do it. We know very well how to sequence DNA, but we don't know how to sequence fast enough in a large enough quantity. But we're very rapidly developing the techniques to sequence at incredible rates, which would allow the complete DNA sequence to be finished, not only on schedule, but ahead of schedule.

The initial plan was to have the human genome sequence finished by September 30, 2005, which would be fifteen years. Last year it became clear that this project was going so well that it would be possible to finish it by September 30, 2003, the current date. So I can tell you all with complete confidence that this will be finished on or before September 30, 2003. I remember that date quite well because it's my youngest daughter's birthday, who was born on the day the Human Genome Project was initiated.

These organisms, the model organisms, have already succumbed to this effort and been completed. The simplest bacterium, Haemophilus influenzae, was finished in July of 1995; brewer's yeast was finished in June of last year, and, as I'll show you in a minute, had some fairly profound implications for understanding of humans; the simple worm, C. elegans, will be finished in 1998; and the human sequence in 2003.

The reason these organisms are selected is for a very specific intellectual reason. The brewer's yeast is the simplest eukaryote-the simplest organism with true chromosomes-a single cell yeast that grows in liquid media.

C. elegans is the simplest animal-the simplest multi-cellular organism. It's a very small worm that has only 1,000 cells in the entire organism. So yeast will help us understand basic biochemistry and metabolism. But the worm will help us understand how animals are constructed. And all of the other model organisms are selected for very specific reasons having to do with what they tell us about the function of human beings.

I'll now tell you a little bit about the Genome Science and Technology Center here, what the people are actually doing. Our targets are initially chromosome 11, then chromosome 15, then chromosome 14. We anticipate in Dallas sequencing somewhere between fifteen and twenty percent of the entire human genome. There will be at least another five centers who will take up the other portions of the genome so that the entire thing will be completed on schedule. The Stanford group, for instance, is working on chromosome 4, another group on chromosome 17, and the effort is coordinated to avoid duplication.

This is what I referred to as a map of the chromosome-doesn't matter to you what all these things are, but these are the landmarks with which we will begin sequencing. An investigator in the laboratory would decide to take one of these markers here, and determine the DNA sequence of a large region surrounding it.

He would go to our computer data base, call up that region in the computer. Each of these dots and lines represents a fragment of DNA, or a clone in the freezer. He'd then go to the freezer, pull out the drawer, count, pull out that particular fragment, and then subject it to a process I'll discuss later.

The Genome Science and Technology Center has really three main efforts. It's a well-structured and fairly large organization for an academic medical center. The mapping group determined the kind of maps I just showed. The sequencing group operates the high-speed DNA sequencing equipment, as well as the computers that assemble it and put it together and try to interpret it.

And because we're in a development phase, a large percent of our effort is in the area of automation. Our philosophy here, as opposed to philosophies at other universities, is to scale up the rate, but not to scale up the number of people. If we are able to sequence at a certain rate, the obvious way of increasing that is just to hire more people-double or triple the size of the group.

We prefer the idea that if we can work at a certain rate, and we can utilize robots and automated equipment, we can then vastly increase the rate without increasing the number of people. That has the advantage of both getting the genome project done quicker and being able to sequence virtually anything. DNA is DNA. And should we decide to work on some other projects after this, the infrastructure will be in place.

The Genome Science and Technology Center is on the order of fifty people right now: the mapping group, the sequencing group, cloning support groups, an automation group, computer support, administration, and so on. I'm the director. My colleague, Skip Garner, the associate director, is a Ph.D. nuclear physicist who is responsible for all of the computer support and technology development.

And the goals we have, in fact, are to develop the laboratory you'll see and the infrastructure to carry out mapping and DNA sequencing at a rate of 100 megabases a year. We are currently probably the best DNA sequencing laboratory in the world, or at least equivalent to the best, and we sequence at about four megabases a year. So we still have a substantial amount of scale-up to be done, but we have a lot of technology in place to do that, as I think you'll see.
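
To put that scale-up in perspective, here is a back-of-the-envelope sketch of how long the work would take at the current and target rates. It is only an illustration: the function and constant names are made up, and the inputs are the figures quoted in this talk (a 3,000-megabase genome, a 150-megabase chromosome 11, a fifteen to twenty percent share, and rates of 4 and 100 megabases a year).

```python
def years_to_sequence(megabases, rate_mb_per_year):
    """Years needed to sequence `megabases` at a constant yearly rate."""
    return megabases / rate_mb_per_year

GENOME_MB = 3000          # three billion bases
CHROMOSOME_11_MB = 150    # size quoted for chromosome 11
CURRENT_RATE = 4          # megabases a year today
TARGET_RATE = 100         # megabases a year, the stated goal

# The center's anticipated share: fifteen to twenty percent of the genome.
share_low_mb = GENOME_MB * 15 // 100
share_high_mb = GENOME_MB * 20 // 100

print(years_to_sequence(CHROMOSOME_11_MB, CURRENT_RATE))  # 37.5
print(years_to_sequence(CHROMOSOME_11_MB, TARGET_RATE))   # 1.5
print(years_to_sequence(share_low_mb, TARGET_RATE))       # 4.5
print(years_to_sequence(share_high_mb, TARGET_RATE))      # 6.0
```

At today's rate chromosome 11 alone would take decades, which is why so much of the effort goes into automation rather than simply running the existing process longer.
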

Obviously, our goal is to determine a portion of the human genome sequence as part of the Human Genome Project, about fifteen to twenty percent. We have about a third of the effort devoted to the development of instrumentation-robots, computer programs-to support high-throughput human genome sequencing.

So not only are we doing it, we are developing the tools. If we were sending a man to the moon, we are now in the phase of developing the rockets in order to allow us to go there.

And, finally, we are very aware that when the genome sequence is finished in 2003, which will still be within my viable scientific lifetime, we would like to be able to use that information for the benefit of scientific research and clinical strategies in the most efficient way. One of the advantages to us in having this high-tech basic science factory in the middle of a major medical center is the potential speed which discoveries in the lab can be translated into tools that could be used in the clinic.

So the organization really is five groups: mapping; sequencing; informatics, or data processing, which you'll see; supported by a resource group and an automation group. They interact in this way. The space actually looks like this. You'll walk through. As you come out this hallway here around this floor, the mapping lab is located here. We'll walk through that, through the DNA prep lab to the DNA sequencing floor, the genetics factory Michael Crichton talked about, out this door, down this corridor to these labs, and then past the informatics suite, which is located here.

As you'll see, the computer support for this is a substantial part of the entire operation. In addition to the traditional kinds of microcomputers and work stations, we operate the Hewlett Packard super-computer, which now holds the world's speed record for processing biological data and information. It's certainly the only super-computer in an academic center in this part of Texas.

As you walk through-I'll show you in a few minutes some of the pictures of what you will see. I would, of course, point out to you this is Sunday. There will be a few people working in the lab, though not a lot. Our usual schedule is to operate the sequencing lab twenty-four hours-a-day, seven days-a-week. But for the month of December, we've actually cut back on our Sunday runs. So it won't be occupied as it usually is.

I'll ask you please don't touch anything. But, more importantly, don't let anything touch you. Much of this is automated with robots, which occasionally like to reach out and grab someone. And I think they're all turned off. I was walking through a little while ago.

A normal laboratory technician in a traditional lab, such as you'll see on other floors of this building, will sit down in the morning at a laboratory bench and carry out some experiments with a pipette. One of the most dismal and boring things is extracting the DNA from microorganisms in order to sequence it. A normal technician will do maybe ten or twenty samples a day before they finish. And that's fine for most kinds of normal laboratory operations.

However, our laboratory uses so many samples-thousands a day-that we developed in our automation shop these three DNA automation robots, which are-they don't look like R2D2. It doesn't look like what you expect a robot to look like. But it's laboratory equipment controlled by a robotic arm and programmed by a computer, where one of the technicians, Lisa, can load in 200 samples into each of these machines, walk away, and it will generate the results of those in a few hours.

So our genetics factory actually is almost entirely automated at this step. And these machines for a while were operating day in and day out, making DNA. They currently don't work so often because we've made DNA from virtually every DNA sample in the entire building. So they've kind of put themselves out of business.

I'll just tell you an interesting story. These were invented by my colleague Skip Garner, and because they do DNA preps he gave them the name Dr. Prepper, which we thought was really cute. And scientists, as they usually do, like to talk about their work. We talked about this at a number of meetings, but were surprised when a local soft drink company was not amused by that. And we received several letters asking us please to change the name so that we wouldn't run the risk of anyone confusing our robots with that local soft drink. So these are now known as Prepper Ph.D. machines.

A second, new robot was installed in the fall as a collaboration with Sagian Corporation, which two weeks ago was bought out by Beckman-so we're now collaborating with Beckman. This is called an ORCA robotic arm. These devices are standard laboratory machines that a normal technician would use. These are pipetting machines here. These are thermal cyclers here. This is a plate sealer. This is a refrigerator which has a robotic door.

But this robot can move along this three-meter rail and transfer things from place to place under computer control, and can do the equivalent of what a laboratory technician would do, taking things from one place, putting them in another place, starting up the machine. And the goal of this machine is to be able to run twenty-four hours-a-day, seven days-a-week, preparing the samples to go in the sequencing machine, that would, at the present time, generate about 15,000 samples a day.

A normal technician could do maybe 200, so one machine like this can really replace a whole laboratory of very bored, uninspired people. Those bored and uninspired people can now start studying the biology of what these things mean.

Another development in the laboratory is this device, which is a DNA synthesizer. For particular parts of the sequencing project, once we know a stretch of sequence, we need to chemically synthesize a piece of that sequence in order to step down to the next portion. And this machine is programmed directly by the DNA sequencers, which can control it to make the primer for the next step.
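
The stepping-down idea described here is often called primer walking, and its core can be sketched in a few lines. This is a simplified illustration, not the center's control software: the function name, the 20-base primer length, and the 30-base offset from the end of the read are all assumptions chosen for the example.

```python
def next_walking_primer(known_sequence, primer_length=20, offset_from_end=30):
    """Pick the next primer from near the end of the sequence read so far.

    The primer is taken a little way back from the very end (an assumed
    offset), on the idea that the last bases of a read are the least reliable.
    """
    known_sequence = known_sequence.upper()
    end = len(known_sequence) - offset_from_end
    start = end - primer_length
    if start < 0:
        raise ValueError("known sequence is too short for this primer")
    return known_sequence[start:end]

# Toy read: 10 bases, then a recognizable 20-base stretch, then 30 more bases.
read = "A" * 10 + "ACGTACGTACGTACGTACGT" + "T" * 30
primer = next_walking_primer(read)
print(primer)  # ACGTACGTACGTACGTACGT
```

A real primer-design step would also check things like melting temperature and whether the primer falls in a repeated sequence, which this sketch leaves out.
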

This is a photograph of the DNA sequencing floor. Each of these boxes is an automated DNA sequencer. It works in a way I'll show you in a minute. Each has a computer. Each is located on a moveable cart with wheels, so when it breaks down we can roll it into the shop for repairs and roll in a replacement.

We were very fortunate in moving to an institution where the administration is so forward thinking and, in fact, confident about their ability to recruit from elsewhere. Skip Garner and I had designed this entire 10,000 square feet of laboratory space. It has a lot of unique features, like these ceiling plugs. It was actually under construction before we ever committed to coming to Southwestern, which shows the confidence they had that we wouldn't turn them down. This laboratory is the heart of the entire operation and is the one that, in most cases, except for this month, is running twenty-four hours-a-day.

How are we actually doing the DNA sequencing? What does it look like? Each small piece of the human chromosome-and there are 3,250 pieces that are being sequenced-each one is subjected to a biochemical reaction, which essentially uses an enzyme to make a copy of it.

One of the remarkable things about DNA is that it not only contains information, it contains the information of how to replicate itself. We can, in a test tube, add back this substance which will copy it. And when it copies the DNA sequence, we put into the reaction the components that are labeled with fluorescent dyes of four different colors-yellow, blue, green, and red. And as the DNA strand is copied those colored dyes are incorporated into that strand, and we can detect those in the automated DNA sequencing.

What the sequencer does is run these fragments, which are now labeled with colors, through the machine. They're detected by a laser. The laser uses a photo-multiplier to determine what the color is. And the computer then interprets that as the sequence. So the colors-here in this case it's black, green, black, blue, green-can be read off as A, G, C, C, A, G, A, T, and so on.

When the machines operate you may see, if they're in operating mode this morning, these kinds of colored patterns coming out. Each one of these is a small piece of a human gene. The sequence can be read off by the colors-green, yellow, yellow, green, green, yellow, green, green, and so on. That's a difficult way for us to look at it.

What the computer does is interpret it like this-as a tracing where each peak is a different base in that sequence, and the computer will interpret this as T, G, G, T, A, G, A, A, G, G, T, T, and so on, by the pattern of colors that come out past the laser.
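
The step of turning colored peaks into letters can be sketched as picking, at each peak, the dye with the strongest signal. This is a toy illustration only: the dye-to-base assignment below is an assumption (the talk does not state which color stands for which base), the intensity numbers are invented, and real base-calling software also has to find the peaks and correct for overlapping dye signals.

```python
# Assumed color-to-base assignment, for illustration only.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}

def call_bases(peaks):
    """Call one base per peak by choosing the dye with the highest intensity.

    `peaks` is a list of dicts mapping dye color to signal intensity.
    """
    calls = []
    for intensities in peaks:
        strongest_color = max(intensities, key=intensities.get)
        calls.append(DYE_TO_BASE[strongest_color])
    return "".join(calls)

# Invented intensities for six peaks passing the laser.
peaks = [
    {"green": 5,   "blue": 8,  "yellow": 10,  "red": 900},
    {"green": 12,  "blue": 4,  "yellow": 850, "red": 20},
    {"green": 9,   "blue": 3,  "yellow": 790, "red": 15},
    {"green": 7,   "blue": 11, "yellow": 6,   "red": 880},
    {"green": 910, "blue": 14, "yellow": 22,  "red": 9},
    {"green": 18,  "blue": 6,  "yellow": 800, "red": 12},
]
called = call_bases(peaks)
print(called)  # TGGTAG
```
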

And, remember, the two important things: it takes only three billion of these to give one the entire instructions for how to construct a human being, and it takes only one of these to be incorrect in order to cause a genetic defect, like cystic fibrosis or Huntington's disease or breast cancer or thousands of other diseases.

This again is not particularly useful on the large scale. And we've instigated a procedure using the Hewlett Packard super-computer for very rapidly interpreting the sequence as something we can understand. This is a collaboration with Hewlett Packard. And this is a picture of the machine. It looks nothing like a fancy computer. It looks like a big monolithic box. But it does have these nice colored lights on the side, which I think they put on just for our entertainment.

What that machine comes out with is a sequence that looks like this: T, T, C, T, C, A, and so on. This is a very small portion of the entire instructions for the human genome. You can't see by scanning through this anything other than an uninterpretable code. But the super-computer can convert that into a diagram like this, which shows us actually some idea of what's there.

It marks with these boxes, in red, anything that is a gene we already knew about-a gene already discovered for some reason and present in a large database called GenBank. It marks in blue any sequence which the computer predicts, based on the rules we've given it, must be a new gene, or is likely to be a new gene that wasn't described before.

And those things in green are sequences that are highly repetitive in the human genome-that are repeated over and over and over again. Those used to be called junk DNA, but, in fact, we know they are not junk at all. In fact, they're quite important and have a lot to do with the ability to locate disease genes. So these are important mapping tools. And lots of other kinds of information can come out as well.
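
To give a flavor of what "rules we've given it" can mean, here is one of the simplest rules a computer could apply: look for an open reading frame, a start codon (ATG) followed in the same frame by a stop codon. This is a stand-in for illustration, not the center's gene-prediction software; the function name and the minimum length are assumptions, only one strand is scanned, and finding genes in human DNA is considerably harder because genes are interrupted by non-coding stretches.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_open_reading_frames(sequence, min_codons=3):
    """Return (start, end) index pairs for ATG...stop stretches on one strand.

    `min_codons` counts the codons before the stop, including the ATG.
    """
    sequence = sequence.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return sorted(found)

example = "CCATGAAACCCGGGTAACC"
orfs = find_open_reading_frames(example)
print(orfs)                       # [(2, 17)]
print(example[2:17])              # ATGAAACCCGGGTAA
```
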

The reason that this is so important comes from something that was appreciated by scientists in April of last year when the sequence of brewer's yeast was completed. Yeast is, in fact, more similar to humans than we could possibly imagine, in that when the sequence of the entire yeast genome of about fifteen million base pairs was finished it was found to contain about 6,200 genes-actually the exact number is known. It's 6,183, I believe.

And when one categorized those genes, according to this pie chart, the genes in red are those that we already knew something about-or we know what they do. We have some idea of what their function is. And that's a very small percent of all of the genes present.

In fact, most of the other genes have never been seen before. Some of them we can guess as to what they might do. But there's a very large percent of genes here in purple where we, in fact, don't know anything about them. We don't know what they do. We don't know what their function is. And we never would have discovered them without this approach, through sequencing the human genome.

The bottom line is that the amount of information that will be new that will come out in the next few years will be incredible. And we can confidently tell the medical students at Southwestern Medical School that ninety-five percent of everything known about human genes will be discovered between now and 2003. So by the time they graduate, our view of human biology will be dramatically shifted.

There are two other aspects I'm going to mention briefly before we go out on the tour. The first is, like other large science projects, particularly like the NASA project to put a man on the moon, there are a lot of technical offshoots of this project that will be useful for things other than just the sequence of the human genome. There are hundreds of different discoveries that are having really enormous effects in pharmaceutical manufacturing, in genetic research, in forensics in legal settings, and in many, many things. I'm going to mention one offshoot of our work which we think will have a big impact-the idea of a DNA chip-because it's something else going on here.

Obviously, when we sequence the human genome, we will conceptually have resolved the information for how to make a human being into 3,000 megabases and put it on a CD-ROM in a computer somewhere. It will take up about 750 megabytes.
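
The 750-megabyte figure follows if each of the four letters is stored in two bits, so that four bases fit in one byte. The sketch below shows that packing and the resulting size. The particular bit assignment and the function names are assumptions made for this example, not a description of any specific file format.

```python
# Assumed two-bit code for each base; any fixed assignment would do.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"

def pack(sequence):
    """Pack a DNA string into bytes, four bases per byte."""
    packed = bytearray((len(sequence) + 3) // 4)
    for i, base in enumerate(sequence.upper()):
        packed[i // 4] |= BASE_TO_BITS[base] << (2 * (i % 4))
    return bytes(packed)

def unpack(packed, length):
    """Recover the first `length` bases from packed bytes."""
    return "".join(
        BITS_TO_BASE[(packed[i // 4] >> (2 * (i % 4))) & 0b11]
        for i in range(length)
    )

def packed_size_megabytes(n_bases):
    """Size in megabytes (10^6 bytes) at two bits per base."""
    return n_bases * 2 / 8 / 1_000_000

print(unpack(pack("GATTACA"), 7))            # GATTACA
print(len(pack("GATTACA")))                  # 2
print(packed_size_megabytes(3_000_000_000))  # 750.0
```
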

The next goal, or the next challenge, is how do we use that information in a very rapid way to go back into the clinic or the laboratory and read that out in any particular individual. Another way of looking at that is what we are sequencing is a generic human. Completing that project in 2003, our next goal is to figure out how to sequence a specific human, that is, how to determine the complete sequence of any individual, for medical purposes or other purposes.

A concept that a colleague of mine, Mike Heller, and I had several years ago was to do something very far out-to develop a microchip-a computer chip-that might look like this and plug into a computer and might have a test reservoir where one could put a sample of material with DNA and have that genetic material interact directly with the microprocessors. This was so far out, in fact, we couldn't get a grant to do it from any funding agency.

So we did the next best thing. We started a company, which is called Nanogen. It's in California. And they will next year release the first commercial DNA microchip. It looks something like this. It has twenty-five different test sites on this version of it. Each of these little fifty micron locations is a genetic test for a specific thing, be it an infectious disease, a genetic disease, or identity for forensic testing. The chip looks like this sitting on my finger. This particular one has sixty-four test sites in that little teeny dot in the middle.

The results look something like this. This is the set of DNA test sites. Each of these tests for a different DNA sequence. When one is positive, one of them lights up like this. It sends an electronic signal to the computer, and the answer comes out, this person is positive for this disease.
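
The logic of reading such a chip can be sketched as follows: each test site holds a short probe, and a site counts as positive when the sample contains the stretch of DNA that pairs with that probe. This is a deliberately simplified model, not a description of the Nanogen device: the names are made up, the probe sequences are invented, and real hybridization tolerates or rejects mismatches in ways an exact text match does not capture.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(sequence):
    """Return the strand that would pair with `sequence`."""
    return sequence.upper().translate(COMPLEMENT)[::-1]

def read_chip(probes, sample):
    """Return {site name: True/False} for each test site.

    A site is positive when the sample strand contains the exact sequence
    that would pair with that site's probe.
    """
    sample = sample.upper()
    return {
        name: reverse_complement(probe) in sample
        for name, probe in probes.items()
    }

# Invented probes and sample, for illustration only.
probes = {"site_1": "AGGTC", "site_2": "TTTTT"}
sample = "CCGACCTGG"
results = read_chip(probes, sample)
print(results)  # {'site_1': True, 'site_2': False}
```
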

The important thing about it is it's entirely automatic and it takes about five seconds. So it makes possible, in principle, the kind of device one sees on television on "Star Trek," where one can pull out of one's pocket a small meter, rub it against someone's skin, and read out their entire genetic makeup. And we expect that that will be possible within about ten years.

The last thing I'd like to mention, and I'm sure it has been one of the sources of discussion you've had, is that all new genetics, but particularly the Human Genome Project, has specific ethical issues which have been created or amplified because of the magnitude of information.

This is not a surprise to the scientists involved in it. In fact, it was anticipated from the beginning. And, for that reason, about five percent of all the money put into genome research has actually gone into studies of legal, ethical, and social issues, as well as education.

Some of those issues which are fairly well known and discussed include commercial impact and patenting, that is, who owns the genome and should patents be allowed on it; genetic privacy, that is, whose genome is being sequenced; genetic testing, genetic liability, predictive liability in insurance. And something which I think is not talked about at all, and which is a particular concern of mine, is the inappropriate use of this genetic information for purposes such as biological weapons.

The conventions that have been adopted as of March of last year by an international group of scientists who are actually doing this really hold two principles foremost. The first is that we have agreed that we will not patent raw genetic material from the Human Genome Project, that that is an unethical use. It doesn't mean that patents can't be held on particular uses of the sequence, but that no one will try to patent the whole genome, or copyright it.

And, secondly, that the entire sequence will be in the public domain and will be freely available to anyone who wishes to use it, which, again, brings across this concern here about the free availability of the information, even for those that would use it unethically.

These are all questions that are being discussed and that will continue to be discussed for many years. But I'll point out that there will be many other offshoots that will have good and potentially concerning issues.

One of the scariest offshoots of the Human Genome Project, which I'll just mention to you, is, in fact, the Dog Genome Project, which was initiated several years ago to, in parallel with humans, characterize the genomes of dogs. Why would this be scary, since dogs are a fairly benign organism? There are projects, of course, to work on the rice genome in Asia, the bovine genome here in Texas, and many other animals for their commercial and agricultural work.

But the dog, in my mind, has a particular concern. The concern is this. Dogs, through human civilization, have been bred for their behavioral traits and for their ability to carry out certain behavior. And their behaviors are quite distinct and quite demonstrable.

The Dog Genome Project is based on the principle of crossing a Labrador with a border collie, each of which has specific behaviors, analyzing the pedigrees, and mapping not for disease genes but for personality traits. And the kinds of personality traits that will be almost certainly located and converted into digital information will be things like eye contact with the owner, tail posture, barking, but also things like affection, response to family members in water, sociability to other dogs, aggressive behavior, and many other things.

We have learned from our studies of yeast and worms that every gene we find in the lower organism will be represented in humans, without any question. And if we find genes for affection, sociability or sociopathy in dogs, we will find equivalent genes in humans almost for sure, which would allow us, with the technological ability, to actually screen individuals for the personality traits that might be desirable or undesirable.

In my mind, the idea of screening inductees into the U.S. Army for aggressive genes is quite scary. And whether or not those things should come to pass remains to be seen.

So, with that, I'll finish. We can go on to the tour. I'll point out to you that the Human Genome Project is something that's of great excitement to all of us here. We've been working hard on it for a number of years, even before coming to Southwestern.

It sometimes strikes us as a project of immense magnitude. But, in fact, it's not all that great a thing when compared with other large structures in the universe. And when people say, "Oh, my God, three billion nucleotides, how are we going to do it?" I just point out to them that there are actually three times 10^11 stars in our galaxy. So compared with that this is actually a small project.

I thank you very much, and I'll be happy to answer any questions.