Operant reward learning in Aplysia

Björn Brembs

Abstract

Anticipating the future has a decided evolutionary advantage and many evolutionarily conserved mechanisms have been found by which humans and animals learn to predict future events. The marine snail Aplysia has been at the forefront of research into the cellular and molecular mechanisms of classical conditioning. Recently, it has also gained a reputation as a valuable model system for operant reward learning. Aplysia feeding behavior can be operantly conditioned in the intact animal as well as in reduced preparations of the nervous system. The reward signal relies on dopamine transmission and acts on known intracellular cascades to bring about operant memory in an identified neuron.

Predictive learning

As toddlers we already know how to attract our parents’ attention by pretending to cry. Learning to anticipate the consequences of our actions is central to shaping our personalities, from processing social feedback to acquiring motor skills such as a craft or handiwork. In our daily lives, much of this fundamental type of predictive learning takes place unnoticed: the brain subconsciously processes the constant stream of stimuli, assesses the importance of each one and cross-correlates them with our behavior. Some of the stimuli we encounter may have their own consequences, more or less independently of our behavior: the smell of fresh coffee brewing in the morning, the sound of a dentist’s drill in the waiting room or dark clouds before a rainstorm are all signals of what’s to come.

Obviously, we could not function without the capacity to learn the causes of future events. How does the brain accomplish this? How is the constant stream of relevant and irrelevant stimuli sorted and processed? In order to begin understanding the neurobiological processes that perform these tasks, the complexity of the environment has to be reduced to controlled, experimental circumstances. Ideally, the experiment would contain only two factors: a predictor and its consequence. Historically, such studies of predictive learning have been divided into two categories: one where the predictor is a behavior (operant or instrumental conditioning) and one where it is a stimulus (classical or Pavlovian conditioning). In both cases, the predictor is repeatedly followed by its consequence and the subject learns that relationship.

Evolutionarily conserved mechanisms

The past decades of research into learning and memory have revealed that the capacity for predictive learning is so fundamental that even animals as distant from us as worms, mollusks and insects possess it. This insight opens up the possibility of understanding certain basic human brain functions by studying these animals, because they provide many technical advantages that are unthinkable in human experimentation.

The marine mollusk Aplysia (see Fig. 1) was introduced to the neurobiology labs by Ladislav Tauc in the 1960s and popularized by Eric Kandel in the 1970s. The results gathered since then have proven so fruitful for our understanding of classical conditioning in general that Kandel was awarded the Nobel Prize in Physiology or Medicine in 2000.

Fig. 1: Aplysia californica is a gastropod mollusk (Opisthobranchia) that lives in the tidal waters off the coast of California. Aplysiids were given the common name of 'sea hare' by the ancient Greeks because of their supposed resemblance to the European hare. Aplysia grows fairly large and animals with a weight of 6 kg have been recorded. Without known specialized predators and with an effective ink defense against the occasional generalist, Aplysia is a sluggish snail with a limited behavioral repertoire.

The chief advantage of Aplysia is its large neurons. Measuring up to 1 mm, they are easy to manipulate in a variety of ways. Conveniently, Aplysia also exhibits a surprising number of different learning capacities, including associative types such as operant and classical conditioning.

It was comparatively straightforward in classical conditioning to trace the predictor (the conditioned stimulus, or CS) and its consequence (the unconditioned stimulus or US) via their sensory pathways into Aplysia’s nervous system. The learning has to take place in those neurons where the two stimuli converge (Antonov, Antonova, Kandel, & Hawkins, 2003; Walters & Byrne, 1983). In operant conditioning, the predictor (the operant behavior) is more elusive.

Aplysia feeding behavior

Since Aplysia is a snail that in its natural habitat virtually lives on its food (seaweed), it exhibits a comparatively small repertoire of spontaneous behaviors that would be suitable for operant conditioning. The logical choice, therefore, is to study feeding behavior. The situation for studying operant conditioning of Aplysia feeding behavior is almost ideal: 1. In search of food, the animals bite seemingly at random, even without any external stimulus triggering the bite (Kupfermann, 1974). 2. Much of the neural network constituting the central pattern generator (CPG) in the buccal ganglia that generates the behavior (see Fig. 2) is known in great detail (Elliott & Susswein, 2002). 3. Processing of food rewards is known to depend on the esophageal nerve (Schwarz & Susswein, 1986), which originates in the buccal ganglia, providing the necessary convergence of the behavior and the reward in those ganglia. 4. Isolated buccal ganglia continue to produce the neural patterns controlling the feeding behavior (Morton & Chiel, 1993).

Fig. 2: The neural network in the buccal ganglion controls the movements of the radula, a tongue-like organ. A – Photograph of the caudal surface of a desheathed buccal ganglion. B – Photograph of the head and mouth of Aplysia during a bite. C – Schematic representation of the coordination of movements during feeding behavior. Coordination of two sets of movements, protraction-retraction versus opening-closing, of the radula determines the type of behavior displayed. During ingestion, the two radula halves are protracted out of the animal to close around food and then pull the food into the buccal cavity during retraction. Alternatively, the radula can close on an inedible item in the buccal cavity and eject the item by protracting the radula and thereby pulling the item out of the buccal cavity (i.e., rejection). Thus, in both ingestion and rejection, radula protraction and retraction alternate, whereas radula closure shifts its phase relative to protraction-retraction. In rejection, the radula closes during protraction; in ingestion, the radula closes during retraction. D – Circuit diagram of a computer model of the key buccal neurons involved in coordinating the two sets of radula movements (courtesy of Douglas Baxter).

Recording extracellularly from the esophageal nerve in the intact animal (in vivo) during biting that fails to grasp food reveals little activity. However, when the animal is biting and swallowing seaweed, there are bursts of electrical activity in the esophageal nerve in conjunction with the swallowing movements (see Fig. 3). Presumably, the esophageal nerve transmits information about the presence of food during swallowing to the buccal ganglia (Brembs, Lorenzetti, Reyes, Baxter, & Byrne, 2002).

Fig. 3: Extracellular recordings from the anterior branch of the right esophageal nerve during ingestion behavior. A – Little activity was seen in the nerve during ingestion movements (arrows) if no food was present. B – Large ~3 s bursts of approximately 30 Hz were observed in conjunction with ingestion movements (arrows) swallowing food. Redrawn from (Brembs et al., 2002).
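To give a feel for the numbers in Fig. 3B, the following minimal sketch builds an idealized pulse train with the reported burst parameters (about 3 s at roughly 30 Hz). The function name `burst_pulse_times` and the perfectly regular pulse spacing are assumptions made for illustration; the recorded bursts are not perfectly regular, and the exact stimulation protocol is not specified here.

```python
def burst_pulse_times(start_s, duration_s=3.0, rate_hz=30.0):
    """Return onset times (in seconds) of evenly spaced pulses in one idealized burst.

    duration_s and rate_hz default to the approximate values given for Fig. 3B;
    the even spacing is a simplifying assumption, not a recorded property.
    """
    n_pulses = round(duration_s * rate_hz)
    interval_s = 1.0 / rate_hz
    return [start_s + i * interval_s for i in range(n_pulses)]


pulses = burst_pulse_times(start_s=0.0)
print(len(pulses))   # 90 pulses in one 3 s burst at 30 Hz
print(pulses[:3])    # onset times of the first three pulses
```

One such burst per rewarded bite would then correspond to a single 'virtual' swallow.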

‘Virtual’ seaweed reward

The activity in the esophageal nerve may be a reward signal. If so, mimicking this signal by contingent in vivo stimulation of the esophageal nerve immediately after each bite (see Fig. 4) should lead to an increase in biting behavior over a control group in which the animals receive the same stimulation, but independently of their behavior (yoked control). Indeed, just as if this ‘virtual’ food actually rewarded the animals for biting, contingently stimulated animals produce more bites in a test phase without any stimulation than animals in the yoked control group. This increase can be seen not only immediately after the training, but also 24 h later (Fig. 5; Brembs et al., 2002).
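The logic of the yoked control can be sketched in a few lines. The sketch assumes the usual meaning of 'yoked', namely that each control animal is paired with one contingently trained animal and receives that animal's stimulation times; all function names and bite times are made up for illustration and are not data from the study.

```python
def contingent_schedule(bite_times_s, delay_s=0.0):
    """Stimulation follows each of the animal's own bites."""
    return [t + delay_s for t in bite_times_s]


def yoked_schedule(master_stim_times_s):
    """The yoked animal gets the master's stimulation times, whatever it does itself."""
    return list(master_stim_times_s)


def fraction_following_bite(stim_times_s, bite_times_s, window_s=1.0):
    """Fraction of stimulations starting within window_s after one of the animal's bites."""
    if not stim_times_s:
        return 0.0
    hits = sum(
        any(0.0 <= s - b <= window_s for b in bite_times_s)
        for s in stim_times_s
    )
    return hits / len(stim_times_s)


# Hypothetical bite times (s) during a 10-minute (600 s) training phase.
master_bites = [12.0, 95.0, 180.0, 260.0, 410.0, 555.0]
yoked_bites = [40.0, 130.0, 300.0, 480.0]

master_stim = contingent_schedule(master_bites)
yoked_stim = yoked_schedule(master_stim)

print(len(master_stim) == len(yoked_stim))                  # same amount of stimulation
print(fraction_following_bite(master_stim, master_bites))   # 1.0: always follows a bite
print(fraction_following_bite(yoked_stim, yoked_bites))     # 0.0 for these made-up times
```

Both animals receive identical stimulation, so any difference in later bite rate can be attributed to the contingency between bite and stimulation rather than to the stimulation itself.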

Fig. 4: Operant training in intact freely moving Aplysia. Spontaneous bites are rewarded via chronically implanted extracellular electrodes on the anterior branch of the esophageal nerve. The stimulation pattern mimics the one recorded during feeding (see Fig. 3). Redrawn from (Brembs et al., 2002).

Fig. 5: Spontaneous bite rate in a 5-minute test phase after 10 minutes of training. A – Immediately after the training phase. B – 24 h after the training phase. Redrawn from (Brembs et al., 2002).

Apparently, the reward signal from the esophageal nerve converges on the behavior generated in the buccal ganglia. This reduces the task of understanding operant conditioning in Aplysia from a behavioral problem involving the entire animal to a well-characterized network of comparatively large neurons, numbering in the hundreds. Consequently, the next steps are to characterize the reward signal further and to find the neurons that are modified by the signal. Such detailed experiments require the removal of the buccal ganglia from the animal, in order to study the neurons neurophysiologically and to apply drug treatments that would not be feasible in the intact animal.

Isolated buccal ganglia in a petri dish (in vitro) containing artificial seawater continue to spontaneously produce, in seemingly random order, neural patterns of excitation (buccal motor programs, BMPs, Fig. 6A) that can be related to the different feeding-related movements in the intact animal (Morton & Chiel, 1993). If these patterns are rewarded with the same type of electrical stimulation of the esophageal nerve, in vitro operant conditioning takes place. As in the intact animal, preparations of the buccal ganglia that receive electrical stimulation after each BMP that resembles a bite in the intact animal (i.e. an ingestion-like BMP, or iBMP) produce more iBMPs than yoked control preparations (Fig. 6B; Nargeot, Baxter, & Byrne, 1997). This effect is blocked when a dopamine receptor antagonist, methyl-ergonovine, is added to the bath, implicating dopamine as the transmitter for the reward signal (Nargeot, Baxter, Patterson, & Byrne, 1999). Curiously, dopamine is also considered to be the prime transmitter for reward-related signals in humans and other mammals (e.g. Fiorillo, Tobler, & Schultz, 2003; O'Doherty, Dayan, Friston, Critchley, & Dolan, 2003).

Fig. 6: In vitro operant conditioning of buccal motor programs (BMPs). A – Examples of ingestion-like (iBMP) and rejection-like (rBMP) buccal motor programs. P, R, C: extracellular recordings from the nerves responsible for radula protraction, retraction and closure, respectively. The relative duration of large-unit activity for P (green), C (red) and R (blue) is diagrammed by colored boxes underneath the recorded traces. If most of the large-unit activity in the radula closure nerve C occurs after the end of large-unit activity in the protraction nerve P (dashed line), BMPs are classified as ingestion-like. If most of the closure activity occurs during protraction, BMPs are classified as rejection-like. Example traces were of BMPs spontaneously expressed in the same preparation. B – Spontaneous rate of iBMPs in a 10-minute test phase immediately after in vitro operant conditioning (redrawn from Nargeot et al., 1997).
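The classification rule in Fig. 6A can be written out as a small sketch. It assumes that large-unit activity has already been reduced to start and end times (in seconds), that "most" means more than half of the total closure activity, and that patterns meeting neither criterion are left unclassified; these choices and the name `classify_bmp` are illustrative assumptions, not the published procedure.

```python
def classify_bmp(protraction, closure_bursts):
    """Classify one buccal motor program from interval timing.

    protraction: (start_s, end_s) of large-unit activity in the protraction nerve P.
    closure_bursts: list of (start_s, end_s) intervals of large-unit activity
        in the closure nerve C.
    Returns "iBMP", "rBMP" or "unclassified".
    """
    p_start, p_end = protraction
    total = during = after = 0.0
    for c_start, c_end in closure_bursts:
        total += c_end - c_start
        # Closure activity overlapping the protraction phase.
        during += max(0.0, min(c_end, p_end) - max(c_start, p_start))
        # Closure activity after the end of protraction.
        after += max(0.0, c_end - max(c_start, p_end))
    if total <= 0.0:
        return "unclassified"
    if after > 0.5 * total:
        return "iBMP"   # most closure after protraction: ingestion-like
    if during > 0.5 * total:
        return "rBMP"   # most closure during protraction: rejection-like
    return "unclassified"


print(classify_bmp((0.0, 5.0), [(4.0, 5.0), (5.0, 9.0)]))  # iBMP
print(classify_bmp((0.0, 5.0), [(1.0, 4.5)]))              # rBMP
```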

Cellular mechanisms of operant reward learning

Where in the feeding CPG in the buccal ganglion does dopamine act to make it produce more iBMPs? Good candidates are neurons that can act as switches in the CPG, altering the output to produce different types of BMPs. Buccal neuron 51 (B51; Plummer & Kirk, 1990) fires late in an iBMP and is silent when the BMP resembles a movement that would reject an inedible item (a rejection-like BMP, or rBMP; Nargeot et al., 1997). If B51 is fired during a BMP by injecting depolarizing current into the neuron via an intracellular microelectrode, the BMP is more likely to become an iBMP than when no current is injected. Conversely, if B51 is silenced by injecting hyperpolarizing current during a BMP, the BMP is more likely to become an rBMP (Nargeot, Baxter, & Byrne, 1999a). Thus, B51 seems to be such a pattern-switching neuron whose state largely determines the type of pattern the CPG will produce: if B51 is easily excited and likely to fire, iBMPs are more likely to occur. If B51 is more difficult to fire, rBMPs are more likely to be produced.

Interestingly, after in vitro operant conditioning, B51 is more easily excited (lower firing threshold and higher input resistance) in preparations that received contingent reward after iBMPs than in yoked control preparations (Fig. 7A; Nargeot, Baxter et al., 1999a). Thus, one mechanism by which operant conditioning, at least in vitro, may bring about the lasting behavioral change is the modification of the electrical properties of a pattern-switching neuron, rendering the CPG more likely to produce the rewarded behavior. Indeed, if stimulations of the esophageal nerve are made contingent upon activity in B51 alone (i.e. generated by current injection without a BMP), the resulting increase in excitability alone is sufficient to reproduce some aspects of the in vitro operant conditioning procedure just described (Nargeot, Baxter, & Byrne, 1999b). It is unknown how B51 changes after rewarding rBMPs.

Fig. 7: Neuronal correlates of operant conditioning on three different levels of complexity. In all three cases, B51 was more likely to be active after operant conditioning: The amount of current needed to elicit a plateau potential (burst threshold) was lower and the deflection in the membrane potential caused by subthreshold current injections (input resistance) was higher. B51 input resistance and burst threshold after A – in vitro operant conditioning (redrawn from Nargeot, Baxter et al., 1999a). B – in vivo operant conditioning. C – operant conditioning of B51 in the single cell analogue of operant conditioning (redrawn from Brembs et al., 2002).
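The two measures in Fig. 7 follow directly from their definitions in the caption, as the sketch below shows. Input resistance is taken as the steady-state voltage deflection divided by the injected subthreshold current (Ohm's law, with mV / nA giving MΩ), and burst threshold as the smallest tested current that elicited a plateau potential. The function names and all numbers are hypothetical and chosen only for illustration.

```python
def input_resistance_mohm(delta_v_mv, current_na):
    """Input resistance in megaohms from a subthreshold current step (R = dV / I)."""
    if current_na == 0:
        raise ValueError("current_na must be non-zero")
    return delta_v_mv / current_na


def burst_threshold_na(trials):
    """Smallest injected current (nA) that elicited a plateau potential.

    trials: iterable of (current_na, elicited_plateau) pairs.
    Returns None if no tested current elicited a plateau potential.
    """
    effective = [current for current, plateau in trials if plateau]
    return min(effective) if effective else None


# Hypothetical example values, not data from the cited studies.
r_control = input_resistance_mohm(delta_v_mv=-4.0, current_na=-1.0)
r_trained = input_resistance_mohm(delta_v_mv=-6.0, current_na=-1.0)
threshold = burst_threshold_na([(2.0, False), (4.0, False), (6.0, True), (8.0, True)])

print(r_control, r_trained)  # 4.0 6.0
print(threshold)             # 6.0
```

A higher input resistance means that the same synaptic or injected current produces a larger voltage change, which is why it goes hand in hand with a lower burst threshold.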

But is B51 only relevant in the highly reduced preparation of the isolated buccal ganglia? Or does the in vitro preparation actually give us an accurate picture of the processes inside the intact animal’s central nervous system? If buccal ganglia are dissected from animals after they have undergone the in vivo operant conditioning procedure, the B51 neurons show a higher propensity to fire than B51s from yoked control animals (Fig. 7B; Brembs et al., 2002), just as after in vitro operant conditioning (Fig. 7A). Both in vivo and in vitro operant conditioning of Aplysia feeding behavior thus produce the same kind of neural correlates of the operant memory. We really can learn about the neural mechanisms of operant conditioning by studying the reduced preparations.

Single cell operant reward learning

The success story of Aplysia operant conditioning goes on to cover all levels of complexity, from the behavior, via the network and the single cell, down to the molecules involved in changing the neurons’ properties. Aplysia’s neurons are so big and robust that they can be taken out of the ganglion and put into primary cell culture. With the evidence for the convergence of a dopamine signal onto activity during iBMPs in B51, a single cell analogue of operant conditioning can be established (Brembs et al., 2002). Late in an iBMP, B51 fires a prolonged burst of action potentials. Such a burst can be triggered in the cultured B51 by injecting a short pulse of depolarizing current into the neuron, and it outlasts the current pulse by several seconds (plateau potential). Immediately following such a plateau potential, an iontophoretic pulse of dopamine is applied, mimicking the dopaminergic reward signal after an iBMP (in vitro) or a bite (in vivo). Neurons that receive such contingent dopamine applications after each of a total of 7 plateau potentials show a lower burst threshold and a higher input resistance than neurons that receive the dopamine exactly between two plateau potentials (Fig. 7C; Brembs et al., 2002). In other words, the effects of the contingent dopamine treatments parallel the effects found after both in vivo and in vitro operant conditioning (Fig. 7). The intracellular cascades involved in establishing these effects are currently under investigation. These results are consistent with the view that in the intact animal, the dopamine-mediated food reward is contingent on a B51 that is active late during the rewarded behavior. Activity-dependent plasticity in B51 leads to a modification of the biophysical properties of the neuron (higher input resistance and lower burst threshold) that make it more likely to fire and that last at least 24 hours. At least in part, these biophysical changes in B51, in turn, contribute to the increased production of bites seen after in vivo training.
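The timing logic of the single cell analogue can be laid out as a short sketch: 7 plateau potentials, with dopamine applied either immediately after each one (contingent) or midway between two of them (unpaired). Only the number of plateau potentials comes from the text; the interval between plateau potentials, the plateau duration, the handling of the final unpaired pulse and the function name are assumptions made purely for illustration.

```python
def dopamine_schedules(n_plateaus=7, interval_s=60.0, plateau_duration_s=5.0):
    """Return (plateau_onsets, contingent_times, unpaired_times) in seconds.

    interval_s and plateau_duration_s are hypothetical values; the text only
    states that 7 plateau potentials were elicited.
    """
    onsets = [i * interval_s for i in range(n_plateaus)]
    # Contingent: dopamine pulse right at the end of each plateau potential.
    contingent = [t + plateau_duration_s for t in onsets]
    # Unpaired: dopamine pulse halfway to the next plateau onset
    # (for the last plateau, halfway to where the next one would have been).
    unpaired = [t + interval_s / 2 for t in onsets]
    return onsets, contingent, unpaired


onsets, contingent, unpaired = dopamine_schedules()
print(len(contingent), len(unpaired))  # 7 7: equal number of dopamine pulses
print(contingent[:2])                  # [5.0, 65.0]
print(unpaired[:2])                    # [30.0, 90.0]
```

Both groups receive the same number of dopamine pulses and the same number of plateau potentials; only their temporal relationship differs.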

The future

It remains to be seen how far the dopaminergic actions during reward processing have actually been conserved through evolution since the ancestors of Aplysia and humans split. As was the case with classical conditioning, the pervasive, multi-level approach in Aplysia has yielded some surprising parallels and offers a potential for the progress of our understanding of these simple learning mechanisms that is unrivalled even in vertebrate research. At this time, Aplysia is the only system in which a convergence point between operant behavior and reward has been identified in the nervous system. With the knowledge about the network giving rise to the behavior and the mechanisms with which the reward acts to modify components in this network to generate operant memory, it is unparalleled also in the quality of the deduced model and the predictions it makes. However, much remains to be learned about operant reward learning of Aplysia feeding behavior. Surely, B51 cannot be the only site of plasticity in the buccal ganglia. If it is not, what is its quantitative contribution to the total learning process? Where are the other sites of plasticity? Will the mechanisms in the other sites be similar to, or very different from, those in B51? How many sites are there and at what stage of the generation of behavior are they? Where are the sites of interaction with classical conditioning, if there are any?

Studying operant reward learning in Aplysia can be especially worthwhile, since in humans reward learning can also lead to the development of maladaptive behavior patterns, such as addiction. The research into the critical mechanisms underlying reward learning has been a prominent theme of psychological and neuroscience research over the last century. It is known that the dopaminergic system is crucial for the development of most types of addiction, but the subcellular processes leading to the permanent changes in the brain are still largely unknown.

Acknowledgements

I am indebted to Sarah Peterson, Riccardo Mozzachiodi, Evangelos Antzoulatos, Gregg Phares, Mark Flynn, Fredy Reyes and Vu Huynh for commenting on an earlier version of the article, to John Byrne and Douglas Baxter for providing lab space and discussions, and to the Emmy-Noether program of the German Science Foundation (DFG) for financial support. The research was funded by NIH grant MH58321.

References:

Antonov, I., Antonova, I., Kandel, E. R., & Hawkins, R. D. (2003). Activity-Dependent Presynaptic Facilitation and Hebbian LTP Are Both Required and Interact during Classical Conditioning in Aplysia. Neuron, 37(1), 135-147.

Brembs, B., Lorenzetti, F. D., Reyes, F. D., Baxter, D. A., & Byrne, J. H. (2002). Operant reward learning in Aplysia: neuronal correlates and mechanisms. Science, 296(5573), 1706-1709.

Elliott, C. J., & Susswein, A. J. (2002). Comparative neuroethology of feeding control in molluscs. Journal of Experimental Biology, 205(Pt 7), 877-896.

Fiorillo, C. D., Tobler, P. N., & Schultz, W. (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science, 299(5614), 1898-1902.

Kupfermann, I. (1974). Feeding behavior in Aplysia: a simple system for the study of motivation. Behavioral Biology, 10(1), 1-26.

Morton, D. W., & Chiel, H. J. (1993). The timing of activity in motor neurons that produce radula movements distinguishes ingestion from rejection in Aplysia. Journal of Comparative Physiology A, 173(5), 519-536.

Nargeot, R., Baxter, D. A., & Byrne, J. H. (1997). Contingent-dependent enhancement of rhythmic motor patterns: an in vitro analog of operant conditioning. Journal of Neuroscience, 17(21), 8093-8105.

Nargeot, R., Baxter, D. A., & Byrne, J. H. (1999a). In vitro analog of operant conditioning in Aplysia. I. Contingent reinforcement modifies the functional dynamics of an identified neuron. Journal of Neuroscience, 19(6), 2247-2260.

Nargeot, R., Baxter, D. A., & Byrne, J. H. (1999b). In vitro analog of operant conditioning in Aplysia. II. Modifications of the functional dynamics of an identified neuron contribute to motor pattern selection. Journal of Neuroscience, 19(6), 2261-2272.

Nargeot, R., Baxter, D. A., Patterson, G. W., & Byrne, J. H. (1999). Dopaminergic synapses mediate neuronal changes in an analogue of operant conditioning. Journal of Neurophysiology, 81(4), 1983-1987.

O'Doherty, J., Dayan, P., Friston, K., Critchley, H., & Dolan, R. (2003). Temporal Difference Models and Reward-Related Learning in the Human Brain. Neuron, 38(2), 329-337.

Plummer, M. R., & Kirk, M. D. (1990). Premotor neurons B51 and B52 in the buccal ganglia of Aplysia californica: synaptic connections, effects on ongoing motor rhythms, and peptide modulation. Journal of Neurophysiology, 63(3), 539-558.

Schwarz, M., & Susswein, A. J. (1986). Identification of the neural pathway for reinforcement of feeding when Aplysia learn that food is inedible. Journal of Neuroscience, 6(5), 1528-1536.

Walters, E. T., & Byrne, J. H. (1983). Associative conditioning of single sensory neurons suggests a cellular mechanism for learning. Science, 219(4583), 405-408.