Operant Reward Learning in Aplysia: Neuronal Correlates and Mechanisms


Björn Brembs¹, Fred D. Lorenzetti¹, Fredy D. Reyes, Douglas A. Baxter,
John H. Byrne

¹ These authors contributed equally to this work.

Summary: Operant conditioning is a form of associative learning through which an animal learns about the consequences of its behavior. Here we report an appetitive operant conditioning procedure in Aplysia that induces long-term memory. Biophysical changes that accompanied the memory were found in an identified neuron (cell B51) that is considered critical for the expression of the rewarded behavior. Similar cellular changes in B51 were produced in a single-cell analogue of the operant procedure, in which B51 activity was contingently reinforced with dopamine. These findings allow a detailed analysis of the cellular and molecular processes underlying operant conditioning.

Learning about relationships between stimuli [i.e., classical conditioning; (1)] and learning about the consequences of one’s own behavior [i.e., operant conditioning; (2)] constitute the major part of our predictive understanding of the world. Although the neuronal mechanisms underlying appetitive and aversive classical conditioning are well studied (e.g., 3-8), a comparable understanding of operant conditioning is still lacking. Published reports include invertebrate aversive operant conditioning (e.g., 9-12) and vertebrate operant reward learning (e.g., 13). In several forms of learning, dopamine appears to be a key neurotransmitter involved in reward (e.g., 14). Previous research on dopamine-mediated operant reward learning in Aplysia was limited to in vitro analogues (15-18). In this report, we overcome this limitation by developing both in vivo and single-cell operant procedures, and we describe biophysical correlates of the operant memory.

The in vivo operant reward learning paradigm was developed using the consummatory phase (i.e., biting) of feeding behavior in Aplysia. This model system has several features that we hoped to exploit. The behavior occurs in an all-or-nothing manner and is thus easily quantified (see Video). The circuitry of the underlying central pattern generator (CPG) in the buccal ganglia is well characterized (e.g., 19). The anterior branch of the esophageal nerve (En₂, Fig. 1A) is both necessary and sufficient for effective reinforcement during in vivo classical conditioning and in vitro analogues of classical and operant conditioning (15-18, 20-23). Presumably, En₂ conveys information about the presence of food during ingestive behavior. We therefore investigated the role of En₂ in the reinforcement pathway by recording from it in freely behaving Aplysia via chronically implanted extracellular hook electrodes (24) (see Methods 1; Fig. 1A). Little nerve activity was observed during spontaneous biting in the absence of food (Fig. 1B1), whereas bouts (duration, ~3 s) of high-frequency (~30 Hz) activity in En₂ were recorded during the ingestion of food (Fig. 1B2). Specifically, this activity was observed in conjunction with ingestion movements of the odontophore/radula (a tongue-like organ). Electrical stimulation of En₂ might thus substitute for food reinforcement in an operant conditioning paradigm. Therefore, in vivo stimulation of En₂ at approximately the same frequency and duration as observed during feeding was made contingent upon each spontaneous bite in freely behaving animals (see Methods 2). Such a preparation is unique in studies of learning in invertebrates and is analogous to the self-stimulation procedures commonly used in rats (e.g., 13).
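The stimulation rule described above can be made concrete with a short Python sketch. It assumes evenly spaced pulses at the ~30 Hz rate and ~3 s bout duration measured in En₂, with each train starting at a detected bite; the function names and the zero onset delay are hypothetical illustrations, and the actual stimulator settings are those given in Methods 2.

```python
# Hypothetical sketch of a bite-contingent En2 stimulation schedule.
# Rate (~30 Hz) and duration (~3 s) follow the feeding recordings; evenly
# spaced pulses and a train onset exactly at the bite are assumptions.
from typing import List


def pulse_train(onset_s: float, freq_hz: float = 30.0,
                duration_s: float = 3.0) -> List[float]:
    """Return pulse times (s) for one train that begins at onset_s."""
    n_pulses = round(freq_hz * duration_s)  # 30 Hz x 3 s -> 90 pulses
    period_s = 1.0 / freq_hz
    return [onset_s + k * period_s for k in range(n_pulses)]


def contingent_schedule(bite_times_s: List[float], freq_hz: float = 30.0,
                        duration_s: float = 3.0) -> List[List[float]]:
    """One stimulation train per detected bite, starting at that bite."""
    return [pulse_train(t, freq_hz, duration_s) for t in bite_times_s]


if __name__ == "__main__":
    trains = contingent_schedule([12.0, 47.5, 90.2])
    print(len(trains), len(trains[0]))  # 3 90
```

Bites spaced more closely than the train duration would produce overlapping trains; the sketch does not resolve such overlaps.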

Fig. 1: In vivo recordings and behavioral results. (A) Schematic representation of electrode placement. (B1) Activity in En₂ during spontaneous bites in the absence of food. Three bites are depicted (arrows). (B2) Activity in En₂ during biting and swallowing behavior in the presence of food. Seven bite-swallows are shown (arrows). (C and D) Behavioral results. (C) Spontaneous bite rate in the final unreinforced test phase immediately after training. There was a significant difference among the three groups (Kruskal-Wallis ANOVA, H₂ = 9.678, p < 0.008). A post-hoc analysis revealed that the number of bites in the contingently reinforced group was significantly higher than in both the control and yoked groups (Mann-Whitney U-Tests, U = 16.5, p < 0.007 and U = 24.0, p < 0.05, respectively). The two control groups did not differ significantly (Mann-Whitney U-Test, U = 29.0, p = 0.07). (D) Spontaneous bite rate in the unreinforced test phase 24 h after the beginning of the experiment. There was a significant difference among the three groups (Kruskal-Wallis ANOVA, H₂ = 11.9, p < 0.003). The number of bites produced by the contingent reinforcement group was higher than in both control groups (Mann-Whitney U-Tests: versus control, U = 1.5, p < 0.009; versus yoke, U = 0.0, p < 0.004). The two control groups were not significantly different (Mann-Whitney U-Test, U = 9.5, p = 0.17). In this and subsequent illustrations, bar graphs display means ± s.e.m.
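The legend reports Kruskal-Wallis H statistics (the subscript 2 is the degrees of freedom, k - 1 for three groups) and Mann-Whitney U values. The following is a minimal pure-Python sketch of both rank statistics, using average ranks for ties; it is an illustration, not the authors' analysis code. It also assumes, without confirmation from the text, that the Kruskal-Wallis p-values came from the chi-square approximation: for 2 degrees of freedom the upper tail is exactly exp(-H/2), which is consistent with the reported bounds (H₂ = 9.678 gives p ≈ 0.0079; H₂ = 11.9 gives p ≈ 0.0026). Mann-Whitney p-values for samples this small usually come from exact tables and are not computed here.

```python
# Sketch of the rank statistics named in the Fig. 1 legend.
# Function names are illustrative; this is not the authors' analysis code.
import math
from collections import Counter
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1.0 - tie_term / (n ** 3 - n))


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """Smaller of the two U statistics for samples x and y."""
    ranks = average_ranks(list(x) + list(y))
    u_x = sum(ranks[:len(x)]) - len(x) * (len(x) + 1) / 2.0
    return min(u_x, len(x) * len(y) - u_x)


def chi2_sf_df2(h: float) -> float:
    """Upper-tail chi-square probability for 2 df, which equals exp(-h/2)."""
    return math.exp(-h / 2.0)
```

Rank-based tests were a natural fit here because bite counts are small, discrete, and not necessarily normally distributed.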

One day after implanting the electrodes, animals were assigned to one of three groups: i) a control group without any stimulation, ii) a contingent reinforcement group for which each bite during training was followed by En₂ stimulation, or iii) a yoked control group that received the same sequence of stimulations as the contingent group, but the sequence was uncorrelated with their behavior (25). Animals that had been contingently reinforced showed significantly more spontaneous bites during a five-minute test period than both control groups, regardless of whether they were tested immediately after training (Fig. 1C) or 24 h later (Fig. 1D). These results indicate that during ten minutes of contingent stimulation, the animals acquired an operant memory that lasted for at least 24 h.
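The logic of the three-group design can be summarized in a short, hypothetical Python sketch: a contingent animal is stimulated at its own bites, a yoked animal receives its partner's stimulation sequence regardless of its own biting, and an unstimulated control receives nothing. The class and function names, and the one-second window used to score a bite as "followed by stimulation", are assumptions chosen for illustration.

```python
# Hypothetical sketch of the contingent / yoked / control assignment.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Animal:
    group: str                # "control", "contingent", or "yoked"
    bite_times_s: List[float]


def stimulation_onsets(animal: Animal,
                       partner: Optional[Animal] = None) -> List[float]:
    """Stimulation onset times delivered to `animal` during training."""
    if animal.group == "control":
        return []
    if animal.group == "contingent":
        return list(animal.bite_times_s)      # each bite triggers stimulation
    if animal.group == "yoked":
        if partner is None or partner.group != "contingent":
            raise ValueError("a yoked animal needs a contingent partner")
        return list(partner.bite_times_s)     # partner's sequence, replayed
    raise ValueError(f"unknown group: {animal.group}")


def fraction_reinforced(bites: List[float], onsets: List[float],
                        window_s: float = 1.0) -> float:
    """Fraction of bites followed by a stimulation onset within window_s."""
    if not bites:
        return 0.0
    hits = sum(any(0.0 <= on - b <= window_s for on in onsets) for b in bites)
    return hits / len(bites)


if __name__ == "__main__":
    contingent = Animal("contingent", [10.0, 40.0, 70.0])
    yoked = Animal("yoked", [25.0, 55.0])
    onsets = stimulation_onsets(yoked, partner=contingent)
    print(fraction_reinforced(contingent.bite_times_s,
                              stimulation_onsets(contingent)))  # 1.0
    print(fraction_reinforced(yoked.bite_times_s, onsets))      # 0.0
```

Both animals in a pair receive the same amount of stimulation; only the temporal relation between bites and stimulation differs, which is what lets the comparison isolate the contingency.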

We next sought to identify changes in the nervous system that were associated with the behavioral modification. The neural activity that underlies the radula movements during feeding is generated by the buccal CPG. This neural network consists of sensory neurons, interneurons, and motor neurons that continue to produce buccal motor patterns (BMPs) even when the ganglia are removed from the animal (15). In the intact animal, ingestion-like BMPs correspond to radula movements that transport food through the buccal mass into the foregut, whereas rejection-like BMPs correspond to radula movements that remove inedible objects from the foregut (24). Buccal neuron B51 is pivotal for the selection of BMPs. Specifically, B51 exhibits a characteristic, sustained, all-or-none level of activity (plateau potential) during ingestion-like BMPs. Moreover, B51 can gate transitions between BMPs: direct depolarization of B51 leads to the production of ingestion-like BMPs, whereas hyperpolarization inhibits ingestion-like BMPs (18). We thus examined whether the observed increase in the number of bites was associated with an increase in the excitability of B51.

To test the hypothesis that B51 is a site of memory storage for operant conditioning, another set of animals was conditioned (26). Immediately after the last training period, the animals were anaesthetized and dissected, and the buccal ganglia were prepared for intracellular recording (see Methods 3). Resting membrane potential, input resistance, and burst threshold were measured in B51. Burst threshold was defined as the amount of depolarizing current needed to elicit a plateau potential (see also 16, 18). Cells from the contingent group exhibited a significant decrease in burst threshold (Fig. 2A) and a significant increase in input resistance (Fig. 2B) compared with cells from the yoked control group. The resting membrane potential did not differ between the groups (27). Both the decrease in burst threshold and the increase in input resistance raise the probability that B51 becomes active, and thus the probability that a BMP is ingestion-like. Our data validate an in vitro analogue of operant conditioning in isolated buccal ganglia (16) and extend the research to operant conditioning in freely moving Aplysia.
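The two B51 measurements can be expressed as simple computations. In this Python sketch (function names and example values are hypothetical), input resistance follows Ohm's law, R = ΔV/I, so a deflection in millivolts divided by a current in nanoamperes gives megohms; burst threshold is taken as the smallest injected current in a series of depolarizing steps that evoked a plateau potential.

```python
# Sketch of the two B51 measurements; names and example values are illustrative.
from typing import List, Optional, Tuple


def input_resistance_mohm(delta_v_mv: float, current_na: float) -> float:
    """Ohm's law R = dV / I; millivolts per nanoampere equals megohms."""
    if current_na == 0:
        raise ValueError("injected current must be non-zero")
    return delta_v_mv / current_na


def burst_threshold_na(trials: List[Tuple[float, bool]]) -> Optional[float]:
    """Smallest current (nA) whose trial evoked a plateau potential.

    `trials` holds (injected_current_na, plateau_evoked) pairs.
    Returns None if no trial evoked a plateau.
    """
    evoking = [current for current, plateau in trials if plateau]
    return min(evoking) if evoking else None


if __name__ == "__main__":
    # Hyperpolarizing pulse: -1 nA producing a -10 mV deflection -> 10 megohms.
    print(input_resistance_mohm(-10.0, -1.0))  # 10.0
    steps = [(2.0, False), (4.0, False), (6.0, True), (8.0, True)]
    print(burst_threshold_na(steps))  # 6.0
```

The two measures point the same way: with a higher input resistance, the same injected current produces a larger depolarization, so less current is needed to reach the plateau.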

Fig. 2: Changes in burst threshold and input resistance in B51 after operant training. (A) Burst threshold. (A1 and A2) Intracellular recordings from B51 cells from a matched pair of contingently reinforced and yoked control animals. Depolarizing current pulses were injected into each B51 until the cell generated a plateau potential. In this example, a 6 nA current pulse was sufficient to generate a plateau potential in B51 from a contingently reinforced animal (A1), whereas 14 nA was required to generate a plateau potential in B51 from the corresponding yoked control animal (A2). (A3) Summary data. B51 cells from the contingent reinforcement group required significantly less current to elicit the plateau potential (Mann-Whitney U Test, U = 59.5, p < 0.03). (B) Input resistance. (B1 and B2) Intracellular recordings from B51 cells from a contingently reinforced and a yoked control animal. Hyperpolarizing current pulses were injected into B51, and the cells’ input resistance was measured. In this example, the membrane potential of B51 from a contingently trained animal (B1) deflected more in response to the current pulse than that of B51 from a yoked control animal (B2). (B3) Summary data. B51 input resistance was significantly increased in contingently reinforced animals (Mann-Whitney U Test, U = 37.0, p < 0.002).

Although the expression of intrinsic changes in the membrane properties of B51 was associated with operant conditioning, the maintenance of these changes could be due to extrinsic factors, such as a tonic change in modulatory input to B51. If so, the locus of the associative neuronal mechanism may be upstream of B51. Moreover, because B51 is active during ingestion-like BMPs, the changes in B51 could be an effect of repeated activation rather than a cause of the larger number of bites produced by operantly conditioned animals compared with yoked controls. To resolve this question, we isolated the neuron in primary cell culture and developed a single-cell analogue of the operant procedure. B51 neurons were removed from naïve Aplysia and cultured (see Methods 4). Dopamine mediates reinforcement in an in vitro analogue of operant conditioning (17), and En₂ is rich in dopamine-containing processes (28). Therefore, reinforcement was mimicked by a brief (6 s) iontophoretic “puff” of dopamine onto the neuron. Because B51 exhibits a plateau potential during each ingestion-like BMP, this reinforcement was made contingent upon a plateau potential elicited by injection of a brief depolarizing current pulse. In the ganglion, contingent reinforcement of such B51 activity with En₂ stimulation is sufficient for in vitro operant conditioning (18). Two experimental groups were examined. Guided by previous work on in vitro operant conditioning (18), we administered seven suprathreshold current pulses over a ten-minute period to a contingent reinforcement group, and dopamine was iontophoresed immediately after cessation of each plateau potential. An unpaired group received the same number of depolarizations and puffs of dopamine, but dopamine iontophoresis was delayed by 40 s after the plateau potential. Compared with the unpaired group, contingent application of dopamine produced a significant decrease in burst threshold (Fig. 3A) and a significant increase in input resistance (Fig. 3B). Apparently, processes intrinsic to B51 are responsible for the induction and maintenance of the biophysical changes associated with operant reward learning.
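The timing of the single-cell procedure can be laid out as a schedule. This Python sketch takes the values stated above (seven pulses in ten minutes, a 6 s dopamine puff, and a 40 s delay for the unpaired group) and assumes, for illustration only, evenly spaced pulses and a fixed 5 s plateau duration. In the experiments the puff was timed from the end of each elicited plateau, whose length varies, so the names and fixed durations here are hypothetical.

```python
# Hypothetical schedule for the single-cell analogue of operant training.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Trial:
    pulse_onset_s: float     # depolarizing pulse that elicits a plateau
    plateau_end_s: float     # assumed end of the plateau potential
    dopamine_on_s: float     # start of the iontophoretic puff
    dopamine_off_s: float    # end of the puff


def single_cell_schedule(group: str, n_pulses: int = 7,
                         session_s: float = 600.0, plateau_s: float = 5.0,
                         puff_s: float = 6.0,
                         unpaired_delay_s: float = 40.0) -> List[Trial]:
    """Contingent: puff at plateau end. Unpaired: puff delayed by 40 s."""
    if group == "contingent":
        delay_s = 0.0
    elif group == "unpaired":
        delay_s = unpaired_delay_s
    else:
        raise ValueError(f"unknown group: {group}")
    spacing_s = session_s / n_pulses  # even spacing is an assumption
    trials = []
    for k in range(n_pulses):
        onset = k * spacing_s
        plateau_end = onset + plateau_s
        da_on = plateau_end + delay_s
        trials.append(Trial(onset, plateau_end, da_on, da_on + puff_s))
    return trials
```

Both groups receive the same number of depolarizations and the same total dopamine exposure; only the interval between plateau offset and puff differs, mirroring the contingent versus unpaired comparison.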

Fig. 3: Contingency-dependent changes in burst threshold and input resistance in cultured B51. (A) Burst threshold. (A1 and A2) Intracellular recordings from a pair of contingently reinforced and unpaired neurons. Depolarizing current pulses were injected into B51 before (Pre-Test) and after (Post-Test) training. In this example, contingent reinforcement led to a decrease in burst threshold from 0.8 nA to 0.5 nA (A1), whereas the threshold remained at 0.7 nA in the corresponding unpaired cell (A2). (A3) Summary data. The contingently reinforced cells had significantly decreased burst thresholds (Mann-Whitney U-Test, U = 0.0, p < 0.004). (B) Input resistance. (B1 and B2) Intracellular recordings from a pair of contingently reinforced and unpaired control neurons. Hyperpolarizing current pulses were injected into B51 before (Pre-Test) and after (Post-Test) training. In this example, contingent reinforcement led to an increased deflection of the B51 membrane potential in response to the current pulse (B1), whereas the deflection remained constant in the corresponding unpaired cell (B2). (B3) Summary data. The contingently reinforced cells had significantly increased input resistances (Mann-Whitney U-Test, U = 3.5, p < 0.03).

The combination of rewarding a simple behavior with physiologically realistic in vivo stimulation identified neuron B51 as one site where operant behavior and reward converge (see Discussion). The results presented here suggest that intrinsic, cell-wide plasticity contributes to operant reward learning. Such cell-wide plasticity is also associated with operant conditioning in insects (10). Although B51 is a key element in the neural circuit for feeding, the quantitative contribution of the changes in B51 to the expression of the behavioral changes remains to be elucidated. Given the number of neurons in the feeding CPG (19), B51 is unlikely to be the only site of plasticity during operant conditioning (nor is cell-wide plasticity likely to be the only mechanism). However, the persistence of contingency-dependent, cell-wide plasticity in B51 across successively reduced preparations (the intact animal, the isolated ganglia, and the single cultured neuron) suggests an important role for this mechanism.

Research on Aplysia has provided key insights into evolutionarily conserved mechanisms of aversive conditioning. The utility of this model system for learning and memory has now been extended to dopamine-mediated reward learning at the behavioral, network, and cellular levels. Our study adds to a growing body of literature showing that dopamine is an evolutionarily conserved transmitter used in reward systems. Future research on Aplysia will provide insights into the subcellular effects of dopamine reward, an area currently under intense investigation in vertebrates (e.g., 8, 13).

References and Notes

1. I. P. Pavlov, Conditioned reflexes (Oxford University Press, Oxford, 1927).
2. B. F. Skinner, The behavior of organisms (Appleton, New York, 1938).
3. E. T. Walters, J. H. Byrne, Science 219, 405 (1983).
4. R. D. Hawkins, T. W. Abrams, T. J. Carew, E. R. Kandel, Science 219, 400 (1983).
5. M. Hammer, Nature 366, 59 (1993).
6. J. J. Kim, D. J. Krupa, R. F. Thompson, Science 279, 570 (1998).
7. T. Zars, M. Fischer, R. Schulz, M. Heisenberg, Science 288, 672 (2000).
8. P. Waelti, A. Dickinson, W. Schultz, Nature 412, 43 (2001).
9. P. R. Benjamin, K. Staras, G. Kemenes, Learn. Mem. 7, 124 (2000).
10. G. Hoyle, Trends Neurosci. 2, 153 (1979).
11. D. Botzer, S. Markovich, A. J. Susswein, Learn. Mem. 5, 204 (1998).
12. D. G. Cook, T. J. Carew, J. Neurosci. 9, 3115 (1989).
13. J. N. J. Reynolds, B. I. Hyland, J. R. Wickens, Nature 413, 67 (2001).
14. W. Schultz, Nature Rev. Neurosci. 1, 199 (2000).
15. R. Nargeot, D. A. Baxter, J. H. Byrne, J. Neurosci. 17, 8093 (1997).
16. R. Nargeot, D. A. Baxter, J. H. Byrne, J. Neurosci. 19, 2247 (1999).
17. R. Nargeot, D. A. Baxter, G. W. Patterson, J. H. Byrne, J. Neurophysiol. 81, 1983 (1999).
18. R. Nargeot, D. A. Baxter, J. H. Byrne, J. Neurosci. 19, 2261 (1999).
19. E. C. Cropper, K. R. Weiss, Curr. Opin. Neurobiol. 6, 833 (1996).
20. H. A. Lechner, D. A. Baxter, J. H. Byrne, J. Neurosci. 20, 3369 (2000).
21. H. A. Lechner, D. A. Baxter, J. H. Byrne, J. Neurosci. 20, 3377 (2000).
22. M. Schwarz, A. J. Susswein, J. Neurosci. 6, 1528 (1986).
23. R. Mozzachiodi, H. Lechner, D. Baxter, J. Byrne, paper presented at the 31st Annual Meeting of the Society for Neuroscience, San Diego, CA, 13 November 2001.
24. D. W. Morton, H. J. Chiel, J. Comp. Physiol. A 172, 17 (1993).
25. A Kruskal-Wallis analysis of variance (ANOVA) determined that the number of bites did not differ among the three groups during an initial five-minute pre-test period without reinforcement (control: 13.1 bites; contingent: 10.5 bites; yoke: 15.1 bites; H₂ = 2.306, p = 0.32, N = 49). Differences in bite frequency between the groups began to emerge during training: biting increased during training in the contingent group, but not in the other groups. A repeated-measures ANOVA over the two training periods (tr₁, tr₂) and the three groups yielded a significant interaction of within- and between-groups factors (control: tr₁, 13.0 bites; tr₂, 9.6 bites; contingent: tr₁, 11.4 bites; tr₂, 15.1 bites; yoke: tr₁, 11.9 bites; tr₂, 10.2 bites; F(2, 46) = 7.198, p < 0.002, N = 49). After training, learning performance was assessed in a five-minute test period without reinforcement.
26. In the conditioning experiment conducted to search for correlates of the operant memory in B51, an additional five-minute training period replaced the last test, to minimize extinction and ensure a high level of conditioning. Because the unstimulated and yoked control groups did not differ significantly in the previous experiment, only two groups were used: contingent reinforcement and yoked control. The success of the operant conditioning procedure was assessed by comparing the number of bites produced during the last five-minute training period. Confirming the previous results, contingently reinforced animals produced significantly more bites in the last training period than animals in the yoked control group (mean contingent: 13.5, mean yoke: 8.4; Mann-Whitney U Test, U = 62.0, p < 0.04).
27. Resting membrane potential: mean contingent, -65.7 mV (N = 13); mean yoke, -65.3 mV (N = 12); Mann-Whitney U Test, U = 77.0, p < 0.96.
28. E. A. Kabotyanski, D. A. Baxter, J. H. Byrne, J. Neurophysiol. 79, 605 (1998).
We thank Evangelos Antzoulatos for helpful discussions and Elizabeth Wilkinson for invaluable technical assistance. BB is a scholar of the Emmy-Noether Programm of the Deutsche Forschungsgemeinschaft. Supported by NIH grant MH 58321.