Operant conditioning in invertebrates
Learning to anticipate future events on the basis of past experience with the consequences of one's own behavior (operant conditioning) is a simple form of learning that humans share with most other animals, including invertebrates. Three model organisms have recently made signiﬁcant contributions towards a mechanistic model of operant conditioning, because of their special technical advantages. Research using the fruit fly Drosophila melanogaster implicated the ignorant gene in operant conditioning in the heat-box, research on the sea slug Aplysia californica contributed a cellular mechanism of behavior selection at a convergence point of operant behavior and reward, and research on the pond snail Lymnaea stagnalis elucidated the role of a behavior-initiating neuron in operant conditioning. These insights demonstrate the usefulness of a variety of invertebrate model systems to complement and stimulate research in vertebrates.
We all experience pleasurable and painful events on a daily basis. Predicting the occurrence of either is crucial for seeking the pleasurable ones and avoiding the painful ones. Most of the time, past experience with reliable predictors of these events helps us to do this. We may smell fresh coffee brewing in the morning, hear the sound of a dentist's drill in the waiting room or see dark clouds before a rainstorm. We also experience that touching a hot plate is painful and that saying 'please' will often give us the desired treat. The learning by which we associate external predictors (conditioned stimuli, CSs) with important outcomes (unconditioned stimuli, USs) is called classical or Pavlovian conditioning. Learning from the consequences of our behavior (an internal predictor) is called operant or instrumental conditioning.
Understanding the neurobiology that underlies classical conditioning is a lot easier than doing the same for operant conditioning: one can follow the stimuli from their respective sensory organs into the brain and ﬁnd the points of convergence where the learning takes place. By contrast, the points where the US (reinforcement or punishment in the operant nomenclature) converges on operant behavior have proven much more elusive. The complexity of the vertebrate brain makes it difﬁcult to discern the circuits that are responsible for the generation of the behavior, and stimuli are processed in several hierarchical and interlocking steps. Fortunately, small brains can also learn operantly and classically. It seems that these simple forms of predictive learning are so fundamental that they appeared early in evolution and have been indispensable ever since. Out of the many invertebrates that show operant conditioning, three in particular have recently helped to further our progress towards a mechanistic model of operant conditioning: the fruit fly Drosophila melanogaster, the sea slug Aplysia californica and the pond snail Lymnaea stagnalis.
Heat-box learning in Drosophila
For the past three decades, research on the genetically renowned fruit fly Drosophila (Figure 1a) has revealed a steady flow of genes that are involved in olfactory classical conditioning [3–11]. Many of these genes affect the level of the second messenger cAMP and are preferentially expressed in a prominent neuropil in the fly's brain, the mushroom bodies (Figure 1b; [12–14]). For some of these genes, expression exclusively in the mushroom bodies is sufficient for normal learning. Are those genes also involved in operant conditioning? What role do the mushroom bodies play in operant conditioning?
The heat-box (Figure 1c) is an ideal instrument for applying the powerful genetic techniques of Drosophila to the study of operant conditioning [16,17,18]. Every time the fly walks into the designated half of the tiny dark chamber, the whole space is heated. As soon as the animal leaves the punished half, the chamber temperature reverts to normal. After a few minutes, the animals restrict their movements to one half of the chamber, even if the heat is switched off. Several training sessions interspersed with test phases in which the heat is permanently switched off are more effective than one long training session. With a brief reminder training, this memory is still detectable even if the fly is taken out of the chamber and then tested in a different one, up to two hours later.
As it is completely dark in the chamber, the animal is most likely to rely on idiothetic cues for orientation, thus minimizing the contamination with potential classical predictors. It can be shown that the operant memory consists of two components, a spatial component and a 'stay-where-you-are' component.
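The closed-loop contingency of the heat-box can be made concrete in a few lines. The following is a minimal sketch, not the control software used in the cited studies: the one-dimensional position trace, the function names `heater_states` and `preference_index`, and the index formula (time on the unpunished side minus time on the punished side, divided by total time) are assumptions introduced here for illustration only.

```python
from typing import List, Sequence


def _on_punished_side(position: float, midline: float, punished_side: int) -> bool:
    """True if the position lies strictly on the punished half of the chamber.

    punished_side is +1 (positions above the midline) or -1 (below it).
    A position exactly on the midline counts as unpunished (an assumption).
    """
    return (position - midline) * punished_side > 0


def heater_states(
    positions: Sequence[float],
    midline: float = 0.0,
    punished_side: int = 1,
    heat_enabled: bool = True,
) -> List[bool]:
    """Heater state at each time step under the operant contingency.

    During training (heat_enabled=True) the whole chamber is heated whenever
    the fly is in the punished half. During a test phase (heat_enabled=False)
    the heater stays off regardless of where the fly is.
    """
    return [
        heat_enabled and _on_punished_side(p, midline, punished_side)
        for p in positions
    ]


def preference_index(
    positions: Sequence[float],
    midline: float = 0.0,
    punished_side: int = 1,
) -> float:
    """Hypothetical index in [-1, 1]; +1 means the fly never entered the punished half."""
    if not positions:
        raise ValueError("positions must contain at least one sample")
    punished = sum(_on_punished_side(p, midline, punished_side) for p in positions)
    unpunished = len(positions) - punished
    return (unpunished - punished) / len(positions)
```

In this sketch a test phase uses the same position trace with `heat_enabled=False`, so the heater never switches on while the index still reports which half of the chamber the animal prefers.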
Flies with mutations in the genes involved in classical conditioning (those affecting cAMP) show marked deficits in the heat-box. However, the question remains whether learning classical (external) predictors is really the same as learning operant (internal) predictors at the genetic level, or whether there are operant learning genes that are not involved in classical conditioning. Because the fruit fly is so small, a battery of chambers is usually connected in one setup, which makes a genetic mutant screen possible in an operant learning paradigm for single flies. Apparently supplementing the previous mutant data, one of the mutants found using the heat-box approach affects an enzyme that is thought to be downstream of the cAMP pathway: the ignorant gene codes for the p90 ribosomal S6 kinase (RSK; Figure 1d).
However, if the different alleles generated by the screen are scrutinized more closely, it appears that ignorant has very different effects on operant and classical conditioning. The original mutant (ignP1), with a Drosophila transposable element in the first exon of the gene, shows a sexual dimorphism in the heat-box, where males are impaired but females appear normal. Both males and females of that line are statistically indistinguishable from the wild-type controls in olfactory classical conditioning. The null mutant (ign58/1), in which the entire RSK sequence is missing, shows decreased learning and memory in the classical case, but is normal in the heat-box. Finally, several partial deletions of the ignorant gene make flies deficient in the heat-box task, but these lines have not yet been tested for classical conditioning. Apparently, different mutations of the ignorant gene have different effects on operant and classical conditioning, which indicates a differentiated role of RSKs in the two forms of learning.
Paralleling the differential results on the molecular scale, there are several operant learning situations, including the heat-box, that do not require the mushroom bodies, whereas olfactory classical conditioning is abolished without mushroom bodies. It seems unlikely, however, that the mushroom bodies are generally required for classical conditioning, as flies without them do very well in classical conditioning with visual CSs. The picture that emerges suggests that the mushroom bodies are needed for chemosensory learning and higher-order integrative tasks. Although the cAMP cascade and its downstream targets are both necessary and sufficient in the mushroom bodies for these tasks [15,22], in operant conditioning they are involved in neurons outside the mushroom bodies and in a different way than in classical conditioning.
Neither the neurotransmitter mediating the reinforcement nor the brain region controlling the relevant behaviors is yet known. The antennal lobes, the median bundle, and the ventral ganglion in the thorax are good candidate regions, because a functional cAMP cascade in these regions alone is sufficient for learning in the heat-box. Finding the transmitter and scrutinizing the expression patterns of the wild-type and mutated ignorant gene should help to identify the potential regions in which behavior and reinforcement converge. Once those convergence points are found, they can be targeted and specific parts of the molecular machinery can be manipulated, not only to evaluate the necessity and sufficiency of each point but also to help construct a mechanistic model of operant conditioning at the cellular and molecular levels.
Operant reward-learning of feeding behavior in Aplysia
Like Drosophila, the sea slug Aplysia (Figure 2a) is better known for its prominent role in classical conditioning [25–29]. In contrast to Drosophila, its strengths lie in analysis at the cellular and network levels, which makes it possible to find the convergence points of operant behavior and reinforcement. By virtue of its large neurons (Figure 2b), it is possible to trace the neural networks in the ganglia and follow the flows of activity generated by sensory stimulation or during behavior. Aplysia's feeding behavior (Figure 2c) has proven very valuable for the study of operant conditioning [30,31–34]. The key neurons in the central pattern generator (CPG) are known [35,36]. They are located in the buccal ganglion and, in part, control the ingestion and rejection movements of the radula (a tongue-like organ) in the buccal mass (Figure 2d). Conveniently, the behavior can be both classically and operantly conditioned [30,31,37,38].
Early on, the esophageal nerve appeared to be crucial for the effectiveness of these conditioning experiments [31,38,39]. Recording extracellularly from the esophageal nerve in the intact animal during a biting movement that fails to grasp food reveals little activity. However, when the animal grasps and swallows seaweed, there are bursts of activity in the esophageal nerve during and outlasting the swallowing movements. Presumably, the esophageal nerve transmits information about the presence of food to the buccal ganglia.
In an effort to mimic the food signal as reward in an operant conditioning experiment, the esophageal nerve was stimulated in vivo (in a pattern resembling the recorded activity) whenever the animals produced a bite (no food present). No other stimuli were contingent with the bites, minimizing the contamination with classical components. Just as if this 'virtual' food rewarded the animals, they produced more bites in a subsequent test session without stimulation than a control group that had received the same stimulation sequences, but independently of their behavior (yoked control).
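The logic of the yoked control can be illustrated with a short sketch. It is not taken from the cited experiments: the bite times are made-up numbers, and the function names `yoked_schedule` and `contingency`, as well as the one-second window, are hypothetical choices for illustration. The point is only that both animals receive the identical stimulation sequence, yet the sequence is tied to behavior in the trained animal alone.

```python
from typing import List, Sequence


def yoked_schedule(master_bites: Sequence[float]) -> List[float]:
    """Stimulation times delivered to BOTH animals.

    The trained (master) animal is stimulated at each of its own bites;
    the yoked animal receives the very same times, whatever it is doing.
    """
    return list(master_bites)


def contingency(
    own_bites: Sequence[float],
    stimulations: Sequence[float],
    window: float = 1.0,
) -> float:
    """Fraction of stimulations that arrive within `window` seconds after
    one of the animal's own bites (1.0 = fully contingent on behavior)."""
    if not stimulations:
        return 0.0
    hits = sum(
        1
        for s in stimulations
        if any(0.0 <= s - b <= window for b in own_bites)
    )
    return hits / len(stimulations)


# Made-up bite times in seconds, for illustration only.
master_bites = [2.0, 10.0, 25.0]
yoked_bites = [5.0, 10.5, 40.0]

stimulations = yoked_schedule(master_bites)
master_contingency = contingency(master_bites, stimulations)  # every stimulus follows a bite
yoked_contingency = contingency(yoked_bites, stimulations)    # no stimulus follows a bite
```

Because the amount and timing of stimulation are matched, any difference in later bite rate between the two groups can be attributed to the contingency rather than to the stimulation itself.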
Apparently, the reward signal from the esophageal nerve converges on the behavior that is generated by activity in the buccal ganglia. The remaining question is in which neurons this happens. One neuron thought to determine whether a radula movement becomes an ingestion or a rejection is B51 [32,40]. Interestingly, not only is there evidence that B51 is active during the rewarded behavior, but it also receives a dopaminergic input from the esophageal nerve; that is, the possibility exists that B51 constitutes a convergence point of operant behavior and dopamine-mediated reward. In line with this hypothesis are findings that B51 shows altered biophysical membrane properties after operant conditioning, making it more excitable [30,32]. Additional evidence comes from a single-cell analogue of operant conditioning. If cultured B51 cells receive an iontophoretic puff of dopamine right after (as opposed to between) depolarization-induced activity that mimics the presumed B51 activity during a bite, they show the same biophysical changes as those seen after operant conditioning. These results are consistent with the view that, during this form of operant conditioning, a dopamine-mediated food reward is contingent on activity in B51 during the rewarded behavior. Activity-dependent plasticity in B51 leads to a modification of the biophysical properties of the neuron that makes it more likely to be active. These biophysical changes in B51, in turn, contribute to the increased production of bites seen after operant training.
Although the described biophysical changes are sufficient for some aspects of the operant learning, it is not known whether the changes in B51 are necessary for learning to occur. It is also not yet known how many more neurons are involved and what their relative contributions are. For example, although B51 is crucial for determining what kind of pattern the buccal CPG produces, it is active rather late during the pattern and not involved in initiating the behavior. To construct a mechanistic model of operant conditioning, it will be vital to understand the role of initiating activity and whether and how spontaneously active neurons are modified as the learning takes place.
Operant conditioning of aerial respiratory behavior in Lymnaea
The pond snail Lymnaea (Figure 3a) may provide the data to elucidate the role of activity-initiating neurons in operant conditioning. Lymnaea is a bimodal breather. Under normoxic conditions, it obtains oxygen cutaneously, whereas under hypoxic conditions, it moves to the surface to supplement cutaneous oxygen uptake by aerial respiration using its pneumostome (respiratory orifice; Figure 3a). Similar to Aplysia, Lymnaea has a relatively simple nervous system, and a central ring ganglion contains the CPG for generating the aerial respiratory behavior (Figure 3b; [41,42]). Experimentally, hypoxic conditions are induced by bubbling N2 in the training beaker (Figure 3c). A sharpened wooden applicator is used to lightly touch the pneumostome as it opens. This punishment only causes the animal to close the pneumostome and does not elicit the defensive withdrawal of the whole animal. With repeated stimulation, the animals cease to open their pneumostome. Control groups showed that this effect is due neither to a general decrement caused by the induced hypoxia nor to non-associative effects of the stimulation.
The three-cell CPG (Figure 3d) that controls aerial respiratory behavior is well characterized and can be reconstituted in cell culture [44,45]. In the most extensively characterized invertebrate operant conditioning preparation, various training regimes have been reported to induce a context-dependent, multi-phasic memory, which includes aspects of short-, intermediate- and long-term memory, that lasts for up to one month [46,47,48,49,50–54]. The accounts of the underlying neurobiology include the differential requirements of local translation and transcription for intermediate- and long-term memory, respectively, as well as neural correlates in neurons in the CPG [46,49,55,56]. Importantly, the CPG activity-initiating neuron RPeD1 (right pedal dorsal 1; see Figure 3d) shows a lower spontaneous firing frequency in semi-intact preparations from trained animals after a brief reminder training. In isolated ganglia of operantly trained animals, RPeD1 is quiescent more often than in preparations from yoked control animals, and the efficacy of the excitatory connection from RPeD1 to IP3 (input 3 interneuron; see Figure 3d) is reduced.
The most parsimonious explanation of the published data is that RPeD1 is active at the beginning of the behavior and that contingent stimulation of the pneumostome changes several of its biophysical and synaptic properties. These changes can last up to several hours, relying solely on local translation. Transcription is required for the changes to become more permanent. It would be very interesting to see whether these changes could be brought about in a single-cell analogue with the isolated RPeD1 in cell culture. For such an experiment, it remains to be determined whether the changes found in RPeD1 are biophysical or molecular in nature, and whether the punishment is mediated through the whole-body withdrawal neuron (Figure 3d) or affects RPeD1 directly.
Using the particular advantages of each model system, research in Drosophila generated new insights into the molecular processes involved in operant conditioning, research in Aplysia yielded a convergence point of operant behavior and reinforcement and suggested a possible cellular mechanism of operant conditioning, and research in Lymnaea shed some light on the possible role of activity-initiating neurons in a CPG in operant conditioning. It is noteworthy that each animal provided a piece of data that was not obtainable in the other model systems. If one were to attempt an integrated mechanistic model of operant conditioning at this early stage, one could say that contingent reinforcement/punishment acts on behavior-initiating and -switching neurons altering both their biophysical membrane properties and their synaptic connections through the cAMP cascade and its downstream targets.
Paralleling evidence from vertebrates [57,58], it was found that different brain circuits and molecular mechanisms are involved when external (i.e. stimuli) or internal (i.e. behaviors) predictors are used to anticipate important events. Not unexpectedly, the data presented here are consistent with the idea that the modiﬁcations induced by learning reside in the circuits involved in processing the predictors, that is, the sensory pathways in classical conditioning and the behavior generators (CPGs) in operant conditioning. On a more general level, one can conceive that any important event (US) generates a distributed signal acting on coincidentally active neurons through activity-dependent plasticity. The location of the circuits, the points of convergence between the neurons processing the events that precede the signal (the predictors) and the signal itself will depend not only on the type of predictor (operant or classical) but also on the type of US (reward or punishment).
Research in vertebrates is not so fortunate as to be able to deduce the involvement of the relevant brain regions by virtue of their location with respect to the studied learning paradigm. Most progress towards a mechanistic model of operant conditioning can be made by studying several model organisms, so that questions can be tackled on many levels of complexity with the ideal method in the ideal system for the particular question. Thus, the largest leaps in understanding operant conditioning can be expected from integrative approaches that use the common task of learning from the consequences of behavioral actions to constrain the design of experiments, so that they become comparable across phyla. Using the common and disparate data, one can then construct a general mechanistic model of operant conditioning, with wide ramifications ranging from the basic sciences of neurobiology and evolution to substance abuse, mental illness and even philosophy.
I am indebted to R Mozzachiodi, E Antzoulatos, G Phares, F Lorenzetti, D Baxter, G Spencer and G Putz for commenting on an earlier version of the manuscript, to J Byrne and D Baxter for providing laboratory space and discussions, to G Spencer, J Dow, M Heisenberg and D Baxter for providing figures, and to the Emmy-Noether program of the Deutsche Forschungsgemeinschaft for financial support.
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
* of special interest
** of outstanding interest