Algorithms in nature: the convergence of systems biology and computational thinking

Computer science and biology have enjoyed a long and fruitful relationship for decades. Biologists rely on computational methods to analyze and integrate large data sets, while several computational methods were inspired by the high-level design principles of biological systems. Recently, these two directions have been converging. In this review, we argue that thinking computationally about biological processes may lead to more accurate models, which in turn can be used to improve the design of algorithms.

We discuss the similar mechanisms and requirements shared by computational and biological processes and then present several recent studies that apply this joint analysis strategy to problems related to coordination, network analysis, and tracking and vision. We also discuss additional biological processes that can be studied in a similar manner and link them to potential computational problems. With the rapid accumulation of data detailing the inner workings of biological systems, we expect this direction of coupling biological and computational studies to greatly expand in the future.

Introduction

Biologists have been increasingly relying on sophisticated computational methods, especially over the last two decades as molecular data have rapidly accumulated. Computational tools for searching large databases, including BLAST (Altschul et al, 1990), are now routinely used by experimentalists. Genome sequencing and assembly rely heavily on algorithms to speed up data accumulation and analysis (Gusfield, 1997; Trapnell and Salzberg, 2009; Schatz et al, 2010). Computational methods have also been developed for integrating various types of functional genomics data and using them to create models of regulatory networks and other interactions in the cell (Alon, 2006; Huttenhower et al, 2009; Myers et al, 2009). Indeed, several computational biology departments have been established over the last few years that are focused on developing additional computational methods to aid in solving life science's greatest mysteries.

Computer scientists have also relied on biological systems for inspiration, especially when developing optimization methods (Table I). Early work on artificial intelligence in the 1960s leveraged ideas related to the activity of neurons in the brain to develop a class of computational methods known as neural networks (Hebb, 1949; Hopfield, 1982; Bishop, 1996), which have been used in many machine learning applications ranging from image segmentation to computing missile trajectories (Bishop, 1996). Other optimization techniques, such as genetic algorithms (Goldberg, 1989), were inspired by common operations in DNA sequence evolution and have been widely applied over the last 20 years. Social insects (Dorigo et al, 2006) and particle swarms (Kennedy and Eberhart, 2002) have also motivated the study of how self-organization emerges from local interactions (Abelson et al, 2000; Committee on Frontiers at the Interface of Computing, Biology, and National Research Council, 2005). These ideas have been applied extensively to multi-agent system optimization (Deneubourg et al, 1990; Ferber, 1999). A number of additional methods, including non-negative matrix factorization (Lee and Seung, 1999) and population protocols (Aspnes and Ruppert, 2009; Chazelle, 2009), have also capitalized on biological insights to derive new computing paradigms.

Table I

Examples of biological systems that have inspired computational algorithms

| Biology | Computer science | References |
| --- | --- | --- |
| Neural networks | Machine learning | Hebb (1949); Hopfield (1982); Bishop (1996) |
| Sequence evolution | Genetic algorithms | Goldberg (1989) |
| Sequence evolution | DNA computing | Adleman (1994); Benenson et al (2004); Istrail et al (2007); Păun et al (2004) |
| Ant colonies, swarms | Search optimization | Kennedy and Eberhart (2002); Committee on Frontiers at the Interface of Computing, Biology, and National Research Council (2005); Dorigo et al (2006) |
| Social insects, flocks | Network design | Deneubourg et al (1990); Beckers et al (1994); Theraulaz and Bonabeau (1995); Coore et al (1997); Ferber (1999); Abelson et al (2000); Fewell (2003); Werfel and Nagpal (2006) |
| Immune system | Modeling and optimization | Forrest et al (1994); Kephart et al (1995); Garrett (2005); Greensmith et al (2005) |
| Visual system | Dimensionality reduction | Lee and Seung (1999) |

Most of these examples were based on a high-level understanding of the system and therefore usually did not directly lead to new biological insights.

While all of these methods have led to successful applications, they only relied on a high-level (and sometimes flawed) understanding of the biological processes they were based on, and thus they usually did not directly lead to new biological insights. Similarly, though novel computational methods have been developed to help researchers learn new biology, the application of these methods to the biological problem (i.e., the biological system itself) rarely fed back to help computer scientists design better algorithms. Thus, the two directions—relying on biological ideas to develop computational methods and using computational methods to study biology—remained largely separated (Figure 1).

Figure 1. Traditional studies versus computational thinking. (A) Traditionally, biologists leveraged computing power to analyze and process data (e.g., hierarchically clustering gene expression microarrays to predict protein function), and computer scientists used high-level design principles of biological systems to motivate new computational algorithms (e.g., neural networks). Rarely were these two directions coupled and mutually beneficial. (B) By thinking computationally about how biological systems process information (Nurse, 2008; Hogeweg, 2011), we can develop improved models and algorithms and provide a more coherent explanation of how and why the system operates as it does.

Shared principles of computational and biological systems

There are many parallel requirements of computational and biological systems, which suggests that each can learn from the other (Figure 2). First, like virtually all large-scale computing platforms, biological systems are mostly distributed, consisting of molecules, cells, or organisms that interact, coordinate, and make decisions without central control (Seeley, 2002; Babaoglu et al, 2006; Figure 2A). Second, biological processes must successfully handle failures and attacks to thrive (Jeong et al, 2000; Kitano, 2004); robustness is likewise a key property algorithm engineers covet when designing computing systems that persist in noisy environments. Third, networks serve as an important medium through which interactions occur and information propagates (Alon, 2006; Figure 2B). In both settings, the structure of these networks is often directly linked to the system's function. Fourth, biological systems are often modular; that is, they reuse certain components in multiple, and sometimes very different, applications. This is a key design principle in many programming languages and in large complex networks (Kashtan and Alon, 2005; Fortunato, 2010; Figure 2C). Fifth, biological processes are often stochastic (Kaern et al, 2005), resembling randomized algorithms, whose power is well documented in the design of approximation algorithms and whose use ranges from single-process systems to large distributed systems (Motwani and Raghavan, 1996; Vazirani, 2004; Figure 2D).

Figure 2. Parallels between computational and biological systems. (A) Biological and computational systems often coordinate and maintain functionality without relying on a central controller. (B) Networks are often the medium through which processes spread, whether on the Web or in the cell. (C) Modularity is a common design principle in programming languages and also serves as the basis for increasing complexity and flexibility in evolving systems. (D) Stochastic processes help computational and biological systems remain robust in constrained, noisy environments.

These shared principles indicate that thinking computationally about biological systems may lead to better understanding of their function. However, there are also differences between biological and computational systems. While computational systems are often focused on speed, biological systems can adapt dynamically to changing environments and pressures and have a keen ability to reach joint decisions with very limited knowledge of their surroundings. These aspects, while desirable, are still not fully realized by computational systems. By studying how biological systems operate within these constraints and conditions to solve problems, we may be able to infer new directions for addressing these issues within computational systems.

With recent advances in our ability to generate and analyze biological data, new studies have emerged that have linked computational problems to biological problems, including cell-fate determination, slime mold tunneling, fault tolerance in regulatory networks, and bat navigation (Table II). In this review, we discuss a number of these studies focusing on three areas of interest: coordination in large distributed systems, network processes and design, and tracking and vision (Figure 3). For each area, we highlight several studies that demonstrate how computational thinking (Wing, 2006) can lead to new biological insights and new algorithms. We also discuss several additional biological processes for which better understanding of how they operate could lead to solutions to long-standing computer science problems.

Figure 3. Examples of biological systems and their analogous computational problems. Bidirectional studies have led to insights into how biological systems operate, while at the same time advancing computational methods. These studies have explored coordination issues (A), network design principles (B), and vision problems (C). See text for complete details.

Table II

Examples of the new synergistic relationship between biology and computer science

| Area | Biological system | Computational problem | References | Model | Algorithm |
| --- | --- | --- | --- | --- | --- |
| Coordination | Fly SOP selection | Maximal independent set | Afek et al (2011) | | |
| | Fireflies flashing | Synchronization | Glass (2001); Lucarelli and Wang (2004); Hong and Scaglione (2005); Werner-Allen et al (2005); Babaoglu et al (2007); Pagliari and Scaglione (2011) | | |
| | *Octopus neural control | Hierarchical computing | Sumbre et al (2001) | × | × |
| | *Fish shoals/honeybees | Ensemble methods | Marshall et al (2009); Conradt (2011); Ward et al (2011) | | × |
| | *Quorum sensing | Consensus | Marshall et al (2009) | | × |
| | *DNA replication | Resource allocation | Farkash-Amar et al (2008) | × | × |
| | *Mate selection | Graph matching | | × | × |
| Networks | Slime mold tunneling | Network design | Li et al (2010); Tero et al (2010); Watanabe et al (2011) | | |
| | Gene regulation | Fault tolerance | Gu et al (2003); Balaji et al (2006); Gitter et al (2009); Pomerance et al (2009) | | |
| | *Protein localization | Routing | Shapiro et al (2009) | × | × |
| | *Brain networks | Network design | Chen et al (2006) | × | × |
| | *Cell signaling | Clustering | Wokoma et al (2005); Charalambous and Cui (2008) | | |
| Tracking and vision | Visual cortex | Object recognition | Riesenhuber and Poggio (1999); Serre et al (2007a, 2007b) | | |
| | Echolocation in bats | Localization | Ghose et al (2006); Yovel et al (2010) | | |
| | *Visual cortex | One-shot machine learning | Li et al (2006); Serre and Poggio (2010) | × | × |

The biological systems are divided into three areas: coordination, networks, and tracking and vision. For each system, we show the analogous computational problem and whether a model or algorithm was derived by previous work. Rows beginning with a star denote potential future work.

Note that, from the computer science point of view, our focus in this review is on algorithms. Another important area, which has both benefited from and contributed to the improved understanding of biological systems, relates to engineering computational systems. We briefly discuss some examples of this direction, including robotics and synthetic biology; however, it deserves its own review(s) and is thus outside the scope of this paper.

Coordination

Coordination is a major challenge for both computational and biological processes. At the molecular level, coordination is required to activate sets of genes that together respond to external conditions (Kaur et al, 2010). At the cellular level, cells coordinate to determine cell fate during development (Afek et al, 2011) and to synchronize heart beats (Mirollo and Strogatz, 1990). Organisms, including the octopus, need to coordinate the activation of limbs (Sumbre et al, 2001), while shoals of fish coordinate to avoid predators (Ward et al, 2011). In computational systems, coordination is required for virtually all large-scale computing systems. Examples include search engines that coordinate thousands of back-end servers to quickly respond to user queries (Brin and Page, 1998), sensor networks that aggregate data when monitoring environments (Mainwaring et al, 2002; Werner-Allen et al, 2006), and mobile networks that synchronize data and schedules across multiple devices (Bernard et al, 2004). Below we discuss a few examples in detail.

The fly brain and maximal independent set

A recent example of a coordination-focused study that jointly addressed a biological and a computational problem comes from work on fly brain development. During development of the fly brain, some cells in the pre-neural clusters become sensory organ precursor (SOP) cells. These cells later attach to small bristles on the fly's forehead that are eventually used to sense the environment. It has long been known that the distribution of SOP and non-SOP cells follows a strict spacing pattern (Rendel et al, 1965). In particular, cells that become SOPs inhibit their neighbors, so that SOPs are surrounded by non-SOPs (Figure 3A, left). However, exactly how this global organization is achieved so quickly, starting from a set of equivalent cells, remained unknown.

While studying this problem, Afek et al (2011) made a new and interesting observation: the requirements of SOP selection (each cell must either be an SOP or directly connected to an SOP, and no two SOPs may be connected) mirror the formulation of the maximal independent set (MIS) problem in computer science. The MIS problem seeks to elect a set of local leaders in a graph such that every other node is connected to a member of the MIS and no two MIS members are connected to each other. In an ad-hoc wireless network, an MIS serves as a routing backbone through which nodes can communicate; it also corresponds to a distributed partitioning of the nodes into clusters that can be used to optimize bandwidth and resource distribution in the network. Thus, this is an important practical problem that has been studied for more than two decades. However, the solution used by the fly seemed different from all algorithms previously suggested for this task (Luby, 1985; Alon et al, 1986). In particular, the fly's solution relies on very short messages and does not use any information about the number of neighbors that can still become SOPs.

To study how the fly is nonetheless able to solve the SOP selection problem under such limitations, Afek et al (2011) used microscopy experiments to follow SOP selection in developing flies. Coupling the results of these experiments with possible molecular processes for the activation of the specific proteins known to be involved in SOP selection, they discovered that a stochastic feedback process, in which selection probability increases as a function of time, provides the best match to the experimental results. Based on these findings, they developed a new computational algorithm for MIS selection that is more efficient and more robust than previous methods. Unlike previous MIS selection methods (Luby, 1985; Alon et al, 1986), the fly-based method does not use any knowledge about the number of neighbors a node has (its degree). Instead, with probability that increases exponentially over time, each node that is not already connected to an MIS node proposes itself as an MIS node. This leads to a selection pattern in which denser areas of the graph are assigned MIS nodes early on and less populated areas are assigned at later stages (see Figure 3A). While the original algorithm of Afek et al (2011) required that nodes know an upper bound on the total number of nodes in the network, a new version of this algorithm (Afek et al, 2011) removes this requirement, leading to the first MIS selection algorithm that requires no knowledge of the network or its topology. This fly-based method is particularly appropriate for sensor networks, in which nodes may be randomly and dynamically distributed in an environment and have no knowledge of the global topology. Thus, by studying a very specific cellular developmental system, the authors helped resolve a long-standing computer science problem and improved our understanding of cell-fate determination.
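To make the flavor of this scheme concrete, the following Python sketch simulates a fly-style MIS election on a random graph. It is our own illustrative construction, not the authors' code: the doubling probability schedule, its 0.5 cap, and the round budget are assumptions chosen for readability rather than the calibrated feedback process reported in the paper.

```python
import random
import networkx as nx

def fly_mis(graph, rounds=100, p0=0.01):
    """Fly-inspired MIS election (a sketch in the spirit of Afek et al, 2011).

    Nodes know nothing about their degrees: each undecided node simply
    proposes itself with a probability that ramps up over time. A proposer
    joins the MIS only if no neighbor proposed in the same round; its
    neighbors then drop out (the non-SOP fate).
    """
    undecided = set(graph.nodes())
    mis = set()
    for r in range(rounds):
        p = min(0.5, p0 * 2 ** r)  # ramping schedule; capped so ties resolve
        proposers = {v for v in undecided if random.random() < p}
        for v in proposers:
            if any(u in proposers for u in graph.neighbors(v)):
                continue  # collision with a neighbor: retry in a later round
            mis.add(v)
            undecided.discard(v)
            undecided -= set(graph.neighbors(v))  # lateral inhibition
        if not undecided:
            break  # every node is in the MIS or adjacent to it
    return mis

G = nx.erdos_renyi_graph(100, 0.05, seed=1)
mis = fly_mis(G)
assert all(not G.has_edge(u, v) for u in mis for v in mis)  # independence
```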

Additional coordination problems

Several additional coordination problems have been studied, separately, in biology and computer science. The octopus, with its eight arms, presents a particularly interesting case. From what we know (Mather et al, 2010), the octopus employs a hierarchy of coordination elements. At the top level, the brain decides which arm to use when, for example, seeking prey; at the lowest level, each arm appears to act independently, adjusting its grip, size, and rotation without consulting the brain (Sumbre et al, 2001). A large fraction of the octopus's neurons are believed to reside outside the brain, in the arms, an arrangement very different from that of vertebrates (Mather et al, 2010). Further, Mather et al (2010) claim that there is a third, intermediate layer that allows coordination and communication between neighboring arms without going through the brain. Hierarchical distributed systems have also been studied in computer science as a means to improve load balancing, scheduling, and fault tolerance (Feitelson, 1990). Though no work that we are aware of has attempted to integrate these studies, lessons learned from the octopus could potentially be used to improve such systems, while also helping us better understand how different layers of control collectively coordinate actions.

Another example of a problem pertinent to biological and computational systems is distributed consensus (Simeone et al, 2008). Consensus is a building block in several computational procedures and is known to be an extremely hard problem for massive, asynchronous, and failure-prone systems (Lynch, 1996). In biology, consensus has an important role in coordinating populations. Fish, for example, are constantly under the threat of predation and must be able to quickly sense the environment and make decisions. The formation of groups can aid in collective vigilance because surveillance can be distributed among a set of fish. In addition, the group is likely to avoid individual errors by pooling sensory data together. However, the group also bears additional communication costs of propagating information to other members of the group, which delays decision making (Ward et al, 2011). Ward et al (2011) have empirically shown that fish make faster and more accurate decisions under predation threat in large groups (shoals) compared with smaller groups. The self-organization of shoals, the authors argue, naturally emerges from quick reaction to local neighbors that spreads quickly through the shoal. Honeybees have also been shown to optimally minimize the decision time for a desired level of accuracy (under certain models) when deciding between candidate new homes (Marshall et al, 2009). Here, honeybees also attempt to balance between accuracy (choosing the best home) and speed (time to make a decision).
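The benefit of pooling noisy individual judgments can be made concrete with a short calculation. The sketch below is our illustration, not an analysis from Ward et al (2011); the 60% single-detector accuracy is an arbitrary assumption. It shows how a simple majority vote among independent detectors sharpens a weak individual signal as the group grows:

```python
from math import comb

def majority_accuracy(n, p):
    """Probability that a simple majority of n independent detectors,
    each individually correct with probability p, makes the right call."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (1, 5, 25, 75):  # odd group sizes avoid ties
    print(f"group of {n:2d}: {majority_accuracy(n, 0.6):.3f}")
# accuracy climbs from 0.600 for a lone detector toward 1.0 for large groups
```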

Finally, synchronization is another form of coordination that many biological systems rely on, including pacemaker cells in the heart (Kuramoto and Yamagishi, 1990) and fireflies that flash in unison (Winfree, 1967; Mirollo and Strogatz, 1990). By developing models and methods based on studying how fireflies synchronize (Mirollo and Strogatz, 1990), the theory of pulse-coupled oscillators has flourished and has been used in wireless sensor networks (Lucarelli and Wang, 2004; Hong and Scaglione, 2005; Werner-Allen et al, 2005; Pagliari and Scaglione, 2011) and in overlay networks (Babaoglu et al, 2007) for clock management and for integrating time-sensitive data. Algorithms based on these systems have achieved faster convergence times and improved our understanding of biological self-stabilization (Daliot et al, 2003, 2004). In all cases, as modeling efforts continue, it is possible that better distributed ensemble protocols can be developed, which may in turn lead to new testable biological hypotheses.
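The pulse-coupled oscillator model underlying this line of work is simple enough to sketch in a few lines of Python. The version below is a deliberate simplification: it uses a constant phase jump rather than the nonlinear phase-response curve with which Mirollo and Strogatz (1990) proved global synchrony, so it is only guaranteed to form synchronized clusters.

```python
import random

def step(phases, dt=1e-3, eps=0.02):
    """One tick of a pulse-coupled oscillator population.

    Phases climb at a constant rate; an oscillator reaching threshold
    'flashes' and resets to zero, nudging every other oscillator closer
    to its own flash. Oscillators pushed over threshold by the nudge
    flash too and are absorbed into the same group.
    """
    phases = [ph + dt for ph in phases]
    if max(phases) >= 1.0:
        phases = [0.0 if ph >= 1.0 else ph + eps for ph in phases]
        phases = [0.0 if ph >= 1.0 else ph for ph in phases]  # absorption
    return phases

phases = [random.random() for _ in range(10)]
for _ in range(100_000):  # roughly 100 flash cycles at dt = 1e-3
    phases = step(phases)
print(sorted(round(ph, 2) for ph in phases))  # phases now form tight clusters
```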

Networks

Networks underlie many computational systems. Understanding the structure and dynamics of networks such as the Internet, social networks, and mobile networks has been the focus of several recent studies (Easley and Kleinberg, 2010). Key questions studied include how to best design such networks, how to determine what are the important nodes, how to route messages efficiently from one node to another, and how to maintain functionality despite errors or noise. These issues have also recently been the focus of several biological studies that have investigated how entities in molecular (signaling, regulatory, and metabolic) networks, cellular networks in the brain, and organism-level networks collectively interact and self-organize to efficiently accomplish tasks. We highlight a few recent examples below.

Slime mold tunneling for network design

When the slime mold Physarum polycephalum forages in an environment, it grows into a tubular structure that links all the resources it finds via direct connections or intermediate junctions (Figure 3B, left). Evolution has shaped this network so that sources are connected via efficient paths, while also ensuring minimal production cost and robustness to disconnection. To better understand the design principles balancing these requirements, Tero et al (2010) placed 36 food sources ('cities') on a map of the Tokyo area and let the slime mold grow. Remarkably, they found that the network designed by the slime mold and the actual Tokyo rail system were very similar. In particular, both networks had comparable cost (rail length), efficiency (short paths), and resiliency (redundant paths), under various definitions and models of these factors. What is striking, however, is that the slime mold designed its tunneling system in a completely distributed manner, without any global coordination. Clearly, the same cannot be said of the rail system.

Inspired by the characteristics of this process, Tero et al (2010) developed an algorithm for adaptive network construction based on driving flow through the network and preferentially strengthening well-used tubes (Figure 3B, right). They show that their model can produce networks that closely resemble those actually grown by the slime mold. Such a model, which mathematically captures how the slime mold self-organizes, self-optimizes, and self-repairs (Marwan, 2010), can be used to quantify the trade-offs between various desired network properties. These principles can be applied to designing mobile communication networks, in which nodes or devices need to maintain communication pathways in a changing and dynamic environment. Indeed, follow-up work has shown that a model inspired by the slime mold's foraging strategy can be used to design path-formation protocols in wireless sensor networks (Li et al, 2010) and to optimize traffic distribution in transportation networks (Watanabe et al, 2011). Conversely, mapping the slime mold tunneling process to a network design problem has led to a greater understanding of how the slime mold balances conditions and constraints when foraging.
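The core of the flow-reinforcement model can be sketched compactly. The code below is our compressed reading of the published scheme: conductivities and lengths live in matrices D and L, node pressures follow Kirchhoff's law for a unit source-sink flow, and tubes grow or decay with the flux they carry. The linear growth term and single fixed source-sink pair are simplifications; Tero et al (2010) used a saturating growth function and randomly alternating food-source pairs.

```python
import numpy as np

def physarum_step(D, L, source, sink, dt=0.1):
    """One iteration of a Physarum-style network solver (sketch).

    D[i, j]: tube conductivity; L[i, j]: tube length (0 = no tube).
    A unit of flow is pumped from source to sink; tubes carrying large
    flux thicken, while idle tubes decay away.
    """
    n = D.shape[0]
    C = np.divide(D, L, out=np.zeros_like(D), where=L > 0)  # conductances
    A = np.diag(C.sum(axis=1)) - C                          # graph Laplacian
    b = np.zeros(n)
    b[source] = 1.0                                         # unit inflow
    A[sink, :] = 0.0
    A[sink, sink] = 1.0                                     # ground the sink
    p = np.linalg.solve(A, b)                               # node pressures
    Q = C * (p[:, None] - p[None, :])                       # flux per tube
    return D + dt * (np.abs(Q) - D)                         # reinforce or decay
```

Iterating this step from a dense initial mesh prunes the network down to a few thick tubes linking source and sink; adjusting the growth and decay terms tunes the balance between cost, efficiency, and resilience.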

Gene backup and fault tolerant systems

Errors, failures, and attacks commonly occur in molecular processes. Examples include DNA replication errors, protein failures due to misfolding or to mutations affecting transcription, and pathogens that enter the cell and interfere with normal function. DNA repair, backup, and innate immunity are all cellular processes remarkably well designed to identify and overcome these issues. Gaining a better understanding of how these systems operate is important for many health-related studies.

Fault tolerance is also a critical design principle of computational systems. Consider a popular website backed by thousands of servers. At any time, one or more of these servers may fail due to error or malicious attack. Overcoming these failures and making them transparent to the user (e.g., by redirecting queries to active servers that also act as backup mechanisms; Guerraoui and Schiper, 1996) is an important algorithm design problem. Improving these techniques is key to creating better and more usable distributed systems.

So, how do molecular systems deal with failures? Recent work suggests that they do not, in fact, use dedicated failure-detection mechanisms; rather, they seamlessly integrate backup resources into their design. Using single-gene-deletion mutants in S. cerevisiae, Gu et al (2003) found that knocking out duplicate genes (genes with paralogs) resulted in a higher probability of functional compensation (i.e., higher fitness) than knocking out non-paralogous genes.

Gitter et al (2009) reported a similar result in gene regulatory networks, which encode the relationships between transcription factors (TFs) and the genes they regulate. When a TF with a strong paralog was knocked out, the expression levels of its regulated genes did not change significantly, owing to compensation by the paralog. By contrast, when a singleton TF was knocked out, the expression of its direct targets changed much more, reflecting the lack of a backup mechanism. Balaji et al (2006) propose a fault-tolerant computational model based on the design of regulatory networks. To create a network that is tolerant to hub deletion, they propose transforming the original network by multiplying its adjacency matrix A by itself (computing A·A). In the resulting network, two nodes (TFs) are connected if they share many common neighbors in the original network. This common-neighbors structure arises naturally in biological systems from the duplication-divergence model discussed below.
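A toy example makes the transformation transparent. Entry (i, j) of the matrix product A·A counts the length-two paths between nodes i and j, that is, the number of neighbors they share; thresholding it links nodes with overlapping neighborhoods. The small network below is hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical 4-node network, encoded as a symmetric adjacency matrix.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]])

common = A @ A                      # (i, j) = number of shared neighbors
np.fill_diagonal(common, 0)         # ignore paths from a node back to itself
backup = (common >= 2).astype(int)  # connect nodes sharing >= 2 neighbors
print(backup)                       # nodes 0 and 3 share {1, 2}: mutual backups
```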

Together, these studies show how backup is naturally built into decentralized molecular systems. Evolutionarily, it is possible that backup naturally emerged from the process of duplication and divergence. In this model, a gene is duplicated to yield a functional equivalent, which over time diverges to specialize in a similar subtask. Thus, the gene and its duplicate share similar sequence and interaction partners and either can be used as a backup gene for the other. The challenge, from the design standpoint, is to determine which local nodes (genes) should be replicated in order to globally minimize the effect of errors. Recent work on robustness of networks to node and edge failures (Ciliberti et al, 2007; Zhong et al, 2009) may provide further information on how such ideas can be applied to computational systems.

Additional network problems

Effective routing of proteins to their designated subcellular locations is another problem that may benefit from joint computational and biological modeling. It is known that routing occurs via signals encoded in the amino-acid sequence of the protein (Shapiro et al, 2009); however, we are not aware of any model of distributed routing inspired by protein localization. Like models derived from other biological systems, such a model might offer several desirable properties, such as efficiency and fault tolerance under specific conditions or environments.

Another network that holds much promise for future work is the network of neurons that are active in the brain. As mentioned previously, neural networks represent a computational optimization framework that is based on these networks. However, with our increased ability to study brain activity (Sporns, 2010), we can start addressing other questions about the design and architecture of real neural networks. For example, some models have been proposed to generate networks with brain-like connectivity (Kaiser and Hilgetag, 2004; Koene et al, 2009) while minimizing wiring and path lengths (Chen et al, 2006). What local rules do these models employ and what do they tell us about why the brain is organized the way it is?

Vision and tracking

The visual system in the mammalian brain has an exceptional ability to process imagery and build a coherent representation of the world. It can seamlessly detect and track objects, reconstruct scenes, and restore images (Shapiro and Stockman, 2001), all in real time and under varying amounts of noise, lighting conditions, and transformations. No computer algorithm today comes close to matching its performance. Understanding how this system works, and replicating it in the computer, have long been sought-after goals of biologists and computer scientists, respectively. By teaching a computer how to 'see', automated systems can be designed for face detection and authentication, 3D model building, medical image processing, tracking, and surveillance, among many other applications (Szeliski, 2010).

Bat echolocation and tracking

Bats are remarkably adept at tracking objects at night. Using their highly evolved sonar system, they rely on echoes to forage and navigate efficiently in darkness. In biology, pursuit is a basic task for all prey-seeking animals; in computer science, localization is an important problem with many applications in defense and guidance systems (Figure 3C, left). A common strategy for localization is to keep the stimulus of interest constantly centered in the field of view. However, recent results from bats revealed that this intuitive strategy is not optimal.

Yovel et al (2010) studied how Egyptian fruit bats use echolocation to home in on a target (Figure 3C, left). They measured the direction of the bats' sonar clicks in darkness and found that, instead of centering the sonar beam on the target, the bats alternated the beam to the left and right of the target, thereby pointing the maximum slope of the beam at the target. As a result, any subsequent motion of the object (relative to the bat) maximally changes the echo intensity, and the sign of the change indicates the direction of motion. The authors show that this strategy yields a less jittery (i.e., smoother) trajectory to the target that is easier and quicker to follow. In fact, they proved that this strategy is optimal with respect to localization, though it pays some cost in detection, since less sound energy reaches the target when the beam is not pointed directly at it (see Figure 3C, right).
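A small numerical check captures the intuition. Assuming (our simplification) a Gaussian beam profile rather than a measured sonar beam, echo intensity is flat at the beam's center but changes steeply about one beam width off axis, so aiming the slope at the target maximizes the information each echo carries about target motion:

```python
import numpy as np

sigma = 1.0                                   # beam width (arbitrary units)
theta = np.linspace(-3, 3, 601)               # target angle from beam axis
intensity = np.exp(-theta**2 / (2 * sigma**2))
sensitivity = np.abs(np.gradient(intensity, theta))  # |d(intensity)/d(angle)|

print(theta[np.argmax(intensity)])    # 0.0: aiming the peak maximizes detection
print(theta[np.argmax(sensitivity)])  # ~1 beam width off axis: aiming the
                                      # slope maximizes sensitivity to motion
```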

This study shows that even high-level biological processes can reveal interesting and computationally relevant models that can help clarify the logic of biological actions.

The visual cortex and computer vision

Visual processing in the brain has long been described by hierarchical models. By encoding objects at varying levels of sophistication, hierarchies are believed to be capable of recognizing objects under various scales, perspectives, and natural conditions. Different levels of the hierarchy respond to stimuli of increasing complexity: neurons in the lower levels respond to simple bars and shapes, whereas neurons in the upper levels respond to more complex combinations and whole objects (Serre and Poggio, 2010). Recently, Riesenhuber and Poggio (1999) and others (Serre et al, 2007a, 2007b) 'opened up' the brain to see how well such hierarchical models fared against actual neurophysiological recordings of neuron activation patterns in the inferotemporal cortex of the macaque. From these data, they proposed a new hierarchical feed-forward model for object recognition that adheres better to anatomical and physiological constraints and better matches the corresponding neural recordings. The model is based on a novel way to pool, or integrate, responses from lower-level neurons by identifying the strongest, most active signal present, which makes recognition robust to random noise and fluctuation. By representing features that are invariant to transformations, the model also illustrates how the visual system is able to recognize objects across different scales given only a single view at one scale. From the computational standpoint, the model has been shown to perform well for robust object recognition, action recognition, and face processing compared with many previous (non-biologically inspired) techniques (Serre et al, 2005, 2007a; Jhuang et al, 2007; Meyers and Wolf, 2008).
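The pooling rule itself is nearly a one-liner. The sketch below uses toy response values (not real recordings) to show why taking the maximum over position-shifted 'simple' units leaves a 'complex' unit's output unchanged when the preferred feature shifts slightly:

```python
import numpy as np

def complex_cell(simple_responses):
    """MAX pooling in the spirit of Riesenhuber and Poggio (1999): a complex
    unit reports the strongest response among the simple units it pools
    over, ignoring where (or at what scale) that response arose."""
    return float(np.max(simple_responses))

bar_at_a = np.array([0.1, 0.9, 0.2, 0.0])  # simple-cell responses to a bar
bar_at_b = np.array([0.0, 0.2, 0.9, 0.1])  # same bar, shifted one position
print(complex_cell(bar_at_a), complex_cell(bar_at_b))  # 0.9 0.9 -> invariant
```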

Additional tracking and vision problems

Research in computational neuroscience holds much promise for inspiring new models of visual processing (Serre and Poggio, 2010). While hierarchical feed-forward models were shown to closely replicate neural recordings for immediate recognition (before eye movement and high-level processing occurs), feed-back projections are also extremely prevalent in the cortex and must be incorporated to model visual perception beyond immediate recognition (Serre and Poggio, 2010). Better understanding of how feed-forward and feed-back connections interface with one another will likely lead to better algorithms and better understanding of how more advanced processing (e.g., memory and learning) occurs in the brain.

Another area where neuroscience may be helpful is learning. Humans have a remarkable ability to recognize and detect objects very quickly (Thorpe et al, 1996) and from only a few training examples. Most computational classifiers today, however, typically require hundreds or even thousands of training examples to robustly learn discriminative features. It may be possible to develop additional one-shot machine learning algorithms (Li et al, 2006) by studying how visual processing circuits in the brain generalize concepts. There has also been work drawing parallels between human and machine semi-supervised learning (Zhu et al, 2007). In return, these computational models may lead to better theories of human learning that quantify its limitations and biases.

Engineering-based studies

While our focus in this review is on the algorithmic aspects of bidirectional studies, another area that has proliferated in recent years involves engineering various systems by tightly linking computational thinking and biology. We briefly mention a few examples here, though as noted above these deserve a separate review. One such approach appears in recent robotics studies, in which researchers aim to mimic several aspects of living organisms for tasks that include flying (Franceschini et al, 2007), crawling (Hatton and Choset, 2010), and distributed control of arm movements (Laschi et al, 2009). These studies often involve careful analysis of how such tasks are performed by organisms ranging from birds to snakes to octopuses. In some cases, the analysis can also lead to new insights about biological systems. For example, Franceschini et al (2007) recently developed a robotic helicopter that demonstrated how flying insects manage to avoid the ground by using an optic-flow regulator, shedding new light on unexplained findings regarding insects' visually guided performance.

Synthetic biology and DNA computing are two further biocomputational areas that have received tremendous interest recently. Here, the goal is either to engineer novel systems using biological components (synthetic biology) or to use molecules to perform computations (DNA computing). In many cases, these biological constructs utilize ideas previously used in the design of computational systems. One of the first studies to demonstrate the computational power of biology was by Adleman, who showed how DNA can be manipulated to solve complex graph-theoretic optimization problems (Adleman, 1994; Păun et al, 2004). More recent studies have focused on the logical control of gene expression (Benenson et al, 2004), the construction of synthetic genomes and cells (Tian et al, 2004; Gibson et al, 2010), and the replication of neural systems using DNA computing (Qian et al, 2011).

Discussion and conclusions

The mutual relationship between biology and computer science goes back a long time. However, until recently, the application of computational methods to study biological processes and the influence of biological systems on the design of algorithms remained mostly separate. The former involved developing computational methods to solve specific biological problems, while the latter primarily viewed biological systems as motivation to derive new optimization methods.

With the improvement in our ability to study detailed aspects of molecular, cellular, and organism-level biological systems, it is now possible to unite these two directions. This direction involves thinking about biological processes as algorithms that nature has designed to solve difficult real-world problems under a variety of constraints and conditions. By viewing these systems as information processing units, we can better understand their biological properties and their survival mechanisms, and can use the gained knowledge to design and improve computational systems.

Over the last few years, this strategy has been successfully used in a number of different application areas, including coordination, networks, and tracking and vision. The list of studies in Table II is not meant to be comprehensive, but rather points out various systems that have been studied this way and additional systems that are ripe for analysis. Table III lists the shared principles that were identified as the basis for some of these studies. Such studies usually begin by identifying a problem addressed by both computational and biological systems. For example, methods to cluster networks have been widely studied in computer science (Schaeffer, 2007; Fortunato, 2010); a number of biological processes, including lateral induction during intercellular signaling (Charalambous and Cui, 2008) and decision making by quorum-sensing bacteria (Wokoma et al, 2005; Ren and Meng, 2006), perform a very similar task. The next step, before designing experiments, is to think about the biological system's computational requirements. For example, is the system distributed? What is known about how communication occurs within the system? What qualities (e.g., robustness and efficiency) may the system be trying to optimize? Does randomness or stochasticity play a role? Such questions can help pinpoint particular biological questions regarding the function of the process and may provide a fresh point of view on even well-studied processes. Next, a model is often constructed based on the experimental results, and this model is used to design an improved solution to the computational problem. In the clustering example, such a model might lead to improved clustering algorithms that operate in noisy, distributed environments.

Table III

The five primary studies highlighted in this review (rows), each annotated with the principles it shares with computational systems (columns): Distributed, Robust, Networks, Modular, Stochastic, Adaptive.

Fly SOP selection: ×
Slime mold tunneling: ×
Gene regulation: ×
Bat localization: × × × ×
Brain processing: ×

Most biological systems operate distributedly and seek a design that is robust and adaptive to changing environments. Networks often serve as a basis for carrying forward interactions and propagating information. These similarities provide a deep basis for the shared analysis of biological processes and computational algorithms.
In this review, we argued that by adopting a 'computational thinking' approach to studying biological processes, we can improve our understanding of those processes and, at the same time, improve the design of algorithms.

Saket Navlakha and Ziv Bar-Joseph
