Tilting the Odds in Plant Breeding


How Math and Data Science Accelerate Innovation While Conserving Resources

His name is Phani Chavali, and he leads the Analytics team in Plant Breeding at Bayer, where he has worked for the past six years. With a Ph.D. in Electrical Engineering, Phani specializes in advanced statistical inference, signal processing methods and machine learning, and he’s fascinated by the challenge of solving problems.


So what is he doing working in plant breeding?

“If you go to the fundamentals, breeding is just a numbers game, one that becomes more complex as more variables are considered,” says Chavali.


That’s because genetic traits are passed along according to the laws of probability. Chavali believes the partnership between the data and plant sciences can help shape a new era of precision breeding, minimizing the uncertainty in the process.
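To make the "numbers game" concrete, here is a minimal sketch of the underlying arithmetic. It assumes a textbook case that the article does not spell out: two parents that are both heterozygous (Aa) at each gene of interest, with the genes inherited independently. The function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(parent_a, parent_b):
    """Genotype probabilities at one gene for a cross of two diploid parents.

    Each parent is a two-letter string such as "Aa"; every pairing of one
    allele from each parent is equally likely (probability 1/4).
    """
    distribution = {}
    for allele_a, allele_b in product(parent_a, parent_b):
        genotype = "".join(sorted(allele_a + allele_b))
        distribution[genotype] = distribution.get(genotype, 0) + Fraction(1, 4)
    return distribution


def probability_all_loci(p_single, n_loci):
    """Chance of the desired genotype at every gene, assuming independence."""
    return p_single ** n_loci


single = offspring_distribution("Aa", "Aa")
p_target = single["AA"]  # 1/4 for an Aa x Aa cross

for n_loci in (1, 3, 5):
    print(n_loci, probability_all_loci(p_target, n_loci))
```

Under these assumptions the chance of getting the target genotype at five genes at once falls to 1 in 1,024, which illustrates why the problem becomes harder as more variables are considered. Real traits often involve linked genes, so this independence assumption is a simplification.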


High Costs & Improbability of Success

In modern agriculture, plant breeding is the engine that drives innovation in crops. Improved harvests, disease resistance, plant vigor, drought tolerance: each of these is the product of crosses governed by Mendel's laws of inheritance, which are laws of probability at heart.


Prior to the introduction of advanced math and data science, breeders had to establish large-scale growing experiments and observe how the genetic crosses performed. Many generations of trial and error were required to achieve desirable outcomes. This required not only time, but also significant amounts of land, water, energy and other precious natural resources.

“Before, we would create a large experimental event and hope for the best result, then make selections based on what was observed in the field. In order to test that, we had to evaluate many varieties in the field.”
Phani Chavali
Global Breeding

Success required collecting data and observations from thousands of parental breeding lines, which was very time-consuming. Potential solutions then had to be matched to specific grower needs, requiring even more time in field testing.


The Cloud Rolls in and Changes Everything

The past five years have seen incredible progress in the technologies that support data analytics and artificial intelligence, and this has had a profound impact on breeding operations.


Foremost among those tools is cloud computing, which has recently become far more widely available and affordable. This means staggering amounts of data can be stored, cleaned, organized and studied with relative ease. This is great news for Chavali and his team at Bayer, as well as university researchers and independent plant scientists all over the world.

At the same time, another breakthrough, the artificial neural network, advanced to the point where it could provide genuine value. These networks, built from layers of interconnected computing nodes, provide a framework for advanced data analytics. They are vitally important to machine learning, a type of artificial intelligence. When combined with the leaps forward in processing power, this represents a vast improvement in predictive modeling efficiency.


Even in the most accurate and informed breeding programs, it can take many attempts to achieve a desirable trait. This is because some traits are complex, meaning they are associated with multiple genes across the genome of the plant. In addition, every plant must be tested in a variety of growing conditions, to ensure we are delivering farmers the solutions they need for their specific mix of climate, soil type and agronomic practices. With predictive analytics and machine learning, however, desirable outcomes are more likely to be achieved within a reasonable timeframe, reducing the resources required.


The Real Benefits of Artificial Intelligence

Plant breeders use predictive modeling algorithms, supported by machine learning, to give themselves an advantage. They can do this because of the wealth of quantitative information we have gathered about plant traits over the years.


“As we were scaling our breeding and field research efforts and starting to collect more dimensions of data, it became virtually impossible for a human being to sift through the data to make informed decisions,” says Chavali.


Math helps reduce the number of turns required to "solve" the breeding puzzle, producing a plant with the desirable combination of traits, tailored to the farmer's unique growing conditions.

In response to this challenge, Bayer built an artificial intelligence assistant that helps breeders select the right candidates in the breeding program. It relies on cloud-based algorithms built on a foundation of roughly 1.7 trillion calculations, enabling a dramatic shift in the scale and speed of the breeding pipeline.


Breeders know, for example, where the genes associated with certain traits are located within the plant’s genome. They also understand the likelihood of those genes showing up within any given generation. As a result, advanced mathematics and data science models can help breeders identify the plants needed to achieve a specific collection of traits.
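As a rough illustration of how a known likelihood translates into a planting decision, the sketch below asks: if each plant independently has probability p of carrying the desired combination, how many plants must be grown to see at least one with a chosen level of confidence? The function name, the 95% default and the independence assumption are illustrative choices, not details from Bayer's models.

```python
import math


def min_plants(p_success, confidence=0.95):
    """Smallest n with P(at least one success in n plants) >= confidence.

    Uses 1 - (1 - p)^n >= confidence, assuming each plant is an
    independent trial with the same success probability.
    """
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    if p_success == 1:
        return 1
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_success))


# One gene, both parents heterozygous: p = 1/4
print(min_plants(0.25))  # 11

# Five independent genes: p = (1/4) ** 5
print(min_plants(0.25 ** 5))
```

The required population grows quickly as the per-plant probability shrinks, which is the cost that better predictions aim to reduce.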


One such model uses a “branch and bound” algorithm. The computer predicts the potential of multiple breeding lines (branches), which lets breeders avoid lines that aren’t likely to achieve the intended result. The model produces a breeding schedule for the remaining lines, while also recommending the minimum number of plants required to achieve the desired trait.
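The general idea of branch and bound can be sketched on a toy selection problem. This is not Bayer's model: the candidate crosses, their predicted values, their plant counts and the plant budget below are all hypothetical. The sketch picks the set of crosses with the highest total predicted value that fits the budget, and skips any branch whose optimistic estimate (the bound) cannot beat the best plan found so far.

```python
def branch_and_bound(candidates, budget):
    """Choose crosses maximizing total predicted value within a plant budget.

    candidates: list of (name, predicted_value, plants_needed) tuples,
    with plants_needed > 0. Returns (best_value, chosen_names).
    """
    # Order by value per plant so the optimistic bound below is valid.
    items = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    best = {"value": 0.0, "chosen": []}

    def upper_bound(index, value, remaining):
        # Optimistic estimate: fill the remaining budget greedily and
        # allow a fraction of the first cross that no longer fits.
        for _, item_value, cost in items[index:]:
            if cost <= remaining:
                value += item_value
                remaining -= cost
            else:
                return value + item_value * remaining / cost
        return value

    def explore(index, value, remaining, chosen):
        if value > best["value"]:
            best["value"], best["chosen"] = value, list(chosen)
        if index == len(items):
            return
        if upper_bound(index, value, remaining) <= best["value"]:
            return  # prune: this branch cannot beat the best plan so far
        name, item_value, cost = items[index]
        if cost <= remaining:
            chosen.append(name)  # branch 1: include this cross
            explore(index + 1, value + item_value, remaining - cost, chosen)
            chosen.pop()
        explore(index + 1, value, remaining, chosen)  # branch 2: skip it

    explore(0, 0.0, budget, [])
    return best["value"], best["chosen"]


# Hypothetical candidates: (name, predicted value, plants needed)
crosses = [("cross_1", 60, 10), ("cross_2", 100, 20), ("cross_3", 120, 30)]
best_value, chosen = branch_and_bound(crosses, budget=50)
print(best_value, chosen)
```

The pruning step is what saves work: entire families of plans are discarded without being enumerated, much as unpromising breeding lines are dropped before they consume field space.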


Throw in the neural networks, and these models are literally learning throughout the entire process. In other words, the data models provide breeders a road map to follow. With every generation, that map becomes more accurate and efficient.


Nothing More Natural Than Math

It makes sense that we’re using math to better understand natural processes like breeding and heritability. After all, mathematics is everywhere in nature. Seashells, snowflakes and sunflowers are just a few places where we can find patterns and combinations that follow mathematical rules.


Following nature’s lead, our breeding innovators are also using mathematics to create desirable crops for farmers. Along the way, they are accomplishing some pretty amazing things.


Every single plant requires light, land, water, nutrients and soil or another growth medium. Data science reduces the number of plants needed to produce a trait, saving not just time and money, but vital natural resources and energy.



The use of machine learning and data science leads to a significant reduction in the use of resources, compared with standard breeding, meaning less water, energy, land and nutrients are required.

What does this mean for farmers? Because more data is collected on every product, it’s possible to make better recommendations that are tailored to their unique circumstances and based on in-depth analysis of plant performance in a diverse range of scenarios.


And consumers? They can choose from a variety of nutritious and sustainably grown produce at markets across the globe.


Greater accuracy. Fewer resources. Less time. Incredible energy savings. Enough food to meet the needs of a growing population. All made possible because we took the wisdom from centuries of plant breeding and combined it with modern forms of technological intelligence. Advanced math and data science have already demonstrated their value to agriculture, and the world is seeing the results. Isn’t it nice for math to decrease anxiety for a change?
