Introduction
For more than a decade, the dream of the autonomous farm has rested on a deceptively simple premise: if a machine can recognize a ripe fruit, it can pick it. Billions of dollars and thousands of research papers have been poured into teaching robots to see: to distinguish a ripe tomato from an unripe one, a fruit from a leaf, a target from its background. By almost any measure, that effort has succeeded. Modern detection systems identify tomatoes with accuracies exceeding 95%, rivaling and often surpassing the human eye.
Yet the autonomous farm has not arrived. In greenhouses and open fields alike, harvesting robots still fumble, miss, and bruise. The uncomfortable truth of agricultural robotics is that seeing a tomato and successfully picking it are two entirely different problems; for years, the field has quietly conflated them.
A new study from Osaka Metropolitan University reframes the question in a way that may prove more consequential than another increment of detection accuracy. Assistant Professor Takuya Fujinaga, of the Graduate School of Engineering, proposes that a robot should not merely ask “Can I pick this tomato?” but rather “How easy will this tomato be to pick?”, and then act accordingly. He calls it harvest-ease estimation, and in trials his system achieved an 81% successful harvest rate, with the robot intelligently adjusting its strategy when its first attempt failed. Published in Smart Agricultural Technology, the work marks a quiet but profound shift: from perception to judgment, from detection to decision.
The Detection Era: When Machines Learned to See
To understand why Fujinaga’s reframing matters, one must first appreciate how thoroughly the field has been dominated by the problem of vision.
The arrival of deep convolutional object detectors, and the YOLO (“You Only Look Once”) family in particular, transformed agricultural perception. Researchers adapted these architectures to the orchard and the greenhouse with remarkable results. Modified YOLOv3-based detectors reported average precisions approaching 98-99% for tomato detection, and models such as YOLO-Tomato maintained identification rates above 94% even when fruit was partially obscured by foliage. More recent systems went further still, predicting not just where a fruit is but how it is oriented, estimating grasp position and peduncle pose to within a few degrees.
By the early 2020s, the verdict seemed clear: robots could see tomatoes about as well as anyone could reasonably ask. The bottleneck, everyone assumed, was nearly cleared.
The Harvesting Gap: Why Seeing Was Never Enough
It was not. As detection accuracy climbed toward perfection, harvesting success rates stubbornly refused to follow. Across the literature, end-to-end robotic harvesting systems have tended to plateau in the low-to-mid 80% range: a platform built on the Deep-ToMaToS network reported 84.5% success, while a YOLOv8-based system reported 83.3%, accompanied by a measurable rate of crop damage. The distance between near-perfect recognition and merely good harvesting exposed an inconvenient reality.
The problem was never really seeing. It was acting.
Analyses of failed picks tell a consistent story: roughly two-thirds of failures trace not to misidentification but to the physical act of approach, namely arm misalignment, variability in stem orientation, and imprecise cutter positioning. Above all looms the problem of occlusion. A tomato growing in a dense cluster may be perfectly visible yet practically unreachable; leaves and neighboring fruit block the gripper, distort depth estimation, and turn a confident detection into a failed grasp. The robot knew exactly what it wanted. It simply could not work out how to get it.
This is the gap that detection-centric thinking cannot close, because the question it answers (“Is there a ripe tomato here?”) is the wrong question. The right question is about the geometry of the pick itself.
A New Question: From “Can It Pick?” to “How Easy Is the Pick?”
Fujinaga’s contribution is, at its heart, a reframing. Rather than treating harvesting as a detection problem followed by a mechanical afterthought, he treats the ease of harvesting as the central quantity to be estimated: a measurable, predictable property of each individual fruit.
“This moves beyond simply asking ‘can a robot pick a tomato?’ to thinking about ‘how likely is a successful pick?’” Fujinaga explains. The distinction sounds subtle, but it changes everything downstream. A robot that estimates harvest-ease can rank the fruits in front of it, choose the most promising target, select the best direction of approach, and, crucially, recognize when a particular tomato is simply not worth the struggle.
In doing so, the work establishes harvesting ease as a quantifiable metric rather than an emergent accident. It gives the machine something it never had before: a basis for judgment.
Inside the Method: Where Vision Meets Statistical Judgment
The system combines image recognition with statistical analysis. Where a conventional pipeline would stop at “this is a ripe tomato,” Fujinaga’s model presses further, weighing the visual evidence (the fruit itself, the shape and position of its stem, the degree to which leaves and neighboring fruit obscure it, and the way it sits within its cluster) to estimate how readily it can be reached and detached.
From this analysis the robot derives the optimal approach direction for each fruit, rather than blindly reaching straight ahead. In the system’s “robot-eye view,” mature fruit is rendered in red, immature fruit in green, and the chosen target in blue: a visualization of a machine that is not merely detecting, but deliberating.
The result is a harvesting strategy computed per fruit, grounded in the specific obstacles each tomato presents. It is the difference between a worker who lunges at everything red and one who pauses, tilts their head, and chooses their angle.
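One way to picture such a per-fruit estimate is as a small scoring model. The sketch below is purely illustrative: the feature names (stem_visible, neighbors, occlusion), the hand-picked weights, and the logistic form are all assumptions made for this article, not the model described in the paper. It only shows how an ease score per approach direction could drive both target ranking and the choice of angle.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Fruit:
    name: str
    stem_visible: bool            # hypothetical feature: is the stem in view?
    neighbors: int                # hypothetical feature: fruits touching this one
    occlusion: Dict[str, float]   # hypothetical: blocked fraction (0..1) per direction


# Illustrative weights chosen by hand, not fitted to any data.
BIAS, W_OCCLUSION, W_STEM, W_NEIGHBORS = 2.0, -4.0, 1.0, -0.5


def harvest_ease(fruit: Fruit, direction: str) -> float:
    """Return a score in (0, 1); higher means the pick is expected to be easier."""
    z = (BIAS
         + W_OCCLUSION * fruit.occlusion[direction]
         + W_STEM * (1.0 if fruit.stem_visible else 0.0)
         + W_NEIGHBORS * fruit.neighbors)
    return 1.0 / (1.0 + math.exp(-z))


def best_approach(fruit: Fruit) -> Tuple[str, float]:
    """Pick the approach direction with the highest estimated ease."""
    direction = max(fruit.occlusion, key=lambda d: harvest_ease(fruit, d))
    return direction, harvest_ease(fruit, direction)


def rank_targets(fruits: List[Fruit]) -> List[Fruit]:
    """Order fruits from easiest to hardest, judged at each fruit's best direction."""
    return sorted(fruits, key=lambda f: best_approach(f)[1], reverse=True)


exposed = Fruit("exposed", True, 0, {"front": 0.1, "left": 0.3, "right": 0.3})
crowded = Fruit("crowded", False, 3, {"front": 0.8, "left": 0.2, "right": 0.6})

for fruit in rank_targets([exposed, crowded]):
    print(fruit.name, best_approach(fruit))
```

In this toy setup the exposed fruit is ranked first and approached head-on, while the crowded fruit is ranked second and assigned a left-hand approach, because that is the direction with the least obstruction.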
The 81%, and the Quarter That Learned to Adapt
In testing, the system achieved an 81% successful harvest rate, a strong figure on its own, and one Fujinaga reports as exceeding expectations. But the more revealing number lies beneath the headline.
Roughly a quarter of the successful harvests were not won on the first attempt. When a front-on approach failed, the robot reassessed and switched to a side approach, reaching from the left or the right, to complete the pick. In other words, a meaningful share of the system’s success came not from getting it right the first time, but from recognizing failure and adapting in response.
This is the quiet breakthrough. A detection-driven robot that misses simply misses. Fujinaga’s robot treats a failed attempt as information, updates its strategy, and tries a better angle. It does not merely see and act; it senses, acts, evaluates, and reconsiders: the rudiments of the “sense, think, act” loop that defines genuinely intelligent machines.
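The retry behavior described above can be sketched as a simple fallback loop. This is a minimal illustration, not the paper's control logic: the function harvest_with_fallback, the stand-in blocked_from_front, and the fixed front-left-right order are assumptions. The closing lines work through the reported figures, reading “roughly a quarter” as exactly 25% for the sake of the arithmetic.

```python
from typing import Callable, Optional, Sequence, Tuple


def harvest_with_fallback(
    attempt: Callable[[str], bool],
    directions: Sequence[str] = ("front", "left", "right"),
) -> Tuple[Optional[str], int]:
    """Try each direction in turn; return (successful direction or None, attempts made)."""
    tries = 0
    for direction in directions:
        tries += 1
        if attempt(direction):
            return direction, tries
    return None, tries


def blocked_from_front(direction: str) -> bool:
    # Stand-in for a real pick attempt: this fruit is only reachable from the left.
    return direction == "left"


print(harvest_with_fallback(blocked_from_front))

# Back-of-the-envelope split of the reported 81% success rate.
overall_success = 0.81   # reported share of successful harvests
adapted_share = 0.25     # "roughly a quarter" of successes, taken here as exactly 25%
first_try = overall_success * (1 - adapted_share)
after_switch = overall_success * adapted_share
print(f"first attempt: {first_try:.1%}, after switching: {after_switch:.1%}")
```

Under that reading, about 61 of every 100 harvest attempts would succeed on the first approach, and about 20 more would be recovered by switching to a side approach. Without the fallback, the same system would sit near 61% rather than 81%.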
Toward Collaborative Agriculture: A Field Shared by Humans and Machines
The timing of this advance is not incidental. Agriculture across much of the developed world faces a deepening labor crisis, driven by aging rural populations and a shrinking pool of willing workers. The pressure to automate is intense. Yet selective harvesting, where ripe and unripe fruit hang side by side on the same vine, has long resisted full automation.
Fujinaga’s vision does not pretend otherwise. Instead of chasing the fantasy of a robot that picks everything, he imagines a division of labor calibrated to harvest-ease itself: “Robots will automatically harvest tomatoes that are easy to pick, while humans will handle the more challenging fruits.” A robot that can estimate difficulty can also estimate its own limits, and hand off the hardest cases to human hands.
This is a more honest, and more achievable, model of agricultural automation than the all-or-nothing autonomy that has dominated the field’s imagination. It positions the robot not as a replacement for the farmworker but as a tireless partner that clears the easy majority, freeing skilled humans to concentrate where their dexterity and judgment still matter most.
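The division of labor can be pictured as a threshold on the estimated ease of each fruit. The sketch below is hypothetical: the triage function, the 0.6 cutoff, and the example scores are invented for illustration, and the source does not say how such a boundary would be set in practice.

```python
from typing import Dict, List, Tuple


def triage(
    ease_scores: Dict[str, float],
    threshold: float = 0.6,   # hypothetical cutoff, not a value from the study
) -> Tuple[List[str], List[str]]:
    """Split fruits into a robot queue (easiest first) and a human queue."""
    robot_queue = sorted(
        (name for name, score in ease_scores.items() if score >= threshold),
        key=lambda name: ease_scores[name],
        reverse=True,
    )
    human_queue = sorted(
        name for name, score in ease_scores.items() if score < threshold
    )
    return robot_queue, human_queue


# Invented ease scores for four fruits on one vine.
scores = {"A": 0.92, "B": 0.35, "C": 0.71, "D": 0.58}
robot_queue, human_queue = triage(scores)
print("robot:", robot_queue)
print("human:", human_queue)
```

Raising the threshold hands more fruit to human workers and should reduce failed or damaging attempts; lowering it does the opposite. Where that balance belongs is an operational choice that a sketch like this cannot settle.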
Conclusion
The history of agricultural robotics has been, until now, a history of perception: a long and largely successful campaign to teach machines to see. Fujinaga’s work suggests that the next chapter will be a history of judgment: of machines that weigh difficulty, choose their approach, and adapt when they fail.
By reframing the question from “Can it be picked?” to “How easily can it be picked?”, harvest-ease estimation takes aim at the gap that raw detection accuracy never closed. An 81% success rate, with roughly a quarter of those successes earned through adaptation, is not the end of the story, but it is strong evidence that the field has been measuring the wrong thing for too long.
The deeper lesson reaches beyond the tomato vine. As robots move out of the structured factory and into the messy, occluded, unpredictable world (the orchard, the warehouse, the home), the decisive capability will not be how well they perceive, but how wisely they decide. The most intelligent machine in the field, it turns out, is not the one that sees the most fruit. It is the one that knows which fruit to reach for, from which direction, and when to leave the hardest pick to someone else.
How much of intelligence, in the end, is simply knowing what not to attempt?
References
- Osaka Metropolitan University. “AI-powered robot learns how to harvest tomatoes more efficiently.” ScienceDaily, 18 March 2026 - https://www.sciencedaily.com/releases/2026/03/260317064512.htm
- Fujinaga, T. (2025). “Realizing an intelligent agricultural robot: An analysis of the ease of tomato harvesting.” Smart Agricultural Technology, 12, 101538 - https://doi.org/10.1016/j.atech.2025.101538
- Osaka Metropolitan University. “RoboCrop: Teaching robots how to pick tomatoes.” EurekAlert!, 2025 - https://www.eurekalert.org/news-releases/1108867
- SciTechDaily. “Robots That ‘Think Before They Pick’ Could Transform Tomato Farming.” 2025 - https://scitechdaily.com/robots-that-think-before-they-pick-could-transform-tomato-farming/
- Phys.org. “RoboCrop: Teaching robots how to pick tomatoes.” December 2025 - https://phys.org/news/2025-12-robocrop-robots-tomatoes.html
- “Tomato harvesting robotic system based on Deep-ToMaToS: Deep learning network using transformation loss for 6D pose estimation of maturity classified tomatoes with side-stem.” Computers and Electronics in Agriculture (2022) - https://www.sciencedirect.com/science/article/abs/pii/S0168169922006123
- “Tomato detection based on modified YOLOv3 framework.” Scientific Reports (2021) - https://www.nature.com/articles/s41598-021-81216-5
- “YOLO-Tomato: A Robust Algorithm for Tomato Detection Based on YOLOv3.” Sensors (2020) - https://pmc.ncbi.nlm.nih.gov/articles/PMC7180616/
- “YOLO-GPP: End-to-end prediction of the grasp position and pose on tomato peduncle for robotic harvesting.” Smart Agricultural Technology (2026) - https://www.sciencedirect.com/science/article/pii/S2589721726000206