How Machine Learning and Precise Data Annotation Shape the Future of Self-Driving Cars
Autopilot is no longer just a convenience for drivers; it is one of the most significant achievements in autonomous technology, spanning ADAS and self-driving cars. It appears in vehicles starting at Level 2, and the higher the level, the more capable the automated driving system. In 2025, nearly 60% of new cars sold worldwide are expected to feature Level 2 autonomy. While Level 2 vehicles are expected to remain dominant through 2030, Level 3 and Level 4 autonomous cars are projected to make up approximately 8% of total new car sales. This growth is driven by rising demand for autonomous driving technologies, predictive maintenance, and connected vehicle features.
Zoya Boyko, PM of automotive projects at Keymakr, says that training such models requires a deep understanding of many factors, from road conditions to users’ behavioral patterns. Below, we discuss the specifics of annotation, the machine learning process, and real-world cases of this solution.
The Importance of Autopilot
The autopilot system must not only control the vehicle, monitor its movement, and maintain the required speed but also perceive everything happening around it to ensure safe driving. It must recognize traffic signs, lights, road markings, people, cars, bicycles, and other objects. All this data must be processed correctly so the autopilot can respond effectively and make the necessary decisions.
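To make the perceive-then-decide loop described above concrete, here is a minimal, hypothetical sketch of how detected objects might be turned into a driving decision. The class names, confidence threshold, and braking rule are illustrative assumptions for this article, not Keymakr’s or any vendor’s actual logic.

```python
# Hypothetical sketch of a perception-to-decision step in an autopilot stack.
# Labels, thresholds, and the stopping-distance model are illustrative only.

from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # e.g. "pedestrian", "traffic_light_red", "car"
    confidence: float  # detector score in [0, 1]
    distance_m: float  # estimated distance to the object in meters

def plan_action(detections, speed_kmh: float) -> str:
    """Pick the most conservative action implied by the detections."""
    # Rough stopping distance: ~1 s reaction + braking v^2 / (2*a), a ≈ 7 m/s^2.
    v = speed_kmh / 3.6
    stopping_m = v * 1.0 + (v * v) / (2 * 7.0)
    action = "cruise"
    for d in detections:
        if d.confidence < 0.5:
            continue  # ignore low-confidence detections
        if d.label == "pedestrian" and d.distance_m < stopping_m * 1.5:
            return "emergency_brake"  # most conservative action wins outright
        if d.label == "traffic_light_red" and d.distance_m < stopping_m * 2:
            action = "brake"
    return action

# A pedestrian 20 m ahead at 50 km/h falls inside the safety margin.
print(plan_action([Detection("pedestrian", 0.9, 20.0)], speed_kmh=50.0))
```

Real systems fuse many sensors and far richer state, but the shape is the same: every labeled object class the annotators define becomes an input the decision logic can act on.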
Zoya explains: “It’s important for the system to make adequate decisions in emergencies. For example, if the driver temporarily disables the autopilot due to the need to intervene urgently, the model should still control the situation and activate a different algorithm in case of a threat to the passengers’ safety.”
How is Autopilot Evolving?
The advancement of autopilot is happening gradually, shaping the future of autonomous driving step by step. More than 95% of new vehicles now ship with driver-assistance features such as adaptive cruise control, lane-keeping assist, or camera-assisted parking. These already help drivers in specific situations, but they are not full autopilot systems: they can manage speed, maintain a safe distance, and analyze road conditions, yet the driver must remain ready to intervene.
Zoya notes, “Current autopilot systems are not the final solution yet. They assist with quick airbag deployment or enhanced parking systems, but do not fully replace the driver. We are at a stage that is preparing us for full autopilot in the future.”
Beyond just assisting with driving, autonomous systems are integrating with smart city infrastructure, allowing cars to “communicate” with traffic lights, pedestrian crossings, and other vehicles to optimize traffic flow and reduce congestion.

The Role of Data Annotation
Autonomous driving models require massive amounts of high-quality labeled data to function effectively.
To put this into perspective:
- Millions of Miles, Billions of Labels – A single self-driving car generates about one terabyte of data per hour, requiring continuous annotation and refinement. The more complex the task, the more data is needed to train the model: if the system must recognize people of different ages and body types, it has to be trained on correspondingly diverse data. In a study by researchers at King’s College London and Peking University, eight AI-powered pedestrian detection systems were evaluated on over 8,000 images. Detection accuracy was approximately 7.5% higher for light-skinned individuals than for dark-skinned individuals, a disparity attributed to training datasets containing more images of light-skinned pedestrians. That is why diverse data annotation is crucial for fairness in AI-driven decisions. The model must also cope with degraded data, such as images from blocked cameras or sensors. Zoya emphasizes, “Data diversification and the model’s ability to work in different conditions are key to successful training and improving the autopilot’s performance.”
- Diverse Road Conditions Matter – A model trained only on well-maintained roads in sunny weather will struggle in real-world conditions. Annotation must therefore cover extreme weather, poor road surfaces, and varied landscapes, from urban congestion to remote dirt roads. Zoya explains, “At Keymakr, we’re working on a project to differentiate road types, such as wet snow, dirt roads, or asphalt roads. Previously, we only identified puddles, but now we distinguish between wet roads and puddles, which is important for the autopilot’s correct response. This data helps the car accurately assess braking distance and other parameters. We also analyze damaged or congested roads, which helps us train the model to respond to these conditions.”
- Reusing and Refining Data – The training process doesn’t just rely on new datasets. Reusing and refining old datasets helps improve model performance as conditions change or new obstacles appear on the road. This iterative learning process allows autonomous systems to adapt continuously.
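The bias finding above comes down to a simple audit: compare detection accuracy across demographic groups. Here is an illustrative sketch of that computation; the group names and sample records are made up for the example and are not from the cited study.

```python
# Illustrative per-group accuracy audit, in the spirit of the
# pedestrian-detection bias study cited above. Data is synthetic.

from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, detected) ground-truth pairs."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, detected in records:
        totals[group] += 1
        hits[group] += int(detected)
    return {g: hits[g] / totals[g] for g in totals}

# Toy audit: a dataset imbalance surfaces as an accuracy gap.
records = [("light_skin", True)] * 90 + [("light_skin", False)] * 10 \
        + [("dark_skin", True)] * 82 + [("dark_skin", False)] * 18
rates = accuracy_by_group(records)
gap = rates["light_skin"] - rates["dark_skin"]
print(rates, f"gap={gap:.2%}")
```

Running this kind of audit per class and per group is a cheap way to catch annotation or sampling imbalances before they become behavioral bias in the deployed model.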

Challenges and Developments
Sometimes the data annotation process faces challenges, especially when objects in an image are difficult to recognize. Zoya gives an example: “In one case, we were dealing with hundreds of stones in the same image, all of which needed to be labeled in detail so that the AI could learn to distinguish size and shape.”
When the annotation process requires additional clarifications or changes, clients can make adjustments, and the annotation team adapts its approach to reflect the required details as accurately as possible. “There have been many situations where our Keymakr team needed to figure out how to properly annotate specific elements, like a person’s face in a car. We encountered problems such as annotating the skeleton of a person wearing a t-shirt or holding a newspaper when the system couldn’t accurately identify where the shoulder was if it was hidden. We concluded that clarifying how to annotate these moments is important so the model can work accurately with various situations. For example, annotating a seatbelt also raised many questions. How do we correctly annotate a seatbelt on a person in a car? It took a lot of time until we reached a common understanding,” Zoya says.
Data annotation can also differ from country to country. The UK drives on the left, unlike most of continental Europe, and Japan has its own system of traffic signs and signals, which is crucial when training autopilots. In Asia, rickshaws and three- and four-wheeled mopeds require adapting the models to the diversity of transportation.
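One common way to handle such regional differences is to keep a shared base taxonomy and extend it per region. The sketch below is a hypothetical illustration; the class names and regions are invented for this example, not a real Keymakr schema.

```python
# Hypothetical region-aware annotation taxonomy, illustrating the examples
# above (UK signage, Japanese signals, rickshaws in parts of Asia).

BASE_CLASSES = {"car", "truck", "bus", "bicycle", "pedestrian"}

REGION_EXTRAS = {
    "uk":    {"uk_road_sign"},
    "japan": {"jp_traffic_signal", "jp_road_sign"},
    "asia":  {"rickshaw", "three_wheel_moped"},
}

def taxonomy_for(region: str) -> set:
    """Base classes plus any region-specific additions."""
    return BASE_CLASSES | REGION_EXTRAS.get(region, set())

print(sorted(taxonomy_for("asia")))
```

Keeping the base set fixed lets models share a common backbone while regional label sets capture local vehicle types and signage.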
The future of autopilot is about overcoming current technological limitations and creating systems that learn and adapt continuously. This will require a balanced combination of advanced machine learning techniques, precise data annotation, a human-in-the-loop approach, and real-time data feedback from the road.
As Zoya concludes, “The journey towards fully autonomous vehicles is a marathon, not a sprint. It will require patience, precision, and continuous learning. But with each step, we’re getting closer to a future where autopilot isn’t just an option, but the standard.”