Benchmarks such as MNIST (LeCun et al., 1998), Caltech101 (Fei-Fei et al., 2006), CIFAR (Krizhevsky & Hinton, 2009), and ImageNet (Deng et al., 2009) have played an important role across many domains of machine learning.
Reinforcement Learning, however, still lacks a comparably standardized testbed, although benchmarks such as OpenAI's Procgen and Unity Technologies' Obstacle Tower are emerging as the state of the art.
In this article, we take a hands-on approach to training various RL agents on the classic control tasks provided by OpenAI Gym.
These environments are low-dimensional by design, which makes them well suited for quick evaluation and comparison of RL algorithms.
An overview and details of each environment are available in the OpenAI Gym documentation.
- Acrobot-v1
- The system has two joints and two links, where the joint between the two links is actuated. Initially, the links are hanging downwards, and the goal is to swing the end of the lower link up to a given height.
- First mentioned by – R Sutton, “Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding”, NIPS 1996.
- CartPole-v1
- A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center.
- First mentioned by – AG Barto, RS Sutton, and CW Anderson, “Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems”, IEEE Transactions on Systems, Man, and Cybernetics, 1983.
- MountainCar-v0
- A car is on a one-dimensional track, positioned between two “mountains”. The goal is to drive up the mountain on the right; however, the car’s engine is not strong enough to scale the mountain in a single pass. Therefore, the only way to succeed is to drive back and forth to build up momentum.
- First mentioned by – A Moore, Efficient Memory-Based Learning for Robot Control, Ph.D. thesis, University of Cambridge, 1990.
- MountainCarContinuous-v0
- Same setup as MountainCar-v0, but with a continuous action space. Here, the reward is greater if you spend less energy to reach the goal.
- First mentioned by – A Moore, Efficient Memory-Based Learning for Robot Control, Ph.D. thesis, University of Cambridge, 1990.
- Pendulum-v0
- The inverted pendulum swing-up problem is a classic problem in the control literature. In this version of the problem, the pendulum starts in a random position, and the goal is to swing it up so it stays upright.
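All five environments share the same Gym interface: `reset()` returns an initial observation, and `step(action)` returns the next observation, a reward, and a done flag. To make the CartPole-v1 description above concrete, here is a minimal pure-Python sketch of its dynamics, following the cart-pole equations from Barto, Sutton & Anderson (1983). The constants and thresholds are assumptions drawn from the environment description above; the actual Gym source is authoritative.

```python
import math

# Physical constants, assumed from the standard cart-pole formulation.
GRAVITY, CART_MASS, POLE_MASS = 9.8, 1.0, 0.1
POLE_HALF_LENGTH, FORCE_MAG, TAU = 0.5, 10.0, 0.02
ANGLE_LIMIT = 15 * math.pi / 180   # episode ends beyond 15 degrees...
X_LIMIT = 2.4                      # ...or 2.4 units from the center

def step(state, action):
    """One Euler integration step; action is 0 (push left) or 1 (push right)."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    total_mass = CART_MASS + POLE_MASS
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS * POLE_HALF_LENGTH * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass))
    x_acc = temp - POLE_MASS * POLE_HALF_LENGTH * theta_acc * cos_t / total_mass
    state = (x + TAU * x_dot, x_dot + TAU * x_acc,
             theta + TAU * theta_dot, theta_dot + TAU * theta_acc)
    done = abs(state[0]) > X_LIMIT or abs(state[2]) > ANGLE_LIMIT
    return state, 1.0, done  # +1 reward per timestep survived

def rollout(policy, max_steps=500):
    """Run one episode from the upright state and return the total reward."""
    state, total = (0.0, 0.0, 0.0, 0.0), 0.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state))
        total += reward
        if done:
            break
    return total
```

For example, `rollout(lambda s: 1)` (always push right) drives the cart off the track quickly, while `rollout(lambda s: 1 if s[2] > 0 else 0)` (push toward the lean) keeps the pole up far longer, illustrating why the +1-per-step reward favors balancing policies.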
- A2C – Synchronous, deterministic variant of Asynchronous Advantage Actor-Critic (A3C)
- PPO2 – Proximal Policy Optimization, the successor to TRPO, is a family of first-order methods that use a few tricks to keep new policies close to old ones.
- ACER – Sample-Efficient Actor-Critic with Experience Replay (ACER) combines several ideas from previous algorithms: it uses multiple workers (as in A2C), a replay buffer (as in DQN), Retrace for Q-value estimation, importance sampling, and a trust region.
- ACKTR – Actor-Critic using Kronecker-Factored Trust Region (ACKTR) uses Kronecker-factored approximate curvature (K-FAC) for trust-region optimization.
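The actor-critic methods above all share one core idea: score each action by its advantage (reward minus a learned baseline) and nudge the policy toward positive-advantage actions. The toy below illustrates just that idea on a 2-armed bandit with a softmax policy and a running-average baseline. This is our own minimal sketch, not Stable-Baselines code; all names in it are illustrative.

```python
import math
import random

random.seed(0)
logits = [0.0, 0.0]          # softmax policy parameters, one per action
baseline, lr = 0.0, 0.1      # running value estimate ("critic") and step size
TRUE_REWARD = [1.0, 0.0]     # arm 0 is the better action

def policy():
    """Softmax over the logits."""
    z = [math.exp(l) for l in logits]
    s = sum(z)
    return [p / s for p in z]

for t in range(2000):
    probs = policy()
    action = 0 if random.random() < probs[0] else 1
    reward = TRUE_REWARD[action]
    advantage = reward - baseline           # actor-critic style advantage
    baseline += 0.05 * (reward - baseline)  # critic: running reward average
    # Policy gradient: d log pi(a) / d logit_k = 1{k == a} - pi_k
    for k in range(2):
        logits[k] += lr * advantage * ((1.0 if k == action else 0.0) - probs[k])

print(policy())  # probability mass should concentrate on arm 0
```

A2C, ACER, and ACKTR differ in how they estimate the advantage and constrain each update (replay plus Retrace for ACER, a K-FAC trust region for ACKTR), but the update direction is this same advantage-weighted score function.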
Each algorithm is covered in detail in the linked documentation. The objective of this guide, however, is to quickly implement these algorithms and compare them across environments from the classical control literature.
For ease of implementation, we have leveraged the Stable-Baselines package, a community fork of OpenAI Baselines.
The following notebook implements the above-mentioned algorithms, training an RL agent for 1 million timesteps in each of the classic control environments.
The trained agents were then tested for 1,000 timesteps in each environment.
The entire implementation was run on a MacBook Pro.
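The 1,000-timestep evaluation loop can be sketched as below. The `Stub*` classes are hypothetical stand-ins of our own invention, but they expose the same interface as Stable-Baselines models (`.predict`) and Gym environments (`.reset`/`.step`), so the same loop would work with a real trained agent.

```python
import random

class StubEnv:
    """Episodic environment: +1 reward per step, random episode length."""
    def reset(self):
        self._left = random.randint(5, 20)
        return 0.0                            # dummy observation
    def step(self, action):
        self._left -= 1
        return 0.0, 1.0, self._left <= 0, {}  # obs, reward, done, info

class StubModel:
    """Stand-in for a trained model; always picks action 0."""
    def predict(self, obs):
        return 0, None                        # action, (recurrent) state

def evaluate(model, env, n_steps=1000):
    """Accumulate reward over a fixed timestep budget, resetting as needed."""
    obs, total = env.reset(), 0.0
    for _ in range(n_steps):
        action, _ = model.predict(obs)
        obs, reward, done, _ = env.step(action)
        total += reward
        if done:
            obs = env.reset()  # keep stepping for the full budget
    return total

random.seed(1)
print(evaluate(StubModel(), StubEnv()))  # total reward over 1,000 steps
```

Because the budget is fixed in timesteps rather than episodes, short-episode environments (such as a quickly-failing CartPole agent) are restarted as many times as needed within the 1,000 steps.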
This section provides a comparative summary of various algorithms across different environments as provided by OpenAI Gym.
The above figure shows the average training runtime of each RL algorithm across the different environments.
We can infer that PPO2 takes the longest, averaging more than 350 seconds to train across the various environments.
The above figure shows the average rewards collected while testing for 1000 steps on various environments using different algorithms.
The average reward earned while testing for 1000 steps is highest in the case of Pendulum-v0 followed by the CartPole-v1 environment.
The above figure shows the average training runtime per environment, aggregated across algorithms.
We can infer that the MountainCar-v0 environment takes the longest, on average, to train the various agents.
Based on the insights above, we can conclude that Acrobot-v1 is difficult for most of the algorithms, as it yields the lowest mean reward irrespective of each environment's reward scale.
PPO2 takes the longest to train across environments; one possible reason is the absence of a GPU in the hardware used.