
Deep Reinforcement Learning Based Robotic Arm Control: A Single-Joint Simulation

Problem Statement

The project focuses on controlling a robotic arm, reduced here to a single one-degree-of-freedom joint, to perform the inverted pendulum swing-up task. The goal is to apply torque to the joint so that the pendulum swings up into, and balances in, the upright position. This classic control problem involves learning a policy over continuous action and observation spaces, with rewards based on angular position, angular velocity, and applied torque. I also attempt a bottom-up approach based on Deep Q-Networks (DQN).

Environment

The environment used for this task is "Pendulum-v1" from the Gymnasium framework, the maintained fork of OpenAI Gym. The simulation lets the agent apply torque to swing the pendulum upright, with (a minimal usage sketch follows the list):

  • Action space: a single torque value in the range -2.0 to 2.0.
  • Observation space: the x-y coordinates of the pendulum's free end (each constrained between -1.0 and 1.0) and its angular velocity (constrained between -8.0 and 8.0).
  • Rewards: based on angular position, angular velocity, and applied torque, with the maximum reward of 0 earned by holding the pendulum upright and still with no torque.
  • Customizable gravitational acceleration (default: 10.0 m/s²).
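
A minimal sketch of how this environment is typically driven through Gymnasium's standard API (the seed and the randomly sampled action below are illustrative only):

```python
import gymnasium as gym

# Create the pendulum environment; g is the customizable gravity parameter.
env = gym.make("Pendulum-v1", g=10.0)

obs, info = env.reset(seed=0)       # obs = [cos(theta), sin(theta), angular velocity]
action = env.action_space.sample()  # a single torque value in [-2.0, 2.0]
obs, reward, terminated, truncated, info = env.step(action)
# reward = -(theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2), so 0 is the best case
env.close()
```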

Implementation

Using Python, a rule-based policy was first developed to control the robotic arm in the inverted pendulum swing-up task. This policy keys on angular velocity, dividing its range into intervals and mapping each interval to a fixed torque action. During testing, the policy was applied over 100 episodes and episodic rewards were accumulated for evaluation; metrics included the average test score and a moving average to reveal trends.
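
The exact intervals and torques are not given in this README, so the following is only a hypothetical sketch of such an interval-based policy, with thresholds and torque values assumed:

```python
import numpy as np

def rule_based_action(obs):
    # Pick a fixed torque based on which interval the angular velocity
    # falls into; all thresholds and magnitudes here are illustrative.
    cos_theta, sin_theta, ang_vel = obs
    if ang_vel < -4.0:
        return np.array([2.0], dtype=np.float32)
    elif ang_vel < 0.0:
        return np.array([1.0], dtype=np.float32)
    elif ang_vel < 4.0:
        return np.array([-1.0], dtype=np.float32)
    else:
        return np.array([-2.0], dtype=np.float32)
```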

The DQN algorithm is implemented with a neural network of three fully connected layers, using experience replay to enhance learning stability. The training loop, sketched after the list below, involves:

  • Action selection using an epsilon-greedy strategy.
  • Neural network updates that minimize the mean squared error between predicted Q-values and the Bellman target Q-values.
  • Epsilon decay to reduce exploration over time.
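
A minimal, non-authoritative PyTorch sketch of this loop follows; the hidden-layer width, the discretization of the continuous torque range (DQN requires discrete actions), and the use of a target network are all assumptions, not details taken from the repository:

```python
import random
import torch
import torch.nn as nn

# Assumed discretization of the torque range [-2.0, 2.0] into 5 actions.
TORQUES = torch.linspace(-2.0, 2.0, steps=5)

class QNet(nn.Module):
    # Three fully connected layers, as described above (width assumed).
    def __init__(self, obs_dim=3, n_actions=5, hidden=128):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.layers(x)

def select_action(qnet, obs, epsilon):
    # Epsilon-greedy: random action with probability epsilon, else greedy.
    if random.random() < epsilon:
        return random.randrange(len(TORQUES))
    with torch.no_grad():
        return int(qnet(obs).argmax())

def dqn_loss(qnet, target_net, batch, gamma=0.99):
    # MSE between predicted Q-values and Bellman targets from the target net.
    obs, actions, rewards, next_obs, done = batch
    q_pred = qnet(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_obs).max(dim=1).values
        q_target = rewards + gamma * (1.0 - done) * q_next
    return nn.functional.mse_loss(q_pred, q_target)
```

After each training step, epsilon is typically multiplied by a decay factor (for example, epsilon *= 0.995) so that exploration tapers off over time.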

Results

The simple rule-based system revealed challenges in effectively controlling the robotic arm:

  • Episodic rewards were consistently negative, ranging from approximately -1933.65 to -1315.9.
  • An average test reward score of -1624.35 indicated suboptimal performance.
  • The episodic reward plot exhibited a fluctuating pattern, showing no clear improvement trend.

During testing of the DQN agent over 500 episodes, the following observations were made:

  • Average episode rewards trended negative, with an overall score of -1185.4.
  • Rewards stabilized around episodes 200 to 300, but the agent's performance remained poor.
  • A downward trend in the loss per training step and a balanced exploration-exploitation trade-off indicated that learning was taking place.

The DDPG approach showed significant improvement:

  • The agent's reward stabilized at around -200 after 100 episodes.
  • An average score of -156.3 demonstrated a large improvement over both DQN and the rule-based agent.
  • Decreasing actor and critic losses indicated improving action selection and value estimation (see the sketch below).
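
For context on those two quantities, here is a hedged sketch of the standard DDPG losses; the network call signatures and batch layout are assumptions, not the repository's code:

```python
import torch
import torch.nn.functional as F

def ddpg_losses(actor, critic, target_actor, target_critic, batch, gamma=0.99):
    obs, act, rew, next_obs, done = batch
    # Critic loss: MSE between Q(s, a) and the Bellman target, so a
    # decreasing value reflects better value estimation.
    with torch.no_grad():
        next_q = target_critic(next_obs, target_actor(next_obs))
        q_target = rew + gamma * (1.0 - done) * next_q
    critic_loss = F.mse_loss(critic(obs, act), q_target)
    # Actor loss: negative Q-value of the actor's own action, so a
    # decreasing value reflects better action selection.
    actor_loss = -critic(obs, actor(obs)).mean()
    return actor_loss, critic_loss
```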

The SAC approach yielded similar gains:

  • The agent's reward stabilized at around -200 after approximately 70 episodes.
  • An average score of -213.2 showed a significant improvement over the rule-based and DQN agents, though slightly behind DDPG's -156.3.
  • A stable alpha (entropy temperature) loss indicated a good balance between exploration and exploitation, while decreasing Q1 and Q2 losses demonstrated increasingly accurate critic estimates (see the sketch below).
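
As context for the alpha and twin-critic losses, a hedged sketch of the standard SAC formulation follows; log_alpha, target_entropy, and the critic call signatures are assumptions:

```python
import torch
import torch.nn.functional as F

def alpha_loss(log_alpha, log_prob, target_entropy):
    # Temperature update: alpha is tuned so the policy's entropy tracks a
    # target; a flat, stable value of this loss means the exploration-
    # exploitation balance has settled.
    return -(log_alpha.exp() * (log_prob + target_entropy).detach()).mean()

def twin_critic_losses(q1, q2, obs, act, q_target):
    # Both critics regress onto the same entropy-regularized Bellman target;
    # decreasing Q1/Q2 losses mean more accurate value estimates.
    return F.mse_loss(q1(obs, act), q_target), F.mse_loss(q2(obs, act), q_target)
```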

Conclusion

The refined intelligent system, using the DDPG and SAC algorithms, successfully addressed the limitations of the initial top-down (rule-based) and bottom-up (DQN) approaches. The gains achieved with these more advanced algorithms and network architectures demonstrate that the robotic arm can be controlled effectively in the challenging inverted pendulum swing-up task.
