OpenAI-Gym

Author: Oliver Mai. As presented in this YouTube video by Phil Tabor. Gym Environment: LunarLander-v2. This environment is inspired by a subfield of optimal control: rocket trajectory optimization. The documentation is sparse, so we will briefly cover the main features of the environment. In “LunarLander-v2” the agent (or human player) controls a spacecraft that must be landed on a planetary surface. The lander can only move in a 2D plane (Note: this environment requires the 2D physics engine “Box2d”, which can be installed by pip install -e '.

This tutorial focuses on the multi-armed (k-armed) bandit problem and two solution strategies: $\varepsilon$-greedy and $\varepsilon$-decay. Author: Oliver Mai. Problem setup: The multi-armed bandit problem can be pictured as playing slot machines with multiple arms to pull (either because one bandit has multiple arms or because there are multiple bandits). The goal is to maximize the total reward obtained by pulling any of the $k$ arms, without knowing how likely each individual arm is to pay out.
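As a rough illustration of the two strategies named above, the following is a minimal sketch and not the notebook's own code. It assumes Bernoulli arms with made-up win probabilities, sample-average value estimates, and a multiplicative decay schedule for $\varepsilon$; the function name `run_epsilon_greedy` and all parameter values are hypothetical.

```python
import random

def run_epsilon_greedy(probs, steps=5000, epsilon=0.1, decay=1.0, seed=0):
    """Play a Bernoulli k-armed bandit with an epsilon-greedy agent.

    probs : hidden win probability of each arm (illustrative values)
    decay : factor applied to epsilon after every pull
            (1.0 keeps epsilon fixed; < 1.0 gives an epsilon-decay variant)
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k      # number of pulls per arm
    values = [0.0] * k    # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore: random arm
        else:
            arm = max(range(k), key=lambda a: values[a])  # exploit: best estimate
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
        epsilon *= decay
    return values, counts, total

# Three arms with assumed win probabilities; the agent never sees these directly.
values, counts, total = run_epsilon_greedy([0.2, 0.5, 0.8])
best_arm = max(range(3), key=lambda a: values[a])
```

Passing a `decay` slightly below 1.0 (for example 0.999) shifts the agent from exploration toward exploitation over time, which is one simple way to realise an $\varepsilon$-decay strategy.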

This notebook introduces the Python package gym from OpenAI and employs a basic search strategy for finding a policy in the frequently used environment “CartPole-v1”. Author: Oliver Mai. First we import the relevant packages:
    import gym  # the package that supplies us with environments and useful tools
    import numpy as np  # for later array manipulation
    seed = 42069  # general seed for reproducibility
    np.random.seed(seed)
Environment: Now we import the CartPole-v1 environment and take a random action to see how it behaves.

This tutorial takes a look at a temporal difference learning method and Q-learning in the OpenAI Gym environment “FrozenLake-v0”. Author: Oliver Mai. Environment: The environment's description reads: The agent controls the movement of a character in a grid world. Some tiles of the grid are walkable, and others lead to the agent falling into the water. Additionally, the movement direction of the agent is uncertain and only partially depends on the chosen direction.
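For orientation, the core of tabular Q-learning is a single update rule. The sketch below is an illustration under stated assumptions rather than the tutorial's code: the table size (16 states, 4 actions, as in a 4x4 grid with four moves), the learning rate `alpha`, the discount `gamma`, and the two transitions fed to it are all hypothetical, and no Gym environment is involved.

```python
def q_learning_update(q, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Terminal states have no future value to bootstrap from.
    best_next = 0.0 if done else max(q[next_state])
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return td_error

# Q-table for an assumed 4x4 grid: 16 states, 4 actions, initialised to zero.
q = [[0.0] * 4 for _ in range(16)]

# Hypothetical transition: from state 14, action 2 reaches the goal (state 15)
# with reward 1 and ends the episode.
q_learning_update(q, 14, 2, 1.0, 15, done=True)

# Hypothetical earlier step: moving from state 13 into state 14 earns no
# reward, but picks up discounted credit from the value just learned there.
q_learning_update(q, 13, 2, 0.0, 14, done=False)
```

Because the update uses the maximum over next-state values regardless of which action is taken next, it still applies when the environment moves the agent in a different direction than the one chosen; the uncertainty only shows up in which `next_state` is observed.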

Deep Q Learning with Pytorch

k-armed bandit: $\varepsilon$-greedy and $\varepsilon$-decay strategies

OpenAI Gym Example: CartPole

OpenAI Gym Example: Frozen Lake