Reinforcement Learning with TensorFlow : a beginner's guide to designing self-learning systems with TensorFlow and OpenAI Gym.
Reinforcement learning allows you to develop intelligent, self-learning systems. This book shows you how to apply the concepts of Reinforcement Learning to train efficient models. You will use popular reinforcement learning algorithms to implement use cases in image processing and NLP, by combining the power of TensorFlow and OpenAI Gym.
Online Access: Full Text (via ProQuest)
Main Author: Dutta, Sayon
Format: eBook
Language: English
Published: Birmingham : Packt Publishing, 2018.
Subjects: Reinforcement learning
MARC
LEADER 00000cam a2200000Mi 4500
001 b11256679
003 CoU
005 20200626163018.1
006 m o d
007 cr |||||||||||
008 180505s2018 enk o 000 0 eng d
020 |a 9781788830713
020 |a 1788830717
035 |a (OCoLC)ebqac1034635694
035 |a (OCoLC)1034635694
037 |a ebqac5371683
040 |a EBLCP |b eng |e pn |c EBLCP |d MERUC |d IDB |d NLE |d OCLCQ |d UKMGB |d OCLCO |d LVT |d OCLCF |d UKAHL |d C6I |d OCLCQ
049 |a GWRE
050 4 |a Q325.6 |b .D888 2018eb
100 1 |a DUTTA, SAYON.
245 1 0 |a Reinforcement Learning with TensorFlow : |b a beginner's guide to designing self-learning systems with TensorFlow and OpenAI Gym.
260 |a Birmingham : |b Packt Publishing, |c 2018.
300 |a 1 online resource (327 pages)
336 |a text |b txt |2 rdacontent
337 |a computer |b c |2 rdamedia
338 |a online resource |b cr |2 rdacarrier
505 0 |a Cover; Title Page; Copyright and Credits; Packt Upsell; Contributors; Table of Contents; Preface; Chapter 1: Deep Learning -- Architectures and Frameworks; Deep learning; Activation functions for deep learning; The sigmoid function; The tanh function; The softmax function; The rectified linear unit function; How to choose the right activation function; Logistic regression as a neural network; Notation; Objective; The cost function; The gradient descent algorithm; The computational graph; Steps to solve logistic regression using gradient descent; What is Xavier initialization?
505 8 |a Why do we use Xavier initialization?; The neural network model; Recurrent neural networks; Long Short Term Memory Networks; Convolutional neural networks; The LeNet-5 convolutional neural network; The AlexNet model; The VGG-Net model; The Inception model; Limitations of deep learning; The vanishing gradient problem; The exploding gradient problem; Overcoming the limitations of deep learning; Reinforcement learning; Basic terminologies and conventions; Optimality criteria; The value function for optimality; The policy model for optimality; The Q-learning approach to reinforcement learning.
505 8 |a Asynchronous advantage actor-critic; Introduction to TensorFlow and OpenAI Gym; Basic computations in TensorFlow; An introduction to OpenAI Gym; The pioneers and breakthroughs in reinforcement learning; David Silver; Pieter Abbeel; Google DeepMind; The AlphaGo program; Libratus; Summary; Chapter 2: Training Reinforcement Learning Agents Using OpenAI Gym; The OpenAI Gym; Understanding an OpenAI Gym environment; Programming an agent using an OpenAI Gym environment; Q-Learning; The Epsilon-Greedy approach; Using the Q-Network for real-world applications; Summary; Chapter 3: Markov Decision Process.
505 8 |a Markov decision processes; The Markov property; The S state set; Actions; Transition model; Rewards; Policy; The sequence of rewards -- assumptions; The infinite horizons; Utility of sequences; The Bellman equations; Solving the Bellman equation to find policies; An example of value iteration using the Bellman equation; Policy iteration; Partially observable Markov decision processes; State estimation; Value iteration in POMDPs; Training the FrozenLake-v0 environment using MDP; Summary; Chapter 4: Policy Gradients; The policy optimization method; Why policy optimization methods?
505 8 |a Why stochastic policy?; Example 1 -- rock, paper, scissors; Example 2 -- state aliased grid-world; Policy objective functions; Policy Gradient Theorem; Temporal difference rule; TD(1) rule; TD(0) rule; TD(λ) rule; Policy gradients; The Monte Carlo policy gradient; Actor-critic algorithms; Using a baseline to reduce variance; Vanilla policy gradient; Agent learning pong using policy gradients; Summary; Chapter 5: Q-Learning and Deep Q-Networks; Why reinforcement learning?; Model based learning and model free learning; Monte Carlo learning; Temporal difference learning.
500 |a On-policy and off-policy learning.
520 |a Reinforcement learning allows you to develop intelligent, self-learning systems. This book shows you how to apply the concepts of Reinforcement Learning to train efficient models. You will use popular reinforcement learning algorithms to implement use cases in image processing and NLP, by combining the power of TensorFlow and OpenAI Gym.
588 0 |a Print version record.
650 0 |a Reinforcement learning.
650 7 |a Reinforcement learning. |2 fast |0 (OCoLC)fst01732553
776 0 8 |i Print version: |a DUTTA, SAYON. |t Reinforcement Learning with TensorFlow : A beginner's guide to designing self-learning systems with TensorFlow and OpenAI Gym. |d Birmingham : Packt Publishing, ©2018.
856 4 0 |u https://ebookcentral.proquest.com/lib/ucb/detail.action?docID=5371683 |z Full Text (via ProQuest)
907 |a .b112566790 |b 06-29-20 |c 06-29-20
998 |a web |b - - |c f |d b |e z |f eng |g enk |h 0 |i 1
915 |a M
956 |a Ebook Central Academic Complete
956 |b Ebook Central Academic Complete
999 f f |i f683a959-ecfc-5935-873d-57efc0e8ac5f |s e2b55655-b522-5d51-a695-5231636488fc
952 f f |p Can circulate |a University of Colorado Boulder |b Online |c Online |d Online |e Q325.6 .D888 2018eb |h Library of Congress classification |i web |n 1