Reinforcement Learning with TensorFlow : a beginner's guide to designing self-learning systems with TensorFlow and OpenAI Gym.

Reinforcement learning allows you to develop intelligent, self-learning systems. This book shows you how to apply the concepts of reinforcement learning to train efficient models. You will use popular reinforcement learning algorithms to implement use cases in image processing and NLP, by combining th...

Bibliographic Details
Online Access: Full Text (via ProQuest)
Main Author: DUTTA, SAYON
Format: eBook
Language: English
Published: Birmingham : Packt Publishing, 2018.
Subjects: Reinforcement learning.

MARC

LEADER 00000cam a2200000Mi 4500
001 b11256679
003 CoU
005 20200626163018.1
006 m o d
007 cr |||||||||||
008 180505s2018 enk o 000 0 eng d
020 |a 9781788830713 
020 |a 1788830717 
035 |a (OCoLC)ebqac1034635694 
035 |a (OCoLC)1034635694 
037 |a ebqac5371683 
040 |a EBLCP  |b eng  |e pn  |c EBLCP  |d MERUC  |d IDB  |d NLE  |d OCLCQ  |d UKMGB  |d OCLCO  |d LVT  |d OCLCF  |d UKAHL  |d C6I  |d OCLCQ 
049 |a GWRE 
050 4 |a Q325.6  |b .D888 2018eb 
100 1 |a DUTTA, SAYON. 
245 1 0 |a Reinforcement Learning with TensorFlow :  |b a beginner's guide to designing self-learning systems with TensorFlow and OpenAI Gym. 
260 |a Birmingham :  |b Packt Publishing,  |c 2018. 
300 |a 1 online resource (327 pages) 
336 |a text  |b txt  |2 rdacontent. 
337 |a computer  |b c  |2 rdamedia. 
338 |a online resource  |b cr  |2 rdacarrier. 
505 0 |a Cover; Title Page; Copyright and Credits; Packt Upsell; Contributors; Table of Contents; Preface; Chapter 1: Deep Learning -- Architectures and Frameworks; Deep learning; Activation functions for deep learning; The sigmoid function; The tanh function; The softmax function; The rectified linear unit function; How to choose the right activation function; Logistic regression as a neural network; Notation; Objective; The cost function; The gradient descent algorithm; The computational graph; Steps to solve logistic regression using gradient descent; What is xavier initialization? 
505 8 |a Why do we use xavier initialization?; The neural network model; Recurrent neural networks; Long Short Term Memory Networks; Convolutional neural networks; The LeNet-5 convolutional neural network; The AlexNet model; The VGG-Net model; The Inception model; Limitations of deep learning; The vanishing gradient problem; The exploding gradient problem; Overcoming the limitations of deep learning; Reinforcement learning; Basic terminologies and conventions; Optimality criteria; The value function for optimality; The policy model for optimality; The Q-learning approach to reinforcement learning. 
505 8 |a Asynchronous advantage actor-critic; Introduction to TensorFlow and OpenAI Gym; Basic computations in TensorFlow; An introduction to OpenAI Gym; The pioneers and breakthroughs in reinforcement learning; David Silver; Pieter Abbeel; Google DeepMind; The AlphaGo program; Libratus; Summary; Chapter 2: Training Reinforcement Learning Agents Using OpenAI Gym; The OpenAI Gym; Understanding an OpenAI Gym environment; Programming an agent using an OpenAI Gym environment; Q-Learning; The Epsilon-Greedy approach; Using the Q-Network for real-world applications; Summary; Chapter 3: Markov Decision Process. 
505 8 |a Markov decision processes; The Markov property; The S state set; Actions; Transition model; Rewards; Policy; The sequence of rewards -- assumptions; The infinite horizons; Utility of sequences; The Bellman equations; Solving the Bellman equation to find policies; An example of value iteration using the Bellman equation; Policy iteration; Partially observable Markov decision processes; State estimation; Value iteration in POMDPs; Training the FrozenLake-v0 environment using MDP; Summary; Chapter 4: Policy Gradients; The policy optimization method; Why policy optimization methods? 
505 8 |a Why stochastic policy?; Example 1 -- rock, paper, scissors; Example 2 -- state aliased grid-world; Policy objective functions; Policy Gradient Theorem; Temporal difference rule; TD(1) rule; TD(0) rule; TD(λ) rule; Policy gradients; The Monte Carlo policy gradient; Actor-critic algorithms; Using a baseline to reduce variance; Vanilla policy gradient; Agent learning pong using policy gradients; Summary; Chapter 5: Q-Learning and Deep Q-Networks; Why reinforcement learning?; Model based learning and model free learning; Monte Carlo learning; Temporal difference learning. 
500 |a On-policy and off-policy learning. 
520 |a Reinforcement learning allows you to develop intelligent, self-learning systems. This book shows you how to apply the concepts of reinforcement learning to train efficient models. You will use popular reinforcement learning algorithms to implement use cases in image processing and NLP, by combining the power of TensorFlow and OpenAI Gym. 
588 0 |a Print version record. 
650 0 |a Reinforcement learning. 
650 7 |a Reinforcement learning.  |2 fast  |0 (OCoLC)fst01732553. 
776 0 8 |i Print version:  |a DUTTA, SAYON.  |t Reinforcement Learning with TensorFlow : A beginner's guide to designing self-learning systems with TensorFlow and OpenAI Gym.  |d Birmingham : Packt Publishing, ©2018. 
856 4 0 |u https://ebookcentral.proquest.com/lib/ucb/detail.action?docID=5371683  |z Full Text (via ProQuest) 
907 |a .b112566790  |b 06-29-20  |c 06-29-20 
998 |a web  |b  - -   |c f  |d b   |e z  |f eng  |g enk  |h 0  |i 1 
915 |a M 
956 |a Ebook Central Academic Complete 
956 |b Ebook Central Academic Complete 
999 f f |i f683a959-ecfc-5935-873d-57efc0e8ac5f  |s e2b55655-b522-5d51-a695-5231636488fc 
952 f f |p Can circulate  |a University of Colorado Boulder  |b Online  |c Online  |d Online  |e Q325.6 .D888 2018eb  |h Library of Congress classification  |i web  |n 1