Solving a real-world business problem with AI requires more than a software project.

We almost always begin with a Plan of Attack, composed of ESI (Environment, Solution & Implementation). We define the Environment by SAR (States, Actions & Rewards). We define the Solution by Intuition & Theory. We define the Implementation in terms of which language to use, where to deploy, how data will be collected and processed, and how output is generated.

**Example: Business process of optimizing warehouse**

*Environment:* States, Actions, Rewards

*Solution:* Markov Decision Process, Temporal Difference & Q-Learning

*Implementation:* Build using Python, try using a Jupyter notebook & deploy using AWS Lambda

**Algorithms, Maths & Process**

*Reinforcement Learning: The Bellman Equation*

Assume: s = state, a = action, R = reward, γ = discount factor (sometimes rendered as Y; other texts write it out as 'gamma'), V = value, i.e. V(s) is the value of a specific state s, and V(s′) is the value of the next state s′.

V(s) = max over a of ( R(s, a) + γV(s′) )
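The Bellman equation can be iterated to a fixed point on a toy environment. The three states, actions, rewards, and discount below are illustrative assumptions, not part of the warehouse example:

```python
# A minimal sketch of the Bellman update on a made-up 3-state environment.
GAMMA = 0.9  # discount factor (the gamma in the equation above)

# transitions[state][action] = (reward, next_state) -- deterministic for simplicity
transitions = {
    "A": {"right": (0, "B")},
    "B": {"left": (0, "A"), "right": (0, "C")},
    "C": {"stay": (10, "C")},  # terminal-ish goal state with reward
}

# Initialize V(s) = 0, then repeatedly apply:
#   V(s) = max over a of ( R(s, a) + GAMMA * V(s') )
V = {s: 0.0 for s in transitions}
for _ in range(50):
    V = {
        s: max(r + GAMMA * V[s2] for r, s2 in acts.values())
        for s, acts in transitions.items()
    }

print({s: round(v, 2) for s, v in V.items()})
```

States closer to the rewarding state end up with higher values, which is exactly the gradient an agent follows when it acts greedily with respect to V.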

*Markov Decision Process (MDP)*

A stochastic process has the Markov property if the conditional probability distribution of future states of the process (conditional on both past and present states) depends only upon the present state, not on the sequence of events that preceded it. A process with this property is called a Markov process.
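The Markov property can be illustrated with a toy two-state chain; the states and transition probabilities below are made up for illustration:

```python
import random

# Illustrative two-state Markov chain: the next state depends only on the
# current state, never on the path taken to reach it.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def step(state, rng):
    # Sample the next state from P[state] alone -- history is irrelevant.
    states, probs = zip(*P[state].items())
    return rng.choices(states, weights=probs)[0]

rng = random.Random(0)  # fixed seed for a reproducible sample path
state = "sunny"
path = [state]
for _ in range(10):
    state = step(state, rng)
    path.append(state)

print(path)
```

An MDP extends this idea by letting an agent's action influence the transition probabilities and attach rewards to transitions.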

**Resources and References:**

- Course: https://www.udemy.com/course/ai-for-business
- Arthur Juliani, 2016, *Simple Reinforcement Learning with Tensorflow (10 Parts)*
- Richard Sutton et al., 1998, *Reinforcement Learning I: Introduction*
- Richard Bellman, 1954, *The Theory of Dynamic Programming*
- D. J. White, 1993, *A Survey of Applications of Markov Decision Processes*
- Martijn van Otterlo, 2009, *Markov Decision Processes: Concepts and Algorithms*
- Richard Sutton, 1988, *Learning to Predict by the Methods of Temporal Differences*

Sponsor link: https://ripe.ai