SPARC Application

Question 2


We first want to organise the problem into variables. Here is how I would categorise them:

  1. Actor-determined variables (i.e. what I, the business owner, decide)
    1. Whether to accept or decline each day's customer order
    2. How many cabbages to order from the farmer each night
  2. Random variables
    1. Farmer cancellation, which can be subsequently modelled with a probability p_cancel
  3. Fixed variables / parameters
    1. Revenue and costs:
      1. demand side, upon acceptance: +4 per successful order, -5 per missed order
      2. demand side, upon decline: 0
      3. supply side, if the farmer delivers: -1 for 1 cabbage
      4. supply side, if the farmer cancels: -1 for 0 cabbages
    2. Delivery and spoilage timing: cabbages arrive two days after I order them, and unused cabbages spoil at the end of the day (EOD)
    3. Delivery deadlines: must deliver any accepted order within 3 days

From this list of variables, I know that I have to make two decisions every day, namely the ones listed in (1).

It would also be helpful to track the state the system is in. There are several state variables we can tease out here (sketched as a small dataclass after this list):

  1. Current day
  2. Pending orders: a list of (order, deadline) pairs -- how many orders must be delivered today, by tomorrow, and so on
  3. Whether there are any cabbages in transit
    1. How many cabbages are arriving tomorrow
    2. How many cabbages are arriving two days from now
  4. Number of available cabbages
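
As a minimal sketch (the class and field names are my own), these state variables could be grouped into a small dataclass:

from dataclasses import dataclass, field

@dataclass
class ShopState:
	"""Start-of-day snapshot of the business (illustrative field names)."""
	day: int = 1
	pending_orders: list = field(default_factory=list)  # (deadline, accepted_day) pairs
	arriving_tomorrow: int = 0     # cabbages in transit, due on day + 1
	arriving_in_two_days: int = 0  # cabbages in transit, due on day + 2
	inventory: int = 0             # cabbages usable today; leftovers spoil at EOD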

We can first create a simple script for a naive base case.

import random 

def run_base_case_simulation(horizon=7, p=0.3, seed=42):
	"""
	Here we are assuming the following:
	1. Exactly one customer order each day
	2. Always accept
	3. Always buy one cabbage each night
	4. Constant cancellation probability p
	Returns the total profit over `horizon` days.
	"""
	random.seed(seed)
	REVENUE_PER_ORDER = 4 
	PENALTY_MISSED = -5 
	COST_PER_CAB = 1

	cabs_arriving = {}
	for d in range(horizon + 4):  # pad past the horizon: a cabbage bought on day h arrives on day h + 2
		cabs_arriving[d] = 0
	pending_orders = []  # (deadline, accepted_day) pairs
	total_profit = 0

	for day in range(1, horizon+1):
		was_cancelled = (random.random() < p)  # uniform draw below p means the farmer cancels
		if was_cancelled:
			cabs_arriving[day + 1] = 0  # last night's purchase, due tomorrow, is lost
		inventory_today = cabs_arriving[day]
		delivered_count = 0
		still_pending = []

		# deliver pending orders while today's inventory lasts
		for (deadline, accepted_day) in pending_orders:
			if day <= deadline and inventory_today > 0:
				inventory_today -= 1
				delivered_count += 1
			else:
				still_pending.append((deadline, accepted_day))
		daily_revenue = delivered_count * REVENUE_PER_ORDER

		missed_count = 0
		really_pending = []  # orders that survive today's deadline check
		for (deadline, accepted_day) in still_pending:
			if day > deadline: 
				missed_count += 1
			else: really_pending.append((deadline, accepted_day))
		daily_penalty = missed_count * PENALTY_MISSED
		pending_orders = really_pending 
		new_deadline = day + 3  # accept today's order: deliver within 3 days
		pending_orders.append((new_deadline, day))
		cabs_arriving[day + 2] += 1  # buy one cabbage tonight; it arrives in two days
		cost_today = COST_PER_CAB

		daily_profit = daily_revenue + daily_penalty - cost_today
		total_profit += daily_profit

	return total_profit

Running this for an example case:

final_profit = run_base_case_simulation(horizon=7, p=0.3, seed=42)

will result in final_profit = 0.


Now, given that this is the base case, we have introduced other variables without yet varying our decisions (these are still fixed at "always accept, always buy one cabbage per night"). These include:

  • horizon / length of simulation
  • p (while defined earlier, this is now a variable that we explicitly control)
  • seed (determines the daily random draws compared against the cancellation probability; averaging over many seeds, as sketched below, removes this dependence)
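
Since any single seed gives only one trajectory, one simple extension (a sketch reusing the function above; `average_profit` is my own name) is to average profit over many seeds, estimating the expected profit of the always-accept, buy-one policy:

def average_profit(horizon=7, p=0.3, n_runs=1000):
	"""Monte Carlo estimate of expected profit under the base-case policy."""
	total = 0
	for seed in range(n_runs):
		total += run_base_case_simulation(horizon=horizon, p=p, seed=seed)
	return total / n_runs

# e.g. compare expected profit across cancellation probabilities
for p_cancel in (0.1, 0.3, 0.5):
	print(p_cancel, average_profit(p=p_cancel))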

Focusing back on the main question of how best to run my business, I would pick the lowest-hanging fruit given the absolute base case. Since the question states that every morning a customer orders one cabbage, the binary decision of accepting / rejecting an order is easier to make than optimising over a range for the number of cabbages to buy each night.

Step 1: Fixing cabbages bought per night and optimising order acceptance

We then proceed by trying to find the best possible decisions given a fixed horizon length, probability p, and seed. To be clear, this means computationally searching for the best decision sequence. Let's take a look at the output of one simulation run and explain it logically.


Running Simulation: Horizon = 7, p_cancel = 0.3, seed = 23

| Day | Rand p | Cancel? | Arrivals | Accept? | Delivered | Missed | Revenue | Penalty | Cost | Profit | Cumulative |
|-----|--------|---------|----------|---------|-----------|--------|---------|---------|------|--------|------------|
| 1   | 0.925  | False   | 0        | True    | 0         | 0      | 0       | 0       | 1    | -1.00  | -1.00      |
| 2   | 0.949  | False   | 0        | True    | 0         | 0      | 0       | 0       | 1    | -1.00  | -2.00      |
| 3   | 0.892  | False   | 1        | True    | 0         | 0      | 0       | 0       | 1    | -1.00  | -3.00      |
| 4   | 0.084  | True    | 0        | True    | 0         | 0      | 0       | 0       | 1    | -1.00  | -4.00      |
| 5   | 0.592  | False   | 1        | False   | 1         | 1      | 4       | -5      | 1    | -2.00  | -6.00      |
| 6   | 0.424  | False   | 1        | False   | 1         | 0      | 4       | 0       | 1    | 3.00   | -3.00      |
| 7   | 0.530  | False   | 1        | False   | 1         | 0      | 4       | 0       | 1    | 3.00   | 0.00       |
This will give a final profit of 0.00.


Given a random draw each day (Rand p), we check whether it falls below the farmer's cancellation probability; the outcome is shown in the Cancel? column. In this strategy, we keep accepting orders until day 5, at which point there would be too many pending orders to deliver before the horizon; declining from then on optimises for as many successful deliveries as possible.


Running multiple simulations, we can observe the decision pattern that emerges from this precomputed, static optimal policy. For lower values of p (p < 0.5), the strategy accepts orders up to day horizon - 3, because there is a high chance of deliveries arriving on time. For higher values of p (p >= 0.5), the strategy becomes more conservative, declining every order, so the only loss is the cost of the nightly cabbage order. This relation holds for horizon >= 3 days.
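
As a sketch, the acceptance half of this static policy boils down to a one-line rule (the function name and exact threshold handling are mine):

def should_accept_order(day, horizon, p_cancel, threshold=0.5):
	"""Static rule observed above: accept only while a delivery can still
	land before the horizon and cancellation risk is low."""
	return p_cancel < threshold and day <= horizon - 3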


Step 2: Introducing dynamic cabbage purchases

Now, we modify the code such that we can toggle

  1. the number of cabbages bought each night (again, the easiest way to do this is to toggle between 0 and 1)
  2. whether to accept or decline a customer's order

Again, both decisions are based on the probability of the farmer cancelling on us. Given the same scenario as before, we now see a slight improvement in the final profit:


Running Simulation: Horizon = 7, p_cancel = 0.3, seed = 23

| Day | Rand p | Cancel? | Arrivals | Accept? | Cabs Bought | Delivered | Missed | Revenue | Penalty | Cost | Profit | Cumulative |
|-----|--------|---------|----------|---------|-------------|-----------|--------|---------|---------|------|--------|------------|
| 1   | 0.925  | False   | 0        | True    | 0           | 0         | 0      | 0       | 0       | 0    | 0.00   | 0.00       |
| 2   | 0.949  | False   | 0        | True    | 1           | 0         | 0      | 0       | 0       | 1    | -1.00  | -1.00      |
| 3   | 0.892  | False   | 0        | True    | 1           | 0         | 0      | 0       | 0       | 1    | -1.00  | -2.00      |
| 4   | 0.084  | True    | 0        | True    | 1           | 0         | 0      | 0       | 0       | 1    | -1.00  | -3.00      |
| 5   | 0.592  | False   | 1        | False   | 1           | 1         | 1      | 4       | -5      | 1    | -2.00  | -5.00      |
| 6   | 0.424  | False   | 1        | False   | 0           | 1         | 0      | 4       | 0       | 0    | 4.00   | -1.00      |
| 7   | 0.530  | False   | 1        | False   | 0           | 1         | 0      | 4       | 0       | 0    | 4.00   | 3.00       |
This gives a final profit of 3.00.


Running multiple simulations with these two toggles, we observe that the threshold for accepting orders up to day horizon - 3 now drops to p_cancel <= 0.3; for p_cancel >= 0.4, the policy reduces losses by buying no cabbages and declining all orders. The maximum final profit of 18.00 is achieved at horizon = 12 days and p_cancel = 0.1.

Therefore, the policy favours profit maximisation in low-risk scenarios and damage control in high-risk cases. Longer horizons allow for higher maximum profits, but only when the probability of the farmer cancelling remains low.
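
Putting the two toggles together, one possible encoding of this static policy is sketched below. It follows the thresholds above but approximates, rather than exactly reproduces, the optimised runs (the function name and the purchase rule are my own):

def static_policy(day, horizon, p_cancel, pending_orders):
	"""Return (accept_order, cabbages_to_buy) under the observed thresholds."""
	if p_cancel >= 0.4:
		# High risk: decline everything and buy nothing, capping losses at zero.
		return False, 0
	accept = day <= horizon - 3        # delivery can still land before the horizon
	arrival_day = day + 2              # a cabbage bought tonight arrives in two days
	useful = any(deadline >= arrival_day for (deadline, _) in pending_orders)
	return accept, 1 if (useful or accept) else 0

Here a cabbage is bought only if some pending (or newly accepted) order could still be served by the time it arrives; the optimised run above additionally shifts the first purchase by a day (day 1 buys nothing), which this simple rule does not capture.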


Future improvements

Ideally, we would not want a static, hard-coded decision rule optimised for fixed conditions. A more sophisticated system would dynamically adapt to changing variables over time. Rather than following predetermined thresholds, an ideal solution would learn and adjust its decisions as conditions evolve, just as a human decision-maker running this business would. A natural extension of this problem is a reinforcement learning (RL) model, which could continuously optimise order acceptance and inventory management by interacting with its environment and learning from past outcomes. The problem structure aligns well with an RL framework:

  • the state space consists of key variables such as current inventory, pending orders, expected deliveries, and cancellation probability;
  • the action space revolves around accepting or rejecting customer orders and determining the number of cabbages to purchase each night;
  • the reward function is explicitly defined in terms of revenue, penalties for missed deliveries, and inventory costs.

Given more time, I would explore this direction by defining a formal RL formulation and selecting an appropriate learning algorithm.
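
To make this concrete, here is a minimal environment skeleton for such an RL formulation (the class and method names are my own; this is a sketch mirroring the payoffs above, not a tuned implementation):

import random

class CabbageEnv:
	"""Minimal MDP sketch of the cabbage shop (illustrative, not tuned)."""
	REVENUE, PENALTY, COST = 4, -5, 1

	def __init__(self, horizon=7, p_cancel=0.3, seed=None):
		self.horizon, self.p_cancel = horizon, p_cancel
		self.rng = random.Random(seed)
		self.reset()

	def reset(self):
		self.day, self.pending, self.arriving = 1, [], {}
		return self._state()

	def _state(self):
		# observation: day, deadlines of pending orders, cabbages in transit
		return (self.day, tuple(d for d, _ in self.pending),
			self.arriving.get(self.day + 1, 0),
			self.arriving.get(self.day + 2, 0))

	def step(self, accept, buy):
		"""Apply one day's (accept, buy) decisions; return (state, reward, done)."""
		if self.rng.random() < self.p_cancel:  # farmer cancels last night's order
			self.arriving[self.day + 1] = 0
		stock = self.arriving.pop(self.day, 0)
		reward, remaining = 0, []
		for deadline, accepted_day in self.pending:  # deliver while stock lasts
			if self.day <= deadline and stock > 0:
				stock -= 1
				reward += self.REVENUE
			elif self.day > deadline:  # deadline blown: pay the penalty
				reward += self.PENALTY
			else:
				remaining.append((deadline, accepted_day))
		self.pending = remaining
		if accept:  # today's order must be delivered within 3 days
			self.pending.append((self.day + 3, self.day))
		if buy:  # tonight's purchase arrives in two days, at 1 per cabbage
			self.arriving[self.day + 2] = self.arriving.get(self.day + 2, 0) + buy
			reward -= self.COST * buy
		self.day += 1
		return self._state(), reward, self.day > self.horizon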