The OpenAI API docs are very bad. In my experience as a coder, I’ve come across my share of bad documentation. Typically, this is because the documentation is poorly organized, too spare, or missing coverage. Or it’s because the design of the API itself is badly conceived, inconsistent, or contains the accumulated cruft of years (or decades!) of bloat and abandoned features.
But I can’t recall ever seeing documentation that contains code samples that are both wrong and also syntactically wrong. It’s bad enough that it comes across as documentation written by GPT–and not even a recent model.
Take this example, part of an entry under the “Core Concepts” section:
context=[{"role":"role","content":"What is the capital of France?"}]res1=client.responses.create(model="gpt-5",input=context,)//Appendthefirstresponse’soutputtocontextcontext+=res1.output//Addthenextusermessagecontext+=[{"role":"role","content":"And it's population?"}]res2=client.responses.create(model="gpt-5",input=context,)
The Python code sample here is not syntactically correct. The comments use the ‘//’ convention of C/Java/Javascript in-line comments, rather than Python’s ‘#’. Additionally, OpenAI has the concept of a role, which indicates who (e.g. the system, the user, the model’s responder) is “speaking.” The string “role” is not a valid value for this and making an API call with it results in an error:
So, there are a total of 7 code statements in this sample, including the comments, and 4 of them have errors. The thing is, GPT-5 is actually pretty good at writing code. It’s even capable of executing Python code in an internal environment. We can see this facility in action by simply asking ChatGPT to debug the code from the OpenAI documentation.
This is a mode of LLM use that I haven’t had a lot of luck with, but here it pinpoints the two errors perfectly.
When documentation is bad in a common fashion, it typically creates a frustrating programming experience. And, to be clear, the OpenAI docs are bad in some of those ways too. But the sheer lack of care it demonstrates is both shocking for all the ways that Tech has integrated AI into our world and, frankly, majestic. Like making a horse consul or completely blowing up the system of global trade.
This is a quick follow-up to my last post about using the Monte Carlo method to predict how easy it will be to schedule Praxis sessions next year.
In that post, I calculated that we might easily be in trouble if students have even a few fixed obligations beyond an average teaching load.
I also mentioned that I’d like to incorporate the actual distribution of classes at UVA into this model. Lou’s List, a long-running and unofficial UVA course listing created by professor emeritus of physics Lou Bloomfield, conveniently has scraped the exact data that I need from UVA’s unfriendly official course selection site. The course search page at Lou’s List even offers convenient CSV downloads (gotta love physics professors).
Even though I’m only interested in the scheduling data, it’s still messy enough that it needs to be cleaned up first. The format is days-of-the-week and then a time range (“TuTh 11:00am - 12:15pm”), so parsing it was a little more involved than usual. There are also a lot of “TBA” values and some probable placeholders (e.g. “MoTuWeThFrSaSu 7:00am - 6:00pm”) to filter out.
Since I want to use this data for scheduling Praxis, I only care about the times that overlap the Scholars’ Lab’s working hours.1 And for this purpose, a class that ends at 12:15pm is functionally the same for us as one that ends at 12:50pm so I rounded down start times and rounded up end times. Crunching through the Fall 2024 undergraduate course data results in this distribution:
I modified my previous code to incorporate Python’s built-in random.choices function to generate random schedules for students weighted by this distribution and that pretty much got me to where I wanted to be.
And it’s good news! Well, good news and bad news. The chances of us finding two 2-hour slots for Praxis are much better since the curve drops off much less steeply. But presumably this is because the chances of us having to have early morning and Friday sessions have gone up.
Since for whatever reason the syntax highlighting only seems to be working for my local build, I guess I’ll just link to the repo on GitHub.
And here’s where I made a mistake, because I’m throwing away slots that fall outside of our business hours but students may well be assigned to a section in those times. But it should be close enough that I’m not going to fix it. ↩
The Praxis Program fellowship will shortly undergo dramatic funding changes as consequences of UVA Library austerity budget cuts (as Brandon Walsh has thoughtfully documented in his recent post). Starting the next academic year, we will no longer be able to buy out our fellows’ teaching obligations. One consequence of this is that they will have a substantially larger number of fixed times where they cannot attend Praxis sessions. The modern Praxis curriculum consists of two 2-hour sessions in a typical week, and it has occasionally been irksome to find times to meet for 5 fellows even when they did not have to teach. With decreased availability, I am anxious that this problem will become insurmountable and the program that we have refined over fifteen years will need to be substantially reconfigured. My way of coping with this anxiety is to crunch the numbers, under the dubious theory that having greater insight about future calamity will make it easier to face. So, what’s the likelihood that we’ll be able to find two free slots a week in common, assuming that each student has a certain number of hours that are already taken? At what point does this number drop off?
The problem is that I don’t really know anything about statistics. I have to look up combination and permutation every time to know which one is which. Happily, for people who know how to code but don’t know how to do stats, there’s the Monte Carlo Method. If we can straightforwardly model the rules of a problem, but it’s onerous to map it to an abstract statistical approach, we can just have a computer try out different random permutations (or is it combinations?) over and over again to create an approximation of the outcome.
In this case, we start with assuming 8-hour workdays and a 5-day workweek, 5 students, and some variable number of hours each week when the student will be teaching (or other inflexible obligations). To simulate the scheduling for one semester for a given number of obliged student-hours, we can start by assuming that these obligations are evenly distributed across the entire week. Then, we create a random schedule for each student, represented by a boolean list of length 40. Since we only really care about when every student is free, we can just take the union of all the times they are busy. After that, we can simply check if there are two 2-hour blocks of contiguous free time to determine the outcome of this run.
Arbitrarily, we can run this 10,000 times for each number of obliged hours per week from 0 to 20 and graph the results.
Here, I’ve also run the numbers for both the case where we enforce that each session be on different days (ideal) or if we will allow them to be on the same day (barbarous) to see if that unenviable prospect buys us anything. From this graph, we can see that, either way, there’s a pretty steep drop-off starting at 8 hours and falling below 50% success rate at 10 hours. Allowing sessions to be on the same day only gets us about 0.5 hours of leeway, which doesn’t seem worth the torturous cost. Typically, a graduate teaching assistant for a single large course may be required to attend three hours of classes and preside over three more hours of discussion sections in a week. There are many more obligations that are either more flexible or require less time, but this represents a reasonable floor for our consideration. This means that we’re relying on students having at most about 2-4 additional hours a week of fixed obligations before we are likely to be in trouble.
There’s a lot of assumptions here and we can maybe make our model more complex using, say, real historical course schedule distribution data from Lou’s List, but I think this does provide a reasonable initial approximation.
So does this make me feel better? Maybe, sort of, yes. The numbers aren’t great but there is a narrow path to success. And I think this is also helpful in that it gives us a lot of time to put mitigation strategies into motion, some of which may also be strengthened by having these numbers. Maybe we can act sooner and steal a march on other, easier to schedule things or work with our fellows’ home departments. And maybe I’ll just keep playing around with refining this model with Lou’s List datasets, just for the sake of anxiety.
Here’s my code, just in case it’s useful for anyone.
"""
Simple simulation of Praxis scheduling
"""importrandomimportcsvNUM_PEOPLE=5SIMULATIONS=10000ENFORCE_MULTIDAY=TrueDAYS=5HOURS_PER_DAY=8# track separately the outcome if we enforce
# multi-day and if we allow same day sessions
success_rates_multiday=[0]*21success_rates_sameday=[0]*21forhours_busyinrange(21):successes_multiday=0successes_sameday=0# Iterate over simulations
foriinrange(SIMULATIONS):busy=[False]*DAYS*HOURS_PER_DAY# For each person, independently mark off hours_busy hours as busy
# Result is a list of DAYS*HOURS_PER_DAY booleans representing slots
# where each person is busy (True) or free (False)
forjinrange(NUM_PEOPLE):forkinrandom.sample(range(DAYS*HOURS_PER_DAY),hours_busy):busy[k]=True# Calculate successes if we enforce multi-day slots
# Split busy slots into days
days=[busy[i*HOURS_PER_DAY:i*HOURS_PER_DAY+HOURS_PER_DAY]foriinrange(DAYS)]days_free=[False]*DAYS# Determine if each day has a free slot or not
forkinrange(len(days)):d=days[k]# a day is free if it has two consecutive free hours
days_free[k]=any(notaandnotbfora,binzip(d,d[1:]))ifsum(days_free)>=2:successes_multiday+=1# Calculate successes if we allow same-day slots
count=0i=0whilei<len(busy)-1:ifnotbusy[i]andnotbusy[i+1]:count+=1ifcount==2:successes_sameday+=1break# skip ahead to avoid overlap
i+=2else:i+=1success_rates_multiday[hours_busy]=successes_multiday/SIMULATIONSsuccess_rates_sameday[hours_busy]=successes_sameday/SIMULATIONSwithopen("freeslots.csv","w",encoding="UTF8")asfp:fieldnames=["Hours busy per person",f"Success rate enforcing multi-day slots","Success rate allowing same day slots"]csvwriter=csv.writer(fp)csvwriter.writerow(fieldnames)foriinrange(21):csvwriter.writerow([i,success_rates_multiday[i],success_rates_sameday[i]])