Adventures coding with AI

benwiggy

Ars Scholae Palatinae
1,046
I've recently been using ChatGPT's "Code Copilot" to produce Swift scripts, and it's worked rather well.

I started off by just asking for a simple function, largely as a means to get a working example of some of Apple's APIs; but by gradually building it up, in the manner of a teacher ("Now let's add this to what we've already done"), I was able to create a really useful command-line utility, without a single line of my own code.

The generated code wasn't always free of errors, but when I fed the error messages back, it was always able to work out what was wrong and what needed to be changed. You do have to check that its logic is what you wanted -- there was one point where a completion handler was always returning true, no matter the result. (But just by telling it "it's reporting success here when it shouldn't", it was able to fix the problem correctly.)

All in all, I found the experience really enjoyable, instructing my virtual apprentice to produce work for me to inspect. Certainly much more enjoyable, less frustrating -- and faster -- than trawling through Apple's documentation and trying to knock something up myself.

Of course, this is exactly where AI ought to excel -- a large amount of data, easily absorbed, with logical rules to its application. I've also given it some code that I've written, which it improved and corrected. I'm looking forward to the planned addition of AI features to Xcode.

There's another thread somewhere asking what the perfect programming language would be. For me, it would be "Computer, run a simulation to improve warp core efficiency."
 
Despite the name "artificial intelligence", the currently hyped crop of large language models is not intelligent at all. Instead, it is a very large heap of statistical information about language, i.e. a set of numbers detailing correlations between words (or word-lets). The specific correlations have been "measured" by observing a huge body of existing text, and effectively counting occurrences.

The LLMs are indeed remarkable in that they are using clever tricks to bring this intractable problem down to manageable sizes (both in terms of storage and computation). In a very real sense, the LLMs are compressing and interpolating a ridiculously large set of input data, and they can decompress it with affordable compute power in more or less real time.

Nonetheless, the only "understanding" that LLMs can have is in terms of correlations between words. That is, some terms and phrases are so-and-so likely to occur at such-and-such a distance in a text. LLMs are great for remixing commonly used text fragments, or for reproducing commonly used structured text types. They are probably pretty good as a convenient interface to widely known encyclopedic knowledge.

LLMs contain no mechanism whatsoever to judge whether a creative combination of words is a new insight (as in "not present in the training data") or a "hallucination" with no use or value. In fact, the term "hallucination" is a fairly euphemistic way to admit that the LLM has no clue what it is talking about.


So why does programming with LLMs work relatively well? Because programming languages are much more strictly formalized, and are used within a much narrower scope. And most programs consist mostly of "boilerplate code", i.e. very repetitive chores like interfacing with widely used OSes, APIs, or services. The correlations here are very strong and clear. Only a very small percentage of program code implements the actually new and interesting features; an even smaller percentage reflects actual ingenuity ...

By and large, software engineering is more craft than art. Not inherently, but there is a lot more money to be made with boring software than with programming as an art form. Those boundary conditions impose their very own set of correlations. :)
 

cburn11

Ars Praetorian
443
Subscriptor
I played with the gpt-3.5-turbo and gpt-4-turbo models through OpenAI's REST API a few weeks ago. "Played" meaning I had no idea how to efficiently ask it to write code. But it's impressive that you can prompt it in a colloquial manner with "open a file passed on the command line," or "rewrite the open function to use istreambuf_iterator instead of read," and both models return actual code. Or ask it "is that an expensive operation," and it returns an answer that "understands" that expensive in this context means "avoid unnecessary mallocs."
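
For anyone who hasn't seen the two idioms side by side, here's a rough sketch of the kind of rewrite that prompt describes. This is my own illustration, not the model's output, and the function names are made up:

Code:
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Reading a whole file with an explicit read() call.
std::string read_file_with_read(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    in.seekg(0, std::ios::end);
    std::string data(static_cast<std::size_t>(in.tellg()), '\0');
    in.seekg(0);
    in.read(&data[0], static_cast<std::streamsize>(data.size()));
    return data;
}

// The same function rewritten around istreambuf_iterator, which is
// roughly the transformation the prompt asks for.
std::string read_file_with_iterator(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

The iterator version trades away the explicit size bookkeeping, which is presumably the appeal of asking for the rewrite in the first place.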

I incrementally asked both models to read a file into memory and then send that memory buffer as POST data to a specific URL using libsoup. gpt-3.5 mixed code from versions 2 and 3 of the library. After several attempts, I gave up trying to untangle it. gpt-4 initially mixed calls from the two library versions, but with a little prompting it got it right.
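
For context, the task roughly amounts to something like the sketch below. This is not what either model produced, just my own outline written against the libsoup 3 API as I understand it; libsoup 2 attached the body with soup_message_set_request() instead, which is where the mixed-version output gets confusing. Treat the details as illustrative rather than checked against the current docs:

Code:
// Build with something like: g++ post_file.cpp $(pkg-config --cflags --libs libsoup-3.0)
#include <fstream>
#include <iterator>
#include <string>
#include <libsoup/soup.h>

int main(int argc, char **argv)
{
    if (argc < 3) {
        g_printerr("usage: %s FILE URL\n", argv[0]);
        return 1;
    }

    // Read the whole file into memory.
    std::ifstream in(argv[1], std::ios::binary);
    std::string body((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());

    // Build the POST request; libsoup 2 would use soup_message_set_request() here.
    SoupSession *session = soup_session_new();
    SoupMessage *msg = soup_message_new("POST", argv[2]);
    if (!msg) {
        g_printerr("could not parse URL: %s\n", argv[2]);
        g_object_unref(session);
        return 1;
    }
    GBytes *bytes = g_bytes_new(body.data(), body.size());
    soup_message_set_request_body_from_bytes(msg, "application/octet-stream", bytes);

    // Send synchronously and report the HTTP status.
    GError *error = nullptr;
    GBytes *response = soup_session_send_and_read(session, msg, nullptr, &error);
    int rc = 0;
    if (error) {
        g_printerr("request failed: %s\n", error->message);
        g_error_free(error);
        rc = 1;
    } else {
        g_print("status: %u\n", (unsigned) soup_message_get_status(msg));
        g_bytes_unref(response);
    }

    g_bytes_unref(bytes);
    g_object_unref(msg);
    g_object_unref(session);
    return rc;
}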

The 3.5 conversation didn't cost enough to register on the billing statement. It cost 45 cents for the GPT-4 model to successfully write the moral equivalent of curl -d @filepath http://example.com/cgi-bin/upload

While it initially impressed me, I suspect the end result is that I have $9.55 in credits at OpenAI that will go unused for the foreseeable future.
 