I use AI to make GTK shell widgets for my Linux rice. It’s definitely not as good as an experienced ricer, but it can produce good boilerplate. At the end I have to troubleshoot multiple logic errors, but it’s better than writing all that spaghetti myself.
Other than that, the only use cases I find for AI in coding are cross-checking my code and generating tests for me. Even that is very rare.
My justification: I use AI because I don’t want to write 1000-5000 (combined) lines of code for a simple dock widget that can do a couple of custom actions I use. Also, the shell (ignis, ags) I use today is all but guaranteed to become the old thing very quickly, so I don’t like spending much time on it.
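To give a sense of the kind of boilerplate involved: a minimal sketch, in plain Python, of the custom-action plumbing a dock widget needs before any GTK or shell (ignis, ags) code even enters the picture. All names here (`DockActions`, the example commands) are invented for illustration and are not part of any shell’s API.

```python
# Hypothetical sketch: a config-driven action map for a dock widget,
# kept independent of any particular shell framework. The widget layer
# would only ever call trigger(); everything else stays declarative.
import subprocess
from typing import Callable, Dict, List


class DockActions:
    """Maps button names to shell commands for a dock widget."""

    def __init__(self) -> None:
        self._actions: Dict[str, Callable[[], None]] = {}

    def register(self, name: str, command: List[str]) -> None:
        # Late-bind the command; spawning is deferred until trigger().
        self._actions[name] = lambda: subprocess.Popen(command)

    def trigger(self, name: str) -> bool:
        action = self._actions.get(name)
        if action is None:
            return False  # unknown button: ignore rather than crash
        action()
        return True


dock = DockActions()
dock.register("terminal", ["kitty"])       # example commands only
dock.register("browser", ["firefox"])
```

Even a sketch like this balloons once you add icons, hover states, and config reloading, which is exactly the tedious part being delegated to the LLM.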
I don’t know what to think anymore. I guess I’ll have to ask ChatGPT what to do.
What caught my attention is that assessments of AI are becoming polarized and somewhat a matter of belief.
Some people firmly believe LLMs are helpful. But programming is a logical task and LLMs can’t think - only generate statistically plausible patterns.
The author of the article explains that this creates the same psychological hazards as astrology or tarot cards: psychological traps that psychics have exploited for centuries, and even very intelligent people can fall prey to them.
Finally, what should cause alarm is that, on top of the fact that LLMs can’t think while people behave as if they do, there is no objective, scientifically sound examination of whether AI models help create working software faster. Given the multi-billion-dollar investments, and that there has been more than enough time to carry out controlled experiments, this should raise loud alarm bells.
I believe we can educate people about the truths of AI, but I am scared to trust corporations or governments with it.
I can recommend the book AI Snake Oil by Arvind Narayanan and Sayash Kapoor, which is written for laymen and might help readers understand the limitations of AI.
The problem, though, with responding to blog posts like that, as I did here (unfortunately), is that they aren’t made to debate or arrive at a truth, but to reinforce belief. The author is simultaneously putting himself on the record as having hardline opinions and putting himself in the position of having to defend them. Both are very effective at reinforcing those beliefs.
A very useful question to ask yourself when reading anything (fiction, non-fiction, blogs, books, whatever) is “what does the author want to believe is true?”
Because a lot of writing is just as much about the author convincing themselves as it is about them addressing the reader. …
There is no winning in a debate with somebody who is deliberately not paying attention.
This is all also a great argument against the many articles claiming that LLMs are useless for coding, in which the authors all seem to have a very strong bias. I can agree that it’s a very good idea to distrust what people are saying about how programming should be done, including mistrusting claims about how AI can and should be used for it.
We need science #
Our only recourse as a field is the same as with naturopathy: scientific studies by impartial researchers. That takes time, which means we have a responsibility to hold off as research plays out
This, on the other hand, is pure bullshit. Writing code is itself a process of scientific exploration: you think about what will happen, and then you test it, from different angles, to confirm or falsify your assumptions. The author seems to be saying that both evaluating the correctness of LLM output and the use of TypeScript are comparable to falling for homeopathy by misattributing the cause of recovering from an illness. The idea that programmers should not use their own judgment or do their own experimentation, that they have no way of telling whether code works or is good, seems to me like a wholesale rejection of programming as a craft. If someone avoids self-experimentation as suggested, I don’t know how they can even say that programming is something they do.
Writing code is itself a process of scientific exploration; you think about what will happen, and then you test it, from different angles, to confirm or falsify your assumptions.
What you confuse here is doing something that can benefit from applying logical thinking with doing science. For example, arithmetic is part of math, and math is a science. But summing numbers is not necessarily doing science. And if you roll, say, octal dice to see if the result happens to match an addition task, that is certainly not doing science, and no, the dice still can’t think logically and certainly don’t do math, even if the result sometimes happens to be correct.
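The dice analogy can be made concrete with a small, seeded simulation (my own illustration: an “octal die” taken as uniform over 0-7, answering single-digit additions). The die is sometimes right, but the process involves no arithmetic, so its occasional correctness says nothing about its ability to add:

```python
# An octal die (faces 0-7) "answering" addition tasks. It hits the
# right sum occasionally, purely by chance, without doing any math.
import random

random.seed(0)  # fixed seed so the run is reproducible

# All sums a+b with a, b in 0..3 fit on the die's faces (max 6).
tasks = [(a, b) for a in range(4) for b in range(4)]
hits = sum(1 for a, b in tasks if random.randrange(8) == a + b)

accuracy = hits / len(tasks)
# Far below what any process that actually adds would achieve (~1.0);
# a uniform die lands on the right face about 1 time in 8.
assert accuracy < 0.5
print(f"dice 'solved' {hits}/{len(tasks)} additions by luck")
```

The point being illustrated: an output being correct now and then is not evidence that the process producing it performs the computation.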
For the dynamic vs static typing debate, see the article by Dan Luu:
https://danluu.com/empirical-pl/
But this is not the central point of the above blog post. The central point is that, by the very nature of LLMs producing statistically plausible output, self-experimenting with them subjects one to very strong psychological biases because of the Barnum effect, and therefore it is, first, not even possible to assess their usefulness for programming by self-experimentation(!), and second, it is even harmful, because these effects lead to self-reinforcing and harmful beliefs.
And the quibbling about what “thinking” means just shows that the pro-AI arguments have degraded into a debate about belief. The argument has become “but it seems to be thinking to me”, even though it is technically not possible, and not observed in reality, that LLMs apply logical rules: they cannot derive logical facts, cannot explain their output by reasoning, are not aware of what they ‘know’ and don’t ‘know’, and cannot optimize decisions against multiple complex and sometimes contradictory objectives (which is absolutely critical to any sane software architecture).
What would be needed here are objective, controlled experiments on whether developers equipped with LLMs can produce working and maintainable code any faster than ones not using them.
And the very likely result is that the code they produce using LLMs is never better than the code they write themselves.
What you confuse here is doing something that can benefit from applying logical thinking with doing science.
I’m not confusing that. Effective programming requires, and consists of, small-scale application of the scientific method to the systems you work with.
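A toy illustration of that predict-then-test loop, in Python (my own example; the prediction concerns a documented property of Python’s built-in sort, namely stability):

```python
# Hypothesis: Python's sort is stable, so records comparing equal
# keep their original relative order. State the prediction, then
# probe it from two angles before relying on it.
records = [("b", 2), ("a", 1), ("c", 2), ("d", 1)]

by_value = sorted(records, key=lambda r: r[1])

# Angle 1: among equal keys, input order is preserved
# ("a" before "d", "b" before "c").
assert by_value == [("a", 1), ("d", 1), ("b", 2), ("c", 2)]

# Angle 2: two successive sorts by different keys compose predictably,
# which only holds if the sort is stable.
by_name_then_value = sorted(sorted(records), key=lambda r: r[1])
assert by_name_then_value == by_value
```

Each assertion is a small falsifiable prediction about the system; a failure would refute the hypothesis rather than confirm a belief.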
the argument has become “but it seems to be thinking to me”
I wasn’t making that argument, so I don’t know what you’re getting at with this. For the purposes of this discussion, I think it doesn’t matter at all how the code was written or whether what wrote it is truly intelligent. The important thing is the code that is the end result: whether it does what it is intended to do and nothing harmful, and whether the programmer working with it can accurately determine that it does what it is intended to.
The central point of it is that, by the very nature of LLMs producing statistically plausible output, self-experimenting with them subjects one to very strong psychological biases because of the Barnum effect, and therefore it is, first, not even possible to assess their usefulness for programming by self-experimentation(!), and second, it is even harmful, because these effects lead to self-reinforcing and harmful beliefs.
I feel like “not even possible to assess their usefulness for programming by self-experimentation(!)” is necessarily a claim that reading and testing code is something no one can do, which is absurd. If the output is often correct, then the means of creating it is likely useful, and you can tell whether the output is correct by evaluating it the same way you evaluate any computer program, without needing to directly evaluate the LLM itself. It should be obvious that this is possible. Saying not to do it seems kind of like some “don’t look up” stuff.
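To make that concrete: a sketch of judging a function purely by its observable behavior, regardless of who or what wrote it. The `slugify` function here is an invented stand-in for any piece of generated code under review; the black-box checks are the part that matters.

```python
# Evaluating code by behavior alone. Imagine slugify() arrived from
# an LLM; the acceptance checks below are valid evidence either way.
import re


def slugify(title: str) -> str:
    # Collapse every run of non-alphanumerics into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


# Black-box acceptance checks: input/output pairs, no knowledge of
# the code's origin required.
cases = {
    "Hello, World!": "hello-world",
    "  spaces  ": "spaces",
    "already-a-slug": "already-a-slug",
    "": "",
}
for given, expected in cases.items():
    assert slugify(given) == expected, (given, slugify(given))
```

Nothing in this process evaluates the LLM itself; it evaluates the artifact, which is exactly what programmers already do with each other’s code.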
Are you saying that it is not possible to use scientific methods to systematically and objectively compare programming tools and methods?
Of course it is possible, in the same way that it can be investigated which methods are most effective for teaching reading, or whether brushing teeth helps prevent caries.
And the latter has been done, for example, to compare statically vs. dynamically typed languages. Only, the result so far is that there is no conclusive advantage.
Are you saying that it is not possible to use scientific methods to systematically and objectively compare programming tools and methods?
No, I’m saying the opposite, and I’m offended at what the author seems to be suggesting: that this should only be attempted by academics, and that programmers should defer to them and refrain from attempting it to inform their own work and their choice of tools. That is an absolutely insane idea, given that systematic evaluation and seeking greater objectivity are at the core of what programmers do. A programmer should obviously use their experience writing and testing both typing disciplines to decide which is right for their project; they should not assume they are incapable of objective judgment and defer their thinking to computer science researchers who don’t directly deal with the same things they do and aren’t considering the same questions.
This was given as an example of someone falling for manipulative trickery:
A recent example was an experiment by a CloudFlare engineer at using an “AI agent” to build an auth library from scratch.
From the project repository page:
I was an AI skeptic. I thought LLMs were glorified Markov chain generators that didn’t actually understand code and couldn’t produce anything novel. I started this project on a lark, fully expecting the AI to produce terrible code for me to laugh at. And then, uh… the code actually looked pretty good. Not perfect, but I just told the AI to fix things, and it did. I was shocked.
But understanding and testing code is not (necessarily) guesswork. There is no reason to assume this person is incapable of it, and no reason to justify the idea that it should never be attempted by ordinary programmers when that is the main task of programming.