Alibaba’s latest model, QwQ-32B-Preview, has gotten some impressive reviews for its reasoning abilities. Like OpenAI’s GPT-4 o1,1 its training has emphasized reasoning rather than just reproducing language. That seemed like something worth testing, or at least playing with, so when I heard that it quickly became available in Ollama and wasn’t too big to run on a moderately well-equipped laptop, I downloaded QwQ and tried it out. I also tried a few competing models: GPT-4 o1 and Gemma-2-27B. GPT-4 o1 was the first model to claim that it had been trained specifically for reasoning. Gemma, as far as I know, makes no such claim.
How do you test a reasoning model? It’s not as easy as just checking facts. I thought, “Maybe I could ask it to prove Fermat’s Last Theorem,” but I quickly realized that the only response I’d be able to evaluate was “I can’t prove this.” The actual proof runs well over a hundred pages of dense mathematical reasoning. But that isn’t a valid test.
What about Fermat’s Little Theorem? That’s much easier, but it’s a textbook example, so whatever the result, I’d never know whether QwQ reasoned correctly or just managed to parrot something from its training set. There are more than a few math textbooks online, and it’s fair to assume that all of them are in the training data.
So I tried a slightly more obscure example: prove that if a number consisting only of ones (like 11,111,111) is prime, then the number of digits (all 1s) must also be prime. I’ve managed to prove this, but it’s not a problem famous enough to attract textbook attention. So here we go. This is QwQ’s proof:
>>> Can you prove that if all the digits of a prime number are 1,
then the number of digits must be prime?
Let’s tackle this problem step by step. So the statement is: if all the digits of a prime number are 1, then the number of digits must be prime. Hmm, interesting. Let’s break this down.

First, let’s understand what we’re dealing with. We have a prime number where all its digits are 1. For example, numbers like 11, 111, 1111, etc., but only considering those that are prime. We need to prove that the number of digits in such a prime number must itself be prime.

Let me think about this. Let’s denote the number of digits by \( n \). So, the number consists of \( n \) ones. Such numbers are known as repunits. A repunit is a number consisting solely of ones in its decimal representation. For example, \( R_2 = 11 \), \( R_3 = 111 \), \( R_4 = 1111 \), and so on, where \( R_n \) denotes a repunit of length \( n \).

Our goal is to show that if \( R_n \) is prime, then \( n \) must be prime.

Let’s consider the contrapositive of this statement. The contrapositive of “if \( R_n \) is prime, then \( n \) is prime” is “if \( n \) is not prime, then \( R_n \) is not prime.” If we can prove the contrapositive, that will establish the original statement.
… And so on until …
**Final Answer**

\[ \boxed{\text{If } R_n \text{ is prime, then } n \text{ is prime.}} \]
I’ve spared you the (very) chatty proof; if you want to see it, pose the question to QwQ yourself. Suffice it to say that it was correct. Both GPT-4 o1 and Gemma-2-27B gave me the same proof, but with less chatter. Gemma skipped some steps that a human would probably consider obvious.2
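The contrapositive that QwQ relies on is easy to spot-check numerically. Here’s a minimal Python sketch of my own (not taken from any of the models’ output): for every composite \( n \), the repunit \( R_n \) is divisible by \( R_a \) for each proper divisor \( a \) of \( n \), so \( R_n \) can’t be prime.

```python
def repunit(n: int) -> int:
    """R_n: the number written as n ones, e.g. repunit(4) == 1111."""
    return (10**n - 1) // 9

# Contrapositive check: if n is composite, say n = a * b with 1 < a < n,
# then R_a divides R_n, so R_n has a nontrivial factor and is not prime.
for n in range(2, 50):
    for a in range(2, n):
        if n % a == 0:  # n is composite, with proper divisor a
            assert repunit(n) % repunit(a) == 0

print(repunit(6), "=", repunit(2), "*", repunit(6) // repunit(2))
# -> 111111 = 11 * 10101
```

Note that the converse doesn’t hold: \( R_3 = 111 = 3 \times 37 \) is composite even though 3 is prime. A prime number of digits is necessary, not sufficient.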
Next, can we take it a step further and generalize?
>>> does this proof hold for
bases other than base 10?
That’s an interesting question! Let’s explore whether this proof can be extended to bases other than base 10.

In base \( b \), a repunit of length \( n \) is defined as:

\[ R_n^{(b)} = \frac{b^n - 1}{b - 1} \]
…etc…
Again, try it yourself; and again, QwQ proved it correctly. This second proposition is much less likely to appear in textbooks or other resources; it’s less a proof than an observation that the original proof made no assumptions about the base.
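The same numerical check carries over to an arbitrary base. Another sketch of my own, using the \( R_n^{(b)} = (b^n - 1)/(b - 1) \) definition from the transcript; one pleasant special case is base 2, where repunits are the Mersenne numbers \( 2^n - 1 \), and the theorem reduces to the familiar fact that a Mersenne prime must have a prime exponent.

```python
def repunit_base(n: int, b: int) -> int:
    """R_n^(b): n ones in base b, i.e. (b**n - 1) // (b - 1)."""
    return (b**n - 1) // (b - 1)

# The divisibility argument is base-independent: for composite n with
# proper divisor a, R_a^(b) divides R_n^(b) in every base b >= 2.
for b in (2, 3, 8, 10, 16):
    for n in range(2, 40):
        for a in range(2, n):
            if n % a == 0:
                assert repunit_base(n, b) % repunit_base(a, b) == 0

# In base 2, repunits are Mersenne numbers: 2**n - 1.
assert repunit_base(5, 2) == 2**5 - 1 == 31
```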
When I asked GPT to prove the same theorem, I got a very similar (and correct) proof, stated more formally and with less colorful commentary. That isn’t particularly surprising, since GPT has also been trained to reason. I was more surprised to see that Gemma-2-27B also gave me a correct proof. Gemma has been trained on mathematical texts but not specifically on “reasoning.” (Perhaps Google’s marketing just never thought of calling this training “reasoning.”) Gemma omitted some of the steps: steps that a regular human would probably omit as obvious but that a mathematician would write out for completeness. (Just to make sure, I asked GPT to confirm that Gemma’s proof was correct. It agreed.)
Have we shown that training models to reason “works”? Well, we can’t claim to have proven anything on the basis of one successful trial, or, for that matter, on the basis of an extremely large number of trials. (In case you were wondering, Gemma-2-7B, an even smaller model, failed.)

But we have learned something very important. Think about the sizes of the models: OpenAI has said nothing about the size of GPT-4 o1, but it is rumored to have over a trillion parameters. QwQ weighs in at 32 billion parameters, and Gemma-2-27B at 27 billion. So QwQ and Gemma2 are almost two orders of magnitude smaller than GPT. Furthermore, GPT runs on what must be considered one of the world’s largest supercomputers. We don’t know its size, but we do know that OpenAI’s infrastructure is massive and includes a large percentage of the world’s high-end GPUs. QwQ and Gemma ran happily on my MacBook Pro. They spun the fan and sucked down the battery, but nothing extraordinary. Granted, GPT is serving thousands of users simultaneously, so it isn’t really a fair comparison. But it’s important to realize that GPT isn’t the only game in town and that models running locally can match GPT on nontrivial tasks. Most people who have experimented with running models locally have reached similar conclusions, but think about what this means. If you’re building an AI application, you don’t have to tie yourself to OpenAI. Smaller open models can do the job, and they’ll insulate you from OpenAI’s bills (and inevitable price increases), let you keep your data local, and leave you in control of your destiny.
What else can we learn? I’ve wondered how a language model could be trained for logic; my intuition said that would be a harder, more complex problem than training it for language. My intuition was wrong. I don’t know how these models were trained, but I now think that producing logic successfully is, in many ways, simpler than generating language. Why? QwQ’s verbosity gives us a big hint: “Let’s consider the contrapositive of this statement.” A contrapositive is simply a logical pattern: if A implies B, then not-B implies not-A. What other logical patterns can we think of? Syllogisms: if A implies B and B implies C, then A implies C. Proof by contradiction: to prove that A implies B, assume that A implies not-B and show that the assumption is false. Induction: show that if \( A(n) \) implies \( B(n) \), then \( A(n+1) \) implies \( B(n+1) \); then show that \( A(0) \) implies \( B(0) \).
It would be easy to come up with a much longer list of patterns. There are better notations for representing these patterns, but a longer list and better representations aren’t important here. What’s important is realizing that these are patterns, and that composing logical patterns into logical statements or proofs isn’t fundamentally different from composing words (or tokens) into sentences. Is pushing patterns around the essence of logic? That’s not a fair question: it’s logic if you do it correctly, illogic if you don’t. The logic isn’t in the patterns but in knowing how to assemble the patterns to solve problems, and the process of assembling patterns has to be the focus of training, looking at millions of examples of logical reasoning to model the way patterns are assembled into larger structures. Any of these logical patterns can lead you astray if you’re not careful; it’s easy to construct false syllogisms by starting with premises that are incorrect. I don’t expect logic to cure the problem of hallucination. But I suspect that training a model on logical patterns is a better way for the model to “learn” logic than simply training it on words (human utterances). That’s the bet that OpenAI, Alibaba, and possibly Google are making, and they seem to be winning.
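To make the “patterns compose like tokens” point concrete, here’s a toy sketch of my own (it illustrates the idea of composable patterns, not how any LLM actually works internally): represent an implication as a (premise, conclusion) pair, and the patterns named above become small functions that can be chained mechanically.

```python
# Toy illustration: implications as (premise, conclusion) pairs, and
# logical patterns as functions that compose them.

def syllogism(impl1, impl2):
    """From A -> B and B -> C, conclude A -> C."""
    a, b1 = impl1
    b2, c = impl2
    assert b1 == b2, "middle terms must match"
    return (a, c)

def contrapositive(impl):
    """From A -> B, conclude not-B -> not-A."""
    a, b = impl
    return (f"not {b}", f"not {a}")

# Chaining patterns the way QwQ chains proof steps:
step1 = syllogism(("n is composite", "R_n has a repunit factor"),
                  ("R_n has a repunit factor", "R_n is composite"))
print(contrapositive(step1))
# -> ('not R_n is composite', 'not n is composite')
```

The point is only that each pattern is a mechanical transformation; knowing which pattern to apply, and when, is where the actual reasoning lives.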
Can we go further? Are there other kinds of patterns that language models could be trained on? Yes. Generative AI has proven useful for generating code but hasn’t (yet) made significant inroads into software design. Could training models specifically on design patterns be a breakthrough?3 I don’t know, but I’d like to see someone try. A model specialized for software design would be worth having.
Could we do better with generative music if we trained models on the patterns analyzed in music theory, in addition to audio? Applications like Suno are a lot of fun, but when you get down to it, they’re just repeating the clichés of common musical styles. Would it help Suno to have some knowledge of music theory, knowledge of the patterns behind music in addition to the music itself? Would language models write better poetry if they were trained on the patterns found in poetic language (rhetorical devices, figurative speech) rather than just on words? One of my first experiments with generative AI was asking GPT-3 to write a Petrarchan sonnet, which has a different structure from the more common Shakespearean sonnet. GPT-3 and its contemporaries failed. It was a long time before I found a model that could do it successfully; although most models could define a Petrarchan sonnet, they could only generate Shakespearean sonnets. That generation of models was trained only on the words, not on the larger patterns.
Is this a way forward for AI? I don’t know, but I’d like to see AI researchers try. In the meantime, though, it’s enough to realize that small open models running on a laptop or a phone can perform as well as GPT on nontrivial tasks.
Footnotes
- I tested the preview, which has since been promoted to GPT-4 o1. I did not retest with the final o1, which has presumably had further training and optimization.
- I later decided that it was unfair to omit Claude. So I tried Claude’s small Haiku model (8B parameters) and got a correct proof. Of the group, it was the shortest and required the most thought to see that it was correct.
- Design patterns are usually associated with object-oriented design, but the concept is really more general. Design patterns attempt to name solutions to problems that you see every day; naming the solution lets you talk about it. That definition applies to any discipline, including functional programming and (of course) architecture.
#Generative #Logic #OReilly