Qwen2.5-Coder 32B Instruct: One of the Best Coding Models - A Complete Step-by-Step Guide and Performance Evaluation for Developers


Introduction

In the ever-evolving landscape of AI-powered programming tools, large language models (LLMs) have dramatically transformed the way developers write, debug, and optimize code. Today, we are thrilled to explore the Qwen2.5-Coder series, an open-source marvel that promises to set new standards in the realm of code generation and AI coding assistants. The latest release in this family, Qwen2.5-Coder-32B-Instruct, has redefined the state-of-the-art (SOTA) performance in open-source coding models, rivaling the capabilities of established models like GPT-4o. Let’s dive deeper into what makes Qwen2.5-Coder so “Powerful”, “Diverse”, and “Practical”.

In this comprehensive guide, we’ll explore the core capabilities of the Qwen2.5-Coder-32B model. We’ll demonstrate how to use it with the transformers library, test its coding abilities and highlight its practical applications.

Why Qwen2.5-Coder?

Key Highlights

  1. Powerful: The flagship Qwen2.5-Coder-32B model matches the coding prowess of GPT-4o on major coding benchmarks, while also excelling in general and mathematical skills.
  2. Diverse: The release covers multiple model sizes (0.5B, 1.5B, 3B, 7B, 14B, 32B), offering flexibility for different resource constraints.
  3. Practical: Designed for real-world applications, including code assistants and artifact generation. The models are licensed under Apache 2.0, ensuring freedom to use and modify for both commercial and research purposes.

The Qwen2.5-Coder Series: A Game-Changer for Open Code LLMs

The Qwen2.5-Coder series is dedicated to pushing the boundaries of open-source code generation. With a focus on flexibility and scalability, this release includes models in a range of sizes: 0.5B, 1.5B, 3B, 7B, 14B, and the flagship 32B version. These models cater to various developer needs, from lightweight, resource-efficient models to high-capacity, feature-rich models suitable for demanding applications.

1. Powerful: Setting New Benchmarks in Code Generation

Qwen2.5-Coder-32B-Instruct stands as the flagship model, boasting a range of capabilities that have earned it the title of the current SOTA open-source code model. It excels in:

  • Code Generation: On popular benchmarks like EvalPlus, LiveCodeBench, and BigCodeBench, it matches the performance of GPT-4o, offering precise code generation across a multitude of scenarios.
  • Code Repair: Fixing broken or inefficient code is crucial in software development. On the Aider benchmark, which tests code repair skills, Qwen2.5-Coder-32B-Instruct scored an impressive 73.7, rivaling GPT-4o’s prowess.
  • Code Reasoning: The ability to understand and reason through code execution paths is critical for debugging and optimizing complex software. This model’s capabilities extend beyond mere generation — it excels at predicting inputs and outputs, making it an invaluable tool for software engineers.

Source: (https://qwenlm.github.io/blog/qwen2.5-coder-family/)

2. Diverse: Supporting Multiple Programming Languages and Rich Model Sizes

The versatility of Qwen2.5-Coder is evident in its support for over 40 programming languages, including niche languages like Haskell and Racket. This broad support is backed by meticulous data cleaning and balanced training, ensuring the model performs optimally across different coding environments.

  • Multi-language Code Repair: Its proficiency extends to code repair in unfamiliar languages, which can significantly reduce the learning curve for developers exploring new technologies.
  • Model Size Flexibility: The Qwen2.5-Coder series offers models across six sizes, ensuring developers with varying resource constraints can find a model suited to their needs. The scaling law philosophy that underpins these models means that performance correlates positively with model size, giving developers the flexibility to choose the right balance between performance and computational resources.

Performance Insights: Evaluating Qwen2.5-Coder Models

1. Instruct vs. Base Models

Qwen2.5-Coder is available in both Base and Instruct versions:

  • Base models are designed for developers who want a raw model to fine-tune for their specific applications.
  • Instruct models come pre-aligned and are optimized for interactive and conversational use cases, making them ideal for chat-based code assistants (a short sketch contrasting the two follows below).
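
To make this distinction concrete, here is a minimal sketch of how the two flavors are typically driven (the 7B checkpoints are used purely as an example, and the Base repo name is assumed to follow the series’ naming on Hugging Face). A Base checkpoint simply continues raw code, while an Instruct checkpoint expects a chat-formatted prompt, which is exactly what the hands-on guide below builds:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Base checkpoint: plain code completion, no chat formatting
base_name = "Qwen/Qwen2.5-Coder-7B"  # assumed repo name for the 7B Base model
base_tokenizer = AutoTokenizer.from_pretrained(base_name)
base_model = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype="auto", device_map="auto")

inputs = base_tokenizer("def fibonacci(n):", return_tensors="pt").to(base_model.device)
output_ids = base_model.generate(**inputs, max_new_tokens=64)
print(base_tokenizer.decode(output_ids[0], skip_special_tokens=True))  # continues the function body

# Instruct checkpoints (e.g. "Qwen/Qwen2.5-Coder-7B-Instruct") are instead prompted
# through tokenizer.apply_chat_template(...), as shown in the step-by-step guide below.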

2. Benchmark Comparisons: Leading the Pack

Across various core benchmarks:

  • MBPP-3shot was chosen to evaluate Base models, providing a robust metric to gauge their code comprehension and synthesis capabilities.
  • The LiveCodeBench question set was used to evaluate Instruct models, focusing on their adaptability to new and unseen coding problems.

The results? Qwen2.5-Coder consistently outperforms other open-source models, proving that scaling up to larger sizes indeed correlates with better performance.

Hands-On Guide: Using Qwen2.5-Coder-32B-Instruct for Code Generation with Transformers

In this hands-on tutorial, we will demonstrate how to use the Qwen2.5-Coder-32B-Instruct model with the transformers library to generate code. This model is part of the Qwen2.5-Coder series, which is designed to excel in code generation, repair, and reasoning. By the end of this tutorial, you’ll see how to integrate this powerful open-source model into your own projects for a range of code-related tasks.

Prerequisites

Before diving into the code, make sure you have the following installed:

pip install torch transformers

Additionally, ensure you have access to a GPU-enabled environment if you want to leverage the model’s performance optimally.
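
Note: the device_map="auto" option used in Step 2 relies on Hugging Face’s accelerate package to place the model weights on your available hardware. If it isn’t already in your environment, install it as well:

pip install accelerate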

Step 1: Import Required Libraries

We’ll start by importing the necessary components from the transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

Step 2: Load the Model and Tokenizer

In this step, we load the Qwen2.5-Coder-32B-Instruct model and its corresponding tokenizer. The device_map="auto" option will automatically allocate the model to your available GPU or CPU.

Qwen2.5-Coder has released models in various sizes — 0.5B-Instruct, 1.5B-Instruct, 3B-Instruct, 7B-Instruct, 14B-Instruct, and 32B-Instruct — on Hugging Face. If you want to run them locally, choose the model that best fits your GPU capacity. These models are also available on Ollama, so you can use them within the Ollama environment. If you’re interested in an Ollama tutorial, feel free to let me know in the comments!
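
If you’d rather try the model outside of Python first, the same checkpoints can be pulled and run directly with Ollama; the tag for the flagship model (linked at the end of this post) is qwen2.5-coder:32b:

ollama run qwen2.5-coder:32b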

model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
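
If the 32B weights don’t fit in your GPU memory, you can point model_name at one of the smaller checkpoints from the same series and keep the rest of the code unchanged. For example (the smaller Instruct repos follow the same naming pattern on Hugging Face):

model_name = "Qwen/Qwen2.5-Coder-7B-Instruct"  # smaller checkpoint for limited VRAM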

Step 3: Writing a Chat Template Function

The Qwen2.5-Coder model is designed to handle conversation-like prompts using a chat template. The following helper function sets up the input prompt in a way that aligns with the model’s expectations:

def generate_response(model, tokenizer, prompt):
    messages = [
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
    # Prepare the chat input
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    # Tokenize and prepare inputs
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    # Generate response
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=512
    )
    # Remove prompt tokens from output
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
  
    # Decode and return the generated text
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

Step 4: Testing the Model with Code Generation

Let’s run some examples to see how well the Qwen2.5-Coder-32B-Instruct model performs in generating Python and Java code. We will test it with five different programming prompts.

Test 1: Quick Sort Algorithm

Prompt: “Write a quick sort algorithm.”

prompt_1 = "write a quick sort algorithm."
response_1 = generate_response(model, tokenizer, prompt_1)
print(response_1)

Output:

Certainly! Here’s a simple implementation of the Quick Sort algorithm in Python:

def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

# Example usage:
arr = [3, 6, 8, 10, 1, 2, 1]
print(quick_sort(arr))

This code defines a quick_sort function that takes an array as input and returns a sorted version of the array using the Quick Sort algorithm. The function works by selecting a pivot element from the array and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. The sub-arrays are then recursively sorted. The base case for the recursion is when the array has one or no elements, in which case it is already sorted.

You can use this code as a starting point and modify it according to your specific needs. Let me know if you have any further questions or if you need help with anything else!<|fim_middle|>

Test 2: Factorial of a Number

Prompt: “Write a Python program to find a factorial of a number.”

prompt_2 = "Write a python program to find a factorial of a number."
response_2 = generate_response(model, tokenizer, prompt_2)
print(response_2)

Output:

Certainly! Here’s a Python program that calculates the factorial of a number:

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

# Example usage
number = int(input("Enter a number: "))
result = factorial(number)
print(f"The factorial of {number} is {result}")

This program defines a recursive function factorial that calculates the factorial of a given number n. It uses the base case n == 0 to return 1, and for other values of n, it recursively calls itself with n - 1 until it reaches the base case. Finally, the program prompts the user to enter a number and displays the factorial of that number.

Test 3: Formatting Numbers with Commas

Prompt: “Write a Python function named ‘format_number’ that takes a non-negative number as its only parameter. Your function should convert the number to a string and add commas as a thousand separators.”

prompt_3 = """Write a python function named 'format_number' that takes a non-negative number as its only parameter. Your function should convert the number to a string and add commas as a thousand separators. For example, calling format_number(1000000) should return '1,000,000'."""
response_3 = generate_response(model, tokenizer, prompt_3)
print(response_3)

Output:

Certainly! Here’s a Python function that accomplishes this task:

def format_number(num):
    # Convert the number to a string
    num_str = str(num)
  
    # Initialize an empty string to store the formatted number
    formatted_num = ''
  
    # Initialize a counter for the number of digits
    digit_count = 0
  
    # Iterate over the digits in reverse order
    for i in range(len(num_str) - 1, -1, -1):
        # Add the current digit to the formatted number
        formatted_num = num_str[i] + formatted_num
      
        # Increment the digit count
        digit_count += 1
      
        # If the digit count is a multiple of 3, add a comma
        if digit_count % 3 == 0 and i != 0:
            formatted_num = ',' + formatted_num
  
    return formatted_num

You can test the function with the following code:

print(format_number(1000000))  # Output: '1,000,000'
print(format_number(123456789))  # Output: '123,456,789'
print(format_number(1234567890))  # Output: '1,234,567,890'

This function works by iterating over the digits of the number in reverse order, adding each digit to the formatted number and adding a comma every third digit. The function then returns the formatted number as a string.

Test 4: Implementing a Stack Using Arrays in Java

Prompt: “Write a Java program, Implement a Stack Using Arrays.”

prompt_4 = """Write a Java program, Implement a Stack Using Arrays."""
response_4 = generate_response(model, tokenizer, prompt_4)
print(response_4)

Output:

Certainly! Here’s a simple Java program that implements a stack using arrays:

public class StackUsingArrays {
    private int[] stackArray;
    private int top;
    private int capacity;

    public StackUsingArrays(int size) {
        stackArray = new int[size];
        top = -1;
        capacity = size;
    }

    public void push(int item) {
        if (isFull()) {
            System.out.println("Stack is full. Cannot push item: " + item);
        } else {
            top++;
            stackArray[top] = item;
            System.out.println("Pushed item: " + item);
        }
    }

    public int pop() {
        if (isEmpty()) {
            System.out.println("Stack is empty. Cannot pop item.");
            return -1;
        } else {
            int item = stackArray[top];
            top--;
            return item;
        }
    }

    public boolean isEmpty() {
        return top == -1;
    }

    public boolean isFull() {
        return top == capacity - 1;
    }

    public static void main(String[] args) {
        StackUsingArrays stack = new StackUsingArrays(5);

        stack.push(10);
        stack.push(20);
        stack.push(30);

        System.out.println("Popped item: " + stack.pop());
        System.out.println("Popped item: " + stack.pop());

        stack.push(40);
        stack.push(50);

        System.out.println("Popped item: " + stack.pop());
        System.out.println("Popped item: " + stack.pop());
        System.out.println("Popped item: " + stack.pop());
    }
}

This program defines a StackUsingArrays class that uses an array to implement a stack. The push method adds an item to the top of the stack, and the pop method removes and returns the item at the top of the stack. The isEmpty and isFull methods check if the stack is empty or full, respectively. The main method demonstrates how to use the stack by pushing and popping items.

Test 5: Simple Classification Algorithm

Prompt: “Write a Python code to run a simple classification algorithm using Sklearn”

prompt_5 = """Write a Python code to run a simple classification algorithm using Sklearn"""
response_5 = generate_response(model, tokenizer, prompt_5)
print(response_5)

Output:

Certainly! Here’s a simple example of a classification algorithm using the sklearn library in Python:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Generate some sample data
np.random.seed(42)
X = np.random.rand(100, 2)  # 100 samples, 2 features
y = np.random.choice([0, 1], size=100)  # Binary classification

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a logistic regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

This code generates some sample data, splits it into training and testing sets, trains a logistic regression model, makes predictions on the test set, and calculates the accuracy of the model. Feel free to modify the code to suit your specific needs.

More details:

[Qwen/Qwen2.5-Coder-32B-Instruct · Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)

[qwen2.5-coder:32b · Ollama](https://ollama.com/library/qwen2.5-coder:32b)

Step 5: Analyzing the Model’s Performance

From our tests, the Qwen2.5-Coder-32B-Instruct model demonstrates:

  • Strong code generation capabilities, producing efficient, human-readable solutions for classic coding problems.
  • Understanding of Python syntax and best practices, especially when it comes to using Pythonic solutions like list comprehensions and formatted strings.
  • Flexibility in adapting to a variety of prompts, which is essential for real-world programming assistant use cases.

Potential Use Cases

Given its performance, the Qwen2.5-Coder model can be effectively used in various scenarios, such as:

  • Coding assistants: Integration into IDEs or text editors to help developers write code faster.
  • Automated code reviews: Assisting in identifying bugs, optimizing code, and suggesting improvements.
  • Educational tools: Helping students learn to code by generating example solutions and explanations.

Conclusion

The Qwen2.5-Coder series, particularly the 32B model, offers a powerful and versatile tool for developers, researchers, and organizations looking to leverage AI for code-related tasks. Its strong performance on benchmarks like EvalPlus, Aider, and McEval proves its competitive edge in code generation, repair, and reasoning.

By open-sourcing these models, Alibaba Cloud is paving the way for a future where AI-powered coding assistants are accessible to everyone. Whether you’re a developer looking to automate repetitive tasks or a student aiming to learn new programming concepts, Qwen2.5-Coder is a reliable tool to add to your arsenal.
