Reading Document and Coding Functions with Multi-Agent AI Systems using Magentic-One
- Rifx.Online
- Programming , Technology , Machine Learning
- 26 Nov, 2024
Magentic-One is designed to streamline complex tasks by leveraging multiple AI agents, each with specialized capabilities. One of my previous posts also introduces Magentic-One. Recently, I embarked on a journey to develop a mobile app (named “MotionLab”) capable of connecting to BLE sensors, such as motion sensors, or of using the device’s built-in motion sensors, like accelerometers.
However, during development I encountered a challenge: decoding the data received from a BLE sensor subscribed to a characteristic with a specific UUID, as shown in the following screenshot. This is where Magentic-One, a multi-agent AI system, came to my rescue.
Set up the environment for Magentic-One. The model is ‘gpt-4o-2024-08-06’.
export CHAT_COMPLETION_PROVIDER='openai'
export OPENAI_API_KEY="your_api_key_here"
Then just use example.py to execute. This time, I’ve included a feature to save screenshots, allowing us to observe some of the behind-the-scenes processes.
python3 examples/example.py --logs_dir ./my_logs --save_screenshot
Here is one prompt that I used; I also tried some other prompts.
As a software developer for a mobile app connected to a BLE motion wearable sensor,
you can access the BLE sensor, subscribe to a characteristic with
UUID 0x0000FFE4-0000-1000-8000-00805F9A34FB, and receive a list of
twenty integer values when the sensor is in motion,
here is one example [85, 97, 119, 2, 168, 254, 146, 254, 6, 2, 48, 254, 205, 255, 248, 240, 83, 252, 171, 196].
These values may represent acceleration (x, y, z) at some points
by decoding packet header, flag. To understand and interpret these values
correctly, you need to decode them based on the Bluetooth 5.0 communication
protocol of the wearable sensor.
The protocol documentation can be found at https://wit-motion.gitbook.io/witmotion-sdk/ble-5.0-protocol/bluetooth-5.0-communication-protocol .
Can you assist in decoding these values and translating them into meaningful data for the app?
Then the UserProxy agent receives the user’s prompt input.
The Orchestrator gathers the information and drafts a plan.
The Orchestrator makes a plan decision.
The Orchestrator processes the final plan.
The WebSurfer agent is tasked with accessing the provided website to retrieve document information. It captures screenshots and utilizes OCR methods, leveraging large multimodal models, to extract the necessary details.
The Orchestrator reflects on the response, then decides what to focus on next: whether to retrieve the specific data-interpretation section relevant to the BLE data.
The Orchestrator then makes a plan that asks the Coder to write a Python script to decode the BLE data, based on the BLE communication protocol retrieved from the website.
The Coder writes a Python function.
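For illustration, here is a minimal sketch of the kind of decoder the Coder agent produced. It follows my reading of the linked WitMotion BLE 5.0 protocol document: a default data packet starts with a 0x55 header and a 0x61 flag byte, followed by nine little-endian int16 values for acceleration, angular velocity, and angle. The scale factors (±16 g, ±2000 °/s, ±180°) are assumptions taken from that document, so verify them against your sensor’s configuration.

```python
import struct

def decode_witmotion_packet(packet):
    """Decode a 20-byte WitMotion BLE 5.0 default data packet.

    Layout (per the linked protocol doc): 0x55 header, 0x61 flag,
    then nine little-endian int16 values: acceleration (x, y, z),
    angular velocity (x, y, z), and angle (roll, pitch, yaw).
    Scale factors are assumed: +/-16 g, +/-2000 deg/s, +/-180 deg.
    """
    data = bytes(packet)
    if len(data) != 20 or data[0] != 0x55 or data[1] != 0x61:
        raise ValueError("not a 20-byte default data packet (0x55 0x61)")
    ax, ay, az, wx, wy, wz, roll, pitch, yaw = struct.unpack("<9h", data[2:])
    return {
        "accel_g": tuple(v / 32768 * 16 for v in (ax, ay, az)),
        "gyro_dps": tuple(v / 32768 * 2000 for v in (wx, wy, wz)),
        "angle_deg": tuple(v / 32768 * 180 for v in (roll, pitch, yaw)),
    }

# Sample packet from the prompt above
sample = [85, 97, 119, 2, 168, 254, 146, 254, 6, 2,
          48, 254, 205, 255, 248, 240, 83, 252, 171, 196]
print(decode_witmotion_packet(sample))
```

On the sample packet, the first two bytes (85, 97) are indeed 0x55 and 0x61, and the remaining 18 bytes unpack into nine signed 16-bit values that scale to plausible acceleration, gyro, and angle readings.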
Then we check my_logs for the screenshots saved during this process.
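To browse those screenshots quickly, a small helper can list them; this is a hypothetical snippet that assumes the screenshots are written as PNG files directly under the logs directory passed via --logs_dir.

```python
from pathlib import Path

def list_screenshots(logs_dir="./my_logs"):
    """Return screenshot image files saved with --save_screenshot, sorted by name.

    Assumes screenshots are PNG files directly under logs_dir; returns an
    empty list if the directory does not exist or holds no PNGs.
    """
    return sorted(Path(logs_dir).glob("*.png"))

for shot in list_screenshots():
    print(shot.name)
```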
To recap: by providing Magentic-One with a website containing the BLE sensor documentation, the system autonomously navigated the site, captured screenshots, and employed OCR to extract the Bluetooth 5.0 communication protocol details. With this information, the Coder agent drafted a Python function tailored to decode the sensor data, using the sample data I included in the prompt. This experience underscores the transformative impact of AI technologies in modern app development, paving the way for more innovative and intelligent applications.
Prompt, Prompt, Prompt
Throughout my experimentation with various prompts to replicate these tasks, I’ve observed that **the prompt and system message are the most crucial factors influencing the agents’ workflow and the final outcome.** While I haven’t delved into comparing different Large Multimodal Models (LMMs), my primary focus has been on optimizing workflows and refining prompt engineering. By honing these aspects, we can significantly enhance the efficiency and effectiveness of agent-based workflows.
Happy Coding, Happy Reading. Happy Thanksgiving.