🤝 Mistral Moves Towards an Open-Source GPT-4 Competitor
Dear curious minds,
You likely didn't notice, but I changed the email address I use for Substack, and with it the address you receive this newsletter from: firstname.lastname@example.org is now email@example.com. The new address is much shorter and self-explanatory, in contrast to the previous one, which was never meant for sending mail.
In this week's issue, I bring to you the following topics:
Mixtral 8x7B: The Open-Source Model That Gets Past GPT-3.5
Song Generation: Microsoft Copilot Expands Its Capabilities with Plugins
Two PKM Legends in One Podcast: Tiago Forte Interviewed by Nick Milo
If nothing sparks your interest, feel free to move on, otherwise, let us dive in!
Thanks for reading! If you are not already subscribed, enter your email address below to receive new issues and support my work.
🌐🖥️ Mixtral 8x7B: The Open-Source Model That Gets Past GPT-3.5
The Mixtral-8x7B LLM, released under the Apache 2.0 license on December 11th 2023 by Mistral, represents a significant leap in the performance of open-source models.
Its architecture is a Mixture of Experts (MoE), which, if the rumors are correct, is also the architecture used by GPT-4:
one gating network
8 expert models instead of 16 (2x reduction from GPT-4 rumors)
7B parameters per expert instead of 166B (24x reduction from GPT-4 rumors)
The model has 46.7B parameters in total, but routes each token through only two experts, activating about 12.9B parameters per token. As a result, it runs at roughly the speed of a 12.9B dense model.
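The routing idea is easy to sketch. Below is a minimal, illustrative top-2 Mixture-of-Experts layer in plain Python; the gating network is a simple linear scorer here, and the expert functions are stand-ins, so this is a sketch of the technique, not Mistral's actual implementation.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, experts, gate, top_k=2):
    """Route one token through only the top_k highest-scoring experts."""
    # Gating network: a linear score per expert, turned into probabilities.
    probs = softmax([sum(w * x for w, x in zip(row, token)) for row in gate])
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    # Weighted sum of the chosen experts' outputs; the other experts never run,
    # which is why only ~12.9B of the 46.7B parameters are active per token.
    out = [0.0] * len(token)
    for i in chosen:
        expert_out = experts[i](token)
        for d in range(len(token)):
            out[d] += (probs[i] / norm) * expert_out[d]
    return out, chosen

# Toy usage: 8 "experts" that just scale the input by their index.
experts = [lambda t, s=i: [s * x for x in t] for i in range(8)]
gate = [[i * 0.1, 0.0] for i in range(8)]  # hypothetical gate weights
out, chosen = moe_layer([1.0, 2.0], experts, gate, top_k=2)
```

With these toy gate weights, experts 6 and 7 score highest, so only those two are evaluated.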
Besides the benchmark scores shared in the release blog article, the model shows its strength in the Chatbot Arena: as of today it is the best open-source model there, outperforming even OpenAI's GPT-3.5 and Google's Gemini Pro.
Mistral, the company behind the model, is based in Paris, France, and was founded by former Meta and Google employees. According to Reuters, it raised 385 million euros ($414.41 million) in its second funding round, putting its valuation at 2 billion euros.
You can access the model via the Mistral API service, which launched in open beta; more info, including the sign-up link, is in the corresponding platform blog article. Alternative ways to run the model are covered in this Maginative article.
I started exploring the model locally by running a downscaled version that uses only 3 bits per parameter (quantization) on my Nvidia GeForce RTX 3090 desktop GPU. As stated on the Hugging Face model page, this is the largest version that fits completely into my card's 24GB of GPU memory, though it loses some quality to the compression. I achieved roughly 8 tokens per second, which feels a bit slower than most cloud-based LLMs but is still fine. The framework I used to run Mixtral was Ollama, currently my favorite for executing LLMs locally due to its ease of use.
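A quick back-of-envelope calculation shows why the 3-bit version is the largest that fits in 24GB (this ignores the extra bytes quantized formats spend on scale factors, plus activations and the KV cache, which is part of why there is not much headroom):

```python
total_params = 46.7e9   # Mixtral-8x7B total parameter count
bits_per_param = 3      # 3-bit quantization
gpu_memory_gb = 24      # RTX 3090

weight_gb = total_params * bits_per_param / 8 / 1e9
print(f"weights: {weight_gb:.1f} GB")  # weights: 17.5 GB
print(weight_gb < gpu_memory_gb)       # True: fits, with some headroom
```

At 4 bits the weights alone would already be about 23.4 GB, leaving essentially no room for anything else.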
The currently most-used fine-tuned version of the model is Dolphin-Mixtral, created by Eric Hartford. It excels at coding tasks thanks to training on additional datasets. Notably, it is uncensored and requires caution in use; Eric's blog post provides further insight into the nuances of such uncensored models. Training took 3 days over 1.5 epochs on 4x A100s, and at the current price of $2 per hour per A100, that comes to $576 for the fine-tune.
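The cost figure is easy to verify:

```python
days, gpus, usd_per_gpu_hour = 3, 4, 2.0
total_cost = days * 24 * gpus * usd_per_gpu_hour  # 72 GPU-hours per card, 4 cards
print(total_cost)  # 576.0
```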
My take: It's a positive development to see competitive open-source models like Mixtral in the realm of Large Language Models (LLMs). These models are not just benchmarks of technological advancement but also catalysts for privacy and data security. Open-source LLMs enable individuals and organizations to process sensitive data locally, without the need to share it with cloud services. This is particularly crucial for data that is private or confidential. By utilizing these models, users can leverage the power of advanced AI while maintaining control over their data, ensuring that their private information remains just that - private.
🌐🛠️ Song Generation: Microsoft Copilot Expands Its Capabilities with Plugins
Microsoft Copilot (formerly Bing Chat) has recently introduced a significant update that includes support for plugins, a feature initially popularized by ChatGPT in March 2023.
Meanwhile, OpenAI has superseded plugins (which remain available) with GPTs, its new way of integrating custom tools and services. GPTs also have an equivalent in the Microsoft world, named Copilot Studio, but that will not be covered here.
Microsoft Copilot's range of available plugins is currently more limited compared to the extensive offerings in OpenAI's plugin store.
An outstanding plugin in Copilot is 'Suno,' which generates short songs from text prompts. However, my tests sometimes resulted in 404 errors, indicating room for improvement.
Also notable is the new ability to disable web search in the plugin view, a feature not directly available in ChatGPT. This seems counterintuitive, given Bing Chat's emphasis on enhancing web search through its chatbot, especially with source-based answers.
Microsoft has also introduced a history feature for recent chats in Copilot. This addresses a long-standing user request for tracking and revisiting past interactions, which can now also be exported as a Word, PowerPoint, or text file.
My take: I've been closely tracking Microsoft Copilot's evolution. It's clear that Copilot is catching up to the paid version of ChatGPT. The only key features missing from my perspective are the code interpreter (announced for Copilot) and custom instructions, which let you tailor how the model answers. For now, I will continue my paid ChatGPT subscription, but I will also evaluate Copilot as an alternative, especially when I run into the GPT-4 usage cap in ChatGPT.
🎙️🧠 Two PKM Legends in One Podcast: Tiago Forte Interviewed by Nick Milo
In the second episode of the podcast “How I think” by Nick Milo, he interviewed Tiago Forte, the creator of the “Building A Second Brain” book and course.
Tiago and Nick discuss various topics related to productivity, creativity, problem-solving, mindset, and personal growth.
Particularly insightful are the 10 questions Tiago uses to probe and develop his thoughts:
How is this thing that seems bad actually good? In fact, how is it exactly what I need at this moment?
How can I turn up the chaos / entropy in the situation? Conversely, how can I turn down the chaos / entropy?
What would it look like to dial up the scope, making it much better than would normally be expected? How can I dial down the scope, make just an MVP (minimum viable product)?
When facing a problem, what is upstream of this problem that might be easier to address? What is downstream?
How is the opposite of what I think / believe also true? How can I borrow elements of that opposite truth to incorporate into my own worldview?
When I find myself judging someone, how is my judgment of them really a projection of parts of myself I can't love or accept? How would embracing those parts give me more freedom?
How can I start this with abundance? Also, how can I start this with scarcity?
How would an extraterrestrial alien who knows nothing of our history or culture view this?
What is the feeling I'm avoiding feeling right now? How can I feel it?
What is the bottleneck in the current system? How can I relieve it such that the whole system changes?
My take: I have been waiting for this podcast since Nick Milo announced it on 𝕏 as I love the content both create and share about PKM. My journey in PKM began with Tiago's "Building A Second Brain" cohort in 2022, a program where I gained substantial knowledge and skills. This year, I further expanded my expertise by participating in Nick's "Linking Your Thinking" program. Given this background, my excitement for their interview was sky-high, and it certainly did not disappoint. What stood out most was Tiago's segment, where he delved into the crucial questions that shape his thought process. This was a refreshing and unique perspective, distinct from other interviews he's done recently.
💻🔍 Tech Terms: Quantization
Quantization, in the context of LLMs, is a technique used to reduce the model's computational requirements without significantly compromising its performance. It's particularly relevant for deploying these models in resource-constrained environments or for improving their efficiency.
Basic Concept: Quantization involves converting the weights and activations of a neural network from high-precision formats (like 32-bit floating-point) to lower-precision formats (like 8-bit integers).
Reducing Memory and Computational Load: By using lower-precision formats, the amount of memory required to store the model's weights is reduced. This also reduces the computational load, as operations on lower-precision integers are generally faster than on high-precision floating-point numbers.
Impact on Performance: While quantization can slightly degrade the model's performance due to the reduced precision, careful implementation can minimize this impact. Advanced techniques like quantization-aware training can even help the model adapt to the lower precision during training.
Real-World Example: Imagine you have a detailed map (high-precision data) for navigating a city. Quantization is like converting this map into a simpler, less detailed version (lower-precision data). While some details are lost, the simpler map is easier to carry and use, especially if you only need basic navigation information.
Application in LLMs: For LLMs, quantization makes it feasible to deploy these models on devices with limited processing power or memory, like smartphones or edge devices, enabling a wider range of applications.
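The basic concept above can be sketched in a few lines. This is a minimal per-tensor symmetric quantizer, purely for illustration; production LLM quantizers (e.g. the 3-bit format mentioned earlier) work per-group of weights and calibrate against data, but the core idea of mapping floats to small integers with a shared scale is the same.

```python
def quantize(weights, bits=8):
    """Map floats to signed integers sharing one scale factor."""
    qmax = 2 ** (bits - 1) - 1               # e.g. 127 for 8-bit
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) for w in weights], scale

def dequantize(qweights, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in qweights]

weights = [0.82, -0.41, 0.05, -1.27]
q, scale = quantize(weights, bits=8)
restored = dequantize(q, scale)
# Each restored weight deviates from the original by less than one
# quantization step -- the "lost detail" from the map analogy above.
```

Storing `q` as 8-bit integers takes a quarter of the memory of 32-bit floats, at the price of that small rounding error per weight.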
❓🤔 Mind Bender
Can AI help in identifying false or misleading information in our knowledge base?
Think about the prompt and identify what your answer to this question is. If you are curious what GPT-4 replies to this prompt, take a look here. You are welcome to share any thoughts by replying to this mail. I would love to hear from you!
Disclaimer: This newsletter is written with the aid of AI. I use AI as an assistant to generate and optimize the text. However, the amount of AI used varies depending on the topic and the content. I always curate and edit the text myself to ensure quality and accuracy. The opinions and views expressed in this newsletter are my own and do not necessarily reflect those of the sources or the AI models.