“Imperfect things with a positive ingredient can become a positive difference.” – JasonGPT
I don’t know how you are wired, but for me, I become intoxicated with new technology. I have a compulsive need to learn all about it. I’m also a kinesthetic learner, which means I need to be hands-on. So into the code I go. My latest fixation is large language models (LLMs) and the underlying generative pre-trained transformers (GPTs), the neural networks (NNs) that power them. I confess, the last time I built a NN, we were trying to read George H.W. Bush’s lips. And no, that experiment didn’t work out too well for us… or for him!
Do you want to know what I have discovered so far? Too bad. I thought I would take you along for the ride anyway. Seriously, if you are fed up with all the artificial intelligence news and additives, you can stop now and go about your week. I won’t mind. Otherwise, hang on, I’m going to take you on an Indiana Jones style adventure through GPT! Just don’t look into the eyes of the idol… that could be dangerous, very dangerous!
Where do we start? YouTube, of course! I have a new nerd crush. His name is Andrej Karpathy. He is a Slovak-Canadian computer scientist who served as the director of artificial intelligence and Autopilot Vision at Tesla and currently works for OpenAI. He has lectured at Stanford University and has several good instructional lectures on YouTube. I first saw him at the Microsoft Build conference, where he gave a keynote on ChatGPT, but what blew me away was his talk, “Let’s build GPT: from scratch, in code, spelled out.” (YouTube link). It’s no joke. He builds a GPT model on the works of Shakespeare (1MB), from scratch. After spending nearly 2 hours with him, Google Colab and PyTorch, I was left with a headache and some cuts and bruises. But I also had an insatiable desire to learn more. I have a long way to go.
The way I learn is to fork away from just repeating what an instructor says and start adding my own challenges. I had an idea. I have done a lot of writing (many of you are victims of that), and much of it is on my blog site. What if I built a GPT based solely on the corpus of all my writing? Does that sound a bit narcissistic to you too? Oh well, for the good of science, in we go! Cue the Indy music. I extracted the text (468k). It’s not much, but why not?
By the way, if you are still with me, I’ll try to go faster. You won’t want to hear about how I wasted so much time trying to use AMD GPUs (their ROCm software sucks, traveler beware), switched to CPUs, then Nvidia CUDA and eventually Apple Silicon MPS (Metal Performance Shaders, built into the M1). All the while, I was using my fork of the code I built with Andrej Karpathy (ok, not him directly, but while watching his video). I started off with the simple bigram NN language model. And it is “Bi-Gram,” not “Big RAM,” but I found that to be ironically comical in a dad joke sort of way.
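For the curious, here is a minimal sketch of the bare bigram starting point from Karpathy’s video, adapted to a character-level model over my blog text. It assumes PyTorch; the file name, hyperparameters and training loop are illustrative, not the exact settings of my bigram.py, which grew into the 4-layer model mentioned below.

```python
# Minimal character-level bigram language model (Karpathy-style sketch).
import torch
import torch.nn as nn
from torch.nn import functional as F

# Hypothetical corpus file -- substitute your own text dump.
text = open("blog.txt", "r", encoding="utf-8").read()
chars = sorted(set(text))
vocab_size = len(chars)
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

block_size = 8      # context length
batch_size = 32
device = "mps" if torch.backends.mps.is_available() else "cpu"

def get_batch():
    # Random chunks of text; targets are the same chunks shifted by one character.
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x.to(device), y.to(device)

class BigramLanguageModel(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        # Each token directly reads off the logits for the next token.
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        logits = self.token_embedding_table(idx)        # (B, T, vocab_size)
        loss = None
        if targets is not None:
            B, T, C = logits.shape
            loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss

    def generate(self, idx, max_new_tokens):
        for _ in range(max_new_tokens):
            logits, _ = self(idx)
            probs = F.softmax(logits[:, -1, :], dim=-1)  # last time step only
            idx_next = torch.multinomial(probs, num_samples=1)
            idx = torch.cat((idx, idx_next), dim=1)
        return idx

model = BigramLanguageModel(vocab_size).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
for step in range(50_000):               # simple training loop
    xb, yb = get_batch()
    _, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

context = torch.zeros((1, 1), dtype=torch.long, device=device)
print("".join(itos[i] for i in model.generate(context, 500)[0].tolist()))
```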
My JasonGPT bigram.py started learning. It ran for 50,000 iterations and took about 8 hours. It even produced an output of random musings. While there was quite a bit of nonsensical output, I was amazed at how well this small run did at learning words and basic sentence structure, and how it even picked up on my style. Here are some samples from the output that I found interesting, comical and sometimes, well, spot on:
- It’s a lot of time… But I think we also need science.
- What are your big ideas?
- Set our management to the adjacent ground (GND) pin.
- I have a task to Disneyland out that this day.
- I love the fun and fanciful moments as kids get to dream into their favorite characters, embrace the identity of their heroes, wrap themselves up starfish back.
- Bring on the “power” of his accidental detail.
- Your character provided faith, all kindness and don’t care.
- Grab a difference too.
- After several days of emailing, texting and calling, I received a text message.
- Curl has the ability to provide timing data for DNS lookup, it will easily show or avoided.
- Imperfect things with a positive ingredient can become a positive difference, just get that time.
- I also believe we should exploit the fusion power that shows up each day in our company’s data.
- Have you found a vulnerability? Are you concerned about some missing measures or designs that should be modernized or addressed? If so, don’t wait, raise those issues. Speak up and act. You can make a difference.
- “I know what you are thinking.” the irony
- We are the ones who make a brighter day.
- The journey ahead is ahead.
- What are you penning today? What adventures are you crafting by your doing? Get up, get moving… keep writing.
Look, it’s no ChatGPT, but it blew my mind! I’m only using a 4-layer NN with 7 million parameters. In comparison, ChatGPT uses 96 layers and 175 billion parameters! Before the weekend ended, I set up nanoGPT to build a more elaborate model on my data set. It’s still running, but already I can see it has learned a lot more of my style but seems to lack some focus on topics. It’s easily distracted and interrupts its own train of thought with new ideas. Squirrel! Nothing like me.
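If you want to try something similar, nanoGPT drives each training run from a tiny Python config file. The sketch below is roughly what a config for a small run like this looks like; the variable names are nanoGPT’s, but the file name and values are illustrative guesses rather than the exact settings of my run (those are in the repo linked below).

```python
# config/train_jasongpt.py -- hypothetical nanoGPT config for a small run
out_dir = 'out-jasongpt'
dataset = 'jasongpt'   # expects data/jasongpt/ with train.bin/val.bin,
                       # built by a prepare.py like the shakespeare_char example

# a small model: a few layers, a few million parameters
n_layer = 4
n_head = 4
n_embd = 256
block_size = 256       # context length
dropout = 0.2

batch_size = 32
learning_rate = 1e-3
max_iters = 5000
lr_decay_iters = 5000

device = 'mps'         # Apple Silicon; use 'cuda' or 'cpu' elsewhere
compile = False        # skip torch.compile for simplicity
```

From there it’s just `python train.py config/train_jasongpt.py` to train and `python sample.py --out_dir=out-jasongpt` to generate text.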
So my JasonGPT won’t be writing my Monday updates anytime soon, but who knows, maybe it will help me come up with some new ideas. I just hope it stays benevolent and kind. I would hate for it to suddenly become self-aware and start…
Connection to imac.local closed.
- My JasonGPT code, text and the nanoGPT setup are here: https://github.com/jasonacox/ProtosAI/tree/master/llm
- “Snakes. Why’d it have to be snakes?” – Indiana Jones. Raiders of the Lost Ark first opened 42 years ago today, June 12th, 1981!
- Attention is All You Need paper: https://arxiv.org/abs/1706.03762
- OpenAI GPT-3 paper: https://arxiv.org/abs/2005.14165
- What does Artificial Intelligence look like? Image above generated from Dall-E.