
Hi, Victor. Hello. What are we going to talk about today? Yeah, we're going to do a strategic analysis of AssemblyAI, which is basically an audio-to-text "ChatGPT," quote unquote: you can upload any kind of audio, ask questions about it, and automatically process audio up to 10 hours long.

And why are we doing this? Because there are so many tools on the market that it's almost impossible to keep up with them. We want to share all the tools we use, the good, the bad, and the ugly: how they can make you more efficient, how they can help bring your business to the next level, how we actually use them, and what kinds of businesses can be built on top of them. Can you give a quick recap of the company? Okay, so what is AssemblyAI? AssemblyAI is an API-first application.

So it is built for developers, and when we checked, they had already raised $63 million from big names like Accel and Insight Partners, and also from the former CEO of GitHub, Nat Friedman. You mentioned that it's an API-first company. For those who are not familiar with the expression, think of Stripe; maybe you've heard of Stripe.

It's a payment processing company, and it grew by catering to developers first: it made it extremely easy to integrate payments into your applications. AssemblyAI is using the same playbook, but in the audio processing field, is that right? Yeah, that's correct.

Their founder and CEO said in exactly those words that they want to be the Stripe of speech-to-text AI. And there are good APIs and bad APIs. A good API is something like Stripe: even I was able to integrate it into an application.

So even though it involves code, it's easy to handle, and there are a lot of YouTube tutorials on how to do it. It's built for developers, but it works for no-code and low-code developers as well. So when you hear "API" and you're not a coder, like me, you don't have to be afraid.

Yeah, don't be scared off. That's the takeaway: it shouldn't be scary, and to be honest, it isn't. Even you, with a background in veterinary medicine, market research, and marketing creative, were able to integrate it.
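
For readers who want to see what "integrating an API" looks like in practice, here is a minimal Python sketch that uses only the standard library. It assumes AssemblyAI's v2 REST endpoints as we understand them: `POST /v2/transcript` with an `audio_url`, then polling `GET /v2/transcript/{id}` until `status` is `completed` or `error`, authenticated with an `authorization` header. Check the current API reference before relying on these paths or field names. The helper function names are our own.

```python
import json
import time
import urllib.request

# Assumption: AssemblyAI's v2 REST endpoints as publicly documented; verify
# paths and field names against the current API reference.
API_BASE = "https://api.assemblyai.com/v2"


def build_submit_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that queues a transcription job."""
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/transcript",
        data=body,
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


def build_status_request(api_key: str, transcript_id: str) -> urllib.request.Request:
    """Build a GET request that fetches the current state of a transcription job."""
    return urllib.request.Request(
        f"{API_BASE}/transcript/{transcript_id}",
        headers={"authorization": api_key},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def transcribe(api_key: str, audio_url: str, poll_seconds: float = 3.0) -> str:
    """Queue a job, then poll until it completes or fails; return the transcript text."""
    job = send(build_submit_request(api_key, audio_url))
    while True:
        result = send(build_status_request(api_key, job["id"]))
        if result["status"] == "completed":
            return result["text"]
        if result["status"] == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        time.sleep(poll_seconds)  # still "queued" or "processing"
```

Usage would look like `text = transcribe("YOUR_API_KEY", "https://example.com/episode.mp3")`, where the key and URL are placeholders. Keeping request building separate from sending makes it easy to inspect each request without making a network call.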

So you shouldn't be scared of APIs. But anyway, why are we talking about this? Because the number one mistake I see people make is optimizing prematurely. What does that mean? They try to fine-tune, they try to figure out which model is best, and they keep themselves busy without first staying at a high level and actually trying things out.

For example, OpenAI has this Foundry offering where you have to make a commitment of at least $80,000 per year, and in return you get dedicated compute. So in OpenAI's case, if you pay a lot of money, you can fine-tune GPT-4, which is not available to the public. But that only makes sense if there's a clear use case.

Meaning one use case at a huge scale, right? For example, if you are Jasper AI or Copy.ai, companies offering a very narrow service ("I help you write better"), then it obviously makes sense to optimize the model you use. But for everyday folks like you and me, and I guess most of our listeners, it doesn't make sense to optimize prematurely.

So instead of going off-road to find a better and faster track, you should stay in the fast lane and use companies like AssemblyAI, or tools like Midjourney, which are usable out of the box, so you don't have to get your hands dirty with the model itself or with fine-tuning it. Why does that make sense? Because the bottleneck is actually not the cost.

If you think about it, spending a few hundred dollars on, for example, the GPT-4 or GPT-3 API is not really much if you're a developer with a proper hourly rate, right? The actual bottleneck is time. You should move much faster and deal with the exact problem you're facing instead of tuning the model itself. 99% of the time, exploration and testing are the limiting factor.

So always choose the fast lane: use tools that others have already optimized. Obviously, once you've found a good use case, then think about the underlying model and how it can be optimized. But I think premature optimization is the biggest mistake most people make.

I heard an argument from the other side about this. It's from Petchya Balog, one of the most successful startup founders and now investors in Hungary. He was talking about no-code tools, and I think the fast lane you're describing and no-code tools have some parallels.

He said that when you're building a product, the first 80% may be cheaper, but the last 20% can be impossible with this fast-lane approach and no-code tools. What do you think about this? I don't agree with that. You will rebuild the whole system anyway, right? As soon as you find product-market fit, you have to rewrite it.

I have a good friend who is building his company, and they have already rewritten it from the ground up about four times, and he's the smartest person in IT. He knows all the mental models, and he knows the different architectures very deeply. He's really the smartest guy I know.

And they still had to rewrite the whole codebase, even though they didn't start from no-code, right? Yeah, right. They planned it properly. Right.

But you cannot plan for everything, because life unfolds as you explore. That's why ChatGPT is useful: it can adapt to your needs. It's not a predefined path; ChatGPT works precisely because it's not a predefined decision tree. It can actually adapt to new situations.

And I think at first, progressing faster means a lot more than optimizing. Optimization is a siren song: it's very seductive to say, okay, let's explore new tools, let's explore new models, let's fine-tune them, let's check which one is even better, and so on. It's sexy for sure, but at the end of the day it won't get you a business any faster.

And we are here to talk about the business of AI. So if you're interested in making money, iterating faster makes more sense than trying to optimize the underlying models.

That's what AssemblyAI is: a fast lane in audio AI. As I said, it's basically ChatGPT for audio. What does that mean? You can upload up to 10 hours of audio and they transcribe it, right? They use the Conformer model, which was published by Google researchers in 2020; it combines the Transformer's self-attention with convolution, and so on.

You don't have to get into the details, but to give listeners a good anchor point: it's similar to what Whisper does, and Whisper was published by OpenAI.

It's a similar system, but AssemblyAI's speech-to-text model actually produces fewer errors, so it's more accurate. And the pricing is quite reasonable.

What does "reasonable" mean? If you want to transcribe and analyze an hour of audio, it's just a couple of bucks, $1 to $2. That's all it takes.
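
To put the quoted $1 to $2 per hour in perspective, here's a back-of-the-envelope calculation in Python. The rates are just the range mentioned in this conversation, not an official price list. The $50 hourly rate for a person listening to the same audio is a made-up figure for comparison.

```python
# The $1-$2 per hour range is the figure quoted in this episode, not an
# official price list; check AssemblyAI's pricing page for current rates.
QUOTED_LOW_USD_PER_HOUR = 1.0
QUOTED_HIGH_USD_PER_HOUR = 2.0


def cost_range(audio_hours: float) -> tuple[float, float]:
    """Return the (low, high) transcription cost in USD for a batch of audio."""
    return (audio_hours * QUOTED_LOW_USD_PER_HOUR, audio_hours * QUOTED_HIGH_USD_PER_HOUR)


def listening_cost(audio_hours: float, hourly_rate_usd: float) -> float:
    """Cost of a person listening to the same audio once, at real-time speed."""
    return audio_hours * hourly_rate_usd


if __name__ == "__main__":
    hours = 5 * 1.5  # hypothetical: five podcasts averaging 90 minutes each
    low, high = cost_range(hours)
    print(f"Transcription: ${low:.2f} to ${high:.2f}")  # Transcription: $7.50 to $15.00
    print(f"Listening at $50/h: ${listening_cost(hours, 50):.2f}")  # Listening at $50/h: $375.00
```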

So you upload your audio and can ask any kind of question about it. Let's get into how it's already useful. If you want to use it out of the box, even without much coding experience, they have a nice playground where you can upload audio and ask questions. So how can you use it today? How do we use it, and why are we talking about them? Because we use AssemblyAI for every single episode.

We used it for this episode in the research phase, and we also use it in post-production when we write the show notes. In preparation, we look for similar episodes that cover the same topic.

For example, in the previous episode we talked about Midjourney image generation, and I gathered five long podcasts about Midjourney that covered privacy and every other kind of topic. We just took the YouTube links and put them into the AssemblyAI playground. It took about five minutes, and we had the whole transcript. There is also a ChatGPT-like chat interface where you can ask questions about the transcript, so you don't have to read it all.

And it's very long: it currently exceeds the maximum input length ChatGPT can handle, so it's very hard to analyze with ChatGPT.
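
When a transcript is longer than a chat model's input limit, one common workaround is to split it into overlapping chunks and ask the questions chunk by chunk. Here is a minimal sketch. It assumes word count as a rough stand-in for the model's tokens; real limits are measured in tokens, and the right budget depends on the model.

```python
def chunk_transcript(text: str, max_words: int = 2000, overlap: int = 100) -> list[str]:
    """Split a transcript into overlapping windows of at most `max_words` words.

    Word count is only a rough proxy for model tokens (English text usually
    needs somewhat more tokens than words), so leave headroom below the real
    limit. The overlap keeps sentences that straddle a boundary visible in
    both neighboring chunks.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the transcript
    return chunks
```

Each chunk can then be sent with the same question, and the per-chunk answers combined in a final pass.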

But in AssemblyAI you can ask: what are the key takeaways of this podcast, what are the highlights, what are the most actionable steps? We gather the answers, take notes, and use them. We also ask how the episode could be more viral and more engaging, and we try to incorporate that into our podcast. Can you give an example of how useful it is? What was the feedback? Was it just reaffirming, was it completely new, or was it something we didn't have top of mind, something that made sense after the fact but that we weren't actively thinking about?

Yeah. For example, one recurring piece of feedback from AssemblyAI is a lack of focus in podcast episodes. I think that's sometimes true for us too: we touch on a lot of topics. For the Midjourney episode, AssemblyAI's actual feedback on other podcasts was to stay focused on one thing and go deep.

That's what we tried to do. For example, we talked a lot about how to create a logo and how to get inspiration for one. So we try to make chapters and dig deep within each chapter. Okay, that makes a lot of sense.

And the easiest, lowest-hanging fruit is feedback, as you said. After we do an episode, we ask how it could have been better, get the feedback, and try to incorporate it. But the research use is also quite ingenious, because we can see ahead of time what others have produced, and you can do the same.

Whatever field you work in, just go to YouTube, find some videos, and analyze them: feed them into AssemblyAI and ask questions. That's quite neat.

And I guess it's not just useful for content creators but for enterprises as well. For example, AssemblyAI itself could do the same thing, right? They could search YouTube for Whisper, Google's products, or other transcription services and see what people are saying. Yeah. OpenAI, for example, is famous enough that a lot of YouTubers talk about its products, and one of them is Whisper, right? So instead of listening to hours of YouTubers, some of whom are beginners, which makes it very hard to extract insights from their videos, you can feed AssemblyAI something like 100 hours of podcasts and talking-head YouTube videos and just gather the insights on what people are talking about, right? It's insane.

Not just on the cost side, but on the time-saving side as well. You don't need to employ one or several people to process everything in parallel and do your research, which would take weeks or months; in this case, it literally takes about an hour. You can go through hours of video, analyze and summarize them, and see the big points and the missing points.

Then you have a good outline of what you want to cover. It's also good for podcasters, because if you think about it, researching someone takes at least ten to 15 hours: reading everything they've written and listening to everything they've produced. When you invite a guest for an interview, you don't want to ask the same questions everybody else asks; you want a new angle, and the actual process is to read and listen a lot, ten to 15 hours. Right, but how can AssemblyAI help with this? If you do a quick search on YouTube, the results are already sorted by some relevance factor.

So if you search for someone's name, you can copy the top five YouTube videos, feed them into AssemblyAI, and ask these exact questions: What are the main takeaways? How could these interviews be better? What are the missing points? If you ask these questions and summarize the answers, in literally 15 minutes you save days or weeks of work, and you can produce much better work. Yeah, this is obviously great for small content creators like us, but it's also good for enterprises if you think about it; sales coaching and compliance come to mind.

That's something that's quite hard to do otherwise, right? If you want to keep a close eye on how well your salespeople follow the sales script, it's tough, because you have to listen to the calls. And what's crazy is that it can already be done. We're not talking about what will be possible in the future: what you can do now is integrate AssemblyAI and transcribe every single sales call.

You provide the transcript along with context, namely the script that has to be followed, and you ask: was it followed, and how could the call have been better? Nothing else.

So it just gives a score after each call for how well the script was followed. Just by measuring, everything improves.
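
Here is a Python sketch of the scoring step described above. `build_review_prompt` assembles the kind of request the speakers describe: the script plus the transcript, asking for a score and the top three strengths and improvements, to be sent to whichever LLM you use. `keyword_adherence` is a deliberately crude, swapped-in heuristic that counts how many script steps had at least one expected phrase spoken. It is only a sanity check to compare against the model's answer, not how AssemblyAI grades calls. `ScriptStep` and the example phrases are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScriptStep:
    name: str
    key_phrases: list[str]  # hypothetical phrases a rep is expected to say


def build_review_prompt(steps: list[ScriptStep], transcript: str) -> str:
    """Prompt asking an LLM to grade one call against the sales script."""
    script = "\n".join(f"{i}. {s.name}" for i, s in enumerate(steps, start=1))
    return (
        "Here is our sales script:\n"
        f"{script}\n\n"
        "Here is the transcript of a sales call:\n"
        f"{transcript}\n\n"
        "Score from 0 to 100 how closely the script was followed. Then list the "
        "top three things that went well and the top three things that could "
        "have been better."
    )


def keyword_adherence(steps: list[ScriptStep], transcript: str) -> float:
    """Crude stand-in score: share of steps with at least one key phrase present."""
    lowered = transcript.lower()
    hits = sum(any(p.lower() in lowered for p in s.key_phrases) for s in steps)
    return hits / len(steps) if steps else 0.0
```

For a script with greeting, discovery, and "book a demo" steps, a call that covers only the first two would get a heuristic score of about 0.67. The model's written feedback is what makes the result actionable for the rep.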

That's Management 101: first you measure, and what gets measured gets improved, right? So you feed the score back promptly, as soon as someone finishes a call. You don't even have to add any incentives; you just feed it back, so they see on a dashboard how they're doing, like how many calls they make.

They obviously know that already. They also know their closing rate, but there's a big gap between those two numbers, right? They want a higher closing rate, but basically all they see now is how many calls they make and how many sales or leads resulted. This gives a much more fine-grained picture.

It's not binary and it's not just one number; it's a sliding scale of how well they do. And if they see it on a graph, obviously they're going to improve.

Most people improve if they get timely feedback. And this is reality today; you can do it now, right? You can use AssemblyAI's API directly, and there are also third-party applications that integrate it with CRM software like HubSpot and others.

We'll talk about them. So this is already reality; what we're talking about is not science fiction.

Yeah, right. I mean, nothing beats instant. That's the beauty of ChatGPT as well.

I can ask stupid questions that I wouldn't get into otherwise. I wouldn't normally ask what the difference is between different kinds of pepper, the spice, because there are actually lots of peppers. But I could, and I did, and my mind was blown: I ordered about 15 different peppers, and they're all different, like Sichuan pepper and timut pepper.

Some of them have a numbing effect; some have very citrusy flavors. A whole new world opened up in five minutes.

Just because ChatGPT can answer instantly. That's for fun, right? It's not making me more money. I just become more knowledgeable, because I can ask the question knowing it won't take half an hour of my time. In two minutes, I know everything I want to know about a new topic.

But in a business setting, giving timely, instant feedback to your employees, to your salespeople, is invaluable. Right? And I guess they actually want to do better; they want to learn more.

They want to be more successful; that's why they chose sales. And now there's a tool that goes beyond a weekly, monthly, or quarterly review where someone listens to a few phone calls and goes through them. Every single phone call gets specific feedback.

And that's insane. It's mind-blowing. I also want to mention that this is just the easiest way of integrating it.

You don't have to do anything fancy; you basically just give feedback with the existing tool. But you can also ask, and it's no more complicated: give me the top three things that went well and the top three things that could have been better.

Right. AssemblyAI can already act as a coach and answer these kinds of questions. That's also good because it's not just "you scored 78% and didn't follow these steps"; you also get positive feedback on what went well, plus very specific feedback on what could have been better.

And there's the Zeigarnik effect. I'm not sure whether you're familiar with it. Can you tell me what it is? The Zeigarnik effect basically describes why waiters can keep lots of orders in mind, but as soon as the orders are completed, they completely forget them.

There's a big contrast: they seemingly have a very good working memory and store a lot of orders in their heads, and then, as soon as the table is settled, the memory vanishes. That's an intriguing thing. The Zeigarnik effect basically says that open issues take up mental space, right? That's what's happening with waiters.

They keep all the orders in mind, and each one takes up mental space until it's closed. Closure frees up that mental space. It helps you.

And LLMs, these large language models, are good because they give you instant closure, right? For example, if I'm reading a Michael Burry tweet about some economic topic I don't understand, it doesn't bother me, because I just paste the tweet into ChatGPT and ask it to explain it in plain English. It explains it to me. It's the same with salespeople: all right, here was a sales call; let's get closure.

You get timely feedback, timely closure. As soon as the call is done, it can be evaluated: what's your score, what were the three best things, and what are the top three things that could be better? That frees up mental space, and you can move on instead of reliving the call in your head, thinking, "I'm not sure what could have been better."

Right? It takes the whole uncertainty out of the picture. One more thing just came to mind, related to what we discussed about research and coaching feedback for content creators: it's not just post hoc, not just after the fact; it can also be used before making a call.

And obviously it doesn't have to be AssemblyAI; it can be ChatGPT or another large language model that you feed information into: this is the company I'm calling, this is the person I'm calling, this is the LinkedIn profile of the person I'm calling.

This is the website of the person I'm calling. Now give me the top three things I should focus on, given that this is our sales script, right? Otherwise there's just a sales script that I don't really follow, because it's the same words every time.

So it's not really personalized. But you can personalize it down to the top three things you should focus on, even taking the history into account: what was the chain of communication with this exact person? A good friend of mine created a CRM system, and they have 2,500 clients.

What he did was scrape all of their clients' company websites, feed them into ChatGPT, get back the gist of what each company works on, and then run some clustering on top of that. Now he understands the different needs his software actually solves. That's an idea you can implement today as well.
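
The clustering step could look something like this. We don't know which method the friend used, so this sketch swaps in one of the simplest options. Each one-line summary becomes a bag-of-words vector, and vectors are compared by cosine similarity. They are grouped greedily: each summary joins the first cluster whose leader is similar enough, otherwise it starts a new cluster. The threshold and stopword list are arbitrary assumptions. At 2,500 clients, embeddings plus k-means would be the more usual choice.

```python
import math
import re
from collections import Counter

# Arbitrary, minimal stopword list for illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "we", "our", "with"}


def vectorize(summary: str) -> Counter:
    """Bag-of-words vector of a company summary, minus stopwords."""
    words = re.findall(r"[a-z]+", summary.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(summaries: dict[str, str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy single-pass clustering: join the first similar-enough cluster leader."""
    clusters: list[tuple[Counter, list[str]]] = []
    for company, summary in summaries.items():
        vec = vectorize(summary)
        for leader_vec, members in clusters:
            if cosine(vec, leader_vec) >= threshold:
                members.append(company)
                break
        else:  # no existing cluster was close enough
            clusters.append((vec, [company]))
    return [members for _, members in clusters]
```

For example, with the made-up summaries "dental clinic booking software for patients" (Acme Dental), "online booking for dental clinic appointments" (SmileCo), "logistics tracking for freight shipping" (FreightHub), and "freight shipping logistics platform" (ShipFast), the function returns `[["Acme Dental", "SmileCo"], ["FreightHub", "ShipFast"]]`.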

It's not even complicated; it's quite trivial to do. But it's food for thought: the same way you can use this for research in content creation, you can use it for sales as well. Yeah, it's very interesting.

I just started research for a new client, and this sparked an idea: they have 400 clients too, which is enough to make it worthwhile to scrape their websites, understand the profiles, and cluster them. Of course they already have segmentation and so on, but yeah, it's a good idea.

Victor, thank you. I'm going to send you an invoice. No worries.

So let's move on to another business setting where you could use it today, without waiting; we're still not talking about the future. In operations and customer support, aggregating voice communication is a huge plus.

Once again, nothing fancy, right? You don't need a complicated pipeline of different AI modules; it's extremely easy. You have a shitload of voice data: just feed it through AssemblyAI or a similar system, and that's it. Then you aggregate it: summarize what each customer needs and which category the call belongs in, group similar calls together, and batch-process them, warranty issues for example.

For an e-commerce company, processing warranty issues is completely different from processing payment issues, right? So if you have to process, say, ten warranty issues, it's much easier for whoever is doing it if they're already aggregated and categorized, so that person does only one kind of task. Once again, that's Management 101: work is easier, better, and higher quality, and takes up less mental space, when it's batched into similar tasks. That's something you can do today as well. And obviously, on top of that, you can extract information and ask: how could we do better? Here's a lot of feedback we've received; how can we improve? You just ask questions like "What is missing?" and "How could we serve customers better?", get the answers, and aggregate them.
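
Here's a minimal sketch of the categorize-then-batch idea for transcribed support calls. The categories and trigger words are hypothetical. Plain substring matching stands in for what would more realistically be an LLM prompt like "Which of these categories does this call belong to?". The batching logic stays the same either way.

```python
from collections import defaultdict

# Hypothetical categories and trigger words. Substring matching is crude
# ("charge" also matches "discharge"); an LLM classifier would be more robust.
CATEGORY_KEYWORDS = {
    "warranty": ("warranty", "broken", "defect", "replacement"),
    "payment": ("charge", "refund", "invoice", "card"),
    "shipping": ("delivery", "tracking", "shipped", "delayed"),
}


def categorize(transcript: str) -> str:
    """Return the first category whose keywords appear in the call, else 'other'."""
    text = transcript.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in text for k in keywords):
            return category
    return "other"


def batch_by_category(transcripts: list[str]) -> dict[str, list[str]]:
    """Group call transcripts so one person can work through a single kind of issue."""
    batches: dict[str, list[str]] = defaultdict(list)
    for t in transcripts:
        batches[categorize(t)].append(t)
    return dict(batches)
```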

So you do this for all the audio, get the results, and summarize them with ChatGPT or even with AssemblyAI; it doesn't really matter. Then you get very specific feedback on how you can improve. Now, obviously, you could also do this by employing lots of human capital, right? Just throw a lot of people at the problem.

They have to listen. It's just a workflow, right? You listen to the call, you answer these questions (what is the problem, how could we do better), and then you give someone the task of summarizing it. So this could already be done. And you could simply go to the customer support team and have weekly discussions with them.

The top three issues will already come up there. But what's intriguing, and what's now possible at scale for the first time, is detecting emerging problems. What is a new problem, right? Something that's still below the threshold of really being noticed but is starting to bubble up.

What are those issues? You can tackle them before they grow into huge problems that are much harder to deal with. That's also already possible in customer support. We're not talking about sci-fi; we're only talking about what's possible today.
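
Spotting issues that are "starting to bubble up" can be framed as comparing each category's count this week with its recent baseline. This is a sketch under a few assumptions. The calls have already been categorized and the counts are weekly. A category is flagged when it at least doubles its earlier average, or appears for the first time, while also reaching a minimum absolute count. Both thresholds are arbitrary defaults to tune.

```python
def emerging_issues(
    weekly_counts: list[dict[str, int]], min_growth: float = 2.0, min_count: int = 3
) -> list[str]:
    """Flag categories whose latest weekly count is at least `min_growth` times
    their average over the earlier weeks and at least `min_count` in absolute terms.

    `weekly_counts` is ordered oldest to newest; a category missing from a week
    counts as 0, so brand-new categories have a baseline of 0 and are flagged
    once they reach `min_count`.
    """
    if not weekly_counts:
        return []
    *history, latest = weekly_counts
    flagged = []
    for category, count in latest.items():
        baseline = (
            sum(week.get(category, 0) for week in history) / len(history) if history else 0.0
        )
        if count >= min_count and (baseline == 0 or count / baseline >= min_growth):
            flagged.append(category)
    return sorted(flagged)
```

The large, stable categories (the "top three issues" that come up in weekly meetings anyway) are ignored by design, because their ratio to the baseline stays near 1.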

It's going to be in the show notes: there's a TL;DR summarization example from OpenAI with an exact example prompt (here is the input text, now summarize it), and you can just copy it. The workflow is to reduce each audio transcript to a TL;DR, then take all the TL;DRs and reduce them to the biggest takeaways.
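
The reduce-to-TL;DR workflow is a map-reduce over summaries. In this Python sketch, `summarize` is any function that sends a prompt to a language model and returns its text. We leave that call abstract rather than guess at a specific client library. The prompt format, the text followed by a `Tl;dr` cue, follows the style of OpenAI's TL;DR example as we recall it; adapt it to whatever prompt you actually copy.

```python
from typing import Callable

# Any function that sends a prompt to a language model and returns its reply.
Summarizer = Callable[[str], str]


def tldr_prompt(text: str) -> str:
    """Prompt in the style of OpenAI's TL;DR example: the text, then a 'Tl;dr' cue."""
    return f"{text}\n\nTl;dr"


def map_reduce_summary(transcripts: list[str], summarize: Summarizer, batch_size: int = 10) -> str:
    """Map: one TL;DR per transcript. Reduce: repeatedly summarize batches of
    TL;DRs until a single summary of the biggest takeaways remains."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 or the reduce step never shrinks")
    if not transcripts:
        return ""
    summaries = [summarize(tldr_prompt(t)) for t in transcripts]
    while len(summaries) > 1:
        summaries = [
            summarize(tldr_prompt("\n".join(summaries[i:i + batch_size])))
            for i in range(0, len(summaries), batch_size)
        ]
    return summaries[0]
```

With 25 transcripts and the default batch size, this makes 25 map calls, then 3 reduce calls (batches of 10, 10, and 5), then 1 final call. Batching keeps each reduce prompt within the model's input limit.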

So that's the workflow, if it makes sense. I'd like to mention two other use cases that are available now and that a lot of companies already use. The first is meeting notes and the second is hiring.

About meeting notes: several applications will transcribe your company meetings and extract the key highlights. If people take notes manually today, that costs your coworkers a lot of time and money. Automatic notes also make it much easier not to forget things.

For example, it highlights the to-dos in the meeting notes, and you can integrate it into your... Humans make mistakes as well. We love to make fun of AI models making mistakes, like hallucinations that come up with things that aren't true, but humans make mistakes too. The quality of our output is not 100%; depending on the task, it can be much lower, like 60%.
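
As a toy illustration of pulling to-dos out of a meeting transcript, the sketch below splits the text into sentences and keeps those that contain commitment or request cues. The cue list is a hypothetical heuristic. The meeting-note applications mentioned here most likely rely on language models rather than keyword matching.

```python
import re

# Hypothetical cue phrases; a production tool would more likely ask an LLM.
TODO_CUES = ("action item", "to-do", "todo", "i'll", "i will", "we need to", "can you")


def extract_todos(transcript: str) -> list[str]:
    """Return sentences from a meeting transcript that look like commitments or requests."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [s for s in sentences if any(cue in s.lower() for cue in TODO_CUES)]
```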

Especially if someone is having a bad day or has been doing the same task for a long time, they get fatigued; task fatigue kicks in. That's the beauty of these AI models: they don't get fatigued.

Yeah, obviously they can make mistakes, but they keep improving, they stay at whatever level they reach, and they don't have bad days. There's a famous story about, I think, the New York Philharmonic: 15% of their admissions were women, and then they changed something.

And the next year, almost 50% of admissions were women. Victor, can you guess what they changed? I guess they masked the applicants' gender, so the hiring managers didn't see it.

Yeah. The judges were very experienced musicians listening to the applicants. What changed was that in previous years, they could see the applicants perform, so they also saw their gender. The next year, they could only hear them.

Yeah. They reduced it to audio. It wasn't visual anymore, because audio is the output a musician is going to produce.

Right. Whatever role you're being hired for, you should be judged on the output it requires, which most of the time is not tied to your gender, race, or anything like that. Right.

In this case, you're going to be a musician, so you're going to produce music, and your output should be judged on the music you produce. This is genius.

I love it. Big companies like Google have about five rounds of interviews for new hires, and in this case AssemblyAI can summarize these interviews and give them a rating, which you can compare with the ratings from the interviewers, who are of course biased.

I'm not sure, but maybe AssemblyAI is less biased than people. Then again, it was developed by people and trained on content created by people. Still, I guess there's an opportunity to make less biased hiring decisions.

Yeah, and just to be clear, we're not talking about automating away the decision-making, right? We're talking about augmenting it and making it less biased, just like with the sales script. Right: following the sales script, or in this case the hiring script, better.

These are the objective criteria we look at; please judge the interview against them. And obviously, how well it performs has to be reevaluated, right.

But as an addition, it can be huge for getting through more candidates, judging them faster, and giving them personalized feedback, like: this time it wasn't a good fit because of this and this; or, we'll keep you in the loop because you were a good fit but just didn't make the cut, so maybe we'd like to hire you in the future.

So it helps with communication and personalization. Basically, as a tool that supports and aids the decision, it can be invaluable, and in my mind it can make decisions much better and much more objective. But obviously you have to be mindful of the downsides.

These tools are not perfect, and you shouldn't treat them as if they were, but testing them makes a lot of sense if hiring is an issue for you. Okay, so we've talked about compliance, quality assurance, hiring, meeting notes, and sales.

This is nice. Let's move on to the next topic: ideas for entrepreneurs who want to build new businesses on top of AssemblyAI or similar tools, since so far we've only covered how existing businesses can use it.

Obviously there are opportunities being unlocked, so let's look at what they are. Okay, first: transcription for niches. What is it, Victor? You might think, okay, AssemblyAI already provides transcription, so how can we build a business on top of it? But that's the beauty of niches: niching down into a blue ocean is always a valid strategy.

Say you focus on, for example, business meetings, negotiations, interviews for UX researchers, or lectures for educational institutions, like MOOCs, massive open online courses. If you think about these niches, each one has specific needs.

The main engine is the same; it's like a car, right? The engine can be the same, but one car is built for off-roading, another for Formula One, another for putting out fires as a fire truck. Likewise, even though the main engine is the same, these niches need different things, and if you understand them and their pains well, you can build a better workflow.

It's not just about transcription. For example, if you're an educational institution, you'd also like to create notes for the different lectures, study cards for the students, tests, and mock tests. The point is that you have study material, which can be the lectures given at the school, and one semester is usually about ten to 20 hours of lectures in a given subject.

And you can chat with this content: you can ask for the key highlights or ask specific questions. How can I solve this? What is said about this topic? Right? Yeah.

So it works in two ways. First, as the content producer, you can spin out different kinds of content from the same source, right? Say I give a lecture in English: if I transcribe it, I can internationalize the text, and I can hire someone, or use text-to-speech solutions, to produce the same material in different languages, right? Or I can create more value for my listeners or students by generating summaries automatically. I can do it en masse: I have lots of data, lots of audio, and doing this by hand would take an enormous effort. But now it's easy. If I want a summary or a study card, I can provide it and give bigger and better value to my customers, right? A chat interface also becomes possible: if students get stuck, they can ask a question in their own language at the exact point where they're stuck and get timely, instant feedback, right? So whatever the sticking point is, it can be resolved instantly.

So there are two ways: either offering it as a service to their customers, or just producing more content. And for content producers, it's easy.

For us, for example, when we have an episode, we have to create a transcript, of course, but also show notes, LinkedIn posts, Twitter posts, email threads, and written PDF protocols. So different kinds of content in different formats can be generated from the same source, and it's scalable, right? You don't have to hire someone to do it; if you have a process, it scales. Victor, I think we should include the AssemblyAI summary and chat interface for this episode in the show notes, right? So our listeners can ask about the content of this episode. Yeah, sure, absolutely.

We can do that. It's not even hard. So you're going to do that.

So there's going to be a link in the show notes that you can click on. AssemblyAI is going to pop up, and you can ask any kind of question about our episode. So that's quite neat.
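Mechanically, the repurposing described a moment ago is one transcript fanned out into several prompts, one per output format. Here is a minimal sketch of that fan-out. The template names and wording are hypothetical, and which model receives the prompts (ChatGPT, AssemblyAI's own tooling, or something else) is left open.

```python
from __future__ import annotations

# Hypothetical prompt templates: the wording is illustrative, not a tested recipe.
TEMPLATES = {
    "show_notes": "Write podcast show notes as bullet points for this transcript:\n\n{transcript}",
    "linkedin_post": "Write a LinkedIn post (under 200 words) on the key insight of:\n\n{transcript}",
    "twitter_thread": "Write a five-tweet thread based on this transcript:\n\n{transcript}",
    "email": "Write a short newsletter email introducing this episode:\n\n{transcript}",
}


def build_prompts(transcript: str, formats: list[str] | None = None) -> dict[str, str]:
    """Return one prompt per requested output format, all built from the same transcript."""
    formats = formats or list(TEMPLATES)
    return {name: TEMPLATES[name].format(transcript=transcript) for name in formats}
```

Adding a new output format is then just one more template entry; the transcript is produced once and reused for every format.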

Yes. And not just translating: video can be generated as well. Right.

Text to video with D-ID, or text to audio with ElevenLabs and those kinds of things. So you can generate more audio from the text that you generated from the original audio, and you can actually spin out a huge new batch of content.

And that's quite useful. I think that wasn't really possible before at scale, or at least it wasn't really affordable. Okay, so one example we talked about: Udemy is currently the biggest online course platform, with about 130,000 courses.

They have the most traffic of any course platform on the Internet, more than Coursera, Skillshare, and the others. How many people are visiting them in a month? About 60 million.

Oh, wow. Okay. And we checked and 97% of these 130,000 courses are below 15 hours.

So what they could do now is maybe start with the most popular courses: make a transcription of the whole course, and students can ask about its content. So it can be an interactive experience without any extensive investment. Yeah, right.

So that's neat. You don't have to do machine learning operations, fine-tune models, test models, and those kinds of things; you just plug it in and do it. And you can even start at a smaller scale.
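To show what "just plug it in" can look like in code, here is a minimal sketch assuming AssemblyAI's v2 REST endpoints: a POST to `/v2/transcript` with an `audio_url`, then polling `GET /v2/transcript/{id}` until `status` is `completed` or `error`. Treat the endpoint paths and field names as assumptions to verify against the current API reference. The audio must already be reachable at a public URL (uploading a local file is a separate step, not shown), and the `http` and `sleep` parameters exist only so the polling logic can be exercised without network access.

```python
import json
import time
import urllib.request

# Assumed base URL of AssemblyAI's v2 REST API; verify against the current docs.
BASE_URL = "https://api.assemblyai.com/v2"


def _request(method, url, api_key, payload=None):
    """Send one JSON request and decode the JSON reply."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers={
        "authorization": api_key,
        "content-type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def transcribe(audio_url, api_key, http=_request, poll_seconds=3.0, sleep=time.sleep):
    """Submit a publicly reachable audio URL and poll until the transcript is finished."""
    job = http("POST", f"{BASE_URL}/transcript", api_key, {"audio_url": audio_url})
    while True:
        result = http("GET", f"{BASE_URL}/transcript/{job['id']}", api_key)
        if result["status"] == "completed":
            return result["text"]
        if result["status"] == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        sleep(poll_seconds)  # still queued or processing: wait and ask again
```

A real call would look like `transcribe("https://example.com/episode.mp3", "YOUR_API_KEY")`, where both arguments are placeholders.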

You don't have to do it for 130,000 courses, but as you said, maybe just take the top thousand, split them into A and B groups, and add this chat integration to the B group. Right. Leave the A group alone.

So for 500 courses nothing changes, and the other 500 get the integration. And let's see.

You just put it out, and you can actually measure the difference in engagement, purchasing, or satisfaction. So Udemy can run this test and evaluate whether the investment makes sense. And if it does, they can roll it out to other courses.
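To judge whether the B group actually did better, one standard option is a two-proportion z-test on a yes/no metric such as "the learner bought another course". This is an illustration only, not something Udemy has described. It assumes each learner is an independent trial, which is only approximately true when the split is done by course, so a per-course (clustered) analysis would be more careful.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare success rates of groups A and B; return (lift, z, two-sided p)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal tail: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value
```

With hypothetical counts of 50 out of 500 learners in A and 80 out of 500 in B, `two_proportion_z_test(50, 500, 80, 500)` gives a lift of 6 percentage points and a p-value below 0.01.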

Or they can iterate quicker before rolling out, running tests quickly instead of trying to figure out how these AI models work. All right, let's go to the next business, which is a research tool for content creators: radio, TV, podcasters. So automating the finding and analysis of content.

We already shared that we use it: we use it for research on every single episode. And as far as I know, a lot of media companies are using it to find the topics in their extensive libraries of videos and audio, and to decide how to promote them on social media. Yeah, right. So we already use it.

But this could be a standalone tool as well. So if someone out there wants to build it as a tool and make it easy for anyone without any coding experience, this could be a product. Last time we talked about using testimonials instead of personas, because personas can be hard to relate to and a testimonial can convey the gist quicker and better, right? So I framed this one as a testimonial too.

So for this research tool, let me give you a short testimonial. As an overburdened astrophysics podcaster, the Content Research Automation Tool has been my lifeline. It has miraculously transformed 15-hour research slogs into mere minutes, sourcing and analyzing obscure yet crucial content with pinpoint accuracy.

It's not just a tool, but an invaluable team member, keeping my podcast fresh and my life balanced. For content creators drowning in research, I wholeheartedly recommend this game changer. So this is a small testimonial I just made with my testimonial tool to bring the idea to life.

So if creating this Content Research Automation Tool resonates with you, please do it, because I guess the market would be extremely happy about it. All right, let's do two more of these and then we can move on. So there's also customer support analysis.

We talked about that. Okay, we use it, right? Yes, we use it to analyze feedback, and it can already be used and implemented. But this could be an easy-to-use tool in any kind of niche.

So once again, using a testimonial: as a customer support manager in a bustling tech firm, the call analysis tool has revolutionized our service. It's like having a vigilant analyst on board, proficiently deciphering customer sentiment from hundreds of support calls daily. Not only has it improved our understanding of customer pain points, but it has also tremendously streamlined our workflow.

It is a priceless asset for any customer-centric organization seeking to optimize its service delivery. So once again, it gives you a better understanding of what this specific tool could give someone. Let's do one more and then we can move on. It's sales calls as a service: once again, upload calls and get specific recommendations to improve.

So the testimonial for this: as the head of sales at a fast-growing startup, the sales call service has been instrumental to our success. By simply uploading our sales calls, we get clear, actionable feedback to refine our strategies. The precision and relevance of the recommendations have led to notable improvements in our team's performance.

I wholeheartedly recommend this service to any sales team looking to step up their game. So once again, I guess this brings the whole tool to life much better than a persona could. And if it resonates with you and you want to create a business around this, I wholeheartedly recommend it, because this is a real pain point, and it's useful and practical in any field where salespeople are employed.

Okay, let's move to the community part. If you want to build a business on top of them, let's see how supportive they are. How could AssemblyAI be better? So let's do a quick analysis first.

What are the good and the ugly parts, and how could they be better? Let's start with the good. Yes. I guess listeners who are familiar with Whisper may ask: okay, OpenAI has a similar service.

You just upload audio and you get the transcript. What's different? First, it's more accurate: AssemblyAI makes fewer errors.

And Whisper through the API has a file size limit of 20 megabytes. So what does that mean in practice? If you want to use Whisper out of the box through the API, without messing with the model itself, you first have to chunk the audio file, right? You have to chunk it up properly, at most 20 megabytes per piece. Then you feed each piece in, which is very painful to me, and you get back a transcript with timestamps, but then you have to offset each timestamp, because you are working in 20-megabyte chunks.
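The offsetting step is where chunked transcription usually goes wrong, so here is a minimal sketch of it. It assumes each chunk's transcript comes back as segments whose `start`/`end` times restart at zero for that chunk (as with segment-level timestamps from a per-file transcription call) and that the chunks were cut back to back with no overlap. The `Segment` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_chunk_transcripts(chunk_durations, chunk_segments):
    """Merge per-chunk transcripts whose timestamps restart at 0 in every chunk.

    chunk_durations[i] is the length in seconds of audio chunk i, and
    chunk_segments[i] is the list of Segments returned for that chunk.
    """
    # Chunk i starts where chunks 0..i-1 end: [0, d0, d0 + d1, ...]
    offsets = [0.0, *accumulate(chunk_durations)][:len(chunk_durations)]
    merged = []
    for offset, segments in zip(offsets, chunk_segments):
        for seg in segments:
            merged.append(Segment(seg.start + offset, seg.end + offset, seg.text))
    return merged
```

In practice it also helps to cut chunks at silences rather than at exact byte boundaries, so that no word is split across two files.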

Right. It takes a lot of steps, at least more steps than you would guess. Right? In comparison, with AssemblyAI you can feed in up to 10 hours.

Their model, I guess, is easier and quicker to use, and it's priced similarly. For our videos, it's about 2 GB per hour.

So 10 hours of video can be 20 GB, compared to 20 megabytes. I mean, that's 1,000 times more, right? In our case, because we send it from YouTube, I'm not sure whether they use the 2 GB file at full resolution. I'm not sure about that.

But yes, it's bigger. Also they just use the audio. So it's not a perfect comparison.

We are talking about hundreds of megabytes: about 150 MB for an episode like ours if you only look at the audio. So it still takes seven to eight chunks, and that's kind of painful in OpenAI's case.
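The chunk count is a ceiling division. In this sketch the limit is a parameter, because the conversation quotes 20 MB while OpenAI's documentation has listed a 25 MB upload limit for its audio endpoints; check the current figure before relying on either. Sizes use decimal units (1 GB = 1,000 MB), and the file sizes are the estimates quoted above.

```python
import math


def chunks_needed(size_mb: float, limit_mb: float) -> int:
    """Minimum number of equal-sized pieces so that each piece is at most limit_mb."""
    return math.ceil(size_mb / limit_mb)


print(chunks_needed(150, 20))     # audio-only episode, 20 MB cap -> 8
print(chunks_needed(20_000, 20))  # 10 hours of video at ~2 GB/hour -> 1000
```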

And then there's customer support. You have some experience with them, so I guess you can share. What was your experience so far? I was using it for episode two or three, a few weeks ago, and the interface got stuck at one point in the process of generating the transcript.

And I was very frustrated, because I wanted to get the whole editing of that episode done fast. So I wrote to the online chat on their website, and within two minutes they replied: oh, I see you have a problem, I will check it with the same video link you are using.

They checked it within a minute and said: oh, this is a big problem, I will escalate it to the developers. Ten minutes later they wrote: okay, we fixed the problem, your transcript is ready now, and you can do the same with other videos. So it was a mind-blowing ten minutes. Victor, that shows that AI as a service, or AI models as a service, has an upside: you don't have to spend days and lots of engineering hours fixing an issue when one comes up, because their main and only job is to provide a model and make it reliable.

Right, that's extremely good. Their social media presence is also quite neat, and their developer education is quite good as well. I remember coming across them at the end of last year, when they had made a blog post about how to use Whisper.

And they make lots of posts about different technical aspects of what they do. On LinkedIn they have 21,000 followers, and they constantly post videos, case studies, and use cases.

So their marketing communication is quite neat, I guess. Yes, they focus on YouTube and LinkedIn, but, for example, they have only 30 likes on Facebook and not much content there. So I think that's why they have $60 million in funding.

They are very focused on where their target audience is. They are very focused on developer education, and on their hiring page they are hiring marketing people, with almost all of the roles in developer education. So they know their way of doing things.

Yeah, right. As a developer I can relate to that: I learn a lot on YouTube. If I want to learn something, the first thing I do is go on YouTube and see who did a recap or overview of the topic I'm interested in.

So that's quite neat, but there are a few aspects where they could do better, so let's go through them. For example, in OpenAI's case, if you go to platform.openai.com, which is for developers, on the first page you see the use cases, and with just one click you can get to a big list of examples and use cases that you can try instantly. That's quite neat.

So that's something they could copy: a huge list showing how what we just discussed works in practice in each field. Click here: this is a sales call example, and this is the exact question for how you can improve it, and those kinds of things. Make it extremely simple to understand the use case: with one click you get the result, and you can change the prompt a little and mess around. It's about reducing the cognitive load of trying to come up with something useful. Just to stay on this train of thought:

they could also add a quick chat or quiz that understands the person's use case and recommends examples based on it. If you have a library of examples, it's quite trivial to ask a few questions and then recommend: okay, these are the top three examples for you.

So it's not just: here are the examples, search them and go through them one by one. Instead it's three questions: what field are you in, what are you dealing with, what is your biggest hurdle regarding audio, and those kinds of things. Just ask a few questions, and these large language models are insanely good at picking the top three examples that are relevant for you. So that could be done as well.
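The quiz-then-recommend flow can be prototyped without an LLM at all. The sketch below swaps in simple keyword overlap between the user's answers and hand-written tags on each example; the example titles and tags are hypothetical, and an LLM-based ranker, as suggested in the conversation, would replace the scoring step.

```python
import re

# Hypothetical example library; titles and tags are illustrative only.
EXAMPLES = {
    "Sales call coaching": {"sales", "calls", "objections", "closing"},
    "Support call sentiment": {"support", "customer", "sentiment", "calls"},
    "Lecture study cards": {"lectures", "students", "exam", "study"},
    "Podcast show notes": {"podcast", "episode", "show", "notes"},
    "Meeting action items": {"meetings", "team", "action", "items"},
}


def recommend(answers, k=3):
    """Rank examples by how many words from the quiz answers match their tags."""
    words = set(re.findall(r"[a-z]+", " ".join(answers).lower()))
    # sorted() is stable, so ties keep the library's original order.
    scored = sorted(EXAMPLES.items(), key=lambda item: len(item[1] & words), reverse=True)
    # Keep at most k examples, and drop any that share no words with the answers.
    return [title for title, tags in scored[:k] if tags & words]
```

For example, answers like "I run sales" and "Our reps struggle with objections on calls" surface the sales call example first.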

Then there's Jasper AI, which has the best onboarding experience ever. If you are a marketer or an entrepreneur out there, just go to Jasper AI, check out how they onboard you, and pay close attention, because what they do is push you along an education journey. If you watch their educational videos, you get points, and you can use those points as credits.

So you can actually spend them on their service. Right. And it's ingenious, because they take you from not being really familiar with their offering to having watched all their videos and knowing exactly how it can be done, and you earn credits to spend in the meantime.

So they gamify the onboarding experience, right? Yes, sure, which is very rare. I work with a software-as-a-service company, and in onboarding it's very hard to find the right amount of pushing people, and in which direction.

Because you want to show just one function, and you want your users to experience its value and be happy. Then comes the next function and its value, and if you do too much, they burn out from cognitive overload and go away. But I don't see this gamification with a lot of companies, right? Yes.

It's not common. Doing it well is not common, and Jasper is world class at it. I honestly don't know anyone doing it better than them.

So absolutely, you can learn a lot from them. As you said, it has to be sequential so it doesn't overwhelm the user. You have to be quite mindful about what the next step is. That actually happens in physical education as well.

In sports and fitness, if you have a client who has never worked out before, you have to offer them something easy and at their level, right? And this progression of giving harder and harder exercises is the bread and butter of a good coach. They pinpoint exactly where you currently are in the big picture, and then they help you take the next step. That's not trivial, but you can get lots of inspiration from Jasper, I guess. Okay, one more thing.

When we started preparing for this episode, I asked you why they do audio only, because they have a longer context window than ChatGPT, which means you could feed in longer text, yet currently you can only feed audio to AssemblyAI. I listened to a few podcasts with the founder of AssemblyAI, and as I understand it, they do three things. First, they do the transcription part: you give it the audio and it generates text.

But this text is usually not very human-readable, so the next step is to make it more readable. What does that mean? Forming sentences, capitalizing them, and creating paragraphs and chapters.

So this is the second part. For those who are listening: back in the day, the first transcription services basically transcribed words. What you got back was a sea of words, without punctuation, without proper paragraphs or sentences, and so on.

Just a sea of words. What they do is transcribe, obviously, but then they edit the transcription to turn it into proper text. Yes, and the third part is the fine-tuning.

They can fine-tune it to make a summary, or for quality assurance of sales calls, or, in customer support, to create an SMS based on the support call and make it available in the CRM system. In that case the employee only has to click one button to send the user an SMS based on the content of the call. So there are three parts: transcribing, making the text more readable, and fine-tuning.
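To make the second of those three steps concrete: given word-level timestamps, even a crude rule such as "a long pause starts a new paragraph" turns a sea of words into something closer to readable text. This is a naive stand-in for illustration, not how AssemblyAI does it (their formatting is model-based), and the 1.5-second threshold is an arbitrary assumption.

```python
def words_to_paragraphs(words, pause_seconds=1.5):
    """Group timestamped words into paragraphs, starting a new one after a long pause.

    words: list of (text, start_seconds, end_seconds) tuples in spoken order.
    This is a naive heuristic; it does not restore punctuation inside a paragraph.
    """
    groups, current, last_end = [], [], 0.0
    for text, start, end in words:
        # A silence longer than pause_seconds is treated as a paragraph boundary.
        if current and start - last_end > pause_seconds:
            groups.append(current)
            current = []
        current.append(text)
        last_end = end
    if current:
        groups.append(current)

    paragraphs = []
    for group in groups:
        sentence = " ".join(group)
        # Capitalize the first letter and close the paragraph with a period.
        paragraphs.append(sentence[:1].upper() + sentence[1:] + ".")
    return paragraphs
```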

And AssemblyAI is doing this at scale: they try to fine-tune for a lot of use cases, so they are looking for many product-market fits, not just one. But they say the ultimate product-market fit is 100% accurate speech recognition, without errors.

It's very similar to autonomous driving. There is a use case for an autonomous car on a university campus, and there is a use case on the highway, where the car can drive the highway, but after you exit you have to steer manually. Yeah, but the ultimate use case for autonomous cars is handling every kind of environment: the city, heavy traffic, light traffic, the highway.

So this is what they are doing. They are focusing on the big use cases. Yeah, that makes a lot of sense.

And since they have their own model, it's like the engine: they built their own engine, and they trained it on 65 terabytes of data or something like that. Yeah. And it's very interesting: you mentioned in the previous episode that ChatGPT was trained on less than 1 terabyte of data, since it's text only.

But this is audio, so it's more than an order of magnitude bigger: 60 terabytes, about 650,000 hours of audio. And we can say that they are very good: the word error rate is lower than Whisper's, and Whisper is already mind-blowing.
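Taken together, the two figures quoted here imply an average storage rate per hour of audio. The sketch below does that arithmetic, assuming decimal terabytes and that both figures describe the same dataset; it says nothing about the actual file formats AssemblyAI stores.

```python
def implied_rate(total_bytes, hours):
    """Return (MB per hour, kbit/s) implied by a dataset size and its audio duration."""
    bytes_per_hour = total_bytes / hours
    return bytes_per_hour / 1e6, bytes_per_hour * 8 / 3600 / 1e3


mb_per_hour, kbit_per_s = implied_rate(60e12, 650_000)
print(round(mb_per_hour), round(kbit_per_s))  # 92 205
```

With 60 TB this gives about 92 MB per hour (roughly 205 kbit/s); with the 65 TB figure mentioned just before, it comes out at exactly 100 MB per hour.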

So since they're building everything on that engine, right now it is audio-only. But in the future it's going to make a lot of sense to focus on the problems they solve: their model, the audio engine, is one thing, but they could also open it up to other kinds of inputs. Let's discuss some possible business ideas.

Right? Great. Okay, first one, audiobook analysis. What do you think about this, Victor? Sell it to me.

So what's the problem here? I think it's marketing. Think about a writer who writes a book and has to generate a lot of social media content: social media posts, summaries, blog posts. I think AssemblyAI makes this very easy.

First, again, 95% of books are less than 10 hours long when read aloud, so they fit within AssemblyAI's current limits. It can chunk them up into different kinds of content, and a ten-hour book can give a writer enough content for about half a year of posting on every kind of social media channel. It's basically content repurposing and promotional material for writers. Yeah, you need a landing page, right? What should the landing page be, what should the heading and subheading be, what structure should it have? And also emails.

Right. An email sequence for when someone signs up: what value could that email sequence give? What could the Twitter posts be, what could the LinkedIn posts be, and those kinds of things.

So it's extracting and repurposing a book, and it makes a lot of sense. Currently, to use AssemblyAI, you would have to make audio from the book first, and then you can get all this information out of it.

But if they find product-market fit here, it would make a lot of sense to just upload the text of the book itself, which is already written, because it's easier to work from. I like this idea.

I'm curious, though, how it could be priced, since the median book writer doesn't earn a lot of money and doesn't have a huge budget. So it's a question how well this could be monetized, but it definitely saves a lot of time, and it doesn't have to be sold to writers directly. It can be sold to editors and publishing houses.

Yeah, basically to the aggregators. The next one is explainer video feedback. At one of our previous companies we created a two-minute explainer video, and it took at least a month, maybe six weeks.

The process was very slow, and that was just the creation. Now we can write a script with the help of ChatGPT, get feedback on it, and generate the audio for that script in our own voice, for example. So we have the audio of an explainer video, and now we can get feedback from AssemblyAI.

How can we make it more viral? How can we make it more engaging? Where should we put hooks in the video? We can ask these coaching questions and produce an explainer video in, I guess, two weeks, at much better quality. Another idea is integrating into other tools. You mentioned a lot of no-code and low-code tools that you've used in the past.

So I guess they could grow by integrating into tools like Bubble, Glide, and Adalo, chat interfaces like a ChatGPT plugin, or databases like Aid Base or Supabase. So aim at the developers, the low-code developers who use different tools to speed up their workflows. That would make it easier for developers to actually use and build on top of AssemblyAI.

The next one is for YouTube creators, and YouTube creators are a huge community, Victor. There are, I think, millions of YouTube creators. There are excellent tools like TubeBuddy for A/B testing the thumbnail images of YouTube videos, so there are a lot of companies focusing on these creators.

I think companies could move into this space with the help of AssemblyAI as well. For example, instantly analyzing a video and asking how to be like Mr. Beast.

For example, having a hook at the beginning, and how to keep attention at a world-class level like Mr. Beast. These are the questions you can actually ask AssemblyAI.

Yeah, that's quite neat. I guess Mr. Beast is so inspirational for most creators, and it helps that Mr. Beast is quite generous about sharing all his secrets.

He is willing to share what makes a good video. But it's not practical, in the sense that he shares things like: the first 30 seconds, I don't know, should have a lot of cuts.

So there should be at least 15 cuts in the first 30 seconds to engage and pull in the viewer. But that's not necessarily relevant or instantly applicable for you.

But with a large language model, with an AssemblyAI model, these kinds of things can be tailor-made for your own content. Whether you have a cooking channel, a coaching channel, or a gadget-testing channel, no matter what you have, it can be tailor-made for you.

So a tool like that could be extremely useful, I guess: okay, here are my videos.

How do I make them better? Just coach me to be more like Mr. Beast. But it can be other creators as well, so it can be aspirational toward other kinds of creators too.

Okay. It can also help generate short clips. This is one of our challenges as well.

We want to cut this long podcast into very short TikTok-style videos and also longer YouTube clips of about five minutes. And it's very hard to find the interesting part: which part would go viral? That, again, is a question you can ask AssemblyAI. Yeah, and creating show notes is quite trivial as well; we already use it for that, so that's quite neat.

What other use cases are there? With AssemblyAI, you can feed all the lectures from a semester into their interface and ask questions about all of them, and, think about it, it can be a great cheating tool as well. During the exam, you can ask AssemblyAI the actual exam questions about the lecture content, and it will just give you the answer. Yeah, it's the holy grail of cheating at university.

But if we want to frame it positively and ethically, then it's a good coaching tool for sure. If you want to get ready for an exam, you feed in all the lectures and ask all your questions. And most of the time there are example exams.

Right? From the questions in these mock exams, you can generate similar questions with ChatGPT or even AssemblyAI. Then you can try to answer them and compare your answers with what AssemblyAI tells you. And if you get stuck, you just type in follow-up questions. And it's insane, because it's timely and well tailored, fine-tuned to your specific need.

And it can already be done. It's all there; nothing else is needed. Okay.

And the last business idea I would like to mention is about Spotify. In my Spotify "Your Episodes" list, I have about 300 episodes. At least 60% of them are boring and I don't want to hear them.

But I usually listen to podcasts when I'm training, running or working out, and it's very hard to change tracks, and it's like, oh no, a bad episode again. But imagine AssemblyAI could listen to all these 300 episodes. It could give me summaries of them, or just rate them based on my preferences and give me a list by subject or topic.

Which podcast is the best on your episode list? Yes, it's kind of like a personalized recommendation tool within Spotify. It's like having a personal assistant: you give them a list of 300 podcasts, they listen to them and give you the five you would enjoy the most. And it can be done now.
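Once each episode has been summarized, the "give me the five I would enjoy most" assistant reduces to a ranking problem. Here is a minimal sketch, assuming each episode has already been reduced to a set of topic tags (for example, extracted from an AI-generated summary) and the listener's tastes are expressed as hypothetical per-topic weights.

```python
def rank_episodes(episodes, preferences, top_n=5):
    """Rank episodes by the summed weight of the topics tagged in each one.

    episodes maps an episode title to a set of topic tags; preferences maps a
    topic to how much the listener likes it (negative means "skip this").
    Topics without a weight count as 0.
    """
    def score(title):
        return sum(preferences.get(topic, 0) for topic in episodes[title])

    return sorted(episodes, key=score, reverse=True)[:top_n]
```

Negative weights let the listener push disliked topics to the bottom of the list instead of merely ignoring them.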

And one last idea is healthcare. When I was studying on-site at Stanford in the Innovative Healthcare Leader program (I'm an engineer, so I don't normally have a healthcare background), what stunned me is that for doctors, it's a lot of pain to sit down and type up whatever they discussed with the patient. What they use is assistants, right? But that's not always scalable, and not everyone has assistants. But now it's possible.

Now the doctor just speaks with the patient, everything is transcribed, and they can even give it spoken instructions. It can be treated like a digital assistant: okay, make sure you make a note about this, or put it in the important section. And this can be done now as well.

So once again, nothing has to be invented for this to work: every building block is already there, and I think it's insanely valuable. What they do now, or at least did back when I was at Stanford, is outsource it, to India, for example.

On the other side sat someone who was actually a medical doctor, working as a virtual assistant for the doctor. But that's not scalable either. So I think this can unlock huge value and also increase empathy toward the patient, because the doctor can be 100% present, look you in the eye, talk to you, and just say: okay, this is important, let's make a note.

And just like that, in 2 or 5 seconds, it's done, and you can keep on speaking. It has some implications for privacy and those kinds of things.

Of course. But besides that, technically this is already possible. Okay, can we touch on our last part? Recruitment. This is the part of these strategic deep dives where we look at a company's recruitment page and try to understand what they are aiming for now and what direction they're heading. It's also for the listeners.

If you want to work for well-funded companies like this, and their mission resonates with you, maybe we can give you an idea of what kinds of open positions they have so you can jump on board. And obviously, since this whole field is booming, the stock option pool can be quite valuable. So over a time frame of a few years, working for a company like this makes a lot of sense: it's extremely low risk as an employee, and the reward can be outsized.

So you don't have to be an entrepreneur to land a huge windfall with low risk. Okay, so we checked their positions, and they describe their engineering team as small subteams, with two research engineers for every researcher. In these small teams, they look for possible solutions and fine-tune their model for them.

So there are a few open positions for engineers. In marketing, most of the open positions are in developer education.

We already touched on this topic: they have a lot of content on YouTube and also push it to LinkedIn. And the third area is product development.

For example, one of their open positions in product development is Growth Product Manager, focused on the API onboarding experience. This is the same thing software-as-a-service companies do, like Jasper AI that you mentioned: they focus on the onboarding process because it improves retention and decreases churn in the long term.

So onboarding is very important, but they are doing it for the API, the API experience for developers. This is a good opening for those who like to develop but don't like to code that much.

So even just educating developers and working on these kinds of things makes a lot of sense. You don't have to be a hardcore machine learning engineer to find a position here, because I have a hunch from the open positions that they are already well covered on the deep technical side. What they need help with is scaling: finding ways to teach it, finding product-market fits, and helping others use their tools.

So it's not that much on the deep technical side. But I guess if you are deep into a technology like audio-to-text, you could find a position for sure. Another point: if I had to choose one of the open positions that I personally like, it's the CS Engineer.

Because if you love solving problems, you get an inside look at how to solve real problems for lots of clients. It's kind of a cheat sheet, in the sense that you go there, understand their technology and how to use it, and apply it to lots of clients and use cases, and you get to see how people build and what they build.

And if you spend a few years there, you can get a good understanding, and maybe you can do your own thing based on where you see the needs are. So I think that position is quite sexy for me as an engineer. But I'm curious, what jumps out for you? The growth product manager.

The one I mentioned, for the API onboarding experience. Yeah.

Okay. We had this segment because it's low risk, high return, big funding, good product. If you listen to My First Million, it's kind of like Sarah's list.

So that's kind of a shout-out: it's basically about working for startups with good funding, where the stock option pool can be quite valuable even over a few years. So, just to wrap this up, one final question. We talked about lots of ideas: how they can improve, how you can use them if you're an entrepreneur, and how you can join them if you want to be an employee. Those topics were covered well, but what kinds of companies are already using them? That part is missing.

So let's wrap up with this segment. Okay, I would like to quickly mention a few companies. I think the most accessible one is called Grain, and they make AI-powered meeting tools.

They produce the transcription, highlights, and key takeaways from your meetings, for remote teams, customer success, or high-end recruiting. A very similar one, which we already mentioned, is Fireflies.ai.

It also does meeting notes, for bigger companies. I think Aloware.com is interesting because they manage the whole phone system and all the phone processes, and they transcribe your calls. You make the calls through their system, and they put the transcriptions of the phone calls directly into your CRM, like HubSpot, Salesforce, and all the big CRM systems.

And for us, I just started experimenting with one called Vidyo.ai. This is a service that makes short, TikTok-style clips from long videos.

And I think it's very useful for everybody who makes long-form content. The last one is Runwayml.com, which I think will be a huge company in a few months or a few years because they are doing text-to-video generation.

And I think this is the next big milestone in generative AI. It's their so-called Gen-2 model, which is available to a few people but not yet to the general public, though they say it will be available soon.

So you write a prompt, like you would for ChatGPT or Midjourney, but you get a whole video. And think about it, Victor, what new dimensions it will open up. Last time we talked about how even Midjourney 5x'd the click-through rate on your Facebook ads, and that was with static images, right? So imagine if you could quickly explore different kinds of video formats for advertisements.

There's a reason why Facebook has already integrated text-to-image generation into its ad interface. And Runway ML was actually behind the Stable Diffusion models as well, so they are deeply integrated into this ecosystem. They are essentially creating a web-based video editor enhanced by AI. That's their shtick, and what they do is mind-blowing, because you can just remove people, remove objects, and do basically magical things. We can cover them in the future as well.

And we are using a lot of AI technology for this podcast as well. So if you are listening to this episode and you are still with us at the end, I'm really grateful for that. Please give us feedback. We would like to know whether you are interested in further episodes where we dig deep into the strategy of these companies, analyzing how they do what they do, what their strengths and weaknesses are, and how you can apply that as an entrepreneur, as a developer, or as an employee if you're interested in working for these companies.

If this is something valuable, please give us feedback, and we'll make more episodes like this because, as I said, we are using a lot of AI tools, and they are amazing.

Thank you for your attention and thank you, Victor. Yeah, that's a wrap. Bye.
