Google’s Audio Overview can turn those boring documents into engaging podcasts

As long as your content is in a Word file, plain text, a PDF, or a Google Doc, you can feed it to Gemini and turn it into an Audio Overview.
Summary

This magical new feature uses Google's Gemini AI models to generate short, conversational audio summaries of uploaded documents or search queries. Try it the next time you have to plough through an 83-page PDF.

The New Normal: The world is at an inflexion point. Artificial intelligence is set to be as massive a revolution as the Internet has been. Staying away from AI will not be an option for most people, as all the tech we use is taking the AI route. This column series introduces AI to the non-techie in an easy and relatable way, aiming to demystify it and help readers actually put the technology to good use in everyday life.

The first time I heard an article I had written being discussed, I sat up and listened in utter surprise. Two people I had never come across before were deep in conversation about what I'd written. This man-and-woman team went through everything, turning it into a slick podcast. They were AI voices, but they sounded totally natural and pleasant.

This kind of conversation is generated by a feature called Audio Overview. To experience it right away, download the Gemini app on your phone. Tap the plus sign at the bottom and navigate to one of your documents. Once it's uploaded, tap the Audio Overview tab that appears above it, then go make yourself a cup of coffee.

By the time you get back with your steaming cup, the Audio Overview should be ready. Tap it, as indicated, and sit back to listen. The two AI hosts will now talk about your content.

And they do so with impressive clarity and skill. It’s no gimmick or party trick. 

Listening to content can be a great way of absorbing it. Anyone can get tired of reading, since we have to do so much of it each day. As long as your content is in a Word file, plain text, a PDF, or a Google Doc, you can feed it to Gemini and turn it into an Audio Overview. I had been putting off going through an 83-page document when I realised I could quickly get the general gist of it with an Audio Overview. At work, this can really help productivity. It’s also great for simply giving your eyes a rest. If you have a visual impairment, the feature is a relief, as you can get so much more done.

NotebookLM: podcasts from anything

Audio Overview can be even more magical in its original home, Google's NotebookLM. To find it, open the browser on any device and type NotebookLM in the search bar. Sign in with your Google account and you're in. Add up to 50 items of content, including articles, notes, YouTube videos, presentations and more, to make up a notebook. All of these can be combined into an Audio Overview, or a more full-fledged Deep Dive conversation, through the Chat and Studio tabs. This does take a few minutes, so find something else to do for a bit. Once the conversation is ready, you can listen in the browser, download it for later, or even share it.

This amazing audio feature gives you more control in NotebookLM than it does in Gemini, at least in the browser; NotebookLM does have an app, but it doesn’t seem to have all the features. You can select the playback speed, the length of the conversation and, incredibly, even the language the AI hosts should speak. And yes, Hindi is on the list, making it possible to reach a wider audience with the same content. It’s easy to imagine the feature being used for training and education, which makes it all the more useful.

As if all this weren't impressive enough already, here's another way you can control the conversation. In NotebookLM you'll also find a Customise tab for the Deep Dive audio. Here, you can actually describe what you want the hosts to focus on. Ask them to concentrate on a particular aspect of the content, or to keep the language simple or technical.

You have the option of deleting the conversation and regenerating it with fresh instructions. That makes it easy to create conversations in multiple languages for different audiences, or to change the difficulty level.

If you visit Google AI Studio (aistudio.google.com) in your browser, you'll see that Google is experimenting with letting users change the accent or speaking style in a feature called Native Speech Generation. Google hasn't announced anything to that effect, but it's easy to see how this could be added to Audio Overview at some point. It works very well and is fascinating to try out.

Join the conversation

Another impressive but experimental feature lets you actually 'join' the podcast by tapping a button. You can interrupt the hosts to ask a question, steer them towards a different focus, or ask them to comment on your own opinion on the subject. This is a little slow, and you may be left wondering whether the hosts heard you at all, but I fully expect it to become more fluid, as Google adds new features quite frequently.

Audio Overview isn’t flawless, but the chances of it getting things wrong are minimised because it works from content you provide. The feature has worked well enough for Google to bring it to Search, where it will give you an AI Overview in audio form. It is being tried out in the US first.

Mala Bhargava is most often described as a ‘veteran’ writer who has contributed to several publications in India since 1995. Her domain is personal tech and she writes to simplify and demystify technology for a non-techie audience. 
