this post was submitted on 16 Dec 2024
126 points (96.3% liked)
Quite curious... does it actually do that, and if so, how? STT that produces a plaintext file or subtitles (so with timing) has been available quite efficiently for a while now, e.g. via Whisper. If this does more, though, e.g. recovers structure (differentiating a title, a list, etc.), I'd like to learn how.
There is nothing special going on. This whole project is just a bunch of Python libraries glued together into a CLI tool. It uses the SpeechRecognition package to connect to the Google Speech Recognition API: https://github.com/microsoft/markitdown/blob/main/src/markitdown/_markitdown.py#L691
Pretty uninteresting and a bit disappointing. Pandoc is a lot more interesting.
Thanks for the clarification. I checked the code you linked and noticed `recognize_google`. It seems to rely on https://github.com/Uberi/speech_recognition, which in turn seems to rely on https://github.com/Uberi/speech_recognition/blob/master/speech_recognition/recognizers/google.py. So basically, are they using an API and sending all the audio data to Google servers?

Yes, this is how I read it as well. The library supports using a local model, but they decided to just send the audio data to Google.
That might raise a GDPR-related issue. I don't think people using such a library assume they need connectivity, or that their data would be sent to a third party.
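For anyone wanting to see the difference in practice, here is a minimal sketch of how the SpeechRecognition package can be pointed at either the remote Google recognizer or a local one. This is not markitdown's code: `choose_backend`, `transcribe`, and the `allow_network` flag are hypothetical names made up for illustration. The local recognizers (`recognize_whisper`, `recognize_sphinx`) need extra packages installed, and their availability may depend on the SpeechRecognition version.

```python
# Sketch only: helper names and the allow_network flag are illustrative assumptions,
# not part of markitdown or SpeechRecognition.

# Backends that upload audio to a third-party server vs. ones that run locally.
REMOTE_BACKENDS = {"google"}
LOCAL_BACKENDS = {"whisper", "sphinx"}


def choose_backend(requested: str, allow_network: bool) -> str:
    """Return the requested backend, refusing remote ones unless explicitly allowed."""
    if requested not in REMOTE_BACKENDS | LOCAL_BACKENDS:
        raise ValueError(f"unknown backend: {requested!r}")
    if requested in REMOTE_BACKENDS and not allow_network:
        raise PermissionError(
            f"backend {requested!r} sends audio to a remote server; "
            "pass allow_network=True to opt in"
        )
    return requested


def transcribe(path: str, backend: str = "whisper", allow_network: bool = False) -> str:
    """Transcribe an audio file (WAV/AIFF/FLAC) with the chosen backend."""
    # Imported lazily so the policy check above works without the package installed.
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    backend = choose_backend(backend, allow_network)

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    if backend == "google":
        # Uploads the audio to Google's web speech API.
        return recognizer.recognize_google(audio)
    if backend == "whisper":
        # Runs locally; requires the openai-whisper package.
        return recognizer.recognize_whisper(audio)
    # Runs locally; requires the pocketsphinx package.
    return recognizer.recognize_sphinx(audio)
```

With a default like this, calling `transcribe("talk.wav")` stays on the local machine, and audio only leaves it when the caller writes `transcribe("talk.wav", backend="google", allow_network=True)`. Making the network path opt-in is the design choice that would address the connectivity and third-party concerns raised above.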