Microsoft open-sourced a Python tool for converting files and office documents to Markdown (github.com)

submitted 1 week ago by yogthos@lemmy.ml to c/opensource@lemmy.ml

[-] utopiah@lemmy.ml 49 points 1 week ago* (last edited 1 week ago)

FWIW, if you are interested in such tooling, also consider soffice and pandoc, which have (as far as I can tell) similar features but have existed for years and are not related to Microsoft.

Edit: not related to Microsoft OR Google. It seems the transcription aspect (which IMHO is still weird in that context, but OK) is done via Google servers, cf. https://lemmy.ml/post/23629310/15586865
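
For anyone who wants to try the pandoc/soffice route, here is a minimal sketch of how the two could be driven from Python. The helper names (`pandoc_command`, `soffice_command`, `convert_to_markdown`) and the file names are made up for illustration, and the sketch assumes both tools are installed and on the PATH. It only uses options I'm confident exist: pandoc's `-t gfm` and `-o`, and soffice's `--headless`, `--convert-to` and `--outdir`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def pandoc_command(src: str, dest: Optional[str] = None) -> List[str]:
    """Build a pandoc command that writes src as GitHub-flavored Markdown."""
    # Default output name: same path as the input, with a .md suffix.
    out = dest or str(Path(src).with_suffix(".md"))
    return ["pandoc", src, "-t", "gfm", "-o", out]


def soffice_command(src: str, outdir: str = ".") -> List[str]:
    """Build a LibreOffice command that converts src to .docx without a GUI.

    Meant for formats pandoc does not read directly (e.g. legacy .doc),
    so that pandoc can pick up the resulting .docx afterwards.
    """
    return ["soffice", "--headless", "--convert-to", "docx",
            "--outdir", outdir, src]


def convert_to_markdown(src: str) -> None:
    """Run pandoc on src, failing early if pandoc is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(pandoc_command(src), check=True)
```

Calling `convert_to_markdown("report.docx")` should leave a `report.md` next to the input, provided pandoc can read the input format. No transcription or remote servers are involved in either tool.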

[-] haverholm@kbin.earth 7 points 1 week ago

The single exception to this (which is actually buried fairly deep in the feature list) is the audio transcription tool. I didn't take a closer look at what is used to perform this, but at least it's not "just" document conversion like pandoc.

[-] utopiah@lemmy.ml 5 points 1 week ago

> audio transcription tool

Thanks for the clarification, but I'm a bit confused here: audio transcription, as in STT, done by e.g. Whisper? If so, what's the use case? When I think of Office documents, audio transcription is not something I have in mind.

[-] utopiah@lemmy.ml 3 points 1 week ago

PS: related, I asked on GitHub too: https://github.com/microsoft/markitdown/issues/20#issuecomment-2544630753

[-] JackbyDev@programming.dev 1 points 1 week ago

You should open a fresh issue for questions like that instead of asking on an unrelated one.

[-] haverholm@kbin.earth 2 points 1 week ago

I'm not completely clear either on how Microsoft have implemented this previously. As I said, I didn't look very deep into the repository.

If these are indeed other Python projects they piled together, as others suggest, I'd be happy to hear what speech recognition library this might've been built on.

this post was submitted on 16 Dec 2024

126 points (96.3% liked)