In a groundbreaking collaboration, Wikimedia Deutschland, the organization behind Wikidata, has partnered with DataStax to transform how developers access and use the world’s largest open-source knowledge graph. Using DataStax’s AI platform, equipped with NVIDIA AI tools, Wikimedia Deutschland ingested and vectorized over 10 million Wikidata entries in under three days.
Wikidata, a multilingual, linked open data platform serving over 300 languages, is crucial for AI/ML developers and open-source contributors worldwide. This initiative aims to make the knowledge graph’s data more accessible, fostering innovation within the open-source AI/ML community.
Vectorizing Knowledge at an Unprecedented Scale
One of the significant hurdles was embedding a dynamic dataset of this magnitude while keeping it accessible. Traditional systems could not meet the demand for real-time updates and scalability. With DataStax’s serverless Astra DB platform, hosted on AWS, and NVIDIA NeMo Retriever, Wikimedia Deutschland achieved near-zero latency, enabling instant updates and real-time access to vectorized data.
“This near-real-time speed will permit us to experiment at scale and speed by testing the integration of large subsets in a vector database aligned with the frequent updates of Wikidata,” said Dr. Jonathan Fraine, CTO, Wikimedia Deutschland.
A Visionary Collaboration
The partnership has unlocked new possibilities for open-source developers. Lydia Pintscher, Portfolio Lead for Wikidata, stated:
“Our cooperation with DataStax and their approach has unlocked new capabilities and streamlined our processes, which will allow us to deliver faster and more accurate insights to our community.”
DataStax Chief Product Officer Ed Anuff added: “We’re thrilled to see Wikimedia Deutschland improving accessibility to the world’s largest knowledge graph with our AI platform.”
What Lies Ahead
Both organizations are committed to expanding their efforts. Wikimedia Deutschland aims to enhance its multilingual capabilities, providing access in hundreds of languages and improving search reliability through approaches such as GraphRAG. DataStax’s Astra Vectorize and Langflow tools, powered by AWS Graviton processors, will continue to simplify the development of scalable and cost-efficient AI applications.
As the collaboration grows, this partnership sets a new standard for how open-source platforms can leverage AI to democratize access to knowledge and data.
About DataStax
DataStax empowers developers and companies to build innovative GenAI applications quickly and efficiently. Offering an integrated generative AI stack, DataStax partners with leading AI ecosystem players to deliver scalable solutions on any cloud. Global enterprises, including Audi and Capital One, rely on DataStax for groundbreaking AI-driven applications.