In AI Startup of the Week, the editorial staff of ai.nl features promising AI startups, their innovations, solutions and challenges. In this third episode, we take a look at Amsterdam-based SeMI Technologies, the startup that maintains and commercialises the vector search engine Weaviate.
Data is central to the success of any business or AI application. To store and make sense of that data, companies tend to rely on databases. A database can serve any number of purposes: there can be one for tracking vendors, one for suppliers and even one for recurring payments.
The databases maintained by businesses are among their most effective data touchpoints, and AI is here to make those databases better. SeMI Technologies, the developer of the open-source Weaviate vector-search database, is leading the way with its approach to building an AI-first database solution that is also a search engine.
Weaviate combines objects and vectors
Weaviate is an open-source vector search engine, built from scratch in Go, that stores both objects and vectors. This contrasts with a traditional database, which would store only objects. The approach makes Weaviate both a search engine and a database solution, allowing it to combine vector search and structured filtering with the fault tolerance and scalability of a cloud-native database.
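To make the idea of storing objects and vectors side by side more concrete, here is a minimal sketch in plain Python. It is not Weaviate's implementation or API: the `StoredObject` class, the `filtered_vector_search` helper, the sample data and the hand-made three-dimensional vectors are all assumptions made for illustration, and the search is a brute-force scan rather than a real vector index.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredObject:
    properties: dict  # the structured data object
    vector: list      # its vector representation (hand-made here, not model output)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_vector_search(store, query_vector, where, limit=2):
    """Apply a structured filter first, then rank the survivors by similarity."""
    candidates = [
        obj for obj in store
        if all(obj.properties.get(key) == value for key, value in where.items())
    ]
    candidates.sort(
        key=lambda obj: cosine_similarity(obj.vector, query_vector),
        reverse=True,
    )
    return candidates[:limit]


# Hypothetical sample data: three objects, each paired with a vector.
store = [
    StoredObject({"title": "Invoice from Acme", "category": "vendor"}, [0.9, 0.1, 0.0]),
    StoredObject({"title": "Monthly subscription", "category": "payment"}, [0.1, 0.9, 0.1]),
    StoredObject({"title": "Acme delivery note", "category": "vendor"}, [0.7, 0.2, 0.3]),
]

results = filtered_vector_search(store, [1.0, 0.0, 0.0], {"category": "vendor"})
titles = [obj.properties["title"] for obj in results]
print(titles)
```

The filter on `category` plays the role of the structured query, while the similarity ranking plays the role of the vector search; keeping both in one store is what lets a single request combine them.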
It is not just how Weaviate works that makes it interesting for the broader tech community, but also the fact that it is accessible through GraphQL, REST and various language clients. For decades, businesses have found themselves locked in to a single database vendor by a lack of data portability, and Weaviate addresses that with its support for different language clients.
Within Weaviate, SeMI Technologies has designed all individual data objects around a class-property structure in which a vector represents each data object. This makes it easier for the user to connect data objects (as in a traditional graph) and to search for data objects in the vector space. This unconventional idea has led to a $16M Series A round co-led by New Enterprise Associates (NEA) and Cortical Ventures.
The user can add data to Weaviate through the RESTful API endpoints and retrieve data through the GraphQL interface. The vector indexing mechanism is also modular, and Weaviate says that the “available plugin is the Hierarchical Navigable Small World (HNSW) multilayered graph.”
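As a rough illustration of that write-over-REST, read-over-GraphQL split, the sketch below only builds the two request payloads and sends nothing over the network. The `/v1/objects` and `/v1/graphql` paths and the `Get` query shape follow Weaviate's public documentation as best we recall it, so check them against the version you run. The `BASE_URL`, the `Vendor` class, its properties and both helper functions are hypothetical.

```python
import json

# Assumed address of a local instance; adjust for a real deployment.
BASE_URL = "http://localhost:8080"


def build_create_object_request(class_name, properties):
    """Return (url, JSON body) for adding one data object over REST."""
    url = f"{BASE_URL}/v1/objects"
    body = json.dumps({"class": class_name, "properties": properties})
    return url, body


def build_graphql_get_request(class_name, fields, limit):
    """Return (url, JSON body) for reading objects back through GraphQL."""
    query = "{ Get { %s(limit: %d) { %s } } }" % (class_name, limit, " ".join(fields))
    return f"{BASE_URL}/v1/graphql", json.dumps({"query": query})


# Hypothetical "Vendor" class with two illustrative properties.
post_url, post_body = build_create_object_request(
    "Vendor", {"name": "Acme", "city": "Amsterdam"}
)
gql_url, gql_body = build_graphql_get_request("Vendor", ["name", "city"], 5)

print(post_url, post_body)
print(gql_url, gql_body)
```

Both payloads would be sent as HTTP POST requests with a JSON content type; in practice one of the language clients mentioned above would normally take care of this.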
Why do you need a vector search?
The ingenuity of Weaviate lies in how it approaches search across vast amounts of data. With a traditional search engine, a user generally has to be specific when looking for any kind of information. Because of its core architecture, a traditional search engine comes with “limitations when it comes to finding the data you are looking for.”
Weaviate uses a vector indexing mechanism at its core to represent the data. The vectorisation modules (such as the NLP module) place each data object in a vector space, where it sits near related text. As a result, Weaviate does not depend on a 100 per cent match; it can return the match that lies closest to what the user is looking for.
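The difference between an exact match and a closest match can be shown with a toy example. In this sketch the document titles, the three-dimensional vectors and the query vector are all invented by hand. A real setup would get its vectors from a vectorisation module and would use an index such as HNSW instead of comparing against every document.

```python
import math

# Hand-made vectors standing in for a vectoriser's output (assumed values).
documents = {
    "Settling a supplier bill": [0.9, 0.4, 0.0],
    "Office party photo album": [0.0, 0.2, 0.9],
    "Quarterly hiring plan": [0.1, 0.9, 0.2],
}


def keyword_search(query, docs):
    """Exact-term lookup: keep only titles that contain every query word."""
    words = query.lower().split()
    return [
        title for title in docs
        if all(word in title.lower().split() for word in words)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vector, docs):
    """Return the title whose vector is most similar to the query vector."""
    return max(docs, key=lambda title: cosine_similarity(docs[title], query_vector))


# The words "vendor" and "payment" appear in none of the titles.
keyword_hits = keyword_search("vendor payment", documents)

# Assumed vector for the query "vendor payment", chosen by hand.
best = nearest([1.0, 0.3, 0.0], documents)

print(keyword_hits)
print(best)
```

The keyword lookup comes back empty because no title shares the query's words, while the vector comparison still surfaces the supplier bill as the nearest neighbour.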
While it can rely on NLP for text search, Weaviate can be used with any machine learning model that vectorises data, whether images, audio, video, genes or something else. Using a vector search engine with your machine learning models is no different from using a traditional search engine, except that it makes it easier to scale quickly and to search and classify in real time.
Weaviate also offers features such as semantic search, question-answer extraction, classification and customisable models (PyTorch/TensorFlow/Keras), to name a few.
Who can use Weaviate?
Weaviate is an ML-first database for applications and can be used by software engineers as an out-of-the-box module for NLP/semantic search, automatic classification and image similarity search. It is also cloud-native, distributed, and runs well on Kubernetes, which should be music to the ears of software engineers.
Since we are talking about a vector database that is built from the ground up with approximate nearest neighbour (ANN) search at its core, Weaviate is ideal for data engineers. Data scientists can also use Weaviate for a seamless handover of their ML models to MLOps. The platform makes it easy to deploy and maintain ML models in production reliably and efficiently. The modular design allows data scientists to easily package any custom-trained model.