Cloud-Native Neural Search Framework for Any Kind of Data
Jina allows you to build deep learning-powered search-as-a-service in just minutes.
All data types - Large-scale indexing and querying of any kind of unstructured data: video, image, long/short text, music, source code, PDF, etc.
Fast & cloud-native - Distributed architecture from day one, scalable and cloud-native by design, with containerization, streaming, parallelization, sharding, asynchronous scheduling, and HTTP/gRPC/WebSocket protocols.
Save time - A single design pattern for neural search systems takes you from zero to a production-ready system in minutes.
Own your stack - Keep end-to-end ownership of your solution's stack and avoid the integration pitfalls of fragmented, multi-vendor, generic legacy tools.
Run Quick Demo
- Fashion image search: `pip install --pre jina && jina hello fashion`
- QA chatbot: `pip install --pre "jina[chatbot]" && jina hello chatbot`
- Multimodal search: `pip install --pre "jina[multimodal]" && jina hello multimodal`
- Fork the source of a demo to your folder: `jina hello fork fashion ../my-proj/`
Install
Jina 2.0 is in pre-release; add --pre to install it.
$ pip install --pre jina
$ jina -v
2.0.0rcN
via Docker
$ docker run jinaai/jina:master -v
2.0.0rcN
More installation options

| x86/64, arm64, v6, v7, Apple M1 | On Linux/macOS & Python 3.7/3.8/3.9 | Docker Users |
| --- | --- | --- |
| Standard | `pip install --pre jina` | `docker run jinaai/jina:master` |
| Daemon | `pip install --pre "jina[daemon]"` | `docker run --network=host jinaai/jina:master-daemon` |
| With Extras | `pip install --pre "jina[devel]"` | `docker run jinaai/jina:master-devel` |

Version identifiers [are explained here](https://github.com/jina-ai/jina/blob/master/RELEASE.md). Jina can run on [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10). We welcome the community to help us with [native Windows support](https://github.com/jina-ai/jina/issues/1252).
Get Started
Document, Executor, and Flow are the three fundamental concepts in Jina.
- Document is the basic data type in Jina;
- Executor is how Jina processes Documents;
- Flow is how Jina streamlines and distributes Executors.
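As a rough mental model of how the three concepts fit together, here is a plain-Python sketch. It does not use Jina and does not reflect Jina's internals: `ToyDocument`, `ToyExecutor`, `ToyFlow`, and the `"*"` catch-all key are hypothetical names invented for illustration. The only assumption is that an Executor binds handler functions to endpoints and a Flow passes the same Documents through each Executor in order.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a Document: raw data plus derived fields."""
    text: str
    tags: Dict[str, Any] = field(default_factory=dict)


Handler = Callable[[List[ToyDocument]], None]


class ToyExecutor:
    """Hypothetical stand-in for an Executor: binds handlers to endpoints."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, endpoint: str, handler: Handler) -> "ToyExecutor":
        self._handlers[endpoint] = handler
        return self

    def handle(self, endpoint: str, docs: List[ToyDocument]) -> None:
        # Use the endpoint-specific handler, else the "*" catch-all, else skip.
        handler = self._handlers.get(endpoint, self._handlers.get("*"))
        if handler is not None:
            handler(docs)


class ToyFlow:
    """Hypothetical stand-in for a Flow: runs Executors one after another."""

    def __init__(self) -> None:
        self._executors: List[ToyExecutor] = []

    def add(self, executor: ToyExecutor) -> "ToyFlow":
        self._executors.append(executor)
        return self

    def post(self, endpoint: str, docs: List[ToyDocument]) -> List[ToyDocument]:
        for executor in self._executors:
            executor.handle(endpoint, docs)
        return docs


def count_chars(docs: List[ToyDocument]) -> None:
    for d in docs:
        d.tags["n_chars"] = len(d.text)


store: List[ToyDocument] = []  # in-memory "index"

flow = (
    ToyFlow()
    .add(ToyExecutor().on("*", count_chars))        # runs for every endpoint
    .add(ToyExecutor().on("/index", store.extend))  # runs only for /index
)

flow.post("/index", [ToyDocument("hello"), ToyDocument("jina")])
result = flow.post("/search", [ToyDocument("query")])
print(len(store), result[0].tags)  # prints: 2 {'n_chars': 5}
```

This single-process sketch leaves out the part a real Flow is responsible for, namely distributing Executors; it only shows the order in which Documents pass through them.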
Copy-paste the minimum example below and run it:
Preliminaries: character embedding, pooling, Euclidean distance
import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests


class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # ASCII code of the space character, the first printable char
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling


class Indexer(Executor):
    _docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        q = np.stack(docs.get_attributes('embedding'))  # get all embeddings from query docs
        d = np.stack(self._docs.get_attributes('embedding'))  # get all embeddings from stored docs
        euclidean_dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # pairwise euclidean distance
        for dist, query in zip(euclidean_dist, docs):  # add & sort match
            query.matches = [Document(self._docs[int(idx)], copy=True, scores={'euclid': d}) for idx, d in enumerate(dist)]
            query.matches.sort(key=lambda m: m.scores['euclid'].value)  # sort matches by their values


f = Flow(port_expose=12345).add(uses=CharEmbed, parallel=2).add(uses=Indexer)  # build a Flow with 2 parallel CharEmbed, though that is unnecessary here
with f:
    f.post('/index', (Document(text=t.strip()) for t in open(__file__) if t.strip()))  # index all lines of this file
    f.block()  # block for listening request
Keep the above running and start a simple client:
from jina import Client, Document
from jina.types.request import Response


def print_matches(resp: Response):  # the callback function invoked when task is done
    for idx, d in enumerate(resp.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.scores["euclid"].value:2f}: "{d.text}"')


c = Client(host='localhost', port_expose=12345)  # connect to localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)
It finds the lines most similar to "request(on=something)" from the server code snippet and prints the following:
Client@1608[S]:connected to the gateway at localhost:12345!
[0]0.168526: "@requests(on='/index')"
[1]0.181676: "@requests(on='/search')"
[2]0.192049: "query.matches = [Document(self._docs[int(idx)], copy=True, score=d) for idx, d in enumerate(dist)]"
Doesn't work? Our bad! Please report it on the Jina GitHub issue tracker.
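To see where the scores in the output come from, the embedding and ranking steps can be reproduced without Jina or NumPy. The following is a minimal sketch in plain Python, assuming the same constants as `CharEmbed` (offset 32, 96 dimensions); the helper names `char_embed`, `euclidean`, and `rank` are made up for this illustration. Averaging one-hot rows is the same as computing relative character frequencies, so two lines are close when they use similar characters in similar proportions.

```python
import math
from typing import List, Tuple

OFFSET = 32             # ord(' '), the first printable ASCII character
DIM = 127 - OFFSET + 1  # 96 slots; the last one doubles as the "unknown" bucket


def char_embed(text: str) -> List[float]:
    """Mean-pooled one-hot character embedding, i.e. relative char frequencies."""
    vec = [0.0] * DIM
    for c in text:
        idx = ord(c) - OFFSET if OFFSET <= ord(c) <= 127 else DIM - 1
        vec[idx] += 1.0
    n = max(len(text), 1)  # assumption: empty text maps to the zero vector
    return [v / n for v in vec]


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean (L2) distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(query: str, corpus: List[str]) -> List[Tuple[float, str]]:
    """Return (distance, line) pairs, closest first."""
    q = char_embed(query)
    return sorted((euclidean(q, char_embed(line)), line) for line in corpus)


corpus = ["@requests(on='/index')", "import numpy as np", "with f:"]
ranked = rank("request(on=something)", corpus)
for dist, line in ranked:
    print(f"{dist:.6f}  {line}")
```

With this three-line corpus, `@requests(on='/index')` ranks first at a distance of about 0.1685, in line with the first match in the sample output. The server example computes all query-to-document distances at once through NumPy broadcasting; the loop here trades that speed for readability.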
Read Tutorials
- What is "Neural Search"?
- Document & DocumentArray: the basic data type in Jina.
- Executor: how Jina processes Documents.
- Flow: how Jina streamlines and distributes Executors.
- Serving Jina
- Developer References
- Clean & Efficient Coding in Jina
- 3 Reasons to Use Jina 2.0
Support
- Join our Slack community to chat with our engineers about your use cases, questions, and support queries.
- Join our Engineering All Hands meet-up to discuss your use case and learn about Jina's new features.
  - When? The second Tuesday of every month
  - Where? Zoom (see our public events calendar/.ical) and live stream on YouTube
- Subscribe to the latest video tutorials on our YouTube channel.
Join Us
Jina is backed by Jina AI. We are actively hiring full-stack developers and solution engineers to build the next neural search ecosystem in open source.
Contributing
We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.
- Contributing guidelines
- Code of conduct - play nicely with the Jina community
- Good first issues
- Release cycles and development stages