# Jina-AI

**Repository Path**: mirrors/Jina-AI

## Basic Information

- **Project Name**: Jina-AI
- **Description**: Jina lets you build deep-learning-powered search-as-a-service in minutes
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: https://www.oschina.net/p/jina-ai
- **GVP Project**: No

## Statistics

- **Stars**: 35
- **Forks**: 8
- **Created**: 2021-07-26
- **Last Updated**: 2025-12-13

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Jina-Serve

Jina-serve is a framework for building and deploying AI services that communicate via gRPC, HTTP and WebSockets. Scale your services from local development to production while focusing on your core logic.

## Key Features

- Native support for all major ML frameworks and data types
- High-performance service design with scaling, streaming, and dynamic batching
- LLM serving with streaming output
- Built-in Docker integration and Executor Hub
- One-click deployment to Jina AI Cloud
- Enterprise-ready with Kubernetes and Docker Compose support
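To get a feel for the programming model behind these features, here is a minimal sketch of a service exposed over gRPC. The `Greeting` and `Greeter` names are invented for illustration; per the feature list above, the same service can also be exposed over HTTP or WebSockets via the `protocol` argument (exact option support may vary by Jina version).

```python
from docarray import BaseDoc, DocList
from jina import Deployment, Executor, requests


class Greeting(BaseDoc):
    text: str = ''


class Greeter(Executor):
    @requests
    def greet(self, docs: DocList[Greeting], **kwargs) -> DocList[Greeting]:
        # Fill in a reply on every incoming Document
        for doc in docs:
            doc.text = 'hello, world'
        return docs


if __name__ == '__main__':
    # Serve over gRPC on port 54321 ('http' / 'websocket' are the other options)
    dep = Deployment(uses=Greeter, protocol='grpc', port=54321)
    with dep:
        dep.block()
```

A client can then reach it with `Client(port=54321).post('/', inputs=[Greeting()], return_type=DocList[Greeting])`, mirroring the fuller StableLM example below.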
## Comparison with FastAPI

Key advantages over FastAPI:

- DocArray-based data handling with native gRPC support
- Built-in containerization and service orchestration
- Seamless scaling of microservices
- One-command cloud deployment
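To make the first bullet concrete, here is a small illustrative sketch of the DocArray data layer (the `ImageDoc` schema and URLs are invented for this example): documents are typed like Pydantic models but can carry ML-native fields such as tensors, and they serialize to protobuf, which is what the native gRPC transport builds on.

```python
from typing import Optional

from docarray import BaseDoc, DocList
from docarray.typing import NdArray


class ImageDoc(BaseDoc):
    url: str = ''
    # Tensor-typed field; travels over gRPC without manual (de)serialization
    embedding: Optional[NdArray] = None


# DocList is the typed container that Executors accept and return
docs = DocList[ImageDoc]([ImageDoc(url=f'https://example.com/{i}.png') for i in range(3)])
print(len(docs), docs[0].url)
```

These same `BaseDoc`/`DocList` types are used as Executor inputs and outputs in the examples that follow.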
## Install

```bash
pip install jina
```

See guides for [Apple Silicon](https://jina.ai/serve/get-started/install/apple-silicon-m1-m2/) and [Windows](https://jina.ai/serve/get-started/install/windows/).

## Core Concepts

Three main layers:

- **Data**: BaseDoc and DocList for input/output
- **Serving**: Executors process Documents; the Gateway connects services
- **Orchestration**: Deployments serve Executors, Flows create pipelines

## Build AI Services

Let's create a gRPC-based AI service using StableLM:

```python
from jina import Executor, requests
from docarray import DocList, BaseDoc
from transformers import pipeline


class Prompt(BaseDoc):
    text: str


class Generation(BaseDoc):
    prompt: str
    text: str


class StableLM(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.generator = pipeline(
            'text-generation', model='stabilityai/stablelm-base-alpha-3b'
        )

    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        generations = DocList[Generation]()
        prompts = docs.text
        llm_outputs = self.generator(prompts)
        for prompt, output in zip(prompts, llm_outputs):
            # the pipeline returns a list of candidate generations per prompt
            generations.append(Generation(prompt=prompt, text=output[0]['generated_text']))
        return generations
```

Deploy with Python or YAML:

```python
from jina import Deployment
from executor import StableLM

dep = Deployment(uses=StableLM, timeout_ready=-1, port=12345)

with dep:
    dep.block()
```

```yaml
jtype: Deployment
with:
  uses: StableLM
  py_modules:
    - executor.py
  timeout_ready: -1
  port: 12345
```

Use the client:

```python
from jina import Client
from docarray import DocList
from executor import Prompt, Generation

prompt = Prompt(text='suggest an interesting image generation prompt')
client = Client(port=12345)
response = client.post('/', inputs=[prompt], return_type=DocList[Generation])
```

## Build Pipelines

Chain services into a Flow (here `TextToImage` is a second Executor, defined analogously):

```python
from jina import Flow

flow = Flow(port=12345).add(uses=StableLM).add(uses=TextToImage)

with flow:
    flow.block()
```

## Scaling and Deployment

### Local Scaling

Boost throughput with built-in features:

- Replicas for parallel processing
- Shards for data partitioning
- Dynamic batching for efficient model inference

Example scaling a Stable Diffusion deployment:

```yaml
jtype: Deployment
with:
  uses: TextToImage
  timeout_ready: -1
  py_modules:
    - text_to_image.py
  env:
    CUDA_VISIBLE_DEVICES: RR
  replicas: 2
  uses_dynamic_batching:
    /default:
      preferred_batch_size: 10
      timeout: 200
```

### Cloud Deployment

#### Containerize Services

1. Structure your Executor:

```
TextToImage/
├── executor.py
├── config.yml
└── requirements.txt
```

2. Configure:

```yaml
# config.yml
jtype: TextToImage
py_modules:
  - executor.py
metas:
  name: TextToImage
  description: Text to Image generation Executor
```

3. Push to Hub:

```bash
jina hub push TextToImage
```

#### Deploy to Kubernetes

```bash
jina export kubernetes flow.yml ./my-k8s
kubectl apply -R -f my-k8s
```

#### Use Docker Compose

```bash
jina export docker-compose flow.yml docker-compose.yml
docker-compose up
```

#### JCloud Deployment

Deploy with a single command:

```bash
jina cloud deploy jcloud-flow.yml
```

## LLM Streaming

Enable token-by-token streaming for responsive LLM applications:

1. Define the schemas:

```python
from docarray import BaseDoc


class PromptDocument(BaseDoc):
    prompt: str
    max_tokens: int


class ModelOutputDocument(BaseDoc):
    token_id: int
    generated_text: str
```
2. Initialize the service:

```python
from jina import Executor, requests
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')


class TokenStreamingExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = GPT2LMHeadModel.from_pretrained('gpt2')
```

3. Implement streaming:

```python
import torch


class TokenStreamingExecutor(Executor):
    # ... __init__ from step 2 ...

    @requests(on='/stream')
    async def task(self, doc: PromptDocument, **kwargs) -> ModelOutputDocument:
        input = tokenizer(doc.prompt, return_tensors='pt')
        input_len = input['input_ids'].shape[1]
        for _ in range(doc.max_tokens):
            output = self.model.generate(**input, max_new_tokens=1)
            if output[0][-1] == tokenizer.eos_token_id:
                break
            yield ModelOutputDocument(
                token_id=output[0][-1],
                generated_text=tokenizer.decode(
                    output[0][input_len:], skip_special_tokens=True
                ),
            )
            input = {
                'input_ids': output,
                'attention_mask': torch.ones(1, len(output[0])),
            }
```

4. Serve and use:

```python
import asyncio

from jina import Client, Deployment


# Server (run in one process)
with Deployment(uses=TokenStreamingExecutor, port=12345, protocol='grpc') as dep:
    dep.block()


# Client (run in another process)
async def main():
    client = Client(port=12345, protocol='grpc', asyncio=True)
    async for doc in client.stream_doc(
        on='/stream',
        inputs=PromptDocument(prompt='what is the capital of France ?', max_tokens=10),
        return_type=ModelOutputDocument,
    ):
        print(doc.generated_text)


asyncio.run(main())
```

## Support

Jina-serve is backed by [Jina AI](https://jina.ai) and licensed under [Apache-2.0](./LICENSE).