
Career Coaching Powered by iRekommend's Multi Agent LLMs

Updated: Aug 25


Multi-agent large language model (LLM) systems represent a cutting-edge evolution in artificial intelligence. These systems utilize the collaborative capabilities of multiple LLMs to address complex tasks that surpass the capabilities of individual models. By assigning specialized roles to different agents, enabling inter-agent communication, and fostering collaborative problem-solving, these systems harness the extensive capabilities of LLMs in natural language processing, reasoning, and task planning.


Why Multi-Agent LLM Systems Are Gaining Prominence:



Multi-agent LLMs offer several advantages:


Enhanced Problem-Solving Capabilities: Multi-agent systems combine the strengths of various specialized agents, enabling them to tackle more intricate and diverse challenges.


Improved Reasoning and Accuracy: Collaborative efforts among agents allow for cross-verification and debate, potentially reducing errors and enhancing factual accuracy.


Flexibility and Scalability: These architectures offer dynamic and adaptable AI systems capable of handling a broader spectrum of scenarios, enhancing operational versatility.


Emulating Human Collaboration: By mimicking human teamwork, multi-agent systems aim to achieve more robust and creative problem-solving outcomes.


Addressing Limitations of Single LLMs: Multi-agent approaches can mitigate issues like context management and the need for specialized knowledge, which are limitations of single LLMs.


At iRekommend, we continuously improve the underlying AI to deliver better capabilities for our customers.


Demo of iRekommend's Improv - AI Career Advisor



Try the AI-powered career advisor for free


  • Trained on 5000+ expert transcripts on career advice

  • Contextual advice based on your experience

  • Free up to 10 user queries per day




Target Architecture and Specifications behind iRekommend's Improv - AI Career Coach


Given below is the target architecture iRekommend uses to enable a superior career coaching experience for students and working professionals alike.



Explanation of the Multi-Agent LLM Architecture


1. User Interface

  • The system accepts user questions from multiple users simultaneously.

  • User questions are input on the right side of the diagram and are passed to the Decomposer.

2. LLM as a Service (LLMaaS)

  • This is the core language model service, consisting of two main components:

  1. Google Gemini (SLM): A smaller language model used for primary query processing and response generation.

  2. Groq/LLAMA2 (70B LLM): A larger language model used for validation and augmentation of responses.

  • Both models are accessible via APIs, allowing for flexible integration and scaling.

  • The service includes a "Fine Tune" component where these models are customized for specific use cases.

  • Training data is fed into both models, indicating continuous improvement capabilities.

3. Decomposer

  • Function: Simplifies the user's question and breaks it down into multiple part-questions.

  • This component is crucial for handling complex queries that may require multiple processing steps.

  • It interfaces directly with the user input and the agent system.
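As an illustrative sketch only (not iRekommend's production code), the decomposition step can be stubbed with a rule-based split; in the real system this step would be delegated to an LLM prompt:

```python
# Illustrative decomposer sketch; a rule-based split stands in for the
# LLM call that performs decomposition in production. Over-splitting on
# "and" (e.g. "research and development") is a known limitation of the stub.
import re

def decompose(question: str) -> list[str]:
    """Break a compound question into simpler part-questions."""
    parts = re.split(r"\band also\b|\band\b|;", question)
    return [p.strip().rstrip("?") + "?" for p in parts if p.strip()]

print(decompose("How do I switch into data science and what skills should I learn?"))
```

Each resulting part-question is then routed to its own agent for processing.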

4. Multi-Agent System with Constitutional chain

  • The architecture employs multiple agents to process different aspects of the decomposed question:

  1. Agent #1: Develops and executes prompts for Question #1 using Gemini.

  2. Agent #2: Applies constitutional checks, then validates and augments the response from Agent #1 using Groq/LLAMA2.

  3. Agent #N: Handles additional sub-questions in the same manner as Agent #1.

  4. Agent #N+1: Applies constitutional checks, validates and augments responses for additional questions, similar to Agent #2.

  • This design allows for parallel processing and specialized handling of different query components.

  • Constitutional Chain Implementation:

    • Each even-numbered agent (2, N+1) implements a constitutional chain.

    • The chain ensures responses adhere to predefined ethical guidelines, factual accuracy, and safety constraints.

    • It includes the following components:

      • a. Ethical Validator: Checks responses against ethical guidelines.

      • b. Fact Checker: Verifies factual claims in responses.

      • c. Safety Filter: Ensures responses don't contain harmful or inappropriate content.

      • d. Bias Detector: Identifies and mitigates potential biases in responses.

5. Aggregator

  • Function: Combines responses from all agents, simplifying the output by removing redundant messages.

  • This component ensures that the final response to the user is coherent and concise.
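A minimal sketch of the aggregation step (a hypothetical helper, not the production implementation): combine agent responses in order while dropping redundant duplicates:

```python
def aggregate(responses: list[str]) -> str:
    """Combine agent responses, removing redundant (duplicate) messages
    while preserving the original order."""
    seen = set()
    unique = []
    for r in responses:
        key = r.strip().lower()        # case-insensitive duplicate check
        if key and key not in seen:
            seen.add(key)
            unique.append(r.strip())
    return "\n\n".join(unique)

print(aggregate(["Update your resume.", "update your resume.", "Practice interviews."]))
```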

6. Interaction History Management

  • Maintains a record of user interactions and system responses.

  • This component aids in context preservation for ongoing and future interactions.

7. Integrated Session Management

  • This component manages the overall flow and state of each user session.

  • It coordinates between the LLMaaS, the multi-agent system, and the user interface.

  • Continuously improves the constitutional chain based on interaction history and human feedback.

  • Refines ethical guidelines, fact-checking mechanisms, and bias detection algorithms.


Overall Data Flow


  1. User submits a question.

  2. The Decomposer breaks down the question into sub-components.

  3. Multiple agents process these sub-components in parallel:

  • Odd-numbered agents (1, N) use Gemini for initial processing.

  • Even-numbered agents (2, N+1) use Groq/LLAMA2 for validation and augmentation.

  4. The Aggregator combines and refines the responses from all agents.

  5. The final response is sent back to the user.

  6. Interaction History Management records the entire process.

  7. Integrated Session Management oversees the entire workflow. It also analyzes interactions to improve future performance via the constitutional chain.
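The data flow above can be sketched end-to-end in Python; the model calls are stubbed here, where the real system would call Gemini and Groq/LLAMA2 via their APIs:

```python
# End-to-end pipeline sketch with stubbed model calls.
def gemini_answer(sub_q: str) -> str:          # stand-in for the Gemini API call
    return f"Draft answer to: {sub_q}"

def validate_with_llama(draft: str) -> str:    # stand-in for the Groq/LLAMA2 call
    return draft + " [validated]"

def handle_question(question: str) -> str:
    sub_questions = [q.strip() for q in question.split(" and ")]   # Decomposer (stub)
    drafts = [gemini_answer(q) for q in sub_questions]             # Agents #1..#N
    validated = [validate_with_llama(d) for d in drafts]           # Agents #2..#N+1
    return "\n".join(dict.fromkeys(validated))                     # Aggregator

print(handle_question("How to negotiate salary and how to find a mentor?"))
```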


Key Features

  • Scalability: The use of APIs and multiple agents allows for easy scaling.

  • Redundancy and Validation: The dual-model approach (Gemini and Groq/LLAMA2) provides built-in validation and enhancement of responses.

  • Flexibility: The architecture can handle a wide range of query complexities by decomposing and distributing the workload.

  • Continuous Improvement: The inclusion of training data inputs suggests ongoing model refinement capabilities.






Improving Stability and Resiliency of the application


While Google Cloud Run offers built-in container stability and reliability, timeouts become a bigger concern when deploying at production scale.


The following aspects must be considered when building a more resilient and stable application.

We have implemented retry logic and circuit breaker logic for the LLM API on Google Cloud Run, focusing on the front-end UI's interaction with the backend LLM hosted on serverless infrastructure, and on ensuring the app remains active for at least 5 minutes:

  • Front-end UI:

    • Implement a loading indicator to show when a request is in progress

    • Use exponential backoff for retries

    • Set a maximum number of retry attempts (e.g., 3)

    • Display appropriate messages based on circuit breaker state
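The retry policy above (exponential backoff, capped at 3 attempts) can be sketched as follows; `call_llm` is a placeholder for the actual request function:

```python
import time

MAX_RETRIES = 3

def call_with_backoff(call_llm, base_delay=1.0):
    """Retry a flaky call up to MAX_RETRIES times with exponential backoff."""
    for attempt in range(MAX_RETRIES):
        try:
            return call_llm()
        except Exception:
            if attempt == MAX_RETRIES - 1:
                raise                                # exhausted all attempts
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Example: a call that fails twice, then succeeds on the third attempt.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

print(call_with_backoff(flaky, base_delay=0.01))  # → ok
```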

  • Backend (Cloud Run):

    • Implement request queuing to manage concurrent requests

    • Use a circuit breaker pattern to prevent overwhelming the LLM service

    • Implement timeouts for LLM API calls

  • Circuit Breaker Pattern:

    • Implement three states: Closed, Open, and Half-Open

    • Closed: Normal operation, requests pass through

    • Open: Requests are immediately rejected without calling the LLM API

    • Half-Open: Allow a limited number of test requests to pass through

    • Define thresholds for opening the circuit (e.g., error rate, response time)

    • Use a sliding window to track recent requests and errors

    • Implement automatic transition from Open to Half-Open after a cooldown period
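The three-state breaker described above can be sketched as follows; this is a simplified illustration using a consecutive-failure count and a cooldown, not the production configuration:

```python
import time

class CircuitBreaker:
    """Minimal Closed/Open/Half-Open circuit breaker."""
    def __init__(self, failure_threshold=3, cooldown=20.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.cooldown:
                self.state = "HALF_OPEN"     # cooldown elapsed: allow a test request
            else:
                raise RuntimeError("circuit open: request rejected")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"          # trip (or re-trip) the breaker
                self.opened_at = time.monotonic()
            raise
        else:
            self.failures = 0
            self.state = "CLOSED"            # success closes the circuit
            return result
```

Wrapping the LLM request as `breaker.call(lambda: call_llm(prompt))` then gives the fail-fast behavior described above.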

  • LLM API Interaction:

    • Use async/await for non-blocking API calls

    • Implement error handling for various failure scenarios

    • Log errors and retry attempts for monitoring

    • Update circuit breaker state based on API call results

  • Keeping the App Active:

    • Implement a heartbeat mechanism to ping the service every 4 minutes

    • Use Cloud Scheduler to trigger the heartbeat

    • Implement a simple health check endpoint
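A minimal health-check endpoint can be sketched with the standard library alone (the `/health` path is a hypothetical choice; in production this would be a lightweight route in the existing Cloud Run service that Cloud Scheduler pings):

```python
# Minimal /health endpoint sketch using only the standard library.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep request logs quiet for the demo
        pass

server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/health") as resp:
    status, body = resp.status, resp.read()
server.shutdown()
print(status, body)  # → 200 b'ok'
```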

  • Error Handling:

    • Categorize errors (e.g., network issues, LLM service errors)

    • Implement appropriate retry strategies for each error type

    • Provide user-friendly error messages in the UI

    • Update circuit breaker state based on error types and frequency

  • Monitoring and Logging:

    • Use Cloud Monitoring to track API calls, errors, and latency

    • Set up alerts for high error rates, extended downtime, or circuit breaker state changes

    • Implement detailed logging for troubleshooting

    • Monitor circuit breaker state transitions and failure rates

  • Circuit Breaker Configuration:

    • Error Threshold: Set a percentage of failures (e.g., 50%) that triggers the Open state

    • Timeout Duration: Define how long the circuit stays Open before transitioning to Half-Open

    • Reset Timeout: Set a duration for successful operations in Half-Open state before fully closing the circuit

    • Failure Count: Define the number of consecutive failures that trigger the Open state


Implementation of the retry and circuit breaker logic is possible by adding the Resilience4j dependency to the backend Java service (Resilience4j is a JVM library, so it runs in the backend rather than in the frontend JavaScript code).

With this setup:

  1. The retry will allow the callRemoteService method to be retried up to 3 times if it throws HttpServerErrorException

  2. The circuit breaker will monitor if the failure rate exceeds 50% over a window of 10 requests

  3. If the failure threshold is exceeded, the circuit will open for 20 seconds, during which it will immediately fail without attempting the call and invoke the fallbackMethod

  4. After 20 seconds, it will allow a few requests through to test if the service has recovered

  5. If the circuit is closed but all retry attempts fail, the fallbackMethod will also be invoked

Some additional considerations:

  • Order the annotations with @CircuitBreaker first and @Retry second, so the retry happens within the circuit breaker

  • Ensure the circuit breaker and retry are configured with appropriate values based on the characteristics of the remote service

  • Monitor the circuit breaker and retry metrics to tune the configurations

  • Consider adding a bulkhead to limit the number of concurrent calls to the remote service

In summary, by adding the @CircuitBreaker annotation along with the @Retry annotation and providing a shared fallbackMethod, you can implement a resilient call to the backend LLM API that will retry on failures, trip the circuit on too many failures, and provide a fallback response.
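The same ordering (retry inside the circuit breaker, with a shared fallback) can be sketched language-agnostically in Python; the names below are hypothetical illustrations, not the Resilience4j API:

```python
# Sketch of "retry inside the circuit breaker" with a shared fallback,
# mirroring the @CircuitBreaker/@Retry/fallbackMethod behavior described above.
def resilient_call(fn, fallback, breaker_open, max_retries=3):
    """Try fn up to max_retries times; if the breaker is open, or every
    retry fails, invoke the shared fallback instead."""
    if breaker_open:
        return fallback()                 # open circuit: fail fast to fallback
    for _ in range(max_retries):
        try:
            return fn()
        except Exception:
            continue                      # retry on failure
    return fallback()                     # all retries exhausted

def llm_unavailable():
    raise RuntimeError("LLM service error")

fallback = lambda: "The career advisor is busy; please try again shortly."
print(resilient_call(llm_unavailable, fallback, breaker_open=False))
```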


Why Constitutional Chain?


The LangChain Constitutional Chain plays a crucial role in enhancing the ethical standards, reducing bias, and minimizing hallucinations in language models, which are key challenges in AI-driven systems. Here’s how it addresses these issues:


1. Ethics Enforcement


The LangChain Constitutional Chain is designed to enforce a set of predefined ethical rules or guidelines—often referred to as a "constitution"—on the outputs generated by language models. These rules can include principles such as fairness, privacy, non-discrimination, and avoiding harmful content. The Constitutional Chain acts as a safeguard that reviews the model's outputs against these ethical guidelines, ensuring that any response generated aligns with the desired ethical standards.


For example, if a language model generates content that could be considered offensive, discriminatory, or misleading, the Constitutional Chain would identify this violation and either modify the response to align with ethical standards or reject it altogether. This process helps ensure that the AI system consistently produces outputs that are ethical and socially responsible, which is critical in applications where trust and user safety are paramount.
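A simplified illustration of how such a constitutional pass works (a hypothetical keyword-based critic stands in here for LangChain's LLM-driven critique-and-revise steps):

```python
# Simplified constitutional pass: critique a draft against a "constitution"
# of rules, then revise it if any rule is violated. A real ConstitutionalChain
# uses an LLM for both the critique and the revision.
CONSTITUTION = [
    {"name": "no-guarantees",
     "violates": lambda text: "guaranteed" in text.lower(),
     "revise": lambda text: text.replace("guaranteed", "likely")},
]

def constitutional_pass(draft: str) -> str:
    for rule in CONSTITUTION:
        if rule["violates"](draft):
            draft = rule["revise"](draft)  # revise the draft to satisfy the rule
    return draft

print(constitutional_pass("This certification is guaranteed to double your salary."))
```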


2. Bias Reduction

Bias in AI models often stems from the training data, which can inadvertently reflect societal biases. The LangChain Constitutional Chain can help mitigate these biases by incorporating specific rules aimed at identifying and correcting biased content in the model's outputs.

When a response is generated, the Constitutional Chain evaluates it against a set of bias-mitigation rules. For example, it might check whether the content unfairly favors a particular gender, race, or socioeconomic group. If such a bias is detected, the Constitutional Chain can modify the output to remove or neutralize the biased elements. This approach not only reduces the likelihood of biased responses but also promotes fairness and inclusivity in the AI system's interactions with users.


3. Minimization of Hallucinations

Hallucinations in language models refer to instances where the model generates content that is factually incorrect or nonsensical. These hallucinations can undermine the reliability of AI systems, especially in critical applications like healthcare, finance, or legal services.

The LangChain Constitutional Chain helps reduce hallucinations by enforcing rules that require responses to be grounded in verifiable facts and logical coherence. For example, the chain might include rules that flag any statement that appears to contradict known facts or that lacks sufficient context or support from the input data. When such hallucinations are detected, the Constitutional Chain can either reject the output or require additional validation from other sources before the response is finalized.


By filtering out hallucinations, the Constitutional Chain ensures that the responses generated by the language models are not only accurate but also trustworthy. This is especially important in contexts where users rely on the AI for accurate and reliable information.





