AWS for Industries
BMW Group fosters data-driven culture with a no-code generative AI data analytics solution on AWS
Today, many organizations face challenges in transforming raw data into actionable insights due to the specialized skills or programming expertise required for data analysis. Recognizing this barrier, BMW Group embarked on an innovative project using Amazon Web Services (AWS) to empower its workforce and unlock the full potential of its data through generative artificial intelligence (AI). The result of this effort is the BMW Data Interpreter solution, a no-code generative AI application that helps democratize data analytics by letting users perform complex analyses using natural language instructions.
This innovative solution lets BMW employees across all business and tech departments upload documents and run tasks, such as exploratory data analysis, data preprocessing, data transformations, and chart and plot generation, without requiring users to write a single line of code. Using a large language model (LLM) guided by user instructions, the application generates and runs code in an isolated compute environment to complete the user's desired analytics task, presenting the output through a user-friendly web-based interface. By equipping every BMW employee with data analytics capabilities, the BMW Data Interpreter fosters a truly data-driven culture, empowering informed decision-making across all levels of the organization. Today, the BMW Data Interpreter is available to BMW Group employees for use in their daily work.
In this blog post, we delve into the specifics of this solution, exploring how BMW Group used generative AI services, including Amazon Bedrock, which is a fully managed service that offers a choice of high-performing foundation models; generative AI agents; and the Reasoning and Acting framework (ReAct) to design and implement this innovative solution, democratizing data analytics and driving operational efficiency.
Solution overview
The screencast recording in figure 1 demonstrates a common journey of a BMW Data Interpreter user. As a first step, the user uploads an Excel file containing a dataset through the application's web UI. Once the document is ingested, the user interacts with the LLM-powered generative AI engine in natural language, instructing it to perform various data aggregation, analysis, and visualization tasks. The LLM transforms the user's input into code, runs it against the data, and returns the results together with the underlying Python code in the user interface.
Below, you can see the recording of BMW Data Interpreter in action. For demonstration purposes, we use randomly generated mock data.
Figure 1. BMW Data Interpreter demo
BMW Data Interpreter’s capabilities extend beyond basic data manipulations, encompassing a range of tasks from querying data frames, identifying data types, and presenting summary statistics to generating complex visualizations, performing correlation analysis, and more. This versatility empowers users to extract valuable insights from their data.
It is important to note that, to deliver the desired result for each user instruction, this solution adopts the ReAct framework to iteratively generate and run code. Let's dive deep into ReAct and understand why it is important for the BMW Data Interpreter.
Building agents with LLMs: The ReAct loop
The ReAct loop explained
Agents powered by LLMs like Anthropic’s Claude 3 can generate text and run actions using external tools, affecting the environment they operate in. These agents follow an iterative process called the ReAct loop. The ReAct loop lets agents continually refine their understanding and actions based on feedback and new information.
Here’s how it works:
- Reason: The agent starts by reasoning about the task or problem at hand, formulating a plan or strategy for approaching the task using the available tools.
- Act: Based on its reasoning, the agent takes an action.
- Observe: After taking an action, the agent observes the results or effects of that action.
- Repeat: With the new observation, the agent goes back to the reason stage, refining its understanding and adjusting its plan or strategy as needed. The loop then repeats, with the agent taking another action based on its updated reasoning.
- Final answer: Once the agent has completed its task and incorporated all necessary observations and context, it generates a final answer or output, which is provided back to the user. Outputs could be completed reports, analysis, or other deliverables, based on the original task or problem.
Figure 2. ReAct loop explained
This iterative process lets the agent break down complex tasks into smaller steps and run them one by one using the available tools.
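As a minimal sketch, the loop can be expressed in a few lines of Python. The text protocol assumed here ("Action: tool | input" and "Final Answer:") is a simplification for illustration, not the exact format BMW's agent uses, and the `call_llm` and `run_tool` callables stand in for a real LLM client and tool runtime:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    tool: str
    tool_input: str

def react_loop(task: str, call_llm: Callable[[str], str],
               run_tool: Callable[[Action], str], max_steps: int = 10) -> str:
    """Minimal ReAct loop: reason, act, observe, and repeat until a final answer."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        # Reason: the LLM emits a thought plus either an action or a final answer.
        output = call_llm(transcript)
        transcript += output + "\n"
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        # Act: parse the requested tool call (assumed format: "Action: tool | input").
        tool, _, tool_input = output.partition("Action:")[2].partition("|")
        # Observe: run the tool and feed the result back for the next iteration.
        observation = run_tool(Action(tool.strip(), tool_input.strip()))
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached without a final answer."
```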
BMW’s Data Interpreter agent
BMW's Data Interpreter agent interacts with its environment by writing Python scripts that are then run by the environment. The agent observes the results of a Python script by using Python's built-in print function to add outputs to a text buffer. The contents of this text buffer are provided to the agent as an observation after the script runs. This is how the agent analyzes data and creates visualizations.
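To illustrate the mechanism, here is a simplified sketch of how print output can be captured as an observation. This is a deliberate simplification; in the actual solution the script runs inside an isolated AWS Lambda function, never in the agent's own process:

```python
import contextlib
import io

def run_script_and_observe(script: str) -> str:
    """Run a generated Python script and capture everything it prints;
    the captured text buffer becomes the agent's observation."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # In the real solution this runs in an isolated, network-restricted
        # AWS Lambda function.
        exec(script, {})
    return buffer.getvalue()

# Example: the observation is whatever the script printed.
print(run_script_and_observe("print(2 + 2)"))  # -> 4
```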
When users request the agent to analyze a file, the agent typically writes a Python script to peek at the data first. For example, it may read the schema or the first few rows of the file and use the print function to generate an observation to better understand the data. Then, it proceeds with the analysis and generates plots using the Matplotlib library, which it writes to the file system and displays to the user in its final answer using special XML tags that can be interpreted by the web-based UI.
This behavior is not hard-coded. The agent uses its reasoning capabilities to formulate and run a unique plan for each user request based on its detailed understanding of the environment. This plan includes information about how to display plots to the user, how to read and write data files, how to use the Python interpreter to run code, and which Python libraries it’s allowed to use.
The LLM, which acts as the brain inside the agent, cannot directly use tools or interact with the external environment. It can only output text. Therefore, the LLM has to communicate with the environment in a highly structured manner because the environment must be able to parse and interpret the LLM’s commands correctly so that it can run them and provide the results of the action as an observation to the LLM.
The following is an example of the agent's structured reasoning and acting trace. For demonstration purposes, we use mock data of car sales; the Action Input scripts shown below are illustrative sketches of the kind of code the agent generates, with hypothetical file and column names. Note that the Observation in this trace is generated by the environment by running the Python scripts:
User Question:
What are the top 5 cars sold by body type? Use the CSV file I just uploaded to answer the question.
Agent Loop 1
Thought: To answer this question, I need to:
1. Output the top 3 rows of the CSV file using the python_repl tool to understand the table schema.
2. Run the analysis using the python_repl tool.
Action: python_repl
Action Input:
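```python
# Illustrative sketch of the generated script; the file name is hypothetical.
import pandas as pd

df = pd.read_csv("car_sales.csv")
print(df.head(3))  # Print the first rows so they appear in the observation.
```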
Observation:
Agent Loop 2
Thought: Now that I understand the data schema, I can proceed with the analysis.
Action: python_repl
Action Input:
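```python
# Illustrative sketch; the "body_type" and "units_sold" columns are hypothetical.
import pandas as pd

df = pd.read_csv("car_sales.csv")
top_5 = df.groupby("body_type")["units_sold"].sum().sort_values(ascending=False).head(5)
print(top_5)
```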
Final Answer
The top 5 cars sold by body type are:
1. Sedan (125,487 units)
2. SUV (98,765 units)
3. Coupe (67,321 units)
4. Hatchback (54,298 units)
5. Wagon (32,109 units)
In this example, the agent breaks down the task into smaller steps, running Python code to understand the data schema and then perform the analysis. The structured communication between the agent and the environment involves the agent’s thoughts, actions (running Python code), observations (output from the code runtime), and the final answer.
BMW uses Anthropic's Claude 3 Sonnet model, available through Amazon Bedrock, to power their agent. This model was chosen for this agentic use case because of its combination of cost, quality, and speed.
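As an illustration of this step, the following is a minimal boto3 sketch of prompting Claude 3 Sonnet through Amazon Bedrock. The region and prompt are example values, and the real solution sends the agent's full ReAct prompts rather than a single question:

```python
import boto3

# Minimal sketch of prompting Claude 3 Sonnet through Amazon Bedrock.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize this dataset schema: ..."}]}],
    inferenceConfig={"maxTokens": 1024, "temperature": 0.0},
)
print(response["output"]["message"]["content"][0]["text"])
```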
To further improve the model's performance, BMW employed advanced prompt engineering techniques to prevent unwanted behavior like hallucinations and to help the model recover from failures. In the system prompt, the agent is instructed not to display large amounts of data to the user or generate extremely large observations by printing the entire dataset using Python, as this can cause performance problems or failures and increases cost by filling the model's context with data. The solution also uses Guardrails for Amazon Bedrock, which provides additional customizable safeguards on top of the native protections of foundation models.
To reduce the length of the answers generated by the LLM and to prevent hallucinations, after each action is run by the environment, the agent is reminded to provide succinct answers and to interpret the data only through Python scripts. If the agent's output contains syntax errors and the environment cannot parse it, the environment feeds detailed error messages and helpful instructions on how to resolve the error back to the agent, allowing it to try again.
Solution architecture
Figure 3. BMW Data Interpreter solution architecture
Figure 3 above depicts the architecture of the BMW Data Interpreter solution on AWS. The solution is based on AWS serverless services, including AWS Lambda, a compute service that runs your code in response to events and automatically manages the compute resources; Amazon API Gateway, a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale; and Amazon DynamoDB, a serverless, NoSQL database service that helps you to develop modern applications at any scale. The solution uses these three services for handling user requests, while Anthropic's Claude 3 Sonnet through Amazon Bedrock is used for LLM reasoning, code generation, and result output. The application is secured using the BMW Group's identity provider integrated with an Application Load Balancer to provide a highly available and scalable web frontend.
Below is a detailed breakdown of the BMW Data Interpreter architecture’s end-to-end workflow.
Overview of architecture workflows
- User access: The BMW Data Interpreter solution is hosted in a dedicated virtual private cloud (VPC), which is peered to BMW's corporate network. The user accesses the application through a web-based user interface. Requests from the user are directed to the Application Load Balancer, which distributes the traffic to Amazon API Gateway and the VPC endpoint for Amazon Simple Storage Service (Amazon S3), an object storage service offering scalability, data availability, security, and performance.
- Authentication: User authentication is handled by the BMW Group's central identity provider integration, which is implemented through the Application Load Balancer using the OpenID Connect Protocol (OIDC), providing secure login and user identity verification.
- Amazon API Gateway: Amazon API Gateway receives the requests and routes them to the API Integrations AWS Lambda function.
- API Integrations: The API Integrations AWS Lambda function validates the received input payload and then invokes the Agent AWS Lambda function, which handles the chat session. The latter, in turn, invokes the Code Execution AWS Lambda function asynchronously to securely run the generated code in an isolated environment. Once the code has run and the output is available, control returns to the Agent AWS Lambda function, which stores the response in the Chat History table in Amazon DynamoDB and updates the session status in the Session Storage table.
- Agent AWS Lambda function: The Agent AWS Lambda function continually updates the session status and stores all responses in an Amazon DynamoDB table. On top of that, this function manages the Agent framework and ReAct chain.
- LLM prompting: LLM prompts are sent to Amazon Bedrock.
- Code execution: An isolated AWS Lambda function runs LLM-generated Python code.
- File storage: For session-related files, the API Integrations AWS Lambda function interacts with Amazon S3 using presigned URLs to store or retrieve files as needed (see the presigned URL sketch after this list).
- Session maintenance: The solution uses Amazon DynamoDB to store session and message information. Each session is associated with a user, identified by their OIDC identity as retrieved by the Application Load Balancer. Amazon DynamoDB streams are configured to trigger the Session Maintenance AWS Lambda function whenever a record is deleted, either manually or because of the time-to-live (TTL) setting. This function removes the corresponding chat history entries from the related Amazon DynamoDB table and the session files from the Amazon S3 bucket (a sketch of this function follows the list).
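To illustrate the file storage step above, the following is a minimal boto3 sketch of generating presigned upload and download URLs. The bucket name and key layout are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and session-scoped keys.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "data-interpreter-files", "Key": "sessions/abc123/input.csv"},
    ExpiresIn=300,  # The URL expires after 5 minutes.
)
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "data-interpreter-files", "Key": "sessions/abc123/result.png"},
    ExpiresIn=300,
)
```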
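And a sketch of what the Session Maintenance function could look like when reacting to DynamoDB stream events. The key name, table schema, and bucket layout are assumptions:

```python
import boto3

s3 = boto3.resource("s3")

def handler(event, context):
    # Triggered by DynamoDB Streams when a session record is deleted,
    # either manually or through TTL expiry.
    for record in event["Records"]:
        if record["eventName"] != "REMOVE":
            continue
        session_id = record["dynamodb"]["Keys"]["session_id"]["S"]  # hypothetical key
        # ... delete the session's chat history items from the Chat History table ...
        # Delete the session's files from the (hypothetical) S3 bucket prefix.
        s3.Bucket("data-interpreter-files").objects.filter(
            Prefix=f"sessions/{session_id}/"
        ).delete()
```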
Focus on the private VPC setup solution
For session file storage and exchanges between the user and agent, the solution uses Amazon S3. The Amazon S3 bucket is configured to be accessible only from the customer's VPC through a VPC endpoint, which helps ensure that customer data does not traverse the public internet. To interact with the Amazon S3 bucket, clients use the REST API, which issues presigned upload and download URLs to authorized users. The Application Load Balancer is configured with rules to route Amazon S3–related requests for upload and download directly to the Amazon S3 VPC endpoint.
Focus on the code runtime isolation solution
To help secure the solution and prevent malicious code from running, a dedicated AWS Lambda function has been created specifically for running code generated by the LLM. This AWS Lambda function operates without network access (with an exception for the Amazon S3 bucket used to exchange session files) and can only use a predefined set of Python packages, which are preinstalled in its environment. This setup helps ensure that any malicious code requiring external packages or resources will fail.
Another layer of security is provided by the use of short-lived credentials with session policies associated with the Code Execution AWS Lambda function. By default, the Code Execution AWS Lambda function has only the AWSLambdaBasicExecutionRole managed policy attached. However, the code execution environment needs to read files provided by the user and return files to the user; these files are exchanged through the Amazon S3 bucket, for which the Code Execution AWS Lambda function requires additional permissions. These permissions are granted temporarily using a short-lived credentials mechanism that allows access only to files related to the current session. Here, the Agent AWS Lambda function uses the AWS STS AssumeRole action to assume a specific IAM role with Amazon S3 access. It attaches a session policy that narrows the Amazon S3 access down to the specific Amazon S3 prefix relevant for the session and then obtains the assumed role's credentials (AccessKeyId, SecretAccessKey, SessionToken). The Agent AWS Lambda function passes these credentials to the Code Execution AWS Lambda function, allowing it to access only the session-specific files.
Figure 4. Code runtime isolation reference architecture
Below is an illustrative example of how the Agent AWS Lambda function might dynamically create such a session policy and obtain session-scoped credentials at runtime. The role ARN, bucket name, and prefix layout are hypothetical:
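```python
import json

import boto3

def get_session_scoped_credentials(session_id: str) -> dict:
    """Assume an IAM role with a session policy that narrows S3 access to the
    current session's prefix. Role ARN, bucket, and prefix are hypothetical."""
    session_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::data-interpreter-files/sessions/{session_id}/*",
            }
        ],
    }
    sts = boto3.client("sts")
    response = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/CodeExecutionS3AccessRole",
        RoleSessionName=f"code-execution-{session_id}",
        # A session policy can only restrict, never expand, the role's permissions.
        Policy=json.dumps(session_policy),
        DurationSeconds=900,  # Shortest allowed lifetime for the temporary credentials.
    )
    # These short-lived credentials are passed to the Code Execution function.
    return response["Credentials"]  # Contains AccessKeyId, SecretAccessKey, SessionToken
```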
Conclusion
This blog post showcases the transformative power of generative AI in democratizing data analytics within the BMW Group. Using AWS, BMW Group has developed the BMW Data Interpreter, a pioneering generative AI solution that empowers BMW employees across departments to unlock the power of data without writing a single line of code. The BMW Data Interpreter lets employees conduct complex analyses, extract valuable insights, and turn data into actionable steps.
To learn more about boosting productivity, building differentiated experiences, and innovating faster with AWS, visit Generative AI on AWS.