Go to your space settings in the left navigation; your Space key and API key appear on the right-hand side. You'll need both for the next part.
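The snippets below reference `ARIZE_SPACE_KEY` and `ARIZE_API_KEY`, so define them first. A minimal sketch, assuming you've stored the keys as environment variables (how you manage secrets is up to you):

```python
import os

# Copy these values from your space settings page.
# The instrumentation snippets below assume both variables are defined.
ARIZE_SPACE_KEY = os.environ["ARIZE_SPACE_KEY"]
ARIZE_API_KEY = os.environ["ARIZE_API_KEY"]
```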
The following code snippet showcases how to automatically instrument your OpenAI application.
```python
import openai
import os

# Import open-telemetry dependencies
from opentelemetry import trace as trace_api
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.resources import Resource

# Import the automatic instrumentor from OpenInference
from openinference.instrumentation.openai import OpenAIInstrumentor

# Set the Space and API keys as headers for authentication
headers = f"space_key={ARIZE_SPACE_KEY},api_key={ARIZE_API_KEY}"
os.environ["OTEL_EXPORTER_OTLP_TRACES_HEADERS"] = headers

# Set resource attributes for the name and version for your application
resource = Resource(
    attributes={
        "model_id": "quickstart-llm-tutorial",  # Set this to any name you'd like for your app
    }
)

# Define the span processor as an exporter to the desired endpoint
endpoint = "https://otlp.arize.com/v1"
span_exporter = OTLPSpanExporter(endpoint=endpoint)
span_processor = SimpleSpanProcessor(span_exporter=span_exporter)

# Set the tracer provider
tracer_provider = trace_sdk.TracerProvider(resource=resource)
tracer_provider.add_span_processor(span_processor=span_processor)
trace_api.set_tracer_provider(tracer_provider=tracer_provider)

# Finish automatic instrumentation
OpenAIInstrumentor().instrument()
```
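With instrumentation enabled, every OpenAI call in your app is traced automatically. A minimal sketch of a traced call, assuming `OPENAI_API_KEY` is set in your environment (the question text is just an example):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# This call is captured as a span by OpenAIInstrumentor
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is Arize?"}],
)
print(response.choices[0].message.content)
```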
Now start asking questions to your LLM app and watch the traces being collected by Arize. For more examples of instrumenting OpenAI applications, check out our openinference-instrumentation-openai examples.
The following code snippet showcases how to automatically instrument your LlamaIndex application.
```python
import os

# Import open-telemetry dependencies
from opentelemetry import trace as trace_api
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

# Import the automatic instrumentor from OpenInference
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor

# Set the Space and API keys as headers for authentication
headers = f"space_key={ARIZE_SPACE_KEY},api_key={ARIZE_API_KEY}"
os.environ["OTEL_EXPORTER_OTLP_TRACES_HEADERS"] = headers

# Set resource attributes for the name and version for your application
resource = Resource(
    attributes={
        "model_id": "quickstart-llm-tutorial",  # Set this to any name you'd like for your app
    }
)

# Define the span processor as an exporter to the desired endpoint
endpoint = "https://otlp.arize.com/v1"
span_exporter = OTLPSpanExporter(endpoint=endpoint)
span_processor = SimpleSpanProcessor(span_exporter=span_exporter)

# Set the tracer provider
tracer_provider = trace_sdk.TracerProvider(resource=resource)
tracer_provider.add_span_processor(span_processor=span_processor)
trace_api.set_tracer_provider(tracer_provider=tracer_provider)

# Finish automatic instrumentation
LlamaIndexInstrumentor().instrument()
```
To test, you can create a simple RAG application using LlamaIndex.
```python
from gcsfs import GCSFileSystem
from llama_index.core import (
    Settings,
    StorageContext,
    load_index_from_storage,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

file_system = GCSFileSystem(project="public-assets-275721")
index_path = "arize-phoenix-assets/datasets/unstructured/llm/llama-index/arize-docs/index/"
storage_context = StorageContext.from_defaults(
    fs=file_system,
    persist_dir=index_path,
)
Settings.llm = OpenAI(model="gpt-4-turbo-preview")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
index = load_index_from_storage(
    storage_context,
)
query_engine = index.as_query_engine()
response = query_engine.query("What is Arize and how can it help me as an AI Engineer?")
```
Now start asking questions to your LLM app and watch the traces being collected by Arize.
The following code snippet showcases how to automatically instrument your LangChain application.
```python
import os

# Import open-telemetry dependencies
from opentelemetry import trace as trace_api
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

# Import the automatic instrumentor from OpenInference
from openinference.instrumentation.langchain import LangChainInstrumentor

# Set the Space and API keys as headers for authentication
headers = f"space_key={ARIZE_SPACE_KEY},api_key={ARIZE_API_KEY}"
os.environ["OTEL_EXPORTER_OTLP_TRACES_HEADERS"] = headers

# Set resource attributes for the name and version for your application
resource = Resource(
    attributes={
        "model_id": "quickstart-llm-tutorial",  # Set this to any name you'd like for your app
    }
)

# Define the span processor as an exporter to the desired endpoint
endpoint = "https://otlp.arize.com/v1"
span_exporter = OTLPSpanExporter(endpoint=endpoint)
span_processor = SimpleSpanProcessor(span_exporter=span_exporter)

# Set the tracer provider
tracer_provider = trace_sdk.TracerProvider(resource=resource)
tracer_provider.add_span_processor(span_processor=span_processor)
trace_api.set_tracer_provider(tracer_provider=tracer_provider)

# Finish automatic instrumentation
LangChainInstrumentor().instrument()
```
To test, you can create a simple RAG application using LangChain.
```python
import numpy as np  # required for np.stack below
from langchain.chains import RetrievalQA
from langchain.retrievers import KNNRetriever
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# df is assumed to be a dataframe with "text" and "text_vector" columns
knn_retriever = KNNRetriever(
    index=np.stack(df["text_vector"]),
    texts=df["text"].tolist(),
    embeddings=OpenAIEmbeddings(),
)
chain_type = "stuff"  # stuff, refine, map_reduce, and map_rerank
chat_model_name = "gpt-3.5-turbo"
llm = ChatOpenAI(model_name=chat_model_name)
chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type=chain_type,
    retriever=knn_retriever,
    metadata={"application_type": "question_answering"},
)
response = chain.invoke("What is Arize and how can it help me as an AI Engineer?")
```
Now start asking questions to your LLM app and watch the traces being collected by Arize.
Run your LLM application
Once you've executed a sufficient number of queries (or chats) to your application, you can view the details on the LLM Tracing page.
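For example, you could loop over a few questions against the LlamaIndex query engine built above (the questions here are illustrative):

```python
# A handful of sample queries to generate traces
questions = [
    "What is Arize?",
    "How do I send traces to Arize?",
    "How do I log evaluations back to Arize?",
]
for question in questions:
    print(query_engine.query(question))
```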
Once you have traces in Arize, you can visit the LLM Tracing tab to see your traces and export them in code. Clicking the export button gives you boilerplate code to copy and paste into your evaluator.
```python
# This will be prefilled by the export command.
# Note: This uses a different API key than the one above.
ARIZE_API_KEY = ''

# Import statements required for getting your spans
import os
os.environ['ARIZE_API_KEY'] = ARIZE_API_KEY

from datetime import datetime
from arize.exporter import ArizeExportClient
from arize.utils.types import Environments

# Export your dataset into a dataframe
client = ArizeExportClient()
primary_df = client.export_model_to_df(
    space_id='',  # this will be prefilled by export
    model_id='',  # this will be prefilled by export
    environment=Environments.TRACING,
    start_time=datetime.fromisoformat(''),  # this will be prefilled by export
    end_time=datetime.fromisoformat(''),  # this will be prefilled by export
)
```
Run a custom evaluator using Phoenix
Import the functions from our Phoenix library to run a custom evaluation using OpenAI.
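A minimal sketch of the imports, assuming a recent arize-phoenix release where the evaluation helpers live under `phoenix.evals` (older releases exposed them under `phoenix.experimental.evals`):

```python
# llm_classify runs a prompt template over every row of a dataframe;
# OpenAIModel wraps an OpenAI model to act as the LLM judge.
from phoenix.evals import OpenAIModel, llm_classify
```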
Create a prompt template for the LLM to judge the quality of your responses. Below is an example which judges the positivity or negativity of the LLM output.
```python
MY_CUSTOM_TEMPLATE = '''
You are evaluating the positivity or negativity of the responses to questions.
[BEGIN DATA]
************
[Question]: {input}
************
[Response]: {output}
[END DATA]

Please focus on the tone of the response.
Your answer must be a single word, either "positive" or "negative".
'''
Notice the {input} and {output} variables in curly braces above. You will need to map those variables to the appropriate columns in your dataframe before you can run your custom template. We use OpenInference, a set of conventions complementary to OpenTelemetry, to trace AI applications, so the trace attributes will differ depending on the provider you are using.
You can use the code below to check which attributes are in the traces in your dataframe.
```python
primary_df.columns
```
Use the code below to set the input and output variables needed for the prompt above.
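The column names below follow OpenInference conventions (`attributes.input.value` and `attributes.output.value`); treat them as assumptions and adjust to whatever `primary_df.columns` showed for your instrumentor. The model choice is also illustrative:

```python
# Rename the OpenInference span columns to match the {input} and {output}
# variables in MY_CUSTOM_TEMPLATE. Adjust the names if your traces differ.
eval_df = primary_df.rename(
    columns={
        "attributes.input.value": "input",
        "attributes.output.value": "output",
    }
)

# Run the custom template over every row; rails constrain the judge's
# answer to the two allowed labels.
evals_df = llm_classify(
    dataframe=eval_df,
    template=MY_CUSTOM_TEMPLATE,
    model=OpenAIModel(model="gpt-4"),
    rails=["positive", "negative"],
)
```

llm_classify returns a dataframe with a label column holding each row's verdict, which is what you will log back to Arize below.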
If you'd like more information, see our detailed guide on custom evaluators. You can also use our pre-tested evaluators for evaluating hallucination, toxicity, retrieval, etc.
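As a sketch of one pre-tested evaluator, the hallucination template ships with its own rails map. It expects "input", "reference", and "output" columns in your dataframe, and the model choice is again illustrative:

```python
from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
)

# Rename your dataframe columns to "input", "reference", and "output"
# before running the pre-tested hallucination template.
hallucination_evals = llm_classify(
    dataframe=eval_df,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    model=OpenAIModel(model="gpt-4"),
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),
)
```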
Log evaluations back to Arize
Export the evals you generated above to Arize using the log_evaluations function as part of our Python SDK. See more information on how to do this in our article on custom evaluators.
Currently, our evaluations are logged within Arize every 24 hours, and we're working on making them as close to instant as possible! Reach out to support@arize.com if you're having trouble here.
```python
import os
from arize.pandas.logger import Client

API_KEY = os.environ.get("ARIZE_API_KEY")
SPACE_KEY = os.environ.get("ARIZE_SPACE_KEY")

# Initialize Arize client using the model_id and version you used previously
arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)
model_id = "quickstart-llm-tutorial"

# Set the evals_df to have the correct span ID to log it to Arize
evals_df = evals_df.set_index(primary_df["context.span_id"])

# Use Arize client to log evaluations
response = arize_client.log_evaluations(
    dataframe=evals_df,
    model_id=model_id,
)

# If successful, the server will return a status_code of 200
if response.status_code != 200:
    print(f"❌ logging failed with response code {response.status_code}, {response.text}")
else:
    print("✅ You have successfully logged evaluations to Arize")
```
Next steps
Dive deeper into the following topics to keep improving your LLM application!