databot

databot-lancedb-testrigor-architecture
databot-codeset-lancedb
databot-huggingface-space-lancedb-details
databot-huggingface-space-run
databot-testRigor-testcase
databot-testRigor-testcase-details
databot-testRigor-datasets
databot-testRigor-testcase-dataset-emergencyRow-result
databot-testRigor-testcase-dataset-oceanRow-result
databot-hugging-face-api-test-result

Inspiration

Since I am interested in generative AI and LLMs, and already had an initial version of a GPT web interface built with LangChain and LlamaIndex in Python, I decided to take part in the LanceDB (vector database) and TestRigor (web interface testing) challenges.

What it does

Databot is a web application that helps end users explore custom Copernicus data (ocean, emergency, ...). The data is pre-compiled into text metadata (datasource id, abstract, title, thumbnailUrl) and stored in a data directory. The user can then search for data in plain English, as illustrated by the following screenshot:

databot web run

How we built it

Adapted an existing generative AI web application, developed with the openai, langchain, llamaindex and gradio Python modules, to use the LanceDB vector database for index creation and querying.
Deployed the resulting application on a Hugging Face Space to enable public access, then ran the application there.
Configured a test suite, subroutine, test cases and test dataset on TestRigor to automate testing of the Hugging Face web and API application, then ran the test cases, which take their inputs from the test dataset rows instead of hardcoding the test data.
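To make "index creation and query" concrete, here is a minimal sketch of the idea in plain Python. It is a stand-in, not LanceDB's or LlamaIndex's API: the embedding is a toy word-count vector, and the document ids and texts are hypothetical. In the real application the embeddings come from a model and the vectors live in LanceDB.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real app would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs: Dict[str, str]) -> Dict[str, Counter]:
    """Index creation: store one vector per document id."""
    return {doc_id: embed(text) for doc_id, text in docs.items()}


def query(index: Dict[str, Counter], question: str, top_k: int = 1) -> List[str]:
    """Query: rank document ids by similarity to the question vector."""
    q = embed(question)
    ranked = sorted(index, key=lambda doc_id: cosine(q, index[doc_id]), reverse=True)
    return ranked[:top_k]


# Hypothetical documents, for illustration only.
DOCS = {
    "doc-ocean": "sea surface temperature of the ocean",
    "doc-emergency": "flood mapping for emergency response",
}
INDEX = build_index(DOCS)
```

Calling query(INDEX, "data about the ocean") ranks the ocean document first, which is the behaviour the vector database provides at scale.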

For both the web and API interfaces, test cases cover the happy path as well as abnormal or unexpected situations (more information in the accomplishments section below).

More details about applying Test Driven Development with TestRigor can be found here:

Experience of Loriebot Test Development with TestRigor

This application can easily be adapted to ingest, index and query other custom datasets: the data directory just needs to contain the custom data in text format.
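As an illustration of the "data directory" convention, the sketch below collects every text file in a directory into a dictionary ready for indexing. The function name, the *.txt pattern and the UTF-8 encoding are assumptions made for this example; the actual application may load its files differently.

```python
from pathlib import Path
from typing import Dict


def load_text_documents(data_dir: str, pattern: str = "*.txt") -> Dict[str, str]:
    """Read every matching text file in data_dir and return {file name: content}.

    The "*.txt" pattern and UTF-8 encoding are assumptions for this sketch.
    Empty files are skipped because they carry nothing to index.
    """
    documents = {}
    for path in sorted(Path(data_dir).glob(pattern)):
        text = path.read_text(encoding="utf-8").strip()
        if text:
            documents[path.name] = text
    return documents
```

Swapping in a different dataset then only means pointing load_text_documents at another directory.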

Challenges we ran into

The LanceDB module integration worked only after upgrading llamaindex from 0.6.12 to 0.7.21, the latest version at the time.
The first TestRigor runs against the Hugging Face application and API failed with an authorization error; this was fixed by changing the Space visibility to public.

Accomplishments that we're proud of

A running web + API application with LanceDB vector database integration.
Testing and validation of the Hugging Face web interface with the following TestRigor features:
Test Suite : databot web interface test suite
Test Data : lets the test case iterate over multiple data category values, for example:

++ Emergency

++ Ocean

Test Cases :

++ happy path => databot hf test case. (12 steps with various commands)

// init question by concatenating template model with dynamic category value and store in web form input
enter from the string with parameters "please provide wekeo datasource id, thumbnailUrl, abstract about ${userRequest}" into "Enter your question"
click "Submit" 
wait 8 sec
// grab datasource id parameter value from response available from web form textbox
grab value of "(?<=datasource id = )[^\n ]+" from "textbox" and save it as "datasourceId"
// check that datasource id value matches some format word:word:word:word
check that stored value "datasourceId" itself matches regex "[a-zA-Z0-9_]{2,50}:[a-zA-Z0-9_]{2,50}:[a-zA-Z0-9_]{2,50}:[a-zA-Z0-9_]{2,50}"
// grab datasource abstract parameter value from response available from web form textbox
grab value of "(?<=datasource abstract = )[^\n]+" from "textbox" and save it as "datasourceAbstract"
// check that datasource abstract value is not empty
check that stored value "datasourceAbstract" itself is not empty
// check that datasource abstract value contains the dynamic user request category
check that stored value "datasourceAbstract" itself contains stored value "userRequest"
// grab datasource thumbnailUrl parameter value from response available from web form textbox
grab value of "(?<=thumbnailUrl = )[^\n ]+" from "textbox" and save it as "thumbnailUrl"
// check that datasource thumbnailUrl value contains png or jpg or jpeg or gif
check that stored value "thumbnailUrl" itself contains "png" or "jpg" or "jpeg" or "gif"
// try to browse the thumbnailUrl value to see whether it is a real image url 
open url from stored value "thumbnailUrl"  if exists
// check that the thumbnailUrl is accessible as expected 200 status code
check that the browser called api from stored value "thumbnailUrl" and response code was "200"
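For readers who do not use TestRigor, the extraction and validation steps above can be approximated in Python as follows. The regular expressions are copied from the test case; everything else is an assumption for illustration: the sample response text is hypothetical, "matches regex" is interpreted as a full match, the category check is case-insensitive, and the final HTTP 200 check on the thumbnail is left out so the sketch runs offline.

```python
import re
from typing import List

# Patterns copied from the TestRigor steps above.
ID_PATTERN = re.compile(r"(?<=datasource id = )[^\n ]+")
ABSTRACT_PATTERN = re.compile(r"(?<=datasource abstract = )[^\n]+")
THUMBNAIL_PATTERN = re.compile(r"(?<=thumbnailUrl = )[^\n ]+")
ID_FORMAT = re.compile(
    r"[a-zA-Z0-9_]{2,50}:[a-zA-Z0-9_]{2,50}:[a-zA-Z0-9_]{2,50}:[a-zA-Z0-9_]{2,50}"
)
IMAGE_EXTENSIONS = ("png", "jpg", "jpeg", "gif")


def first_match(pattern: "re.Pattern[str]", text: str) -> str:
    """Return the first match of pattern in text, or an empty string."""
    match = pattern.search(text)
    return match.group(0) if match else ""


def validate_response(text: str, category: str) -> List[str]:
    """Return the list of failed checks (empty when the response passes)."""
    failures = []
    if not ID_FORMAT.fullmatch(first_match(ID_PATTERN, text)):
        failures.append("datasource id format")
    abstract = first_match(ABSTRACT_PATTERN, text)
    if not abstract:
        failures.append("abstract empty")
    elif category.lower() not in abstract.lower():  # assumption: ignore case
        failures.append("abstract missing category")
    thumbnail = first_match(THUMBNAIL_PATTERN, text).lower()
    if not any(ext in thumbnail for ext in IMAGE_EXTENSIONS):
        failures.append("thumbnail extension")
    return failures


# Hypothetical response text, not real databot output.
SAMPLE_RESPONSE = (
    "datasource id = EO:DEMO:DAT:SAMPLE_OCEAN_01\n"
    "datasource abstract = Sample abstract describing ocean temperature data.\n"
    "thumbnailUrl = https://example.com/thumbnails/sample.png\n"
)
```

Running validate_response(SAMPLE_RESPONSE, category) once per dataset row mirrors how the TestRigor test data drives the same test case with "Emergency" and "Ocean".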

view test case run execution

++ invalid content for user request on web form

view test case run execution

++ unreachable web form url

view test case run execution

Testing and validation of the Hugging Face API interface with the following TestRigor features:
Test Suite : databot api interface test suite

Subroutine with dynamic parameter (12 steps with various commands) : test api interface with "requestData"

store value from the string with parameters "please provide wekeo datasource id, thumbnailUrl, abstract about ${requestCategory}" as "requestData"
store value from the string with parameters "{\"data\": [\"${requestData}\"]}"  as "bodyData"
// prepare body data with ocean category as user request and post the body data to  databot api
call api post "https://adrienchan94-databot.hf.space/api/predict" with headers "Content-Type:application/json" and "Accept:application/json" and body  from stored value "bodyData" and get "$.data[0]" and save it as "responseData"
// extract datasource id parameter value from response available from api call responseData
extract value of "(?<=datasource id = )[^\n\\ ]+" from stored value "responseData" and save it as "datasourceId"
// check that datasource id value matches some format word:word:word:word
check that stored value "datasourceId" itself matches regex "[a-zA-Z0-9_]{2,50}:[a-zA-Z0-9_]{2,50}:[a-zA-Z0-9_]{2,50}:[a-zA-Z0-9_]{2,50}"
// extract datasource abstract parameter value from response available from api call responseData
extract value of "(?<=datasource abstract = )[^\n\\]+" from stored value "responseData" and save it as "datasourceAbstract"
// check that datasource abstract value is not empty
check that stored value "datasourceAbstract" itself is not empty
// check that datasource abstract value contains the dynamic user request category data
check that stored value "datasourceAbstract" itself contains stored value "requestCategory"
// extract datasource thumbnailUrl parameter value from response available from api call responseData
extract value of "(?<=thumbnailUrl = )[^\n\\ ]+" from stored value "responseData" and save it as "thumbnailUrl"
// check that datasource thumbnailUrl value contains png or jpg or jpeg or gif
check that stored value "thumbnailUrl" itself contains "png" or "jpg" or "jpeg" or "gif"
// try to browse the thumbnailUrl value to see whether it is a real image url 
open url from stored value "thumbnailUrl"  if exists
// check that the thumbnailUrl is accessible as expected 200 status code
check that the browser called api from stored value "thumbnailUrl" and response code was "200"
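The request that the subroutine sends can also be reproduced outside TestRigor. The sketch below builds the same JSON body and reads the same "$.data[0]" field; the endpoint, headers and prompt template come from the subroutine above, while the function names are hypothetical. It only prepares the request: sending it (for example with urllib.request.urlopen) is left to the caller and assumes the Space is still online and public.

```python
import json
import urllib.request

API_URL = "https://adrienchan94-databot.hf.space/api/predict"
HEADERS = {"Content-Type": "application/json", "Accept": "application/json"}
PROMPT_TEMPLATE = (
    "please provide wekeo datasource id, thumbnailUrl, abstract about {category}"
)


def build_body(category: str) -> str:
    """Build the JSON body, equivalent to the "bodyData" stored value."""
    return json.dumps({"data": [PROMPT_TEMPLATE.format(category=category)]})


def build_request(category: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request used by the subroutine."""
    return urllib.request.Request(
        API_URL,
        data=build_body(category).encode("utf-8"),
        headers=HEADERS,
        method="POST",
    )


def first_data_item(response_text: str) -> str:
    """Equivalent of the JSONPath "$.data[0]" on the response body."""
    return json.loads(response_text)["data"][0]
```

The string returned by first_data_item plays the role of "responseData" and can be checked with the same regular expressions as the web test case.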

Test Cases :

++ happy path => test normal call of api interface

// call subroutine with  data category "ocean" as requestData parameter
test api  interface with "ocean"

view test case run execution

++ invalid GET method for POST API

view test case run execution

++ No Expected Body for POST API

view test case run execution

++ No Expected Header for POST API

view test case run execution

++ Invalid Body for POST API

view test case run execution

++ Unknown Category Data for POST API

view test case run execution

++ Invalid Endpoint for POST API

view test case run execution

++ Unreachable Url for POST API

view test case run execution

++ invalid json body attribute value for POST API

view test case run execution

++ not json body content for POST API

view test case run execution

All of these actions are illustrated in the enclosed screenshots.

What we learned

Integrating LanceDB features into a Python application
Configuring and running tests with TestRigor

What's next for databot

Integrate with the cloud version of LanceDB

Built With

gradio
huggingface
lancedb
langchain
llamaindex
openai
python
testrigor

Updates

Private user started this project — Aug 09, 2023 01:39 AM EDT
