ZeffClient RDBMS Example

In this example we will create a record builder that will access an SQL RDMS database for the information necessary to create the record.

QuickStart

This quickstart will download the example archive, unarchive it, change into the new directory, and then run a script in that directory that will do the rest of the example. At this point you will be asked some questions by the zeff init command — you will need to enter your org_id and user_id that your received from Zeff, but all other questions you may accept the defaults by hitting enter.

Steps

  1. Download: zeffclient_example_rdbms.tar.bz2

  2. Decompress: tar -xjf zeffclient_example_rdbms.tar.bz2

  3. Change directory: cd zeffclient_example_rdbms

  4. Run quickstart script: ./quickstart.sh

How it Works

Project Directory

The project directory has a virtual environment setup in .venv by the quickstart.sh script. This environment has had ZeffClient installed. This may be activated at any time by source .venv/bin/activate.

The steps taken to setup the directory are:

  1. python -m venv .venv

  2. source .venv/bin/activate

  3. pip install --upgrade pip

  4. python -m pip install ZeffClient

The main command to work with ZeffClient is zeff. To quickly see what options and subcommands are available use zeff --help.

Record Config Generator

The generator.HousePriceRecordGenerator in generator.py will yield each id that is selected from the properties table. For this example it only returns the string that is the id, but it is not limited to strings and could be a URL, file, etc.

For this particular example there is only one row in the properties table and the id for that row is 1395678.

To test the generator by itself use the command ./generator.py or python generator.py.

Record Builder

The builder.HousePriceRecordBuilder in builder.py will take the configuration string given by the record config generator and will yield a record.

The file builder.py may be executed from the command line directly, and has a basic command line interface using argparse. This will aid you in writing and debugging your record builder, because you may work with a single record without needing to run the entire ZeffClient system.

The module uses the zeffclient.record.builder logger to indicate various stages of the record building process. You should also use this logger while building records for error reporting, warnings, information, and debugging.

The file builder.py has a class HousePriceRecordBuilder where all the code to build a new record for house prices is contained. This class will create a callable object that takes a single argument that has been yielded by the record generator. It has three steps: create a new record, add structured data to the record, and add unstructured data to the record.

25     def __call__(self, model: bool, record_config: str) -> Optional[Record]:
26         """Build and return a record.
27 
28         :param model: Flag to indicate if the record builder is building
29             records for training or for prediction. If model is true then
30             it is for prediction, but if false then it is for training and
31             any records not to be used for training should be filtered.
32 
33         :param record_config: Record configuration string created by
34             the record configuration generator.
35         """
36         LOGGER.info("Begin building ``HousePrice`` record from %s", record_config)
37         record = Record(name=record_config)
38         target = self.add_structured_data(record, record_config)
39         if not model and not target:
40             return None
41         self.add_unstructured_data(record, record_config)
42         LOGGER.info("End building ``HousePrice`` record from %s", record_config)
43         return record

Adding structured data is done through a select on the properties table in the database and then converting each returned column (except id) into a structured data item.

45     def add_structured_data(self, record, id):
46         target_record = False
47 
48         # Select all the properties from the database for the record
49         sql = f"SELECT * FROM properties WHERE id={id}"
50         cursor = self.conn.cursor()
51         row = cursor.execute(sql).fetchone()
52 
53         # Process each column in the record except for `id` and
54         # add it as a structured data item to the structured data
55         # object
56         for key in row.keys():
57             if key == "id":
58                 continue
59             value = row[key]
60 
61             # Is the column a continuous or category datatype
62             if isinstance(value, (int, float)):
63                 dtype = DataType.CONTINUOUS
64             else:
65                 dtype = DataType.CATEGORY
66 
67             # Is this a target field
68             if key in ["estimate_mortgage"] and value is not None:
69                 target = Target.YES
70                 target_record = True
71             else:
72                 target = Target.NO
73 
74             # Create the structured data item and add it to the
75             # structured data object
76             sd = StructuredData(name=key, value=value, data_type=dtype, target=target)
77             sd.record = record
78 
79         # Clean up then add the structured data object to the record
80         cursor.close()
81         return target_record

Adding unstructured data is done through a select on the property_images table in the databse and then creating an unstructured data item.

 84         # Select all the property imaages for the record
 85         sql = f"SELECT * FROM property_images WHERE property_id={id}"
 86         cursor = self.conn.cursor()
 87 
 88         # Process each row returned in the selection, create an
 89         # unstructured data item, and add that to the unstructured
 90         # data object. Note that we are assuming that the file-type
 91         # for all of these images is a JPEG, but that may be different
 92         # in your system.
 93         for row in cursor.execute(sql).fetchall():
 94             url = row["url"]
 95             file_type = FileType.IMAGE
 96             group_by = row["image_type"]
 97             ud = UnstructuredData(url, file_type, group_by=group_by)
 98             ud.record = record
 99 
100         # Clean up then add the unstructured data object to the record
101         cursor.close()