Monday, March 12, 2012

I'm having issues with nested table...

Hey gang,

I'm having some issues with nested table. This is my setup. [ProductTable] is the case table, and [CustomersTable] is a nested table. I'm trying to organize my algorithms around products.

[ProductTable]<[CustomersTable]

[ProductTable] table only has product ID, and it is key.

[CustomersTable] table has variety of customer attributes (productID, customerID, location, demographics...) and CustomerRevenue is predict_only. ProductName is the key for the nested table.

I keep getting this error when I'm processing the mining models (Logical Regression and Neural Net).

Error (Data mining): In mining model, Estimate Neural Net, the algorithm does not allow table column as predictable.

Error (Data mining): Error validating attribute for the 'Estimate Neural Net' mining model.

When using Decision Tree, it processes OK, but the result is totally wrong. The model is empty.

Any ideas?

-Young K.

P.S. I'm trying to great a single model for multiple products. This is a label saving device that I'm trying. If this doens't work, I'll have to create a model for each product.

Just an idea... it might help to get all the columns needed in a single view in the database insted of using nested tables.

It's not the actual solution to your problem but it might be a workaround|||

I thought of that, and that lead me to my original question... How good are the estimation (or regression) type analysis if I build a single model for multiple products? For example, if I have data for customers buying cars, motorcycles and boats, should I put them all in one model ? Or should I build 3 different models for 3 seperate products?

Can I build a single model to predict who will buy a car and/or boats and/or motorcycle and/or boat? Should I build a seperate model for car, motorcycle and boat? Is there a difference in accuracy?

I assumed that I needed to build 3 models for 3 products. And I used nested table to create a single model for multiple products. With nested tables, I can clearly seperate data between different products.

Any thoughts?

-Young K.

|||

You can solve the problem you want to, but you are misusing nested tables.

The data you are analyzing is defined by your case table. The nested table simply describes attributes of your case. In your case you have "Products" as the case and "Customers" as an attribute of "Product". The key of the nested table indicates in individual attribute or set of attributes. For example, you will have an attribute "Customer 3's Gender". Of course, customer 3's gender is unlikely to change for each product in their basket.

Another way of thinking about the problem is that your case identifier indicates what is anonymous or "unimportant" about your model. You are trying to spot trends in product purchasing behavior across customers. An individual customer is anonymous or "unimportant", the information about the customer and the products they buy are important.

You could have customers as the case table and have a nested table of products that contains the product name and product revenue from that customer. You would need to make both the table and the product revenue "Predict Only" to ensure that the predictions are not influenced by other product revenues or the existence of other products. If you use Decision Trees in this case, you will get a tree for each product revenue based on customer demographics.

To get a prediction of a product revenure given customer info you would use a query something like this:

SELECT (SELECT Predict(Revenue) FROM Products WHERE [Product Name]='Car') as t FROM ...

|||

Thank you, Jamie. That helped a lot.

-Young K

No comments:

Post a Comment