Tutorial 3: Simulating Data
In the tutorial Analyzing Data, you learned how to explore a data set to look for interesting trends. In this tutorial, you will learn to:
- Build a sampler to model data
- Use a variety of sampler devices to best represent data distributions
- Use branching devices to reflect real data
- Create a sampler with a hidden or locked element
Exploring a Pre-Built Sampler
Before building your own "data factory," you'll explore a pre-made sampler built to simulate data about cats.
- Choose File | Open Sample Document. From the Tutorials folder, open Simulating Data.tp.
Take a few moments to understand what's in this document. The sampler at the top of the screen can be used to generate a data set about cats with four attributes: Gender, Name, Length, and EyeColor. The case table and plot at the bottom of the screen show 500 cases (in this instance, cats) that were generated by the sampler.
Now you'll create a new data set using the sampler.
- At the top-left corner of the sampler, click the Repeat number, 500, and change it to 5 to generate data for five cats.
- Click the RUN button at the top-left corner of the sampler.
Watch as the sampler generates data for five cats. First, each cat receives a Gender. Depending on its gender, the cat gets its Name from the Female branch or the Male branch. Continuing through the sampler, the cat receives a Length (again from a different device for females and males), and then an EyeColor.
As the data for a cat is completed, a new case appears in the results table, and a new case icon appears on the plot. Click a case icon in the plot to see it highlighted in the results table.
Building a Data Factory
Now you'll build your own cat factory. Leave the sample document open for reference.
- Choose File | New to open a new TinkerPlots document.
- Drag a sampler into the document. Click and drag its corner to make it larger.
At the bottom of the sampler, you'll see six devices that can be used to generate attributes.
Mixers and stacks draw from a set of discrete elements. The Name attribute in the cat factory was chosen from a mixer. If you have many repeats of the same value, such as choosing from a set of 30 boys and 45 girls, stacks are a better option than a mixer.
Spinners and bars also draw from discrete elements, but each value can have a different probability. The Gender and EyeColor attributes in the cat factory were determined by spinners, and the Length attribute was determined by bars.
Curves draw from a continuous range of numerical values, which can have different probabilities.
Counters select values systematically, rather than randomly. You'll work with a counter in the tutorial "Modeling Probability."
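If it helps to think of the devices in programming terms, the following Python sketch mimics what each one does. It is only an analogy, not how TinkerPlots is implemented. The function names are made up for illustration, and the sketch assumes that values are drawn with replacement, that a curve is bell-shaped, and that a counter starts over after its last value.

```python
import itertools
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def draw_mixer(elements):
    """Mixer: every element is equally likely to be drawn."""
    return rng.choice(elements)

def draw_stack(counts):
    """Stack: like a mixer, but repeated values are stored as counts."""
    values = list(counts)
    return rng.choices(values, weights=[counts[v] for v in values])[0]

def draw_weighted(weights):
    """Spinner or bars: each value has its own relative weight."""
    values = list(weights)
    return rng.choices(values, weights=[weights[v] for v in values])[0]

def draw_curve(mean, sd):
    """Curve: a continuous numeric value (a normal curve is assumed here)."""
    return rng.gauss(mean, sd)

def make_counter(values):
    """Counter: steps through its values in order (cycling is assumed)."""
    return itertools.cycle(values)

print(draw_mixer(["Charlie", "Shadow", "Spot"]))
print(draw_stack({"boy": 30, "girl": 45}))
print(draw_weighted({"Male": 1, "Female": 1}))
print(draw_curve(mean=20, sd=3))
counter = make_counter([1, 2, 3])
print([next(counter) for _ in range(5)])
```

The stack stores 30 boys and 45 girls as two counts rather than 75 separate elements, which is the reason a stack suits data with many repeated values.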
To model a cat factory, you will need to model both categorical attributes, such as gender, and numerical attributes, such as length.
The first attribute is Gender. Gender can be modeled best using a mixer or a spinner. The default device given in a new sampler is a mixer.
- To use a spinner instead, drag a spinner from the sampler's bottom toolbar into the sampler, and release it over the pink dot that appears in the center of the current mixer. (Pink dots show places where you can drop the new device. A black rectangle highlights the location when you're in the right position to drop it.)
- Click Attr1 above the spinner and rename it Gender.
- Click a in the spinner and change it to Male or M. Click b and change it to Female or F.
- Note the four buttons in the lower-left corner of the spinner device.
Clicking the first icon shows the Device options menu, clicking the + and - buttons adds values to and subtracts values from the device, and clicking the ... button allows you to enter a range of values into the device. Click each button to see the result.
- To make female cats as likely as male cats, you'll need to change the position of the divider in the spinner. Click the divider and drag it so the spinner is divided into equal parts. Or, click the Device options menu and choose Equalize Angles.
The next attribute is Name. Because male and female cats tend to have different names, you will use two mixers.
- Drag a mixer from the lower sampler toolbar into the sampler, and drop it on the pink dot to the right of the Gender device. (A black rectangle will highlight when you're in the right position to drop the mixer.)
- Drag a second mixer into the sampler, but do not release it yet. You will see four pink dots that represent locations where you can place the mixer. If you drop the mixer on the pink dot in the Gender spinner or in the first mixer, it will replace that device. Because you want this to be a second Name mixer, drop it on the pink dot attached to the Gender spinner, directly below the first Name mixer. Notice the Male and Female labels on the lines connecting the Gender device to the Name devices.
- Click Attr2 and change it to Name.
- Copy each list below, including the header. Click to select the appropriate mixer, and paste the values by choosing Edit | Paste Cases (or use the appropriate keyboard shortcuts). TinkerPlots will read the first value in each list as an attribute header.
Male
Charlie
Shadow
Spot
Jack
Max
Smokey
Oliver
Buddy
Simbar
Tiger

Female
Cali
Peaches
Coco
Daisy
Sugar
Mitsy
Bella
Tasha
Fluffy
Carmel
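The branching you just built can be pictured as a two-step draw: Gender is chosen first, and the result decides which Name mixer is used. The Python sketch below is only an analogy for that logic. The names NAME_MIXERS and draw_cat are invented for illustration, and the sketch assumes an equalized Gender spinner and names drawn with replacement.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# One mixer per branch, using the name lists from this tutorial.
NAME_MIXERS = {
    "Male": ["Charlie", "Shadow", "Spot", "Jack", "Max",
             "Smokey", "Oliver", "Buddy", "Simbar", "Tiger"],
    "Female": ["Cali", "Peaches", "Coco", "Daisy", "Sugar",
               "Mitsy", "Bella", "Tasha", "Fluffy", "Carmel"],
}

def draw_cat():
    """Draw Gender first, then follow the matching branch for Name."""
    gender = rng.choice(["Male", "Female"])  # equalized spinner (assumed)
    name = rng.choice(NAME_MIXERS[gender])   # branch selected by gender
    return {"Gender": gender, "Name": name}

for _ in range(5):
    print(draw_cat())
```

Because the name is looked up through the gender, a male cat can never receive a name from the Female mixer, which is the effect of the labeled lines between the devices.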
The next attribute to add is Length. This is a numeric attribute with varying probability for each value, so you can use a bars or a curve device.
- Drag a bars device into the sampler and drop it just to the right of the Male Name mixer. Then drag a second bars device and drop it to the right of the Female Name mixer. (Males and females have different distributions of potential lengths, so their lengths are chosen from different devices.)
- Change Attr3 to Length.
Cats typically have lengths between 10 and 30 inches. There are several ways to specify a range.
- Hold your mouse over the top bars device, and click the + button. Then change a to 10.
- Click the + button again. Notice that it automatically adds the next number to the device. You can continue to click the + button until you have all the values from 10 to 30, but there is a faster way.
- Click the ... button and enter the range 10 to 30. Do the same with the lower bars device. Now, are all these lengths equally likely? Are 30-inch-long cats common?
Because it is more likely that cats have lengths in the middle of this range, let's shape the Length distribution so it has a bump in the middle.
- Position your cursor at the top of the far-left bar in the Length device for male cats. Click and drag the cursor over the bars in the shape of the desired distribution.
- Repeat for the Length of female cats. Female cats tend to be a little shorter than male cats, so you may want to make your bump a little to the left of the bump in the distribution for male cats.
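Dragging the bars into a bump gives each length its own weight. As a rough analogy, the Python sketch below builds two such sets of bar heights and draws lengths from them. The triangular shape, the peaks at 21 and 19 inches, and the names bump_weights and draw_length are all assumptions chosen for illustration; your own bars can take any shape you draw.

```python
import random

rng = random.Random(2)  # fixed seed so the sketch is repeatable

LENGTHS = list(range(10, 31))  # the range 10 to 30 inches, inclusive

def bump_weights(peak):
    """Bar heights that are tallest at `peak` and taper off on both sides."""
    return [max(1, 11 - abs(length - peak)) for length in LENGTHS]

MALE_BARS = bump_weights(peak=21)    # hypothetical peak for males
FEMALE_BARS = bump_weights(peak=19)  # hypothetical peak, a little to the left

def draw_length(gender):
    """Pick a length using the bars device on the cat's branch."""
    bars = MALE_BARS if gender == "Male" else FEMALE_BARS
    return rng.choices(LENGTHS, weights=bars)[0]

males = [draw_length("Male") for _ in range(5000)]
females = [draw_length("Female") for _ in range(5000)]
print("Mean male length:  ", sum(males) / len(males))
print("Mean female length:", sum(females) / len(females))
```

Every bar keeps a height of at least 1, so very short and very long cats remain possible but rare.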
The final attribute is EyeColor. Because this is a categorical attribute, and different eye colors may not be equally likely, a spinner is the best device.
- Drag a spinner into the sampler and drop it to the right of the bars device for male cats.
- Drag a second spinner into the sampler and drop it to the right of the bars device for female cats.
Eye color doesn't differ between male and female cats, so let's join these two devices.
- Hold your mouse over the upper spinner, and click the Device options menu. Choose Merge Device | Merge with Device Below.
- Change Attr4 to EyeColor.
- Click the + button to add three sections to the spinner. Label the values Yellow, Green, and Blue.
- Blue-eyed cats are less common than yellow- or green-eyed cats, so make the Blue section smaller than the other two by dragging its boundaries.
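On a spinner, the chance of landing in a section is that section's share of the full circle. The Python sketch below shows the arithmetic for three eye colors and what equalizing the angles does to them. The angles of 140, 140, and 80 degrees are invented for illustration, as are the function names; they are not values taken from the sample document.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical section angles in degrees; Blue is the smallest section.
ANGLES = {"Yellow": 140, "Green": 140, "Blue": 80}

def probabilities(angles):
    """Each section's probability is its angle divided by the full circle."""
    total = sum(angles.values())
    return {value: angle / total for value, angle in angles.items()}

def equalize(angles):
    """Give every section the same angle, as Equalize Angles would."""
    share = 360 / len(angles)
    return {value: share for value in angles}

def spin(angles):
    """Spin once; larger sections come up more often."""
    values = list(angles)
    return rng.choices(values, weights=[angles[v] for v in values])[0]

print(probabilities(ANGLES))
print(equalize(ANGLES))
print([spin(ANGLES) for _ in range(5)])
```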
You should now have a cat factory that resembles the one in Simulating Data.tp. Click RUN to generate five cats with randomized attributes. Notice that a results table automatically appears and is filled in.
Hiding Sampler Values
You can hide values in the sampler and have other people draw conclusions based on experimental results. Suppose you want someone to guess the least common eye color for cats, based on the sampler's results.
- Click the Device options menu in the EyeColor spinner, and choose Hide Contents. Click OK.
- To lock the sampler, click the Lock icon in the bottom-left corner of the sampler. For example, you might use the correct answer as the password. (Note that the password is case sensitive.)
Now someone can make a conjecture about the least common cat eye color by generating data and plotting EyeColor.
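The reasoning behind such a conjecture can be sketched in Python: draw many cases from a device whose contents you cannot see, tally the results, and pick the value that appears least often. This is only an analogy. The hidden weights, the sample size of 500, and the name make_hidden_spinner are assumptions made for illustration.

```python
import random
from collections import Counter

def make_hidden_spinner(seed):
    """Return a spin() function whose weights are hidden from the caller."""
    rng = random.Random(seed)
    hidden_weights = {"Yellow": 0.4, "Green": 0.4, "Blue": 0.2}  # hypothetical
    values = list(hidden_weights)
    weights = list(hidden_weights.values())

    def spin():
        return rng.choices(values, weights=weights)[0]

    return spin

spin = make_hidden_spinner(seed=4)

# Generate 500 cases and tally EyeColor, as a plot of the results would.
results = Counter(spin() for _ in range(500))
for color, count in results.most_common():
    print(color, count, count / 500)

# The conjecture: the least frequent value in the sample.
guess = min(results, key=results.get)
print("Least common eye color in this sample:", guess)
```

With only a handful of cases the tally can be misleading, so a larger number of repeats gives a more trustworthy conjecture.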
- You can unlock the sampler by clicking the lock and entering the password. To reveal the hidden contents in the EyeColor spinner, choose Show Contents from the Device options menu.
In the next tutorial, Modeling Probability, you will learn to use a sampler to model a probability experiment.