Tenorio 1


                               IB Math SL Internal Assessment:
                                     Farmville Statistics
                                        Arielle Tenorio
                                           Period 6

Farmville is a popular computer game hosted on the social networking website Facebook.
The game allows players to manage a virtual farm by plowing, planting, growing, and
harvesting on their virtual farmland. Crops, trees, and livestock can be purchased with the
“FarmCoins” that are earned by harvesting. The game also has levels, which are achieved
by reaching a certain number of experience points. Players of higher levels tend to have larger
farms, more crops, and more FarmCoins than those of lower levels.

This assignment will examine the relationship between the number of trees a Farmville player
has and the player's level in the game. A positive relationship is predicted, and this prediction
can be confirmed or rejected by analyzing the collected data. First, a scatter plot will be
produced with a line of linear regression to display the trend of the data, and the correlation
coefficient for the two variables will be determined. A box and whisker plot will then compare
the number of trees owned by the lowest-ranking and the highest-ranking players surveyed.
Finally, a chi-squared test for independence will determine whether the two variables are
associated or independent.

Data was collected from 25 Farmville players by logging onto Facebook, opening the Farmville
game, and visiting the virtual farms of 25 “Friends”. The number of trees on each farm was
counted, and the collected values were organized in a table.

                                  Figure 1: Collected Data
                 #            Farmville Level                Number of Trees
                 1                     7                              7
                 2                     8                              9
                 3                     9                              9
                 4                     9                             16
                 5                    10                             11
                 6                    13                             17
                 7                    13                             23
                 8                    13                             45
                 9                    15                             19
                10                    15                             20
                11                    16                             33
                12                    16                             35
                13                    16                             16
                14                    18                             21
                15                    19                             18
                16                    20                             20
                17                    22                             79
                18                    22                             35
                19                    23                             41
                20                    23                             28
                21                    25                             62
                22                    26                             35
                23                    28                             44
                24                    31                             94
                25                    34                             40
                   Figure 1: This table displays the data that was collected.


From the table, it can be observed that the number of trees generally increases as the level
increases. These values are plotted on a scatter plot below.

A scatter plot visually displays the relationship between two variables on a two-dimensional
graph. A line of linear regression, or trend line, can be fitted to confirm the observed
relationship. The more closely the data points cluster around the trend line, the stronger the
linear correlation between the variables.


                                       Figure 2: Scatter Plot and Linear Regression Line
[Scatter plot titled "The Relationship Between Level and Number of Trees"; x-axis: Farmville Level (0 to 40); y-axis: Number of Trees (0 to 100); Excel trend line: y = 2.0783x - 6.4127]

Figure 2: This scatter plot shows a positive relationship between a player's Farmville level and
the number of trees the player has. The line of linear regression was produced in Microsoft Excel.
The calculations to find this equation by hand are shown below.



Line of Linear Regression:
The formula for the linear regression line of y on x is

$$y - \bar{y} = \frac{S_{xy}}{S_x^2}(x - \bar{x})$$

where $\bar{y}$ is the mean of the y values, $\bar{x}$ is the mean of the x values, $S_{xy}$ is the
covariance of x and y, and $S_x^2$ is the variance of x (the standard deviation of x, squared).

In order to find these values, the data was organized into the table below.

                           Figure 3: Table for Linear Regression Line
        #          Level (x)         Trees (y)           xy                x²              y²
        1                7                7               49               49              49
        2                8                9               72               64              81
        3                9                9               81               81              81
        4                9               16              144               81             256
        5               10               11              110              100             121
        6               13               17              221              169             289
        7               13               23              299              169             529
        8               13               45              585              169            2025
        9               15               19              285              225             361
       10               15               20              300              225             400
       11               16               33              528              256            1089
       12               16               35              560              256            1225
       13               16               16              256              256             256
       14               18               21              378              324             441
       15               19               18              342              361             324
       16               20               20              400              400             400
       17               22               79             1738              484            6241
       18               22               35              770              484            1225
       19               23               41              943              529            1681
       20               23               28              644              529             784
       21               25               62             1550              625            3844
       22               26               35              910              676            1225
       23               28               44             1232              784            1936
       24               31               94             2914              961            8836
       25               34               40             1360             1156            1600
           ∑  =        451              777            16671             9413           35299
        mean =        18.04            31.08           666.84           376.52         1411.96
Figure 3: The sums and means of x, y, xy, x² and y² were found and listed. Organizing the
data in this manner made it easier to find the values of S_xy and S_x². The calculations are
shown below.
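As a quick cross-check of the totals in Figure 3, the short Python sketch below recomputes the column sums and means from the data in Figure 1. The names `LEVELS`, `TREES` and `column_sums` are illustrative and not part of the original assessment.

```python
# Figure 1 data: (Farmville level, number of trees) for the 25 surveyed players.
LEVELS = [7, 8, 9, 9, 10, 13, 13, 13, 15, 15, 16, 16, 16,
          18, 19, 20, 22, 22, 23, 23, 25, 26, 28, 31, 34]
TREES = [7, 9, 9, 16, 11, 17, 23, 45, 19, 20, 33, 35, 16,
         21, 18, 20, 79, 35, 41, 28, 62, 35, 44, 94, 40]


def column_sums(xs, ys):
    """Return the sums of x, y, xy, x^2 and y^2 (the bottom row of Figure 3)."""
    return {
        "x": sum(xs),
        "y": sum(ys),
        "xy": sum(x * y for x, y in zip(xs, ys)),
        "x2": sum(x * x for x in xs),
        "y2": sum(y * y for y in ys),
    }


sums = column_sums(LEVELS, TREES)
n = len(LEVELS)
# Each mean is the corresponding sum divided by n (the "mean =" row of Figure 3).
means = {key: value / n for key, value in sums.items()}

print(sums)  # {'x': 451, 'y': 777, 'xy': 16671, 'x2': 9413, 'y2': 35299}
print(means)
```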


$$\sum x = 451 \qquad \sum y = 777 \qquad \sum xy = 16671 \qquad \sum x^2 = 9413 \qquad n = 25$$


To find the average of x:

$$\bar{x} = \frac{\sum x}{n} = \frac{451}{25} = 18.04$$

To find the average of y:

$$\bar{y} = \frac{\sum y}{n} = \frac{777}{25} = 31.08$$


To find $S_{xy}$:

$$S_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{16671}{25} - (18.04)(31.08) \approx 106.16$$

To find $S_x^2$:

$$S_x^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{9413}{25} - 18.04^2 \approx 51.08$$

To find the equation of the line of linear regression, substitute $\bar{y} = 31.08$, $\bar{x} = 18.04$,
$S_{xy} = 106.16$ and $S_x^2 = 51.08$:

$$y - \bar{y} = \frac{S_{xy}}{S_x^2}(x - \bar{x})$$

$$y - 31.08 = \frac{106.16}{51.08}(x - 18.04)$$

$$y - 31.08 = 2.078x - 37.493$$

$$y = 2.078x - 6.413$$
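The hand calculation above can be reproduced with a minimal Python sketch, assuming the divide-by-n forms of $S_{xy}$ and $S_x^2$ used in the text. The helper name `regression_line` is hypothetical.

```python
# Figure 1 data: Farmville level (x) and number of trees (y).
LEVELS = [7, 8, 9, 9, 10, 13, 13, 13, 15, 15, 16, 16, 16,
          18, 19, 20, 22, 22, 23, 23, 25, 26, 28, 31, 34]
TREES = [7, 9, 9, 16, 11, 17, 23, 45, 19, 20, 33, 35, 16,
         21, 18, 20, 79, 35, 41, 28, 62, 35, 44, 94, 40]


def regression_line(xs, ys):
    """Least-squares line of y on x, built from S_xy / S_x^2 as in the text.

    S_xy  = (sum of xy) / n - mean(x) * mean(y)   (covariance, divisor n)
    S_x^2 = (sum of x^2) / n - mean(x)^2          (variance, divisor n)
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    s_x2 = sum(x * x for x in xs) / n - x_bar ** 2
    slope = s_xy / s_x2
    # The line passes through (x_bar, y_bar), so the intercept follows directly.
    intercept = y_bar - slope * x_bar
    return slope, intercept, s_xy, s_x2


slope, intercept, s_xy, s_x2 = regression_line(LEVELS, TREES)
print(f"S_xy = {s_xy:.2f}, S_x^2 = {s_x2:.2f}")             # S_xy = 106.16, S_x^2 = 51.08
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")  # slope = 2.0783, intercept = -6.4127
```

On Python 3.10 or later, `statistics.linear_regression(LEVELS, TREES)` should give the same slope and intercept, since the divisor cancels in the ratio $S_{xy}/S_x^2$.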



The correlation between the two variables can also be found using Pearson’s correlation
coefficient, r. If r = 1, the x and y values are perfectly positively correlated; if r = 0, there is no
linear correlation between x and y; and if r = -1, x and y are perfectly negatively correlated.
Calculating the correlation coefficient therefore measures the strength of the linear relationship
between x and y.

Pearson’s Correlation Coefficient Formula:

The formula for finding the correlation coefficient is

$$r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}}$$

Most of the values have already been determined while finding the linear regression line
equation.
To find the correlation coefficient, r, substitute $n = 25$, $\sum xy = 16671$, $\sum x^2 = 9413$,
$\sum y^2 = 35299$, $\bar{x} = 18.04$, $\bar{y} = 31.08$, $\bar{x}^2 = 325.44$ and $\bar{y}^2 = 965.97$:

$$r = \frac{16671 - 25 \cdot 18.04 \cdot 31.08}{\sqrt{(9413 - 25 \cdot 325.44)(35299 - 25 \cdot 965.97)}}$$

$$r = 0.70334$$

$$r^2 \approx 0.4947$$
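A minimal Python sketch of the same summation formula for r, using the Figure 1 data; `pearson_r` is an illustrative name, not a library function.

```python
import math

# Figure 1 data: Farmville level (x) and number of trees (y).
LEVELS = [7, 8, 9, 9, 10, 13, 13, 13, 15, 15, 16, 16, 16,
          18, 19, 20, 22, 22, 23, 23, 25, 26, 28, 31, 34]
TREES = [7, 9, 9, 16, 11, 17, 23, 45, 19, 20, 33, 35, 16,
         21, 18, 20, 79, 35, 41, 28, 62, 35, 44, 94, 40]


def pearson_r(xs, ys):
    """Pearson's r from sums: (sum xy - n*x_bar*y_bar) / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    denominator = math.sqrt(
        (sum(x * x for x in xs) - n * x_bar ** 2)
        * (sum(y * y for y in ys) - n * y_bar ** 2)
    )
    return numerator / denominator


r = pearson_r(LEVELS, TREES)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")  # r = 0.7033, r^2 = 0.4947
```

On Python 3.10 or later, `statistics.correlation(LEVELS, TREES)` should return the same value.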


The correlation coefficient rounds to 0.703, which indicates a moderate, positive correlation
between x and y. The positive r value means that as the level of a Farmville player (x) increases,
the number of trees (y) tends to increase as well. The graph also shows this positive relationship.
However, some data points, such as (22, 79) and (31, 94), do not cluster as closely to the trend
line as the other data points. These points are considered outliers. They might arise because
every player is free to purchase a wide variety of items other than trees (animals, seeds,
decorations, buildings, etc.), and not all players have the same desire to purchase trees. Parallel
boxplots can be used to display some of the descriptive statistics of the number of trees owned by
lower-level and higher-level players.
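The text does not state how the outliers were identified. One common rule of thumb, assumed here rather than taken from the original, flags any value lying more than 1.5 × IQR below the first quartile or above the third quartile. The sketch applies it to the number of trees only (a univariate check, not a measure of distance from the trend line), with quartiles taken as medians of the lower and upper halves of the sorted data, the convention that reproduces the quartiles in Figure 4. Under these assumptions it flags the same two points; `quartiles` and `iqr_outliers` are hypothetical helper names.

```python
from statistics import median

# Figure 1 data: Farmville level and number of trees.
LEVELS = [7, 8, 9, 9, 10, 13, 13, 13, 15, 15, 16, 16, 16,
          18, 19, 20, 22, 22, 23, 23, 25, 26, 28, 31, 34]
TREES = [7, 9, 9, 16, 11, 17, 23, 45, 19, 20, 33, 35, 16,
         21, 18, 20, 79, 35, 41, 28, 62, 35, 44, 94, 40]


def quartiles(values):
    """Q1, median, Q3 as medians of the lower and upper halves.

    Assumed convention: for an odd count, the overall median is
    excluded from both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(data), median(upper)


def iqr_outliers(points, k=1.5):
    """Points whose tree count lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quartiles([trees for _, trees in points])
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [(level, trees) for level, trees in points if not low <= trees <= high]


points = list(zip(LEVELS, TREES))
print(quartiles(TREES))      # (16.5, 23, 40.5)
print(iqr_outliers(points))  # [(22, 79), (31, 94)]
```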




The parallel boxplots will present a visual comparison of the distributions of the data as well as
the descriptive statistics: median, range, interquartile range, minimum, and maximum. The spread
of the number of trees owned by the ten lowest-ranking Farmville players surveyed (levels 7-15)
will be compared with that of the fifteen highest-ranking players (levels 16-34). It is predicted that
the lower-level players will own fewer trees and the higher-level players will own more, but the
two groups may overlap.

                         Figure 4: Number of Trees for Levels 7-15 and 16-34
                            Statistic      Levels 7-15      Levels 16-34
                            Minimum             7                16
                            Quartile 1          9                21
                            Median             16.5              35
                            Quartile 3         20                44
                            Maximum            45                94
Figure 4: This table shows the five-number summaries of the number of trees for each level group.
These values are displayed in the box and whisker plot.
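The five-number summaries in Figure 4 can be reproduced with the sketch below. It assumes the quartile convention that matches the table: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the overall median excluded when the group size is odd. Other conventions (for example, spreadsheet quartile functions that interpolate) can give slightly different quartiles. The name `five_number_summary` is hypothetical.

```python
from statistics import median

# Figure 1 data: Farmville level and number of trees.
LEVELS = [7, 8, 9, 9, 10, 13, 13, 13, 15, 15, 16, 16, 16,
          18, 19, 20, 22, 22, 23, 23, 25, 26, 28, 31, 34]
TREES = [7, 9, 9, 16, 11, 17, 23, 45, 19, 20, 33, 35, 16,
         21, 18, 20, 79, 35, 41, 28, 62, 35, 44, 94, 40]


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum (median-of-halves quartiles)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd count, skip the middle value so it is in neither half.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return {"min": data[0], "Q1": median(lower), "median": median(data),
            "Q3": median(upper), "max": data[-1]}


low_group = [t for lvl, t in zip(LEVELS, TREES) if lvl <= 15]   # levels 7-15
high_group = [t for lvl, t in zip(LEVELS, TREES) if lvl >= 16]  # levels 16-34
print(five_number_summary(low_group))
# {'min': 7, 'Q1': 9, 'median': 16.5, 'Q3': 20, 'max': 45}
print(five_number_summary(high_group))
# {'min': 16, 'Q1': 21, 'median': 35, 'Q3': 44, 'max': 94}
```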

                                   Figure 5: Box and Whisker Plot
[Parallel box plots of the number of trees (axis 0 to 100) for Levels 7-15 and Levels 16-34; legend: Minimum, Quartile 1, Median, Quartile 3, Maximum]

Figure 5: The box and whisker plot compares the spread of the number of trees owned by
lower-level and higher-level Farmville players. The middle fifty percent of the higher-level players
surveyed own from 21 to 44 trees, whereas the middle fifty percent of the lower-level players own
from 9 to 20 trees. Some beginner players, however, own as many trees as the higher-level players.

Comparing the descriptive statistics for the two groups shows that, while higher-ranking players
tend to have more trees, lower-ranking players can still match or exceed them in the number of
trees owned. This can be seen on the plot: the top twenty-five percent of the lower-level players
(20 to 45 trees) span roughly the same range as the higher-level group’s middle fifty percent
(21 to 44 trees). However, the higher-level group’s median (35) is greater than the lower-level
group’s third quartile (20), which suggests that a typical higher-level player owns more trees than
most of the beginner players.

A chi-squared test will now be performed to determine whether the number of trees a player has
and the player's level in the game are independent or dependent. The equation for the chi-squared
statistic is

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where $f_o$ is the observed frequency and $f_e$ is the expected frequency. Contingency tables will be
constructed to show the results of the 25 surveyed players: one table displays the observed
values, and another displays the expected values.
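The observed table can be tallied directly from the Figure 1 data with a short Python sketch. It assumes that the "7-30" column means at most 30 trees and that the level groups are 7-15 and 16-34, as in the text; `contingency_table` and `expected_table` are illustrative names. Counting this way places the level-13 player with 45 trees in the "levels 7-15, more than 30 trees" cell.

```python
# Figure 1 data: Farmville level and number of trees for each of the 25 players.
LEVELS = [7, 8, 9, 9, 10, 13, 13, 13, 15, 15, 16, 16, 16,
          18, 19, 20, 22, 22, 23, 23, 25, 26, 28, 31, 34]
TREES = [7, 9, 9, 16, 11, 17, 23, 45, 19, 20, 33, 35, 16,
         21, 18, 20, 79, 35, 41, 28, 62, 35, 44, 94, 40]


def contingency_table(levels, trees, level_cut=15, tree_cut=30):
    """Observed 2 x 2 counts.

    Row 0: level <= level_cut (levels 7-15); row 1: higher levels (16-34).
    Column 0: trees <= tree_cut (7-30 trees); column 1: more than tree_cut.
    """
    table = [[0, 0], [0, 0]]
    for level, count in zip(levels, trees):
        row = 0 if level <= level_cut else 1
        col = 0 if count <= tree_cut else 1
        table[row][col] += 1
    return table


def expected_table(observed):
    """Expected frequency of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = contingency_table(LEVELS, TREES)
print(observed)                  # [[9, 1], [5, 10]]
print(expected_table(observed))  # [[5.6, 4.4], [8.4, 6.6]]
```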

Observed values table:

                                          Trees
                                 7-30      >30      Total
              Level    7-15        9         1        10
                       16-34       5        10        15
                       Total      14        11        25

Expected values table:

                                          Trees
                                 7-30      >30      Total
              Level    7-15       5.6       4.4       10
                       16-34      8.4       6.6       15
                       Total      14        11        25

To find the expected value (for the cell levels 7-15, trees 7-30):

$$f_e = \frac{\text{row total} \times \text{column total}}{\text{grand total}} = \frac{10 \cdot 14}{25} = 5.6$$

Before performing the chi-squared test, the null and alternative hypotheses are formed, the
degrees of freedom are calculated, and the significance level is stated.
H0 (null hypothesis): game level and number of trees are independent.
H1 (alternative hypothesis): game level and number of trees are not independent.
There is 1 degree of freedom.
At the 5% (0.05) significance level with df = 1, the critical value is $\chi^2_{0.05} = 3.84$.

To find the degrees of freedom for a 2 x 2 contingency table:
df = (r-1)(c-1)
df = (2-1)(2-1)
df = 1

Using the contingency tables, χ² is found with the equation quoted above. The table below
organizes the values needed for the calculation.

                                      Figure 6: χ² Calculation
              fo        fe        fo − fe      (fo − fe)²      (fo − fe)²/fe
               9        5.6          3.4          11.56         2.064285714
               1        4.4         -3.4          11.56         2.627272727
               5        8.4         -3.4          11.56         1.376190476
              10        6.6          3.4          11.56         1.751515152
                                                 Total =        7.819264069
                   Figure 6: This table shows how the chi-squared value was found.

χ² ≈ 7.82
Because χ² ≈ 7.82 is greater than the critical value of 3.84, we reject the null hypothesis that a
Farmville player’s level and number of trees are independent.
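The chi-squared statistic can be checked with the Python sketch below. The observed counts are tallied from Figure 1 (levels 7-15 vs 16-34, at most 30 vs more than 30 trees), giving 9, 1, 5 and 10; the function name `chi_squared` is illustrative. If SciPy is available, `scipy.stats.chi2_contingency(observed, correction=False)` should return the same statistic; its default `correction=True` applies Yates' continuity correction when there is one degree of freedom, which the hand calculation does not use.

```python
def chi_squared(observed):
    """Return (chi-squared statistic, degrees of freedom) for a contingency table.

    Expected frequencies are f_e = row total * column total / grand total;
    no continuity correction is applied, matching the hand calculation.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / grand_total
            statistic += (f_o - f_e) ** 2 / f_e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Critical value for the 5% significance level with df = 1, as quoted in the text.
CRITICAL_VALUE = 3.84

# Rows: levels 7-15, 16-34; columns: 7-30 trees, more than 30 trees.
# Counts tallied directly from the Figure 1 data.
observed = [[9, 1], [5, 10]]

statistic, dof = chi_squared(observed)
print(f"chi^2 = {statistic:.2f}, df = {dof}")  # chi^2 = 7.82, df = 1
print("reject H0" if statistic > CRITICAL_VALUE else "do not reject H0")  # reject H0
```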


According to the scatter plot and the line of linear regression, there is a positive relationship
between the number of trees a Farmville player has and the player's level in the game. Pearson’s
correlation coefficient (r ≈ 0.703) indicates a moderate positive correlation between the two
variables. As noted in the introduction, higher-level players tend to have more “FarmCoins”, which
could allow them to purchase more trees, while lower-level players and beginners may be more
likely to buy smaller, cheaper plants. The boxplots also showed that higher-level players tend to
own more trees, but that some lower-level players own more trees than some high-level players.
The chi-squared test provided evidence, at the 5% significance level, that a Farmville player’s level
and the number of trees they own are not independent. Together with the positive correlation, this
suggests that players tend to own more trees as they rise in level.

A couple of data points did not cluster as closely to the linear regression line as the other data
points did; these are considered outliers. Each player has the freedom to spend “FarmCoins” on
various items for their farm, such as animals, seeds, and decorations, and not all players are
interested in buying the same items; some may buy more trees than seeds or animals. To determine
whether these outliers significantly affect the results, the chi-squared test will be performed again
with the outliers removed. The table below displays the data without the two outliers, (22, 79)
and (31, 94).

                               Figure 7: Data without Outliers
                             Farmville Level Number of Trees
                                     7                    7
                                     8                    9
                                     9                    9
                                     9                   16
                                    10                   11
                                    13                   17
                                    13                   23
                                    13                   45
                                    15                   19
                                    15                   20
                                    16                   33
                                    16                   35
                                    16                   16
                                    18                   21
                                    19                   18
                                    20                   20
                                    22                   35
                                    23                   41
                                    23                   28
                                    25                   62
                                    26                   35
                                    28                   44
                                    34                   40
             Figure 7: This data will be used to perform a second chi-squared test.

Observed values table:

                                          Trees
                                 7-30      >30      Total
              Level    7-15        9         1        10
                       16-34       5         8        13
                       Total      14         9        23

Expected values table:

                                          Trees
                                 7-30        >30        Total
              Level    7-15     6.086957    3.913043      10
                       16-34    7.913043    5.086957      13
                       Total    14          9             23

H0 (null hypothesis): game level and number of trees are independent.
H1 (alternative hypothesis): game level and number of trees are not independent.
There is 1 degree of freedom.
At the 5% (0.05) significance level with df = 1, the critical value is $\chi^2_{0.05} = 3.84$.

Using the contingency tables, χ² is found with the same chi-squared equation as before. The table
below organizes the values needed for the calculation.

                             Figure 8: χ² Calculation without Outliers
              fo        fe          fo − fe      (fo − fe)²      (fo − fe)²/fe
               9     6.086957     2.913043       8.485822         1.394099
               1     3.913043    -2.913043       8.485822         2.168599
               5     7.913043    -2.913043       8.485822         1.072384
               8     5.086957     2.913043       8.485822         1.668153
                                                  Total =         6.303235
                  Figure 8: This table shows how the chi-squared value was found.

χ² ≈ 6.30
Because χ² ≈ 6.30 is greater than 3.84, we again reject the null hypothesis that a Farmville
player’s level and number of trees are independent. Removing the two outliers lowered the
chi-squared value but did not change the conclusion of the test, which suggests that the outliers
did not significantly affect the outcome of the analysis.
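For a 2 × 2 table, the same statistic (without continuity correction) can also be computed with the shortcut $\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$, where a, b, c and d are the four observed counts read row by row. The sketch below, with illustrative helper names, removes the two outliers from the Figure 1 data and applies this shortcut to both the full and the reduced data.

```python
# Figure 1 data: Farmville level and number of trees.
LEVELS = [7, 8, 9, 9, 10, 13, 13, 13, 15, 15, 16, 16, 16,
          18, 19, 20, 22, 22, 23, 23, 25, 26, 28, 31, 34]
TREES = [7, 9, 9, 16, 11, 17, 23, 45, 19, 20, 33, 35, 16,
         21, 18, 20, 79, 35, 41, 28, 62, 35, 44, 94, 40]
OUTLIERS = {(22, 79), (31, 94)}  # the two points identified in the text


def two_by_two_counts(points, level_cut=15, tree_cut=30):
    """Counts (a, b, c, d): rows levels <= 15 / >= 16, columns trees <= 30 / > 30."""
    a = sum(1 for lvl, t in points if lvl <= level_cut and t <= tree_cut)
    b = sum(1 for lvl, t in points if lvl <= level_cut and t > tree_cut)
    c = sum(1 for lvl, t in points if lvl > level_cut and t <= tree_cut)
    d = sum(1 for lvl, t in points if lvl > level_cut and t > tree_cut)
    return a, b, c, d


def chi_squared_2x2(a, b, c, d):
    """Shortcut for a 2 x 2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


all_points = list(zip(LEVELS, TREES))
trimmed = [p for p in all_points if p not in OUTLIERS]

print(two_by_two_counts(trimmed))                                 # (9, 1, 5, 8)
print(f"{chi_squared_2x2(*two_by_two_counts(all_points)):.2f}")  # 7.82
print(f"{chi_squared_2x2(*two_by_two_counts(trimmed)):.2f}")     # 6.30
```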

Weitere ähnliche Inhalte

Andere mochten auch

TrackEat (Food processing tracking)
TrackEat (Food processing tracking)TrackEat (Food processing tracking)
TrackEat (Food processing tracking)paolo aldera
 
10 Imperatives for Charting a New B2B Marketing Course
10 Imperatives for Charting a New B2B Marketing Course10 Imperatives for Charting a New B2B Marketing Course
10 Imperatives for Charting a New B2B Marketing CourseMarketingProfs
 
What You Need to Know About Paid Search [Visual Summary]
What You Need to Know About Paid Search [Visual Summary]What You Need to Know About Paid Search [Visual Summary]
What You Need to Know About Paid Search [Visual Summary]MarketingProfs
 
Fm parfüm katalógus
Fm parfüm katalógusFm parfüm katalógus
Fm parfüm katalógusEvi Horvath
 
Web e promozione della salute
Web e promozione della saluteWeb e promozione della salute
Web e promozione della saluteMarco Vagnozzi
 
Presentatie LSV Joeri van Steenhoven
Presentatie LSV Joeri van SteenhovenPresentatie LSV Joeri van Steenhoven
Presentatie LSV Joeri van SteenhovenEveline van der Grift
 

Andere mochten auch (11)

Curriculum
CurriculumCurriculum
Curriculum
 
TrackEat (Food processing tracking)
TrackEat (Food processing tracking)TrackEat (Food processing tracking)
TrackEat (Food processing tracking)
 
10 Imperatives for Charting a New B2B Marketing Course
10 Imperatives for Charting a New B2B Marketing Course10 Imperatives for Charting a New B2B Marketing Course
10 Imperatives for Charting a New B2B Marketing Course
 
Why Captiv8?
Why Captiv8?Why Captiv8?
Why Captiv8?
 
What You Need to Know About Paid Search [Visual Summary]
What You Need to Know About Paid Search [Visual Summary]What You Need to Know About Paid Search [Visual Summary]
What You Need to Know About Paid Search [Visual Summary]
 
Data 2
Data 2Data 2
Data 2
 
Fm parfüm katalógus
Fm parfüm katalógusFm parfüm katalógus
Fm parfüm katalógus
 
Web e promozione della salute
Web e promozione della saluteWeb e promozione della salute
Web e promozione della salute
 
Festa kynning okt 2013
Festa kynning okt 2013Festa kynning okt 2013
Festa kynning okt 2013
 
Web e prevenzione
Web e prevenzioneWeb e prevenzione
Web e prevenzione
 
Presentatie LSV Joeri van Steenhoven
Presentatie LSV Joeri van SteenhovenPresentatie LSV Joeri van Steenhoven
Presentatie LSV Joeri van Steenhoven
 

Ähnlich wie Math ia farmville final

Scatter Diagrams
Scatter DiagramsScatter Diagrams
Scatter DiagramsNat Evans
 
Ch2.8 Display Data
Ch2.8 Display DataCh2.8 Display Data
Ch2.8 Display Datamdicken
 
Analyse and intepretation of test scores
Analyse and intepretation of test scoresAnalyse and intepretation of test scores
Analyse and intepretation of test scoresNik Bahirah
 
Output primitives computer graphics c version
Output primitives   computer graphics c versionOutput primitives   computer graphics c version
Output primitives computer graphics c versionMarwa Al-Rikaby
 
Graphs in physics
Graphs in physicsGraphs in physics
Graphs in physicssimonandisa
 

Ähnlich wie Math ia farmville final (7)

Scatter diagrams
Scatter diagramsScatter diagrams
Scatter diagrams
 
Scatter Diagrams
Scatter DiagramsScatter Diagrams
Scatter Diagrams
 
Ch2.8 Display Data
Ch2.8 Display DataCh2.8 Display Data
Ch2.8 Display Data
 
Scatter Diagrams
Scatter DiagramsScatter Diagrams
Scatter Diagrams
 
Analyse and intepretation of test scores
Analyse and intepretation of test scoresAnalyse and intepretation of test scores
Analyse and intepretation of test scores
 
Output primitives computer graphics c version
Output primitives   computer graphics c versionOutput primitives   computer graphics c version
Output primitives computer graphics c version
 
Graphs in physics
Graphs in physicsGraphs in physics
Graphs in physics
 

Math ia farmville final

  • 1. Tenorio 1 IB Math SL Internal Assessment: Farmville Statistics Arielle Tenorio Period 6 Farmville is a popular computer game that is hosted by the social networking website, Facebook. This game allows players to manage a virtual farm by plowing, planting, growing, and harvesting on their virtual farmland. Crops, trees, and livestock can be purchased with the “FarmCoins” that are earned by harvesting. There are also levels in this game that are achieved by reaching a certain amount of experience points. Players of higher levels tend to have larger farms, more crops, and more FarmCoins than those of lower levels. This assignment will examine the relationship between the number of trees a Farmville player has and what level they are on in the game. It is predicted that there will be a positive relationship. This assumption can be confirmed or denied by analyzing and processing collected data. First, a scatter plot will be produced with a line of linear regression to display the trend of the data. The correlation coefficient value for the two variables will also be determined. A box and whisker plot will compare the highest-ranking players out of those surveyed and the lowest- ranking and the number of trees that both groups tend to own. A chi-squared test will test for independence to find if the two factors occur as a result of one another or is they are unrelated events.
  • 2. Tenorio 2 Data samples were collected from 25 random Farmville players after logging onto Facebook and opening the Farmville game. After visiting the virtual farms of 25 “Friends” and counting the number of trees on each farm, a table was drawn up to organize the collected values. Figure 1: Collected Data # Farmville Level Number of Trees 1 7 7 2 8 9 3 9 9 4 9 16 5 10 11 6 13 17 7 13 23 8 13 45 9 15 19 10 15 20 11 16 33 12 16 35 13 16 16 14 18 21 15 19 18 16 20 20 17 22 79 18 22 35 19 23 41 20 23 28 21 25 62 22 26 35 23 28 44 24 31 94 25 34 40 Figure 1: This table displays the data that was collected. From the table, it can be observed that the number of trees generally increases as the level increases. The values on this table will be generated onto a scatter plot.
  • 3. Tenorio 3 A scatter plot is used to visually display the relationship between two variables on a two- dimensional graph. A line of linear regression, or trend line, can be found to confirm the observation of the relationship. A correlation between the variables occurs as a result of the clustering of data points around the trend line. Figure 2: Scatter Plot and Linear Regression Line The Relationship Between Level and 100 Number of Trees 90 80 70 Number of Trees 60 50 40 30 20 y = 2.0783x - 6.4127 10 0 0 5 10 15 20 25 30 35 40 Farmville Level Figure 2: This scatter plot shows a positive relationship between the level of Farmville and the number of trees a player has. The line of linear regression is produced by using Microsoft Excel. The calculations to find this equation manually is produced below. Line of Linear Regression: The formula for finding the linear regression line for y on x is S xy y − y = 2 (x − x) Sx where y is the average of Y variables, x is the average of X variables, Sxy is the covariance of X and Y and Sx2 is the standard deviation of X, squared. In order to find these values, the data was organized into a table, below.
  • 4. Tenorio 4 Figure 3: Table for Linear Regression Line # Level (x) Trees (y) xy x² y² 1 7 7 49 49 49 2 8 9 72 64 81 3 9 9 81 81 81 4 9 16 144 81 256 5 10 11 110 100 121 6 13 17 221 169 289 7 13 23 299 169 529 8 13 45 585 169 2025 9 15 19 285 225 361 10 15 20 300 225 400 11 16 33 528 256 1089 12 16 35 560 256 1225 13 16 16 256 256 256 14 18 21 378 324 441 15 19 18 342 361 324 16 20 20 400 400 400 17 22 79 1738 484 6241 18 22 35 770 484 1225 19 23 41 943 529 1681 20 23 28 644 529 784 21 25 62 1550 625 3844 22 26 35 910 676 1225 23 28 44 1232 784 1936 24 31 94 2914 961 8836 25 34 40 1360 1156 1600 ∑ = 451 777 16671 9413 35299 mean = 18.04 31.08 666.84 376.52 1411.96 Figure 3: The sums and averages of of x, y, xy, x² and y² were found and listed. By organizing the data in this manner, it was easier to quickly find the values for Sxy and Sx2. The calculations are shown below. ∑ x = 451 ∑ y = 777 ∑ xy = 16671 ∑x 2 = 9413 n = 25 To find the average of x: x= ∑ x = 451 = 18.04 n 25 To find the average of y:
$$\bar{y} = \frac{\sum y}{n} = \frac{777}{25} = 31.08$$

To find $S_{xy}$:

$$S_{xy} = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{16671}{25} - (18.04)(31.08) \approx 106.16$$

To find $S_x^2$:

$$S_x^2 = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{9413}{25} - 18.04^2 \approx 51.08$$

To find the equation of the line of linear regression, substitute $\bar{x} = 18.04$, $\bar{y} = 31.08$, $S_{xy} = 106.16$ and $S_x^2 = 51.08$:

$$y - 31.08 = \frac{106.16}{51.08}\,(x - 18.04)$$

$$y - 31.08 = 2.078x - 37.493$$

$$y = 2.078x - 6.413$$

This agrees with the Excel trend line shown in Figure 2.

The correlation between the two variables can also be found using Pearson's correlation coefficient, r. If r = 1, x and y are perfectly positively correlated; if r = 0, x and y are not linearly correlated; if r = −1, x and y are perfectly negatively correlated. Calculating the correlation coefficient therefore measures the degree of linearity between x and y.

Pearson's Correlation Coefficient Formula: The formula for finding the correlation coefficient is
$$r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}}$$

Most of these values were already determined while finding the regression line equation. Substituting $n = 25$, $\sum xy = 16671$, $\sum x^2 = 9413$, $\sum y^2 = 35299$, $\bar{x} = 18.04$, $\bar{y} = 31.08$, $\bar{x}^2 \approx 325.44$ and $\bar{y}^2 \approx 965.97$:

$$r = \frac{16671 - 25 \cdot 18.04 \cdot 31.08}{\sqrt{\left(9413 - 25 \cdot 325.44\right)\left(35299 - 25 \cdot 965.97\right)}} \approx 0.70334, \qquad r^2 \approx 0.49468$$

The correlation value rounds to 0.703, so there is a moderate, positive correlation between x and y. The positive r value means that as the level of a Farmville player (x) increases, the number of trees (y) tends to increase as well, which is consistent with the upward trend in Figure 2. However, it should be noted that some data points, such as (22, 79) and (31, 94), do not cluster as closely around the trend line as the others. These points are treated as outliers. They may arise because every player is free to purchase a wide variety of items other than trees (animals, seeds, decorations, buildings, etc.), and not all players have the same desire to purchase trees.

Parallel boxplots can be used to display some of the descriptive statistics of the data. They give a visual comparison of the distributions as well as of their descriptive statistics: median, range, interquartile range, minimum and maximum. The spread of the number of trees owned by the 10 lowest-ranking players surveyed (levels 7-15) will be compared to that of the 15 highest-ranking players (levels 16-34). It is predicted that lower-level players will have fewer trees while higher-level players will have more, although the two groups may overlap.

Figure 4: Number of Trees for Levels 7-15 and 16-34

 Statistic      Levels 7-15   Levels 16-34
 Minimum              7             16
 Quartile 1           9             21
 Median            16.5             35
 Quartile 3          20             44
 Maximum             45             94

Figure 4: This table shows the five-number summaries of the number of trees for each level group. These values are displayed in the box and whisker plot.

[Figure 5: Box and Whisker Plot of Number of Trees for Levels 7-15 and Levels 16-34]

Figure 5: The box and whisker plot compares the spread of the number of trees owned by the two groups of players. The middle fifty percent of the higher-ranking players surveyed own from 21 to 44 trees, whereas the middle fifty percent of the lower-ranking players own from 9 to 20 trees. Some beginner players, however, own as many trees as the higher-level players. Comparing the descriptive statistics of the two groups shows that while higher-ranking players tend to have more trees, lower-ranking players can still own more trees than some higher-ranking players. This can be seen on the plot: the top twenty-five percent of the lower-level group (20 to 45 trees) overlaps the middle fifty percent of the higher-level group (21 to 44 trees). However, the median of the higher-level group (35 trees) is more than twice that of the lower-level group (16.5 trees) and is exceeded by only one lower-level player, which suggests that higher-level players generally own more trees than most beginner players.
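The five-number summaries in Figure 4 can be reproduced with a short Python sketch. It assumes the quartiles are the medians of the lower and upper halves of the sorted data, with the overall median left out of both halves when the count is odd. This convention reproduces the values in Figure 4, but other quartile conventions can give slightly different Q1 and Q3. The names `DATA` and `five_number_summary` are hypothetical.

```python
from statistics import median

# (level, trees) pairs for the 25 surveyed players, copied from Figure 3.
DATA = [
    (7, 7), (8, 9), (9, 9), (9, 16), (10, 11), (13, 17), (13, 23), (13, 45),
    (15, 19), (15, 20), (16, 33), (16, 35), (16, 16), (18, 21), (19, 18),
    (20, 20), (22, 79), (22, 35), (23, 41), (23, 28), (25, 62), (26, 35),
    (28, 44), (31, 94), (34, 40),
]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the two halves."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    # For an odd count, skip the middle value so it belongs to neither half.
    upper = s[half + 1:] if n % 2 else s[half:]
    return s[0], median(lower), median(s), median(upper), s[-1]


# Split the tree counts into the two level groups used in Figure 4.
low_group = [trees for level, trees in DATA if level <= 15]
high_group = [trees for level, trees in DATA if level >= 16]

print("Levels 7-15: ", five_number_summary(low_group))   # (7, 9, 16.5, 20, 45)
print("Levels 16-34:", five_number_summary(high_group))  # (16, 21, 35, 44, 94)
```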
A chi-squared test will now be performed to determine whether the number of trees a player has and the player's level in the game are dependent or independent. The chi-squared statistic is

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where $f_o$ is the observed frequency and $f_e$ is the expected frequency. Contingency tables will be constructed to show the results for the 25 surveyed players: one table displays the observed values and the other displays the expected values.

Observed values table:

                      Trees
 Level         7-30     >30     Total
 7-15             9       1        10
 16-34            5      10        15
 Total           14      11        25

Expected values table:

                      Trees
 Level         7-30     >30     Total
 7-15           5.6     4.4        10
 16-34          8.4     6.6        15
 Total           14      11        25

Each expected value is (row total × column total) / n. For example, for the cell with levels 7-15 and 7-30 trees:

$$f_e = \frac{10 \cdot 14}{25} = 5.6$$

Before performing the chi-squared test, the null and alternative hypotheses are formed, the degrees of freedom are calculated, and the significance level is stated.

H0 (null hypothesis): game level and number of trees are independent.
H1 (alternative hypothesis): game level and number of trees are not independent.

To find the degrees of freedom for a 2 × 2 contingency table:

$$df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$$

At the 5% (0.05) significance level with df = 1, the critical value is $\chi^2_{0.05} = 3.84$.

Using the contingency tables, χ² is found with the equation above. The table below organizes the values needed for the calculation.

Figure 6: χ² Calculation

  f_o    f_e    f_o − f_e   (f_o − f_e)²   (f_o − f_e)²/f_e
   9     5.6       3.4          11.56          2.064286
   1     4.4      −3.4          11.56          2.627273
   5     8.4      −3.4          11.56          1.376190
  10     6.6       3.4          11.56          1.751515
                                      Total =  7.819264

Figure 6: This table shows how the chi-squared value was found. χ² ≈ 7.82

Because χ² ≈ 7.82 is greater than the critical value of 3.84, the null hypothesis, which states that a Farmville player's level and number of trees are independent, is rejected.

According to the scatter plot and the line of linear regression, there is a positive relationship between the number of trees a Farmville player has and the player's level in the game. Pearson's correlation coefficient (r ≈ 0.703) shows a moderate correlation between the two variables. As noted in the introduction, this could be because more experienced players tend to have more "FarmCoins" with which to purchase trees, while lower-level players and beginners may be more likely to buy smaller, cheaper plants. The box plot also showed that higher-level players tend to own more trees, but that some lower-level players own more trees than some higher-level players. The chi-squared test showed that the two factors are not independent. The level of a Farmville player and the number of trees the player owns are therefore dependent, and their positive correlation suggests that players tend to own more trees as they rise in level.

A couple of data samples did not cluster as closely around the linear regression line as the other data points did; these data points are considered outliers. Each player is free to spend "FarmCoins" on various items for the farm, such as animals, seeds, and decorations, and not all players are interested in buying the same items for their virtual farm. Some players may buy more trees than seeds or animals.
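The observed and expected tables and the χ² statistic for the full data set can be checked with the Python sketch below, which tallies the contingency table directly from the 25 pairs in Figure 3 (levels 7-15 versus 16-34, trees 7-30 versus more than 30). The helper names are hypothetical, the critical value 3.84 is taken from the text rather than computed, and, like the hand calculation, no continuity (Yates) correction is applied.

```python
# (level, trees) pairs for the 25 surveyed players, copied from Figure 3.
DATA = [
    (7, 7), (8, 9), (9, 9), (9, 16), (10, 11), (13, 17), (13, 23), (13, 45),
    (15, 19), (15, 20), (16, 33), (16, 35), (16, 16), (18, 21), (19, 18),
    (20, 20), (22, 79), (22, 35), (23, 41), (23, 28), (25, 62), (26, 35),
    (28, 44), (31, 94), (34, 40),
]

CRITICAL_5PCT_DF1 = 3.84  # chi-squared critical value at the 5% level, df = 1


def contingency_table(pairs):
    """Tally a 2x2 table: rows are levels 7-15 / 16-34, columns are trees 7-30 / >30."""
    table = [[0, 0], [0, 0]]
    for level, trees in pairs:
        row = 0 if level <= 15 else 1
        col = 0 if trees <= 30 else 1
        table[row][col] += 1
    return table


def chi_squared(observed):
    """Return (chi2, expected) for a contingency table, without continuity correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency = row total * column total / n.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return chi2, expected


observed = contingency_table(DATA)
chi2, expected = chi_squared(observed)
print("observed:", observed)  # [[9, 1], [5, 10]]
print("expected:", expected)
print(f"chi-squared = {chi2:.4f}, reject H0: {chi2 > CRITICAL_5PCT_DF1}")
```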
To determine whether these outliers significantly affect the result, the chi-squared test will be repeated with the outliers removed. The table below displays the data samples without the two outliers, (22, 79) and (31, 94).
Figure 7: Data without Outliers

 Farmville Level   Number of Trees
        7                 7
        8                 9
        9                 9
        9                16
       10                11
       13                17
       13                23
       13                45
       15                19
       15                20
       16                33
       16                35
       16                16
       18                21
       19                18
       20                20
       22                35
       23                41
       23                28
       25                62
       26                35
       28                44
       34                40

Figure 7: This data will be used to perform a second chi-squared test.

Observed values table:

                      Trees
 Level         7-30     >30     Total
 7-15             9       1        10
 16-34            5       8        13
 Total           14       9        23

Expected values table:

                       Trees
 Level          7-30        >30       Total
 7-15         6.086957    3.913043      10
 16-34        7.913043    5.086957      13
 Total           14           9         23

H0 (null hypothesis): game level and number of trees are independent.
H1 (alternative hypothesis): game level and number of trees are not independent.

There is 1 degree of freedom. At the 5% (0.05) significance level with df = 1, the critical value is $\chi^2_{0.05} = 3.84$.

Using the contingency tables, χ² is found with the same equation as before. The table below organizes the values needed for the calculation.

Figure 8: χ² Calculation without Outliers

  f_o      f_e       f_o − f_e    (f_o − f_e)²   (f_o − f_e)²/f_e
   9     6.086957     2.913043      8.485822         1.394099
   1     3.913043    −2.913043      8.485822         2.168599
   5     7.913043    −2.913043      8.485822         1.072384
   8     5.086957     2.913043      8.485822         1.668153
                                          Total =    6.303236

Figure 8: This table shows how the chi-squared value was found. χ² ≈ 6.30

Because χ² ≈ 6.30 is greater than 3.84, the null hypothesis that a Farmville player's level and number of trees are independent is again rejected. Although removing the two outliers lowered χ², the result remains significant at the 5% level, so the outliers did not change the conclusion of the test.
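For a 2 × 2 table, the sum $\sum (f_o - f_e)^2 / f_e$ is algebraically equal to the closed form $\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$, where a, b, c and d are the four observed counts read row by row. The Python sketch below applies this shortcut to the 23 pairs in Figure 7 as an independent check on the second test. The names `DATA_NO_OUTLIERS` and `chi_squared_2x2` are hypothetical, and the critical value 3.84 is taken from the text.

```python
# (level, trees) pairs with the outliers (22, 79) and (31, 94) removed, as in Figure 7.
DATA_NO_OUTLIERS = [
    (7, 7), (8, 9), (9, 9), (9, 16), (10, 11), (13, 17), (13, 23), (13, 45),
    (15, 19), (15, 20), (16, 33), (16, 35), (16, 16), (18, 21), (19, 18),
    (20, 20), (22, 35), (23, 41), (23, 28), (25, 62), (26, 35), (28, 44),
    (34, 40),
]

CRITICAL_5PCT_DF1 = 3.84  # chi-squared critical value at the 5% level, df = 1


def chi_squared_2x2(a, b, c, d):
    """Uncorrected chi-squared for the table [[a, b], [c, d]] via the closed form."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Tally the four cells: rows are levels 7-15 / 16-34, columns are trees 7-30 / >30.
a = sum(1 for lvl, t in DATA_NO_OUTLIERS if lvl <= 15 and t <= 30)
b = sum(1 for lvl, t in DATA_NO_OUTLIERS if lvl <= 15 and t > 30)
c = sum(1 for lvl, t in DATA_NO_OUTLIERS if lvl > 15 and t <= 30)
d = sum(1 for lvl, t in DATA_NO_OUTLIERS if lvl > 15 and t > 30)

chi2 = chi_squared_2x2(a, b, c, d)
print((a, b, c, d), f"chi-squared = {chi2:.4f}", chi2 > CRITICAL_5PCT_DF1)
```

The closed form avoids computing the expected table explicitly, which makes it a convenient check on hand-rounded expected values.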