STATISTICS STEP2

_____________________
TYPES OF EPIDEMIOLOGICAL STUDIES:
________________________________
. Epidemiological studies have 2 main objectives: descriptive and analytic.
. Descriptive epidemiology deals with rates, ratios and distributions; it describes the disease in terms of time, place and person.
. Analytical epidemiological studies consist of observational studies and experimental studies.
. Observational studies include: case-control, cohort and cross-sectional studies.
. CASE CONTROL STUDY (Retrospective study):
__________________________________________
. The movement is from the effect (disease) back to the cause (exposure).
. The researcher begins with a population with a certain outcome, and subjects
are classified into either "cases" or "controls" based on the outcome status.
. The cases and controls are assessed retrospectively for the presence of risk factors (information is collected about exposure to risk factors).
. It is very popular for exploring an exposure-disease association.
. Selection of control subjects based on exposure status (only exposed or only non-exposed non-diseased subjects) is inappropriate,
  because comparing the frequency of exposure between the case and control groups is an important part of a case-control study.
. The optimal control group provides an accurate estimate of exposure frequency among the non-diseased general population (both exposed and non-exposed).
. Independent variables (age, sex,...) are often selected to be the same (match
ed) between the case and control groups to decrease the effect of confounding.
. Subjects with the disease of interest (case group) are compared with an otherwise similar group that is disease free (control group).
. It is a retrospective study aiming at determining the association between risk factors and disease occurrence.
. The main measure of association is the exposure odds ratio, which can be calculated in a case-control study; the incidence of the disease can't.
. One of the drawbacks of a case-control study is that the risk cannot be derived directly from its results.
. It is cheaper and easier than a cohort study.
.N.B.:
- Incidence measures (e.g. relative risk or relative rate) can't be directly measured in a case-control study,
- because the people being studied are those who have already developed the disease.
- Relative risk and relative rate are calculated in cohort studies, where people are followed over time for the occurrence of the disease.
- The prevalence odds ratio is calculated in cross-sectional studies to compare the prevalence of the disease in different populations.
. A Prospective or longitudinal COHORT STUDY:
__________________________________________
. Divides the study group into "exposed" and "non-exposed" to the risk factors.
. Each subject is then followed prospectively until the disease develops.
. It is a prospective observational study in which groups are chosen based upon the presence or absence of one or more risk factors.
. All subjects are then observed over time for the development of the disease of interest,
. thus allowing estimation of the incidence within the total population and comparison of incidences between subgroups.
. It is best for determining the incidence of the disease and comparing the incidence of the disease in 2 populations
  (one with and one without a given risk factor), which allows calculation of a relative risk.
. It is stronger than a case-control study and a cross-sectional study.
. Loss to follow-up in prospective studies creates a potential for selection bias (selective loss of high-risk or low-risk subjects).
. e.g. if a substantial number of subjects are lost to follow-up in the exposed and/or unexposed groups,
. it is possible that the lost subjects differ from the remaining subjects in their risk of developing the outcome.
. Such loss may result in either overestimation or underestimation of the association between exposure and the disease.
. Example: if 30% of subjects were lost to follow-up in a prospective study of the relation between alcohol and breast cancer,
. there is no information available on whether these subjects developed breast cancer or not.
. The number (30%) is substantial and will influence the outcome if heterogeneity in developing breast cancer exists between the lost subjects and the remaining subjects.
. For example, if the subjects lost in the exposed group experienced more breast cancer than those with follow-up (selective loss of high-risk subjects),
. the measure of association might be underestimated.
. To reduce the potential for selection bias in prospective studies, investigators try to achieve high rates of follow-up.
.N.B.:
- Median survival: used to compare the median survival times in two or more groups of patients (e.g. receiving a new treatment or placebo).
- Median survival is calculated in cohort studies or clinical trials.
.N.B.:
- Prevalence odds ratio: is calculated in cross-sectional studies to compare the prevalence of the disease between two different populations.
INCIDENCE: (typical for USMLE)
__________
. It is the frequency of new cases of a disease arising in a population at risk over a specified time period.
. It is the measure of the appearance of new cases.
. PREVALENCE: is the measure of those with the disease in the population at a particular point in time.
. The relation between them in a stable population (little migration) can be demonstrated by:
  Prevalence = (incidence) x (average disease duration).
. So if the incidence is fixed in a stable population, the prevalence is increased if there are factors
  that prolong survival (i.e. disease duration), e.g. improved quality of care.
. Prevalence of disease in a population = number of existing cases of the disease / total population.
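The relation between incidence, duration and prevalence can be sketched numerically. This is a minimal illustration only: the function name and the figures (2 new cases per 1000 per year, duration rising from 5 to 8 years) are hypothetical, and the formula assumes a stable population with a rare disease.

```python
def steady_state_prevalence(incidence_per_year, mean_duration_years):
    """Approximate prevalence as incidence x average disease duration.

    Assumes a stable population and a rare disease (hypothetical helper).
    """
    return incidence_per_year * mean_duration_years


incidence = 0.002  # hypothetical: 2 new cases per 1000 people per year

# Same incidence, but better care prolongs survival from 5 to 8 years.
prevalence_before = steady_state_prevalence(incidence, 5.0)
prevalence_after = steady_state_prevalence(incidence, 8.0)

print(f"before: {prevalence_before:.3f}, after: {prevalence_after:.3f}")
```

With incidence held fixed, the longer duration alone raises the prevalence, which is the point made in the notes above.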
. A Retrospective cohort study: (cohort = a group of individuals)
______________________________
. Starts at some point between the exposure and the outcome.
. The researcher reviews past records, classifies subjects into "exposed" and "non-exposed" and then follows them until the outcome.
. In a cohort study, the study subjects are free of the outcome at the time the study begins.
. CROSS SECTIONAL STUDY: (Prevalence study)
________________________
. Both the exposure and the outcome are studied at one point in time (at one cross section of time).
. Since both exposure and outcome have been present for some time before the study, it is not possible to
  determine the temporal association between the exposure and outcome from a cross-sectional study.
. Takes a sample of individuals from a population at one point in time.
. It allows determination of disease prevalence (the total number of cases in a population at a given time).
. Disease incidence can't be determined.

. A CASE SERIES:
________________
. A study involving only patients already diagnosed with the condition of interest.
. It is helpful in determining the natural history of uncommon conditions,
. but provides no information about the disease incidence.
. CLINICAL TRIALS:
__________________
. Compare the therapeutic benefit of different interventions in patients already diagnosed with a particular disease.
. Usually subjects are randomly assigned to treatment and placebo groups and then followed to detect the development of the outcome of interest.
. Can't be used to determine disease incidence.
. RANDOMIZED CONTROL TRIALS:
____________________________
. Type of experimental study.
. It is considered the gold standard for studying the efficacy of a treatment or a procedure.
. Compares two or more treatments.
. Subjects are randomly assigned to an experimental group (receiving a specific exposure, e.g. medication) and a control group (non-exposed, i.e. placebo).
. This type of study has the least bias and helps to show a strong causal relationship.
. CLUSTER ANALYSIS:
___________________
. Is the grouping of different data points into similar categories.
. Usually involves randomization at the level of groups rather than at the level of individuals.
. CROSS OVER STUDY:
___________________
. One group of participants is randomized to one treatment for a period of time,
. and the other group is given an alternate treatment for the same period of time,
  with a washout (no treatment) period in between the treatment intervals to limit the confounding effect of the prior treatment.
. At the end of the time period, the two groups then switch treatments for another set period of time.
. PARALLEL GROUP STUDY:
_______________________
. Randomizes one treatment to one group and another treatment to the other group,
. such as a treatment drug to one group versus a placebo to the other group.
. There are usually no other variables measured.
. EFFECT MODIFICATION:
_____________________
. Occurs when the effect of a main exposure on an outcome is modified by another variable.
. It is not a bias.
. It is a natural phenomenon that should be described, not corrected, as it is neither a bias nor confounding.
. Example: the effect of oral contraceptives on breast cancer is modified by the family history,
. i.e. women with a +ve family history have an increased risk, while women without a +ve family history don't have an increased risk.
. Other examples: studying the effect of estrogen on the risk of venous thrombosis (modified by smoking).
. Also studying the risk of lung cancer in people exposed to asbestos (greatly depends on / modified by smoking).
. For example:
. the effect of a new estrogen receptor agonist drug on the incidence of DVT is modified by smoking status:
. smokers taking the drug have an increased risk of developing DVT, while nonsmokers taking the drug don't.
. It may be confused with confounding; the two can be differentiated by dividing the whole cohort into subgroups (stratified analysis).
. Imagine that smoking is a confounder that by itself is associated with a higher risk of DVT, so if more smokers are taking the drug,
  it might appear that the drug causes DVT, but when stratified analysis is performed by analyzing smokers and nonsmokers separately,
  it will appear that the drug is no longer associated with DVT.
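The stratified analysis described above can be sketched with made-up counts. All numbers and names below are hypothetical and chosen only so that the two patterns are easy to see: under confounding the stratum-specific relative risks are both about 1 although the crude value is not, whereas under effect modification the stratum-specific values differ from each other.

```python
def risk_ratio(exp_events, exp_total, unexp_events, unexp_total):
    """Risk in the exposed (drug) group divided by risk in the unexposed group."""
    return (exp_events / exp_total) / (unexp_events / unexp_total)


def crude_and_stratified(strata):
    """Return the crude RR (strata pooled) and the RR within each stratum."""
    pooled = [sum(column) for column in zip(*strata.values())]
    crude = risk_ratio(*pooled)
    by_stratum = {name: risk_ratio(*counts) for name, counts in strata.items()}
    return crude, by_stratum


# Each tuple: (DVT on drug, total on drug, DVT off drug, total off drug).
# Hypothetical confounding: smokers have more DVT and take the drug more often.
confounding = {"smokers": (16, 80, 4, 20), "nonsmokers": (1, 20, 4, 80)}
# Hypothetical effect modification: the drug raises risk only in smokers.
modification = {"smokers": (30, 100, 10, 100), "nonsmokers": (5, 100, 5, 100)}

conf_crude, conf_strata = crude_and_stratified(confounding)
mod_crude, mod_strata = crude_and_stratified(modification)

print("confounding:", round(conf_crude, 2), conf_strata)
print("effect modification:", round(mod_crude, 2), mod_strata)
```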
. LATENT PERIOD:
________________
. Is the time period required for an exposure to start to have an effect, i.e. the time required from exposure to outcome.
. In infectious diseases it is relatively short, while in chronic diseases (e.g. cancer or CAD)
  it may be very long, and an extended period of exposure may be required to affect the outcome.
. Latent period can also be applied to exposure to a risk modifier, as the exposure may need to be continuous over a certain period of time before influencing the outcome.
. Latent period is a natural phenomenon, not a bias.
. OUTLIER: extreme observation
______________________________
. It is defined as an extreme and unusual value observed in a dataset.
. It may be the result of a recording error, a measurement error or a natural phenomenon.
. It affects the measures of central tendency as well as measures of dispersion, for example:
. The mean: is extremely sensitive to outliers and easily shifts towards them.
. The standard deviation is sensitive to outliers because it is the measure of dispersion within the data set,
. and outliers significantly increase the dispersion (SD = deviation of values around the mean).
. The range = maximum value - minimum value (so it is definitely changed).
. The mode is not changed by outliers as they don't change the most frequent value observed.
. The median is much more resistant to outliers as it is located in the middle of the dataset, where the observations usually don't differ much from each other.
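A small sketch of how one outlier moves each summary measure, using Python's standard `statistics` module. The dataset is invented for illustration.

```python
import statistics

data = [4, 5, 5, 5, 6, 7, 8]   # hypothetical observations
with_outlier = data + [60]     # the same data plus one extreme value


def summarize(values):
    """Return the measures discussed above for one dataset."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": statistics.stdev(values),
        "range": max(values) - min(values),
    }


before = summarize(data)
after = summarize(with_outlier)

for name in before:
    print(f"{name:>6}: {before[name]:.2f} -> {after[name]:.2f}")
```

The mean, standard deviation and range change markedly, the median moves only slightly, and the mode stays the same.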

. RELATIVE RISK (RR):
_____________________
. Is used as a measure of association in cohort studies.
. It is the ratio of the risk in an exposed group to that of the unexposed group.
. The NULL value of RR is 1.0.
. A RR of 1 means that there is no association between the risk factor and the disease.
. A relative risk > 1 means that there is a positive association between the risk factor and the outcome.
. A relative risk < 1 means that there is a negative association between the risk factor and the outcome.
. The farther the value of the RR from 1, the stronger the association.
. Example: the RR of bronchogenic cancer in smokers is greater than 2 --> indicates
  a strong association between smoking (risk factor) and bronchogenic carcinoma (outcome).
. When exposure is measured on a continuous scale (number of cigarettes smoked per day or PPD),
. classification into two or more ordinal categories enables the risk to be assessed as a function of exposure,
. and the DOSE-RESPONSE EFFECT can be assessed from the exposure and the outcome.
. The present example illustrates a dose-response relationship between smoking and bronchogenic cancer
  (the RR of bronchogenic lung cancer increases as the number of PPD smoked increases).
. One weakness of the RR is that it gives no clue whether such a finding can be explained by chance alone.
. The confidence interval and the "P" value can help strengthen the finding of the study.
. For the study to be statistically significant:
1- The confidence interval must not contain the null value (1).
2- The "P" value should be less than 0.05 (i.e. < 5% chance that the result obtained was due to chance alone).
3- The RR is not the null value (1).
- The "P" value is used to strengthen the results of the study; it is defined as the probability of obtaining a result at least as extreme as the one observed by chance alone (i.e. if the null hypothesis is true).
- e.g. a "P" value of 0.01 means that the probability of obtaining the result by chance alone is 1%.
- The commonly accepted upper limit (cut-off point) of the "P" value for the study
  to be considered statistically significant is 0.05 (i.e. less than 5%).
- The "P" value deals with random variability, not bias.
- If the "P" value is less than 0.05 (i.e. the study is statistically significant), the 95% confidence interval doesn't contain 1.0 (the null value for RR).
- A relative risk of 0.71 shows that the drug decreased the risk of mortality by 29% (the null value for RR is 1).
e.g.: In a case of RR 1.6 (greater than 1) with a confidence interval of 1.02-2.15 (doesn't contain the null value 1),
  the study is statistically significant, so the "P" value must be less than 0.05.
N.B.: Very important to know how to calculate relative risk from the 2x2 table
  (a = exposed diseased, b = exposed non-diseased, c = unexposed diseased, d = unexposed non-diseased):
- Relative risk = {a/(a+b)}/{c/(c+d)}
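The relative risk formula above can be checked with a short sketch. The cell labels follow the usual 2x2 layout (a and b are the exposed row, c and d the unexposed row); the counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """RR from a 2x2 table.

    a: exposed, diseased        b: exposed, not diseased
    c: unexposed, diseased      d: unexposed, not diseased
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop disease.
rr = relative_risk(a=30, b=70, c=10, d=90)
print(f"RR = {rr:.2f}")  # risk 0.30 versus 0.10
```

An RR of about 3 here would be read as a positive association; whether it is statistically significant still depends on the confidence interval and the P value.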
.N.B.: Absolute Risk Reduction (ARR)
------------------------------------
. Event rate (risk) for a drug or intervention = +ve cases / total number examined with the test or drug.
. In a study of 2 drugs or interventions, one drug reduces the risk more than the other.
. Absolute risk reduction (ARR) = event rate with the first drug (placebo) - event rate with the second drug (under test).
. Number needed to treat (NNT): is the number of people that should receive a treatment to prevent one defined event.
. It is calculated as the inverse of the absolute risk reduction.
. NNT = 1/ARR.
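A worked sketch of ARR and NNT, taking ARR as the event rate in the placebo group minus the event rate in the treated group. The trial counts are hypothetical; `Fraction` is used only to keep the arithmetic exact, and rounding NNT up to a whole patient is a common convention rather than something stated in these notes.

```python
import math
from fractions import Fraction


def absolute_risk_reduction(control_events, control_total,
                            treated_events, treated_total):
    """ARR = event rate in controls - event rate in treated subjects."""
    control_rate = Fraction(control_events, control_total)
    treated_rate = Fraction(treated_events, treated_total)
    return control_rate - treated_rate


def number_needed_to_treat(arr):
    """NNT = 1 / ARR, rounded up to a whole number of patients."""
    return math.ceil(1 / arr)


# Hypothetical trial: 40/200 events on placebo, 30/200 events on the drug.
arr = absolute_risk_reduction(40, 200, 30, 200)
nnt = number_needed_to_treat(arr)
print(f"ARR = {float(arr):.2f}, NNT = {nnt}")
```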
.N.B.:
- The power of a study is the ability to detect a difference between two groups (treated versus non-treated, exposed versus non-exposed).
- Increasing the sample size --> increases the power of the study and consequently makes
  the confidence interval of the point estimate (e.g. relative risk) tighter.
- If the sample size is small --> low power of the study to detect the difference between exposed and non-exposed subjects,
  which makes the confidence interval wide (e.g. 0.8-3.1) and can make the study statistically insignificant.
- If we increase the sample size --> the confidence interval will be tighter and the study is more likely to be statistically significant.
- Relative risk reduction (RRR) = [event rate (control group) - event rate (treatment group)] / event rate (control group) = ARR / event rate (control group).
. Number needed to harm (NNH):
______________________________
. It is the number of people that must be treated for one adverse event to occur (similar to number needed to treat).
. NNH = 1 / attributable risk.
. Attributable risk = adverse event rate (treatment group) - adverse event rate (control group).
. Adverse event rate = number of adverse events (here, deaths) / total number in the group.
. For example: drug X (deaths=60 & living=20), placebo (deaths=38 & living=38).
. Adverse event rate in treatment group = 60/80 = 0.75.
. Adverse event rate in placebo group = 38/76 = 0.50.
. Attributable risk = 0.75 - 0.50 = 0.25.
. NNH = 1/0.25 = 4.
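The drug X example above can be reproduced in a few lines. The function names are illustrative; `Fraction` is used so that the rates stay exact.

```python
from fractions import Fraction


def adverse_event_rate(events, non_events):
    """Adverse events divided by the total number in the group."""
    return Fraction(events, events + non_events)


def number_needed_to_harm(treatment_rate, control_rate):
    """NNH = 1 / attributable risk (treatment rate - control rate)."""
    attributable_risk = treatment_rate - control_rate
    return 1 / attributable_risk


# Figures from the example: drug X 60 deaths / 20 living, placebo 38 / 38.
rate_drug = adverse_event_rate(60, 20)      # 60/80
rate_placebo = adverse_event_rate(38, 38)   # 38/76
nnh = number_needed_to_harm(rate_drug, rate_placebo)

print(float(rate_drug), float(rate_placebo), float(nnh))
```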
. SELECTION BIAS:
_________________
. Results from the manner in which the subjects are selected for the study, or from selective losses to follow-up.
. BERKSON'S BIAS:
_________________
. It is a selection bias that can be created by selecting hospitalized patients as the control group.
. SAMPLE DISTORTION BIAS:
_________________________
. Due to nonrandom sampling of a population.
. It can lead to a study population having characteristics that differ from the target population.
. A common example is that severely ill patients are the most likely to enroll in cancer trials, leading to
  results that are not applicable to patients with less advanced cancer,
. i.e. the study sample isn't representative of the target population with respect to the joint distribution of exposure and outcome.
. INFORMATION BIAS:
___________________
. Occurs due to imperfect assessment of the association between the exposure and outcome,
. as a result of errors in the measurement of exposure and outcome status.
. It can be minimized by using standardized techniques for surveillance and measurement of outcomes,
  as well as trained observers to measure the exposure and outcome.
. MEASUREMENT BIAS:
___________________
. Occurs from poor data collection with inaccurate results.
. LEAD-TIME BIAS:
_________________
. Lead-time bias should be considered while evaluating any screening test.
. It happens when two interventions are compared to diagnose a disease,
  and one of them diagnoses the disease earlier than the other without an effect on the outcome (survival).
. What actually happens is that detection of the disease was made at an earlier point in time,
. but the disease course itself or the prognosis did not change.
. So the screened patients appear to live longer from the time of diagnosis until the time of death.
N.B.: IN USMLE:
. Think of LEAD-TIME BIAS when you see "a new screening test" for poor-prognosis diseases like lung cancer or pancreatic cancer.
. OBSERVER'S BIAS, MEASUREMENT BIAS & ASCERTAINMENT BIAS:
_____________________________________________________
. Occur when the observer may be influenced by prior knowledge or details of the study that can affect the results.
. Refer to misclassification of an outcome and/or exposure,
. e.g.: labeling diseased subjects as non-diseased and vice versa.
. Blinded studies usually avoid this bias by preventing the observer from knowing which treatment or intervention the participants are receiving.
. Blinding can involve patients exclusively or both patients and physicians (double blinding).
. These biases are related to the design of the study (the scenario will describe how the study was designed).
. RECALL BIAS:
______________
. Occurs when a study participant is affected by prior knowledge when answering a question.
. Results from inaccurate recall of past exposure by people in the study and applies mostly to retrospective studies such as case-control studies.
. People who have suffered an adverse event (such as having a child with congenital anomalies) are more likely to recall previous risk factors than
  people who have not experienced a poor outcome.
. This is more common in case-control studies than in randomized clinical trials.
. Referral bias or admission rate bias:
__________________________________
. Occurs when the case and control populations differ due to admission or referral practices.
. For example: a study involving cancer risk factors performed at a hospital specialized in cancer research
  may enroll cases referred from all over the nation, whereas hospitalized control subjects without cancer may come from only the local area.
. Detection Bias:
_________________
. Refers to the fact that a risk factor itself may lead to extensive diagnostic investigations and increase the probability that a disease is identified.
. For example: patients who smoke may undergo increased imaging surveillance due to their smoking status, which would detect more cases of cancer in general.
. RESPONDENT BIAS:
_________________
. Occurs when the outcome of the test is obtained from the patient's response, not by objective diagnostic methods (e.g. migraine headache).
. SUSCEPTIBILITY BIAS:
______________________
. Is a type of selection bias where a treatment regimen is selected for a patient based on the severity of their condition,
  without taking into account other possible confounding variables.
. Offline case 20.
. Allocation bias:
__________________
. It may result from the way that treatment and control groups are assembled.
. It may occur if the subjects are assigned to the study groups of a clinical trial in a non-random fashion.
. For example, in a study comparing oral NSAIDs and intra-articular corticosteroid injections for the treatment of osteoarthritis,
  obese patients may be preferentially assigned to the corticosteroid group (affecting the outcome).
. Beta error:
_____________
. Refers to a conclusion that there is no difference between the groups studied when a difference truly exists (type II error).
. It is a random error, not a systematic error (i.e. bias).
. CONFOUNDING:
______________
. Occurs when at least part of the exposure-disease relationship can be explained by another variable (the confounder),
. due to the presence of one or more variables associated independently with both the exposure and the outcome.
. For example: cigarette smoking can be a confounding factor in studying the association between maternal alcohol drinking and low birth weight babies,
. as cigarette smoking is independently associated with both alcohol consumption and low birth weight babies.
. Hawthorne effect:
___________________
. It is the tendency of a study population to affect the outcome because these people are aware that they are being studied.
. This awareness leads to a consequent change in behaviour while under observation --> seriously affecting the validity of the study.
. It is usually seen in studies that concern behavioral outcomes or outcomes that can be influenced by behavioral changes.
. In order to minimize the Hawthorne effect, the studied subjects can be kept unaware that they are being studied.
. Pygmalion EFFECT:
___________________
. It describes how a researcher's beliefs in the efficacy of a treatment can potentially affect the outcome.
.N.B. All biases are considered a threat to the validity of a study.
. HOW TO CONTROL BIAS:
_____________________
1- Selection bias can be controlled by choosing a representative sample of the population for the study & achieving a high rate of follow-up.
2- Observer's bias can be controlled by blinding.
3- Ascertainment bias can be controlled by following a strict protocol of case ascertainment.
4- Confounders: can be avoided by 3 methods in the design stage of the study: matching, restriction and randomization.
- Matching is used in case-control studies: variables that could be confounders (age, race,..) are selected, then
  cases and controls are selected based on the matching variables.
- Randomization is commonly employed in clinical trials; its purpose is to balance various factors (confounders) that can
  influence the estimate of association between the treatment and placebo groups, so that the unconfounded effect of the exposure can be isolated.
- A very important advantage of randomization compared to other methods is the possibility of controlling
  known risk factors (e.g. age, severity of the disease) as well as unknown & difficult-to-measure confounders
  (e.g. level of stress, socioeconomic status), making all confounders evenly distributed between the treatment and placebo groups.
- In clinical trials, randomization is said to be successful when there is similarity in the distribution of
  the baseline characteristics (age, race, prevalence...) between the treatment and placebo groups,
  i.e. the confounders are evenly distributed between the treatment and placebo groups.
. HAZARD RATIO:
_______________
. It is the ratio of the chance of an event occurring in the treatment arm (drug or group of interest)
  compared to the chance of that event occurring in the control arm (the other drug or group) during a set period of time.
. Hazard ratio = rate of the event in the treatment group / rate of the event in the control group.
. So the lower the hazard ratio, the less likely the event will occur in the treatment arm.
. The higher the ratio, the more likely the event will occur in the treatment arm.
. A ratio close to 1 indicates no significant difference between the 2 groups.
. Example: Hazard ratio of 2 drugs A & B in bleeding complications:
. Hazard ratio for major bleeding = 0.93, i.e. close to 1, means that both groups are similar to each other in this event.
. Hazard ratio for intracranial bleeding = 0.41 (indicates a lower chance of drug "A" causing intracranial bleeding than drug "B").
. Hazard ratio for GIT bleeding = 1.50 (indicates that drug "A" has a higher chance of causing GIT bleeding than drug "B").
. Hazard ratio for life-threatening bleeding = 0.80 (indicates a lower chance of drug "A" causing life-threatening bleeding than drug "B").
. Hazard ratio for total bleeding = 0.91 (indicates a slightly lower chance of drug "A" causing bleeding overall than drug "B").
. In case number 11 (offline), focus on the baseline value in the case and take the corresponding hazard ratio in the study, then
  decide which one of them has the greater hazard of hyperkalemia (N.B. Ca channel blockers affect GFR).
. You should learn case 19 in offline 2013. :)
. SUCCESSFUL RANDOMIZATION:
___________________________
. In any randomized clinical study, the goal of successful randomization is:
1- To eliminate bias in treatment assignments.
2- To blind the investigators to the identity of the patients who receive the treatment arm.
3- To minimize the confounding variables.
. Ideal randomization allows for adequate statistical power and should include:
1- Equal patient group sizes.
2- Low selection bias.
3- Low probability of confounding variables.
. A listing of the baseline characteristics of the patients in each arm would demonstrate
  whether the two arms had patients with similar characteristics and would ensure that proper randomization occurred in the study.
. Two SAMPLE "T test":
______________________
. It is commonly used to compare two means, not proportions.
. The basic requirements needed to perform this test are:
  --> the two mean values - the sample variances - the sample sizes.
. The "T test" is then done to obtain the "P" value.
. If the "P" value is less than 0.05 --> the null hypothesis (that there is no difference between the two groups) is rejected,
  and the two means are assumed to be statistically different.
. If the "P" value is large --> the null hypothesis is retained.
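As a rough sketch of how the three inputs listed above (means, sample variances, sample sizes) combine, the following computes a two-sample t statistic in its unequal-variance (Welch) form, which is one common variant and an assumption here. The blood pressure figures are hypothetical, and converting the statistic to a P value still needs a t table or a statistics package.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic and approximate degrees of freedom (Welch form)."""
    se1 = var1 / n1   # squared standard error of the first mean
    se2 = var2 / n2   # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical groups: mean systolic BP 130 vs 120 mmHg,
# sample variance 100 in each group, 25 subjects per group.
t_stat, dof = welch_t(130, 100, 25, 120, 100, 25)
print(f"t = {t_stat:.2f}, df = {dof:.0f}")
```

The larger the absolute t statistic for a given number of degrees of freedom, the smaller the P value.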
. TWO SAMPLE "Z test" :
_______________________
. Can also be used to compare two means, but
. population (not sample) variances are employed in the calculations.
. Because the population variances are not usually known --> this test has limited applicability.
. ANOVA test:
_____________
. I.e. Analysis of variance (ANOVA).
. Used to compare two or more means (determines whether there are significant differences between the means of 2 or more independent groups).
. e.g. ANOVA can be used to assess for a difference in mean blood pressure among three samples of populations
  grouped by exercise status (never exercise, exercise occasionally and exercise frequently).
. Chi Square test:
__________________
. Used to test the association between two categorical variables,
. by comparing proportions of a categorized outcome (e.g. high or low) across exposure categories (present or not present).
. A 2x2 table may be used (high or low outcome) and (exposed & non-exposed) to compare the observed values to the expected values.
. If the difference between the observed and expected values is large, this means there is an association between the exposure and the outcome.
. For example: it is used to determine if the distribution of gender and smoking status is random or
  if there is a difference between the sexes regarding smoking status.
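The observed-versus-expected comparison can be sketched for a 2x2 table as follows. The counts are hypothetical (smoking status by sex), no continuity correction is applied, and 3.841 is the standard chi-square critical value for 1 degree of freedom at the 0.05 level.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Hypothetical survey: men 40 smokers / 60 nonsmokers, women 20 / 80.
chi2 = chi_square_2x2(40, 60, 20, 80)
CRITICAL_DF1_ALPHA_05 = 3.841
print(f"chi-square = {chi2:.2f}, significant: {chi2 > CRITICAL_DF1_ALPHA_05}")
```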
. META-ANALYSIS:
________________
. Is an epidemiologic method for pooling the data from several studies to do an analysis with relatively high statistical power.
. For example: individual studies assessing the effects of aspirin on certain cardiovascular events may be inconclusive,
. however, analysis of data compiled from multiple clinical trials may reveal a significant benefit.
. Multiple linear regression:
______________________________
. Is a method used to model the linear relationship between a dependent variable and 2 or more independent variables.
. For example, this test could be used to quantify the effects of alcohol use, tobacco smoking and charred food consumption
  on the incidence of gastric ulcer.
. Pearson correlation coefficient:
__________________________________
. It is a measure of the strength and direction of a linear relationship between 2 variables.
. For example, a study may report a correlation coefficient describing the association between hemoglobin A1c level and average blood glucose level.
. FACTORIAL DESIGN STUDY:
_______________________
. Involves two or more experimental interventions, each with two or more variables that are studied independently.
. For example:
. A study uses 3 different interventions: beta blocker (metoprolol), Ca channel blocker (amlodipine) or ACEI (ramipril),
  with two different blood pressure endpoints (102-107 mmHg or < 92 mmHg).
# Patient Randomization:
------------------------
1) ACEIs:
- Higher BP goal - Lower BP goal.
2) Beta blocker:
- Higher BP goal - Lower BP goal.
3) Ca channel blocker:
- Higher BP goal - Lower BP goal.
. IN CASE OF NORMAL DISTRIBUTION:
______________________________
. The normal distribution is symmetrical and bell shaped.
. All measures of central tendency are equal, i.e. mean = median = mode.
. The degree of dispersion from the mean is determined by the standard deviation.
. 68% of data --> within 1 standard deviation from the mean (mean +/- 1 SD).
. 95% of data --> within 2 standard deviations from the mean (mean +/- 2 SD).
. 99.7% of data --> within 3 standard deviations from the mean (mean +/- 3 SD).
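The 68 / 95 / 99.7 figures can be checked with `statistics.NormalDist` from the Python standard library. This is only a verification sketch; the percentages are properties of the normal curve and do not depend on the particular mean or SD.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of the distribution lying within k standard deviations of the mean.
share_within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in share_within.items():
    print(f"within {k} SD: {share:.1%}")
```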
N.B.:
- In contrast to the normal distribution curve, much real-world data has an asymmetrical distribution:
1) Positively skewed curve:
--------------------------
- Smaller numbers predominate in the dataset.
- The long slope of the curve "the tail" extends in the positive direction.
- The mean is the most shifted in the positive direction, followed by the median then the mode.
- So the mean > the median > the mode.
- In strongly skewed distributions, the median is a better measure of central tendency than the mean.
2) Negatively skewed curve:
----------------------------
- Larger numbers predominate in the dataset.
- The long slope of the curve "the tail" extends in the negative direction.
- The mean is the most shifted in the negative direction, followed by the median then the mode.
- So the mode > the median > the mean (i.e. the mean is the smallest).
- In strongly skewed distributions, the median is a better measure of central tendency than the mean.
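The ordering of mean, median and mode under skew can be seen on a toy dataset. The values are invented; the negatively skewed set is simply the mirror image of the positively skewed one.

```python
import statistics

# Hypothetical positively skewed data: mostly small values, long right tail.
positive_skew = [1, 1, 1, 2, 2, 3, 4, 10, 20]
# Mirror image of the same data: mostly large values, long left tail.
negative_skew = [21 - value for value in positive_skew]


def center(values):
    """Return (mean, median, mode) for a dataset."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.mode(values))


pos_mean, pos_median, pos_mode = center(positive_skew)
neg_mean, neg_median, neg_mode = center(negative_skew)

print(f"positive skew: mean {pos_mean:.2f} > median {pos_median} > mode {pos_mode}")
print(f"negative skew: mode {neg_mode} > median {neg_median} > mean {neg_mean:.2f}")
```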
. SENSITIVITY:
______________
. Sensitivity --> the proportion of true +ve cases among all diseased cases (Sensitivity = true +ve by the test / all patients that are actually diseased).
. Indicates the ability of a test to detect those patients with the disease.
. The higher the sensitivity --> the better the test detects patients with the disease --> fewer false negatives.
. Screening tests (especially for diseases with severe sequelae) should have a high sensitivity.

. SPECIFICITY:
______________
. Specificity --> the proportion of true -ve cases among all non-diseased cases (Specificity = true -ve by the test / all patients who are actually disease free).
. Is a measure of the true negative rate and indicates how well a test can exclude those without the disease (i.e. give them -ve results).
. The higher the specificity, the more likely that most healthy patients will have -ve test results.
. The higher the specificity --> the fewer the false +ves.

. Sensitivity and specificity are fixed properties of a test; they do not vary with the pre-test probability of a disease or with the prevalence of the disease.
. The ideal diagnostic test should have high sensitivity and specificity.
N.B.:
- Raising the cutoff point of a diagnostic test --> decreases its sensitivity but increases its specificity.
- Lowering the cutoff point of a diagnostic test --> increases its sensitivity but decreases its specificity.
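. The two definitions above can be written as a minimal sketch. The counts are invented for illustration, and the function names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """True +ve / all patients who are actually diseased."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """True -ve / all patients who are actually disease free."""
    return true_neg / (true_neg + false_pos)

# Invented counts: 100 diseased patients (90 detected, 10 missed)
# and 100 healthy patients (80 correctly negative, 20 falsely positive).
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=80, false_pos=20)
print(sens, spec)
```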

. Exposure odds ratio: draw the 2x2 table (a, b, c, d)
______________________
. Is the measure of association in a case-control study.
. It compares the odds of exposure in cases to the odds of exposure in controls.
. OR = (ad)/(bc).
. It is not the same as relative risk (RR).
. RR can be calculated in follow-up studies by comparing the risk in exposed individuals to the risk in unexposed individuals.
. RR = [a/(a+b)]/[c/(c+d)].
. Direct calculation of RR in a case-control study is not possible, because the study design doesn't include following people over time.
. But sometimes the RR can be approximately equal to the odds ratio.
. If the prevalence of the disease is low --> the odds ratio approximates the relative risk (RR).
. This is called the rare disease assumption.
. Increasing the sample size will decrease the "P" value of the odds ratio and make the confidence interval tighter.
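. A minimal sketch of the two formulas, assuming the usual 2x2 layout (a = exposed & diseased, b = exposed & not diseased, c = unexposed & diseased, d = unexposed & not diseased). The layout and the counts are assumptions made for illustration.

```python
def odds_ratio(a, b, c, d):
    """OR = (a*d) / (b*c)."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))

# Invented rare-disease example: 1000 exposed, 1000 unexposed.
rare = dict(a=10, b=990, c=5, d=995)
# Invented common-disease example with the same group sizes.
common = dict(a=500, b=500, c=250, d=750)

print(odds_ratio(**rare), relative_risk(**rare))      # OR close to RR
print(odds_ratio(**common), relative_risk(**common))  # OR clearly larger than RR
```

. With the rare disease the OR stays close to the RR of 2; with the common disease the same RR of 2 corresponds to an OR of 3.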
.N.B:
- Attributable risk percent (ARP): represents the excess risk in an exposed population that can be attributed to exposure to a particular risk factor.
. It can be calculated by subtracting the risk in the unexposed population (baseline risk) from the risk in the exposed population, and dividing the result by the risk in the exposed population.
. ARP = (Risk in exposed - Risk in unexposed)/Risk in exposed.
or
. ARP can be calculated from the relative risk as follows:
. ARP = (RR-1)/RR
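. The two ARP formulas give the same answer, as this short sketch shows. The risks used (20% in exposed, 5% in unexposed) are invented for illustration.

```python
def arp_from_risks(risk_exposed, risk_unexposed):
    """ARP = (risk in exposed - risk in unexposed) / risk in exposed."""
    return (risk_exposed - risk_unexposed) / risk_exposed

def arp_from_rr(rr):
    """ARP = (RR - 1) / RR."""
    return (rr - 1) / rr

risk_exposed, risk_unexposed = 0.20, 0.05   # invented risks
rr = risk_exposed / risk_unexposed          # relative risk = 4

print(arp_from_risks(risk_exposed, risk_unexposed))  # about 0.75
print(arp_from_rr(rr))                               # about 0.75
```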
. Pre- and post-test probabilities (+ve predictive value (PPV) & -ve predictive value (NPV)):
____________________________________________________________________________________
A. Positive predictive value (PPV):
----------------------------------------
. Describes the probability of having the disease if the test result is +ve (if the patient has a +ve test result, what is the likelihood that he actually has the disease?).
. The post-test probability of having the disease is directly related to the PPV.
. If the PPV is low (e.g. 25%), then even if the test result is positive, the post-test probability of having the disease is low.
. The post-test probability also depends on the sensitivity, specificity and pre-test probability of having the disease.
B. Negative predictive value (NPV):
----------------------------------------
. Describes the probability of not having the disease if the test result is -ve.
. NPV will vary with the pre-test probability of a disease (important), i.e.
. A patient with a high probability of having a disease will have a low NPV.
. And a patient with a low probability of having a disease will have a high NPV.
. If the NPV is 96%, this means that if the test result is -ve, the chance that the patient does not have the disease is high (96%).
. And the chance that the patient has the disease is low (100 - 96 = 4%).
Example:
--------
1- BREAST CANCER & FNA test results:
. A patient with a high pre-test probability of having the disease (1st degree relative with breast cancer or age > 40 yrs) has a low NPV.
. A patient with a low pre-test probability of having breast cancer (less than 40 yrs old) has a high NPV.
2- HIV & ELISA test results:
. A patient who belongs to a high-risk group (e.g. multiple sexual partners, no condom use, IV drug abuse)
--> has a high pre-test probability of having AIDS --> so he will have a low NPV.
. On the other hand, a patient who belongs to a low-risk group (one sexual partner, condom use and no IV drug abuse)
--> has a low pre-test probability of having AIDS --> so has a high NPV.
NOTE:
----
. The prevalence of the disease is directly related to the PPV (the post-test probability of having the disease after a +ve result) & inversely related to the NPV (the post-test probability of not having the disease after a -ve result), so increased prevalence --> low NPV but high PPV and vice versa.
. Sensitivity and specificity are not affected by the prevalence of the disease, and neither is the positive likelihood ratio, i.e. sensitivity/(1-specificity), as it depends only on sensitivity and specificity.
N.B.:
----
. If the test result is -ve, the probability that the patient has the disease = 1 - NPV.
. Cases and diagnostic tests that are high-yield USMLE questions on probabilities:
- Coronary artery disease and ECG stress test.
- Pulmonary embolism and ventilation-perfusion scanning.
- Prostate cancer and serum PSA level.
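. How PPV and NPV move with prevalence (pre-test probability) can be sketched with Bayes' rule. The test characteristics (90% sensitivity, 90% specificity) and the two prevalences are invented for illustration; the function name is hypothetical.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sens * prevalence                 # diseased and test +ve
    fn = (1 - sens) * prevalence           # diseased and test -ve
    tn = spec * (1 - prevalence)           # healthy and test -ve
    fp = (1 - spec) * (1 - prevalence)     # healthy and test +ve
    return tp / (tp + fp), tn / (tn + fn)

# Same test, two invented populations.
ppv_low, npv_low = predictive_values(0.90, 0.90, prevalence=0.01)
ppv_high, npv_high = predictive_values(0.90, 0.90, prevalence=0.50)

print(round(ppv_low, 3), round(npv_low, 3))    # low prevalence: low PPV, high NPV
print(round(ppv_high, 3), round(npv_high, 3))  # high prevalence: higher PPV, lower NPV
```

. Sensitivity and specificity are held constant here; only the prevalence changes, which is why the predictive values differ between the two populations.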

. VALIDITY OF TEST = Accuracy:
______________________________
. Represents the appropriateness of the test (i.e. the test's ability to measure what it is supposed to measure).
. In order to determine the validity of a test, its results are compared to those obtained from the gold standard test.
. It doesn't depend on the pre-test probability of the disease.
N.B.: The sensitivity and specificity of a test are also determined by comparing its results to the results obtained by the gold standard test.
. RELIABILITY:
______________
. Test-retest reliability.
. A reliable test is reproducible; it gives similar or very close results on repeat measurements.
. Reliability is quantified in terms of the coefficient of variation (CV).
. Coefficient of variation: the standard deviation of the set of repeated measurements divided by their mean & expressed as a percentage.
. Reliability is maximal when random error is minimal.
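. A minimal sketch of the CV calculation. The repeated measurements are invented, and the use of the sample standard deviation (n-1 denominator) is an assumption, since the notes do not say which SD is meant.

```python
from statistics import mean, stdev

def coefficient_of_variation(measurements):
    """CV (%) = standard deviation / mean * 100 (sample SD assumed)."""
    return stdev(measurements) / mean(measurements) * 100

# Invented repeated measurements of the same sample.
tight = [98, 100, 102]    # close results: low CV, more reliable
loose = [80, 100, 120]    # scattered results: high CV, less reliable

print(coefficient_of_variation(tight))
print(coefficient_of_variation(loose))
```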

. Receiver Operating Characteristic (ROC) curve:
________________________________________________
. It emphasizes the importance of choosing the appropriate cutoff value, although overlap of normal & abnormal results makes this difficult.
. Any cutoff point demonstrates a trade-off between SENSITIVITY and SPECIFICITY (the curve plots sensitivity against 1-specificity).
. Sensitivity (positivity in disease) --> is the proportion of subjects who have the target condition and give positive results.
. Sensitivity = TP/(TP + FN).
. Specificity (negativity in health) --> is the proportion of subjects without the target condition who give negative results.
. Specificity = TN/(TN + FP).
. ++ Sensitivity --> ++ true +ve & -- false -ve (diagnosed as normal but actually diseased).
. ++ Sensitivity --> allows not missing any diseased patient (not missing any true +ve).
. ++ Specificity --> ++ true -ve & -- false +ve (diagnosed as diseased but actually normal).
. ROC --> aims at decreasing false -ve and false +ve results (i.e. increasing sensitivity and specificity).
.N.B.:
- In the ROC curve: sensitivity = true positive rate, while (1-specificity) = false positive rate.
. Positive predictive value (PPV) --> is the probability of having the disease if the test result is +ve.
. PPV = TP/(TP + FP).
. Negative predictive value (NPV) --> is the probability of not having the disease if the test result is -ve.
. NPV = TN/(TN + FN).
. Positive likelihood ratio (LR+) = sensitivity/(1-specificity).
. (LR+) --> is the ratio of the proportion of patients who have the target condition & test positive to the proportion of patients without the target condition who also test positive.
. Negative likelihood ratio (LR-) = (1-sensitivity)/specificity.
. (LR-) --> is the ratio of the proportion of patients who have the target condition who test negative to the proportion of patients without the target condition who also test negative.
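. The likelihood ratios follow directly from their verbal definitions: LR+ = sensitivity/(1-specificity) and LR- = (1-sensitivity)/specificity. A minimal sketch with invented test characteristics:

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sens / (1 - spec)     # P(test +ve | disease) / P(test +ve | no disease)
    lr_neg = (1 - sens) / spec     # P(test -ve | disease) / P(test -ve | no disease)
    return lr_pos, lr_neg

# Invented test: 90% sensitive, 80% specific.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(round(lr_pos, 3), round(lr_neg, 3))
```

. Neither ratio uses the prevalence, which is why likelihood ratios, like sensitivity and specificity, do not change from one population to another.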
. The ROC curve has 2 axes: the vertical axis (Y) for sensitivity and the horizontal axis (X) for 1-specificity (the false positive rate).
. Large Y values --> indicate high sensitivity.
. Small X values --> indicate high specificity.
. Low cutoff --> increases sensitivity (better ability to identify patients with the disease, i.e. increases true positives), although this decreases specificity (the test falsely identifies more subjects as diseased although they are not) and vice versa.
. High cutoff --> decreases sensitivity and increases specificity.
. Low cutoff --> high sensitivity --> higher negative predictive value (NPV) --> fewer false -ve results (ruling out probability).
. High cutoff --> higher specificity --> higher positive predictive value (PPV) --> fewer false +ve results (ruling in probability).
N.B: Draw the overlap curve:
. A shift of the ROC curve upwards for a given cutoff indicates increased sensitivity and vice versa.
. A shift of the ROC curve to the right for a given cutoff (higher X value, i.e. higher 1-specificity) indicates decreased specificity and vice versa.
. The curve usually shows that an increase in sensitivity is offset by a decrease in specificity.
. As mentioned before, sensitivity = TP/(TP+FN) & specificity = TN/(TN+FP), so decreased overlap between the healthy and diseased population curves --> decreases both the number of FP & FN (i.e. decreases the denominators) --> thus increases both sensitivity and specificity (i.e. allows for a test with both higher sensitivity and higher specificity).
. In the overlap curve: moving the cutoff value to the right (higher value) would increase specificity at the expense of sensitivity, while moving the cutoff to the left (lower value) would increase sensitivity at the expense of specificity.
. A cutoff value just outside the overlapping portion would maximize the sensitivity (if to the left) or specificity (if to the right) at 100%.
. Both sensitivity and specificity depend on the cutoff value of a given test, for example:
. Raising the cutoff value makes it more difficult to diagnose the condition, i.e. it makes it harder to obtain +ve results and easier to obtain -ve results --> this will increase specificity but decrease sensitivity.
. Lowering the cutoff value makes it easier to obtain +ve results and harder to obtain -ve results, i.e. increases sensitivity and decreases specificity.
. Increased sensitivity --> increased -ve predictive value (NPV) (due to fewer false -ve results).
. Increased specificity --> increased +ve predictive value (PPV) (due to fewer false +ve results).
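. The cutoff trade-off can be sketched with two small overlapping groups. The values are invented, and the sketch assumes that higher values indicate disease and that a result is +ve when it is at or above the cutoff.

```python
# Invented measurements; the two groups overlap between 4 and 6.
healthy = [1, 2, 3, 4, 5, 6]
diseased = [4, 5, 6, 7, 8, 9]

def sens_spec(cutoff):
    """Return (sensitivity, specificity) when values >= cutoff are called +ve."""
    true_pos = sum(value >= cutoff for value in diseased)
    true_neg = sum(value < cutoff for value in healthy)
    return true_pos / len(diseased), true_neg / len(healthy)

# Sweep the cutoff from left (low) to right (high).
for cutoff in (4, 5.5, 7):
    sens, spec = sens_spec(cutoff)
    print(cutoff, round(sens, 2), round(spec, 2))
```

. A cutoff of 4 (just left of the overlap) gives 100% sensitivity, a cutoff of 7 (just right of the overlap) gives 100% specificity, and any cutoff inside the overlap trades one for the other.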
. PRECISION:
___________
. Is the proportion of true +ve results out of the total number of +ve results of the test (-ve results are not taken into account).
. Precision is equivalent to the +ve predictive value, i.e. true +ve / all +ve.
. It is the measure of the random error in the study.
. A study is precise if the results are not scattered widely; this is reflected by a tight confidence interval.
. So, if the first study has a wider confidence interval than the second study --> the second study is more precise.
. ACCURACY:
___________
. Is the proportion of true results (true +ve and true -ve) out of all results given by the test.
. The closer the plotted curve approaches the left and top borders of the ROC plot, the more accurate the test.
. Accuracy can also be measured by the total area under the plotted ROC curve.
. An increase in the total area under the curve --> increases the accuracy of the test.
.N.B:
. Both accuracy and precision depend upon the sensitivity and specificity of the test as well as the prevalence of the condition in the population tested.
. Validity and accuracy are measures of systematic error (bias).
. Accuracy is reduced if the sample doesn't reflect the true value of the parameter measured.
. Increasing the sample size --> increases the precision of the study, but doesn't affect the accuracy.
. CORRELATION COEFFICIENT (r):
_______________________________
. It assesses a linear relationship between two variables.
. The null value for the correlation coefficient is 0 (no association).
. And the range of possible values is from -1 to 1.
. The sign of the correlation coefficient indicates a positive or negative association.
. The closer the value is to its limits (-1 or 1), the stronger the association.
. The correlation coefficient shows the strength of association but does not necessarily imply causality.
. The association is statistically significant if the P value is low.
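. A minimal sketch of the Pearson correlation coefficient, written out by hand so the sign and the -1 to 1 range are easy to see. The paired values are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

exposure = [1, 2, 3, 4]
print(pearson_r(exposure, [2, 4, 6, 8]))   # perfect positive association
print(pearson_r(exposure, [8, 6, 4, 2]))   # perfect negative association
print(pearson_r(exposure, [3, 1, 4, 2]))   # weak association, r near 0
```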
. Risk:
_______
. It measures the incidence of the disease.
. It is calculated by dividing the number of newly diseased subjects by the number of people at risk (or of interest).
. Risk = No. of diseased / people at risk.
. Prevalence of disease in a population = number of existing cases of the disease / total population.
. MEASURES OF CENTRAL TENDENCY:
________________________________
. Mean --> is the sum of observations divided by the number of observations.
. Mean (X') = ΣX/N, i.e. sum of obs. / no. of obs.
. Median --> is the middle observation in a series of observations after arranging them in ascending or descending order.
. If the number of observations (n) is odd --> the median is the observation at position (n+1)/2.
. If the number of observations is even --> the median is the mean of the observations at positions n/2 and (n/2)+1.
. Mode --> is the most frequently occurring value in the data.
. EXAMPLE: 5,6,7,5,10,3
. Mean = (5+6+7+5+10+3)/6 = 36/6 = 6.
. Mode --> 5.

.EX2: 5,6,8,9,11.
. Median position = (5+1)/2 = 3, so the median is the 3rd observation --> median = 8.
. EX3: 5,6,8,9
. Median position = 4/2 = 2, so the median lies between the 2nd and 3rd observations.
. Median will be the mean of observations 2 & 3 --> (6+8)/2 = 7.
.N.B.:
. Range: is a measure of variation (dispersion).
. Range: is the difference between the largest and the smallest values.
. Range = largest value - smallest value
e.g. Range = 9-5 = 4.
.N.B.:
. Average: it is the sum of all observations divided by the sample size.
. e.g. in a random sample of 100 children the numbers of UTI episodes are as follows (50 children (0), 30 children (1), 10 children (2), 10 children (3)).
. The average number of UTI episodes per year in a child is:
- The total number of UTI episodes per year is: (50 x 0) + (30 x 1) + (10 x 2) + (10 x 3) = 80 UTI episodes per year.
- The average number of UTI episodes per year in a child = 80/100 = 0.8 (between 0 and 1),
- i.e. a child experiences on average less than one episode of UTI per year.
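. The worked examples above can be reproduced with Python's standard statistics module. This is only a check of the arithmetic; the variable names are hypothetical.

```python
from statistics import mean, median, mode

example_1 = [5, 6, 7, 5, 10, 3]
print(mean(example_1), mode(example_1))       # mean 6, mode 5

print(median([5, 6, 8, 9, 11]))               # odd n: the 3rd observation
print(median([5, 6, 8, 9]))                   # even n: mean of 2nd and 3rd

print(max([5, 6, 8, 9]) - min([5, 6, 8, 9]))  # range = largest - smallest

# UTI example: number of episodes -> number of children with that many episodes.
uti_counts = {0: 50, 1: 30, 2: 10, 3: 10}
total_episodes = sum(episodes * children for episodes, children in uti_counts.items())
total_children = sum(uti_counts.values())
average_episodes = total_episodes / total_children
print(total_episodes, total_children, average_episodes)
```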
. Confidence interval (CI):
___________________________
. A 95% confidence interval is the range of values within which we can be 95% confident that the true mean of the underlying population falls.
. In order to calculate the confidence interval we need to know the mean, SD, Z-score and sample size.
. Standard error of the mean (SEM): is calculated using the formula SEM = SD/sqrt(n).
. Notice that the sample size (n) is a part of the calculation.
. Thus the confidence interval (CI) will tighten as the sample size increases.
. The next step is to multiply the SEM by the corresponding Z-score.
. For 95% CI, the Z-score is 1.96 (for 99% CI the Z-score is 2.58).
. The final step is to obtain the confidence limits as shown: Mean +/- 1.96*SD/sqrt(n).
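. The three steps (SEM, multiply by the Z-score, add and subtract from the mean) can be sketched as below. The mean, SD and sample sizes are invented for illustration.

```python
from math import sqrt

def confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) limits: mean +/- z * SD / sqrt(n)."""
    sem = sd / sqrt(n)       # standard error of the mean
    margin = z * sem
    return mean - margin, mean + margin

# Invented sample: mean 100, SD 15.
print(confidence_interval(100, 15, n=225))          # 95% CI, SEM = 1
print(confidence_interval(100, 15, n=900))          # larger n: tighter CI
print(confidence_interval(100, 15, n=225, z=2.58))  # 99% CI: wider
```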
. SCATTER PLOTS:
________________
. They are useful for crude analysis of data.
. They can demonstrate the type of association (linear or non linear).
. If a linear association is present, the correlation coefficient can be calculated.
. The association is positive (if the outcome increases with the increase in the exposure) -> +ve correlation coefficient, while the association is negative (if the outcome decreases with the increase in exposure) -> -ve correlation coefficient.
. The correlation coefficient in an almost perfect linear association is close to 1 (or to -1 if the association is negative).
. Crude analysis of association using scatter plots doesn't account for possible confounders.

.N.B:
1- It is very important to consider the natural history of a disease when evaluating the effectiveness of a drug in a trial, e.g. common cold --> natural resolution within one week should be taken into consideration while evaluating an anti-viral drug used in the treatment of the common cold.
2- It is difficult to comment on a drug's effectiveness unless a comparison is made with a control group and the statistical significance and the power of the study are known.
. NULL HYPOTHESIS AND ALTERNATIVE HYPOTHESIS:
_____________________________________________
. NULL HYPOTHESIS:
-------------------
. Is always the statement of NO relationship between the exposure and the outcome.
. To state the null hypothesis correctly you should recognize the study design first.
. In a cross-sectional study: the 2 variables (e.g. CRP & colon cancer) are studied at the same point in time, so the temporal relationship between the 2 variables can't be evaluated.
. So you can't measure the relationship between the 2 variables --> the null hypothesis is better considered.
. ALTERNATIVE HYPOTHESIS:
--------------------------
. It opposes the null hypothesis.
. It states that there is a relationship between the exposure and the outcome.
. For studies in which a relationship between the 2 variables exists, it is better to consider the alternative hypothesis.
. GENERALIZABILITY or EXTERNAL VALIDITY OF A STUDY:
___________________________________________________
. It is the applicability of the obtained results beyond the cohort that was st
udied.
. External validity answers the question "how generalizable are the results of a study to other populations?".
. For example: if the cohort is restricted to middle-aged women, the results of the study are applicable only to middle-aged women & not applicable to elderly men.
================================================================
=========================================================
N.B.: Very high yield:
1- Smoking cessation is the single most effective preventive intervention in almost every patient (the most effective modifier of mortality, compared with interventions such as aspirin and tight glucose control) in nearly every disease.
2- How to calculate:
. Sensitivity = true +ve by the test / (true +ve + false -ve), i.e. all patients who are actually diseased.
. True positive = sensitivity x (true +ve + false -ve), i.e. (no. of patients actually with the disease).
. False negative = (1 - sensitivity) x (true +ve + false -ve), i.e. (no. of patients actually with the disease).
. Specificity = true -ve by the test / (true -ve + false +ve), i.e. all patients who are actually disease free.
. True negative = specificity x (true -ve + false +ve), i.e. all patients who are actually disease free.
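. These rearrangements let you rebuild the four cells of the 2x2 table from the sensitivity, the specificity and the group sizes. A minimal sketch with invented numbers (200 diseased, 800 disease free); the function name is hypothetical, and counts are rounded to whole patients.

```python
def table_counts(sens, spec, n_diseased, n_healthy):
    """Return (TP, FN, TN, FP) counts from test characteristics and group sizes."""
    true_pos = round(sens * n_diseased)   # sensitivity x all diseased
    false_neg = n_diseased - true_pos     # (1 - sensitivity) x all diseased
    true_neg = round(spec * n_healthy)    # specificity x all disease free
    false_pos = n_healthy - true_neg      # (1 - specificity) x all disease free
    return true_pos, false_neg, true_neg, false_pos

# Invented example: 90% sensitivity, 95% specificity.
print(table_counts(0.90, 0.95, n_diseased=200, n_healthy=800))
```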
. Statistical power = (1-Beta):
_____________________________
. (1-B): is the probability of rejecting the null hypothesis when it is truly false.
. i.e. it is the probability of finding a true relationship (the probability of seeing a difference when one truly exists).
. So if the researchers need to find a difference between a tested drug and the standard of care, if one exists, they need to maximize the power (1-B).
. Power depends on the sample size and the difference in outcome between the 2 groups being tested.
. Type II error:
_______________
. Occurs when the researchers fail to reject the null hypothesis when the null hypothesis is really false (they say there is no difference when actually there is a difference).
. It causes the investigators to miss true relationships.
. An example: a study finding that a drug doesn't affect platelet function when, in fact, it does.
. Beta (B): is the probability of committing a type II error.
. If (B) is set at 0.2 (20%), i.e. there will be a 20% chance of accepting the null hypothesis when it is false --> the power (1-B) will be 0.8 (80%), i.e. there will be an 80% chance of rejecting the null hypothesis when it is truly false.
. Type I error:
_______________
. Occurs when the researchers reject the null hypothesis when the null hypothesis is really true (they say there is a difference when actually there is no difference).
. i.e. the study finds a statistically significant difference between 2 groups when it actually does not exist.
. An example: if a study concluded that hard candy improves heart failure mortality, when it doesn't.
. Alpha (a): is the maximum probability of making a type I error that a researcher is willing to accept.
. It corresponds to the "P" value threshold, i.e. the probability of making a type I error.
. The (a) is typically set at 0.05, meaning that the researchers accept a 5% possibility that the difference perceived as true is actually due to chance.
N.B.: in the a,b,c,d table:
- type I error = b/(b+d).
- type II error = c/(a+c).
.N.B:
- There are 4 basic payment methods that exist between health insurers and physicians:
1) Capitation:
______________
. Physicians are paid a fixed amount of money per enrollee, not per service (i.e. paid by capitation).
. So they have incentives to contain (decrease) costs per enrollee due to the fixed budget allocated to them.
. If many enrollees seek care, or some enrollees need extensive care, physicians' costs may be greater than their payments.
. So physicians are motivated to provide more preventive care to catch illness early, so patients stay healthier and need fewer tests and procedures as they age.
2) Fee for service (FFS):
__________________________
. Physicians are paid a set amount of money for every service and diagnostic test they provide.
. They face little financial risk and are enticed to increase the number of services they provide on each visit, as well as the number of visits per patient.
. There is no incentive to avoid costly tests or procedures.
3) Discounted fee for service:
_______________________________
. Discounted FFS works similarly to FFS except that physicians are reimbursed (repaid) a discounted amount.
. So physicians paid under this model may be more conservative when ordering tests and providing services compared to those paid by FFS, especially if expensive tests or services are greatly discounted.
4) Salary:
__________
. Physicians are paid a fixed amount and their pay is not tied to the number of enrollees or services rendered (provided).
. Unless their contracts include withholds or bonuses, salaried physicians face no financial risk.
. So they have no financial incentive to change their treatment patterns, either in services provided or number of follow-up visits.
- Capitation is often used in health maintenance organization (HMO) insurance plans.
- FFS and discounted FFS are commonly used in preferred provider organization (PPO) insurance plans.
N.B.:
- A state with a population of 4,000,000 contains 20,000 people who have disease A, a fatal neurodegenerative condition. There are 7,000 new cases of the disease a year and 1,000 deaths a year attributable to disease A. There are 40,000 deaths per year from all causes. What is the ....??
1- Incidence of the disease: is the number of new cases of a disease per year divided by the population at risk.
Incidence = 7,000 / (4,000,000 - 20,000).
2- The disease-specific mortality: is the number of deaths attributable to the disease per year divided by the total population.
The disease-specific mortality = 1,000 / 4,000,000.
3- The rate of increase of a disease: is the number of new cases per year minus the number of deaths (or cures) per year, divided by the total population.
The rate of increase of a disease = (7,000 - 1,000) / 4,000,000.
4- The prevalence of a disease: is the number of persons with the disease divided by the total population at a specific point in time.
The prevalence of a disease = 20,000 / 4,000,000.
5- The mortality rate: is the number of deaths per year divided by the total population.
The mortality rate = 40,000 / 4,000,000.
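. The five rates in this example can be computed in one short sketch, using the same numbers given in the question. The variable names are hypothetical.

```python
population = 4_000_000
existing_cases = 20_000
new_cases_per_year = 7_000
disease_deaths_per_year = 1_000
all_deaths_per_year = 40_000

rates = {
    # New cases divided by the population at risk (people not already diseased).
    "incidence": new_cases_per_year / (population - existing_cases),
    "disease_specific_mortality": disease_deaths_per_year / population,
    "rate_of_increase": (new_cases_per_year - disease_deaths_per_year) / population,
    "prevalence": existing_cases / population,
    "mortality_rate": all_deaths_per_year / population,
}

for name, value in rates.items():
    # Express each rate per 100,000 population for easier reading.
    print(f"{name}: {value * 100_000:.1f} per 100,000")
```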
.N.B.: when you see it as a graph:
1- An increase in lung cancer incidence and mortality has been observed in women over the last four decades due to increased cigarette smoking.
2- Breast cancer is the most common non-skin cancer among women in the USA, but breast cancer mortality is comparatively low.
3- Mortality from breast cancer has stayed relatively stable over time, whereas colon cancer mortality has decreased somewhat over the last decades.
4- Stomach cancer is now uncommon; its incidence and mortality have decreased drastically in the last decades.
5- Mortality from ovarian cancer is stable over time.
6- Apart from skin cancer, the most common cancers in women, in descending order of incidence, are: breast cancer, lung cancer, then colon cancer.
7- In order of mortality: lung cancer, followed by breast cancer, then colon cancer.
N.B.:
- Case-fatality rate: is calculated by dividing the number of fatal cases by the total number of people with the disease.
- Case-fatality = number of fatal cases / total number of people with the disease.
N.B:
- If events are independent, the probability that all events will turn out the same (e.g. -ve) is the product of the separate probabilities for each event.
- The probability of at least 1 event turning out differently is given as: 1 - (the probability of all events being the same).
- For example:
A new serological test for detecting prostate cancer is negative in 95% of patients who don't have the disease. If the test is used on 8 blood samples taken from patients without prostate cancer, what is the probability of getting at least 1 positive test?
- In this case there is a 0.95 (95%) probability of a true negative result and a 0.05 (5%) probability of a false positive result.
- To calculate the chance of all 8 tests being negative: probability (all negative) = (0.95)^8.
- You have to know that the total probability is always equal to 1.0 (100%), so
- The probability that at least 1 test turns out positive is:
- Probability (at least 1 positive) = 1 - probability (all negative) = 1 - (0.95)^8
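. The prostate cancer example can be evaluated numerically. This sketch assumes, as the example does, that the 8 test results are independent; the function name is hypothetical.

```python
def prob_at_least_one_positive(specificity, n_tests):
    """1 - P(all tests negative) for n independent tests on disease-free samples."""
    prob_all_negative = specificity ** n_tests
    return 1 - prob_all_negative

# 95% true-negative rate, 8 samples from patients without the disease.
p = prob_at_least_one_positive(0.95, 8)
print(round(p, 4))   # roughly a one-in-three chance of at least 1 false positive
```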

Dr. Hisham Elkilany.
_______________________
__
