DataRockie

Start learning Data Science the easy way, on your own 🍟

Getting to Know the Central Limit Theorem, the Powerhouse of Statistics

February 6, 2026

This article introduces one of the greatest theorems in statistics, the Central Limit Theorem, along with how a sampling distribution is built and what standard error means.

Table Of Contents

The Heart of Statistics
The Most Powerful Theorem
[1] Sampling Distribution
[2] Standard Error
What’s The Point?
Example R Code

The Heart of Statistics

The heart of statistics is sampling. If we draw a large enough sample, the sample will look like the population we are interested in. The draw has to be random, which statisticians call random sampling, i.e. sampling based on probability, where every member of the population has an equal chance of being selected.

The best way to sample in statistics is random sampling.
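
To make the idea concrete, here is a minimal sketch of simple random sampling in R. The population is made up for illustration (10,000 simulated scores, an assumption and not real data); sample() gives every element the same chance of being drawn.

```r
## a hypothetical population: 10,000 made-up scores (not real data)
set.seed(1)
population <- round(rnorm(10000, mean = 50, sd = 10))

## simple random sampling: every member has the same chance of being drawn
small_sample <- sample(population, size = 10)
large_sample <- sample(population, size = 2000)

## compare the sample means with the population mean
mean(population)
mean(small_sample)
mean(large_sample)
```

With a sample this large, the sample mean should typically land very close to the population mean, while the sample of 10 can be noticeably off.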

Drawing conclusions from a sample back to the population is what statisticians call "inference", and inference would not be possible without the Central Limit Theorem, the most powerful theorem in statistics.

The Most Powerful Theorem

The Central Limit Theorem (CLT) says that if we draw samples repeatedly, record a statistic from each sample, such as the mean or a percentage (%), and plot those values as a histogram,

the histogram will look approximately like a normal distribution 😛 Two conditions are needed for the CLT to work: [1] the sampling must be random, and [2] the sample must be large enough; n >= 30 is the usual rule of thumb.

Source: https://www.simplypsychology.org/z-score.html

And that's the POINT! If we know that scores are normally distributed, we can make probability statements about them, such as "~95% of scores fall within +/- 2SD of the mean".

The three numbers everyone should remember about the normal distribution are 68.3, 95.4 and 99.7

+/- 1SD	68.3% of scores fall within +/- 1SD of the mean score
+/- 2SD	95.4% of scores fall within +/- 2SD of the mean score
+/- 3SD	99.7% of scores fall within +/- 3SD of the mean score
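
These percentages can be checked directly in R with pnorm(), the cumulative distribution function of the standard normal. This is only a quick sketch; the variable names are ours. The first value is roughly 68.27%, which some sources print as 68.2.

```r
## share of a normal distribution within k standard deviations of the mean
k <- 1:3
coverage <- pnorm(k) - pnorm(-k)

## as percentages, rounded to one decimal place
round(coverage * 100, 1)
```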

The two key elements of the central limit theorem are the Sampling Distribution and the Standard Error.

[1] Sampling Distribution

The histogram we get from repeated samples (central limit theorem) has an official name in statistics: the "Sampling Distribution". If we increase the sample size of each draw, the sampling distribution becomes narrower and narrower, which means our estimate becomes more precise, i.e. the sample means cluster more tightly around the population mean (called mu for short).

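Look at the two histograms below. The one on the left uses n=30 and the one on the right uses n=1000, each with 1,000 repeated samples (note: the samples here are drawn without replacement; the CLT itself assumes independent draws, and sampling without replacement is close to that when each sample is small relative to the population)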

The central limit theorem makes both histograms approximately normal
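
A comparison like this can be sketched in R. The population below is hypothetical (right-skewed values generated with rexp(), an assumption for illustration and not the data behind the original figures), and sample_means() is a helper name we made up.

```r
set.seed(42)
## hypothetical right-skewed population (made-up values)
population <- rexp(50000, rate = 1 / 4000)

## draw `reps` samples of size n and keep the mean of each one
sample_means <- function(n, reps = 1000) {
  replicate(reps, mean(sample(population, size = n)))
}

means_30   <- sample_means(30)
means_1000 <- sample_means(1000)

## centres of the two sampling distributions vs the population mean
c(population = mean(population), n30 = mean(means_30), n1000 = mean(means_1000))

## spread of the two sampling distributions
c(n30 = sd(means_30), n1000 = sd(means_1000))

## same x-axis on both plots so the difference in width is visible
par(mfrow = c(1, 2))
hist(means_30, xlim = c(1000, 7000), main = "n = 30")
hist(means_1000, xlim = c(1000, 7000), main = "n = 1000")
```

Even though the population itself is skewed, both histograms of sample means should come out roughly bell-shaped, with the n = 1000 one much narrower.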

[2] Standard Error

Standard error is the standard deviation of the sampling distribution (of means). A simple way to compute it is se = sd / sqrt(n), where sd is the standard deviation of the sample we drew from the population and n is the sample size.

Note: the difference between sd and se is that sd measures the spread of the sample distribution, while se measures the spread of the sampling distribution that would come from repeated samples.
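
One way to see that the formula works is to compare it against a brute-force simulation. The sketch below uses a hypothetical population (again generated with rexp(), purely as an assumption) and n = 100.

```r
set.seed(7)
population <- rexp(50000, rate = 1 / 4000)   # hypothetical population
n <- 100

## empirical standard error: the sd of many sample means
means <- replicate(2000, mean(sample(population, size = n)))
se_empirical <- sd(means)

## estimated from ONE sample with se = sd / sqrt(n)
one_sample <- sample(population, size = n)
se_formula <- sd(one_sample) / sqrt(n)

c(empirical = se_empirical, formula = se_formula)
```

The two numbers will not match exactly, because the formula relies on the sd of a single sample, but they should be in the same ballpark.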

What’s The Point?

The implication of the CLT is this: in real life nobody has time to draw repeated samples hundreds or thousands of times. Instead, statisticians draw one random sample, apply the CLT to place a sampling distribution around the sample mean, compute the se, and build an interval estimate that statisticians call a Confidence Interval.

Steps for building a 95% confidence interval (95% CI). All we need are the mean, sd and n of the sample:

Compute se with the formula se = sd / sqrt(n)
Use z = 1.96 for a 95% confidence level
Compute the margin of error with the formula me = se * z
Build the confidence interval [mean - me, mean + me]
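
The four steps above can be sketched as a small R function. ci95() is a name we made up for illustration, and the scores are simulated, not real data.

```r
## 95% confidence interval for a mean, following the four steps above
ci95 <- function(x, z = 1.96) {      # 1.96 is approximately qnorm(0.975)
  n  <- length(x)
  se <- sd(x) / sqrt(n)              # step 1: standard error
  me <- z * se                       # steps 2-3: margin of error
  c(lower = mean(x) - me, mean = mean(x), upper = mean(x) + me)  # step 4
}

## made-up sample of 36 scores, for illustration only
set.seed(123)
scores <- rnorm(36, mean = 70, sd = 12)
ci95(scores)
```

For small samples, a t-based interval such as the one reported by t.test(scores)$conf.int will be slightly wider than this z-based one.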

Alright! Personally, I think that if you want to really understand statistics, the first thing to grasp is the central limit theorem, along with the meaning of sampling distribution and standard error. In the next article we will explain confidence intervals in full (hint: a CI can also be used to test statistical hypotheses!)

Example R Code

Here is a simple simulation to test the central limit theorem in R. We use the replicate() function to repeat the sampling 1,000 times (lines 9-11)

	## look at dataframe diamonds
	library(tidyverse)
	glimpse(diamonds)

	## assume this is all prices (population)
	pop_prices <- diamonds$price

	## sample n=30 (without replacement, the default), repeat 1000 times
	sam_prices <- replicate(
	  1000, mean(sample(pop_prices, size = 30))
	)

	## check if the mean of sample means is close to the population mean
	summary(pop_prices)
	summary(sam_prices)

	## look at the histogram
	hist(sam_prices, xlim = c(1000, 7000))

	## let's change line 10, size=1000. Look at the histogram again.


11 responses to “Getting to Know the Central Limit Theorem, the Powerhouse of Statistics”

Nat Tueku

July 15, 2019

Thank you, admin!

1. Kasidis Satangmongkol
  
  July 15, 2019
  
  Thanks for following!
  
Chatupol Tigertiger

July 15, 2019

Thank you! Looking forward to the Confidence Interval article.

1. Kasidis Satangmongkol
  
  July 16, 2019
  
  Thank you! I'll write it next 🙂
  
Anonymous

July 17, 2019

Thank you 🙂

Anonymous

July 18, 2019

Thanks krub

Anonymous

July 22, 2019

Thank you.

ppmild

August 22, 2019

Why does the sample only need to be 30?

1. Kasidis Satangmongkol
  
  August 22, 2019
  
  It doesn't have to be 30. In fact, the more the better (30 is more of a rule-of-thumb number that has been passed along).
  