SCIENCE
Operating Systems Lab ( CL205)
Lab Session 08
The /proc file system
What is Journaling?
A journaling file system is more reliable when it comes to data storage. Journaling file systems
do not necessarily prevent corruption, but they do prevent inconsistency and are much faster
at file system checks than non-journaled file systems. On a non-journaled file system, if a power
failure happens while you are saving a file, the save will not complete and you end up with
corrupted data and an inconsistent file system. A journaling file system, instead of writing
directly to the part of the disk where the file is stored, first writes the data to another part of
the hard drive and notes the necessary changes in a log (the journal). In the background it then
goes through each entry in the journal and completes the task, and when the task is complete, it
checks the entry off the list. Thus the file system is always in a consistent state: either the
file got saved, or the journal reports it as not completely saved, or the journal itself is
inconsistent (but can be rebuilt from the file system). Some journaling file systems can
prevent corruption as well by writing data twice.
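The log-then-apply idea described above can be illustrated with a toy, in-memory model. This is only a sketch under stated assumptions: the "disk" is a small array of integers, and the names toy_fs, journal_log, journal_apply and journal_recover are hypothetical, not taken from any real file system.

```c
#include <assert.h>
#include <stddef.h>

#define DISK_BLOCKS 8
#define JOURNAL_MAX 16

/* Hypothetical journal record: "write value to block", plus a done flag. */
struct journal_entry {
    size_t block;
    int value;
    int done;   /* 1 once the change has reached its final location */
};

struct toy_fs {
    int disk[DISK_BLOCKS];                      /* final on-disk locations */
    struct journal_entry journal[JOURNAL_MAX];  /* the log of pending changes */
    size_t njournal;
};

/* Step 1: note the intended change in the journal; the disk is untouched. */
int journal_log(struct toy_fs *fs, size_t block, int value)
{
    assert(fs != NULL);
    if (block >= DISK_BLOCKS || fs->njournal >= JOURNAL_MAX)
        return -1;
    fs->journal[fs->njournal++] = (struct journal_entry){ block, value, 0 };
    return 0;
}

/* Step 2: carry out one logged change and check it off the list. */
void journal_apply(struct toy_fs *fs, size_t i)
{
    struct journal_entry *e = &fs->journal[i];

    fs->disk[e->block] = e->value;
    e->done = 1;
}

/* After a crash: replay every entry that was never checked off.
 * Returns the number of entries replayed. */
size_t journal_recover(struct toy_fs *fs)
{
    size_t replayed = 0;

    for (size_t i = 0; i < fs->njournal; i++) {
        if (!fs->journal[i].done) {
            journal_apply(fs, i);
            replayed++;
        }
    }
    return replayed;
}
```

If a "power failure" happens after journal_log but before journal_apply, the disk array still holds the old value, and calling journal_recover brings it up to date. That is the consistency property the paragraph describes.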
File System   Max File Size   Max Partition Size   Journaling       Notes
Fat16         2 GB            2 GB                 No               Legacy
Fat32         4 GB            8 TB                 No               Legacy
NTFS          2 TB            256 TB               Yes
ext2          2 TB            32 TB                No               Legacy
ext3          2 TB            32 TB                Yes
ext4          16 TB           1 EiB                Yes
reiserFS      8 TB            16 TB                Yes              No longer well-maintained.
JFS           4 PB            32 PB                Yes (metadata)
XFS           8 EB            8 EB                 Yes (metadata)
GB = Gigabyte (1024 MB) :: TB = Terabyte (1024 GB) :: PB = Petabyte (1024 TB) :: EB = Exabyte
(1024 PB)
The table above gives a brief comparison of the main attributes of different file systems: the
maximum file size, the maximum partition size, and whether the file system is journaled.
Of the above file systems, the only one you cannot install Linux on is NTFS. It is not
recommended to install Linux on any type of FAT file system, because FAT does not have any of
the permissions of a true Unix FS.
Another common Windows practice that is not needed in Unix is defragmenting the hard drive.
When NTFS and FAT write files to the hard drive, they don't always keep pieces (known as
blocks) of files together. Therefore, to maintain the performance of the computer, the hard
drive needs to be "defragged" every once in a while. This is unnecessary on Unix file systems
because of the way they were designed. When ext3 was developed, it was coded so that it would
keep blocks of files together or at least near each other.
No true defragmenting tools exist for the ext3 file system, but a defragmenting tool (e4defrag)
is available for the ext4 file system.
What is partitioning?
Usually, partitions refer to the physical disk partitions (primary, logical and extended). It
may seem strange that Linux uses more than one partition on the same disk, even when using
the standard installation procedure, so some explanation is called for.
One of the goals of having different partitions is to achieve higher data security in case of
disaster. By dividing the hard disk in partitions, data can be grouped and separated. When an
accident occurs, only the data in the partition that got the hit will be damaged, while the data
on the other partitions will most likely survive.
This principle dates from the days when Linux didn't have journaled file systems and power
failures might have led to disaster. The use of partitions remains for security and robustness
reasons, so a breach on one part of the system doesn't automatically mean that the whole
computer is in danger. This is currently the most important reason for partitioning. A simple
example: a user creates a script, a program or a web application that starts filling up the disk. If
the disk contains only one big partition, the entire system will stop functioning if the disk is full.
If the user stores the data on a separate partition, then only that (data) partition will be
affected, while the system partitions and possible other data partitions keep functioning.
Mind that having a journaled file system only provides data security in case of power failure
and sudden disconnection of storage devices. This does not protect your data against bad
blocks and logical errors in the file system. In those cases, you should use a RAID (Redundant
Array of Inexpensive Disks) solution.
There are two kinds of major partitions on a Linux system:
data partition: normal Linux system data, including the root partition containing all the
data to start up and run the system; and
swap partition: expansion of the computer's physical memory, extra memory on hard
disk.
Most systems contain a root partition, one or more data partitions and one or more swap
partitions. Systems in mixed environments may contain partitions for other system data, such
as a partition with a FAT or VFAT file system for MS Windows data.
The standard root partition (indicated with a single forward slash, /) is about 100-500 MB, and
contains the system configuration files, most basic commands and server programs, system
libraries, some temporary space and the home directory of the administrative user. A standard
installation requires about 250 MB for the root partition.
Swap space (indicated with swap) is only accessible to the system itself, and is hidden from
view during normal operation. Swap is the mechanism that ensures, as on other UNIX systems,
that you can keep on working, whatever happens. On Linux, you will virtually never see
irritating messages like "Out of memory, please close some applications first and try again",
because of this extra memory. The swap or virtual memory procedure has long since been adopted
by operating systems outside the UNIX world.
Using memory on a hard disk is naturally slower than using the real memory chips of a
computer, but having this little extra is a great comfort.
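Linux generally counts on having twice the amount of physical memory in the form of swap
space on the hard disk. When installing a system, you have to know how you are going to do
this. An example on a system with 512 MB of RAM:
1st possibility: one swap partition of 1 GB
2nd possibility: two swap partitions of 512 MB
3rd possibility: with two hard disks, one partition of 512 MB on each disk
The last option will give the best results when a lot of I/O is to be expected.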
The kernel is on a separate partition as well in many distributions, because it is the most
important file of your system. If this is the case, you will find that you also have a /boot
partition, holding your kernel(s) and accompanying data files.
The rest of the hard disk(s) is generally divided in data partitions, although it may be that all of
the non-system critical data resides on one partition, for example when you perform a standard
workstation installation. When non-critical data is separated on different partitions, it usually
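happens following a set pattern:
a partition for user programs (/usr)
a partition containing the users' personal data (/home)
a partition to store temporary data like print and mail queues (/var)
a partition for third party and extra software (/opt)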
Once the partitions are made, you can only add more. Changing sizes or properties of existing
partitions is possible but not advisable.
Mount Points
All partitions are attached to the system via a mount point. The mount point defines the place
of a particular data set in the file system. Usually, all partitions are connected through the root
partition. On this partition, which is indicated with the slash (/), directories are created. These
empty directories will be the starting point of the partitions that are attached to them. An
example: given a partition that holds the following directories:
videos/
cd-images/
pictures/
We want to attach this partition in the filesystem in a directory called /opt/media. In order to
do this, the system administrator has to make sure that the directory /opt/media exists on the
system. Preferably, it should be an empty directory. Then, using the mount command, the
administrator can attach the partition to the system. When you look at the content of the
formerly empty directory /opt/media, it will contain the files and directories that are on the
mounted medium (hard disk or partition of a hard disk, CD, DVD, flash card, USB or other
storage device).
During system startup, all the partitions are thus mounted, as described in the file /etc/fstab.
Some partitions are not mounted by default, for instance if they are not constantly connected
to the system, such as the storage used by your digital camera. If well configured, the device
will be mounted as soon as the system notices that it is connected, or it can be user-mountable,
i.e. you don't need to be system administrator to attach and detach the device to and from the
system.
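Each line of /etc/fstab names a device, a mount point, a file system type and mount options, optionally followed by the dump and pass numbers. As a minimal sketch of how such a line breaks down, the function below splits one line into those fields. The names fstab_entry and parse_fstab_line are hypothetical, and the buffer sizes are arbitrary assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed /etc/fstab line (field sizes are arbitrary for this sketch). */
struct fstab_entry {
    char device[128];       /* e.g. /dev/hda5 or fs1:/home */
    char mount_point[128];  /* e.g. /opt */
    char fs_type[32];       /* e.g. ext3, nfs */
    char options[128];      /* e.g. defaults */
    int dump;               /* optional, defaults to 0 */
    int pass;               /* optional, defaults to 0 */
};

/* Returns 1 if an entry was parsed, 0 for blank or comment lines,
 * and -1 if the line has fewer than the four mandatory fields. */
int parse_fstab_line(const char *line, struct fstab_entry *e)
{
    int n;

    assert(line != NULL && e != NULL);
    line += strspn(line, " \t");                /* skip leading blanks */
    if (*line == '\0' || *line == '\n' || *line == '#')
        return 0;

    e->dump = 0;
    e->pass = 0;
    n = sscanf(line, "%127s %127s %31s %127s %d %d",
               e->device, e->mount_point, e->fs_type, e->options,
               &e->dump, &e->pass);
    return n >= 4 ? 1 : -1;
}
```

For instance, a line such as "/dev/hda5 /opt ext3 defaults 1 2" (the file system type here is an assumed example) yields the mount point /opt, which is the directory the partition is attached to at startup.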
On a running system, information about the partitions and their mount points can be displayed
using the df command (which stands for disk full or disk free). In Linux, df is the GNU version,
and supports the -h or human readable option which greatly improves readability.
The df command only displays information about active non-swap partitions. These can include
partitions from other networked systems, like in the example below where the home
directories are mounted from a file server on the network, a situation often encountered in
corporate environments.
df -h
Filesystem   Size  Used  Avail  Use%  Mounted on
/dev/hda8    496M
/dev/hda1    124M  8.4M   109M    8%  /boot
/dev/hda5     19G   15G   2.7G   85%  /opt
/dev/hda6    7.0G  5.4G   1.2G   81%  /usr
/dev/hda7    3.7G  2.7G   867M   77%  /var
fs1:/home    8.9G  3.7G   4.7G   44%  /.automount/fs1/root/home
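The figures that df reports can also be obtained from a program through the POSIX statvfs() call. The following is a minimal sketch: the function names fs_use_percent and print_usage are made up for this example, and the percentage is only roughly what df's Use% column shows, since the exact rounding may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/statvfs.h>

/*
 * Percentage of space in use on the file system that holds `path`,
 * computed as used / (used + available to ordinary users), rounded up.
 * Returns -1 if the path cannot be queried or no blocks are reported.
 */
int fs_use_percent(const char *path)
{
    struct statvfs s;
    unsigned long long used, usable;

    assert(path != NULL);
    if (statvfs(path, &s) != 0)
        return -1;
    used = (unsigned long long)s.f_blocks - s.f_bfree;
    usable = used + s.f_bavail;
    if (usable == 0)
        return -1;
    return (int)((used * 100 + usable - 1) / usable);
}

/* Print one df-like line for `path`; sizes are in MiB. Returns -1 on error. */
int print_usage(const char *path)
{
    struct statvfs s;

    if (statvfs(path, &s) != 0)
        return -1;
    printf("%-20s %8llu MiB total, %8llu MiB available, %d%% used\n", path,
           (unsigned long long)s.f_blocks * s.f_frsize / (1024 * 1024),
           (unsigned long long)s.f_bavail * s.f_frsize / (1024 * 1024),
           fs_use_percent(path));
    return 0;
}
```

Calling print_usage("/") and print_usage("/boot") from main() would print one line per mount point, which can be compared against the output of df -h on the same machine.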
The only information not included in an inode is the file name and directory. These are stored
in the special directory files. By comparing file names and inode numbers, the system can build
a tree structure that the user understands. Users can display inode numbers using the -i
option to ls. The inodes have their own separate space on the disk.
/proc/apm: This file provides information about the state of the Advanced Power
Management (APM) system.
/proc/buddyinfo: This file is used primarily for diagnosing memory fragmentation
issues.
/proc/cmdline: This file shows the parameters passed to the kernel at the time it is
started.
/proc/cpuinfo: This virtual file identifies the type of processor used by your system.
/proc/crypto: This file lists all installed cryptographic ciphers used by the Linux kernel,
including additional details for each.
/proc/devices: This file displays the various character and block devices currently
configured.
/proc/dma: This file contains a list of the registered ISA DMA channels in use.
/proc/execdomains: This file lists the execution domains currently supported by the
Linux kernel, along with the range of personalities they support.
/proc/fb: This file contains a list of frame buffer devices, with the frame buffer device
number and the driver that controls it.
/proc/filesystems: This file displays a list of the file system types currently supported
by the kernel.
/proc/interrupts: This file records the number of interrupts per IRQ on the x86
architecture.
/proc/iomem: This file shows you the current map of the system's memory for each
physical device.
/proc/ioports: The output of /proc/ioports provides a list of currently registered port
regions used for input or output communication with a device.
/proc/kcore: This file represents the physical memory of the system and is stored in
the core file format.
/proc/kmsg: This file is used to hold messages generated by the kernel.
/proc/loadavg: This file provides a look at the load average in regard to both the CPU
and IO over time, as well as additional data used by uptime and other commands.
/proc/locks: This file displays the files currently locked by the kernel.
/proc/mdstat: This file contains the current information for multiple-disk, RAID
configurations.
/proc/meminfo: This is one of the more commonly used files in the /proc/ directory, as
it reports a large amount of valuable information about the system's RAM usage.
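Because these entries behave like ordinary text files, a program can read them with the usual stdio calls. The sketch below reads /proc/loadavg, whose single line holds the 1, 5 and 15 minute load averages, a running/total count of scheduling entities, and the most recently assigned PID. The struct and function names are hypothetical, and the parser is kept separate from the file access so it can be tried on a plain string.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of /proc/loadavg, e.g. "0.20 0.18 0.12 1/80 11206". */
struct loadavg {
    double one, five, fifteen;  /* load averages over 1, 5 and 15 minutes */
    int running, total;         /* runnable / total scheduling entities */
    long last_pid;              /* most recently created process ID */
};

/* Parse the text of /proc/loadavg. Returns 0 on success, -1 on bad input. */
int parse_loadavg(const char *text, struct loadavg *out)
{
    int n;

    assert(text != NULL && out != NULL);
    n = sscanf(text, "%lf %lf %lf %d/%d %ld",
               &out->one, &out->five, &out->fifteen,
               &out->running, &out->total, &out->last_pid);
    return n == 6 ? 0 : -1;
}

/* Read and parse the live file. Returns -1 if /proc is not available. */
int read_loadavg(struct loadavg *out)
{
    char buf[128];
    FILE *f = fopen("/proc/loadavg", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_loadavg(buf, out);
}
```

The same open, read, parse pattern applies to the other files in the list; only the format string changes.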
Directories in /proc
cat /proc/version
This displays a virtual sequence file: its contents are generated dynamically in RAM at the
moment you ask to view it.
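#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/jiffies.h>

static int hz_show(struct seq_file *m, void *v)
{
	seq_printf(m, "%d\n", HZ);	/* print the kernel tick rate */
	return 0;
}

static int hz_open(struct inode *inode, struct file *file)
{
	return single_open(file, hz_show, NULL);
}

static const struct file_operations hz_fops = {
	.owner		= THIS_MODULE,
	.open		= hz_open,
	.read		= seq_read,
	.llseek		= seq_lseek,
	.release	= single_release,
};

static int __init hz_init(void)		/* reconstructed: body lost in this copy */
{
	proc_create("hz", 0, NULL, &hz_fops);	/* creates /proc/hz */
	return 0;
}
static void __exit hz_exit(void)	/* reconstructed: body lost in this copy */
{
	remove_proc_entry("hz", NULL);
}
module_init(hz_init);
module_exit(hz_exit);
MODULE_LICENSE("GPL");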
4. gedit Makefile
obj-m += jiff.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
5. make
6. insmod jiff.ko
7. Note: add the appropriate Makefile, compile and load this module, and check the contents of the
file /proc/hz (for example with cat /proc/hz). On a stock Ubuntu desktop system, you should
expect to see the value 250.
The contents listed here are produced through the sequence interface declared in seq_file.h, in
the form of a structure written as:
struct seq_operations {
	void * (*start) (struct seq_file *m, loff_t *pos);
	void (*stop) (struct seq_file *m, void *v);
	void * (*next) (struct seq_file *m, void *v, loff_t *pos);
	int (*show) (struct seq_file *m, void *v);
};
The sequence uses the following two system-defined arguments to access elements in the list:
void *v: the address of the current element in the list. The type is void * because the kind
of data held at each element is not decided in advance, so you need to cast the pointer
before you can access the value at that address.
loff_t *pos: the loop variable, an iterator that the system initializes to zero; incrementing
it is the programmer's duty.
The four operations work as follows:
start: makes any allocations and assignments the sequence needs, and returns the address of
the element at position pos (the first element when pos is zero), or NULL if nothing is left.
stop: releases any assignments and allocations once the task is finished, checking the value
of v where needed.
next: takes responsibility for advancing pos and returning the address of the next element
in the list, or NULL at the end of the list.
show: prints the current element in the list.
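The order in which the four operations are called can be tried out in user space with stand-in types. The sketch below is not kernel code: pos_t plays the role of loff_t, struct out_buf plays the role of struct seq_file, and every demo_ name is hypothetical. The driver loop is a simplification of what the kernel does when the file is read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef long long pos_t;                            /* stand-in for loff_t */
struct out_buf { char text[256]; size_t len; };     /* stand-in for seq_file */

/* The "list" being walked: a fixed array of strings. */
static const char *const names[] = { "ext2", "ext3", "ext4" };
#define N_NAMES (sizeof names / sizeof names[0])

/* start: return the element at *pos, or NULL when past the end. */
void *demo_start(struct out_buf *m, pos_t *pos)
{
    (void)m;
    assert(pos != NULL);
    if (*pos < 0 || (size_t)*pos >= N_NAMES)
        return NULL;
    return (void *)&names[*pos];
}

/* next: advancing *pos is our job; return the new element or NULL. */
void *demo_next(struct out_buf *m, void *v, pos_t *pos)
{
    (void)m;
    (void)v;
    ++*pos;
    return ((size_t)*pos < N_NAMES) ? (void *)&names[*pos] : NULL;
}

/* stop: nothing was allocated in start, so there is nothing to free. */
void demo_stop(struct out_buf *m, void *v)
{
    (void)m;
    (void)v;
}

/* show: cast v back to the element type and append one line of output. */
int demo_show(struct out_buf *m, void *v)
{
    const char *const *item = v;
    size_t room = sizeof m->text - m->len;
    int n = snprintf(m->text + m->len, room, "%s\n", *item);

    if (n < 0 || (size_t)n >= room)
        return -1;
    m->len += (size_t)n;
    return 0;
}

/* Simplified driver: start, then show/next until NULL, then stop. */
int demo_traverse(struct out_buf *m)
{
    pos_t pos = 0;
    void *v = demo_start(m, &pos);
    int err = 0;

    while (v != NULL && err == 0) {
        err = demo_show(m, v);
        if (err == 0)
            v = demo_next(m, v, &pos);
    }
    demo_stop(m, v);
    return err;
}
```

After demo_traverse() runs on an empty buffer, the buffer holds one line per array element, in order. In a real module the same four functions would take struct seq_file and loff_t and write with seq_printf().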
static int
hz_show(struct seq_file *m, void *v)
{
	seq_printf(m, "%d\n", HZ);
	return 0;
}
As you can tell, the purpose of your "show" routine is to take, as the first argument, a pointer to a sequence file
structure, and use seq_printf() to write to it as much data formatted any way you want, then return zero to show
success.
You can see that your show routine also has a second parameter of type void* but that has no relevance for us so
you can ignore it for now. Just don't use it for anything with simple examples like this.
static int
hz_open(struct inode *inode, struct file *file)
{
	return single_open(file, hz_show, NULL);
}
Again, we'll skip some of the more obscure details and possibilities, and you can see how simple it is to initially
"open" your proc file -- ignore that first inode parameter for now, and pass a third parameter of NULL to
the single_open() routine.
3. The file_operations structure
Moving on, we now have the structure of "file operations" that you need to define and this will take a bit more
explanation:
static const struct file_operations hz_fops = {
	.owner		= THIS_MODULE,
	.open		= hz_open,
	.read		= seq_read,
	.llseek		= seq_lseek,
	.release	= single_release,
};
The file_operations structure is declared in the header file <linux/fs.h> and represents a collection of file
operations that are defined for any type of file, not just sequence files. But since sequence files are such a
simple type of file, you're free to ignore most of the fields in that structure when you define an example for a
sequence file.
In fact, if you look closely, you really need to define only the open member of the structure, since all the other
members can be set to the default values associated with sequence files. (In the above case, you can probably do
away with the llseek value as well: your example is so short that you really shouldn't be planning to do any
seeking on your file.) In short, your definition of that file_operations structure is pretty well self-evident.
#include <linux/fs.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/hugetlb.h>
#include <linux/mman.h>
#include <linux/mmzone.h>
#include <linux/proc_fs.h>
#include <linux/quicklist.h>
#include <linux/seq_file.h>
#include <linux/swap.h>
#include <linux/vmstat.h>
#include <linux/atomic.h>
#include <linux/vmalloc.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include "internal.h"
	/*
	 * Estimate the amount of memory available for userspace allocations,
	 * without causing swapping.
	 *
	 * Free memory cannot be taken below the low watermark, before the
	 * system starts swapping.
	 */
	available = i.freeram - wmark_low;

	/*
	 * Not all the page cache can be freed, otherwise the system will
	 * start swapping. Assume at least half of the page cache, or the
	 * low watermark worth of cache, needs to stay.
	 */
	pagecache = pages[LRU_ACTIVE_FILE] + pages[LRU_INACTIVE_FILE];
	pagecache -= min(pagecache / 2, wmark_low);
	available += pagecache;

	/*
	 * Part of the reclaimable slab consists of items that are in use,
	 * and cannot be freed. Cap this estimate at the low watermark.
	 */
	available += global_page_state(NR_SLAB_RECLAIMABLE) -
		     min(global_page_state(NR_SLAB_RECLAIMABLE) / 2, wmark_low);

	if (available < 0)
		available = 0;

	/*
	 * Tagged format, for easy grepping and expansion.
	 */
	seq_printf(m,
		"MemTotal:       %8lu kB\n"
		"MemFree:        %8lu kB\n"
		"MemAvailable:   %8lu kB\n"
		"Buffers:        %8lu kB\n"
		"Cached:         %8lu kB\n"
		"SwapCached:     %8lu kB\n"
		"Active:         %8lu kB\n"
		"Inactive:       %8lu kB\n"
		"Active(anon):   %8lu kB\n"
		"Inactive(anon): %8lu kB\n"
		"Active(file):   %8lu kB\n"
		"Inactive(file): %8lu kB\n"
		"Unevictable:    %8lu kB\n"
		"Mlocked:        %8lu kB\n"
#ifdef CONFIG_HIGHMEM
		"HighTotal:      %8lu kB\n"
		"HighFree:       %8lu kB\n"
		"LowTotal:       %8lu kB\n"
		"LowFree:        %8lu kB\n"
#endif
#ifndef CONFIG_MMU
		"MmapCopy:       %8lu kB\n"
#endif
		"SwapTotal:      %8lu kB\n"
		"SwapFree:       %8lu kB\n"
		"Dirty:          %8lu kB\n"
		"Writeback:      %8lu kB\n"
		"AnonPages:      %8lu kB\n"
		"Mapped:         %8lu kB\n"
		"Shmem:          %8lu kB\n"
		"Slab:           %8lu kB\n"
		"SReclaimable:   %8lu kB\n"
		"SUnreclaim:     %8lu kB\n"
		"KernelStack:    %8lu kB\n"
		"PageTables:     %8lu kB\n"
#ifdef CONFIG_QUICKLIST
		"Quicklists:     %8lu kB\n"
#endif
		"NFS_Unstable:   %8lu kB\n"
		"Bounce:         %8lu kB\n"
		"WritebackTmp:   %8lu kB\n"
		"CommitLimit:    %8lu kB\n"
		"Committed_AS:   %8lu kB\n"
		"VmallocTotal:   %8lu kB\n"
		"VmallocUsed:    %8lu kB\n"
		"VmallocChunk:   %8lu kB\n"
#ifdef CONFIG_MEMORY_FAILURE
		"HardwareCorrupted: %5lu kB\n"
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
		"AnonHugePages:  %8lu kB\n"
#endif
		,
		K(i.totalram),
		K(i.freeram),
		K(available),
		K(i.bufferram),
		K(cached),
		K(total_swapcache_pages()),
		K(pages[LRU_ACTIVE_ANON] + pages[LRU_ACTIVE_FILE]),
		K(pages[LRU_INACTIVE_ANON] + pages[LRU_INACTIVE_FILE]),
		K(pages[LRU_ACTIVE_ANON]),
		K(pages[LRU_INACTIVE_ANON]),
		K(pages[LRU_ACTIVE_FILE]),
		K(pages[LRU_INACTIVE_FILE]),
		K(pages[LRU_UNEVICTABLE]),
		K(global_page_state(NR_MLOCK)),
#ifdef CONFIG_HIGHMEM
		K(i.totalhigh),
		K(i.freehigh),
		K(i.totalram-i.totalhigh),
		K(i.freeram-i.freehigh),
#endif
#ifndef CONFIG_MMU
		K((unsigned long) atomic_long_read(&mmap_pages_allocated)),
#endif
		K(i.totalswap),
		K(i.freeswap),
		K(global_page_state(NR_FILE_DIRTY)),
		K(global_page_state(NR_WRITEBACK)),
		K(global_page_state(NR_ANON_PAGES)),
		K(global_page_state(NR_FILE_MAPPED)),
		K(global_page_state(NR_SHMEM)),
		K(global_page_state(NR_SLAB_RECLAIMABLE) +
				global_page_state(NR_SLAB_UNRECLAIMABLE)),
		K(global_page_state(NR_SLAB_RECLAIMABLE)),
		K(global_page_state(NR_SLAB_UNRECLAIMABLE)),
		global_page_state(NR_KERNEL_STACK) * THREAD_SIZE / 1024,
		K(global_page_state(NR_PAGETABLE)),
#ifdef CONFIG_QUICKLIST
		K(quicklist_total_size()),
#endif
		K(global_page_state(NR_UNSTABLE_NFS)),
		K(global_page_state(NR_BOUNCE)),
		K(global_page_state(NR_WRITEBACK_TEMP)),
		K(vm_commit_limit()),
		K(committed),
		(unsigned long)VMALLOC_TOTAL >> 10,
		vmi.used >> 10,
		vmi.largest_chunk >> 10
#ifdef CONFIG_MEMORY_FAILURE
		,atomic_long_read(&num_poisoned_pages) << (PAGE_SHIFT - 10)
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
		,K(global_page_state(NR_ANON_TRANSPARENT_HUGEPAGES) *
		   HPAGE_PMD_NR)
#endif
		);

	hugetlb_report_meminfo(m);

	arch_report_meminfo(m);

	return 0;
#undef K
}
static int meminfo_proc_open(struct inode *inode, struct file *file)
{
	return single_open(file, meminfo_proc_show, NULL);
}

static const struct file_operations meminfo_proc_fops = {
	.open		= meminfo_proc_open,
	.read		= seq_read,
	.llseek		= seq_lseek,
	.release	= single_release,
};

static int __init proc_meminfo_init(void)
{
	proc_create("meminfo", 0, NULL, &meminfo_proc_fops);
	return 0;
}
fs_initcall(proc_meminfo_init);
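The MemAvailable arithmetic in the listing above can be checked by hand with a small user-space sketch. The function estimate_available and the macro PAGES_TO_KB are hypothetical names. The 4 KiB page size (a shift of 12) is an assumption that holds on typical x86 systems, and the conversion macro has the same shape as the << (PAGE_SHIFT - 10) expression used in the listing.

```c
#include <assert.h>

#define PAGE_SHIFT_ASSUMED 12   /* assumption: 4 KiB pages */

/* Convert a page count to kB, in the same way as x << (PAGE_SHIFT - 10). */
#define PAGES_TO_KB(x) ((x) << (PAGE_SHIFT_ASSUMED - 10))

static long min_long(long a, long b)
{
    return a < b ? a : b;
}

/* All arguments and the result are page counts. */
long estimate_available(long freeram, long wmark_low,
                        long active_file, long inactive_file,
                        long slab_reclaimable)
{
    long available, pagecache;

    assert(wmark_low >= 0);

    /* Free memory above the low watermark. */
    available = freeram - wmark_low;

    /* Page cache, minus the part assumed to stay resident. */
    pagecache = active_file + inactive_file;
    pagecache -= min_long(pagecache / 2, wmark_low);
    available += pagecache;

    /* Reclaimable slab, minus the part assumed to be in use. */
    available += slab_reclaimable - min_long(slab_reclaimable / 2, wmark_low);

    if (available < 0)
        available = 0;
    return available;
}
```

With the made-up inputs freeram = 1000, wmark_low = 100, active_file = 300, inactive_file = 500 and slab_reclaimable = 50 (all in pages), the steps give 900, then 900 + 700 = 1600, then 1600 + 25 = 1625 pages, which PAGES_TO_KB turns into 6500 kB.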
Lab Task:
Q. Create a sequence file that generates even numbers.
{Hint: Reference jiffies code}