> > const unsigned int order = compound_order(head);
> prone to identify which ones are necessary and which ones are not.
> > file_mem types working for the memcg code?
+{
> initially. And will page_folio() be required for anything beyond the
> > > So what is the result here?
> + */
mm/migrate: Add folio_migrate_copy()
Indeed I may take on some of these sub-projects.
Because you've been saying you don't think
> emerge regardless of how we split it. Page tables will need some more thought, but
- slab_err(s, page, text, s->name);
> {
>> And IMHO, with something above in mind and not having a clue which
+ slab_unlock(slab);
@@ -409,7 +407,7 @@ static inline bool __cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
I think that's a great idea.
> memory on cheap flash saves expensive RAM.
> > mm/memcg: Add folio_memcg_lock() and folio_memcg_unlock()
> an audit for how exactly they're using the returned page.
> > idea of what that would look like.
> > > > they're 2^N sized/aligned and they're composed of exact multiples of pages.
> > > > > - struct page is statically eating gigs of expensive memory on every
> were already pages in the cache.
> > a service that is echoing 2 to drop_caches every hour on systems which
I don't remember there being one, and I'm not against type
- slab_err(s, page, "inuse %u > max %u",
> > order to avoid huge, massively overlapping page and folio APIs.
> deal with tail pages in the first place, this amounts to a conversion
+#define page_slab(p) (_Generic((p), \
> highlight when "generic" code is trying to access type-specific stuff
> Right, page tables only need a pfn.
> On Tue, Aug 24, 2021 at 02:32:56PM -0400, Johannes Weiner wrote:
> > wants to address, I think that bias toward recent pain over much
> > > > + * @p: The page.
> >> Looking at some core MM code, like mm/huge_memory.c, and seeing all the
slab->inuse here is the upper limit.
>> > > I wasn't claiming otherwise..? Several people in
> of most MM code - including the LRU management, reclaim, rmap,
Some sort of subclassing going on?
> the patches you were opposed to and looked at the result.
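The `page_slab()` fragment quoted above relies on C11's `_Generic` to pick a const or non-const cast. A minimal stand-alone sketch of that pattern follows; `struct page`, `struct slab`, and `_compound_head()` here are toy stand-ins (the real kernel types and helper are far richer), shown only to illustrate how the macro preserves const-ness of its argument:

```c
#include <stddef.h>

/* Toy stand-ins for the kernel types; they merely overlay one another. */
struct page { unsigned long flags; };
struct slab { unsigned long flags; };

/* Stand-in for the kernel's _compound_head(); assumes a head page. */
static void *_compound_head(const void *p)
{
	return (void *)p;
}

/*
 * page_slab() casts a page pointer to a slab pointer. _Generic selects
 * the branch matching the argument's type, so a const input yields a
 * const result and a mutable input yields a mutable result.
 */
#define page_slab(p) (_Generic((p),					\
	const struct page *: (const struct slab *)_compound_head(p),	\
	struct page *:       (struct slab *)_compound_head(p)))
```

Usage: `page_slab(&some_page)` returns the same address reinterpreted as a slab descriptor, with constness carried through at compile time.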
> > entry points to address tailpage confusion becomes nil: there is no
> The old ->readpages interface gave you pages linked together on ->lru
> more important performance wins to go after.
-static void setup_object_debug(struct kmem_cache *s, struct page *page,
> + folio_nr_pages(folio));
> and memory-efficient way to do bigger page sizes?
+ /* Double-word boundary */
> > as well, just one that had a lot more time to spread.
Unlike the buddy allocator.
I guess PG_checked pages currently don't make it
> > +/**
> It needs a
- process_slab(t, s, page, alloc);
+ list_for_each_entry(slab, &n->partial, slab_list)
- max_objects = order_objects(compound_order(page), s->size);
+ max_objects = order_objects(slab_order(slab), s->size);
- if (page->objects != max_objects) {
> add pages to the page cache yourself.
> Jan Kara
+ VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, &slab->page);
- * page_objcgs_check - get the object cgroups vector associated with a page
> map it to userspace and have PTEs point to it.
> Matthew had also a branch where it was renamed to pageset.
+ * on a non-slab page; the caller should check is_slab() to be sure
> > - shrink_page_list() uses page_mapping() in the first half of the
> > > > have years of history saying this is incredibly hard to achieve - and
> > - At activate_locked:, we check PG_swapcache directly on the page and
-static int check_bytes_and_report(struct kmem_cache *s, struct page *page,
> > > code. Since there are very few places in the MM code that expressly
+ * slab->frozen The slab is frozen and exempt from list processing.
> > > drop the anon stuff for now until we have consensus.
> > I'm happy to help.
> struct { /* Second tail page only */
> > > > around the necessity of any compound_head() calls,
> > and I'll post it later.
> discussions.
> Then we go identify places that say "we know it's at least not a
>> I/O.
> - you get the idea.
> : memory, and keep thing-to-page translations out of the public API from
> worth bothering with. We can happily build a system which
> bigger long-standing pain strikes again. Hence the push to eliminate overloading and do
> a service that is echoing 2 to drop_caches every hour on systems which
>>>>> *majority* of memory is in larger chunks, while we continue to see 4k
Even mature code like reclaim just serializes
> > 'struct slab' seems odd and well, IMHO, wrong.
- struct kmem_cache *slab_cache; /* not slob */
> examples of file pages being passed to routines that expect anon pages?
> > > > servers.
> > units of memory in the kernel" very well.
> If yes, how would kernel reclaim an order-0 (2MB) page that has an
> >>> exposing folios to the filesystems.
> > I don't think it's a good thing to try to do.
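The `memcg_data` checks quoted in these hunks work because the field is a tagged pointer: an object-cgroups vector address with type flags OR'ed into its low bits. The sketch below shows that encoding in isolation; the flag names echo the kernel's but the exact values and the simplified helpers (`objcgs_encode`, `objcgs_decode`) are illustrative assumptions, not the real API:

```c
#include <stdint.h>
#include <stdbool.h>

/* Low-bit type flags stored alongside the pointer (illustrative values). */
#define MEMCG_DATA_OBJCGS	(1UL << 0)
#define MEMCG_DATA_KMEM		(1UL << 1)
#define MEMCG_DATA_FLAGS_MASK	(MEMCG_DATA_OBJCGS | MEMCG_DATA_KMEM)

/* Pack an obj_cgroup vector pointer plus its type bit into one word. */
static inline unsigned long objcgs_encode(void *vec)
{
	return (unsigned long)(uintptr_t)vec | MEMCG_DATA_OBJCGS;
}

/* Strip the flag bits to recover the vector pointer. */
static inline void *objcgs_decode(unsigned long memcg_data)
{
	return (void *)(uintptr_t)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}

static inline bool memcg_data_is_objcgs(unsigned long memcg_data)
{
	return memcg_data & MEMCG_DATA_OBJCGS;
}
```

This only works because the vector is at least word-aligned, leaving the low bits free; the `VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, ...)` line above is asserting exactly that an unexpected type bit is not set.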
>> additional layers we'll really need, or which additional leaves we want
>>> folio type.
> >>> state it leaves the tree in, make it directly more difficult to work
+ if (!slab)
> The justification is that we can remove all those hidden calls to
> I know, the crowd is screaming "we want folios, we need folios, get out
> > > On 9/9/21 14:43, Christoph Hellwig wrote:
> > > + update_lru_size(lruvec, lru, folio_zonenum(folio),
Now we have a struct
> sizes:
>>> potentially leaving quite a bit of cleanup work to others if the
> those instances the pattern is clear that the pfn_to_page() always
> fragmentation are going to be alleviated.
> > Amen!
> >> The problem is whether we use struct head_page, or folio, or mempages,
> > file_mem types working for the memcg code?
+power-of-two.
> energy to deal with that - I don't see you or I doing it.
- struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+ struct kmem_cache_node *n = get_node(s, slab_nid(slab));
@@ -1280,13 +1278,13 @@ static noinline int free_debug_processing(
- if (!free_consistency_checks(s, page, object, addr))
+ if (!free_consistency_checks(s, slab, object, addr))
@@ -1299,10 +1297,10 @@ static noinline int free_debug_processing(
- page->inuse = page->objects;
+SLAB_MATCH(compound_head, slab_list);
In Linux it doesn't even leak out to the users, since
> > + const struct page *: (const struct slab *)_compound_head(p), \
- list_add(&page->slab_list, &discard);
+ list_for_each_entry_safe(slab, h, &n->partial, slab_list) {
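The `SLAB_MATCH(compound_head, slab_list)` line above is a compile-time layout check: while `struct slab` still overlays `struct page`, every field must sit at the same offset in both. A self-contained sketch of that idea, with toy stand-in structs (the real ones carry many more fields):

```c
#include <stddef.h>

/* Toy stand-ins: struct slab reinterprets the same memory as struct page. */
struct page {
	unsigned long flags;
	void *compound_head;
	unsigned long memcg_data;
};

struct slab {
	unsigned long flags;
	void *slab_list;	/* overlays page->compound_head */
	unsigned long memcg_data;
};

/*
 * SLAB_MATCH() fails the build if a slab field drifts away from the
 * page field it is supposed to overlay.
 */
#define SLAB_MATCH(pg, sl)						  \
	_Static_assert(offsetof(struct page, pg) ==			  \
		       offsetof(struct slab, sl),			  \
		       "offset mismatch: " #pg " vs " #sl)

SLAB_MATCH(flags, flags);
SLAB_MATCH(compound_head, slab_list);
SLAB_MATCH(memcg_data, memcg_data);
```

The payoff is that the overlay is verified by the compiler rather than by convention, which is what makes the incremental page-to-slab conversion in these hunks safe to land piecemeal.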
What Darrick is talking about is an entirely
> Hi Linus,
> if (PageCompound(page) && !cc->alloc_contig) {
- * a call to the page allocator and the setup of a new slab.
> @@ -3954,23 +3957,23 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
But yet we call compound_head() on every one of them
> >>> with and understand the MM code base.
+++ b/arch/x86/mm/init_64.c
@@ -981,7 +981,7 @@ static void __meminit free_pagetable(struct page *page, int order)
> dependent on a speculative future.
> file_mem
> > order to avoid huge, massively overlapping page and folio APIs.
> better interface than GUP which returns a rather more compressed list
> psyched about this, hence the idea to split the page into
> transition to byte offsets and byte counts instead of units of
> page handling to the folio.
Actually, we want a new layer in the ontology:
>> /* Ok, finally just insert the thing.. */
>> It's binary -- either it's pulled or
> > wherever reasonable, for other reasons) - those cleanups are probably for
> usage of the name could be renamed.
> I wouldn't include folios in this picture, because IMHO folios as of now
> On March 22nd, I wrote this re: the filesystem interfacing:
> For the objects that are subpage sized, we should be able to hold that
> for discussion was *MONTHS* ago.
> folio abstraction as a binary thing for the Linux kernel, rather than
They have
> > > a) page subtypes are all the same, or
> algorithms have to change from block based to extent based, making the analogy
> > > far more confused than "read_pages()" or "read_mempages()".
> instead of converting something to use folios, but folios are an utterly
> existing pageset and page_set cases, and then maybe it goes in.
> of direction.
> My answer to that question is that, while this full conversion is not
> > Why would we want to increase the granularity of page allocation
>> page_add_file_rmap(page, false);
> > > the concerns of other MM developers seriously.
> > >> And of course, that assumes that you
> But until it is all fixed [1], having a type which says "this is not a
> > > to allocate.
> Name it by what it *is*, not by analogies.
> ad-hoc allocated descriptors. We don't want to
+ * or NULL.
> All this sounds really weird to me.
-static __always_inline void account_slab_page(struct page *page, int order,
> Again, the more memory that we allocate in higher-order chunks, the
> > We could, in the future, in theory, allow the internal implementation of a
> > > > badly needed, work that affects everyone in filesystem land
+SLAB_MATCH(flags, flags);
>> more fancy instead of replacing "struct page" by "struct folio".
If there is a mismatch then the page
> > Unfortunately, I think this is a result of me wanting to discuss a way
> Similarly, something like "head_page", or "mempages" is going to a bit
> > > I find this line of argument highly disingenuous.
> VMs in the world.
> > > This is in direct conflict with what I'm talking about, where base
- offset = (ptr - page_address(page)) % s->size;
+ offset = (ptr - slab_address(slab)) % s->size;
@@ -4222,25 +4225,25 @@ void __check_heap_object(const void *ptr, unsigned long n, struct page *page)
I think David
> #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
> > But we expect most interfaces to pass around a proper type (e.g.,
> > structure, opening the door to one day realizing these savings.
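The `offset = (ptr - slab_address(slab)) % s->size;` hunk above comes from `__check_heap_object()`, which validates that a usercopy stays within a single slab object. The logic can be sketched in a few lines; the descriptor structs and helper names here are simplified stand-ins for the real kernel types:

```c
#include <stddef.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's descriptors. */
struct kmem_cache { size_t size; };		/* object size, incl. metadata */
struct slab { void *base; size_t nr_objects; };

static inline void *slab_address(const struct slab *slab)
{
	return slab->base;
}

/*
 * A copy of n bytes starting at ptr is valid only if it lies entirely
 * inside one object: compute ptr's offset within its object, then check
 * the copy does not run past that object's end.
 */
static bool check_heap_object(const struct kmem_cache *s,
			      const struct slab *slab,
			      const void *ptr, size_t n)
{
	size_t offset = ((const char *)ptr -
			 (const char *)slab_address(slab)) % s->size;

	return offset + n <= s->size;
}
```

With 16-byte objects, a 12-byte copy starting 4 bytes into an object passes, while a 13-byte copy from the same spot would cross into the next object and be rejected; the real function calls `usercopy_abort()` in that case rather than returning false.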
>> or "xmoqax", we should give a thought to newcomers to Linux file system
Since there are very few places in the MM code that expressly
> words is even possible.
> well as the flexibility around how backing memory is implemented,
> > and not-tail pages prevents the muddy thinking that can lead to
> places we don't need them. Leave the remainder alone
Why can't page_slab() return
> Sure, but at the time Jeff Bonwick chose it, it had no meaning in
> statements on this, which certainly gives me pause.
Not sure.
>> psyched about this, hence the idea to split the page into
> ideas along these lines and I took them to a performance analysis group.
> In my mind, reclaimable object is an analog
> On Wed, Sep 15, 2021 at 11:40:11AM -0400, Johannes Weiner wrote:
> > > > anonymous pages to be folios from the call Friday, but I haven't been getting
> Count me here.
I also believe that shmem should
> I don't have more time to invest into this, and I'm tired of the
> > if (likely(order < MAX_ORDER))
> + * @p: The page.
> I'm asking questions to see how the concept of folios would
> > self-evident that just because struct page worked for both roles that
> But it needs someone to try it, and find the advantages & disadvantages.
> > upgrades, IPC stuff, has small config files, small libraries, small
+ "slab slab pointer corrupt.
- page->flags, &page->flags);
+ slab, slab->objects, slab->inuse, slab->freelist,
> in fact the page allocator. Like calling it "user_mem" instead.
> allocation dominates, and it's OK if the allocation gets in the way of
It is at least as large as PAGE_SIZE.
> > ------|------
+ if (unlikely(!slab))
> long as it doesn't innately assume, or will assume, in the API the
> rid of type punning and overloaded members, would get rid of
> intuitive or common as "page" as a name in the industry.
> downstream effects. That's all.
Another benefit is that such non-LRU pages can
> > On Wed, Sep 22, 2021 at 11:08:58AM -0400, Johannes Weiner wrote:
> the page table reference tests don't seem to need page lock.
>> a 'cache descriptor' reaches the end of the LRU and should be reclaimed,
>> Will this common type be struct page?
> Vocal proponents of the folio type have made conflicting
The filemap API wants to consume file_mem, so it should use that.
There's about half a dozen bugs we've had in the
> > with and understand the MM code base.
> it returns false.
@@ -1167,90 +1165,90 @@ static void setup_object_debug(struct kmem_cache *s, struct page *page,
-void setup_page_debug(struct kmem_cache *s, struct page *page, void *addr)
+void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr)
You don't need a very large system - certainly not in the TB
> > > > problem, because the mailing lists are not flooded with OOM reports
> Yeah, agreed.
> I suggested the anon/file split as an RFC to sidestep the cost/benefit
And "folio" may be a
> > > new type.
> > This is a ton of memory.
But strides have
> necessary for many contexts. I think that was probably
> So either we need wrappers (which are needlessly complicated thanks to
> Conversely, I don't see "leave all LRU code as struct page, and ignore anonymous
> > That, folios does not help with.
> On Mon, Aug 23, 2021 at 08:01:44PM +0100, Matthew Wilcox wrote:
> On Thu, Oct 21, 2021 at 05:37:41PM -0400, Johannes Weiner wrote:
> > eventually anonymous memory.
> > > - Page tables
But it also will be useful for anon THP and hugetlb.
> use with anon.
> mm/swap: Add folio_activate()
+static __printf(3, 4) void slab_err(struct kmem_cache *s, struct slab *slab,
> I'd much rather work it out now.
> > any pain (thus ->lru can be reused for real lru usage).
> /* This happens if someone calls flush_dcache_page on slab page */
> > unreasonable.
> access the (unsafe) mapping pointer directly.
> It's evident from
> ballpark - where struct page takes up the memory budget of entire CPU
> contention still to be decided and resolved for the work beyond file backed
> That's not just anon & file pages but also network pools, graphics card
> > nodded to some of your points, but I don't really know his position on
>> > VM object: a thing that can be on the LRU, can be locked, etc.
> That's actually pretty bad; if you have, say, a 768kB vmalloc space,
> mapping pointers, 512 index members, 512 private pointers, 1024 LRU
> > efficiently managing memory in 4k base pages per default.
- away from "the page".
> We seem to be discussing the
> > isn't the only thing we should be doing - as we do that, that will (is!)
> into the fs/iomap/ library.
> and you mentioned vm_fault as well in the other subthread.
> For some people the answers are yes, for others they are a no.
> code.
> > We should also be clear on what _exactly_ folios are for, so they don't become
> mm/lru: Add folio_add_lru()
- slab_unlock(page);
+ slab_lock(slab);
struct page is a lot of things and anything but simple and
If it's the
Whatever name is chosen,
- if (unlikely(!object || !page || !node_match(page, node))) {
+ slab = c->slab;
> A more in-depth analyses of where and how we need to deal with
> > All this sounds really weird to me.
- if (!PageSlab(page)) {
>> goto isolate_fail;
> > The anon_page->page relationship may look familiar too.
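The `slab_lock(slab)` / `slab_unlock(slab)` hunks above wrap what is, in spirit, a bit spinlock living in the descriptor's flags word (the kernel uses a page-flag bit via `bit_spin_lock`). A portable C11 sketch of that idea, with a toy `struct slab` that is not the real layout:

```c
#include <stdatomic.h>

#define SLAB_LOCK_BIT	0

/* Toy descriptor: only the flags word matters for this sketch. */
struct slab {
	atomic_ulong flags;
};

/* Spin until we are the thread that flips the lock bit from 0 to 1. */
static inline void slab_lock(struct slab *slab)
{
	while (atomic_fetch_or(&slab->flags, 1UL << SLAB_LOCK_BIT) &
	       (1UL << SLAB_LOCK_BIT))
		;	/* another holder owns the bit; retry */
}

/* Clear the lock bit, releasing the descriptor. */
static inline void slab_unlock(struct slab *slab)
{
	atomic_fetch_and(&slab->flags, ~(1UL << SLAB_LOCK_BIT));
}
```

Packing the lock into an existing flags word is what keeps the descriptor small, which is exactly the memory-overhead concern this thread keeps circling back to.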
Certainly, the folio patches as of right now also convert anon
> unclear future evolution wrt supporting subpages of large pages, should we
> > access the (unsafe) mapping pointer directly.
> The code I'm specifically referring to here is the conversion of some
Given folios will need long term maintenance, engagement, and iteration throughout mm/, take-it-or-leave-it pulls seem like a recipe for future conflict, and more importantly, bugs.
It sounds like a nice big number
> > _small_, and _simple_.
> + unsigned long memcg_data;
> On Fri, Aug 27, 2021 at 10:07:16AM -0400, Johannes Weiner wrote:
> eventually solve them all anyway.
> It's a broad and open-ended proposal with far reaching consequences,
> Anyway, I think I've asked all this before and don't mean to harp on
> if it's highmem.
> > > > it certainly wasn't for a lack of constant trying.
>> On Mon, Aug 30, 2021 at 04:27:04PM -0400, Johannes Weiner wrote:
+ * with the count.
> stuff like this.
+}
+ memcg_alloc_slab_obj_cgroups(slab, s, flags,
> "page_group"?
> > folks have their own stories and examples about pitfalls in dealing
>> and internal fragmentation to the user and to kernel developers.
> takes hours or days to come back under control.
> In the new scheme, the pages get added to the page cache for you, and
> - It's a lot of transactional overhead to manage tens of gigs of
- object_err(s, page, *freelist, "Freechain corrupt");
+ !check_valid_pointer(s, slab, nextfree) && freelist) {
I had some
> comes next.
> to be able to handle any subtype.
> > downstream discussion don't go to his liking.
+ } else if (cmpxchg(&slab->memcg_data, 0, memcg_data)) {
> (certainly throughout filesystems) which assume that a struct page is
> > > low-latency IOPS required for that, and parking cold/warm workload
>> that the page was.

Folios for 5.15 request - Was: re: Folio discussion recap
Splitting struct page into multiple types

References cited in the thread:
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/tags/pageset-5.15
https://lore.kernel.org/linux-mm/YSmtjVTqR9%2F4W1aq@casper.infradead.org/
https://lore.kernel.org/all/YUo20TzAlqz8Tceg@cmpxchg.org/
https://lore.kernel.org/linux-arch/20200818150736.GQ17456@casper.infradead.org/
https://lore.kernel.org/linux-mm/20211001024105.3217339-1-willy@infradead.org/
https://lore.kernel.org/all/20200508153634.249933-1-hch@lst.de/
https://lore.kernel.org/r/163363935000.1980952.15279841414072653108.stgit@warthog.procyon.org.uk
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-remove-old-io
https://lore.kernel.org/all/YWRwrka5h4Q5buca@cmpxchg.org/
https://lore.kernel.org/all/YWSZctm%2F2yxu19BV@cmpxchg.org/
>> }
+ usercopy_abort("SLUB object not in SLUB slab?!
> > > early when entering MM code, rather than propagating it inward, in
- return page_size(page);
+ if (unlikely(!is_slab(slab))) {
> the benefits to folios -- fewer bugs, smaller code, larger pages in the
> I know Kent was surprised by this.
+ if (WARN_ON_ONCE(!is_slab(slab))) {
diff --git a/mm/slub.c b/mm/slub.c
+particular usage of a page frame, such as anonymous memory, SLAB caches,
> > > + if (ptr < slab_address(slab))
> early when entering MM code, rather than propagating it inward, in
@@ -245,15 +308,15 @@ static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t fla,
-static inline void memcg_free_page_obj_cgroups(struct page *page)
+static inline void memcg_free_slab_obj_cgroups(struct slab *slab)
+static inline bool is_slab(struct slab *slab)
+ };
> which inherit from "struct page" but I am not convinced that we
> > page size yet but serve most cache with compound huge pages.
This influences locking overhead.
>> However, after we talked about what that actually means, we seem to
- memset(kasan_reset_tag(addr), POISON_INUSE, page_size(page));
+ memset(kasan_reset_tag(addr), POISON_INUSE, slab_size(slab));
- if (!check_valid_pointer(s, page, object)) {
- if (!page->inuse) {
- deactivate_slab(s, page, c->freelist, c);
+ if (unlikely(!pfmemalloc_match(slab, gfpflags))) {
>>> migrate, swap, page fault code etc.
> > > mm/memcg: Convert mem_cgroup_uncharge() to take a folio
> of direction.
+}
and convert them to page_mapping_file() which IS safe to
> static inline long folio_nr_pages(struct folio *folio)
Right now, struct folio is not separately allocated - it's just
>>>> little we can do about that.
> >>> downstream discussion don't go to his liking.
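The `folio_nr_pages()` declaration quoted above encodes the power-of-two invariant the thread keeps returning to: a folio of order N covers exactly 2^N base pages. A stand-alone sketch of those size relations follows; the one-field `struct folio` and the explicit `page_size` parameter are simplifications for illustration (the real folio derives its order from page flags and uses the compile-time PAGE_SIZE):

```c
#include <stddef.h>

/* Toy folio carrying only its order. */
struct folio { unsigned int order; };

/* A folio of order N spans 2^N base pages. */
static inline long folio_nr_pages(const struct folio *folio)
{
	return 1L << folio->order;
}

/* Total bytes covered: page size scaled by the same power of two. */
static inline size_t folio_size(const struct folio *folio, size_t page_size)
{
	return page_size << folio->order;
}
```

For example, with 4k base pages an order-9 folio is 512 pages, i.e. the 2MB unit that keeps coming up in the discussion of huge page cache entries.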
> failure, it's just a sign that the step size is too large and too
Because:
+static inline size_t slab_size(const struct slab *slab)
> - * per cpu freelist or deactivate the page.
> the mapcount management which could be encapsulated; the collapse code
> - Slab
> have allowed us to keep testing the project against reality as we go
>> Not earth-shattering; not even necessarily a bug.
> page right now.
> I'm less concerned with what's fair than figuring out what the consensus is so
> Just to clarify, I'm only on this list because I acked 3 smaller,
- * @page: a pointer to the page struct,
+ * slab_objcgs - get the object cgroups vector associated with a slab
> > > > - Page tables
> refactoring, that was only targeted at the compound_head() issue, which we all
If that means we modify the fs APIs again in twelve
> --
> > Plus, really, what's the *alternative* to doing that anyway?
> we'd solve not only the huge page cache, but also set us up for a MUCH
> mm/lru: Add folio LRU functions
As opposed to making 2M the default block and using slab-style
>>> maintain additional state about the object.
> give us a nice place to stick some other things).
> a sign that a person took a hard look at all the implications.
> > Unfortunately, I think this is a result of me wanting to discuss a way
>> have some consensus on the following:
> A more in-depth analyses of where and how we need to deal with
>>> has already used as an identifier.
> > -static __printf(3, 4) void slab_err(struct kmem_cache *s, struct page *page,
> > > contention still to be decided and resolved for the work beyond file backed
> > mapping = folio->mapping;
> cache data plane and the backing memory plane.
> continually have to look at whether it's "page_set" or "pageset".
> > > be nice);
> >> far more confused than "read_pages()" or "read_mempages()".
>>> with and understand the MM code base.
It's the clearest, most useful post on this thread,
So they can be found by GUP, and we want
> > > single machine, when only some of our workloads would require this
> #ifdef WANT_PAGE_VIRTUAL
> I'm grateful for the struct slab spinoff, I think it's exactly all of
I don't think that splitting anon_folio from
> and so the justification for replacing page with folio *below* those
> +{
>> Thanks, I can understand that.
> On Tue, Oct 19, 2021 at 12:11:35PM -0400, Kent Overstreet wrote:
> >> are actually what we want to be "lru_mem", just which a much clearer
> > would be the reasonable thing to do.
> How would you reduce the memory overhead of struct page without losing
- and part of our mission should be
> Similarly, something like "head_page", or "mempages" is going to a bit
> > > > in Linux (once we're in a steady state after boot):
> My opinion after all the discussions: use a dedicated type with a clear
And it's anything but obvious or
> > > What several people *did* say at this meeting was whether you could
> > migrate, swap, page fault code etc.
+ slab->freelist = NULL;
-static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags)
+static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags)
- if (unlikely(PageSlabPfmemalloc(page)))
I have no idea what he's thinking.
> no matter what the PAGE_SIZE is.
> else.
> after a full folio conversion of everybody else, those would be quite
> }
> > Notably it does not do tailpages (and I don't see how it ever would),
> Plus, really, what's the *alternative* to doing that anyway?
It can be called
> > the page lock would have covered what it needed.
> > > The mistake you're making is coupling "minimum mapping granularity" with
Same as the page table
> > Are we going to bump struct page to 2M soon?
> > > pages, but those discussions were what derailed the more modest, and more
> remaining tailpages where typesafety will continue to lack?
> > But in practice, this
> not sure how this could be resolved other than divorcing the idea of a
> > > +#define page_slab(p) (_Generic((p), \
The slab allocator is good at subdividing those into
> Folio perpetuates the problem of the base page being the floor for
> Amen!
- struct page *next;
> > for discussion was *MONTHS* ago.
Or in the
> slab page!"
-#endif
> in which that isn't true would be one in which either
> The main thing we have to stop
> memory", and that's not my proposal at all.
- void *last_object = page_address(page) +
> struct page into multiple types, and what that means for page->lru.
That's great.
> > > page_folio(), folio_pfn(), folio_nr_pages all encode a N:1
> It's implied by the
> use slab for this" idea is bonkers and doesn't work.
> entirely fixed yet?)
> (Yes, it would be helpful to fix these ambiguities, because I feel like
> for that is I/O bandwidth. But I insist on the
> every 2MB pageblock has an unmoveable page?
> One of the assumptions you're making is that the current code is suitable
> atomic_t hpage_pinned_refcount;
> them to be cast to a common type like lock_folio_memcg()?
If the file is accessed entirely randomly,
> allocating in multiples of the hardware page size if we're going to be able to