Big_Stat aims to develop the use of new and recent administrative and survey datasets as they become available for population research.  The project has three main components.

Research. A growing volume of Big Data is becoming available for research. Using and linking diverse data sources can expand our knowledge of specific family forms that are difficult to observe through surveys conducted by national research and statistical agencies. In particular, assigning each inhabitant to one and only one dwelling is complex for people who share their time between two usual residences. This leads to omissions and double counts in the census, and to a partial and biased picture of family forms. Research will first focus on two family situations: children of separated parents who share their time between their mother’s and father’s homes, and young adults who have only partially left the parental home or who live with other young adults in “complex households” for which no precise information on partnership status or family situation is available. These situations will be analysed using a large set of available data, ranging from quantitative surveys and qualitative interviews to census and administrative data (tax and welfare benefits data). INSEE’s demographic panel, which includes a wide set of data on a very large population sample, provides a means to validate each of the data sources and to analyse their strengths and shortcomings.

Data dissemination and documentation. The second component of the project concerns the dissemination of available information on new datasets. For each source, dedicated web pages provide a brief presentation of the source, data documentation and analysis tools: variables constructed for specific research projects, and data quality assessments based on users’ experience. All materials are available and reusable. This site is under construction, and any remarks or contributions are welcome. If you are using an interesting data source, do not hesitate to propose a set of dedicated web pages with the following headings: “presentation, procedures for access, users’ feedback, constructed variables”. Through this project, we can help you make this information visible. Our aim is to produce input manuals for many new datasets that are useful for research, both in France and abroad.

Training. Lastly, user training will be organised for students and researchers wishing to use these data. 

Project team

  • Laurent Toulemon: Project leader
  • Martyna Wojcik: Agreements and contracts
  • Annie Carré: Set-up and management of the mini-websites
  • Sorya Le-Forestier: Secretarial work and project administration
  • Giulia Ferrari: General administration, management of the Big_Stat website and the other sources’ websites, mailing list management
  • Benjamin Marteau: Management of the EDP website and participation in project administration

    The full list of participants is given in the project proposal submitted to the ANR, which can be downloaded from this page (on the right).

The project is funded by the French National Research Agency (ANR).